---
license: cc-by-4.0
language:
- he
inference: false
---

# **DictaLM**: A Large Generative Language Model for Modern Hebrew

A large generative pretrained transformer (GPT) language model for Hebrew, released [link to be added].

- This is an alpha version of the model, and there are many improvements to come.

- We are actively working on improving the model, so stay tuned.

This is the base model, pretrained on general text completion. On its own it isn't very useful, but it can be fine-tuned for specific tasks (instruct, chat, QA, and more).

You can access the instruct-tuned model [here](https://huggingface.co/dicta-il/dictalm-7b-instruct).

## Sample usage (for text completion):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictalm-7b')
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True).cuda()

model.eval()

with torch.inference_mode():
    # this prompt was taken from the headline of a [YNet](https://www.ynet.co.il/architecture/article/b1j3bzcrn) article
    prompt = 'מנורה מכובע ים וכוסות מבקבוקי פלסטיק: הצצה'
    kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75,
        max_length=100,
        min_new_tokens=5
    )

    print(tokenizer.batch_decode(model.generate(**kwargs), skip_special_tokens=True))
```

There are many different parameters you can pass in `kwargs` to get different results (greedy decoding, beam search, different sampling configurations, longer/shorter responses, etc.).

You can view the full list of parameters you can pass to the `generate` function [here](https://huggingface.co/docs/transformers/v4.33.0/en/main_classes/text_generation#transformers.GenerationMixin.generate).
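
For example, here is a minimal sketch of two common alternatives, greedy decoding and beam search, reusing the `model`, `tokenizer`, and `prompt` from the snippet above (the specific values are illustrative, not tuned recommendations):

```python
with torch.inference_mode():
    # Greedy decoding: deterministic, always takes the highest-probability next token.
    greedy_kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=False,
        max_new_tokens=50
    )
    print(tokenizer.batch_decode(model.generate(**greedy_kwargs), skip_special_tokens=True))

    # Beam search: keeps several candidate continuations and returns the best-scoring one.
    beam_kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=False,
        num_beams=4,
        early_stopping=True,
        max_new_tokens=50
    )
    print(tokenizer.batch_decode(model.generate(**beam_kwargs), skip_special_tokens=True))
```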

### Alternative ways to initialize the model:

If you have multiple smaller GPUs and the package `accelerate` is installed, you can initialize the model split across the devices:
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, device_map='auto')
```
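
If you also want to cap how much memory each device may use, you can pass `accelerate`'s `max_memory` map through `from_pretrained`; a minimal sketch with placeholder sizes that you should adjust to your hardware:

```python
import torch
from transformers import AutoModelForCausalLM

# Cap per-device memory; weights that don't fit on a GPU are placed on the next device in the map.
# The GiB values below are illustrative placeholders, not recommendations.
model = AutoModelForCausalLM.from_pretrained(
    'dicta-il/dictalm-7b',
    trust_remote_code=True,
    device_map='auto',
    torch_dtype=torch.float16,
    max_memory={0: '10GiB', 1: '10GiB', 'cpu': '30GiB'}
)
```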

If you are running on Linux and have the `bitsandbytes` package installed, you can initialize the model in 4-bit or 8-bit inference mode:
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, load_in_8bit=True)
```
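
The line above loads the model in 8-bit. For 4-bit inference you can pass a `BitsAndBytesConfig` instead; a minimal sketch, assuming a recent `transformers` and a `bitsandbytes` build with 4-bit support:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with half-precision compute for the matrix multiplications.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    'dicta-il/dictalm-7b',
    trust_remote_code=True,
    quantization_config=quant_config
)
```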

If you have [FlashAttention](https://github.com/Dao-AILab/flash-attention) installed in your environment, you can instruct the model to use the flash attention implementation (either V1 or V2, whichever is installed):
```python
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, use_flash_attention=True)
```

## Citation

If you use DictaLM in your research, please cite ```ADD CITATION HERE```

**BibTeX:**

```ADD BIBTEXT HERE```

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg