---
license: apache-2.0
pipeline_tag: text-generation
---
# Safurai-Csharp-34B
[Article](https://www.safurai.com/blog/introducing-safurai-csharp)
<center><img src="https://media.discordapp.net/attachments/1071900237414801528/1165927645469478942/mrciffa_A_cartoon_samurai_wearing_a_black_jacket_as_a_chemistry_d4c17e16-567a-41da-9e0e-2902e93def2c.png?ex=6548a1bc&is=65362cbc&hm=5721b5c15d8f97374212970a7d01f17923ef5015d385230b8ae5542fd2d0df21&=&width=1224&height=1224" width="300"></center>
This is a [`codellama/CodeLlama-34b-hf`](https://huggingface.co/codellama/CodeLlama-34b-hf) model fine-tuned using QLoRA (4-bit precision) on the [`Safurai/EvolInstruct-csharp-16k-13B-Alpaca`](https://huggingface.co/datasets/Safurai/EvolInstruct-csharp-16k-13B-Alpaca) dataset.
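If you want to load the base model in the same 4-bit precision used for fine-tuning, a minimal sketch (not part of the original card) using `transformers` and `bitsandbytes` could look like the following; the NF4 and compute-dtype choices are assumptions that mirror common QLoRA defaults:
```python
# Minimal sketch of a QLoRA-style 4-bit load of the base model (assumed settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit precision, as used for fine-tuning
    bnb_4bit_quant_type="nf4",              # assumption: NF4, the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute, matching the training config
)

base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-34b-hf")
```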
## 🔧 Training
It was trained in 1h 11m 44s with the following configuration file:
```yaml
base_model: codellama/CodeLlama-34b-hf
base_model_config: codellama/CodeLlama-34b-hf
model_type: LlamaForCausalLM
tokenizer_type: CodeLlamaTokenizer
is_llama_derived_model: true
hub_model_id: "Safurai/Evol-csharp-v1"
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: Safurai/EvolInstruct-csharp-16k-13B-Alpaca
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./qlora-out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: codellama-csharp
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0003
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 40
eval_steps: 40
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
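For reference, the adapter settings above correspond roughly to the following PEFT `LoraConfig`. This is a sketch of what Axolotl configures internally, not code from the actual run; in particular, the `target_modules` list is an assumption derived from `lora_target_linear: true`:
```python
# Approximate PEFT equivalent of the adapter section of the Axolotl config above.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,               # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    # assumption: lora_target_linear: true targets all linear projections of the Llama blocks
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)  # base_model loaded in 4-bit as above
```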
Here are the loss curves:
![](https://i.imgur.com/zrBq01N.png)
This model is mainly intended for experimental purposes, not for production inference.
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
## 💻 Usage
```python
# pip install transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "Safurai/Safurai-Csharp-34B"
prompt = "Your C# request"

tokenizer = AutoTokenizer.from_pretrained(model)

# Text-generation pipeline in float16, sharded across available GPUs
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample one completion for the prompt
sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=1000,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
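Because the training data is in Alpaca format (`type: alpaca`), wrapping the request in an Alpaca-style instruction template may work better than a bare prompt. Here is a hedged sketch that reuses the `pipeline` and `tokenizer` from the snippet above; the exact template wording is an assumption:
```python
# Assumption: Alpaca-style prompt template; reuses `pipeline` and `tokenizer` from above.
instruction = "Write a C# method that reverses a string without using built-in Reverse()."
alpaca_prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

sequences = pipeline(
    alpaca_prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=1000,
)
print(sequences[0]["generated_text"])
```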