|
|
|
--- |
|
license: other |
|
tags: |
|
- generated_from_trainer |
|
- google/gemma |
|
- PyTorch |
|
- transformers |
|
- trl |
|
- peft |
|
- tensorboard |
|
model-index: |
|
- name: pygemma-2b-ultra-plus-4 |
|
results: [] |
|
datasets: |
|
- Vezora/Tested-143k-Python-Alpaca |
|
language: |
|
- en |
|
license_name: gemma-terms-of-use |
|
license_link: https://ai.google.dev/gemma/terms |
|
base_model: google/gemma-2b |
|
widget: |
|
- example_title: Compute Sum |
|
messages: |
|
- role: system |
|
content: Welcome to PyGemma, your AI-powered Python assistant. I'm here to help you answer common questions about the Python programming language. Let's dive into Python! |
|
- role: user |
|
content: Create a function to calculate the sum of a sequence of integers. |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Model Card for pygemma-2b-ultra-plus-4: |
|
|
|
🐍💬🤖 |
|
|
|
|
|
**pygemma-2b-ultra-plus-4** is a language model that is trained to act as Python assistant. It is a finetuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) that was trained using `SFTTrainer` on publicly available dataset |
|
[Vezora/Tested-143k-Python-Alpaca](https://huggingface.co/datasets/Vezora/Tested-143k-Python-Alpaca). |
|
|
|
|
|
## Training Metrics |
|
|
|
[The training metrics can be found on **TensorBoard**](https://huggingface.co/Menouar/pygemma-2b-ultra-plus-4/tensorboard). |
|
|
|
|
|
## Training hyperparameters |
|
|
|
The following hyperparameters were used during the training: |
|
|
|
|
|
- output_dir: peft-lora-model |
|
|
|
- overwrite_output_dir: True |
|
|
|
- do_train: False |
|
|
|
- do_eval: False |
|
|
|
- do_predict: False |
|
|
|
- evaluation_strategy: no |
|
|
|
- prediction_loss_only: False |
|
|
|
- per_device_train_batch_size: 2 |
|
|
|
- per_device_eval_batch_size: None |
|
|
|
- per_gpu_train_batch_size: None |
|
|
|
- per_gpu_eval_batch_size: None |
|
|
|
- gradient_accumulation_steps: 4 |
|
|
|
- eval_accumulation_steps: None |
|
|
|
- eval_delay: 0 |
|
|
|
- learning_rate: 2e-05 |
|
|
|
- weight_decay: 0.0 |
|
|
|
- adam_beta1: 0.9 |
|
|
|
- adam_beta2: 0.999 |
|
|
|
- adam_epsilon: 1e-08 |
|
|
|
- max_grad_norm: 0.3 |
|
|
|
- num_train_epochs: 1 |
|
|
|
- max_steps: -1 |
|
|
|
- lr_scheduler_type: cosine |
|
|
|
- lr_scheduler_kwargs: {} |
|
|
|
- warmup_ratio: 0.1 |
|
|
|
- warmup_steps: 0 |
|
|
|
- log_level: passive |
|
|
|
- log_level_replica: warning |
|
|
|
- log_on_each_node: True |
|
|
|
- logging_dir: peft-lora-model/runs/Mar23_06-23-59_676c0e3f20e7 |
|
|
|
- logging_strategy: steps |
|
|
|
- logging_first_step: False |
|
|
|
- logging_steps: 10 |
|
|
|
- logging_nan_inf_filter: True |
|
|
|
- save_strategy: epoch |
|
|
|
- save_steps: 500 |
|
|
|
- save_total_limit: None |
|
|
|
- save_safetensors: True |
|
|
|
- save_on_each_node: False |
|
|
|
- save_only_model: False |
|
|
|
- no_cuda: False |
|
|
|
- use_cpu: False |
|
|
|
- use_mps_device: False |
|
|
|
- seed: 42 |
|
|
|
- data_seed: None |
|
|
|
- jit_mode_eval: False |
|
|
|
- use_ipex: False |
|
|
|
- bf16: True |
|
|
|
- fp16: False |
|
|
|
- fp16_opt_level: O1 |
|
|
|
- half_precision_backend: auto |
|
|
|
- bf16_full_eval: False |
|
|
|
- fp16_full_eval: False |
|
|
|
- tf32: None |
|
|
|
- local_rank: 0 |
|
|
|
- ddp_backend: None |
|
|
|
- tpu_num_cores: None |
|
|
|
- tpu_metrics_debug: False |
|
|
|
- debug: [] |
|
|
|
- dataloader_drop_last: False |
|
|
|
- eval_steps: None |
|
|
|
- dataloader_num_workers: 0 |
|
|
|
- dataloader_prefetch_factor: None |
|
|
|
- past_index: -1 |
|
|
|
- run_name: peft-lora-model |
|
|
|
- disable_tqdm: False |
|
|
|
- remove_unused_columns: True |
|
|
|
- label_names: None |
|
|
|
- load_best_model_at_end: False |
|
|
|
- metric_for_best_model: None |
|
|
|
- greater_is_better: None |
|
|
|
- ignore_data_skip: False |
|
|
|
- fsdp: [] |
|
|
|
- fsdp_min_num_params: 0 |
|
|
|
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
|
|
- fsdp_transformer_layer_cls_to_wrap: None |
|
|
|
- accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True) |
|
|
|
- deepspeed: None |
|
|
|
- label_smoothing_factor: 0.0 |
|
|
|
- optim: adamw_torch_fused |
|
|
|
- optim_args: None |
|
|
|
- adafactor: False |
|
|
|
- group_by_length: False |
|
|
|
- length_column_name: length |
|
|
|
- report_to: ['tensorboard'] |
|
|
|
- ddp_find_unused_parameters: None |
|
|
|
- ddp_bucket_cap_mb: None |
|
|
|
- ddp_broadcast_buffers: None |
|
|
|
- dataloader_pin_memory: True |
|
|
|
- dataloader_persistent_workers: False |
|
|
|
- skip_memory_metrics: True |
|
|
|
- use_legacy_prediction_loop: False |
|
|
|
- push_to_hub: False |
|
|
|
- resume_from_checkpoint: None |
|
|
|
- hub_model_id: None |
|
|
|
- hub_strategy: every_save |
|
|
|
- hub_token: None |
|
|
|
- hub_private_repo: False |
|
|
|
- hub_always_push: False |
|
|
|
- gradient_checkpointing: True |
|
|
|
- gradient_checkpointing_kwargs: {'use_reentrant': False} |
|
|
|
- include_inputs_for_metrics: False |
|
|
|
- fp16_backend: auto |
|
|
|
- push_to_hub_model_id: None |
|
|
|
- push_to_hub_organization: None |
|
|
|
- push_to_hub_token: None |
|
|
|
- mp_parameters: |
|
|
|
- auto_find_batch_size: False |
|
|
|
- full_determinism: False |
|
|
|
- torchdynamo: None |
|
|
|
- ray_scope: last |
|
|
|
- ddp_timeout: 1800 |
|
|
|
- torch_compile: False |
|
|
|
- torch_compile_backend: None |
|
|
|
- torch_compile_mode: None |
|
|
|
- dispatch_batches: None |
|
|
|
- split_batches: None |
|
|
|
- include_tokens_per_second: False |
|
|
|
- include_num_input_tokens_seen: False |
|
|
|
- neftune_noise_alpha: None |
|
|
|
- distributed_state: Distributed environment: NO |
|
Num processes: 1 |
|
Process index: 0 |
|
Local process index: 0 |
|
Device: cuda |
|
|
|
|
|
- _n_gpu: 1 |
|
|
|
- __cached__setup_devices: cuda:0 |
|
|
|
- deepspeed_plugin: None |
|
|
|
|