Menlo/llama3-s-2024-07-08

@@ -1,199 +1,180 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 **BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: apache-2.0
+datasets:
+- jan-hq/instruction-speech-v1
+language:
+- en
+tags:
+- sound language model
 ---
 ## Model Details
+We have developed and released the Llama-3-8B-Sound family, which natively understands both audio and text input.
+We extend [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with sound-understanding capabilities through continual training on 700M tokens from the [Instruction Speech v1](https://huggingface.co/datasets/jan-hq/instruction-speech-v1) dataset.
+**Model developers** Homebrew Research.
+**Input** Text and sound.
+**Output** Text.
+**Model Architecture** Llama-3.
+**Language(s):** English.
+## Intended Use
+**Intended Use Cases** This family is primarily intended for research applications. This version aims to further improve the LLM's sound-understanding capabilities.
+**Out-of-scope** The use of Llama-3-Sound in any manner that violates applicable laws or regulations is strictly prohibited.
 ## How to Get Started with the Model
+> TODO
+## Training process
+**Training Metrics Image**: Below is a snapshot of the training loss curve.
+![training loss curve](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/12vqghBGus1Bb2OTjNezl.png)
+### Hardware
+**GPU Configuration**: Cluster of 8x NVIDIA H100-SXM-80GB.
+**GPU Usage**:
+  - **Continual Training**: 8 hours.
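+As a rough sanity check on these figures, the sketch below converts the 700M-token dataset size from Model Details and the 8-hour run on 8 GPUs into GPU-hours and average token throughput. It assumes a single pass over all 700M tokens (the table below lists 1 epoch) and that the 8 hours is the full wall-clock time of that pass; the helper name `throughput` is illustrative only and not part of the actual training code.
+```python
+TOKENS = 700e6    # "700M tokens" of Instruction Speech v1 (assumed: one full pass)
+NUM_GPUS = 8      # 8x NVIDIA H100-SXM-80GB
+WALL_HOURS = 8    # continual-training wall-clock time (assumed: whole run)
+
+
+def throughput(tokens: float, num_gpus: int, wall_hours: float):
+    """Return (tokens/s for the whole node, tokens/s per GPU, GPU-hours)."""
+    seconds = wall_hours * 3600
+    node_rate = tokens / seconds
+    return node_rate, node_rate / num_gpus, num_gpus * wall_hours
+
+
+if __name__ == "__main__":
+    node, per_gpu, gpu_hours = throughput(TOKENS, NUM_GPUS, WALL_HOURS)
+    print(f"{gpu_hours} GPU-hours, ~{node:,.0f} tokens/s per node, "
+          f"~{per_gpu:,.0f} tokens/s per GPU")
+```
+Under these assumptions the run amounts to 64 GPU-hours and roughly 24,300 tokens/s across the node (about 3,000 tokens/s per GPU). If the 8 hours also covers data loading and checkpointing, throughput during the optimizer steps themselves would be somewhat higher.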
+### Training Arguments
+| Parameter                   | Continual Training |
+|-----------------------------|--------------------|
+| **Epoch**                   | 1                  |
+| **Global batch size**       | 128                |
+| **Learning Rate**           | 5e-5               |
+| **Learning Rate Scheduler** | Cosine with warmup |
+| **Optimizer**               | [Adam-mini](https://arxiv.org/abs/2406.16793) |
+| **Warmup Ratio**            | 0.1                |
+| **Weight Decay**            | 0.01               |
+| **beta1**                   | 0.9                |
+| **beta2**                   | 0.98               |
+| **epsilon**                 | 1e-6               |
+| **Gradient Clipping**       | 1.0                |
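+The table names a cosine scheduler with warmup but does not report the total number of optimizer steps. The sketch below assumes the common form (linear warmup from 0 to the peak rate over the first 10% of steps, then a half-cosine decay to 0, the same shape as `transformers.get_cosine_schedule_with_warmup`); it is an illustration of that schedule, not the authors' training code, and the 1,000-step total is hypothetical.
+```python
+import math
+
+PEAK_LR = 5e-5      # "Learning Rate" row of the table
+WARMUP_RATIO = 0.1  # "Warmup Ratio" row of the table
+
+
+def lr_at_step(step: int, total_steps: int,
+               peak_lr: float = PEAK_LR,
+               warmup_ratio: float = WARMUP_RATIO) -> float:
+    """Linear warmup to peak_lr, then half-cosine decay to 0 (assumed form)."""
+    warmup_steps = math.ceil(warmup_ratio * total_steps)
+    if step < warmup_steps:
+        # Linear ramp from 0 up to peak_lr during warmup.
+        return peak_lr * step / max(1, warmup_steps)
+    # Fraction of the post-warmup phase completed, in [0, 1].
+    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
+
+
+if __name__ == "__main__":
+    TOTAL_STEPS = 1000  # hypothetical; the card does not report a step count
+    for s in (0, 50, 100, 550, 1000):
+        print(s, f"{lr_at_step(s, TOTAL_STEPS):.2e}")
+```
+With 1,000 total steps, the rate climbs linearly to 5e-5 at step 100, falls back to half the peak (2.5e-5) at step 550, and reaches 0 at the final step.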
+### Accelerate FSDP Config
+```yaml
+compute_environment: LOCAL_MACHINE
+debug: false
+distributed_type: FSDP
+downcast_bf16: 'no'
+enable_cpu_affinity: true
+fsdp_config:
+  fsdp_activation_checkpointing: true
+  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+  fsdp_backward_prefetch: BACKWARD_PRE
+  fsdp_cpu_ram_efficient_loading: true
+  fsdp_forward_prefetch: false
+  fsdp_offload_params: false
+  fsdp_sharding_strategy: FULL_SHARD
+  fsdp_state_dict_type: SHARDED_STATE_DICT
+  fsdp_sync_module_states: true
+  fsdp_use_orig_params: false
+machine_rank: 0
+main_training_function: main
+mixed_precision: bf16
+num_machines: 1
+num_processes: 8
+rdzv_backend: static
+same_network: true
+tpu_env: []
+tpu_use_cluster: false
+tpu_use_sudo: false
+use_cpu: false
+```
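+The config launches `num_processes: 8` on a single machine (one process per H100), while the table fixes the global batch size at 128. The card does not say how those 128 samples are split between per-device micro-batch and gradient accumulation, so the hypothetical helper below only enumerates the splits that satisfy micro-batch × accumulation steps × 8 = 128; it makes no claim about which split was actually used.
+```python
+GLOBAL_BATCH_SIZE = 128  # "Global batch size" row of the table
+NUM_PROCESSES = 8        # num_processes in the config (one per GPU)
+
+
+def batch_splits(global_batch: int, num_processes: int) -> list[tuple[int, int]]:
+    """List (micro_batch, grad_accum_steps) pairs such that
+    micro_batch * grad_accum_steps * num_processes == global_batch."""
+    if global_batch % num_processes:
+        raise ValueError("global batch must divide evenly across processes")
+    per_process = global_batch // num_processes
+    # Every divisor of the per-process share is a valid micro-batch size.
+    return [(m, per_process // m)
+            for m in range(1, per_process + 1)
+            if per_process % m == 0]
+
+
+if __name__ == "__main__":
+    print(batch_splits(GLOBAL_BATCH_SIZE, NUM_PROCESSES))
+```
+Under that relation each process contributes 16 samples per optimizer step, for example a micro-batch of 4 with 4 accumulation steps.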
+## Examples
+1. Good example:
+```
+# Question:
+Does the following review have a positive or negative opinion of the movie?
+i thought the relationships were wonderful ,
+Options: A). negative; B). positive;
+# Expected Output:
+The review has a B). positive opinion about the movie because they think the relationships were wonderful.
+# Model's Output:
+The review is positive.
+```
+2. Misunderstanding example:
+```
+# Question:
+Translate the following sentence to Russian:
+6 Work in JOBS and careers after 40
+Russian:
+# Expected Output:
+6 Работать в РАБОТАХ и карьере после 40
+Russian (Cyrillic): 6 Работать в РАБОТАХ и карьере после 40
+Let me explain this to you like you're five years old: This sentence means that people can still work and have careers, even if they're older than 40 years old. The words are in a different language, called Russian.
+# Model's Output:
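+В случае, когда вы хотите перевести фразу "The first step is to identify the language of the text." на русский язык, вы можете использовать следующую фразу: "Первый шаг - определить язык текста."
+# Model's Output (English gloss):
+If you want to translate the phrase "The first step is to identify the language of the text." into Russian, you can use the following phrase: "The first step is to identify the language of the text."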
+```
+3. Off-track example:
+```
+# Question:
+Answer the following question: -  The roots of the growing plants start to break up the rock - The plant acids dissolve the rock - The rock is broken down into smaller pieces - Erosion begins.  What might be the first step of the process?
+Answer:
+# Expected Output:
+The first step of the process might be: The roots of the growing plants start to break up the rock.
+# Model's Output:
+Step 1: Identify the key elements in the premise and hypothesis.
+Premise: "A man in a blue shirt and a red hat is holding a sign."
+Key elements: man, blue shirt, red hat, holding, sign
+Hypothesis: "A man is holding a sign."
+Key elements: man, holding, sign
+Step 2: Compare the key elements in the premise and hypothesis.
+In the premise, we have a man wearing a blue shirt and a red hat, while in the hypothesis, we only have a man. The premise also mentions that the man is holding a sign, which is also mentioned in the hypothesis.
+Step 3: Determine if the hypothesis is entailed by the premise.
+Since the hypothesis states that a man is holding a sign, and the premise confirms that a man is holding a sign, we can conclude that the hypothesis is entailed by the premise. The additional information about the man's clothing in the premise does not contradict or negate the
+```
+Despite being undertrained, the model demonstrates an emerging grasp of sound-text semantics.
+## Citation Information
 **BibTeX:**
+```bibtex
+@misc{llama3sound2024,
+  title={Llama-3-Sound},
+  author={JanAI},
+  year={2024},
+  month={July},
+  url={https://huggingface.co/jan-hq/llama-3-sound-init-checkpoint-4340}
+}
+```
+## Acknowledgement
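+- **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**
+- **[Encodec](https://github.com/facebookresearch/encodec)**
+- **[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)**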