prince-canuma committed: Add open-llm-leaderboard results

README.md CHANGED
### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

I used the [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset, a curated subset of the broader [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset. This release reaches performance on par with training on larger slices of OpenOrca while including only ~500k GPT-4 completions.
Subsequently, I created two subsets of 102,000 and 1,000 samples:

- [prince-canuma/SmallOrca](https://huggingface.co/datasets/prince-canuma/SmallOrca)
- [prince-canuma/TinyOrca](https://huggingface.co/datasets/prince-canuma/TinyOrca)

I experimented with both subsets, but the best results came from fine-tuning on a modest set of 200 samples. Training on more data beyond this threshold mainly improved the model's proficiency at generating Chain-of-Thought responses. Chain-of-Thought output is not universally preferable, though: in scenarios like a RAG setup, succinct answers are often favored, especially for straightforward queries.
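As a minimal sketch (assuming the 🤗 `datasets` library; the shuffle seed is illustrative, not the exact procedure used to build SmallOrca and TinyOrca), fixed-size subsets like these can be drawn as follows:

```python
from datasets import load_dataset

# Load the full SlimOrca training split (~500k GPT-4 completions).
slim_orca = load_dataset("Open-Orca/SlimOrca", split="train")

# Shuffle once, then take fixed-size slices. The seed is an
# illustrative assumption, not the one actually used.
shuffled = slim_orca.shuffle(seed=42)
small_orca = shuffled.select(range(102_000))  # SmallOrca-sized subset
tiny_orca = shuffled.select(range(1_000))     # TinyOrca-sized subset
sft_set = shuffled.select(range(200))         # the 200-sample fine-tuning set
```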
### Training Procedure

3. Mask instructions (System and User) at training time (see the sketch below).
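A minimal sketch of what this masking step can look like, assuming a chat-template tokenizer and the common convention of setting prompt-token labels to `-100` so that only assistant tokens contribute to the loss; the helper name is hypothetical:

```python
def mask_instructions(tokenizer, messages):
    """Build input_ids/labels where only the assistant reply is supervised.

    `messages` follows the chat format, e.g.
    [{"role": "system", ...}, {"role": "user", ...}, {"role": "assistant", ...}].
    """
    # Tokenize the prompt (System + User turns) and the full conversation.
    prompt_ids = tokenizer.apply_chat_template(
        messages[:-1], add_generation_prompt=True, return_tensors="pt"
    )[0]
    full_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")[0]

    labels = full_ids.clone()
    # -100 is ignored by PyTorch's cross-entropy loss, so the System and
    # User tokens are masked out of the training objective.
    labels[: prompt_ids.shape[0]] = -100
    return {"input_ids": full_ids, "labels": labels}
```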
#### Training Hyperparameters

- **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

[TODO]
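To illustrate the bf16 mixed-precision regime in `transformers` (every value other than `bf16=True` is a placeholder assumption, since the remaining hyperparameters are still [TODO]):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    bf16=True,                      # bf16 mixed precision, as stated above
    per_device_train_batch_size=4,  # placeholder
    gradient_accumulation_steps=4,  # placeholder
    learning_rate=2e-5,             # placeholder
    num_train_epochs=3,             # placeholder
)
```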
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

We evaluate models on 6 key benchmarks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), a unified framework to test generative language models on a large number of different evaluation tasks:
- AI2 Reasoning Challenge (25-shot) - a set of grade-school science questions.
- HellaSwag (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
- MMLU (5-shot) - a test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
- TruthfulQA (0-shot) - a test to measure a model's propensity to reproduce falsehoods commonly found online. Note: TruthfulQA is technically a 6-shot task in the Harness because each example is prepended with 6 Q/A pairs, even in the 0-shot setting.
- Winogrande (5-shot) - an adversarial and difficult Winograd benchmark at scale, for commonsense reasoning.
- GSM8k (5-shot) - diverse grade school math word problems to measure a model's ability to solve multi-step mathematical reasoning problems.

For all these evaluations, a higher score is a better score. We chose these benchmarks as they test a variety of reasoning and general knowledge across a wide variety of fields in 0-shot and few-shot settings.

Read more [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
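For reference, a sketch of running this suite through the harness's Python API (task names follow lm-eval v0.4 conventions; the repo id is a placeholder, and exact arguments may differ across harness versions):

```python
import lm_eval  # EleutherAI lm-evaluation-harness (v0.4+)

model_id = "your-org/your-model"  # placeholder repo id

# Each leaderboard benchmark uses its own few-shot setting.
tasks = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "truthfulqa_mc2": 0,  # scored 0-shot on the leaderboard
    "winogrande": 5,
    "gsm8k": 5,
}

for task, shots in tasks.items():
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, results["results"][task])
```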
[TODO]

### Results

```json
{
  "AVG": {
    "acc": 60.49
  },
  "ARC": {
    "acc": 59.81
  },
  "HellaSwag": {
    "acc": 74.52
  },
  "MMLU": {
    "acc": 56.33
  },
  "truthfulqa": {
    "acc": 46.74
  },
  "winogrande": {
    "acc": 75.00
  },
  "gsm8k": {
    "acc": 50.64
  }
}
```
## Technical Specifications

- Bitsandbytes
- Plotly
## Future Work

I plan to explore the following tuning setups:

- Function calling
- DPO
## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```bibtex
  year={2024},
}
```
```bibtex
@misc{SlimOrca,
  title = {SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification},
  author = {Wing Lian and Guan Wang and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/Open-Orca/SlimOrca}
}
```
```bibtex
@misc{open-llm-leaderboard,
  author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
  title = {Open LLM Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}"
}
```