emrgnt-cmplxty committed 39e70f7 (1 parent: fda0386): Update README.md

README.md CHANGED
@@ -1,9 +1,34 @@

Before (the default placeholder template):

---
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card
After (the updated model card):

---
license: mit
---

# SciPhi-Self-RAG-Mistral-7B-32k Model Card

The SciPhi-Self-RAG-Mistral-7B-32k is a Large Language Model (LLM) fine-tuned from Mistral-7B-v0.1. This model underwent the fine-tuning process described in the [SciPhi-Mistral-7B-32k](https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k) model card.

SciPhi-AI is available via a free hosted API, though the exposed model can vary. Currently, SciPhi-Self-RAG-Mistral-7B-32k is available. More details can be found in the docs [here](https://sciphi.readthedocs.io/en/latest/setup/quickstart.html).
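To make the hosted-API option concrete, here is a minimal sketch of what a completion request over HTTP might look like. The endpoint URL, payload fields, and authentication header below are placeholders, not the documented SciPhi interface; the quickstart linked above is the authoritative reference.

```python
# Hypothetical sketch only: the endpoint URL, payload fields, and auth header
# are placeholders, NOT the documented SciPhi API. See the linked quickstart
# for the real interface before using this.
import os
import requests

API_URL = "https://api.sciphi.example/v1/completions"   # placeholder endpoint
API_KEY = os.environ.get("SCIPHI_API_KEY", "")           # placeholder auth scheme

payload = {
    "model": "SciPhi-Self-RAG-Mistral-7B-32k",  # the model named in this card
    "prompt": "What does retrieval-augmented generation add to a language model?",
    "max_tokens": 256,
    "temperature": 0.2,
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

Substitute the real endpoint and request schema from the quickstart; only the overall request/response flow is meant to carry over.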
## Model Architecture

Base Model: Mistral-7B-v0.1

**Architecture Features:**
- Transformer-based model
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
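Because this is a standard Mistral-architecture checkpoint, it should load through the usual Transformers causal-LM classes. A minimal sketch, assuming the repository id matches this card's title and that `transformers`, `torch`, and `accelerate` are installed:

```python
# Minimal loading sketch; the repository id is assumed from this card's title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SciPhi/SciPhi-Self-RAG-Mistral-7B-32k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 7B weights in bf16; adjust to your hardware
    device_map="auto",           # shards across available devices via accelerate
)

prompt = "Explain grouped-query attention in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On a single GPU you can drop `device_map="auto"` and call `model.to("cuda")` instead; the rest of the generation loop is unchanged.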
## References

1. Lian, W., Goodson, B., Wang, G., Pentland, E., Cook, A., Vong, C., & Teknium. (2023). MistralOrca: Mistral-7B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset. *HuggingFace repository*. [Link](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
2. Mukherjee, S., Mitra, A., Jawahar, G., Agarwal, S., Palangi, H., & Awadallah, A. (2023). Orca: Progressive Learning from Complex Explanation Traces of GPT-4. *arXiv preprint arXiv:2306.02707*.
3. Longpre, S., Hou, L., Vu, T., Webson, A., Chung, H. W., Tay, Y., Zhou, D., Le, Q. V., Zoph, B., Wei, J., & Roberts, A. (2023). The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. *arXiv preprint arXiv:2301.13688*.
4. Mistral AI. (2023). Model Card for Mistral-7B-v0.1: a pretrained generative text model with 7 billion parameters, built on a Transformer with Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer; it outperforms Llama 2 13B on all benchmarks tested. *HuggingFace repository*. [Link](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Acknowledgements

Thank you to the [AI Alignment Lab](https://huggingface.co/Alignment-Lab-AI), [vikp](https://huggingface.co/vikp), [jph00](https://huggingface.co/jph00), and others who contributed to this work.