jamander committed · Commit c3cdc0f · verified · 1 Parent(s): b9383c6

Update README.md

Files changed (1): README.md +50 -0
README.md CHANGED
@@ -6,3 +6,53 @@ pipeline_tag: text-generation
  library_name: transformers.js
  ---
  base_model: mistralai/Mistral-7B-v0.1
+
+ ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
+ base_model: mistralai/Mistral-7B-v0.1
+ ---
+
+ # Project-Frankenstein
+
+ ## Model Overview
+
+ **Model Name:** Project-Frankenstein
+ **Model Type:** Text Generation
+ **Base Model:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+ **Fine-tuned by:** Jack Mander
+
+ **Description:**
+ Project-Frankenstein is a text generation model fine-tuned to generate fan fiction in the style of Mary Shelley's "Frankenstein." It was trained on the complete text of the novel to produce coherent and stylistically consistent prose.
+
+ ## Model Details
+
+ **Model Architecture:**
+ - **Base Model:** Mistral-7B-v0.1
+ - **Tokenizer:** AutoTokenizer from Hugging Face Transformers
+ - **Training Framework:** Transformers, PEFT, and Accelerate libraries
+
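The card does not include the setup code itself; the sketch below shows one plausible way to wire these pieces together with Transformers and PEFT. The 4-bit quantization, LoRA rank, and target modules are assumptions, not details from this README.

```python
# Illustrative setup only -- quantization and LoRA settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Load the base model in 4-bit so a 7B model fits on a single 16 GB T4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Attach a small LoRA adapter; rank and target modules are illustrative.
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
```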
+ **Training Data:**
+ - The model was fine-tuned on the full text of "Frankenstein" by Mary Shelley.
+ - The text was split into training and test datasets using an 80/20 split.
+ - The resulting Pandas DataFrames were converted to Hugging Face Datasets.
+
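The preprocessing code is not part of this card; a minimal sketch of the steps listed above might look as follows. The file name and paragraph-level chunking are assumptions.

```python
# Sketch of the data preparation described above; the source file name and
# chunking strategy are assumptions.
import pandas as pd
from datasets import Dataset
from sklearn.model_selection import train_test_split

with open("frankenstein.txt", encoding="utf-8") as f:
    text = f.read()

# Break the novel into paragraph-sized chunks for causal language modelling.
chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
df = pd.DataFrame({"text": chunks})

# 80/20 train/test split.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Convert the Pandas DataFrames to Hugging Face Datasets.
train_ds = Dataset.from_pandas(train_df, preserve_index=False)
test_ds = Dataset.from_pandas(test_df, preserve_index=False)
```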
+ **Hyperparameters:**
+ - **Learning Rate:** 2e-5
+ - **Epochs:** 2
+ - **Optimizer:** Paged AdamW 8-bit
+
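These values map onto `transformers.TrainingArguments` roughly as shown below; the batch size and remaining arguments are placeholders, since the README only states the learning rate, epoch count, and optimizer.

```python
# Only learning_rate, num_train_epochs, and optim come from this card;
# every other value is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="project-frankenstein",
    learning_rate=2e-5,
    num_train_epochs=2,
    optim="paged_adamw_8bit",           # Paged AdamW 8-bit
    per_device_train_batch_size=1,      # placeholder for a 16 GB T4
    gradient_accumulation_steps=4,      # placeholder
    fp16=True,                          # the T4 supports fp16, not bf16
    logging_steps=10,
)
```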
+ ## Training Procedure
+
+ The model was trained on a Tesla T4 GPU using Google Colab. The training involved the following steps:
+ 1. **Data Preparation:**
+    - The text of "Frankenstein" was preprocessed and split into training and test datasets.
+ 2. **Model Training:**
+    - The model was trained for 2 epochs with a learning rate of 2e-5 using the Paged AdamW 8-bit optimizer.
+
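Put together, the run would look roughly like the sketch below, reusing the objects from the earlier snippets; the tokenization settings and `Trainer` wiring are assumptions rather than details from this card.

```python
# Assumes model, tokenizer, train_ds, test_ds, and training_args from the
# sketches above; max_length and other settings are illustrative.
from transformers import Trainer, DataCollatorForLanguageModeling

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_tok = train_ds.map(tokenize, batched=True, remove_columns=["text"])
test_tok = test_ds.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tok,
    eval_dataset=test_tok,
    # Causal LM objective: labels are the input ids, shifted internally.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```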
+ ## Example Generations
+
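The outputs below compare the base model with the fine-tuned model. A generation call along the following lines would produce such samples; the prompt and sampling settings are illustrative only.

```python
# Illustrative generation; prompt and sampling settings are examples only.
# Assumes model and tokenizer from the setup sketch above.
prompt = "It was on a dreary night of November"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```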
+ **Base Model Generation:**