johnhew committed · Commit e7930d4 · 1 Parent(s): 5b1efcc

Update README.md

Files changed (1): README.md (+148 -1)
README.md CHANGED
---
pipeline_tag: text-generation
tags:
- text-generation-inference
- backpack
- backpackmodel
library_name: transformers
license: apache-2.0
datasets:
- openwebtext
language:
- en
---

# Model Card for Backpack-GPT2

The Backpack-GPT2 language model is an instance of the [Backpack architecture](https://arxiv.org/abs/2305.16765), intended to combine strong modeling performance with an interface for interpretability and control.
Most details about this model and its training are described in the paper, [Backpack Language Models](https://arxiv.org/abs/2305.16765).

See also [backpackmodels.science](https://backpackmodels.science).
# Table of Contents

- [Model Card for Backpack-GPT2](#model-card-for-backpack-gpt2)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
  - [Model Description](#model-description)
- [Uses](#uses)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Environmental Impact](#environmental-impact)
- [Technical Specifications](#technical-specifications)
  - [Model Architecture and Objective](#model-architecture-and-objective)
  - [Compute Infrastructure](#compute-infrastructure)
    - [Hardware](#hardware)
    - [Software](#software)
- [Citation](#citation)
- [Model Card Authors [optional]](#model-card-authors-optional)
- [Model Card Contact](#model-card-contact)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

The Backpack-GPT2 is a [Backpack-based language model](https://arxiv.org/abs/2305.16765), an architecture intended to combine strong modeling performance with an interface for interpretability and control.

- **Developed by:** John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang
- **Shared by [Optional]:** More information needed
- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Resources for more information:**
  - [GitHub Repo](https://github.com/john-hewitt/backpacks-flash-attn)
  - [Associated Paper](https://arxiv.org/abs/2305.16765)

# Uses

This model is intended for use in the study and development of increasingly interpretable methods in natural language processing.
It is not suitable for production use.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
This model in particular is limited in its capabilities, and because it uses a brand-new architecture, less is known about its biases than about those of, e.g., Transformer-based models.

# Training Details

## Training Data

This model was trained on the [OpenWebText](https://huggingface.co/datasets/openwebtext) corpus.

## Training Procedure

This model was trained for 100k gradient steps with a batch size of 512k tokens and a linearly decaying learning rate from 6e-4 to zero, with a linear warmup of 5k steps.
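Under one plausible reading of that schedule (an assumption on my part; the card does not say exactly where the decay begins), the learning rate at a given step can be sketched as:

```python
def lr_at_step(step: int,
               max_lr: float = 6e-4,
               warmup_steps: int = 5_000,
               total_steps: int = 100_000) -> float:
    """Linear warmup to max_lr, then linear decay to zero."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Decay linearly over the remaining steps.
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return max_lr * remaining

print(lr_at_step(2_500))    # halfway through warmup -> 0.0003
print(lr_at_step(5_000))    # peak -> 0.0006
print(lr_at_step(100_000))  # end of training -> 0.0
```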

# Environmental Impact

- **Hardware Type:** 4 A100 GPUs (40G)
- **Hours used:** Roughly 4 days.
- **Cloud Provider:** Stanford compute.
- **Compute Region:** Stanford energy grid.

# Technical Specifications

## Model Architecture and Objective

This model is a [Backpack language model](https://arxiv.org/pdf/2305.16765.pdf), trained to minimize the cross-entropy loss.
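As a rough illustration of the Backpack idea (a toy NumPy sketch with made-up shapes and random stand-in weights, not the model's actual code): each vocabulary item carries several learned "sense" vectors, and each position's representation is a non-negative, context-dependent weighted sum of the sense vectors of the tokens in its prefix.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, n_senses, seq_len = 50, 16, 4, 6

# Each vocabulary item owns n_senses learned "sense" vectors.
sense_vectors = rng.normal(size=(vocab, n_senses, dim))

token_ids = rng.integers(0, vocab, size=seq_len)
senses = sense_vectors[token_ids]  # (seq_len, n_senses, dim)

# In a real Backpack, a transformer produces non-negative contextualization
# weights alpha[i, j, l]: how much sense l of token j contributes at
# position i. Here they are random stand-ins, masked to respect causality.
alpha = rng.random(size=(seq_len, seq_len, n_senses))
alpha *= np.tril(np.ones((seq_len, seq_len)))[:, :, None]

# Each position's representation is a weighted sum of prefix sense vectors.
hidden = np.einsum("ijl,jld->id", alpha, senses)
print(hidden.shape)  # (6, 16)
```

Because the weights are non-negative and the sense vectors are fixed per word, interventions on a word's sense vectors have a predictable, inspectable effect on every prediction, which is the interpretability interface the paper describes.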

## Compute Infrastructure

This model was trained on a Slurm cluster.

### Hardware

This model was trained on 4 A100 GPUs.

### Software

This model was trained with [FlashAttention](https://github.com/HazyResearch/flash-attention) and [PyTorch](https://pytorch.org/).

# Citation

**BibTeX:**

```
@InProceedings{hewitt2023backpack,
  author = "Hewitt, John and Thickstun, John and Manning, Christopher D. and Liang, Percy",
  title = "Backpack Language Models",
  booktitle = "Proceedings of the Association for Computational Linguistics",
  year = "2023",
  publisher = "Association for Computational Linguistics",
  location = "Toronto, Canada",
}
```

# Model Card Authors [optional]

John Hewitt

# Model Card Contact

More information needed

# How to Get Started with the Model

<details>
<summary> Click to expand </summary>

More information needed

</details>
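In lieu of official instructions, a minimal loading sketch (assuming the checkpoint is published under a Hugging Face repo id such as `stanfordnlp/backpack-gpt2`; since the Backpack architecture is not part of core `transformers`, `trust_remote_code=True` would be required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_backpack(model_id: str = "stanfordnlp/backpack-gpt2"):
    # trust_remote_code=True lets transformers run the custom
    # BackpackModel code shipped alongside the checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_backpack()
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(outputs[0]))
```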