Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ inference: false
 
 # Monarch Mixer-BERT
 
-
+An 80M checkpoint of M2-BERT, pretrained with sequence length 8192, and it has been fine-tuned for long-context retrieval.
 
 Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequence.
 