BEE-spoke-data/bert-plus-L8-4096-v1.0


pszemraj committed on Feb 14, 2024

Commit d0aecc8 (verified), 1 parent: 501873a

Update README.md

Files changed (1)

README.md +9 -0


@@ -750,6 +750,15 @@ Thus far, all completed in fp32 (_using nvidia tf32 dtype behind the scenes when
 | bert-base-uncased                  | 110M  | 52.1  | 93.5 | 88.9 | 85.8 | 71.2 | 84.0 | 90.5 | 66.4 | 79.05    |
 | roberta-base                       | 125M  | 64.0  | 95.0 | 90.0 | 91.0 | 92.0 | 88.0 | 93.0 | 79.0 | 86.5     |
+For comparison, GLUE results for some recent BERT models, taken from [nomic's blog post]():
+| Model        | Size      | CoLA  | SST2  | MRPC  | STSB  | QQP   | MNLI  | QNLI  | RTE   | Avg   |
+|--------------|-----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
+| NomicBERT    | 137M      | 50.00 | 93.00 | 88.00 | 90.00 | 92.00 | 86.00 | 92.00 | 82.00 | 84.00 |
+| RobertaBase  | 125M      | 64.00 | 95.00 | 90.00 | 91.00 | 92.00 | 88.00 | 93.00 | 79.00 | 86.00 |
+| JinaBERTBase | 137M      | 51.00 | 95.00 | 88.00 | 90.00 | 81.00 | 86.00 | 92.00 | 79.00 | 83.00 |
+| MosaicBERT   | 137M (??) | 59.00 | 94.00 | 89.00 | 90.00 | 92.00 | 86.00 | 91.00 | 83.00 | 85.00 |
 ### Observations:
 1. **Performance Variation Across Models and Tasks**: Scores vary substantially both between models on the same GLUE task and between tasks for a single model. For example, CoLA scores in the comparison table range from 50.00 (NomicBERT) to 64.00 (RobertaBase), while SST2 scores stay between 93.00 and 95.00. This variability shows how differently the GLUE tasks probe language understanding, and why a model needs to handle many kinds of linguistic challenge well rather than excel at one.
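As a rough illustration of the variability described above, here is a minimal sketch that uses only the per-task scores in the comparison table (from nomic's blog post) to compute each model's average, each model's range across tasks, and each task's spread across models. It assumes the Avg column is an unweighted mean of the eight task scores; the helper names (`model_averages`, `model_ranges`, `task_spreads`) are hypothetical and not part of any library.

```python
# Per-task GLUE scores copied from the comparison table (nomic's blog post).
# Assumption: "Avg" is the unweighted mean of these eight scores.
TASKS = ["CoLA", "SST2", "MRPC", "STSB", "QQP", "MNLI", "QNLI", "RTE"]
SCORES = {
    "NomicBERT":    [50, 93, 88, 90, 92, 86, 92, 82],
    "RobertaBase":  [64, 95, 90, 91, 92, 88, 93, 79],
    "JinaBERTBase": [51, 95, 88, 90, 81, 86, 92, 79],
    "MosaicBERT":   [59, 94, 89, 90, 92, 86, 91, 83],
}


def model_averages(scores):
    """Hypothetical helper: unweighted mean over all tasks, per model."""
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


def model_ranges(scores):
    """Hypothetical helper: best minus worst task score, per model (within-model variation)."""
    return {name: max(vals) - min(vals) for name, vals in scores.items()}


def task_spreads(scores, tasks):
    """Hypothetical helper: highest minus lowest model score, per task (across-model variation)."""
    columns = zip(*scores.values())  # one tuple of model scores per task
    return {task: max(col) - min(col) for task, col in zip(tasks, columns)}


if __name__ == "__main__":
    for name, avg in model_averages(SCORES).items():
        print(f"{name:<13} avg={avg:.3f} range={model_ranges(SCORES)[name]}")
    for task, spread in task_spreads(SCORES, TASKS).items():
        print(f"{task:<5} spread={spread}")
```

Recomputed this way, the averages come out to 84.125, 86.5, 82.75 and 85.5, close to but not identical with the reported Avg column (84, 86, 83, 85). The blog's figures were presumably averaged or rounded differently, which this sketch cannot verify. The spreads suggest that CoLA (14 points) and QQP (11 points) separate the models most, while every model's own scores span 31 to 44 points across tasks.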