hyunjongkimmath committed
Commit 4d9c18f · 1 Parent(s): 23c3126
Update README.md
README.md CHANGED
@@ -3,30 +3,29 @@ tags:
 - fastai
 ---

-# Amazing!

-1. Fill out this model card with more information (see the template below and the [documentation here](https://huggingface.co/docs/hub/model-repos))!

-# Model card

-## Model description
-More information needed

-##
-More information needed

+# math_text_tag_categorization model

+## Description

+math_text_tag_categorization is a multi-label text classification model. It was trained via the ULMFiT approach (cf. [The fastai book](https://github.com/fastai/fastbook)'s presentation of ULMFiT): the author of this repository first fine-tuned a language model available in [fast.ai](https://github.com/fastai) on a corpus of mathematical text in LaTeX, then fine-tuned the encoder of that language model for multi-label classification.

+The model predicts whether a mathematical text is, or contains, any of the following common types of mathematical text: definition, notation, concept (i.e. theorems, propositions, corollaries, lemmas, etc.), proof, narrative (e.g. the text one encounters at the beginning of a chapter or section of a book, or in between theorems), exercise, remark, example.
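
Because the model is multi-label, a single text can receive several of these tags at once. The following is a minimal sketch of how per-label probabilities might be turned into tags. The label names come from the list above, but their order, the `0.5` threshold, and the helper name `tags_above_threshold` are assumptions made for illustration; the actual model also has extra outputs not covered here.

```python
# Label names are taken from the model description; their ORDER here is an
# assumption and need not match the model's actual vocabulary.
LABELS = [
    "definition", "notation", "concept", "proof",
    "narrative", "exercise", "remark", "example",
]


def tags_above_threshold(probs, labels=LABELS, thresh=0.5):
    """Return every label whose probability exceeds `thresh`.

    Each label is decided independently, so zero, one, or several
    labels may be returned for the same text.
    """
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    return [label for label, p in zip(labels, probs) if p > thresh]


# Hypothetical probabilities for a text that defines a term and fixes notation.
example_probs = [0.91, 0.72, 0.10, 0.03, 0.05, 0.02, 0.30, 0.08]
print(tags_above_threshold(example_probs))  # ['definition', 'notation']
```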

+## Intended uses & limitations

+This model is intended to take as input mathematical text that one might encounter in an undergraduate, graduate, or research setting and to output tags describing what kind of text the input is. The input is intended to be at most a few tens of thousands of characters long (roughly several pages of a typical undergraduate or graduate textbook), but in practice the author has seen better results with shorter text.

+This model was trained on a corpus consisting mostly of algebra, algebraic geometry, arithmetic geometry, and number theory, which are the author's primary mathematical interests.

+## How to use
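
The usage instructions themselves are not visible in this diff. As a hedged sketch only, a fastai model hosted on the Hub can typically be loaded with `huggingface_hub.from_pretrained_fastai`, assuming the repository was pushed in the format that function expects and that `fastai` is installed. The helper names below are hypothetical, and the repo id is deliberately left as a parameter rather than guessed.

```python
def load_learner_from_hub(repo_id):
    """Download a fastai Learner from the Hugging Face Hub.

    The import is done lazily so that `predict_tags` below can be used
    (and tested) without huggingface_hub or fastai installed.
    """
    from huggingface_hub import from_pretrained_fastai

    return from_pretrained_fastai(repo_id)


def predict_tags(learner, text):
    """Return the predicted tags for `text` as plain strings.

    Assumes `learner.predict` returns a 3-tuple whose first item is the
    decoded list of labels, as fastai learners usually do.
    """
    decoded, _, _ = learner.predict(text)
    return [str(tag) for tag in decoded]


# Usage (requires network access; replace the placeholder with the real repo id):
# learner = load_learner_from_hub("<user>/<repo>")
# print(predict_tags(learner, r"Let $G$ be a group. A subgroup of $G$ is ..."))
```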

+## Evaluation metrics
+During training, the model achieved over 95% accuracy on its validation set, which was chosen randomly from the full dataset, as measured by fastai's [accuracy_multi](https://docs.fast.ai/metrics.html) metric.
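
To make the reported figure easier to interpret, here is a plain-Python sketch of what a multi-label accuracy of this kind measures: each raw output is passed through a sigmoid, thresholded, and compared with its target, and the result is averaged over every (text, label) pair. It is an illustration rather than fastai's implementation; the `0.5` threshold and the sample numbers are assumptions.

```python
import math


def sigmoid(x):
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def accuracy_multi_sketch(logits, targets, thresh=0.5):
    """Fraction of (sample, label) pairs whose thresholded prediction is correct."""
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for logit, target in zip(row_logits, row_targets):
            predicted = sigmoid(logit) > thresh
            correct += int(predicted == bool(target))
            total += 1
    return correct / total


# Two hypothetical texts with three labels each.
logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
targets = [[1, 0, 0], [0, 1, 1]]
print(accuracy_multi_sketch(logits, targets))  # 4 of 6 pairs are correct
```

Since the average runs over individual labels rather than whole texts, a text counts as partially correct when only some of its tags are right, and correctly predicted absent labels also raise the score.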
+
+## TODO's
+The model was trained on text tokenized with fastai's default word-tokenizing methods. This tokenization does not necessarily handle common LaTeX tokens such as the dollar sign `$` and the backslash `\` well, so the author expects that the model could be improved with a modified tokenization.
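
As one possible direction for such a modification, the sketch below splits text so that math delimiters and LaTeX commands become tokens of their own. It is a standalone regular-expression illustration, not fastai's tokenizer API, and the function name `latex_aware_tokenize` and the exact token classes are assumptions.

```python
import re

# Alternatives are tried in order:
#   \\[A-Za-z]+  a LaTeX command such as \alpha or \frac
#   \\.          an escaped single character such as \{ or \\
#   \$\$?        a math delimiter, either $$ or $
#   [A-Za-z]+    a word
#   \d+          a run of digits
#   \S           any other single non-whitespace character
_TOKEN_PATTERN = re.compile(r"\\[A-Za-z]+|\\.|\$\$?|[A-Za-z]+|\d+|\S")


def latex_aware_tokenize(text):
    """Split `text` into tokens, keeping LaTeX commands and `$` intact."""
    return _TOKEN_PATTERN.findall(text)


print(latex_aware_tokenize(r"Let $\alpha \in R$."))
# ['Let', '$', '\\alpha', '\\in', 'R', '$', '.']
```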

+The model also outputs whether a text "should be" deleted, split, or merged. This was originally intended for the author's personal use, but the author has neither found the model to be actually useful in these categorizations nor taken the time to remove this feature.