hyunjongkimmath committed on
Commit 4d9c18f · 1 Parent(s): 23c3126

Update README.md

Files changed (1)
  1. README.md +14 -15
README.md CHANGED
@@ -3,30 +3,29 @@ tags:
  - fastai
  ---

- # Amazing!

- 🥳 Congratulations on hosting your fastai model on the Hugging Face Hub!

- # Some next steps
- 1. Fill out this model card with more information (see the template below and the [documentation here](https://huggingface.co/docs/hub/model-repos))!

- 2. Create a demo in Gradio or Streamlit using 🤗 Spaces ([documentation here](https://huggingface.co/docs/hub/spaces)).

- 3. Join the fastai community on the [Fastai Discord](https://discord.com/invite/YKrxeNn)!

- Greetings fellow fastlearner 🤝! Don't forget to delete this content from your model card.


- ---


- # Model card

- ## Model description
- More information needed

- ## Intended uses & limitations
- More information needed

- ## Training and evaluation data
- More information needed
+ # math_text_tag_categorization model

+ ## Description

+ math_text_tag_categorization is a multi-label text classification model. It was trained via the ULMFiT approach (cf. the presentation of ULMFiT in [the fastai book](https://github.com/fastai/fastbook)): the author of this repository fine-tuned a language model available in [fast.ai](https://github.com/fastai) on a corpus of mathematical text in LaTeX, then fine-tuned the encoder of the resulting language model for multi-label classification.

+ The model predicts whether a mathematical text is, or contains, each of the following common types of mathematical text: definition, notation, concept (e.g. theorems, propositions, corollaries, lemmas), proof, narrative (e.g. the text one encounters at the beginning of a chapter or section in a book, or in between theorems), exercise, remark, example.
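
As a minimal sketch of what multi-label output means here: each tag gets its own score, and every tag whose score clears a threshold is reported, so a text can receive several tags or none. The tag names come from the list above; the function name `tags_from_probabilities`, the example probabilities, and the 0.5 threshold are assumptions made for illustration and are not the model's actual interface or outputs.

```python
# Tag names taken from the description above; their order here is arbitrary
# and need not match the order used by the trained model.
TAGS = ["definition", "notation", "concept", "proof",
        "narrative", "exercise", "remark", "example"]


def tags_from_probabilities(probs, thresh=0.5):
    """Return every tag whose probability is at least `thresh`."""
    if len(probs) != len(TAGS):
        raise ValueError("expected one probability per tag")
    return [tag for tag, p in zip(TAGS, probs) if p >= thresh]


# Made-up probabilities for a text that states a definition and introduces notation.
example_probs = [0.92, 0.71, 0.10, 0.03, 0.20, 0.01, 0.05, 0.08]
print(tags_from_probabilities(example_probs))
```

Because the tags are scored independently, a single passage such as a definition that also fixes notation can carry both tags at once.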

+ ## Intended uses & limitations

+ This model is intended to take as input mathematical text that one might encounter in an undergraduate, graduate, or research setting and to output tags describing what kind of text the input is. The input is intended to be at most a few tens of thousands of characters long (roughly several pages of most undergraduate or graduate textbooks), but in practice the author has experienced better results with shorter text.

+ This model was trained on a corpus consisting mostly of algebra, algebraic geometry, arithmetic geometry, and number theory, which are the author's primary mathematical interests.

+ ## How to use



+ ## Evaluation metrics
+ During training, the model achieved over 95% accuracy on its validation dataset, which was chosen randomly from the entire dataset, according to fastai's [accuracy_multi](https://docs.fast.ai/metrics.html) metric.
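
For readers unfamiliar with multi-label accuracy, the sketch below shows the usual definition in plain Python: apply a sigmoid to each raw score, threshold it, compare with the target element-wise, and average over all (text, label) pairs. It assumes a 0.5 threshold and sigmoid activation; the function name `multi_label_accuracy` and the toy numbers are hypothetical and do not come from this model's training run.

```python
import math


def multi_label_accuracy(logits, targets, thresh=0.5):
    """Fraction of (sample, label) pairs whose thresholded prediction matches the target."""
    correct = 0
    total = 0
    for row_logits, row_targets in zip(logits, targets):
        for z, t in zip(row_logits, row_targets):
            prob = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the raw score
            predicted = prob > thresh
            correct += int(predicted == bool(t))
            total += 1
    return correct / total


# Two toy samples with three labels each (made-up values).
toy_logits = [[2.0, -1.0, 0.5], [-3.0, 1.5, -0.2]]
toy_targets = [[1, 0, 0], [0, 1, 0]]
print(multi_label_accuracy(toy_logits, toy_targets))
```

Since every (text, label) pair counts equally, a metric of this kind can be high even when rarely occurring labels are often missed, so it is worth reading alongside per-label results where those are available.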
+
+ ## TODOs
+ The model has been trained on text tokenized via fastai's default word-tokenizing methods. These tokenizations do not necessarily treat common LaTeX symbols, such as the dollar sign `$` and the backslash `\`, as tokens in their own right, and thus the author imagines that the model can be improved with modified tokenizations.
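
One possible direction for such a modified tokenization is sketched below: a regular expression that keeps LaTeX commands whole and splits off characters with special meaning in LaTeX. This is a hypothetical illustration only. The name `latex_aware_tokenize` and the token rules are assumptions, this is not fastai's tokenizer, and whether it would improve the model has not been tested.

```python
import re

# Alternatives are tried in order: commands first, then escaped characters,
# then single special characters, then alphanumeric runs, then anything else.
LATEX_TOKEN = re.compile(r"""
    \\[A-Za-z]+      # a backslash followed by letters (a LaTeX command)
  | \\.              # a backslash followed by any single character
  | [${}^_&]         # single characters with special meaning in LaTeX
  | [^\W_]+          # runs of letters and digits (underscore excluded)
  | \S               # any other non-whitespace character
""", re.VERBOSE)


def latex_aware_tokenize(text):
    """Split text into tokens, keeping LaTeX commands and delimiters separate."""
    return LATEX_TOKEN.findall(text)


sample = r"Let $\alpha \in \mathbb{Z}_p$."
print(latex_aware_tokenize(sample))
```

A function like this would still need to be wrapped in whatever tokenizer interface the training pipeline expects, and the language model would have to be fine-tuned again on the re-tokenized corpus.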
+ The model also outputs whether a text "should be" deleted, split, or merged. This was originally intended for the author's personal use, but the author has neither found the model's predictions for these categories to be useful nor taken the time to remove this feature.