hyunjongkimmath/information_note_type


hyunjongkimmath committed on Oct 14, 2022

Commit 4d9c18f · 1 Parent(s): 23c3126

Update README.md

Files changed (1)

README.md +14 -15


@@ -3,30 +3,29 @@ tags:
 - fastai
 ---
-# Amazing!
-🥳 Congratulations on hosting your fastai model on the Hugging Face Hub!
-# Some next steps
-1. Fill out this model card with more information (see the template below and the [documentation here](https://huggingface.co/docs/hub/model-repos))!
-2. Create a demo in Gradio or Streamlit using 🤗 Spaces ([documentation here](https://huggingface.co/docs/hub/spaces)).
-3. Join the fastai community on the [Fastai Discord](https://discord.com/invite/YKrxeNn)!
-Greetings fellow fastlearner 🤝! Don't forget to delete this content from your model card.
----
-# Model card
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed

+# math_text_tag_categorization model
+## Description
+math_text_tag_categorization is a multi-label text classification model. It was trained via the ULMFiT approach (cf. [The fastai book](https://github.com/fastai/fastbook)'s presentation of ULMFiT): the author of this repository fine-tuned a language model available in [fast.ai](https://github.com/fastai) on a corpus of mathematical text in LaTeX, then fine-tuned the encoder of the resulting language model for multi-label classification.
+The model predicts which of the following common types of mathematical text an input is or contains: definition, notation, concept (i.e. theorems, propositions, corollaries, lemmas, etc.), proof, narrative (e.g. the text one encounters at the beginning of a chapter or section of a book, or in between theorems), exercise, remark, example.
+## Intended uses & limitations
+This model is intended to take as input mathematical text that one might encounter in an undergraduate, graduate, or research setting and to output tags describing what kind of text the input is. The input is intended to be at most a few tens of thousands of characters long (roughly several pages of a typical undergraduate or graduate textbook), but in practice the author has experienced better results with shorter text.
+This model was trained on a corpus consisting mostly of text on algebra, algebraic geometry, arithmetic geometry, and number theory, which are the author's primary mathematical interests.
+## How to use
+## Evaluation metrics
+During training, the model achieved over 95% accuracy on its validation set, which was chosen randomly from the entire dataset, as measured by fastai's [accuracy_multi](https://docs.fast.ai/metrics.html) metric.
+## TODOs
+The model was trained on text tokenized via fastai's default word-tokenizing methods. This tokenization does not necessarily handle common LaTeX tokens such as the dollar sign `$` and backslash `\` appropriately, so the author imagines that the model could be improved with a modified tokenization.
+The model also outputs whether a text "should be" deleted, split, or merged. This was originally intended for the author's personal use, but the author has neither found the model to be actually useful for these categorizations nor taken the time to remove this feature.
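
The "How to use" heading in the updated README has no body yet. The following is a minimal sketch of how a fastai learner hosted on the Hub might be loaded and queried; it is not taken from the repository. It assumes that `fastai` and `huggingface_hub` are installed, that the repository id matches the one shown at the top of this page, and that the learner's `predict` behaves as it usually does for a fastai multi-label classifier. The helper names `tags_from_probabilities` and `predict_tags`, and the 0.5 threshold, are hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Repository id as shown at the top of this page (assumed to host the learner).
REPO_ID = "hyunjongkimmath/information_note_type"


def tags_from_probabilities(
    probs: Sequence[float], labels: Sequence[str], thresh: float = 0.5
) -> List[str]:
    """Return the labels whose predicted probability exceeds `thresh`.

    Hypothetical helper: the threshold of 0.5 is an assumption, not a value
    documented by the model card.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    return [label for label, p in zip(labels, probs) if p > thresh]


def predict_tags(text: str, repo_id: str = REPO_ID) -> List[str]:
    """Download the learner from the Hub and tag one piece of text.

    The import is local so that the pure-Python helper above can be used
    without fastai or network access.
    """
    from huggingface_hub import from_pretrained_fastai

    learner = from_pretrained_fastai(repo_id)
    # For a multi-label learner, predict typically returns
    # (decoded labels, one-hot mask, per-label probabilities).
    decoded, _, _ = learner.predict(text)
    return [str(tag) for tag in decoded]
```

Calling `predict_tags(r"Let $G$ be a group. A subgroup of $G$ is ...")` would download the model on first use. The returned list may also include the delete, split, or merge outputs mentioned in the README, which can be filtered out by the caller.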