About

This repository is a boilerplate for pushing a mask-filling model to the Hugging Face Model Hub.

Upload to Hugging Face

Download your tokenizer, model checkpoints, and optionally the training logs (events.out.*) to the ./ckpt directory. Apart from pytorch_model.bin and the events.out.* log files, do not include any large files.

Optionally, test the model on the MLM (masked language modeling) task:

pip install pya0 # for math token preprocessing
# testing local checkpoints:
python test.py ./ckpt/math-tokenizer ./ckpt/2-2-0/encoder.ckpt
# testing Model Hub checkpoints:
python test.py approach0/coco-mae-220 approach0/coco-mae-220

Note
Modify the examples in test.txt to experiment with the model. The test file is tab-separated: the first column lists additional positions to mask in the sentence to its right (useful for masking tokens in math markup). A zero means no additional mask positions.
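
To make the test.txt layout concrete, here is a minimal Python sketch of reading one line of such a file. It is not taken from test.py. The function name parse_test_line is hypothetical, and the sketch assumes that multiple positions in the first column are comma-separated, which this README does not state; check test.py for the actual encoding.

```python
def parse_test_line(line):
    """Split one tab-separated test line into (extra_mask_positions, sentence).

    Assumption (not confirmed by the README): multiple positions in the
    first column are separated by commas. A zero means "no extra masks".
    """
    fields = line.rstrip("\n").split("\t", 1)
    if len(fields) != 2:
        raise ValueError("expected two tab-separated columns")
    raw_positions, sentence = fields
    positions = [int(p) for p in raw_positions.split(",") if p.strip()]
    # Drop zeros, since zero stands for "no additional mask positions".
    positions = [p for p in positions if p != 0]
    return positions, sentence


if __name__ == "__main__":
    # Print the parsed columns of every non-empty line in test.txt.
    with open("test.txt", encoding="utf-8") as fh:
        for raw in fh:
            if raw.strip():
                print(parse_test_line(raw))
```

How the positions are counted (by token, by word, and from which index) is decided by test.py and is deliberately left open here.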

To upload to Hugging Face, use the upload2hgf.sh script. Before running this script, be sure to check:

  • git-lfs is installed
  • a git remote named hgf points to https://huggingface.co/your/repo
  • the model contains all the files needed: config.json and pytorch_model.bin
  • the tokenizer contains all the files needed: added_tokens.json, special_tokens_map.json, tokenizer_config.json, vocab.txt and tokenizer.json
  • tokenizer_config.json has no tokenizer_file field (it sometimes points to a local file under ~/.cache)
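
The checklist above can be automated with a short script. The following Python sketch is not part of this repository. All function names are hypothetical, and it assumes the model files and the tokenizer files each live in a directory that you pass on the command line.

```python
import json
import shutil
import subprocess
import sys
from pathlib import Path

# File lists copied from the checklist above.
MODEL_FILES = ["config.json", "pytorch_model.bin"]
TOKENIZER_FILES = [
    "added_tokens.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
    "tokenizer.json",
]


def missing_files(directory, required):
    """Return the names in `required` that are absent from `directory`."""
    return [name for name in required if not (Path(directory) / name).is_file()]


def has_tokenizer_file_field(tokenizer_dir):
    """True if tokenizer_config.json exists and has a `tokenizer_file` key."""
    config_path = Path(tokenizer_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return False
    with open(config_path, encoding="utf-8") as fh:
        return "tokenizer_file" in json.load(fh)


def has_git_remote(name, repo_dir="."):
    """True if `git remote`, run in `repo_dir`, lists a remote called `name`."""
    if shutil.which("git") is None:
        return False
    result = subprocess.run(
        ["git", "remote"], cwd=repo_dir, capture_output=True, text=True
    )
    return result.returncode == 0 and name in result.stdout.split()


def preupload_problems(model_dir, tokenizer_dir, repo_dir="."):
    """Collect human-readable descriptions of every failed check."""
    problems = []
    if shutil.which("git-lfs") is None:
        problems.append("git-lfs not found on PATH")
    if not has_git_remote("hgf", repo_dir):
        problems.append("no git remote named hgf")
    for name in missing_files(model_dir, MODEL_FILES):
        problems.append(f"model file missing: {name}")
    for name in missing_files(tokenizer_dir, TOKENIZER_FILES):
        problems.append(f"tokenizer file missing: {name}")
    if has_tokenizer_file_field(tokenizer_dir):
        problems.append("tokenizer_config.json contains a tokenizer_file field")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical): python check_upload.py MODEL_DIR TOKENIZER_DIR
    found = preupload_problems(sys.argv[1], sys.argv[2])
    for problem in found:
        print("FAIL:", problem)
    sys.exit(1 if found else 0)
```

The script only checks that a remote named hgf exists. It does not verify the URL the remote points to, which you can inspect with git remote -v.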