Could you please make the tokenizer_config.json/vocab.txt file public?

#1
by DualK - opened

Hello, I am a student working on a project about parameter-efficient fine-tuning of a base model, which involves modifying the model structure. Would it be possible to open-source these files as well?

Hi @DualK, the model architectures and tokenizers are fully open-sourced on GitHub: https://github.com/genbio-ai/ModelGenerator

Here is the vocabulary you're looking for: https://github.com/genbio-ai/ModelGenerator/blob/main/modelgenerator/huggingface_models/rnabert/vocab.txt
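If you want to inspect the vocabulary outside of ModelGenerator, something like the following may help. It is only a rough sketch, assuming the file follows the usual BERT-style layout of one token per line with the line index as the token id. The helper name `load_vocab` and the local path `vocab.txt` are placeholders for illustration, not part of ModelGenerator.

```python
from pathlib import Path


def load_vocab(path):
    """Map each token to its line index.

    Assumption: the file holds one token per line (BERT-style vocab.txt),
    and the zero-based line number is the token id.
    """
    vocab = {}
    with Path(path).open(encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            token = line.rstrip("\n")
            if token:  # ignore blank lines, but keep the original line index
                vocab[token] = index
    return vocab


# Hypothetical usage, after downloading the file next to this script:
# vocab = load_vocab("vocab.txt")
# print(len(vocab), "tokens")
```

For real tokenization, prefer the tokenizer classes in the ModelGenerator repo, since they also handle special tokens and any preprocessing this sketch ignores.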

HF Transformers LoRA PEFT is also enabled in ModelGenerator, with some nice low-memory checkpointing behavior. See this example command https://genbio-ai.github.io/ModelGenerator/quick_start/#use-lora-for-parameter-efficient-finetuning. Feel free to open PRs against the ModelGenerator repo if you design PEFT techniques you'd like to share with the community.
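For anyone new to LoRA, the core idea can be sketched in a few lines. This is a dependency-free illustration of the standard LoRA update, W' = W + (alpha / r) * B A, and not the ModelGenerator or HF PEFT implementation. The function names `matmul` and `lora_effective_weight` are made up for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A for a frozen weight W.

    Shapes: W is (out, in), A is (r, in), B is (out, r), where r is the rank.
    Only A and B would be trained; W stays frozen.
    """
    rank = len(lora_a)  # number of rows of A is the LoRA rank r
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Rank-1 example on a 2x2 identity weight.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]      # (r=1, in=2)
b = [[0.5], [1.0]]    # (out=2, r=1)
print(lora_effective_weight(w, a, b, alpha=2.0))
```

Because B is typically initialized to zeros, the adapted weight starts out equal to W, so fine-tuning begins from the base model's behavior.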

probablybots changed discussion status to closed