Could you please make the tokenizer_config.json/vocab.txt file public?
Hello, I am a student working on a project about parameter-efficient fine-tuning of a base model, which involves modifying the model structure. Would it be possible to open-source it further?
Hi @DualK, the model architectures and tokenizers are fully open-sourced on GitHub: https://github.com/genbio-ai/ModelGenerator
Here is the vocabulary you're looking for: https://github.com/genbio-ai/ModelGenerator/blob/main/modelgenerator/huggingface_models/rnabert/vocab.txt
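In case it helps, here is a minimal sketch of reading such a file into a token-to-id mapping. It assumes the file follows the common BERT-style layout (one token per line, id equal to the zero-based line index), which I have not verified against this particular vocab.txt. The `load_vocab` helper and the sample tokens are hypothetical and only for illustration; they are not part of ModelGenerator.

```python
import os
import tempfile


def load_vocab(path):
    """Map each token to its zero-based line index (assumed BERT-style layout)."""
    vocab = {}
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            token = line.rstrip("\r\n")
            if token:  # skip blank lines but keep the line index as the id
                vocab[token] = index
    return vocab


# Hypothetical sample tokens, NOT the actual contents of the linked vocab.txt.
sample_tokens = ["[PAD]", "[UNK]", "A", "C", "G", "U"]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "vocab.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(sample_tokens) + "\n")
    vocab = load_vocab(sample_path)

print(vocab["G"])  # id of the token on the fifth line
```

To use it on the real file, download vocab.txt from the link above and pass its local path to `load_vocab`.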
We also have HF Transformers LoRA PEFT enabled in ModelGenerator, with some nice low-memory checkpointing behavior. See this example command: https://genbio-ai.github.io/ModelGenerator/quick_start/#use-lora-for-parameter-efficient-finetuning. Feel free to open PRs against the ModelGenerator repo if you design PEFT techniques you'd like to share with the community.
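If you are designing your own PEFT variant, a dependency-free sketch of the basic LoRA idea may be a useful starting point. This is not how ModelGenerator or the PEFT library implements it; the `LoRALinear` class, `matvec` helper, and all values below are hypothetical. It only illustrates the standard formulation: the pretrained weight stays frozen and a low-rank update, scaled by alpha / rank, is added to its output.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially matches the base layer exactly.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                       for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_features)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameter_count(self):
        rank = len(self.lora_a)
        in_features = len(self.lora_a[0])
        out_features = len(self.lora_b)
        return rank * in_features + out_features * rank


# Hypothetical 4x4 base weight and input, for illustration only.
base_weight = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 2.0, 0.0, 0.0],
               [0.0, 0.0, 3.0, 0.0],
               [0.0, 0.0, 0.0, 4.0]]
layer = LoRALinear(base_weight, rank=1, alpha=2.0)
x = [1.0, 1.0, 1.0, 1.0]

print(layer.forward(x))                   # same as the base layer at initialization
print(layer.trainable_parameter_count())  # 8 adapter parameters vs. 16 frozen ones
```

Only `lora_a` and `lora_b` would receive gradients during fine-tuning, which is where the memory savings come from.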