---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- biology
- protein
- gene-ontology
- GO-terms
---

# Qwen3-1.7B-GO

Qwen3-1.7B extended with pre-trained Gene Ontology (GO) term embeddings.

## Model Description

This model is based on Qwen3-1.7B and adds:

- Pre-trained embeddings for GO terms
- Special tokens for protein sequence handling
- Fine-tuning on GO term descriptions and relationships
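
This card does not say whether each GO ID is stored as a single vocabulary entry or split into sub-word pieces. One way to find out is to count the pieces a tokenizer produces for each ID. The sketch below assumes only that the tokenizer exposes a Hugging Face-style `tokenize(text)` method; `count_pieces` and `StubTokenizer` are hypothetical names introduced for illustration, and the stub exists only so the example runs without downloading the model.

```python
from typing import Dict, Iterable, List


def count_pieces(tokenizer, go_ids: Iterable[str]) -> Dict[str, int]:
    """Return how many tokenizer pieces each GO ID is split into.

    `tokenizer` is any object with a `tokenize(text)` method that
    returns a list of string pieces (as Hugging Face tokenizers do).
    """
    return {go_id: len(tokenizer.tokenize(go_id)) for go_id in go_ids}


class StubTokenizer:
    """Hypothetical stand-in that only demonstrates the call shape."""

    def __init__(self, added_tokens: List[str]):
        self.added_tokens = set(added_tokens)

    def tokenize(self, text: str) -> List[str]:
        # Known added tokens stay whole; anything else is split per character.
        return [text] if text in self.added_tokens else list(text)


stub = StubTokenizer(added_tokens=["GO:0008150"])
piece_counts = count_pieces(stub, ["GO:0008150", "GO:0003674"])
print(piece_counts)
```

To inspect the real model, pass a tokenizer loaded with `AutoTokenizer.from_pretrained` instead of the stub. A count of 1 would suggest the ID has a dedicated token, but that is something to verify rather than assume.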

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("wanglab/Qwen3-1.7B-go")
tokenizer = AutoTokenizer.from_pretrained("wanglab/Qwen3-1.7B-go")

# Example with GO terms
text = "What is the function of GO:0008150?"
inputs = tokenizer(text, return_tensors="pt")

# Generate up to 100 new tokens, then decode the full sequence (prompt included)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## GO Terms

The model includes embeddings for Gene Ontology terms, allowing it to understand and reason about:

- Biological processes (GO:0008150)
- Molecular functions (GO:0003674)
- Cellular components (GO:0005575)
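
GO identifiers follow a fixed pattern: the prefix `GO:` followed by seven digits. When building prompts, it can help to pull the IDs out of free text and check which of them are the three root terms listed above. The following is a minimal sketch using only the standard library; `extract_go_ids`, `root_namespace`, and `GO_ROOTS` are hypothetical helper names, not part of this model or of `transformers`.

```python
import re
from typing import Dict, List, Optional

# A GO identifier is "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"\bGO:\d{7}\b")

# The three root terms listed above, mapped to their namespace names.
GO_ROOTS: Dict[str, str] = {
    "GO:0008150": "biological_process",
    "GO:0003674": "molecular_function",
    "GO:0005575": "cellular_component",
}


def extract_go_ids(text: str) -> List[str]:
    """Return GO IDs found in `text`, in order of first appearance, deduplicated."""
    seen: List[str] = []
    for go_id in GO_ID_PATTERN.findall(text):
        if go_id not in seen:
            seen.append(go_id)
    return seen


def root_namespace(go_id: str) -> Optional[str]:
    """Return the namespace name if `go_id` is a root term, else None."""
    return GO_ROOTS.get(go_id)


prompt = "What is the function of GO:0008150? Compare it with GO:0003674."
for go_id in extract_go_ids(prompt):
    print(go_id, root_namespace(go_id))
```

The lookup only recognizes the three root IDs. Resolving an arbitrary term to its namespace would need the ontology file itself, which is outside the scope of this sketch.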

## Training

GO embeddings were pre-trained using QLoRA on GO term descriptions and relationships.