### Model Description

This model card describes `protgpt2-distilled-tiny`, a distilled version of ProtGPT2. The model was produced by knowledge distillation from the larger ProtGPT2 teacher into a smaller, more efficient student. Training combines a "soft" loss (knowledge-distillation loss against the teacher's softened outputs) with a "hard" loss (cross-entropy against the ground-truth labels), so that the student both generalizes like its teacher and retains accurate predictions on the training data.

### Technical Details

**Distillation Parameters:**
- **Temperature (T):** 10
- **Alpha (α):** 0.1
- **Model Architecture:**
  - **Number of Layers:** 4
  - **Number of Attention Heads:** 4
  - **Embedding Size:** 512

**Dataset Used:**
- The model was distilled using a subset of the evaluation dataset provided by `nferruz/UR50_2021_04`.

**Loss Formulation** (with \(s\) the student logits, \(t\) the teacher logits, and \(y\) the one-hot ground-truth labels; a code sketch of this objective follows the Use Cases section below):
- **Soft Loss:** \( L_{\text{soft}} = \mathrm{KL}\left(\mathrm{softmax}(s/T) \,\|\, \mathrm{softmax}(t/T)\right) \)
- **Hard Loss:** \( L_{\text{hard}} = -\sum_{i} y_i \log(\mathrm{softmax}(s_i)) \)
- **Combined Loss:** \( L = \alpha L_{\text{hard}} + (1 - \alpha) L_{\text{soft}} \)

### Performance

The distilled model, `protgpt2-distilled-tiny`, runs inference up to 6 times faster than the pretrained ProtGPT2 while maintaining comparable perplexity.

![Evals](https://images.mobilism.org/?di=PYFQ1N5V)

### Usage

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextGenerationPipeline

# Load the distilled model and its tokenizer
model_name = "littleworth/protgpt2-distilled-tiny"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Pad from the left, as required for decoder-only generation
tokenizer.padding_side = "left"

# Initialize the pipeline (use device=-1 for CPU)
text_generator = TextGenerationPipeline(
    model=model, tokenizer=tokenizer, device=0
)

# Generate sequences
sequences = text_generator(
    "<|endoftext|>",
    max_length=100,
    do_sample=True,
    top_k=950,
    repetition_penalty=1.2,
    num_return_sequences=10,
    pad_token_id=tokenizer.eos_token_id,  # set pad_token_id to eos_token_id
    eos_token_id=0,
    truncation=True,
)

# Print the generated sequences in FASTA-like format
for i, seq in enumerate(sequences):
    text = seq["generated_text"].replace("<|endoftext|>", "")
    # Remove newline characters and non-alphabetical characters
    text = "".join(char for char in text if char.isalpha())
    print(f">Seq_{i}")
    print(text)
```

### Use Cases

1. **High-Throughput Screening in Drug Discovery:** The distilled ProtGPT2 is well suited to rapid screening of mutation effects in protein sequences within pharmaceutical research. For example, it can quickly score protein variants across large datasets, speeding up the identification of viable drug targets.
2. **Portable Diagnostics in Healthcare:** The model's small footprint makes it suitable for handheld diagnostic devices that perform real-time protein analysis in clinical settings. For instance, it can run on portable devices that analyze samples for disease-associated protein markers, providing immediate results to healthcare providers in remote areas.
3. **Interactive Learning Tools in Academia:** The distilled model can be integrated into educational software that lets biology students simulate and study the impact of genetic mutations on protein sequences. This hands-on learning helps students understand protein dynamics without the need for high-end computational facilities.
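As referenced in the Loss Formulation above, the combined objective can be written as a short function. The following is a minimal PyTorch sketch, not the exact training script used for this model; the names `student_logits`, `teacher_logits`, and `labels`, and the defaults `T=10.0` and `alpha=0.1`, are illustrative assumptions taken from the parameters listed above.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=10.0, alpha=0.1):
    """Combined loss L = alpha * L_hard + (1 - alpha) * L_soft, as defined above.

    student_logits, teacher_logits: (batch, seq_len, vocab_size) tensors
    labels: (batch, seq_len) ground-truth token ids
    """
    # Soft loss: KL divergence between temperature-softened student and teacher
    # distributions. (Hinton et al., 2015 additionally scale this term by T**2;
    # that factor is omitted here to match the formulas above.)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )
    # Hard loss: standard cross-entropy against the ground-truth tokens.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    # Weighted combination of the two terms.
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

In a training loop, `teacher_logits` would come from the frozen ProtGPT2 teacher evaluated under `torch.no_grad()`, while `student_logits` come from the forward pass of the student being trained.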
### References

- Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531.
- Ferruz, N., Schmidt, S., & Höcker, B. (2022). ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 13, 4348. [Link to paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329459/)