littleworth committed
Commit bf78fda
1 Parent(s): 653e8ed

Update README.md

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -25,12 +25,54 @@ The distilled model, `protgpt2-distilled-tiny`, exhibits a significant improvement
 
 
 ![Evals](https://images.mobilism.org/?di=PYFQ1N5V)
 
+### Usage
+
+```python
+from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextGenerationPipeline
+
+# Load the model and tokenizer
+model_name = "littleworth/protgpt2-distilled-tiny"
+tokenizer = GPT2Tokenizer.from_pretrained(model_name)
+model = GPT2LMHeadModel.from_pretrained(model_name)
+
+# Ensure the tokenizer pads from the left
+tokenizer.padding_side = "left"
+
+# Initialize the pipeline
+text_generator = TextGenerationPipeline(
+    model=model, tokenizer=tokenizer, device=0
+)  # specify device if needed
+
+# Generate sequences
+sequences = text_generator(
+    "<|endoftext|>",
+    max_length=100,
+    do_sample=True,
+    top_k=950,
+    repetition_penalty=1.2,
+    num_return_sequences=10,
+    pad_token_id=tokenizer.eos_token_id,  # set pad_token_id to eos_token_id
+    eos_token_id=0,
+    truncation=True,
+)
+
+# Print each generated sequence in FASTA format
+for i, seq in enumerate(sequences):
+    seq["generated_text"] = seq["generated_text"].replace("<|endoftext|>", "")
+
+    # Keep only alphabetical characters (drops newlines and other symbols)
+    seq["generated_text"] = "".join(
+        char for char in seq["generated_text"] if char.isalpha()
+    )
+    print(f">Seq_{i}")
+    print(seq["generated_text"])
+```
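+
+The sampling settings above (`top_k=950`, `repetition_penalty=1.2`, `eos_token_id=0`) follow the generation recipe suggested on the original ProtGPT2 model card; `<|endoftext|>` serves as both the prompt and the sequence delimiter, which is why it is stripped before printing.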
 
 ### Use Cases
 1. **High-Throughput Screening in Drug Discovery:** The distilled ProtGPT2 is ideal for rapidly screening the effects of mutations on protein sequences in pharmaceutical research. For example, it can quickly score protein variants across large datasets, speeding up the identification of viable drug targets (a minimal scoring sketch follows this list).
 2. **Portable Diagnostics in Healthcare:** The model is small enough for handheld diagnostic devices that perform real-time protein analysis in clinical settings. For instance, it could analyze blood samples for disease markers on-device, providing immediate results to healthcare providers in remote areas.
 3. **Interactive Learning Tools in Academia:** The distilled model can be integrated into educational software that lets biology students simulate and study the impact of genetic mutations on protein structures. This hands-on learning helps students understand protein dynamics without the need for high-end computational facilities.
-4.
+
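+A minimal sketch of the screening idea in use case 1, assuming variants are ranked by the model's perplexity as a rough plausibility proxy (not a validated stability predictor); the `score_variant` helper and the toy sequences are hypothetical, not part of the model's API:
+
+```python
+import torch
+from transformers import GPT2Tokenizer, GPT2LMHeadModel
+
+model_name = "littleworth/protgpt2-distilled-tiny"
+tokenizer = GPT2Tokenizer.from_pretrained(model_name)
+model = GPT2LMHeadModel.from_pretrained(model_name).eval()
+
+def score_variant(sequence: str) -> float:
+    # Hypothetical helper: perplexity of one sequence under the model
+    # (lower = the model finds the sequence more plausible).
+    ids = tokenizer("<|endoftext|>" + sequence, return_tensors="pt").input_ids
+    with torch.no_grad():
+        loss = model(ids, labels=ids).loss  # mean token cross-entropy
+    return float(torch.exp(loss))
+
+# Toy variants of a hypothetical target; a real screen would load thousands.
+variants = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVW"]
+scores = {v: score_variant(v) for v in variants}
+for v, ppl in sorted(scores.items(), key=lambda kv: kv[1]):
+    print(f"{v}\tperplexity={ppl:.2f}")
+```
+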
 ### References
 - Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531.
 - Original ProtGPT2 Paper: [Link to paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329459/)