Update README.md
README.md
CHANGED
@@ -10,6 +10,8 @@ ZymCTRL is based on the CTRL Transformer architecture and contains 36 layers wit
ZymCTRL is a decoder-only transformer model pre-trained on the BRENDA database (version July 2022). The pre-training was done on the raw sequences without FASTA headers, with the EC classes prepended to each sequence. The databases can be found here: xx.

ZymCTRL was trained with an autoregressive objective, i.e., the model learns to predict the next token given a sequence context. Because the first tokens of each sequence encode the EC numbers, the model learns the dependencies among EC classes and their corresponding sequences, and is able to _speak_ the enzyme language.
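The autoregressive objective can be illustrated with a minimal sketch. It assumes a toy example in which the EC label tokens are simply concatenated with the residues; the helper `next_token_pairs`, the short sequence `MKV`, and the absence of any separator or special tokens are illustrative assumptions, not ZymCTRL's actual training format.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Hypothetical training example: EC label tokens followed by residues.
ec_tokens = ["1", ".", "1", ".", "1", ".", "1"]
residues = list("MKV")  # made-up three-residue sequence
example = ec_tokens + residues

pairs = next_token_pairs(example)

# Every residue is predicted from a context that starts with the EC label.
for context, target in pairs[-3:]:
    print("".join(context), "->", target)
```

Because the EC tokens come first, each residue prediction is conditioned on the label, which is how a model trained this way could associate EC classes with sequences.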
+
+Because the number of member sequences differs starkly among EC classes, we also tokenized the EC numbers so that closely related classes share tokens. For example, EC numbers '2.7.1.1' and '2.7.1.2' share the first three tokens (six including separators).
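The token sharing described above can be checked with a small sketch. It assumes each number and each `.` separator is one token; `tokenize_ec` and `shared_prefix` are hypothetical helpers written for illustration and are not the ZymCTRL tokenizer.

```python
import re


def tokenize_ec(ec_number):
    """Split an EC number into digit-group tokens and '.' separator tokens."""
    return re.findall(r"\d+|\.", ec_number)


def shared_prefix(a, b):
    """Return the longest common prefix of two token lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


a = tokenize_ec("2.7.1.1")
b = tokenize_ec("2.7.1.2")
common = shared_prefix(a, b)

print(common)       # tokens shared by both EC numbers
print(len(common))  # count including separators
```

Under this assumption the two EC numbers differ only in their final token, so everything learned for the shared `2.7.1.` prefix is available to both classes.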
### **How to use ZymCTRL**
ZymCTRL can be used with the Hugging Face `transformers` Python package. Detailed installation instructions can be found here: https://huggingface.co/docs/transformers/installation