nferruz committed on
Commit 7e8a9c6
1 Parent(s): 8f56644

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -11,11 +11,11 @@ inference:
  # **ZymCTRL**

  ZymCTRL (Enzyme Control) ([ see preprint ](https://www.biorxiv.org/content/10.1101/2024.05.03.592223v1))
- is a conditional language model for the generation of artificial functional enzymes. It was trained on the entire BRENDA database of enzymes, comprising over 37 M sequences.
+ is a conditional language model for the generation of artificial functional enzymes. It was trained on Uniprot database of sequences containing EC annotations, comprising over 37 M sequences.
  Given a user-defined Enzymatic Commission (EC) number, the model generates protein sequences that fulfill that catalytic reaction.
  The generated sequences are ordered, globular, and distant to natural ones, while their intended catalytic properties match those defined by users.

- If you don't know the EC number of your protein of interest, have a look at the BRENDA webpage: https://www.brenda-enzymes.org/ecexplorer.php?browser=1
+ If you don't know the EC number of your protein of interest, have a look for example here: https://www.brenda-enzymes.org/ecexplorer.php?browser=1

  See below for information about the model, how to generate sequences, and how to save and rank them by perplexity.

@@ -23,7 +23,7 @@ See below for information about the model, how to generate sequences, and how to
  ZymCTRL is based on the [CTRL Transformer](https://arxiv.org/abs/1909.05858) architecture (which in turn is very similar to ChatGPT) and contains 36 layers
  with a model dimensionality of 1280, totaling 738 million parameters.

- ZymCTRL is a decoder-only transformer model pre-trained on the BRENDA database
+ ZymCTRL is a decoder-only transformer model pre-trained on the Uniprot subset of enzyme sequences, totalling 37M sequences.
  (version July 2022). The pre-training was done on the raw sequences without FASTA headers,
  with the EC classes prepended to each sequence. The databases will be uploaded soon.
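The README states that pre-training used raw sequences without FASTA headers, with the EC class prepended to each sequence. The stdlib sketch below illustrates that preprocessing step under stated assumptions: the `<sep>` separator is a hypothetical placeholder (this commit does not say how the label and sequence are joined, nor which special tokens ZymCTRL actually uses), the FASTA records are made up, and the EC check only accepts complete four-level numbers such as `3.5.5.1`, so incomplete entries like `3.5.5.-` are rejected by design of this sketch, not by ZymCTRL.

```python
import re

# HYPOTHETICAL separator: the real ZymCTRL token layout is not given in this commit.
SEP = "<sep>"

# A complete EC number has four dot-separated numeric levels, e.g. "3.5.5.1".
EC_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def parse_fasta(text):
    """Yield raw sequences from FASTA text, discarding the '>' header lines."""
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_lines:
                yield "".join(seq_lines)
                seq_lines = []
        else:
            seq_lines.append(line)
    if seq_lines:
        yield "".join(seq_lines)


def conditioned_example(ec_number, sequence):
    """Prepend the EC label to a header-free sequence (assumed string layout)."""
    if not EC_PATTERN.match(ec_number):
        raise ValueError(f"not a complete EC number: {ec_number!r}")
    return f"{ec_number}{SEP}{sequence}"


# Made-up FASTA input: two records, the first wrapped over two lines.
fasta = """>record_1 hypothetical enzyme
MKTAYIAKQR
QISFVKSHFS
>record_2
MSDNE
"""
examples = [conditioned_example("3.5.5.1", s) for s in parse_fasta(fasta)]
print(examples)
```

Running it prints `['3.5.5.1<sep>MKTAYIAKQRQISFVKSHFS', '3.5.5.1<sep>MSDNE']`: headers are dropped, wrapped sequence lines are joined, and the label sits in front of each sequence.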
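The README mentions ranking generated sequences by perplexity. Perplexity is the exponential of the mean negative log-likelihood per token, PPL = exp(-(1/N) Σ log p_i). The sketch below assumes you already have per-token natural-log probabilities for each sequence (in practice they would come from the model; the numbers here are made up) and sorts from lowest to highest perplexity, assuming the usual convention that lower perplexity is preferred. The function names are illustrative, not part of any ZymCTRL API.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def rank_by_perplexity(scored):
    """Sort (name, token_logprobs) pairs from lowest to highest perplexity."""
    return sorted(
        ((name, perplexity(lps)) for name, lps in scored),
        key=lambda item: item[1],
    )


# Made-up per-token log-probabilities for three hypothetical generated sequences.
candidates = [
    ("seq_a", [math.log(0.5)] * 4),   # every token has probability 0.5  -> PPL 2
    ("seq_b", [math.log(0.25)] * 6),  # every token has probability 0.25 -> PPL 4
    ("seq_c", [math.log(1.0)] * 3),   # every token is certain           -> PPL 1
]
for name, ppl in rank_by_perplexity(candidates):
    print(f"{name}\t{ppl:.2f}")
```

Averaging over tokens before exponentiating keeps sequences of different lengths comparable; summing instead would penalize longer sequences simply for having more tokens.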
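As a rough consistency check on the quoted size (36 layers, model dimensionality 1280, 738 million parameters), the common GPT-style estimate counts about 12d² weights per transformer layer: 4d² for the attention projections plus 8d² for a feed-forward layer of width 4d. The 4d feed-forward width is an assumption, since this commit does not state it, and the estimate deliberately ignores embeddings, biases and layer norms:

```latex
N_{\text{blocks}} \approx 12\,L\,d^{2} = 12 \times 36 \times 1280^{2} = 707{,}788{,}800 \approx 708\ \text{M}
```

Under these assumptions the transformer blocks account for most of the stated 738 M; the remaining roughly 30 M is not broken down in the README.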