bernardo-de-almeida committed on
Commit efc34b9 · verified · 1 Parent(s): 4c6ceaf

Update README.md

Files changed (1)
  1. README.md +9 -4
README.md CHANGED
@@ -18,6 +18,15 @@ It enables users — even those with no coding background — to interact with b
 - **Paper:** [ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks](https://www.biorxiv.org/content/10.1101/2024.04.30.591835v1.full.pdf)
 
 
+### License Summary
+1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.
+2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
+3. You may **not** use the Licensed Models or any of its Outputs in connection with:
+1. any Commercial Purposes, unless agreed by Us under a separate licence;
+2. to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
+3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
+4. in violation of any applicable laws and regulations.
+
 ### Architecture and Parameters
 ChatNT is built on a three‑module design: a 500M‑parameter [Nucleotide Transformer v2](https://www.nature.com/articles/s41592-024-02523-z) DNA encoder pre‑trained on genomes from 850 species
 (handling up to 12 kb per sequence, Dalla‑Torre et al., 2024), an English‑aware Perceiver Resampler that linearly projects and gated‑attention compresses
@@ -36,10 +45,6 @@ Examples of questions and sequences for each task, as well as additional task in
 DNA inputs are broken into overlapping 6‑mer tokens and padded or truncated to 2048 tokens (~ 12 kb). English prompts and
 outputs use the LLaMA tokenizer, augmented with `<DNA>` as a special token to mark sequence insertion points.
 
-### Credit and License
-The DNA encoder is the Nucleotide Transformer v2 ([Dalla‑Torre et al., 2024](https://www.nature.com/articles/s41592-024-02523-z)), and the English decoder is Vicuna‑7B (
-[Chiang et al., 2023](https://lmsys.org/blog/2023-03-30-vicuna/)). All code and model artifacts are released under ???.
-
 ### Limitations and Disclaimer
 ChatNT can only handle questions related to the 27 tasks it has been trained on. ChatNT is **not** a clinical or diagnostic tool.
 It can produce incorrect or “hallucinated” answers, particularly on out‑of‑distribution inputs, and its numeric predictions may suffer digit‑level errors. Confidence
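The following is a minimal Python sketch of the DNA preprocessing described in the README context lines (6-mer tokens, padded or truncated to 2048 tokens). It is not the ChatNT or Nucleotide Transformer tokenizer. The stride, the treatment of leftover bases as single-nucleotide tokens, and the `<pad>` token name are all assumptions made for illustration. The sketch uses 6-mers with a stride of 6 because that is the setting under which 2048 tokens span roughly 12 kb (6 + 2047 × 6 = 12,288 bases). With fully overlapping 6-mers (stride 1), the same budget would cover only 2,053 bases. The `bases_covered` helper makes this comparison explicit.

```python
from typing import List

K = 6              # k-mer length stated in the README
MAX_TOKENS = 2048  # token budget stated in the README
PAD = "<pad>"      # hypothetical padding token, not the real vocabulary entry


def tokenize_6mers(seq: str) -> List[str]:
    """Non-overlapping 6-mer tokenization (stride 6, an assumption).

    Any tail shorter than 6 bases is emitted as single-nucleotide tokens
    (also an assumption for this sketch).
    """
    seq = seq.upper()
    n_full = len(seq) // K
    tokens = [seq[i * K:(i + 1) * K] for i in range(n_full)]
    tokens.extend(seq[n_full * K:])  # leftover bases, one token each
    return tokens


def pad_or_truncate(tokens: List[str], max_len: int = MAX_TOKENS) -> List[str]:
    """Cut the list to max_len tokens, or right-pad it with PAD up to max_len."""
    tokens = tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))


def bases_covered(n_tokens: int, k: int = K, stride: int = K) -> int:
    """Number of bases spanned by n_tokens k-mers taken every `stride` bases."""
    return k + (n_tokens - 1) * stride


if __name__ == "__main__":
    toks = tokenize_6mers("ACGTACGTACGTAC")  # 14 bases
    print(toks)                                 # ['ACGTAC', 'GTACGT', 'A', 'C']
    print(len(pad_or_truncate(toks)))           # 2048
    print(bases_covered(MAX_TOKENS))            # 12288 (non-overlapping)
    print(bases_covered(MAX_TOKENS, stride=1))  # 2053 (fully overlapping)
```

The sketch works with token strings rather than integer IDs to avoid guessing at the real vocabulary. A real pipeline would map each token to an ID using the model's own tokenizer.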