Update README.md
README.md
CHANGED
@@ -18,6 +18,15 @@ It enables users — even those with no coding background — to interact with b
 - **Paper:** [ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks](https://www.biorxiv.org/content/10.1101/2024.04.30.591835v1.full.pdf)
 
 
+### License Summary
+1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.
+2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
+3. You may **not** use the Licensed Models or any of its Outputs in connection with:
+   1. any Commercial Purposes, unless agreed by Us under a separate licence;
+   2. to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
+   3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
+   4. in violation of any applicable laws and regulations.
+
 ### Architecture and Parameters
 ChatNT is built on a three‑module design: a 500M‑parameter [Nucleotide Transformer v2](https://www.nature.com/articles/s41592-024-02523-z) DNA encoder pre‑trained on genomes from 850 species
 (handling up to 12 kb per sequence, Dalla‑Torre et al., 2024), an English‑aware Perceiver Resampler that linearly projects and gated‑attention compresses
@@ -36,10 +45,6 @@ Examples of questions and sequences for each task, as well as additional task in
 DNA inputs are broken into overlapping 6‑mer tokens and padded or truncated to 2048 tokens (~ 12 kb). English prompts and
 outputs use the LLaMA tokenizer, augmented with `<DNA>` as a special token to mark sequence insertion points.
 
-### Credit and License
-The DNA encoder is the Nucleotide Transformer v2 ([Dalla‑Torre et al., 2024](https://www.nature.com/articles/s41592-024-02523-z)), and the English decoder is Vicuna‑7B (
-[Chiang et al., 2023](https://lmsys.org/blog/2023-03-30-vicuna/)). All code and model artifacts are released under ???.
-
 ### Limitations and Disclaimer
 ChatNT can only handle questions related to the 27 tasks it has been trained on. ChatNT is **not** a clinical or diagnostic tool.
 It can produce incorrect or “hallucinated” answers, particularly on out‑of‑distribution inputs, and its numeric predictions may suffer digit‑level errors. Confidence
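The README context lines in this diff describe DNA inputs as 6-mer tokens padded or truncated to 2048 tokens (~12 kb). A rough Python sketch of that preprocessing follows. It is not the project's tokenizer. It assumes consecutive, non-overlapping 6-mers, because 2048 × 6 = 12,288 nt matches the stated ~12 kb while the README's own wording says "overlapping". The names `kmer_tokenize`, `pad_or_truncate`, and the `<pad>` symbol are hypothetical, and emitting leftover bases one at a time is also an assumption.

```python
K = 6               # k-mer size stated in the README
MAX_TOKENS = 2048   # token budget stated in the README
PAD = "<pad>"       # hypothetical padding symbol, for illustration only


def kmer_tokenize(seq: str, k: int = K) -> list[str]:
    """Split a DNA string into consecutive, non-overlapping k-mers.

    Assumption: a trailing remainder shorter than k is emitted one base
    at a time so that no bases are silently dropped.
    """
    seq = seq.upper()
    cut = len(seq) - len(seq) % k          # last index covered by full k-mers
    tokens = [seq[i:i + k] for i in range(0, cut, k)]
    tokens.extend(seq[cut:])               # leftover bases as single tokens
    return tokens


def pad_or_truncate(tokens: list[str],
                    max_tokens: int = MAX_TOKENS,
                    pad: str = PAD) -> list[str]:
    """Force the token list to exactly max_tokens entries."""
    tokens = tokens[:max_tokens]           # truncate long inputs
    return tokens + [pad] * (max_tokens - len(tokens))  # pad short inputs


if __name__ == "__main__":
    toks = kmer_tokenize("ACGTACGTAC")
    print(toks)                            # one full 6-mer plus leftover bases
    print(len(pad_or_truncate(toks)))      # fixed length after padding
    print(MAX_TOKENS * K)                  # nucleotides covered by a full window
```

Under these assumptions, a sequence longer than 12,288 nt loses its tail on truncation, so long inputs would need to be cropped or windowed before they reach the model.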