Commit b2175f4 by monsoon-nlp · 1 parent: 8158986

Files changed (1):
  1. README.md +41 -3

README.md CHANGED
@@ -1,9 +1,47 @@
  ---
  library_name: transformers
- tags: []
+ tags:
+ - biology
+ - bd3lm
+ license: apache-2.0
+ base_model: kuleshov-group/bd3lm-owt-block_size1024-pretrain
  ---
 
  # DNA and Block Diffusion
 
- Untrained architecture test using the Block Diffusion architecture and
- AgroNT's six-nucleotide-length tokens.
+ Untrained architecture test using the [Block Diffusion](https://github.com/kuleshov-group/bd3lms) architecture and
+ [AgroNT](https://huggingface.co/InstaDeepAI/agro-nucleotide-transformer-1b)'s six-nucleotide-length tokens.
+
+ ### Loading model
+
+ ```python
+ from transformers import AutoModelForMaskedLM
+ m = AutoModelForMaskedLM.from_pretrained(
+     "monsoon-nlp/dna-blockdiff",
+     trust_remote_code=True,
+ )
+ ```
+
+ ### Generating text
+
+ ```bash
+ cd bd3lms
+ python -u main.py \
+     loader.eval_batch_size=1 \
+     model=small \
+     algo=bd3lm \
+     algo.T=900 \
+     algo.backbone=hf_dit \
+     algo.sampler=analytic \
+     data=openwebtext-split \
+     model.length=2048 \
+     block_size=4 \
+     wandb=null \
+     mode=sample_eval \
+     eval.checkpoint_path="monsoon-nlp/dna-blockdiff" \
+     model.attn_backend=sdpa \
+     sampling.nucleus_p=0.9 \
+     sampling.kv_cache=true \
+     sampling.logdir=$PWD/sample_logs/samples_genlen_bd3lm_blocksize4 \
+     data.tokenizer_name_or_path="monsoon-nlp/dna-blockdiff"
+ ```
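
Note on the README's "six-nucleotide-length tokens": the sketch below illustrates what 6-mer tokenization of a DNA string might look like. It assumes consecutive, non-overlapping 6-mers, with any leftover bases emitted one per token. The function name `split_into_kmers` is hypothetical, and this is a simplified illustration rather than AgroNT's actual tokenizer, which also handles special tokens and vocabulary lookup.

```python
def split_into_kmers(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA string into consecutive, non-overlapping k-mers.

    Assumption (not taken from the README): bases left over after the
    last full k-mer are emitted as single-nucleotide tokens.
    """
    sequence = sequence.upper()
    # Length of the prefix that divides evenly into k-mers.
    full = len(sequence) - len(sequence) % k
    tokens = [sequence[i:i + k] for i in range(0, full, k)]
    # Remaining bases become one token each.
    tokens.extend(sequence[full:])
    return tokens


print(split_into_kmers("ATGCGTACGTTAGC"))
# → ['ATGCGT', 'ACGTTA', 'G', 'C']
```

With this scheme, a 2048-token context (the `model.length=2048` setting in the sampling command) would cover roughly 12 kb of sequence, assuming most tokens are full 6-mers.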