pmillana committed on
Commit 2fa1eca · verified
1 Parent(s): 3888382

Create README.md

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: mit
+ language:
+ - en
+ ---
+
+ # BarcodeBERT for Taxonomic Classification
+
+ A pre-trained transformer model for inference on insect DNA barcoding data.
+
+ To use **BarcodeBERT** as a feature extractor:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ # Load the tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("bioscan-ml/BarcodeBERT", trust_remote_code=True)
+
+ # Load the model
+ model = AutoModel.from_pretrained("bioscan-ml/BarcodeBERT", trust_remote_code=True)
+
+ # Sample sequence
+ dna_seq = 'ACGCGCTGACGCATCAGCATACGA'
+
+ # Tokenize
+ input_seq = tokenizer(dna_seq, return_tensors='pt')['input_ids']
+
+ # Pass through the model and take the last layer's hidden states
+ output = model(input_seq)['hidden_states'][-1]
+
+ # Compute Global Average Pooling over the token dimension
+ features = output.mean(1)
+ ```
+
+ ## Citation
36
+
37
+ If you find BarcodeBERT useful in your research please consider citing:
38
+
39
+ @misc{arias2023barcodebert,
40
+ title={{BarcodeBERT}: Transformers for Biodiversity Analysis},
41
+ author={Pablo Millan Arias
42
+ and Niousha Sadjadi
43
+ and Monireh Safari
44
+ and ZeMing Gong
45
+ and Austin T. Wang
46
+ and Scott C. Lowe
47
+ and Joakim Bruslund Haurum
48
+ and Iuliia Zarubiieva
49
+ and Dirk Steinke
50
+ and Lila Kari
51
+ and Angel X. Chang
52
+ and Graham W. Taylor
53
+ },
54
+ year={2023},
55
+ eprint={2311.02401},
56
+ archivePrefix={arXiv},
57
+ primaryClass={cs.LG},
58
+ doi={10.48550/arxiv.2311.02401},
59
+ }
60
+
61
+
62
+