Spaces: LofiAmazon/LofiAmazonSpace


jennzhuge committed on Jun 3, 2024

Commit 08031ca (verified) · 1 Parent(s): 8260f31

Update README.md

Files changed (1)

README.md +5 -5

@@ -26,17 +26,17 @@ Enter a DNA sequence and the coordinates where you sampled it. (We can easily ex
 Prehaps we have a DNA sequence for which the highest genus probability is very low (this could be because scientists have not managed to directly sample any specimens of the genus, so our training dataset, BOLD, doesn't contain any examples), we can still examine the DNA embedding of the sequence in relation to known samples. The t-SNE plot shows the embedding space of the top k most common species in the dataset. Use the slider to choose k. We can see clear group distinctions between species.
 ## BarcodeBERT DNA Embeddings.
-The model we used to train the DNA embeddings is BarcodeBERT (https://github.com/Kari-Genomics-Lab/BarcodeBERT). We trained the model on the 'nucraw' column of DNA sequences from the latest release of the BOLD Database (http://www.boldsystems.org/index.php/datapackages/Latest). We followed the preprocessing steps outlined by the BarcodeBERT approach (https://arxiv.org/pdf/2311.02401).
 ## Classification Model Performance and Baseline Comparison of Ecological Layer Data Inclusion
-We trained a fine-tuned single fully connected linear layer on our DNA embeddings with ecological layer data and without ecological layer data to predict genuses. We found that the inclusion on ecological layer data improved our accuracy by _____.
-| Model       | Test Accuracy |
 | ----------- | ------------- |
 | DNA Only    |        %  |
 | DNA & Env   |        %  |
 Our results can be validated with the
-BarcodeBERT-Finetuned-Amazon-DNAOnly (https://huggingface.co/LofiAmazon/BarcodeBERT-Finetuned-Amazon-DNAOnly) and BarcodeBERT-Finetuned-Amazon (https://huggingface.co/LofiAmazon/BarcodeBERT-Finetuned-Amazon) models, testing splits of the BOLD-Embeddings-Amazon (https://huggingface.co/datasets/LofiAmazon/BOLD-Embeddings-Amazon), BOLD-Embeddings-Ecolayers-Amazon (https://huggingface.co/datasets/LofiAmazon/BOLD-Embeddings-Ecolayers-Amazon) datasets, and the fine tuning script on our repo (https://github.com/vshulev/amazon-lofi-beats/blob/master/fine_tune.py).
 ## Future Work and Downstream Tasks
 We describe interesting avenues for future work that were not in scope due to time constraints.

 Prehaps we have a DNA sequence for which the highest genus probability is very low (this could be because scientists have not managed to directly sample any specimens of the genus, so our training dataset, BOLD, doesn't contain any examples), we can still examine the DNA embedding of the sequence in relation to known samples. The t-SNE plot shows the embedding space of the top k most common species in the dataset. Use the slider to choose k. We can see clear group distinctions between species.
 ## BarcodeBERT DNA Embeddings.
+The model we used to train the DNA embeddings is [BarcodeBERT](https://github.com/Kari-Genomics-Lab/BarcodeBERT). We trained the model on the 'nucraw' column of DNA sequences from the latest release of the [BOLD Database](http://www.boldsystems.org/index.php/datapackages/Latest). We followed the preprocessing steps outlined by the BarcodeBERT approach (https://arxiv.org/pdf/2311.02401).
 ## Classification Model Performance and Baseline Comparison of Ecological Layer Data Inclusion
+We trained a fine-tuned single fully connected linear layer on our DNA embeddings with ecological layer data and without ecological layer data to predict genuses. We found that the model achieved a test accuracy of 82%.
+<!-- | Model       | Test Accuracy |
 | ----------- | ------------- |
 | DNA Only    |        %  |
 | DNA & Env   |        %  |
+ -->
 Our results can be validated with the
+[BarcodeBERT-Finetuned-Amazon-DNAOnly](https://huggingface.co/LofiAmazon/BarcodeBERT-Finetuned-Amazon-DNAOnly) and [BarcodeBERT-Finetuned-Amazon](https://huggingface.co/LofiAmazon/BarcodeBERT-Finetuned-Amazon) models, testing splits of the [BOLD-Embeddings-Amazon](https://huggingface.co/datasets/LofiAmazon/BOLD-Embeddings-Amazon), [BOLD-Embeddings-Ecolayers-Amazon](https://huggingface.co/datasets/LofiAmazon/BOLD-Embeddings-Ecolayers-Amazon) datasets, and the fine tuning script on our [repo](https://github.com/vshulev/amazon-lofi-beats/blob/master/fine_tune.py).
 ## Future Work and Downstream Tasks
 We describe interesting avenues for future work that were not in scope due to time constraints.