jennzhuge committed on
Commit
5d47dac
·
1 Parent(s): 1772551

added requirements

Files changed (2)
  1. README.md +12 -1
  2. requirements.txt +4 -1
README.md CHANGED
@@ -12,6 +12,17 @@ pinned: false
12
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
 
14
 
15
- Welcome to Lofi Amazon Rainforest Beats to Hack/AI to's DNA Identifier Tool.
 
 
16
  To get started, upload DNA sequences and the coordinates where you sampled them.
17
  Our tool will output the top three most probable genuses that your sample belongs to based on DNA and environmental factors such as elevation, annual precipitation, or human activity levels of the sample location. You can also see the top three most probable genuses based on DNA similarity alone.
12
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
 
14
 
15
+ # Welcome to Lofi Amazon Rainforest Beats to Hack/AI to's DNA Identifier Tool.
16
+
17
+ ## Genus Prediction
18
  To get started, upload DNA sequences and the coordinates where you sampled them.
19
  Our tool will output the top three most probable genuses that your sample belongs to based on DNA and environmental factors such as elevation, annual precipitation, or human activity levels of the sample location. You can also see the top three most probable genuses based on DNA similarity alone.
20
+
21
+ ## DNA Embedding Space Visualization
22
+ Perhaps we have a DNA sequence for which the highest genus probability is very low. This could be because scientists have not managed to directly sample any specimens of the genus, so our training dataset, BOLD, doesn't contain any examples. We can still examine the DNA embedding of the sequence in relation to known samples. The t-SNE plots show the embedding space of the top N most common species in the area surrounding the given coordinates. We can see clear group distinctions between species. The following t-SNE plot shows how the sample sequence embedding is positioned in the space and identifies the nearest species clusters.
23
+
24
+ # Downstream Tasks
25
+
26
+ Potential downstream tasks include:
27
+ - Identifying invasive species.
28
+ - Reclassifying wrongly classified species. For example, the red panda is called a panda, but it is actually more genetically similar to a raccoon.
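
The "Genus Prediction" section of the README describes returning the three most probable genera for a sample. A minimal sketch of that selection step is below. It assumes the model has already produced one probability per genus; the function name `top_k_genera`, the genus labels, and the probabilities are made up for illustration and are not taken from the tool's code.

```python
import numpy as np


def top_k_genera(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (len(labels),):
        raise ValueError("probs and labels must have the same length")
    k = min(k, len(labels))
    # Sort indices by descending probability; a stable sort keeps ties in input order.
    order = np.argsort(-probs, kind="stable")[:k]
    return [(labels[i], float(probs[i])) for i in order]


# Hypothetical model output for one sample (illustrative values only).
labels = ["Heliconius", "Morpho", "Atta", "Eciton"]
probs = [0.10, 0.55, 0.05, 0.30]

top3 = top_k_genera(probs, labels)
print(top3)
```

The same helper could be called twice, once on probabilities from DNA plus environmental factors and once on probabilities from DNA similarity alone, to produce the two rankings the README mentions.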
requirements.txt CHANGED
@@ -2,4 +2,7 @@ huggingface-hub==0.23.2
2
  pandas==2.2.2
3
  torch==2.3.0
4
  tqdm==4.66.4
5
- transformers==4.41.2
 
 
 
 
2
  pandas==2.2.2
3
  torch==2.3.0
4
  tqdm==4.66.4
5
+ transformers==4.41.2
6
+ scikit-learn
7
+ numpy
8
+ datasets
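
The "DNA Embedding Space Visualization" section of the README describes placing a sample's embedding among known species with t-SNE, which is presumably what the scikit-learn and numpy requirements support. The sketch below shows one way to do that under stated assumptions: the embeddings are random stand-ins rather than real DNA embeddings, the variable names are hypothetical, and the nearest cluster is chosen by distance to each species centroid in the 2-D projection. Because scikit-learn's `TSNE` has no `transform` for unseen points, the query is projected together with the reference samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-ins for DNA embeddings: 3 species, 20 samples each, 16 dimensions.
n_species, per_species, dim = 3, 20, 16
centers = rng.normal(scale=5.0, size=(n_species, dim))
reference = np.vstack([c + rng.normal(size=(per_species, dim)) for c in centers])
species_ids = np.repeat(np.arange(n_species), per_species)

# A hypothetical query embedding drawn near species 1.
query = centers[1] + rng.normal(size=(1, dim))

# Project reference samples and the query jointly into 2-D.
stacked = np.vstack([reference, query])
coords = TSNE(
    n_components=2, perplexity=10, init="pca", random_state=0
).fit_transform(stacked)
ref_2d, query_2d = coords[:-1], coords[-1]

# Find the species whose 2-D centroid lies closest to the query point.
centroids = np.stack(
    [ref_2d[species_ids == s].mean(axis=0) for s in range(n_species)]
)
nearest = int(np.argmin(np.linalg.norm(centroids - query_2d, axis=1)))
print("nearest species cluster:", nearest)
```

Distances in a t-SNE projection are only a rough guide, so a nearest-neighbor search in the original embedding space may be a more reliable way to name the closest cluster; the 2-D plot is mainly useful for visual inspection.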