VyLala committed · Commit 4adfff6 · verified · 1 Parent(s): 59fce1c

Update mtdna_tool_explainer_updated.html

Files changed (1)
  1. mtdna_tool_explainer_updated.html +106 -85
mtdna_tool_explainer_updated.html CHANGED
@@ -6,106 +6,127 @@
  <title>mtDNA Tool – System Overview</title>
 
  <style>
- body {
- background-color: #ffffff !important;
- color: #222222 !important;
- font-family: Arial, sans-serif !important;
- line-height: 1.6 !important;
- padding: 2rem !important;
- max-width: 900px !important;
- margin: auto !important;
- }
-
- h1, h2 {
- color: #1a1a1a !important; /* darker headings */
- }
- img {
- max-width: 100%;
- border: 1px solid #ccc;
- padding: 5px;
- background: #fff;
- }
- code {
- background: #f5f5f5;
- color: #c7254e;
- padding: 2px 4px;
- border-radius: 4px;
- font-family: Consolas, monospace;
- }
- .highlight {
- background: #ffffcc;
- padding: 4px 8px;
- border-left: 4px solid #ffcc00;
- margin: 1rem 0;
- color: #333;
- }
  </style>
-
  </head>
  <body>
 
- <h1>mtDNA Location Classifier – Brief System Pipeline and Usage Guide</h1>
 
- <p>The <strong>mtDNA Tool</strong> is a lightweight pipeline designed to help researchers extract metadata such as geographic origin, sample type (ancient/modern), and optional niche labels (e.g., ethnicity, specific location) from mtDNA GenBank accession numbers. It supports batch input and produces structured Excel summaries.</p>
 
- <h2>System Overview Diagram</h2>
- <p>The figure below shows the core execution flow—from input accession to final output.</p>
- <img src="./A_flowchart_in_the_image_illustrates_a_data_proces.png" alt="mtDNA Pipeline Flowchart">
 
- <h2>Key Steps</h2>
- <ol>
- <li><strong>Input</strong>: One or more GenBank accession numbers are submitted (e.g., via UI, CSV, or text).</li>
 
- <li><strong>Metadata Collection</strong>: Using <code>fetch_ncbi_metadata</code>, the pipeline retrieves metadata like country, isolate, collection date, and reference title. If available, supplementary material and full-text articles are parsed using DOI, PubMed, or Google Custom Search.</li>
 
- <li><strong>Text Extraction & Preprocessing</strong>:
- <ul>
- <li>All available documents are parsed and cleaned (tables, paragraphs, overlapping sections).</li>
- <li>Text is merged into two formats: a smaller <code>chunk</code> and a full <code>all_output</code>.</li>
- </ul>
- </li>
 
- <li><strong>LLM-based Inference (Gemini + RAG)</strong>:
  <ul>
- <li>Chunks are embedded with FAISS and stored for reuse.</li>
- <li>The Gemini model answers specific queries like predicted country, sample type, and any niche label requested by the user.</li>
  </ul>
- </li>
 
- <li><strong>Result Structuring</strong>:
  <ul>
- <li>Each output includes predicted fields + explanation text (methods used, quotes, sources).</li>
- <li>Summarized and saved using <code>save_to_excel</code>.</li>
  </ul>
- </li>
- </ol>
-
- <h2>Output Format</h2>
- <p>The final output is an Excel file with the following fields:</p>
- <ul>
- <li><code>Sample ID</code></li>
- <li><code>Predicted Country</code> and <code>Country Explanation</code></li>
- <li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
- <li><code>Sources</code> (links to articles)</li>
- <li><code>Time Cost</code></li>
- </ul>
-
- <h2>System Highlights</h2>
- <ul>
- <li>RAG + Gemini integration for improved explanation and transparency</li>
- <li>Excel export for structured research use</li>
- <li>Optional ethnic/location/language inference using isolate names</li>
- <li>Quality check (e.g., fallback on short explanations, low token count)</li>
- <li>Report Button – After results are displayed, users can submit errors or mismatches using the report text box below the output table</li>
- </ul>
-
- <h2>Citation</h2>
- <div class="highlight">
- Phung, V. (2025). mtDNA Location Classifier. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
- </div>
-
- <h2>Contact</h2>
- <p>If you are a researcher working with historical mtDNA data or edge-case accessions and need scalable inference or logging, reach out through the HuggingFace space or email provided in the repo README.</p>
 
  </body>
  </html>

  <title>mtDNA Tool – System Overview</title>
 
  <style>
+ .custom-container {
+ background-color: #ffffff !important;
+ color: #222222 !important;
+ font-family: Arial, sans-serif !important;
+ line-height: 1.6 !important;
+ padding: 2rem !important;
+ max-width: 900px !important;
+ margin: auto !important;
+ }
+
+ .custom-container h1,
+ .custom-container h2,
+ .custom-container h3,
+ .custom-container strong,
+ .custom-container b,
+ .custom-container p,
+ .custom-container li,
+ .custom-container ol,
+ .custom-container ul,
+ .custom-container span {
+ color: #222222 !important;
+ font-weight: normal !important;
+ }
+
+ .custom-container h1,
+ .custom-container h2 {
+ font-weight: bold !important;
+ }
+
+ .custom-container img {
+ max-width: 100%;
+ border: 1px solid #ccc;
+ padding: 5px;
+ background: #fff;
+ }
+
+ .custom-container code {
+ background: #f5f5f5;
+ color: #c7254e;
+ padding: 2px 4px;
+ border-radius: 4px;
+ font-family: Consolas, monospace;
+ }
+
+ .custom-container .highlight {
+ background: #ffffcc;
+ padding: 4px 8px;
+ border-left: 4px solid #ffcc00;
+ margin: 1rem 0;
+ color: #333 !important;
+ }
  </style>
  </head>
+
  <body>
+ <div class="custom-container">
 
+ <h1>mtDNA Location Classifier – Brief System Pipeline and Usage Guide</h1>
 
+ <p>The <strong>mtDNA Tool</strong> is a lightweight pipeline designed to help researchers extract metadata such as geographic origin, sample type (ancient/modern), and optional niche labels (e.g., ethnicity, specific location) from mtDNA GenBank accession numbers. It supports batch input and produces structured Excel summaries.</p>
 
+ <h2>System Overview Diagram</h2>
+ <p>The figure below shows the core execution flow—from input accession to final output.</p>
+ <img src="./A_flowchart_in_the_image_illustrates_a_data_proces.png" alt="mtDNA Pipeline Flowchart">
 
+ <h2>Key Steps</h2>
+ <ol>
+ <li><strong>Input</strong>: One or more GenBank accession numbers are submitted (e.g., via UI, CSV, or text).</li>
 
+ <li><strong>Metadata Collection</strong>: Using <code>fetch_ncbi_metadata</code>, the pipeline retrieves metadata like country, isolate, collection date, and reference title. If available, supplementary material and full-text articles are parsed using DOI, PubMed, or Google Custom Search.</li>
 
+ <li><strong>Text Extraction & Preprocessing</strong>:
+ <ul>
+ <li>All available documents are parsed and cleaned (tables, paragraphs, overlapping sections).</li>
+ <li>Text is merged into two formats: a smaller <code>chunk</code> and a full <code>all_output</code>.</li>
+ </ul>
+ </li>
+
+ <li><strong>LLM-based Inference (Gemini + RAG)</strong>:
+ <ul>
+ <li>Chunks are embedded with FAISS and stored for reuse.</li>
+ <li>The Gemini model answers specific queries like predicted country, sample type, and any niche label requested by the user.</li>
+ </ul>
+ </li>
+
+ <li><strong>Result Structuring</strong>:
+ <ul>
+ <li>Each output includes predicted fields + explanation text (methods used, quotes, sources).</li>
+ <li>Summarized and saved using <code>save_to_excel</code>.</li>
+ </ul>
+ </li>
+ </ol>
 
+ <h2>Output Format</h2>
+ <p>The final output is an Excel file with the following fields:</p>
  <ul>
+ <li><code>Sample ID</code></li>
+ <li><code>Predicted Country</code> and <code>Country Explanation</code></li>
+ <li><code>Predicted Sample Type</code> and <code>Sample Type Explanation</code></li>
+ <li><code>Sources</code> (links to articles)</li>
+ <li><code>Time Cost</code></li>
  </ul>
 
+ <h2>System Highlights</h2>
  <ul>
+ <li>RAG + Gemini integration for improved explanation and transparency</li>
+ <li>Excel export for structured research use</li>
+ <li>Optional ethnic/location/language inference using isolate names</li>
+ <li>Quality check (e.g., fallback on short explanations, low token count)</li>
+ <li>Report Button – After results are displayed, users can submit errors or mismatches using the report text box below the output table</li>
  </ul>
 
+ <h2>Citation</h2>
+ <div class="highlight">
+ Phung, V. (2025). mtDNA Location Classifier. HuggingFace Spaces. https://huggingface.co/spaces/VyLala/mtDNALocation
+ </div>
+
+ <h2>Contact</h2>
+ <p>If you are a researcher working with historical mtDNA data or edge-case accessions and need scalable inference or logging, reach out through the HuggingFace space or email provided in the repo README.</p>
+
+ </div>
  </body>
  </html>
+
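
Note on the "Output Format" list in the explainer: it names seven columns but does not show how a row is assembled. The following is a minimal sketch of one possible shape, not the tool's actual code. The helper names `build_row` and `rows_to_csv`, the `"; "` separator for sources, and the seconds formatting for `Time Cost` are all assumptions, and CSV is used here only as a dependency-free stand-in for the Excel export that `save_to_excel` performs.

```python
import csv
import io

# Column names copied from the "Output Format" list in the explainer.
COLUMNS = [
    "Sample ID",
    "Predicted Country",
    "Country Explanation",
    "Predicted Sample Type",
    "Sample Type Explanation",
    "Sources",
    "Time Cost",
]


def build_row(sample_id, country, country_why, sample_type, type_why, sources, seconds):
    """Assemble one result row keyed by the documented column names (hypothetical helper)."""
    return {
        "Sample ID": sample_id,
        "Predicted Country": country,
        "Country Explanation": country_why,
        "Predicted Sample Type": sample_type,
        "Sample Type Explanation": type_why,
        # Assumption: links are joined with "; "; the explainer only says "links to articles".
        "Sources": "; ".join(sources),
        # Assumption: time cost is reported in seconds; the explainer gives no unit.
        "Time Cost": f"{seconds:.1f}s",
    }


def rows_to_csv(rows):
    """Serialize rows to CSV text (stand-in for the Excel export; hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values for illustration only, not real pipeline output.
example = build_row(
    "SAMPLE_001",
    "unknown",
    "no country field found",
    "modern",
    "no ancient-DNA keywords found",
    ["https://example.org/article"],
    12.34,
)
csv_text = rows_to_csv([example])
```

Keeping the column list in one constant means the row builder and the writer cannot drift apart: a row with a key outside `COLUMNS` makes `csv.DictWriter` raise `ValueError` instead of silently dropping the field.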