Update README.md
README.md
CHANGED
@@ -7,17 +7,12 @@ license: apache-2.0
 ---
 
 
-
-## ✨ Latest News
-
-- [11/06/2024]: Our paper is available on arXiv. You can access it [here](https://arxiv.org/abs/2411.02959).
-- [11/05/2024]: The open-source toolkit and models are released. You can apply HtmlRAG in your own RAG systems now.
-
-
 ## Model Information
 
+We release the HTML pruner model used in **HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems**.
+
 <p align="left">
-
+Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">Github</a>
 </p>
 
 We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose **Lossless HTML Cleaning** and **Two-Step Block-Tree-Based HTML Pruning**.