Update README.md
README.md
@@ -68,6 +68,7 @@ To stop the server:
 - [Models](#models)
 - [Data](#data)
 - [Usage](#usage)
+- [Benchmark](#benchmark)
 
 ## Dependencies
 * Docker Desktop 4.25.0 [install link](https://www.docker.com/products/docker-desktop/)
@@ -79,7 +80,7 @@ To stop the server:
 
 ## Models
 
-There are two kinds of models in the project. The default model is a visual model which has been trained by
+There are two kinds of models in the project. The default model is a visual model (specifically, the Vision Grid Transformer, or VGT) which has been trained by
 Alibaba Research Group. If you would like to take a look at their original project, you can visit
 [this](https://github.com/AlibabaResearch/AdvancedLiterateMachinery) link. There are various models published by them
 and according to our benchmarks the best performing model is the one trained with the [DocLayNet](https://github.com/DS4SD/DocLayNet)
@@ -161,3 +162,30 @@ excluding "footers" and "footnotes," which are positioned at the end of the output
 Occasionally, we encounter segments like pictures that might not contain text. Since Poppler cannot assign a reading order to these non-text segments,
 we process them after sorting all segments with content. To determine their reading order, we rely on the reading order of the nearest "non-empty" segment,
 using distance as a criterion.
+
+
+## Benchmark
+
+These are the benchmark results for the VGT model on the PubLayNet dataset:
+
+<table>
+<tr>
+<th>Overall</th>
+<th>Text</th>
+<th>Title</th>
+<th>List</th>
+<th>Table</th>
+<th>Figure</th>
+</tr>
+<tr>
+<td>0.962</td>
+<td>0.950</td>
+<td>0.939</td>
+<td>0.968</td>
+<td>0.981</td>
+<td>0.971</td>
+</tr>
+</table>
+
+You can check this link to see the comparison with other models:
+https://paperswithcode.com/sota/document-layout-analysis-on-publaynet-val