Transformers
ali6parmak committed · verified
Commit 26f2535 · Parent(s): 808ef5b

Update README.md

Files changed (1): README.md (+29 −1)

README.md CHANGED
@@ -68,6 +68,7 @@ To stop the server:
 - [Models](#models)
 - [Data](#data)
 - [Usage](#usage)
+- [Benchmark](#benchmark)
 
 ## Dependencies
 * Docker Desktop 4.25.0 [install link](https://www.docker.com/products/docker-desktop/)
@@ -79,7 +80,7 @@ To stop the server:
 
 ## Models
 
-There are two kinds of models in the project. The default model is a visual model which has been trained by
+There are two kinds of models in the project. The default model is a visual model (specifically, the Vision Grid Transformer, or VGT) which has been trained by
 Alibaba Research Group. If you would like to take a look at their original project, you can visit
 [this](https://github.com/AlibabaResearch/AdvancedLiterateMachinery) link. There are various models published by them
 and according to our benchmarks the best performing model is the one trained with the [DocLayNet](https://github.com/DS4SD/DocLayNet)
@@ -161,3 +162,30 @@ excluding "footers" and "footnotes," which are positioned at the end of the outp
 Occasionally, we encounter segments like pictures that might not contain text. Since Poppler cannot assign a reading order to these non-text segments,
 we process them after sorting all segments with content. To determine their reading order, we rely on the reading order of the nearest "non-empty" segment,
 using distance as a criterion.
+
+
+## Benchmark
+
+These are the benchmark results for the VGT model on the PubLayNet dataset:
+
+<table>
+<tr>
+<th>Overall</th>
+<th>Text</th>
+<th>Title</th>
+<th>List</th>
+<th>Table</th>
+<th>Figure</th>
+</tr>
+<tr>
+<td>0.962</td>
+<td>0.950</td>
+<td>0.939</td>
+<td>0.968</td>
+<td>0.981</td>
+<td>0.971</td>
+</tr>
+</table>
+
+You can check this link to see the comparison with the other models:
+https://paperswithcode.com/sota/document-layout-analysis-on-publaynet-val
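
The nearest-segment heuristic described in the README hunk above (non-text segments such as pictures inherit a reading order from the closest segment that has content) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `Segment` class, the use of bounding-box centers, and the tie-breaking offset are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    x: float                  # center x of the segment's bounding box (assumed)
    y: float                  # center y of the segment's bounding box (assumed)
    text: str
    reading_order: float = 0.0

def assign_order_to_empty_segments(segments):
    """Sketch of the heuristic: segments without text get a reading order
    derived from the nearest "non-empty" segment, using Euclidean distance
    between bounding-box centers as the criterion."""
    non_empty = [s for s in segments if s.text.strip()]
    empty = [s for s in segments if not s.text.strip()]
    for seg in empty:
        nearest = min(non_empty,
                      key=lambda s: math.hypot(s.x - seg.x, s.y - seg.y))
        # Slot the empty segment just after its nearest neighbour; the real
        # project's exact placement and tie-breaking are not specified here.
        seg.reading_order = nearest.reading_order + 0.5
    return sorted(segments, key=lambda s: s.reading_order)
```

For example, a picture segment located near a paragraph with reading order 1 would be placed immediately after that paragraph, regardless of where later text segments fall on the page.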