Update README.md
README.md
CHANGED
@@ -40,20 +40,15 @@ Clone the service:
|
|
40 |
|
41 |
Start the service:
|
42 |
|
43 |
-
# With GPU support:
|
44 |
make start
|
45 |
|
46 |
-
# Without GPU support [if you do not have a GPU on your system]
|
47 |
-
make start_no_gpu
|
48 |
-
|
49 |
-
|
50 |
Get the segments of a PDF:
|
51 |
|
52 |
# With visual models
|
53 |
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
|
54 |
|
55 |
# With non-visual models [with the models in this model card]
|
56 |
-
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
|
57 |
|
58 |
|
59 |
To stop the server:
|
@@ -104,11 +99,11 @@ Also for training the LightGBM models, we again used this dataset. There are 11
|
|
104 |
1: "Caption"
|
105 |
2: "Footnote"
|
106 |
3: "Formula"
|
107 |
-
4: "
|
108 |
-
5: "
|
109 |
-
6: "
|
110 |
7: "Picture"
|
111 |
-
8: "
|
112 |
9: "Table"
|
113 |
10: "Text"
|
114 |
11: "Title"
|
@@ -126,7 +121,7 @@ As we mentioned at the [Quick Start](#quick-start), you can use the service simp
|
|
126 |
This command runs the visual model, so expect it to use a lot of resources. If you
|
127 |
want to use the non-visual models, which are the LightGBM models, use this command instead:
|
128 |
|
129 |
-
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
|
130 |
|
131 |
The shape of the response will be the same in both of these commands.
|
132 |
|
@@ -139,6 +134,8 @@ When the process is done, the output will include a list of SegmentBox elements
|
|
139 |
"width": Width of the segment
|
140 |
"height": Height of the segment
|
141 |
"page_number": Page number which the segment belongs to
|
|
|
|
|
142 |
"text": Text inside the segment
|
143 |
"type": Type of the segment (one of the categories mentioned above)
|
144 |
}
|
|
|
40 |
|
41 |
Start the service:
|
42 |
|
|
|
43 |
make start
|
44 |
|
|
|
|
|
|
|
|
|
45 |
Get the segments of a PDF:
|
46 |
|
47 |
# With visual models
|
48 |
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060
|
49 |
|
50 |
# With non-visual models [with the models in this model card]
|
51 |
+
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060
|
52 |
|
53 |
|
54 |
To stop the server:
|
|
|
99 |
1: "Caption"
|
100 |
2: "Footnote"
|
101 |
3: "Formula"
|
102 |
+
4: "List item"
|
103 |
+
5: "Page footer"
|
104 |
+
6: "Page header"
|
105 |
7: "Picture"
|
106 |
+
8: "Section header"
|
107 |
9: "Table"
|
108 |
10: "Text"
|
109 |
11: "Title"
|
|
|
121 |
This command runs the visual model, so expect it to use a lot of resources. If you
|
122 |
want to use the non-visual models, which are the LightGBM models, use this command instead:
|
123 |
|
124 |
+
curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' -F "fast=true" localhost:5060
|
125 |
|
126 |
The shape of the response will be the same in both of these commands.
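
The two requests differ only in the extra `fast=true` form field. A minimal Python sketch of that difference, assuming the service listens on `localhost:5060` as in the commands above; the helper name `build_curl_command` is hypothetical, and it only assembles the command line without running it:

```python
import shlex


def build_curl_command(pdf_path: str, fast: bool = False, host: str = "localhost:5060") -> list[str]:
    """Assemble the curl argument list for the segmentation request.

    Hypothetical helper: it only mirrors the two commands shown above.
    """
    command = ["curl", "-X", "POST", "-F", f"file=@{pdf_path}"]
    if fast:
        # The extra form field selects the non-visual (LightGBM) models.
        command += ["-F", "fast=true"]
    command.append(host)
    return command


if __name__ == "__main__":
    # Print the command as a shell-quoted string for copy and paste.
    print(shlex.join(build_curl_command("/PATH/TO/PDF/pdf_name.pdf", fast=True)))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.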
|
127 |
|
|
|
134 |
"width": Width of the segment
|
135 |
"height": Height of the segment
|
136 |
"page_number": Page number which the segment belongs to
|
137 |
+
"page_width": Width of the page which the segment belongs to
|
138 |
+
"page_height": Height of the page which the segment belongs to
|
139 |
"text": Text inside the segment
|
140 |
"type": Type of the segment (one of the categories mentioned above)
|
141 |
}
|
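
The added `page_width` and `page_height` fields make it possible to express a segment's size relative to its page. A minimal sketch, assuming the response has already been parsed into dictionaries with the keys listed above; the function name `relative_size` and the sample values are made up for illustration:

```python
def relative_size(segment: dict) -> tuple[float, float]:
    """Return (width, height) of a segment as fractions of its page size."""
    return (
        segment["width"] / segment["page_width"],
        segment["height"] / segment["page_height"],
    )


if __name__ == "__main__":
    # Made-up sample segment; real values come from the service response.
    sample = {
        "width": 306, "height": 198, "page_number": 1,
        "page_width": 612, "page_height": 792,
        "text": "Example", "type": "Text",
    }
    print(relative_size(sample))
```

Fractions like these stay comparable across pages of different sizes, which the absolute `width` and `height` values alone do not.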