Simon Clematide committed
Commit 5f39ead · 1 Parent(s): d208191
Update README.md to include OCR Quality Assessment and Bloom Filter integration details
README.md
CHANGED
@@ -1,3 +1,31 @@
---
license: gpl-3.0
---

# OCR Quality Assessment using Unigram Language Model

This HuggingFace model repository contains a unigram language model built for OCR quality assessment.

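
The README does not spell out how the model scores a text. A minimal sketch, assuming the score is simply the share of tokens found in a known-word dictionary, might look as follows. A plain Python `set` stands in for the Bloom filter, and `tokenize` and `ocr_quality` are hypothetical names, not part of this repository.

```python
import re
from typing import Callable

# Assumption: tokens are runs of word characters, compared in lowercase.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def ocr_quality(text: str, is_known: Callable[[str], bool]) -> float:
    """Return the fraction of tokens that the dictionary recognizes."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # no tokens, so no evidence of readable text
    known = sum(1 for token in tokens if is_known(token))
    return known / len(tokens)


# Stand-in vocabulary; a real setup would query a Bloom filter instead.
vocabulary = {"the", "quick", "brown", "fox"}

clean = ocr_quality("The quick brown fox", vocabulary.__contains__)
noisy = ocr_quality("Tbe qu1ck brown f0x", vocabulary.__contains__)
print(clean, noisy)
```

Passing the membership test as a callable keeps the scoring logic independent of whichever Bloom filter library performs the lookup.
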
## Model & Bloom Filter Integration

The build process creates Bloom filter dictionaries with the following metadata:

- **Version:** A specific version identifier (e.g. `v1.0.0`)
- **Language:** The target language (e.g. `en`)
- **Model Name:** A short identifier (e.g. `wp` for Wikipedia)
- **False Positive Probability:** The target FP probability (e.g. `0.001`)

The Bloom filter dictionaries are first generated in a designated build directory (`BUILD_DIR`). They are then copied into this repository in a _flat hierarchy_: all built Bloom filter files reside in a single directory (e.g. `bloom/`) with no nested subfolders.

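
To make the false positive probability concrete, the sketch below applies the standard Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n entries and target probability p. The function names and the one-million-entry vocabulary are assumptions for illustration; the actual build may size its filters differently.

```python
import math


def bloom_parameters(n_items: int, fp_prob: float) -> tuple[int, int]:
    """Return (bits, hash_functions) for the given capacity and FP target."""
    if n_items <= 0 or not 0.0 < fp_prob < 1.0:
        raise ValueError("n_items must be positive and fp_prob in (0, 1)")
    # Optimal number of bits for n items at false positive probability p.
    bits = math.ceil(-n_items * math.log(fp_prob) / (math.log(2) ** 2))
    # Optimal number of hash functions for that bit array size.
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


def expected_fp_rate(bits: int, hashes: int, n_items: int) -> float:
    """Approximate false positive rate once n_items have been inserted."""
    return (1.0 - math.exp(-hashes * n_items / bits)) ** hashes


# Hypothetical vocabulary size, combined with the README's example target.
n = 1_000_000
m, k = bloom_parameters(n, 0.001)
print(f"{m / n:.1f} bits per entry, {k} hash functions")
print(f"expected FP rate: {expected_fp_rate(m, k, n):.5f}")
```

With p = 0.001 this works out to roughly 14.4 bits per entry and 10 hash functions, regardless of vocabulary size.
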
## Deployment Workflow

The Makefile provides the following targets:

- **copy-bloom:** Copies the built Bloom filter file to `bloom/`.
- **commit-bloom:** Stages and commits the update with a descriptive commit message.
- **push-bloom:** Pushes the commit to the remote repository.
- **deploy-bloom:** Runs the three steps above in a single command.

This keeps the workflow modular: build artifacts created in `BUILD_DIR` can be moved into the HuggingFace model repository with a single command.

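
The Makefile recipes themselves are not shown here. As a rough sketch of what the targets plausibly do, the function below lists the shell commands in order without running them. The file name `example.bloom`, the commit message wording, and the function name `deploy_bloom_plan` are hypothetical, and the commands are assumed to run from the repository root.

```python
from pathlib import Path


def deploy_bloom_plan(build_dir: str, bloom_file: str) -> list[list[str]]:
    """Return the commands a deploy-bloom style workflow might run, in order."""
    src = (Path(build_dir) / bloom_file).as_posix()
    dest = (Path("bloom") / bloom_file).as_posix()  # flat layout, no subfolders
    message = f"Update bloom filter {bloom_file}"   # hypothetical wording
    return [
        ["cp", src, dest],                  # copy-bloom
        ["git", "add", dest],               # commit-bloom: stage the file
        ["git", "commit", "-m", message],   # commit-bloom: commit it
        ["git", "push"],                    # push-bloom
    ]


plan = deploy_bloom_plan("build", "example.bloom")
for command in plan:
    print(" ".join(command))
```

Returning the commands as data rather than executing them makes the sequence easy to inspect or dry-run before anything is pushed.
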
# ...existing model usage and evaluation instructions...