Simon Clematide committed on
Commit 5f39ead · 1 Parent(s): d208191

Update README.md to include OCR Quality Assessment and Bloom Filter integration details

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -1,3 +1,31 @@
  ---
  license: gpl-3.0
  ---
+
+ # OCR Quality Assessment using Unigram Language Model
+
+ This HuggingFace model repository contains a unigram language model built for OCR quality assessment.
+
+ ## Model & Bloom Filter Integration
+
+ The build process creates Bloom filter dictionaries with the following metadata:
+
+ - **Version:** A specific version identifier (e.g. v1.0.0)
+ - **Language:** The target language (e.g. en)
+ - **Model Name:** A short identifier (e.g. wp for Wikipedia)
+ - **False Positive Probability:** The target FP probability (e.g. 0.001)
+
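For context on the False Positive Probability field: under the textbook Bloom filter model, the target probability p and the number of stored items n determine the filter size m = -n ln(p) / (ln 2)^2 bits and the hash count k = (m / n) ln 2. The sketch below only illustrates that relationship. It is not taken from this repository's build process, and the function name `bloom_parameters` and the item count in the example are assumptions.

```python
import math
from typing import Tuple


def bloom_parameters(n_items: int, fp_probability: float) -> Tuple[int, int]:
    """Return (bits, hash_count) for a Bloom filter under the textbook model.

    Hypothetical helper for illustration; the actual build tooling may size
    its filters differently.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < fp_probability < 1.0:
        raise ValueError("fp_probability must lie strictly between 0 and 1")

    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    bits = math.ceil(-n_items * math.log(fp_probability) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer, at least one hash
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_count


if __name__ == "__main__":
    # Assumed example: one million unigrams at the README's example FP rate.
    bits, hashes = bloom_parameters(1_000_000, 0.001)
    print(f"{bits} bits ({bits / 8 / 1024 / 1024:.2f} MiB), {hashes} hash functions")
```

With the README's example probability of 0.001, this works out to about 14.4 bits per stored item and 10 hash functions, independent of the language or model name.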
+ The Bloom filter dictionaries are first generated in a designated build directory (`BUILD_DIR`) and then copied into this repository in a _flat hierarchy_: all built Bloom filter files reside in a single directory (e.g. `/bloom`) with no nested subfolders.
+
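The flat layout described above can be sketched as a copy step that discards any folder structure under the build directory. This is a minimal illustration, not the repository's actual tooling: the function name `copy_flat` and the `*.bloom` file pattern are assumptions, since the diff does not state how the built files are named.

```python
import shutil
from pathlib import Path
from typing import List


def copy_flat(build_dir: str, target_dir: str, pattern: str = "*.bloom") -> List[Path]:
    """Copy every file matching `pattern` under build_dir into target_dir.

    Subfolders of build_dir are not recreated, so target_dir stays flat.
    The default pattern is a hypothetical file extension.
    """
    source_root = Path(build_dir)
    destination_root = Path(target_dir)
    destination_root.mkdir(parents=True, exist_ok=True)

    seen_names = set()
    copied = []
    # sorted() collects all matches before anything is copied.
    for source in sorted(source_root.rglob(pattern)):
        if not source.is_file():
            continue
        # Two build files with the same name would overwrite each other
        # in a flat directory, so treat that as an error.
        if source.name in seen_names:
            raise ValueError(f"duplicate file name in build directory: {source.name}")
        seen_names.add(source.name)
        destination = destination_root / source.name
        shutil.copy2(source, destination)  # replaces an older copy if present
        copied.append(destination)
    return copied
```

Failing on duplicate names is a design choice for this sketch: a flat directory has no way to keep two files with the same name apart.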
+ ## Deployment Workflow
+
+ The Makefile provides the following targets:
+
+ - **copy-bloom:** Copies the built Bloom filter file to `bloom/`.
+ - **commit-bloom:** Stages and commits the update with a descriptive commit message.
+ - **push-bloom:** Pushes the commit to the remote repository.
+ - **deploy-bloom:** Runs the above steps as a single deployment command.
+
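The Makefile recipes themselves are not shown in this diff. As a rough sketch of what the four targets amount to, the steps can be written as a sequence of shell commands. The function names, the example file path, and the commit message below are hypothetical, and the real recipes may differ.

```python
import subprocess
from typing import List


def deploy_commands(bloom_file: str, message: str, target_dir: str = "bloom") -> List[List[str]]:
    """Return the commands corresponding to copy, commit, and push, in order."""
    return [
        ["cp", bloom_file, f"{target_dir}/"],   # copy-bloom
        ["git", "add", target_dir],             # commit-bloom: stage
        ["git", "commit", "-m", message],       # commit-bloom: commit
        ["git", "push"],                        # push-bloom
    ]


def deploy(bloom_file: str, message: str, dry_run: bool = True) -> None:
    """Sketch of deploy-bloom: run every step, stopping at the first failure."""
    for command in deploy_commands(bloom_file, message):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file name and message; dry_run only prints the commands.
    deploy("build/example.bloom", "Update bloom filter", dry_run=True)
```

Keeping `dry_run=True` as the default means nothing is copied, committed, or pushed unless the caller opts in explicitly.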
+ This keeps the workflow modular: build artifacts created in `BUILD_DIR` are incorporated into the HuggingFace model repository with a single deployment command.
+
+ # ...existing model usage and evaluation instructions...