cdlib/marc-match-ai

pytorch_model_hub_mixin

model_hub_mixin

entity-matching


RvanB committed on May 2, 2024

Commit e928541 · 1 Parent(s): 4cd9732

Update readme

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -22,12 +22,12 @@ Try out our [interactive demo](https://huggingface.co/spaces/cdlib/marc-match-ai
 - Adjustable Matching Threshold: Allows tuning the balance between false positives and false negatives based on specific use cases.
 ## Performance
-This model achieves 98.46% accuracy on our validation set (see our [dataset](https://github.com/cdlib/marc-ai)), and had comparable accuracy with SCSB, Goldrush, and OCLC matching (with and without merging with the WorldCat API). Each matching algorithm was run on a common set of English monographs to produce a union set of all of the algorithms' matches, and a matching threshold of 0.99 was chosen for our model to minimize false positives.  Disagreements between the algorithms were manually reviewed, resulting in false positives and false negatives for those disagreements:
+This model achieves 98.46% accuracy on our validation set (see our [GitHub repository](https://github.com/cdlib/marc-ai) for datasets).
+It has also had comparable accuracy with SCSB, and Goldrush on a separate set of English monographs. Each matching algorithm was run on a common set to produce a union set of all of the algorithms' matches. Using a matching threshold of 0.99 to minimize false positives, we were able to compare the algorithms' matches. Disagreements between the algorithms were manually reviewed, resulting in false positives and false negatives for those disagreements:
 | Algorithm       | % False Positives | % False Negatives |
 |-----------------|-------------------|-------------------|
 | Goldrush        | 0.30%             | 4.79%             |
 | SCSB            | 0.52%             | 0.40%             |
 | __Our Model__   | __0.23%__         | __1.95%__         |
-| OCLC            | 0.05%             | 2.73%             |
-| OCLC Reconciled | 0.10%             | 1.23%             |
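The threshold trade-off this README describes (a 0.99 matching threshold chosen to minimize false positives) can be sketched as follows. This is a hypothetical illustration, not code from the marc-match-ai or marc-ai repositories. It assumes the model emits a match score between 0 and 1 for each pair of records, treats a score at or above the threshold as a predicted match, and computes each error rate as a percentage of all reviewed pairs; the README does not state which denominator its table uses. `ScoredPair` and `error_rates` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPair:
    """Hypothetical record pair: an assumed model score plus a manually reviewed label."""
    score: float    # assumed match score in [0, 1]
    is_match: bool  # ground truth from manual review


def error_rates(pairs: list[ScoredPair], threshold: float = 0.99) -> tuple[float, float]:
    """Return (% false positives, % false negatives) over all pairs.

    Assumption: a pair is predicted to match when score >= threshold,
    and both percentages use the total number of pairs as the denominator.
    """
    if not pairs:
        raise ValueError("need at least one scored pair")
    false_pos = sum(1 for p in pairs if p.score >= threshold and not p.is_match)
    false_neg = sum(1 for p in pairs if p.score < threshold and p.is_match)
    return 100 * false_pos / len(pairs), 100 * false_neg / len(pairs)


# Made-up toy data, only to show the direction of the trade-off.
pairs = [
    ScoredPair(0.995, True),   # true match with a high score
    ScoredPair(0.992, False),  # non-match the model scores highly
    ScoredPair(0.970, True),   # true match just under 0.99
    ScoredPair(0.400, False),  # clear non-match
]

for t in (0.90, 0.99, 0.994):
    print(t, error_rates(pairs, t))
# 0.9 (25.0, 0.0)
# 0.99 (25.0, 25.0)
# 0.994 (0.0, 25.0)
```

Raising the threshold can only turn predicted matches into predicted non-matches, so false positives can only fall and false negatives can only rise. That is why a high threshold such as 0.99 favors avoiding false positives.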