document_redaction/tools/aws_textract.py

Commit History

Fixed issue in Docker containers built locally without correct folder permissions. Improved config file. Updated Gradio version to fix issue with selecting filtered rows. Minor bug fixes.
a33b955

seanpedrickcase committed on

Implemented Textract document API calls and associated output tracking/download. Fixes to config and cost code implementation. General minor bug fixes.
ed5f8c7

seanpedrickcase committed on

Major update. General code revision. Improved config variables. Dataframe based review frame now includes text, items can be searched and excluded. Costs now estimated. Option for adding cost codes added. Option to extract text only.
0ea8b9e

seanpedrickcase committed on

More config options. Fixed some bugs with removing elements from review page and Adobe export. Some UI rearrangements
6319afc

seanpedrickcase committed on

Added features to review dataframe to filter and exclude features based on text. Text should now appear consistently in review_df (for boxes not modified). Larger spacy model returned to use. Gradio upgrade.
66e145d

seanpedrickcase committed on

Laid groundwork for passing in AWS API keys. Duplicate pages option should now work for pages with no text.
7907ad4

seanpedrickcase committed on

Fix bug to identify all handwriting labels. Now only concatenates entity_type boxes if they have different labels.
0d3554e

seanpedrickcase committed on

Updated packages. Reinstituted multithreading with page load, now with order protected. Smaller spacy model used for speed. Textract calls should now be faster
f0c28d7

seanpedrickcase committed on

Started adding in support for custom deny list. Fixed textract call issue. Removed multithreading for now as it mixes up pages
e3365ed

seanpedrickcase committed on

Multithreaded file preparation. Can call Textract without signature detection
9504619

seanpedrickcase committed on

Only shows AWS options when AWS functions enabled. Can now upload previous review files to continue review later. Some review debugging.
e2aae24

seanpedrickcase committed on

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version
eea5c07

seanpedrickcase committed on

Upgraded packages. Fixed some issues with review process. Better progress reporting for user.
5b4b5fb

seanpedrickcase committed on

General improvement in quick image matching and merging
84c83c0

seanpedrickcase committed on

Optimised Textract and Tesseract workings
8652429

seanpedrickcase committed on

Improved allow list, handwriting/signature identification, logging
6ea0852

seanpedrickcase committed on

Added AWS Textract support. Allowed for OCR logs export.
e9c4101

seanpedrickcase committed on