document_redaction / Dockerfile

Commit History

Adapted Dockerfile for systems with a read-only file system. Minor package updates.
a7566b9

seanpedrickcase committed on

Fixed issue in Docker containers built locally without correct folder permissions. Improved config file. Updated Gradio version to fix issue with selecting filtered rows. Minor bug fixes.
a33b955

seanpedrickcase committed on

Modified Dockerfile for correct logging folder ownership
0b9e789

seanpedrickcase committed on

Implemented Textract document API calls and associated output tracking/download. Fixes to config and cost code implementation. General minor bug fixes.
ed5f8c7

seanpedrickcase committed on

Updated Dockerfile to remove references to NLTK, as it was removed from requirements
208e806

seanpedrickcase committed on

Allowed for output files to be saved into user-specific folders. Added deny list capability to xlsx/csv file redaction
dacc782

seanpedrickcase committed on

Allowed for Textract and Comprehend API calls through AWS keys. File preparation function incorporated into main redaction function so that the user no longer needs to 'check in' during the redaction process
391712c

seanpedrickcase committed on

Added git to the correct area in Dockerfile (build as opposed to run area)
520f2c4

seanpedrickcase committed on

Added git to Dockerfile to be able to install git-based custom gradio components
4790eb4

seanpedrickcase committed on

Added tab to be able to compare pages across multiple documents and redact duplicates
a265560

seanpedrickcase committed on

Enhanced file handling and UI features: improved Gradio app layout with fill width option, and integrated new settings for deny and fully redacted lists (placeholders so far). Updated file conversion functions to handle CSV inputs and added CSV review file generation for redactions. Now retains all original and merged redaction boxes.
a770956

seanpedrickcase committed on

Can now define queue size, max file size, and server port in environment variables
dc17f6e

seanpedrickcase committed on

Updated Dockerfile and entrypoint file to hopefully deal correctly with APP_MODE environment variable
7c7fd7c

seanpedrickcase committed on

Moved chmod command to before user switch in Dockerfile
05c20d6

seanpedrickcase committed on

Ensure entrypoint.sh is copied
3dc1171

seanpedrickcase committed on

Modified Dockerfile, hopefully so that it no longer needs Lambda overrides. Looking into custom headers from CloudFront to try to get them to work
bf7bb79

seanpedrickcase committed on

Created custom csvlogger to try to overcome AWS Lambda's incompatibility with multithread locks
34bd97b

seanpedrickcase committed on

Changed app_mode arg position in Dockerfile, changed default to gradio
d0b63c6

seanpedrickcase committed on

Moved entrypoint.sh creation to before user switch to avoid permission errors
7e8c1c9

seanpedrickcase committed on

Updated Dockerfile and requirements to include relevant Lambda packages
3f9e976

seanpedrickcase committed on

Switched the start .py file in the Dockerfile to lambda_entrypoint. Added Gradio links from this .py file
6622361

seanpedrickcase committed on

Some more debugging. Added aws-lambda-adapter just in case that's useful in AWS Lambda
a3ba5e2

seanpedrickcase committed on

Added option for running redact function through CLI (i.e. not going through Gradio UI or API). Test functions for running this through AWS Lambda.
e5dfae7

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Added TLDExtract cache files so that internet connection is not required
dce6100

seanpedrickcase committed on

Page conversion now makes page-by-page calls, hopefully avoiding FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds in case that helps too.
43287c3

seanpedrickcase committed on

Correctly spelled --no-cache-dir this time
452d304

seanpedrickcase committed on

Unspecifying gradio and spacy in requirements, then reinstalling latest gradio afterwards in Dockerfile. All to try to avoid typer conflict
619a281

seanpedrickcase committed on

Created output folder specifically in Dockerfile
d32c12a

seanpedrickcase committed on

Specify GRADIO_SERVER_NAME variable in Dockerfile as 0.0.0.0
85a7cbf

seanpedrickcase committed on

Modified Dockerfile to run with user 1000. Changed port to standard 7860 and removed server name specification.
71761cb

seanpedrickcase committed on

Added opencv installation to Dockerfile and reverted to slim-bookworm
bffbd2b

seanpedrickcase committed on

Changed base python distribution to (hopefully) have access to tesseract-ocr package
5f91219

seanpedrickcase committed on

Added -y to tesseract-ocr installation in Dockerfile
b723aad

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on