Commit 9f51e70
Parent(s): 10da194

Updated CDK code for custom KMS keys, new VPCs. Minor package updates.

Files changed:
- .dockerignore +1 -0
- .github/workflows/check_file_size.yml +0 -16
- .gitignore +1 -0
- README.md +1 -1
- cdk/cdk_config.py +10 -2
- cdk/cdk_stack.py +271 -179
- cdk/check_resources.py +134 -133
- cdk/post_cdk_build_quickstart.py +2 -2
- cdk/requirements.txt +3 -3
- index.qmd +1 -1
- pyproject.toml +5 -5
- requirements.txt +4 -4
- src/installation_guide.qmd +106 -84
.dockerignore CHANGED
@@ -13,6 +13,7 @@ tesseract/*
 poppler/*
 build/*
 dist/*
+docs/*
 build_deps/*
 user_guide/*
 cdk/config/*
.github/workflows/check_file_size.yml DELETED
@@ -1,16 +0,0 @@
-name: Check file size
-on:  # or directly `on: [push]` to run the action on every push on any branch
-  pull_request:
-    branches: [main]
-
-  # to run this workflow manually from the Actions tab
-  workflow_dispatch:
-
-jobs:
-  sync-to-hub:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Check large files
-        uses: ActionsDesk/[email protected]
-        with:
-          filesizelimit: 10485760 # this is 10MB so we can sync to HF Spaces
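The deleted workflow capped files at 10485760 bytes (10 MB) so the repository could sync to Hugging Face Spaces. As a rough local stand-in for that check (a sketch only, not part of the repo; the function name and structure are illustrative), one could scan for oversized files before pushing:

```python
import os

# 10 MB, matching the deleted workflow's `filesizelimit` value
SIZE_LIMIT_BYTES = 10_485_760

def files_over_limit(root: str, limit: int = SIZE_LIMIT_BYTES) -> list:
    """Walk `root` and return sorted paths of files larger than `limit` bytes."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                oversized.append(path)
    return sorted(oversized)
```

Running this over the working tree before a push would catch anything the old GitHub Action would have flagged.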
.gitignore CHANGED
@@ -25,6 +25,7 @@ cdk/cdk.out/*
 cdk/archive/*
 tld/*
 tmp/*
+docs/*
 cdk.out/*
 cdk.json
 cdk.context.json
README.md CHANGED
@@ -10,7 +10,7 @@ license: agpl-3.0
 ---
 # Document redaction
 
-version: 0.7.
+version: 0.7.1
 
 Redact personally identifiable information (PII) from documents (pdf, images), open text, or tabular data (xlsx/csv/parquet). Please see the [User Guide](#user-guide) for a walkthrough on how to use the app. Below is a very brief overview.
 
cdk/cdk_config.py CHANGED
@@ -82,8 +82,12 @@ CONTEXT_FILE = get_or_create_env_var('CONTEXT_FILE', 'cdk.context.json') # Defin
 CDK_FOLDER = get_or_create_env_var('CDK_FOLDER', '') # FULL_PATH_TO_CDK_FOLDER_HERE (with forward slash)
 RUN_USEAST_STACK = get_or_create_env_var('RUN_USEAST_STACK', 'False')
 
-### VPC
+### VPC and connections
 VPC_NAME = get_or_create_env_var('VPC_NAME', '')
+NEW_VPC_DEFAULT_NAME = get_or_create_env_var('NEW_VPC_DEFAULT_NAME', f'{CDK_PREFIX}vpc')
+NEW_VPC_CIDR = get_or_create_env_var('NEW_VPC_CIDR', '') # "10.0.0.0/24"
+
+
 EXISTING_IGW_ID = get_or_create_env_var('EXISTING_IGW_ID', '')
 SINGLE_NAT_GATEWAY_ID = get_or_create_env_var('SINGLE_NAT_GATEWAY_ID', '')
 
@@ -122,6 +126,10 @@ ECR_CDK_REPO_NAME = get_or_create_env_var('ECR_CDK_REPO_NAME', f"{CDK_PREFIX}{EC
 S3_LOG_CONFIG_BUCKET_NAME = get_or_create_env_var('S3_LOG_CONFIG_BUCKET_NAME', f"{CDK_PREFIX}s3-logs".lower()) # S3 bucket names need to be lower case
 S3_OUTPUT_BUCKET_NAME = get_or_create_env_var('S3_OUTPUT_BUCKET_NAME', f"{CDK_PREFIX}s3-output".lower())
 
+### KMS KEYS FOR S3 AND SECRETS MANAGER
+USE_CUSTOM_KMS_KEY = get_or_create_env_var('USE_CUSTOM_KMS_KEY', '1')
+CUSTOM_KMS_KEY_NAME = get_or_create_env_var('CUSTOM_KMS_KEY_NAME', f"alias/{CDK_PREFIX}kms-key".lower())
+
 ### ECS
 FARGATE_TASK_DEFINITION_NAME = get_or_create_env_var('FARGATE_TASK_DEFINITION_NAME', f"{CDK_PREFIX}FargateTaskDefinition")
 TASK_DEFINITION_FILE_LOCATION = get_or_create_env_var('TASK_DEFINITION_FILE_LOCATION', CDK_FOLDER + CONFIG_FOLDER + "task_definition.json")
@@ -160,7 +168,7 @@ CLOUDFRONT_DOMAIN = get_or_create_env_var('CLOUDFRONT_DOMAIN', "cloudfront_place
 
 
 # Certificate for Application load balancer (optional, for HTTPS and logins through the ALB)
-
+ACM_SSL_CERTIFICATE_ARN = get_or_create_env_var('ACM_SSL_CERTIFICATE_ARN', '')
 SSL_CERTIFICATE_DOMAIN = get_or_create_env_var('SSL_CERTIFICATE_DOMAIN', '') # e.g. example.com or www.example.com
 
 # This should be the CloudFront domain, the domain linked to your ACM certificate, or the DNS of your application load balancer in console afterwards
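Every setting in cdk_config.py is built through get_or_create_env_var, whose definition is not part of this diff. A minimal sketch of its likely behaviour (read the environment variable; if unset, register and return the default) is below — the helper body and the 'redact-' prefix default are assumptions for illustration, not code from the commit:

```python
import os

def get_or_create_env_var(var_name: str, default_value: str) -> str:
    # Read the environment variable; if it is unset, store and return the default.
    value = os.environ.get(var_name)
    if value is None:
        os.environ[var_name] = default_value
        value = default_value
    return value

# Mirrors the new KMS alias setting added in this commit
# (the CDK_PREFIX default here is purely illustrative):
CDK_PREFIX = get_or_create_env_var('CDK_PREFIX', 'redact-')
CUSTOM_KMS_KEY_NAME = get_or_create_env_var('CUSTOM_KMS_KEY_NAME',
                                            f"alias/{CDK_PREFIX}kms-key".lower())
```

This pattern lets a deployer override any value via the shell environment while keeping a self-documenting default in the config file.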
cdk/cdk_stack.py
CHANGED
@@ -21,11 +21,12 @@ from aws_cdk import (
|
|
21 |
aws_elasticloadbalancingv2 as elbv2,
|
22 |
aws_logs as logs,
|
23 |
aws_wafv2 as wafv2,
|
|
|
24 |
aws_dynamodb as dynamodb # Import the DynamoDB module
|
25 |
)
|
26 |
|
27 |
from constructs import Construct
|
28 |
-
from cdk_config import CDK_PREFIX, VPC_NAME, AWS_MANAGED_TASK_ROLES_LIST, GITHUB_REPO_USERNAME, GITHUB_REPO_NAME, GITHUB_REPO_BRANCH, ECS_TASK_MEMORY_SIZE, ECS_TASK_CPU_SIZE, CUSTOM_HEADER, CUSTOM_HEADER_VALUE, AWS_REGION, CLOUDFRONT_GEO_RESTRICTION, DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS, GRADIO_SERVER_PORT, PUBLIC_SUBNETS_TO_USE, PUBLIC_SUBNET_CIDR_BLOCKS, PUBLIC_SUBNET_AVAILABILITY_ZONES, PRIVATE_SUBNETS_TO_USE, PRIVATE_SUBNET_CIDR_BLOCKS, PRIVATE_SUBNET_AVAILABILITY_ZONES, CODEBUILD_PROJECT_NAME, ECS_SECURITY_GROUP_NAME, ALB_NAME_SECURITY_GROUP_NAME, ALB_NAME, COGNITO_USER_POOL_NAME, COGNITO_USER_POOL_CLIENT_NAME, COGNITO_USER_POOL_CLIENT_SECRET_NAME, FARGATE_TASK_DEFINITION_NAME, ECS_SERVICE_NAME, WEB_ACL_NAME, CLOUDFRONT_DISTRIBUTION_NAME, ECS_TASK_ROLE_NAME, ALB_TARGET_GROUP_NAME, S3_LOG_CONFIG_BUCKET_NAME, S3_OUTPUT_BUCKET_NAME,
|
29 |
from cdk_functions import create_subnets, create_web_acl_with_common_rules, add_custom_policies, add_alb_https_listener_with_cert, create_nat_gateway # Only keep CDK-native functions
|
30 |
|
31 |
def _get_env_list(env_var_name: str) -> List[str]:
|
@@ -67,20 +68,80 @@ class CdkStack(Stack):
|
|
67 |
print(f"Warning: Context key '{key}' not found or not a list. Returning empty list.")
|
68 |
return []
|
69 |
# Optional: Add validation that all items in the list are dicts
|
70 |
-
return ctx_value
|
|
|
|
|
71 |
|
72 |
|
73 |
# --- VPC and Subnets (Assuming VPC is always lookup, Subnets are created/returned by create_subnets) ---
|
74 |
-
|
75 |
-
|
76 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
77 |
self,
|
78 |
-
"
|
79 |
-
vpc_name=
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
80 |
)
|
81 |
-
print("Successfully
|
82 |
-
|
83 |
-
|
|
|
|
|
|
|
84 |
|
85 |
# --- Subnet Handling (Check Context and Create/Import) ---
|
86 |
# Initialize lists to hold ISubnet objects (L2) and CfnSubnet/CfnRouteTable (L1)
|
@@ -100,35 +161,68 @@ class CdkStack(Stack):
|
|
100 |
print("vpc.public_subnets:", vpc.public_subnets)
|
101 |
print("vpc.private_subnets:", vpc.private_subnets)
|
102 |
|
103 |
-
|
104 |
-
|
|
|
|
|
|
|
|
|
|
|
105 |
|
106 |
-
|
107 |
-
# for subnet in vpc.public_subnets:
|
108 |
-
# az = subnet.availability_zone
|
109 |
-
# if az not in public_subnets_by_az:
|
110 |
-
# public_subnets_by_az[az] = []
|
111 |
-
# public_subnets_by_az[az].append(subnet)
|
112 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
113 |
selected_public_subnets = vpc.select_subnets(subnet_type=ec2.SubnetType.PUBLIC, one_per_az=True)
|
114 |
private_subnets_egress = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, one_per_az=True)
|
115 |
-
private_subnets_isolated = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED, one_per_az=True)
|
116 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
117 |
combined_subnet_objects = []
|
118 |
|
119 |
-
if
|
120 |
-
|
121 |
-
|
|
|
|
|
|
|
|
|
|
|
122 |
else:
|
123 |
self.node.add_warning("No PRIVATE_WITH_EGRESS subnets found to select the first one.")
|
124 |
|
125 |
# Add all PRIVATE_ISOLATED subnets *except* the first one (if they exist)
|
126 |
-
|
127 |
-
|
128 |
-
|
129 |
-
|
130 |
-
|
131 |
-
|
|
|
|
|
|
|
132 |
|
133 |
# Create an ec2.SelectedSubnets object from the combined private subnet list.
|
134 |
selected_private_subnets = vpc.select_subnets(
|
@@ -139,29 +233,6 @@ class CdkStack(Stack):
|
|
139 |
print("selected_private_subnets:", selected_private_subnets)
|
140 |
|
141 |
|
142 |
-
#self.private_route_tables_cfn = []
|
143 |
-
|
144 |
-
# for subnet in vpc.private_subnets:
|
145 |
-
# az = subnet.availability_zone
|
146 |
-
# if az not in private_subnets_by_az:
|
147 |
-
# private_subnets_by_az[az] = []
|
148 |
-
# private_subnets_by_az[az].append(subnet)
|
149 |
-
|
150 |
-
#selected_public_subnets: List[ec2.ISubnet] = []
|
151 |
-
#selected_private_subnets: List[ec2.ISubnet] = []
|
152 |
-
|
153 |
-
# Select one public subnet per AZ, preferring the first one found
|
154 |
-
# for az in sorted(public_subnets_by_az.keys()):
|
155 |
-
# if public_subnets_by_az[az]:
|
156 |
-
# selected_public_subnets.append(public_subnets_by_az[az][0])
|
157 |
-
# print(f"Selected existing public subnet: {public_subnets_by_az[az][0].subnet_id} from AZ {az}.")
|
158 |
-
|
159 |
-
# Select one private subnet per AZ, preferring the first one found
|
160 |
-
# for az in sorted(private_subnets_by_az.keys()):
|
161 |
-
# if private_subnets_by_az[az]:
|
162 |
-
# selected_private_subnets.append(private_subnets_by_az[az][0])
|
163 |
-
# print(f"Selected existing private subnet: {private_subnets_by_az[az][0].subnet_id} from AZ {az}.")
|
164 |
-
|
165 |
if len(selected_public_subnets.subnet_ids) < 2 or len(selected_private_subnets.subnet_ids) < 2:
|
166 |
raise Exception("Need at least two public or private subnets in different availability zones")
|
167 |
|
@@ -222,11 +293,11 @@ class CdkStack(Stack):
|
|
222 |
self.public_subnets.extend(newly_created_public_subnets)
|
223 |
self.public_route_tables_cfn.extend(newly_created_public_rts_cfn)
|
224 |
|
225 |
-
if not self.public_subnets:
|
226 |
-
raise Exception("No public subnets found or created, exiting.")
|
227 |
-
|
228 |
|
229 |
# --- NAT Gateway Creation/Lookup ---
|
|
|
230 |
self.single_nat_gateway_id = None
|
231 |
|
232 |
nat_gw_id_from_context = SINGLE_NAT_GATEWAY_ID
|
@@ -234,9 +305,20 @@ class CdkStack(Stack):
|
|
234 |
if nat_gw_id_from_context:
|
235 |
print(f"Using existing NAT Gateway ID from context: {nat_gw_id_from_context}")
|
236 |
self.single_nat_gateway_id = nat_gw_id_from_context
|
237 |
-
|
238 |
-
|
239 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
240 |
print("NAT Gateway ID not found in context. Creating a new one.")
|
241 |
# Place the NAT GW in the first available public subnet
|
242 |
first_public_subnet = self.public_subnets[0]
|
@@ -248,7 +330,7 @@ class CdkStack(Stack):
|
|
248 |
nat_gateway_id_context_key=SINGLE_NAT_GATEWAY_ID
|
249 |
)
|
250 |
else:
|
251 |
-
print("WARNING: No public subnets available. Cannot create a NAT Gateway.")
|
252 |
|
253 |
|
254 |
# --- 4. Process Private Subnets ---
|
@@ -280,17 +362,24 @@ class CdkStack(Stack):
|
|
280 |
else:
|
281 |
print("No private subnets specified for creation in context ('private_subnets_to_create').")
|
282 |
|
283 |
-
if not self.private_subnets:
|
|
|
|
|
|
|
|
|
284 |
raise Exception("No private subnets found or created, exiting.")
|
285 |
|
286 |
# --- 5. Sanity Check and Output ---
|
287 |
-
|
288 |
-
# Output the single NAT Gateway ID for verification
|
289 |
if self.single_nat_gateway_id:
|
290 |
CfnOutput(self, "SingleNatGatewayId", value=self.single_nat_gateway_id,
|
291 |
-
description="ID of the single NAT Gateway
|
|
|
|
|
292 |
else:
|
293 |
-
|
|
|
|
|
294 |
|
295 |
# --- Outputs for other stacks/regions ---
|
296 |
# These are crucial for cross-stack, cross-region referencing
|
@@ -303,34 +392,6 @@ class CdkStack(Stack):
|
|
303 |
self.params["public_route_tables"] = self.public_route_tables_cfn
|
304 |
|
305 |
|
306 |
-
#class CdkStackMain(Stack):
|
307 |
-
# def __init__(self, scope: Construct, construct_id: str, private_subnets:List[ec2.ISubnet]=[], private_route_tables: List[ec2.CfnRouteTable]=[], public_subnets:List[ec2.ISubnet]=[], public_route_tables: List[ec2.CfnRouteTable]=[], **kwargs) -> None:
|
308 |
-
# super().__init__(scope, construct_id, **kwargs)
|
309 |
-
|
310 |
-
# --- Helper to get context values ---
|
311 |
-
# def get_context_bool(key: str, default: bool = False) -> bool:
|
312 |
-
# return self.node.try_get_context(key) or default
|
313 |
-
|
314 |
-
# def get_context_str(key: str, default: str = None) -> str:
|
315 |
-
# return self.node.try_get_context(key) or default
|
316 |
-
|
317 |
-
# def get_context_dict(key: str, default: dict = None) -> dict:
|
318 |
-
# return self.node.try_get_context(key) or default
|
319 |
-
|
320 |
-
# def get_context_list_of_dicts(key: str) -> List[Dict[str, Any]]:
|
321 |
-
# ctx_value = self.node.try_get_context(key)
|
322 |
-
|
323 |
-
# if not isinstance(ctx_value, list):
|
324 |
-
# print(f"Warning: Context key '{key}' not found or not a list. Returning empty list.")
|
325 |
-
# return []
|
326 |
-
# # Optional: Add validation that all items in the list are dicts
|
327 |
-
# return ctx_value
|
328 |
-
|
329 |
-
# self.private_subnets: List[ec2.ISubnet] = private_subnets
|
330 |
-
# self.private_route_tables_cfn: List[ec2.CfnRouteTable] = private_route_tables
|
331 |
-
# self.public_subnets: List[ec2.ISubnet] = public_subnets
|
332 |
-
# self.public_route_tables_cfn: List[ec2.CfnRouteTable] = public_route_tables
|
333 |
-
|
334 |
private_subnet_selection = ec2.SubnetSelection(subnets=self.private_subnets)
|
335 |
public_subnet_selection = ec2.SubnetSelection(subnets=self.public_subnets)
|
336 |
|
@@ -340,16 +401,6 @@ class CdkStack(Stack):
|
|
340 |
for sub in public_subnet_selection.subnets:
|
341 |
print("public subnet:", sub.subnet_id, "is in availability zone:", sub.availability_zone)
|
342 |
|
343 |
-
# try:
|
344 |
-
# vpc = ec2.Vpc.from_lookup(
|
345 |
-
# self,
|
346 |
-
# "VPC",
|
347 |
-
# vpc_name=VPC_NAME
|
348 |
-
# )
|
349 |
-
# print("Successfully looked up VPC")
|
350 |
-
# except Exception as e:
|
351 |
-
# raise Exception(f"Could not look up VPC with name '{VPC_NAME}' due to: {e}")
|
352 |
-
|
353 |
print("Private subnet route tables:", self.private_route_tables_cfn)
|
354 |
|
355 |
# Add the S3 Gateway Endpoint to the VPC
|
@@ -368,31 +419,60 @@ class CdkStack(Stack):
|
|
368 |
description="The id for the S3 Gateway Endpoint.") # Specify the S3 service
|
369 |
|
370 |
# --- IAM Roles ---
|
371 |
-
|
372 |
-
|
373 |
-
|
374 |
-
|
375 |
-
"
|
376 |
-
|
377 |
-
|
378 |
-
|
379 |
-
|
380 |
-
|
381 |
-
|
382 |
-
|
383 |
-
|
384 |
-
|
385 |
-
|
386 |
-
|
387 |
-
|
388 |
-
|
389 |
-
|
390 |
-
|
391 |
-
|
392 |
-
|
|
|
|
|
|
|
393 |
}
|
394 |
-
|
395 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
396 |
|
397 |
if get_context_bool(f"exists:{codebuild_role_name}"):
|
398 |
# If exists, lookup/import the role using ARN from context
|
@@ -458,13 +538,21 @@ class CdkStack(Stack):
|
|
458 |
bucket = s3.Bucket.from_bucket_name(self, "LogConfigBucket", bucket_name=log_bucket_name)
|
459 |
print("Using existing S3 bucket", log_bucket_name)
|
460 |
else:
|
461 |
-
|
462 |
-
|
463 |
-
|
464 |
-
|
465 |
-
|
466 |
-
|
467 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
468 |
print("Created S3 bucket", log_bucket_name)
|
469 |
|
470 |
# Add policies - this will apply to both created and imported buckets
|
@@ -491,18 +579,31 @@ class CdkStack(Stack):
|
|
491 |
output_bucket = s3.Bucket.from_bucket_name(self, "OutputBucket", bucket_name=output_bucket_name)
|
492 |
print("Using existing Output bucket", output_bucket_name)
|
493 |
else:
|
494 |
-
|
495 |
-
|
|
|
496 |
s3.LifecycleRule(
|
497 |
expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
|
498 |
)
|
499 |
],
|
500 |
-
versioned=False,
|
501 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
502 |
removal_policy=RemovalPolicy.DESTROY,
|
503 |
-
# IMPORTANT: Set auto_delete_objects to True to empty the bucket before deletion
|
504 |
auto_delete_objects=True
|
505 |
-
|
|
|
506 |
print("Created Output bucket:", output_bucket_name)
|
507 |
|
508 |
# Add policies to output bucket
|
@@ -602,14 +703,7 @@ class CdkStack(Stack):
|
|
602 |
# --- Security Groups ---
|
603 |
try:
|
604 |
ecs_security_group_name = ECS_SECURITY_GROUP_NAME
|
605 |
-
|
606 |
-
# Use CDK's from_lookup_by_name which handles lookup or throws an error if not found
|
607 |
-
#try:
|
608 |
-
# ecs_security_group = ec2.SecurityGroup.from_lookup_by_name(
|
609 |
-
# self, "ECSSecurityGroup", vpc=vpc, security_group_name=ecs_security_group_name
|
610 |
-
# )
|
611 |
-
# print(f"Using existing Security Group: {ecs_security_group_name}")
|
612 |
-
# except Exception: # If lookup fails, create
|
613 |
try:
|
614 |
ecs_security_group = ec2.SecurityGroup(
|
615 |
self,
|
@@ -622,12 +716,7 @@ class CdkStack(Stack):
|
|
622 |
print("Failed to create ECS security group due to:", e)
|
623 |
|
624 |
alb_security_group_name = ALB_NAME_SECURITY_GROUP_NAME
|
625 |
-
|
626 |
-
# alb_security_group = ec2.SecurityGroup.from_lookup_by_name(
|
627 |
-
# self, "ALBSecurityGroup", vpc=vpc, security_group_name=alb_security_group_name
|
628 |
-
# )
|
629 |
-
# print(f"Using existing Security Group: {alb_security_group_name}")
|
630 |
-
# except Exception: # If lookup fails, create
|
631 |
try:
|
632 |
alb_security_group = ec2.SecurityGroup(
|
633 |
self,
|
@@ -717,8 +806,6 @@ class CdkStack(Stack):
|
|
717 |
print("Successfully created new Application Load Balancer")
|
718 |
except Exception as e:
|
719 |
raise Exception("Could not handle application load balancer due to:", e)
|
720 |
-
|
721 |
-
|
722 |
|
723 |
# --- Cognito User Pool ---
|
724 |
try:
|
@@ -738,7 +825,7 @@ class CdkStack(Stack):
|
|
738 |
print(f"Created new user pool {user_pool.user_pool_id}.")
|
739 |
|
740 |
# If you're using a certificate, assume that you will be using the ALB Cognito login features. You need different redirect URLs to accept the token that comes from Cognito authentication.
|
741 |
-
if
|
742 |
redirect_uris = [COGNITO_REDIRECTION_URL, COGNITO_REDIRECTION_URL + "/oauth2/idpresponse"]
|
743 |
else:
|
744 |
redirect_uris = [COGNITO_REDIRECTION_URL]
|
@@ -786,31 +873,39 @@ class CdkStack(Stack):
|
|
786 |
|
787 |
# --- Secrets Manager Secret ---
|
788 |
try:
|
789 |
-
|
790 |
-
|
791 |
# Lookup by name
|
792 |
secret = secretsmanager.Secret.from_secret_name_v2(self, "CognitoSecret", secret_name=secret_name)
|
793 |
print(f"Using existing Secret {secret_name}.")
|
794 |
-
|
795 |
-
|
796 |
-
|
797 |
-
|
798 |
-
|
799 |
-
|
800 |
-
|
801 |
-
|
802 |
-
|
803 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
804 |
|
805 |
except Exception as e:
|
806 |
raise Exception("Could not handle Secrets Manager secret due to:", e)
|
807 |
|
808 |
# --- Fargate Task Definition ---
|
809 |
try:
|
810 |
-
# For task definitions, re-creating with the same logical ID creates new revisions.
|
811 |
-
# If you want to use a *specific existing revision*, you'd need to look it up by ARN.
|
812 |
-
# If you want to update the latest revision, defining it here is the standard.
|
813 |
-
# Let's assume we always define it here to get revision management.
|
814 |
fargate_task_definition_name = FARGATE_TASK_DEFINITION_NAME
|
815 |
|
816 |
read_only_file_system = ECS_READ_ONLY_FILE_SYSTEM == 'True'
|
@@ -906,8 +1001,8 @@ class CdkStack(Stack):
|
|
906 |
|
907 |
cdk_managed_log_group = logs.LogGroup(self, "MyTaskLogGroup", # CDK Logical ID
|
908 |
log_group_name=log_group_name_from_config,
|
909 |
-
retention=logs.RetentionDays.ONE_MONTH,
|
910 |
-
removal_policy=RemovalPolicy.DESTROY
|
911 |
)
|
912 |
|
913 |
epheremal_storage_volume_cdk_obj = ecs.Volume(
|
@@ -926,10 +1021,7 @@ class CdkStack(Stack):
|
|
926 |
cpu_architecture=ecs.CpuArchitecture.X86_64,
|
927 |
operating_system_family=ecs.OperatingSystemFamily.LINUX
|
928 |
),
|
929 |
-
# 1. Specify the total ephemeral storage for the task
|
930 |
ephemeral_storage_gib=21, # Minimum is 21 GiB
|
931 |
-
# 2. Define the volume at the task level
|
932 |
-
# This volume will use the ephemeral storage configured above.
|
933 |
volumes=[epheremal_storage_volume_cdk_obj]
|
934 |
)
|
935 |
print("Fargate task definition defined.")
|
@@ -1093,7 +1185,7 @@ class CdkStack(Stack):
|
|
1093 |
print(f"ALB listener on port {listener_port} defined.")
|
1094 |
|
1095 |
|
1096 |
-
if
|
1097 |
http_listener.add_action(
|
1098 |
"DefaultAction", # Logical ID for the default action
|
1099 |
action=elbv2.ListenerAction.redirect(protocol='HTTPS',
|
@@ -1135,7 +1227,7 @@ class CdkStack(Stack):
|
|
1135 |
print("Added targets and actions to ALB HTTP listener.")
|
1136 |
|
1137 |
# Now the same for HTTPS if you have an ACM certificate
|
1138 |
-
if
|
1139 |
listener_port_https = 443
|
1140 |
# Check if Listener exists - from_listener_arn or lookup by port/ALB
|
1141 |
|
@@ -1143,7 +1235,7 @@ class CdkStack(Stack):
|
|
1143 |
self,
|
1144 |
"MyHttpsListener", # Logical ID for the HTTPS listener
|
1145 |
alb,
|
1146 |
-
|
1147 |
default_target_group=target_group,
|
1148 |
enable_cognito_auth=True,
|
1149 |
cognito_user_pool=user_pool,
|
|
|
21 |
aws_elasticloadbalancingv2 as elbv2,
|
22 |
aws_logs as logs,
|
23 |
aws_wafv2 as wafv2,
|
24 |
+
aws_kms as kms,
|
25 |
aws_dynamodb as dynamodb # Import the DynamoDB module
|
26 |
)
|
27 |
|
28 |
from constructs import Construct
|
29 |
+
from cdk_config import CDK_PREFIX, VPC_NAME, AWS_MANAGED_TASK_ROLES_LIST, GITHUB_REPO_USERNAME, GITHUB_REPO_NAME, GITHUB_REPO_BRANCH, ECS_TASK_MEMORY_SIZE, ECS_TASK_CPU_SIZE, CUSTOM_HEADER, CUSTOM_HEADER_VALUE, AWS_REGION, CLOUDFRONT_GEO_RESTRICTION, DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS, GRADIO_SERVER_PORT, PUBLIC_SUBNETS_TO_USE, PUBLIC_SUBNET_CIDR_BLOCKS, PUBLIC_SUBNET_AVAILABILITY_ZONES, PRIVATE_SUBNETS_TO_USE, PRIVATE_SUBNET_CIDR_BLOCKS, PRIVATE_SUBNET_AVAILABILITY_ZONES, CODEBUILD_PROJECT_NAME, ECS_SECURITY_GROUP_NAME, ALB_NAME_SECURITY_GROUP_NAME, ALB_NAME, COGNITO_USER_POOL_NAME, COGNITO_USER_POOL_CLIENT_NAME, COGNITO_USER_POOL_CLIENT_SECRET_NAME, FARGATE_TASK_DEFINITION_NAME, ECS_SERVICE_NAME, WEB_ACL_NAME, CLOUDFRONT_DISTRIBUTION_NAME, ECS_TASK_ROLE_NAME, ALB_TARGET_GROUP_NAME, S3_LOG_CONFIG_BUCKET_NAME, S3_OUTPUT_BUCKET_NAME, ACM_SSL_CERTIFICATE_ARN, CLUSTER_NAME, CODEBUILD_ROLE_NAME, ECS_TASK_EXECUTION_ROLE_NAME, ECR_CDK_REPO_NAME, ECS_LOG_GROUP_NAME, SAVE_LOGS_TO_DYNAMODB, ACCESS_LOG_DYNAMODB_TABLE_NAME, FEEDBACK_LOG_DYNAMODB_TABLE_NAME, USAGE_LOG_DYNAMODB_TABLE_NAME, TASK_DEFINITION_FILE_LOCATION, EXISTING_IGW_ID, SINGLE_NAT_GATEWAY_ID, NAT_GATEWAY_NAME, COGNITO_USER_POOL_DOMAIN_PREFIX, COGNITO_REDIRECTION_URL, AWS_ACCOUNT_ID, ECS_USE_FARGATE_SPOT, ECS_READ_ONLY_FILE_SYSTEM, USE_CLOUDFRONT, LOAD_BALANCER_WEB_ACL_NAME, NEW_VPC_DEFAULT_NAME, NEW_VPC_CIDR, USE_CUSTOM_KMS_KEY, S3_KMS_KEY_NAME
|
30 |
from cdk_functions import create_subnets, create_web_acl_with_common_rules, add_custom_policies, add_alb_https_listener_with_cert, create_nat_gateway # Only keep CDK-native functions
|
31 |
|
32 |
def _get_env_list(env_var_name: str) -> List[str]:
|
|
|
68 |
print(f"Warning: Context key '{key}' not found or not a list. Returning empty list.")
|
69 |
return []
|
70 |
# Optional: Add validation that all items in the list are dicts
|
71 |
+
return ctx_value
|
72 |
+
|
73 |
+
self.template_options.description = "Deployment of the 'doc_redaction' PDF, image, and XLSX/CSV redaction app. Git repo available at: https://github.com/seanpedrick-case/doc_redaction."
|
74 |
|
75 |
|
76 |
# --- VPC and Subnets (Assuming VPC is always lookup, Subnets are created/returned by create_subnets) ---
|
77 |
+
new_vpc_created = False
|
78 |
+
if VPC_NAME:
|
79 |
+
print("Looking for current VPC:", VPC_NAME)
|
80 |
+
try:
|
81 |
+
vpc = ec2.Vpc.from_lookup(
|
82 |
+
self,
|
83 |
+
"VPC",
|
84 |
+
vpc_name=VPC_NAME
|
85 |
+
)
|
86 |
+
print("Successfully looked up VPC:", vpc.vpc_id)
|
87 |
+
except Exception as e:
|
88 |
+
raise Exception(f"Could not look up VPC with name '{VPC_NAME}' due to: {e}")
|
89 |
+
|
90 |
+
elif NEW_VPC_DEFAULT_NAME:
|
91 |
+
new_vpc_created = True
|
92 |
+
print(f"NEW_VPC_DEFAULT_NAME ('{NEW_VPC_DEFAULT_NAME}') is set. Creating a new VPC.")
|
93 |
+
|
94 |
+
# Configuration for the new VPC
|
95 |
+
# You can make these configurable via context as well, e.g.,
|
96 |
+
# new_vpc_cidr = self.node.try_get_context("new_vpc_cidr") or "10.0.0.0/24"
|
97 |
+
# new_vpc_max_azs = self.node.try_get_context("new_vpc_max_azs") or 2 # Use 2 AZs by default for HA
|
98 |
+
# new_vpc_nat_gateways = self.node.try_get_context("new_vpc_nat_gateways") or new_vpc_max_azs # One NAT GW per AZ for HA
|
99 |
+
# or 1 for cost savings if acceptable
|
100 |
+
if not NEW_VPC_CIDR:
|
101 |
+
raise Exception("App has been instructed to create a new VPC but not VPC CDR range provided to variable NEW_VPC_CIDR")
|
102 |
+
|
103 |
+
print("Provided NEW_VPC_CIDR range:", NEW_VPC_CIDR)
|
104 |
+
|
105 |
+
new_vpc_cidr = NEW_VPC_CIDR
|
106 |
+
new_vpc_max_azs = 2 # Creates resources in 2 AZs. Adjust as needed.
|
107 |
+
|
108 |
+
# For "a NAT gateway", you can set nat_gateways=1.
|
109 |
+
# For resilience (NAT GW per AZ), set nat_gateways=new_vpc_max_azs.
|
110 |
+
# The Vpc construct will create NAT Gateway(s) if subnet_type PRIVATE_WITH_EGRESS is used
|
111 |
+
# and nat_gateways > 0.
|
112 |
+
new_vpc_nat_gateways = 1 # Creates a single NAT Gateway for cost-effectiveness.
|
113 |
+
# If you need one per AZ for higher availability, set this to new_vpc_max_azs.
|
114 |
+
|
115 |
+
vpc = ec2.Vpc(
|
116 |
self,
|
117 |
+
"MyNewLogicalVpc", # This is the CDK construct ID
|
118 |
+
vpc_name=NEW_VPC_DEFAULT_NAME,
|
119 |
+
ip_addresses=ec2.IpAddresses.cidr(new_vpc_cidr),
|
120 |
+
max_azs=new_vpc_max_azs,
|
121 |
+
nat_gateways=new_vpc_nat_gateways, # Number of NAT gateways to create
|
122 |
+
subnet_configuration=[
|
123 |
+
ec2.SubnetConfiguration(
|
124 |
+
name="Public", # Name prefix for public subnets
|
125 |
+
subnet_type=ec2.SubnetType.PUBLIC,
|
126 |
+
cidr_mask=28 # Adjust CIDR mask as needed (e.g., /24 provides ~250 IPs per subnet)
|
127 |
+
),
|
128 |
+
ec2.SubnetConfiguration(
|
129 |
+
name="Private", # Name prefix for private subnets
|
130 |
+
subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, # Ensures these subnets have NAT Gateway access
|
131 |
+
cidr_mask=28 # Adjust CIDR mask as needed
|
132 |
+
)
|
133 |
+
# You could also add ec2.SubnetType.PRIVATE_ISOLATED if needed
|
134 |
+
],
|
135 |
+
# Internet Gateway is created and configured automatically for PUBLIC subnets.
|
136 |
+
# Route tables for public subnets will point to the IGW.
|
137 |
+
# Route tables for PRIVATE_WITH_EGRESS subnets will point to the NAT Gateway(s).
|
138 |
)
|
139 |
+
print(f"Successfully created new VPC: {vpc.vpc_id} with name '{NEW_VPC_DEFAULT_NAME}'")
|
140 |
+
# If nat_gateways > 0, vpc.nat_gateway_ips will contain EIPs if Vpc created them.
|
141 |
+
# vpc.public_subnets, vpc.private_subnets, vpc.isolated_subnets are populated.
|
142 |
+
|
143 |
+
else:
|
144 |
+
raise Exception("VPC_NAME for current VPC not found, and NEW_VPC_DEFAULT_NAME not found to create a new VPC")
|
145 |
|
146 |
# --- Subnet Handling (Check Context and Create/Import) ---
|
147 |
# Initialize lists to hold ISubnet objects (L2) and CfnSubnet/CfnRouteTable (L1)
|
|
|
161 |
print("vpc.public_subnets:", vpc.public_subnets)
|
162 |
print("vpc.private_subnets:", vpc.private_subnets)
|
163 |
|
164 |
+
if vpc.public_subnets: # These are already one_per_az if max_azs was used and Vpc created them
|
165 |
+
self.public_subnets.extend(vpc.public_subnets)
|
166 |
+
else:
|
167 |
+
self.node.add_warning("No public subnets found in the VPC.")
|
168 |
+
|
169 |
+
# Get private subnets with egress specifically
|
170 |
+
#selected_private_subnets_with_egress = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)
|
171 |
|
172 |
+
print(f"Selected from VPC: {len(self.public_subnets)} public, {len(self.private_subnets)} private_with_egress subnets.")
|
|
|
|
|
|
|
|
|
|
|
173 |
|
174 |
+
if len(self.public_subnets) < 1 or len(self.private_subnets) < 1 : # Simplified check for new VPC
|
175 |
+
# If new_vpc_max_azs was 1, you'd have 1 of each. If 2, then 2 of each.
|
176 |
+        # The original check '< 2' might be too strict if new_vpc_max_azs=1
+        pass  # For new VPC, allow single AZ setups if configured that way. The VPC construct ensures one per AZ up to max_azs.
+
+    if not self.public_subnets and not self.private_subnets:
+        print("Error: No public or private subnets could be found in the VPC for automatic selection. "
+              "You must either specify subnets in *_SUBNETS_TO_USE or ensure the VPC has discoverable subnets.")
+        raise RuntimeError("No suitable subnets found for automatic selection.")
+    else:
+        print(f"Automatically selected {len(self.public_subnets)} public and {len(self.private_subnets)} private subnets based on VPC properties.")
+
     selected_public_subnets = vpc.select_subnets(subnet_type=ec2.SubnetType.PUBLIC, one_per_az=True)
     private_subnets_egress = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, one_per_az=True)

+    if private_subnets_egress.subnets:
+        self.private_subnets.extend(private_subnets_egress.subnets)
+    else:
+        self.node.add_warning("No PRIVATE_WITH_EGRESS subnets found in the VPC.")
+
+    try:
+        private_subnets_isolated = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED, one_per_az=True)
+    except Exception as e:
+        private_subnets_isolated = []
+        print("Could not find any isolated subnets due to:", e)
+
+    ###
     combined_subnet_objects = []

+    if private_subnets_isolated:
+        if private_subnets_egress.subnets:
+            # Add the first PRIVATE_WITH_EGRESS subnet
+            combined_subnet_objects.append(private_subnets_egress.subnets[0])
+    elif not private_subnets_isolated:
+        if private_subnets_egress.subnets:
+            # No isolated subnets: add all PRIVATE_WITH_EGRESS subnets
+            combined_subnet_objects.extend(private_subnets_egress.subnets)
     else:
         self.node.add_warning("No PRIVATE_WITH_EGRESS subnets found to select the first one.")

     # Add all PRIVATE_ISOLATED subnets *except* the first one (if they exist)
+    try:
+        if len(private_subnets_isolated.subnets) > 1:
+            combined_subnet_objects.extend(private_subnets_isolated.subnets[1:])
+        elif private_subnets_isolated.subnets:  # Only 1 isolated subnet, add a warning if [1:] was desired
+            self.node.add_warning("Only one PRIVATE_ISOLATED subnet found, private_subnets_isolated.subnets[1:] will be empty.")
+        else:
+            self.node.add_warning("No PRIVATE_ISOLATED subnets found.")
+    except Exception as e:
+        print("Could not identify private isolated subnets due to:", e)

     # Create an ec2.SelectedSubnets object from the combined private subnet list.
     selected_private_subnets = vpc.select_subnets(
 ...
     print("selected_private_subnets:", selected_private_subnets)

     if len(selected_public_subnets.subnet_ids) < 2 or len(selected_private_subnets.subnet_ids) < 2:
         raise Exception("Need at least two public or private subnets in different availability zones")
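The private-subnet combination rule above reduces to a small function. Below is a minimal sketch of that logic, with plain strings standing in for CDK `ISubnet` objects (the names are illustrative only): prefer the first `PRIVATE_WITH_EGRESS` subnet plus the tail of the `PRIVATE_ISOLATED` list, or fall back to all egress subnets when no isolated subnets exist.

```python
def combine_private_subnets(egress_subnets, isolated_subnets):
    """Sketch of the subnet-combination rule in the stack above."""
    combined = []
    if isolated_subnets:
        if egress_subnets:
            # Take the first PRIVATE_WITH_EGRESS subnet only
            combined.append(egress_subnets[0])
        # Then all PRIVATE_ISOLATED subnets except the first one
        combined.extend(isolated_subnets[1:])
    elif egress_subnets:
        # No isolated subnets at all: use every PRIVATE_WITH_EGRESS subnet
        combined.extend(egress_subnets)
    return combined

print(combine_private_subnets(["egress-a", "egress-b"], ["iso-a", "iso-b", "iso-c"]))
# → ['egress-a', 'iso-b', 'iso-c']
```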
     self.public_subnets.extend(newly_created_public_subnets)
     self.public_route_tables_cfn.extend(newly_created_public_rts_cfn)

+    if not self.public_subnets and not names_to_create_public and not PUBLIC_SUBNETS_TO_USE:
+        raise Exception("No public subnets found or created, exiting.")
     # --- NAT Gateway Creation/Lookup ---
+    print("Creating NAT gateway / locating existing one")
     self.single_nat_gateway_id = None

     nat_gw_id_from_context = SINGLE_NAT_GATEWAY_ID
 ...
     if nat_gw_id_from_context:
         print(f"Using existing NAT Gateway ID from context: {nat_gw_id_from_context}")
         self.single_nat_gateway_id = nat_gw_id_from_context
+
+    elif new_vpc_created and new_vpc_nat_gateways > 0 and hasattr(vpc, 'nat_gateways') and vpc.nat_gateways:
+        self.single_nat_gateway_id = vpc.nat_gateways[0].gateway_id
+        print(f"Using NAT Gateway {self.single_nat_gateway_id} created by the new VPC construct.")
+
+    if not self.single_nat_gateway_id:
+        print("Creating a new NAT gateway")
+
+        if hasattr(vpc, 'nat_gateways') and vpc.nat_gateways:
+            print("Existing NAT gateway found in vpc")
+            pass
+
+        # If not in context, create a new one, but only if we have a public subnet.
+        elif self.public_subnets:
             print("NAT Gateway ID not found in context. Creating a new one.")
             # Place the NAT GW in the first available public subnet
             first_public_subnet = self.public_subnets[0]
 ...
                 nat_gateway_id_context_key=SINGLE_NAT_GATEWAY_ID
             )
         else:
+            print("WARNING: No public subnets available and NAT gateway not found in existing VPC. Cannot create a NAT Gateway.")

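The NAT Gateway resolution above follows a fixed order of precedence. A hypothetical sketch of that order (IDs are illustrative; the real code returns CDK constructs rather than strings): an ID from CDK context wins, then a gateway created by a new VPC construct, then a fresh gateway in the first public subnet, and otherwise nothing can be created.

```python
def resolve_nat_gateway(context_id, vpc_nat_gateway_ids, public_subnet_ids):
    """Sketch of the NAT Gateway resolution precedence in the stack above."""
    if context_id:
        return context_id, "from-context"          # explicit ID from cdk.context.json
    if vpc_nat_gateway_ids:
        return vpc_nat_gateway_ids[0], "from-new-vpc"  # created by the VPC construct
    if public_subnet_ids:
        # Caller would create a new NAT Gateway in public_subnet_ids[0]
        return None, "create-new"
    return None, "unavailable"                     # no public subnet to host one

print(resolve_nat_gateway(None, ["nat-0abc"], ["subnet-1"]))
# → ('nat-0abc', 'from-new-vpc')
```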
     # --- 4. Process Private Subnets ---
 ...
     else:
         print("No private subnets specified for creation in context ('private_subnets_to_create').")

+    # if not self.private_subnets:
+    #     raise Exception("No private subnets found or created, exiting.")
+
+    if not self.private_subnets and not names_to_create_private and not PRIVATE_SUBNETS_TO_USE:
+        # This condition might need adjustment for new VPCs.
         raise Exception("No private subnets found or created, exiting.")

     # --- 5. Sanity Check and Output ---
+    # Output the single NAT Gateway ID for verification
     if self.single_nat_gateway_id:
         CfnOutput(self, "SingleNatGatewayId", value=self.single_nat_gateway_id,
+                  description="ID of the single NAT Gateway resolved or created.")
+    elif NEW_VPC_DEFAULT_NAME and (self.node.try_get_context("new_vpc_nat_gateways") or 1) > 0:
+        print("INFO: A new VPC was created with NAT Gateway(s). Their routing is handled by the VPC construct. No single_nat_gateway_id was explicitly set for separate output.")
     else:
+        out_message = "WARNING: No single NAT Gateway was resolved or created explicitly by the script's logic after VPC setup."
+        print(out_message)
+        raise Exception(out_message)

     # --- Outputs for other stacks/regions ---
     # These are crucial for cross-stack, cross-region referencing
     self.params["public_route_tables"] = self.public_route_tables_cfn

     private_subnet_selection = ec2.SubnetSelection(subnets=self.private_subnets)
     public_subnet_selection = ec2.SubnetSelection(subnets=self.public_subnets)
 ...
     for sub in public_subnet_selection.subnets:
         print("public subnet:", sub.subnet_id, "is in availability zone:", sub.availability_zone)
 ...
     print("Private subnet route tables:", self.private_route_tables_cfn)

     # Add the S3 Gateway Endpoint to the VPC
         description="The id for the S3 Gateway Endpoint.")  # Specify the S3 service

     # --- IAM Roles ---
+    if USE_CUSTOM_KMS_KEY == '1':
+        kms_key = kms.Key(self, "RedactionSharedKmsKey", alias=S3_KMS_KEY_NAME, removal_policy=RemovalPolicy.DESTROY)
+
+        custom_sts_kms_policy_dict = {
+            "Version": "2012-10-17",
+            "Statement": [
+                {
+                    "Sid": "STSCallerIdentity",
+                    "Effect": "Allow",
+                    "Action": [
+                        "sts:GetCallerIdentity"
+                    ],
+                    "Resource": "*"
+                },
+                {
+                    "Sid": "KMSAccess",
+                    "Effect": "Allow",
+                    "Action": [
+                        "kms:Encrypt",
+                        "kms:Decrypt",
+                        "kms:GenerateDataKey"
+                    ],
+                    "Resource": kms_key.key_arn  # Use key_arn, as it's the full ARN, safer than key_id
+                }
+            ]
         }
+    else:
+        kms_key = None
+
+        custom_sts_kms_policy_dict = {
+            "Version": "2012-10-17",
+            "Statement": [
+                {
+                    "Sid": "STSCallerIdentity",
+                    "Effect": "Allow",
+                    "Action": [
+                        "sts:GetCallerIdentity"
+                    ],
+                    "Resource": "*"
+                },
+                {
+                    "Sid": "KMSSecretsManagerDecrypt",  # Explicitly add decrypt for default key
+                    "Effect": "Allow",
+                    "Action": [
+                        "kms:Decrypt"
+                    ],
+                    "Resource": f"arn:aws:kms:{AWS_REGION}:{AWS_ACCOUNT_ID}:key/aws/secretsmanager"
+                }
+            ]
+        }
+    custom_sts_kms_policy = json.dumps(custom_sts_kms_policy_dict, indent=4)
+
+    try:
+        codebuild_role_name = CODEBUILD_ROLE_NAME

         if get_context_bool(f"exists:{codebuild_role_name}"):
             # If exists, lookup/import the role using ARN from context
bucket = s3.Bucket.from_bucket_name(self, "LogConfigBucket", bucket_name=log_bucket_name)
|
539 |
print("Using existing S3 bucket", log_bucket_name)
|
540 |
else:
|
541 |
+
if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
|
542 |
+
bucket = s3.Bucket(self, "LogConfigBucket", bucket_name=log_bucket_name,
|
543 |
+
versioned=False,
|
544 |
+
removal_policy=RemovalPolicy.DESTROY,
|
545 |
+
auto_delete_objects=True,
|
546 |
+
encryption=s3.BucketEncryption.KMS,
|
547 |
+
encryption_key=kms_key
|
548 |
+
)
|
549 |
+
else:
|
550 |
+
bucket = s3.Bucket(self, "LogConfigBucket", bucket_name=log_bucket_name,
|
551 |
+
versioned=False,
|
552 |
+
removal_policy=RemovalPolicy.DESTROY,
|
553 |
+
auto_delete_objects=True
|
554 |
+
)
|
555 |
+
|
556 |
print("Created S3 bucket", log_bucket_name)
|
557 |
|
558 |
# Add policies - this will apply to both created and imported buckets
|
|
|
             output_bucket = s3.Bucket.from_bucket_name(self, "OutputBucket", bucket_name=output_bucket_name)
             print("Using existing Output bucket", output_bucket_name)
         else:
+            if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
+                output_bucket = s3.Bucket(self, "OutputBucket", bucket_name=output_bucket_name,
+                    lifecycle_rules=[
                         s3.LifecycleRule(
                             expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
                         )
                     ],
+                    versioned=False,
+                    removal_policy=RemovalPolicy.DESTROY,
+                    auto_delete_objects=True,
+                    encryption=s3.BucketEncryption.KMS,
+                    encryption_key=kms_key
+                )
+            else:
+                output_bucket = s3.Bucket(self, "OutputBucket", bucket_name=output_bucket_name,
+                    lifecycle_rules=[
+                        s3.LifecycleRule(
+                            expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
+                        )
+                    ],
+                    versioned=False,
                     removal_policy=RemovalPolicy.DESTROY,
                     auto_delete_objects=True
+                )
+
             print("Created Output bucket:", output_bucket_name)

         # Add policies to output bucket
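Both bucket branches above differ only in the encryption arguments they pass. That selection can be sketched as a small helper (a hedged sketch: the string "KMS" stands in for `s3.BucketEncryption.KMS`, and any object can stand in for the CDK key):

```python
def bucket_encryption_kwargs(use_custom_kms_key, kms_key):
    """Return the extra s3.Bucket keyword arguments for a custom KMS key;
    an empty dict leaves the bucket on default S3-managed encryption."""
    if use_custom_kms_key == "1" and kms_key is not None:
        return {"encryption": "KMS", "encryption_key": kms_key}
    return {}
```

Merging the result into a single shared set of bucket keyword arguments avoids duplicating the whole `s3.Bucket(...)` call per branch.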
         # --- Security Groups ---
         try:
             ecs_security_group_name = ECS_SECURITY_GROUP_NAME
+
             try:
                 ecs_security_group = ec2.SecurityGroup(
                     self,
 ...
                 print("Failed to create ECS security group due to:", e)

             alb_security_group_name = ALB_NAME_SECURITY_GROUP_NAME
+
             try:
                 alb_security_group = ec2.SecurityGroup(
                     self,
 ...
             print("Successfully created new Application Load Balancer")
         except Exception as e:
             raise Exception("Could not handle application load balancer due to:", e)

         # --- Cognito User Pool ---
         try:
 ...
             print(f"Created new user pool {user_pool.user_pool_id}.")

             # If you're using a certificate, assume that you will be using the ALB Cognito login features. You need different redirect URLs to accept the token that comes from Cognito authentication.
+            if ACM_SSL_CERTIFICATE_ARN:
                 redirect_uris = [COGNITO_REDIRECTION_URL, COGNITO_REDIRECTION_URL + "/oauth2/idpresponse"]
             else:
                 redirect_uris = [COGNITO_REDIRECTION_URL]
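The redirect-URI rule above is small enough to sketch directly: when an ACM certificate is present the ALB handles the Cognito login, so the `/oauth2/idpresponse` callback must also be registered alongside the app URL (the URLs below are illustrative).

```python
def cognito_redirect_uris(redirection_url, acm_certificate_arn):
    """Sketch of the redirect-URI rule used with ALB Cognito authentication."""
    if acm_certificate_arn:
        # ALB auth posts the token back to the /oauth2/idpresponse endpoint
        return [redirection_url, redirection_url + "/oauth2/idpresponse"]
    return [redirection_url]

print(cognito_redirect_uris("https://redaction.example.com", "arn:aws:acm:eu-west-1:111122223333:certificate/abc"))
# → ['https://redaction.example.com', 'https://redaction.example.com/oauth2/idpresponse']
```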

         # --- Secrets Manager Secret ---
         try:
+            secret_name = COGNITO_USER_POOL_CLIENT_SECRET_NAME
+            if get_context_bool(f"exists:{secret_name}"):
                 # Lookup by name
                 secret = secretsmanager.Secret.from_secret_name_v2(self, "CognitoSecret", secret_name=secret_name)
                 print(f"Using existing Secret {secret_name}.")
+            else:
+                if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
+                    secret = secretsmanager.Secret(self, "CognitoSecret",  # Logical ID
+                        secret_name=secret_name,  # Explicit resource name
+                        secret_object_value={
+                            "REDACTION_USER_POOL_ID": SecretValue.unsafe_plain_text(user_pool.user_pool_id),  # Use the CDK attribute
+                            "REDACTION_CLIENT_ID": SecretValue.unsafe_plain_text(user_pool_client.user_pool_client_id),  # Use the CDK attribute
+                            "REDACTION_CLIENT_SECRET": user_pool_client.user_pool_client_secret  # Use the CDK attribute
+                        },
+                        encryption_key=kms_key
+                    )
+                else:
+                    secret = secretsmanager.Secret(self, "CognitoSecret",  # Logical ID
+                        secret_name=secret_name,  # Explicit resource name
+                        secret_object_value={
+                            "REDACTION_USER_POOL_ID": SecretValue.unsafe_plain_text(user_pool.user_pool_id),  # Use the CDK attribute
+                            "REDACTION_CLIENT_ID": SecretValue.unsafe_plain_text(user_pool_client.user_pool_client_id),  # Use the CDK attribute
+                            "REDACTION_CLIENT_SECRET": user_pool_client.user_pool_client_secret  # Use the CDK attribute
+                        }
+                    )
+
+                print(f"Created new secret {secret_name}.")

         except Exception as e:
             raise Exception("Could not handle Secrets Manager secret due to:", e)

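Both secret branches above store the same three key/value pairs; only the encryption key differs. A sketch of the payload (the keys are the names the app reads back; in the real stack each value is wrapped in a CDK `SecretValue` rather than stored as plain text):

```python
def cognito_secret_payload(user_pool_id, client_id, client_secret):
    """Sketch of the Cognito secret stored in Secrets Manager above."""
    return {
        "REDACTION_USER_POOL_ID": user_pool_id,
        "REDACTION_CLIENT_ID": client_id,
        "REDACTION_CLIENT_SECRET": client_secret,
    }
```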
         # --- Fargate Task Definition ---
         try:
 ...
             fargate_task_definition_name = FARGATE_TASK_DEFINITION_NAME

             read_only_file_system = ECS_READ_ONLY_FILE_SYSTEM == 'True'
 ...
             cdk_managed_log_group = logs.LogGroup(self, "MyTaskLogGroup",  # CDK Logical ID
                 log_group_name=log_group_name_from_config,
+                retention=logs.RetentionDays.ONE_MONTH,
+                removal_policy=RemovalPolicy.DESTROY
             )

             epheremal_storage_volume_cdk_obj = ecs.Volume(
 ...
                 cpu_architecture=ecs.CpuArchitecture.X86_64,
                 operating_system_family=ecs.OperatingSystemFamily.LINUX
                 ),
                 ephemeral_storage_gib=21,  # Minimum is 21 GiB
                 volumes=[epheremal_storage_volume_cdk_obj]
             )
             print("Fargate task definition defined.")
             print(f"ALB listener on port {listener_port} defined.")

+            if ACM_SSL_CERTIFICATE_ARN:
                 http_listener.add_action(
                     "DefaultAction",  # Logical ID for the default action
                     action=elbv2.ListenerAction.redirect(protocol='HTTPS',
 ...
             print("Added targets and actions to ALB HTTP listener.")

             # Now the same for HTTPS if you have an ACM certificate
+            if ACM_SSL_CERTIFICATE_ARN:
                 listener_port_https = 443
                 # Check if Listener exists - from_listener_arn or lookup by port/ALB
 ...
                     self,
                     "MyHttpsListener",  # Logical ID for the HTTPS listener
                     alb,
+                    ACM_SSL_CERTIFICATE_ARN=ACM_SSL_CERTIFICATE_ARN,
                     default_target_group=target_group,
                     enable_cognito_auth=True,
                     cognito_user_pool=user_pool,
cdk/check_resources.py
CHANGED
@@ -49,150 +49,151 @@ def check_and_set_context():
     context_data = {}

     # --- Find the VPC ID first ---
 ...
-    for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
-        all_proposed_subnets_data.append({
-            'name': name,
-            'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
-            'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i]
-        })
-    if private_ready_for_full_validation:
-        for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
-            all_proposed_subnets_data.append({
-                'name': name,
-                'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
-                'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i]
-            })
-
-    print(f"Target VPC ID for Boto3 lookup: {vpc_id}")
-
-    # Fetch all existing subnets in the target VPC once to avoid repeated API calls
-    try:
-        existing_aws_subnets = _get_existing_subnets_in_vpc(vpc_id)
-    except Exception as e:
-        print(f"Failed to fetch existing VPC subnets. Aborting. Error: {e}")
-        raise SystemExit(1)  # Exit immediately if we can't get baseline data
-
-    print("\n--- Running Name-Only Subnet Existence Check Mode ---")
-    # Fallback: check only by name using the existing data
-    checked_public_subnets = {}
-    if PUBLIC_SUBNETS_TO_USE:
-        for subnet_name in PUBLIC_SUBNETS_TO_USE:
-            print("subnet_name:", subnet_name)
-            exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
-            checked_public_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
-
-            # If the subnet exists, remove it from the proposed subnets list
-            if checked_public_subnets[subnet_name]["exists"] == True:
-                all_proposed_subnets_data = [
-                    subnet for subnet in all_proposed_subnets_data
-                    if subnet['name'] != subnet_name
-                ]
-
-    context_data["checked_public_subnets"] = checked_public_subnets
-
-    checked_private_subnets = {}
-    if PRIVATE_SUBNETS_TO_USE:
-        for subnet_name in PRIVATE_SUBNETS_TO_USE:
-            print("subnet_name:", subnet_name)
-            exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
-            checked_private_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
-
-            # If the subnet exists, remove it from the proposed subnets list
-            if checked_private_subnets[subnet_name]["exists"] == True:
-                all_proposed_subnets_data = [
-                    subnet for subnet in all_proposed_subnets_data
-                    if subnet['name'] != subnet_name
-                ]
-
-    context_data["checked_private_subnets"] = checked_private_subnets
-
-    print("\nName-only existence subnet check complete.\n")
-
-    if full_validation_mode:
-        print("\n--- Running in Full Subnet Validation Mode (CIDR/AZs provided) ---")
-        try:
-            validate_subnet_creation_parameters(vpc_id, all_proposed_subnets_data, existing_aws_subnets)
-            print("\nPre-synth validation successful. Proceeding with CDK synth.\n")
-
-            # Populate context_data for downstream CDK construct creation
-            context_data["public_subnets_to_create"] = []
-            if public_ready_for_full_validation:
-                for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
-                    'name': name,
-                    'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
-                    'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i]
-                    'is_public': True
-                })
-            context_data["private_subnets_to_create"] = []
-            if private_ready_for_full_validation:
-                for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
-                    'name': name,
-                    'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
-                    'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i]
-                    'is_public': False
-                })
+    if VPC_NAME:
+        print("VPC_NAME:", VPC_NAME)
+        vpc_id, nat_gateways = get_vpc_id_by_name(VPC_NAME)
+
+        # If you expect only one, or one per AZ and you're creating one per AZ in CDK:
+        if nat_gateways:
+            # For simplicity, let's just check if *any* NAT exists in the VPC
+            # A more robust check would match by subnet, AZ, or a specific tag.
+            context_data["exists:NatGateway"] = True
+            context_data["id:NatGateway"] = nat_gateways[0]['NatGatewayId']  # Store the ID of the first one found
+        else:
+            context_data["exists:NatGateway"] = False
+            context_data["id:NatGateway"] = None
+
+        if not vpc_id:
+            # If the VPC doesn't exist, you might not be able to check/create subnets.
+            # Decide how to handle this: raise an error, set a flag, etc.
+            raise RuntimeError(f"Required VPC '{VPC_NAME}' not found. Cannot proceed with subnet checks.")
+
+        context_data["vpc_id"] = vpc_id  # Store VPC ID in context
+
+        # SUBNET CHECKS
+        context_data: Dict[str, Any] = {}
+        all_proposed_subnets_data: List[Dict[str, str]] = []
+
+        # Flag to indicate if full validation mode (with CIDR/AZs) is active
+        full_validation_mode = False
+
+        # Determine if full validation mode is possible/desired
+        # It's 'desired' if CIDR/AZs are provided, and their lengths match the name lists.
+        public_ready_for_full_validation = (
+            len(PUBLIC_SUBNETS_TO_USE) > 0 and
+            len(PUBLIC_SUBNET_CIDR_BLOCKS) == len(PUBLIC_SUBNETS_TO_USE) and
+            len(PUBLIC_SUBNET_AVAILABILITY_ZONES) == len(PUBLIC_SUBNETS_TO_USE)
+        )
+        private_ready_for_full_validation = (
+            len(PRIVATE_SUBNETS_TO_USE) > 0 and
+            len(PRIVATE_SUBNET_CIDR_BLOCKS) == len(PRIVATE_SUBNETS_TO_USE) and
+            len(PRIVATE_SUBNET_AVAILABILITY_ZONES) == len(PRIVATE_SUBNETS_TO_USE)
+        )
+
+        # Activate full validation if *any* type of subnet (public or private) has its full details provided.
+        # You might adjust this logic if you require ALL subnet types to have CIDRs, or NONE.
+        if public_ready_for_full_validation or private_ready_for_full_validation:
+            full_validation_mode = True
+
+            # If some are ready but others aren't, print a warning or raise an error based on your strictness
+            if public_ready_for_full_validation and not private_ready_for_full_validation and PRIVATE_SUBNETS_TO_USE:
+                print("Warning: Public subnets have CIDRs/AZs, but private subnets do not. Only public will be fully validated/created with CIDRs.")
+            if private_ready_for_full_validation and not public_ready_for_full_validation and PUBLIC_SUBNETS_TO_USE:
+                print("Warning: Private subnets have CIDRs/AZs, but public subnets do not. Only private will be fully validated/created with CIDRs.")
+
+        # Prepare data for validate_subnet_creation_parameters for all subnets that have full details
+        if public_ready_for_full_validation:
+            for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
+                all_proposed_subnets_data.append({
+                    'name': name,
+                    'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
+                    'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i]
+                })
+        if private_ready_for_full_validation:
+            for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
+                all_proposed_subnets_data.append({
+                    'name': name,
+                    'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
+                    'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i]
+                })
+
+        print(f"Target VPC ID for Boto3 lookup: {vpc_id}")
+
+        # Fetch all existing subnets in the target VPC once to avoid repeated API calls
+        try:
+            existing_aws_subnets = _get_existing_subnets_in_vpc(vpc_id)
+        except Exception as e:
+            print(f"Failed to fetch existing VPC subnets. Aborting. Error: {e}")
+            raise SystemExit(1)  # Exit immediately if we can't get baseline data
+
+        print("\n--- Running Name-Only Subnet Existence Check Mode ---")
+        # Fallback: check only by name using the existing data
+        checked_public_subnets = {}
+        if PUBLIC_SUBNETS_TO_USE:
+            for subnet_name in PUBLIC_SUBNETS_TO_USE:
+                print("subnet_name:", subnet_name)
+                exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
+                checked_public_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
+
+                # If the subnet exists, remove it from the proposed subnets list
+                if checked_public_subnets[subnet_name]["exists"] == True:
+                    all_proposed_subnets_data = [
+                        subnet for subnet in all_proposed_subnets_data
+                        if subnet['name'] != subnet_name
+                    ]
+
+        context_data["checked_public_subnets"] = checked_public_subnets
+
+        checked_private_subnets = {}
+        if PRIVATE_SUBNETS_TO_USE:
+            for subnet_name in PRIVATE_SUBNETS_TO_USE:
+                print("subnet_name:", subnet_name)
+                exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
+                checked_private_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
+
+                # If the subnet exists, remove it from the proposed subnets list
+                if checked_private_subnets[subnet_name]["exists"] == True:
+                    all_proposed_subnets_data = [
+                        subnet for subnet in all_proposed_subnets_data
+                        if subnet['name'] != subnet_name
+                    ]
+
+        context_data["checked_private_subnets"] = checked_private_subnets
+
+        print("\nName-only existence subnet check complete.\n")
+
+        if full_validation_mode:
+            print("\n--- Running in Full Subnet Validation Mode (CIDR/AZs provided) ---")
+            try:
+                validate_subnet_creation_parameters(vpc_id, all_proposed_subnets_data, existing_aws_subnets)
+                print("\nPre-synth validation successful. Proceeding with CDK synth.\n")
+
+                # Populate context_data for downstream CDK construct creation
+                context_data["public_subnets_to_create"] = []
+                if public_ready_for_full_validation:
+                    for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
+                        context_data["public_subnets_to_create"].append({
+                            'name': name,
+                            'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
+                            'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i],
+                            'is_public': True
+                        })
+                context_data["private_subnets_to_create"] = []
+                if private_ready_for_full_validation:
+                    for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
+                        context_data["private_subnets_to_create"].append({
+                            'name': name,
+                            'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
+                            'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i],
+                            'is_public': False
+                        })
+
+            except (ValueError, Exception) as e:
+                print(f"\nFATAL ERROR: Subnet parameter validation failed: {e}\n")
+                raise SystemExit(1)  # Exit if validation fails

     # Example checks and setting context values
     # IAM Roles
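The readiness test and the existing-subnet filter in the hunk above reduce to two small functions. A sketch with plain lists and dicts standing in for the module's globals: full validation needs a CIDR block and an availability zone for every named subnet, and subnets the name check already found are dropped from the proposal list.

```python
def ready_for_full_validation(names, cidrs, azs):
    """A subnet group is ready when every name has a matching CIDR and AZ."""
    return len(names) > 0 and len(cidrs) == len(names) and len(azs) == len(names)

def drop_existing_subnets(proposed, checked):
    """Remove proposed subnets that the name-only check found already exist."""
    return [s for s in proposed if not checked.get(s["name"], {}).get("exists")]
```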
cdk/post_cdk_build_quickstart.py
CHANGED
@@ -13,10 +13,10 @@ start_codebuild_build(PROJECT_NAME=CODEBUILD_PROJECT_NAME)
 # Upload config.env file to S3 bucket
 upload_file_to_s3(local_file_paths="config/config.env", s3_key="", s3_bucket=S3_LOG_CONFIG_BUCKET_NAME)

-total_seconds =
+total_seconds = 450  # 7.5 minutes
 update_interval = 1  # Update every second

-print("Waiting
+print("Waiting 7.5 minutes for the CodeBuild container to build.")

 # tqdm iterates over a range, and you perform a small sleep in each iteration
 for i in tqdm(range(total_seconds), desc="Building container"):
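The build wait above amounts to ticking once per second for 450 seconds so a progress bar can advance each iteration. A minimal sketch with an injectable sleep function (the real script wraps the loop in a tqdm progress bar instead of returning a count):

```python
import time

def wait_with_progress(total_seconds, sleep=time.sleep):
    """Tick once per second for the whole window; sleep is injectable for testing."""
    ticks = 0
    for _ in range(total_seconds):
        sleep(1)  # the real loop sleeps update_interval inside tqdm
        ticks += 1
    return ticks
```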
cdk/requirements.txt
CHANGED
@@ -1,5 +1,5 @@
-aws-cdk-lib==2.
-boto3==1.38.
-pandas==2.
+aws-cdk-lib==2.202.0
+boto3==1.38.41
+pandas==2.3.0
 nodejs==0.1.1
 python-dotenv==1.0.1
index.qmd
CHANGED
@@ -2,7 +2,7 @@
 title: "Home"
 ---

-version: 0.7.
+version: 0.7.1

 Welcome to the Document Redaction App documentation. This site provides comprehensive documentation for the Document Redaction App.

pyproject.toml
CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "doc_redaction"
-version = "0.7.
+version = "0.7.1"
 description = "Redact PDF/image-based documents, or CSV/XLSX files using a Gradio-based GUI interface"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -18,13 +18,13 @@ dependencies = [
     "presidio_anonymizer==2.2.358",
     "presidio-image-redactor==0.0.56",
     "pikepdf==9.5.2",
-    "pandas==2.
+    "pandas==2.3.0",
     "scikit-learn==1.6.1",
-    "spacy==3.8.
+    "spacy==3.8.7",
     # Direct URL dependency for spacy model
     "en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0.tar.gz",
-    "gradio==5.34.
-    "boto3==1.38.
+    "gradio==5.34.2",
+    "boto3==1.38.46",
     "pyarrow==19.0.1",
     "openpyxl==3.1.5",
     "Faker==36.1.1",
requirements.txt
CHANGED
@@ -6,12 +6,12 @@ presidio_analyzer==2.2.358
 presidio_anonymizer==2.2.358
 presidio-image-redactor==0.0.56
 pikepdf==9.5.2
-pandas==2.
+pandas==2.3.0
 scikit-learn==1.6.1
-spacy==3.8.
+spacy==3.8.7
 en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0.tar.gz
-gradio==5.34.
-boto3==1.38.
+gradio==5.34.2
+boto3==1.38.46
 pyarrow==19.0.1
 openpyxl==3.1.5
 Faker==36.1.1
src/installation_guide.qmd
CHANGED
@@ -1,5 +1,5 @@
 ---
-title: "App installation guide (with CDK)"
 format:
   html:
     toc: true # Enable the table of contents
@@ -7,17 +7,17 @@ format:
   toc-title: "On this page" # Optional: Title for your TOC
 ---

-#

-This guide gives an overview of how to install the app in an AWS environment using the code in the cdk/ folder of this Github repo. The most important thing you need is some familiarity with AWS and how to use it via console or command line, as well as administrator access to at least one region. Then follow the below steps.

 ## Prerequisites

 * Install git on your computer from: [https://git-scm.com](https://git-scm.com)
-*
-*
-*
-* You will need to bootstrap the environment with CDK in both your primary region, and `us-east-1` if installing CloudFront and associated WAF.
 ```bash
 # Bootstrap your primary region
 cdk bootstrap aws://<YOUR_AWS_ACCOUNT>/eu-west-1
@@ -30,27 +30,23 @@ This guide gives an overview of how to install the app in an AWS environment usi
 git clone https://github.com/seanpedrick-case/doc_redaction.git
 ```

-
-
-
-
-You can either use the SSL certificate from a domain, or import an existing certificate into Certificate Manager. Ask your IT admin if you need help with this.
-
-## If getting an SSL certificate for an existing domain
-
-Make sure to point the certificate to `*.<domain-name>`.

 Update your DNS records to include the CNAME record given by AWS. After your stack has been created, you will also need to create a CNAME DNS record for your domain pointing to your load balancer DNS with a subdomain, e.g., `redaction.<domain-name>`.

-

 ```json
 {
-    "app": "<PATH TO PYTHON ENVIRONMENT FOLDER WHERE REQUIREMENTS HAVE BEEN LOADED
     "context": {
         "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
         "@aws-cdk/core:stackRelativeExports": true,
@@ -71,99 +67,129 @@ Update your DNS records to include the CNAME record given by AWS. After your sta
 }
 ```

-2. Create a `cdk_config.env` file in the `config` subfolder.

 ```ini
-CDK_PREFIX=example-prefix #
-
-AWS_REGION
 AWS_ACCOUNT_ID=1234567890 # AWS account ID that has administrator access that you will use for deploying the stack
 CDK_FOLDER=C:/path_to_cdk_folder/ # The place where the cdk folder code is located
 CONTEXT_FILE=C:/path_to_cdk_folder/cdk.context.json
-
-
-
 RUN_USEAST_STACK=False # Set this to True only if you have permissions to create a Cloudfront distribution and web ACL on top of it in the us-east-1 region. If you don't, the section below shows how you can create the CloudFront resource manually and map it to your application load balancer (as you should have permissions for that if you are admin in your region).
 ```

-**Note: If you are using an SSL certificate with Cognito login on the application load balancer, you can set COGNITO_AUTH to 0 above, as you don't need the second login step to get to the app**
**Note: If you are using an SSL certificate with Cognito login on the application load balancer, you can set COGNITO_AUTH to 0 above, as you don't need the second login step to get to the app**
|
90 |
|
91 |
-
|
92 |
|
93 |
-
|
94 |
|
95 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
96 |
|
97 |
* If you want to use existing subnets, you can list them in the following environment variables:
|
98 |
-
|
99 |
-
|
|
|
|
|
100 |
|
101 |
* If you want to create new subnets, you need to also specify CIDR blocks and availability zones for the new subnets. The app will check with you upon deployment whether these CIDR blocks are available before trying to create.
|
102 |
-
|
103 |
-
|
104 |
-
|
105 |
-
|
|
|
|
|
|
|
106 |
|
107 |
If you try to create subnets in invalid CIDR blocks / availability zones, the console output will tell you and it will show you the currently occupied CIDR blocks to help find a space for new subnets you want to create.
|
108 |
|
109 |
-
3.
|
|
|
|
|
|
|
|
|
|
|
|
|
110 |
|
111 |
-
|
112 |
|
113 |
-
|
114 |
|
115 |
-
|
116 |
|
117 |
-
|
118 |
|
119 |
-
|
120 |
|
121 |
-
|
122 |
|
123 |
-
|
124 |
|
125 |
-
|
126 |
|
127 |
-
|
128 |
|
129 |
-
|
130 |
|
131 |
```ini
|
132 |
-
COGNITO_AUTH=1
|
133 |
-
RUN_AWS_FUNCTIONS=1
|
134 |
-
SESSION_OUTPUT_FOLDER=True #
|
135 |
```
|
136 |
|
137 |
-
|
138 |
|
139 |
-
|
140 |
|
141 |
Now that the app container is in Elastic Container Registry, you can proceed to run the app on a Fargate server.
|
142 |
-
|
143 |
-
Go to your new cluster, your new service, and select 'Update service'.
|
144 |
|
145 |
Select 'Force new deployment', and then set 'Desired number of tasks' to 1.
|
146 |
|
147 |
-
|
148 |
-
|
149 |
-
# Update DNS records for your domain (If using a domain for the SSL certificate)
|
150 |
-
|
151 |
-
To do this, you need to create a CNAME DNS record for your domain pointing to your load balancer DNS from a subdomain of your main domain registration, e.g., `redaction.<domain-name>`.
|
152 |
|
153 |
-
|
154 |
|
155 |
-
|
156 |
|
157 |
-
|
158 |
|
159 |
-
|
160 |
|
161 |
-
|
|
|
|
|
|
|
162 |
|
163 |
-
|
164 |
-
On the Cognito user pool page you can also enable MFA, if you are using an SSL certificate with Cognito login on the Application Load Balancer. Go to Cognito -> your user pool -> Sign in -> Multi-factor authentication
|
165 |
|
166 |
-
|
167 |
**Note: this is only relevant if you set `RUN_USEAST_STACK` to 'False' during CDK deployment**
|
168 |
|
169 |
If you were not able to create a CloudFront distribution via CDK, you should be able to do it through console. I would advise using CloudFront as the front end to the app.
|
@@ -186,25 +212,19 @@ Create a new CloudFront distribution.
|
|
186 |
* **For Behavior (modify default behavior):**
|
187 |
* Under Viewer protocol policy choose 'HTTP and HTTPS'.
|
188 |
|
189 |
-
|
190 |
|
191 |
-
|
192 |
|
193 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
194 |
|
195 |
-
|
196 |
-
|
197 |
-
In CloudFront geographic restrictions -> Countries -> choose an Allow list of countries.
|
198 |
-
|
199 |
-
Click again on Edit.
|
200 |
-
|
201 |
-
AWS WAF protection enabled you should see a link titled 'View details of your configuration'.
|
202 |
-
|
203 |
-
Go to Rules -> `AWS-AWSManagedRulesCommonRuleSet`, click Edit.
|
204 |
-
|
205 |
-
Under `SizeRestrictions_BODY` choose rule action override 'Override to Allow'. This is needed to allow for file upload to the app.
|
206 |
-
|
207 |
-
# Change Cognito redirection URL to your CloudFront distribution
|
208 |
|
209 |
Go to Cognito -> your user pool -> App Clients -> Login pages -> Managed login configuration.
|
210 |
|
@@ -215,7 +235,7 @@ Ensure that the callback URL is:
|
|
215 |
* `https://<CloudFront domain name>/oauth2/idpresponse`
|
216 |
* `https://<CloudFront domain name>/oauth/idpresponse`
|
217 |
|
218 |
-
|
219 |
|
220 |
Note that this only potentially helps with security if you are not using an SSL certificate with Cognito login on your application load balancer.
|
221 |
|
@@ -231,3 +251,5 @@ Go to EC2 - Load Balancers -> Your load balancer -> Listeners -> Your listener -
|
|
231 |
Then, change the default listener rule.
|
232 |
|
233 |
* Under Routing action change to 'Return fixed response'.
|
|
|
|
|
|
The updated file:

---
title: "App installation guide (with CDK or locally on Windows)"
format:
  html:
    toc: true # Enable the table of contents
[...]
toc-title: "On this page" # Optional: Title for your TOC
---

# Installation with CDK

This guide gives an overview of how to install the app in an AWS environment using the code in the 'cdk/' folder of this Github repo. The most important thing you need is some familiarity with AWS and how to use it via console or command line, as well as administrator access to at least one region. Then follow the below steps.

## Prerequisites

* Ensure you have an AWS Administrator account in your desired region to be able to deploy all the resources mentioned in cdk_stack.py.
* Install git on your computer from: [https://git-scm.com](https://git-scm.com)
* Install nodejs and npm: [https://docs.npmjs.com/downloading-and-installing-node-js-and-npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm). If using Windows, it may be easiest to install from the .msi installer at the bottom of the page [here](https://nodejs.org/en/download/).
* Install AWS CDK v2: [https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
* Bootstrap the environment with CDK in both your primary region, and `us-east-1` if installing CloudFront and associated WAF.
```bash
# Bootstrap your primary region
cdk bootstrap aws://<YOUR_AWS_ACCOUNT>/eu-west-1
[...]
git clone https://github.com/seanpedrick-case/doc_redaction.git
```

## Note on ACM Certificates

To get full HTTPS data transfer through the app, you will need an SSL certificate registered with AWS Certificate Manager.

You can either use the SSL certificate from a domain, or import an existing certificate into Certificate Manager. If you're not sure, ask your IT admin for help with this. If getting an SSL certificate for an existing domain, make sure to point the certificate to `*.<domain-name>`.

Update your DNS records to include the CNAME record given by AWS. After your stack has been created, you will also need to create a CNAME DNS record for your domain pointing to your load balancer DNS with a subdomain, e.g., `redaction.<domain-name>`.

## Steps to install the app using CDK

### 1. Create a Python environment and load in the packages from `requirements.txt`

You need a `cdk.json` file in the `cdk` folder. It should contain the following:

```json
{
    "app": "<PATH TO PYTHON ENVIRONMENT FOLDER WHERE REQUIREMENTS HAVE BEEN LOADED>/python.exe app.py",
    "context": {
        "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
        "@aws-cdk/core:stackRelativeExports": true,
        [...]
}
```

### 2. Create a `cdk_config.env` file in the `config` subfolder

Depending on which environment variables you put in this file, you can choose whether to install the app in a completely new VPC or in an existing VPC. The following are example config files that you could use.

#### Deploying the app in a brand new VPC

As a minimum, it would be useful to put the following details in the `cdk_config.env` file (below are all example values; other possible variables can be seen in `cdk_config.py` in the `cdk` folder).

```ini
CDK_PREFIX=example-prefix # This prefix will be added to the name of most of the created elements in your stack
NEW_VPC_CIDR=10.0.0.0/24 # The CIDR range for your newly created VPC
AWS_REGION=<your-region> # Region where elements will be created
AWS_ACCOUNT_ID=1234567890 # AWS account ID that has administrator access that you will use for deploying the stack
CDK_FOLDER=C:/path_to_cdk_folder/ # The place where the cdk folder code is located
CONTEXT_FILE=C:/path_to_cdk_folder/cdk.context.json

COGNITO_USER_POOL_DOMAIN_PREFIX=redaction-12345 # The prefix of the login / user sign up domain that you want to use with Cognito login. Should not contain the terms amazon, aws, or cognito.
COGNITO_AUTH=1 # Whether you want in-app authentication (username and password only; not necessary if you are using an SSL certificate as recommended below)
USE_CLOUDFRONT=True # Recommended. Set this if you intend to use CloudFront as the front URL to your application load balancer (ALB). This has some extra security features that you won't get with just an ALB, e.g. limiting app access by country.
RUN_USEAST_STACK=False # Set this to True only if you have permissions to create a Cloudfront distribution and web ACL on top of it in the us-east-1 region. If you don't, the section below shows how you can create the CloudFront resource manually and map it to your application load balancer (as you should have permissions for that if you are admin in your region).
CLOUDFRONT_DOMAIN=<example>.cloudfront.net # If you already know the domain of the CloudFront distribution that you want to use, you can add this here.
# If you are using an SSL certificate with your ALB (highly recommended):
ACM_SSL_CERTIFICATE_ARN=<SSL Certificate ARN> # The ARN of the SSL certificate that you have installed in AWS Certificate Manager
SSL_CERTIFICATE_DOMAIN=redaction.example.com # The domain of the SSL certificate that you have installed in AWS Certificate Manager
```

**Note: If you are using an SSL certificate with Cognito login on the application load balancer (strongly advised), you can set COGNITO_AUTH to 0 above, as you don't need the second login step to get to the app.**
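Before deploying, the config file can be sanity-checked with a few lines of Python. This is a hypothetical helper, not part of the repo; the required key names simply follow the new-VPC example above:

```python
def parse_env(text):
    """Parse simple KEY=value lines (cdk_config.env style), skipping blank
    lines, full-line comments, and trailing ' # ' inline comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop the inline comment style used in the examples above
        config[key.strip()] = value.split(" #", 1)[0].strip()
    return config

# Keys the new-VPC example above treats as the minimum (an assumption, not a repo API)
REQUIRED_KEYS = ["CDK_PREFIX", "NEW_VPC_CIDR", "AWS_REGION",
                 "AWS_ACCOUNT_ID", "CDK_FOLDER", "CONTEXT_FILE"]

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the names of required keys that are absent or empty."""
    return [k for k in required if not config.get(k)]
```

For example, `missing_keys(parse_env(open("config/cdk_config.env").read()))` returns an empty list when the minimum set is present.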

#### In an existing VPC

From the above example, remove the variable 'NEW_VPC_CIDR' and replace it with the below:

```ini
VPC_NAME=example-vpc-name # Name of the VPC within which all the other elements will be created
EXISTING_IGW_ID=igw-1234567890 # (optional) The ID for an existing internet gateway that you want to use instead of creating a new one
SINGLE_NAT_GATEWAY_ID=nat-123456789 # (optional) The ID for an existing NAT gateway that you want to use instead of creating a new one
```

##### Subnets

If you are using an existing VPC then you may want to deploy the app within existing subnets rather than creating new ones:

* If you define no subnets in environment variables, the app will try to use existing private and public subnets. Bear in mind the app may overlap with IP addresses assigned to existing AWS resources. It is advised to at least specify existing subnets that you know are available, or create your own using one of the below methods.

* If you want to use existing subnets, you can list them in the following environment variables:

```ini
PUBLIC_SUBNETS_TO_USE=["PublicSubnet1", "PublicSubnet2", "PublicSubnet3"]
PRIVATE_SUBNETS_TO_USE=["PrivateSubnet1", "PrivateSubnet2", "PrivateSubnet3"]
```

* If you want to create new subnets, you need to also specify CIDR blocks and availability zones for the new subnets. The app will check with you upon deployment whether these CIDR blocks are available before trying to create them.

```ini
PUBLIC_SUBNET_CIDR_BLOCKS=['10.222.33.0/28', '10.222.33.16/28', '10.222.33.32/28']
PUBLIC_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']
PRIVATE_SUBNET_CIDR_BLOCKS=['10.222.33.48/28', '10.222.33.64/28', '10.222.33.80/28']
PRIVATE_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']
```

If you try to create subnets in invalid CIDR blocks / availability zones, the console output will tell you and it will show you the currently occupied CIDR blocks to help find a space for new subnets you want to create.
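You can approximate that availability check locally with Python's standard `ipaddress` module before deploying. This is a rough sketch of the idea, not the stack's actual validation code:

```python
import ipaddress

def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems: subnets that fall outside the VPC range,
    or proposed subnets that overlap one another."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    # Every proposed subnet must sit inside the VPC's CIDR range
    for s in subnets:
        if not s.subnet_of(vpc):
            problems.append(f"{s} is not inside VPC range {vpc}")
    # Proposed subnets must not overlap each other
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems
```

For example, `check_subnets("10.0.0.0/24", ["10.0.0.0/28", "10.0.0.16/28"])` returns an empty list, while a subnet outside the VPC range or overlapping another produces a description of the problem. To also catch collisions with subnets that already exist in the VPC, pass their CIDRs into the same overlap check.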

### 3. Deploy your AWS stack using cdk deploy --all

In a command line console, go to the `cdk` folder within the redaction app folder. Run `cdk deploy --all`. This should try to deploy the first stack in the `app.py` file.

Hopefully everything will deploy successfully and you will be able to see your new stack in CloudFormation in the AWS console.

### 4. Tasks for after CDK deployment

The CDK deployment will create all the AWS resources needed to run the redaction app. However, there are some objects in AWS that still need to be set up after deployment, described below.

#### Run `post_cdk_build_quickstart.py`

The following tasks are done by the `post_cdk_build_quickstart.py` file that you can find in the `cdk` folder. You will need to run this while logged in with AWS SSO through the command line. I will also describe how to do each task in the AWS console, just in case the `.py` file doesn't work for you.

##### CodeBuild

You need to run the CodeBuild project after CloudFormation has finished deploying your CDK stack, as until then there will be no container in ECR.

If you don't want to run the 'post_cdk_build_quickstart.py' file: in the console, go to CodeBuild -> your project -> click Start build. Check the logs; the build should complete in about 6-7 minutes.

##### Create a `config.env` file and upload it to S3

The 'post_cdk_build_quickstart.py' file will upload a config file to S3, as the Fargate task definition references a `config.env` file.

If you want to do this manually:

Create a `config.env` file that has at least the following variables, to upload to the S3 bucket:

```ini
COGNITO_AUTH=1 # If you are using an SSL certificate with your application load balancer, you will be logging in there. Set this to 0 to turn off the default login screen.
RUN_AWS_FUNCTIONS=1 # This will enable the app to communicate with AWS services.
SESSION_OUTPUT_FOLDER=True # This will put outputs for each user in separate output folders.
```

* Then, go to S3 and choose the new `...-logs` bucket that you created. Upload the `config.env` file into this bucket.
|
167 |
+
##### Update Elastic Container Service
|
168 |
|
169 |
Now that the app container is in Elastic Container Registry, you can proceed to run the app on a Fargate server.
|
170 |
+
The 'post_cdk_build_quickstart.py' file will do this for you, but you can also try this in Console. In ECS, go to your new cluster, your new service, and select 'Update service'.
|
|
|
171 |
|
172 |
Select 'Force new deployment', and then set 'Desired number of tasks' to 1.
|
173 |
|
174 |
+
## Additional Manual Tasks
|
|
|
|
|
|
|
|
|
175 |
|
176 |
+
### Update DNS records for your domain (If using a domain for the SSL certificate)
|
177 |
|
178 |
+
If the SSL certificate you are using is associated with a domain, you will need to update the DNS records for your domain registered with the AWS SSL certificate. To do this, you need to create a CNAME DNS record for your domain pointing to your load balancer DNS from a subdomain of your main domain registration, e.g., `redaction.<domain-name>`.
|
179 |
|
180 |
+
### Create a user in Cognito
|
181 |
|
182 |
+
You will next need to a create a user in Cognito to be able to log into the app.
|
183 |
|
184 |
+
* Go to Cognito and create a user with your own email address. Generate a password.
|
185 |
+
* Go to Cognito -> App clients -> Login pages -> View login page.
|
186 |
+
* Enter the email and temporary password details that come in the email (don't include the last full stop!).
|
187 |
+
* Change your password on the screen that pops up. You should now be able to login to the app.
|
188 |
|
189 |
+
### Set Multi-Factor Authentication for Cognito logins(optional but recommended)
|
190 |
+
On the Cognito user pool page you can also enable MFA, if you are using an SSL certificate with Cognito login on the Application Load Balancer. Go to Cognito -> your user pool -> Sign in -> Multi-factor authentication.
|
191 |
|
192 |
+
### Create CloudFront distribution
|
193 |
**Note: this is only relevant if you set `RUN_USEAST_STACK` to 'False' during CDK deployment**
|
194 |
|
195 |
If you were not able to create a CloudFront distribution via CDK, you should be able to do it through console. I would advise using CloudFront as the front end to the app.
|
|
|
212 |
* **For Behavior (modify default behavior):**
|
213 |
* Under Viewer protocol policy choose 'HTTP and HTTPS'.
|
214 |
|
215 |
+
#### Security features
|
216 |
|
217 |
+
You can add security features to your CloudFront distribution (recommended). If you use WAF, you will also need to change the default settings to allow for file upload to the app.
|
218 |
|
219 |
+
* In your CloudFront distribution, under 'Security' -> Edit -> Enable security protections.
|
220 |
+
* Choose rate limiting (default is fine). Then click Create.
|
221 |
+
* In CloudFront geographic restrictions -> Countries -> choose an Allow list of countries.
|
222 |
+
* Click again on Edit.
|
223 |
+
* In AWS WAF protection enabled you should see a link titled 'View details of your configuration'.
|
224 |
+
* Go to Rules -> `AWS-AWSManagedRulesCommonRuleSet`, click Edit.
|
225 |
+
* Under `SizeRestrictions_BODY` choose rule action override 'Override to Allow'. This is needed to allow for file upload to the app.
|
226 |
|
227 |
+
### Change Cognito redirection URL to your CloudFront distribution
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
228 |
|
229 |
Go to Cognito -> your user pool -> App Clients -> Login pages -> Managed login configuration.
|
230 |
|
|
|
235 |
* `https://<CloudFront domain name>/oauth2/idpresponse`
|
236 |
* `https://<CloudFront domain name>/oauth/idpresponse`
|
237 |
|
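A trivial helper can generate those two callback URLs for a given distribution domain, which avoids typos when pasting them into the Cognito console. This is a convenience sketch, not part of the repo:

```python
def cognito_callback_urls(cloudfront_domain):
    """Build the two idpresponse callback URLs to register in the
    Cognito app client for a given CloudFront domain name."""
    base = "https://" + cloudfront_domain.strip().rstrip("/")
    return [base + "/oauth2/idpresponse", base + "/oauth/idpresponse"]
```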
### Force traffic to come from a specific CloudFront distribution (optional)

Note that this only potentially helps with security if you are not using an SSL certificate with Cognito login on your application load balancer.

[...]

Then, change the default listener rule.

* Under Routing action change to 'Return fixed response'.

You should now have successfully installed the document redaction app in an AWS environment using CDK.