seanpedrickcase committed
Commit 9f51e70 · 1 Parent(s): 10da194

Updated CDK code for custom KMS keys, new VPCs. Minor package updates.
.dockerignore CHANGED
@@ -13,6 +13,7 @@ tesseract/*
 poppler/*
 build/*
 dist/*
+docs/*
 build_deps/*
 user_guide/*
 cdk/config/*
.github/workflows/check_file_size.yml DELETED
@@ -1,16 +0,0 @@
-name: Check file size
-on: # or directly `on: [push]` to run the action on every push on any branch
-  pull_request:
-    branches: [main]
-
-  # to run this workflow manually from the Actions tab
-  workflow_dispatch:
-
-jobs:
-  sync-to-hub:
-    runs-on: ubuntu-latest
-    steps:
-    - name: Check large files
-      uses: ActionsDesk/[email protected]
-      with:
-        filesizelimit: 10485760 # this is 10MB so we can sync to HF Spaces
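The deleted workflow capped tracked files at 10485760 bytes (10 MB) so the repository could sync to HF Spaces. The same check can be approximated locally; this is a minimal sketch of the idea, not the ActionsDesk action itself:

```python
import os

# 10 MB limit, matching the workflow's filesizelimit of 10485760 bytes
SIZE_LIMIT = 10 * 1024 * 1024

def find_large_files(root: str, limit: int = SIZE_LIMIT) -> list:
    """Return paths under `root` whose size exceeds `limit` bytes."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > limit:
                    offenders.append(path)
            except OSError:
                pass  # skip unreadable files (e.g. broken symlinks)
    return offenders

if __name__ == "__main__":
    for path in find_large_files("."):
        print(f"Over 10MB limit: {path}")
```

Running this before a push flags the same files the workflow would have rejected.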
.gitignore CHANGED
@@ -25,6 +25,7 @@ cdk/cdk.out/*
 cdk/archive/*
 tld/*
 tmp/*
+docs/*
 cdk.out/*
 cdk.json
 cdk.context.json
README.md CHANGED
@@ -10,7 +10,7 @@ license: agpl-3.0
 ---
 # Document redaction
 
-version: 0.7.0
+version: 0.7.1
 
 Redact personally identifiable information (PII) from documents (pdf, images), open text, or tabular data (xlsx/csv/parquet). Please see the [User Guide](#user-guide) for a walkthrough on how to use the app. Below is a very brief overview.
 
cdk/cdk_config.py CHANGED
@@ -82,8 +82,12 @@ CONTEXT_FILE = get_or_create_env_var('CONTEXT_FILE', 'cdk.context.json') # Defin
 CDK_FOLDER = get_or_create_env_var('CDK_FOLDER', '') # FULL_PATH_TO_CDK_FOLDER_HERE (with forward slash)
 RUN_USEAST_STACK = get_or_create_env_var('RUN_USEAST_STACK', 'False')
 
-### VPC
+### VPC and connections
 VPC_NAME = get_or_create_env_var('VPC_NAME', '')
+NEW_VPC_DEFAULT_NAME = get_or_create_env_var('NEW_VPC_DEFAULT_NAME', f'{CDK_PREFIX}vpc')
+NEW_VPC_CIDR = get_or_create_env_var('NEW_VPC_CIDR', '') # "10.0.0.0/24"
+
+
 EXISTING_IGW_ID = get_or_create_env_var('EXISTING_IGW_ID', '')
 SINGLE_NAT_GATEWAY_ID = get_or_create_env_var('SINGLE_NAT_GATEWAY_ID', '')
 
@@ -122,6 +126,10 @@ ECR_CDK_REPO_NAME = get_or_create_env_var('ECR_CDK_REPO_NAME', f"{CDK_PREFIX}{EC
 S3_LOG_CONFIG_BUCKET_NAME = get_or_create_env_var('S3_LOG_CONFIG_BUCKET_NAME', f"{CDK_PREFIX}s3-logs".lower()) # S3 bucket names need to be lower case
 S3_OUTPUT_BUCKET_NAME = get_or_create_env_var('S3_OUTPUT_BUCKET_NAME', f"{CDK_PREFIX}s3-output".lower())
 
+### KMS KEYS FOR S3 AND SECRETS MANAGER
+USE_CUSTOM_KMS_KEY = get_or_create_env_var('USE_CUSTOM_KMS_KEY', '1')
+CUSTOM_KMS_KEY_NAME = get_or_create_env_var('CUSTOM_KMS_KEY_NAME', f"alias/{CDK_PREFIX}kms-key".lower())
+
 ### ECS
 FARGATE_TASK_DEFINITION_NAME = get_or_create_env_var('FARGATE_TASK_DEFINITION_NAME', f"{CDK_PREFIX}FargateTaskDefinition")
 TASK_DEFINITION_FILE_LOCATION = get_or_create_env_var('TASK_DEFINITION_FILE_LOCATION', CDK_FOLDER + CONFIG_FOLDER + "task_definition.json")
@@ -160,7 +168,7 @@ CLOUDFRONT_DOMAIN = get_or_create_env_var('CLOUDFRONT_DOMAIN', "cloudfront_place
 
 
 # Certificate for Application load balancer (optional, for HTTPS and logins through the ALB)
-ACM_CERTIFICATE_ARN = get_or_create_env_var('ACM_CERTIFICATE_ARN', '')
+ACM_SSL_CERTIFICATE_ARN = get_or_create_env_var('ACM_SSL_CERTIFICATE_ARN', '')
 SSL_CERTIFICATE_DOMAIN = get_or_create_env_var('SSL_CERTIFICATE_DOMAIN', '') # e.g. example.com or www.example.com
 
 # This should be the CloudFront domain, the domain linked to your ACM certificate, or the DNS of your application load balancer in console afterwards
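Every setting above is read through `get_or_create_env_var`, whose definition is not part of this diff. A minimal sketch of the pattern it implies (read the environment variable, otherwise register and return the default) looks like this; the `CDK_PREFIX` value here is a stand-in for illustration:

```python
import os

def get_or_create_env_var(var_name: str, default_value: str) -> str:
    """Return the environment variable if set; otherwise set it to the default and return that.

    Sketch of the helper assumed by cdk_config.py, not the repo's exact implementation.
    """
    value = os.environ.get(var_name)
    if value is None:
        os.environ[var_name] = default_value
        value = default_value
    return value

# Usage mirroring the new KMS settings above (CDK_PREFIX is a hypothetical value)
CDK_PREFIX = "example-"
USE_CUSTOM_KMS_KEY = get_or_create_env_var('USE_CUSTOM_KMS_KEY', '1')
CUSTOM_KMS_KEY_NAME = get_or_create_env_var('CUSTOM_KMS_KEY_NAME', f"alias/{CDK_PREFIX}kms-key".lower())
```

The side effect of writing the default back into `os.environ` means later reads of the same variable, anywhere in the process, see a consistent value.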
cdk/cdk_stack.py CHANGED
@@ -21,11 +21,12 @@ from aws_cdk import (
     aws_elasticloadbalancingv2 as elbv2,
     aws_logs as logs,
     aws_wafv2 as wafv2,
+    aws_kms as kms,
     aws_dynamodb as dynamodb # Import the DynamoDB module
 )
 
 from constructs import Construct
-from cdk_config import CDK_PREFIX, VPC_NAME, AWS_MANAGED_TASK_ROLES_LIST, GITHUB_REPO_USERNAME, GITHUB_REPO_NAME, GITHUB_REPO_BRANCH, ECS_TASK_MEMORY_SIZE, ECS_TASK_CPU_SIZE, CUSTOM_HEADER, CUSTOM_HEADER_VALUE, AWS_REGION, CLOUDFRONT_GEO_RESTRICTION, DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS, GRADIO_SERVER_PORT, PUBLIC_SUBNETS_TO_USE, PUBLIC_SUBNET_CIDR_BLOCKS, PUBLIC_SUBNET_AVAILABILITY_ZONES, PRIVATE_SUBNETS_TO_USE, PRIVATE_SUBNET_CIDR_BLOCKS, PRIVATE_SUBNET_AVAILABILITY_ZONES, CODEBUILD_PROJECT_NAME, ECS_SECURITY_GROUP_NAME, ALB_NAME_SECURITY_GROUP_NAME, ALB_NAME, COGNITO_USER_POOL_NAME, COGNITO_USER_POOL_CLIENT_NAME, COGNITO_USER_POOL_CLIENT_SECRET_NAME, FARGATE_TASK_DEFINITION_NAME, ECS_SERVICE_NAME, WEB_ACL_NAME, CLOUDFRONT_DISTRIBUTION_NAME, ECS_TASK_ROLE_NAME, ALB_TARGET_GROUP_NAME, S3_LOG_CONFIG_BUCKET_NAME, S3_OUTPUT_BUCKET_NAME, ACM_CERTIFICATE_ARN, CLUSTER_NAME, CODEBUILD_ROLE_NAME, ECS_TASK_EXECUTION_ROLE_NAME, ECR_CDK_REPO_NAME, ECS_LOG_GROUP_NAME, SAVE_LOGS_TO_DYNAMODB, ACCESS_LOG_DYNAMODB_TABLE_NAME, FEEDBACK_LOG_DYNAMODB_TABLE_NAME, USAGE_LOG_DYNAMODB_TABLE_NAME, TASK_DEFINITION_FILE_LOCATION, EXISTING_IGW_ID, SINGLE_NAT_GATEWAY_ID, NAT_GATEWAY_NAME, COGNITO_USER_POOL_DOMAIN_PREFIX, COGNITO_REDIRECTION_URL, AWS_ACCOUNT_ID, ECS_USE_FARGATE_SPOT, ECS_READ_ONLY_FILE_SYSTEM, USE_CLOUDFRONT, LOAD_BALANCER_WEB_ACL_NAME
+from cdk_config import CDK_PREFIX, VPC_NAME, AWS_MANAGED_TASK_ROLES_LIST, GITHUB_REPO_USERNAME, GITHUB_REPO_NAME, GITHUB_REPO_BRANCH, ECS_TASK_MEMORY_SIZE, ECS_TASK_CPU_SIZE, CUSTOM_HEADER, CUSTOM_HEADER_VALUE, AWS_REGION, CLOUDFRONT_GEO_RESTRICTION, DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS, GRADIO_SERVER_PORT, PUBLIC_SUBNETS_TO_USE, PUBLIC_SUBNET_CIDR_BLOCKS, PUBLIC_SUBNET_AVAILABILITY_ZONES, PRIVATE_SUBNETS_TO_USE, PRIVATE_SUBNET_CIDR_BLOCKS, PRIVATE_SUBNET_AVAILABILITY_ZONES, CODEBUILD_PROJECT_NAME, ECS_SECURITY_GROUP_NAME, ALB_NAME_SECURITY_GROUP_NAME, ALB_NAME, COGNITO_USER_POOL_NAME, COGNITO_USER_POOL_CLIENT_NAME, COGNITO_USER_POOL_CLIENT_SECRET_NAME, FARGATE_TASK_DEFINITION_NAME, ECS_SERVICE_NAME, WEB_ACL_NAME, CLOUDFRONT_DISTRIBUTION_NAME, ECS_TASK_ROLE_NAME, ALB_TARGET_GROUP_NAME, S3_LOG_CONFIG_BUCKET_NAME, S3_OUTPUT_BUCKET_NAME, ACM_SSL_CERTIFICATE_ARN, CLUSTER_NAME, CODEBUILD_ROLE_NAME, ECS_TASK_EXECUTION_ROLE_NAME, ECR_CDK_REPO_NAME, ECS_LOG_GROUP_NAME, SAVE_LOGS_TO_DYNAMODB, ACCESS_LOG_DYNAMODB_TABLE_NAME, FEEDBACK_LOG_DYNAMODB_TABLE_NAME, USAGE_LOG_DYNAMODB_TABLE_NAME, TASK_DEFINITION_FILE_LOCATION, EXISTING_IGW_ID, SINGLE_NAT_GATEWAY_ID, NAT_GATEWAY_NAME, COGNITO_USER_POOL_DOMAIN_PREFIX, COGNITO_REDIRECTION_URL, AWS_ACCOUNT_ID, ECS_USE_FARGATE_SPOT, ECS_READ_ONLY_FILE_SYSTEM, USE_CLOUDFRONT, LOAD_BALANCER_WEB_ACL_NAME, NEW_VPC_DEFAULT_NAME, NEW_VPC_CIDR, USE_CUSTOM_KMS_KEY, S3_KMS_KEY_NAME
 from cdk_functions import create_subnets, create_web_acl_with_common_rules, add_custom_policies, add_alb_https_listener_with_cert, create_nat_gateway # Only keep CDK-native functions
 
 def _get_env_list(env_var_name: str) -> List[str]:
@@ -67,20 +68,80 @@ class CdkStack(Stack):
             print(f"Warning: Context key '{key}' not found or not a list. Returning empty list.")
             return []
             # Optional: Add validation that all items in the list are dicts
-            return ctx_value
+            return ctx_value
+
+        self.template_options.description = "Deployment of the 'doc_redaction' PDF, image, and XLSX/CSV redaction app. Git repo available at: https://github.com/seanpedrick-case/doc_redaction."
 
 
         # --- VPC and Subnets (Assuming VPC is always lookup, Subnets are created/returned by create_subnets) ---
-        # --- VPC Lookup (Always lookup as per your assumption) ---
-        try:
-            vpc = ec2.Vpc.from_lookup(
-                self,
-                "VPC",
-                vpc_name=VPC_NAME
-            )
-            print("Successfully looked up VPC:", vpc.vpc_id)
-        except Exception as e:
-            raise Exception(f"Could not look up VPC with name '{VPC_NAME}' due to: {e}")
+        new_vpc_created = False
+        if VPC_NAME:
+            print("Looking for current VPC:", VPC_NAME)
+            try:
+                vpc = ec2.Vpc.from_lookup(
+                    self,
+                    "VPC",
+                    vpc_name=VPC_NAME
+                )
+                print("Successfully looked up VPC:", vpc.vpc_id)
+            except Exception as e:
+                raise Exception(f"Could not look up VPC with name '{VPC_NAME}' due to: {e}")
+
+        elif NEW_VPC_DEFAULT_NAME:
+            new_vpc_created = True
+            print(f"NEW_VPC_DEFAULT_NAME ('{NEW_VPC_DEFAULT_NAME}') is set. Creating a new VPC.")
+
+            # Configuration for the new VPC
+            # You can make these configurable via context as well, e.g.,
+            # new_vpc_cidr = self.node.try_get_context("new_vpc_cidr") or "10.0.0.0/24"
+            # new_vpc_max_azs = self.node.try_get_context("new_vpc_max_azs") or 2 # Use 2 AZs by default for HA
+            # new_vpc_nat_gateways = self.node.try_get_context("new_vpc_nat_gateways") or new_vpc_max_azs # One NAT GW per AZ for HA
+            # or 1 for cost savings if acceptable
+            if not NEW_VPC_CIDR:
+                raise Exception("App has been instructed to create a new VPC but no VPC CIDR range was provided in variable NEW_VPC_CIDR")
+
+            print("Provided NEW_VPC_CIDR range:", NEW_VPC_CIDR)
+
+            new_vpc_cidr = NEW_VPC_CIDR
+            new_vpc_max_azs = 2 # Creates resources in 2 AZs. Adjust as needed.
+
+            # For "a NAT gateway", you can set nat_gateways=1.
+            # For resilience (NAT GW per AZ), set nat_gateways=new_vpc_max_azs.
+            # The Vpc construct will create NAT Gateway(s) if subnet_type PRIVATE_WITH_EGRESS is used
+            # and nat_gateways > 0.
+            new_vpc_nat_gateways = 1 # Creates a single NAT Gateway for cost-effectiveness.
+            # If you need one per AZ for higher availability, set this to new_vpc_max_azs.
+
+            vpc = ec2.Vpc(
+                self,
+                "MyNewLogicalVpc", # This is the CDK construct ID
+                vpc_name=NEW_VPC_DEFAULT_NAME,
+                ip_addresses=ec2.IpAddresses.cidr(new_vpc_cidr),
+                max_azs=new_vpc_max_azs,
+                nat_gateways=new_vpc_nat_gateways, # Number of NAT gateways to create
+                subnet_configuration=[
+                    ec2.SubnetConfiguration(
+                        name="Public", # Name prefix for public subnets
+                        subnet_type=ec2.SubnetType.PUBLIC,
+                        cidr_mask=28 # Adjust CIDR mask as needed (e.g., /24 provides ~250 IPs per subnet)
+                    ),
+                    ec2.SubnetConfiguration(
+                        name="Private", # Name prefix for private subnets
+                        subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, # Ensures these subnets have NAT Gateway access
+                        cidr_mask=28 # Adjust CIDR mask as needed
+                    )
+                    # You could also add ec2.SubnetType.PRIVATE_ISOLATED if needed
+                ],
+                # Internet Gateway is created and configured automatically for PUBLIC subnets.
+                # Route tables for public subnets will point to the IGW.
+                # Route tables for PRIVATE_WITH_EGRESS subnets will point to the NAT Gateway(s).
+            )
+            print(f"Successfully created new VPC: {vpc.vpc_id} with name '{NEW_VPC_DEFAULT_NAME}'")
+            # If nat_gateways > 0, vpc.nat_gateway_ips will contain EIPs if Vpc created them.
+            # vpc.public_subnets, vpc.private_subnets, vpc.isolated_subnets are populated.
+
+        else:
+            raise Exception("VPC_NAME for current VPC not found, and NEW_VPC_DEFAULT_NAME not found to create a new VPC")
 
         # --- Subnet Handling (Check Context and Create/Import) ---
         # Initialize lists to hold ISubnet objects (L2) and CfnSubnet/CfnRouteTable (L1)
@@ -100,35 +161,68 @@ class CdkStack(Stack):
         print("vpc.public_subnets:", vpc.public_subnets)
         print("vpc.private_subnets:", vpc.private_subnets)
 
-        # public_subnets_by_az: Dict[str, List[ec2.ISubnet]] = {}
-        # private_subnets_by_az: Dict[str, List[ec2.ISubnet]] = {}
-
-        # Iterate through the subnets exposed by the Vpc L2 construct.
-        # for subnet in vpc.public_subnets:
-        #     az = subnet.availability_zone
-        #     if az not in public_subnets_by_az:
-        #         public_subnets_by_az[az] = []
-        #     public_subnets_by_az[az].append(subnet)
+        if vpc.public_subnets: # These are already one_per_az if max_azs was used and Vpc created them
+            self.public_subnets.extend(vpc.public_subnets)
+        else:
+            self.node.add_warning("No public subnets found in the VPC.")
+
+        # Get private subnets with egress specifically
+        #selected_private_subnets_with_egress = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)
+
+        print(f"Selected from VPC: {len(self.public_subnets)} public, {len(self.private_subnets)} private_with_egress subnets.")
+
+        if len(self.public_subnets) < 1 or len(self.private_subnets) < 1: # Simplified check for new VPC
+            # If new_vpc_max_azs was 1, you'd have 1 of each. If 2, then 2 of each.
+            # The original check ' < 2' might be too strict if new_vpc_max_azs=1
+            pass # For new VPC, allow single AZ setups if configured that way. The VPC construct ensures one per AZ up to max_azs.
+
+        if not self.public_subnets and not self.private_subnets:
+            print("Error: No public or private subnets could be found in the VPC for automatic selection. "
+                  "You must either specify subnets in *_SUBNETS_TO_USE or ensure the VPC has discoverable subnets.")
+            raise RuntimeError("No suitable subnets found for automatic selection.")
+        else:
+            print(f"Automatically selected {len(self.public_subnets)} public and {len(self.private_subnets)} private subnets based on VPC properties.")
 
         selected_public_subnets = vpc.select_subnets(subnet_type=ec2.SubnetType.PUBLIC, one_per_az=True)
         private_subnets_egress = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, one_per_az=True)
-        private_subnets_isolated = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED, one_per_az=True)
 
+        if private_subnets_egress.subnets:
+            self.private_subnets.extend(private_subnets_egress.subnets)
+        else:
+            self.node.add_warning("No PRIVATE_WITH_EGRESS subnets found in the VPC.")
+
+        try:
+            private_subnets_isolated = vpc.select_subnets(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED, one_per_az=True)
+        except Exception as e:
+            private_subnets_isolated = []
+            print("Could not find any isolated subnets due to:", e)
+
+        ###
         combined_subnet_objects = []
 
-        if private_subnets_egress.subnets:
-            # Add the first PRIVATE_WITH_EGRESS subnet
-            combined_subnet_objects.append(private_subnets_egress.subnets[0])
+        if private_subnets_isolated:
+            if private_subnets_egress.subnets:
+                # Add the first PRIVATE_WITH_EGRESS subnet
+                combined_subnet_objects.append(private_subnets_egress.subnets[0])
+        elif not private_subnets_isolated:
+            if private_subnets_egress.subnets:
+                # Add the first PRIVATE_WITH_EGRESS subnet
+                combined_subnet_objects.extend(private_subnets_egress.subnets)
         else:
             self.node.add_warning("No PRIVATE_WITH_EGRESS subnets found to select the first one.")
 
         # Add all PRIVATE_ISOLATED subnets *except* the first one (if they exist)
-        if len(private_subnets_isolated.subnets) > 1:
-            combined_subnet_objects.extend(private_subnets_isolated.subnets[1:])
-        elif private_subnets_isolated.subnets: # Only 1 isolated subnet, add a warning if [1:] was desired
-            self.node.add_warning("Only one PRIVATE_ISOLATED subnet found, private_subnets_isolated.subnets[1:] will be empty.")
-        else:
-            self.node.add_warning("No PRIVATE_ISOLATED subnets found.")
+        try:
+            if len(private_subnets_isolated.subnets) > 1:
+                combined_subnet_objects.extend(private_subnets_isolated.subnets[1:])
+            elif private_subnets_isolated.subnets: # Only 1 isolated subnet, add a warning if [1:] was desired
+                self.node.add_warning("Only one PRIVATE_ISOLATED subnet found, private_subnets_isolated.subnets[1:] will be empty.")
+            else:
+                self.node.add_warning("No PRIVATE_ISOLATED subnets found.")
+        except Exception as e:
+            print("Could not identify private isolated subnets due to:", e)
 
         # Create an ec2.SelectedSubnets object from the combined private subnet list.
         selected_private_subnets = vpc.select_subnets(
@@ -139,29 +233,6 @@ class CdkStack(Stack):
         print("selected_private_subnets:", selected_private_subnets)
 
 
-        #self.private_route_tables_cfn = []
-
-        # for subnet in vpc.private_subnets:
-        #     az = subnet.availability_zone
-        #     if az not in private_subnets_by_az:
-        #         private_subnets_by_az[az] = []
-        #     private_subnets_by_az[az].append(subnet)
-
-        #selected_public_subnets: List[ec2.ISubnet] = []
-        #selected_private_subnets: List[ec2.ISubnet] = []
-
-        # Select one public subnet per AZ, preferring the first one found
-        # for az in sorted(public_subnets_by_az.keys()):
-        #     if public_subnets_by_az[az]:
-        #         selected_public_subnets.append(public_subnets_by_az[az][0])
-        #         print(f"Selected existing public subnet: {public_subnets_by_az[az][0].subnet_id} from AZ {az}.")
-
-        # Select one private subnet per AZ, preferring the first one found
-        # for az in sorted(private_subnets_by_az.keys()):
-        #     if private_subnets_by_az[az]:
-        #         selected_private_subnets.append(private_subnets_by_az[az][0])
-        #         print(f"Selected existing private subnet: {private_subnets_by_az[az][0].subnet_id} from AZ {az}.")
-
         if len(selected_public_subnets.subnet_ids) < 2 or len(selected_private_subnets.subnet_ids) < 2:
             raise Exception("Need at least two public or private subnets in different availability zones")
 
@@ -222,11 +293,11 @@ class CdkStack(Stack):
         self.public_subnets.extend(newly_created_public_subnets)
         self.public_route_tables_cfn.extend(newly_created_public_rts_cfn)
 
-        if not self.public_subnets:
-            raise Exception("No public subnets found or created, exiting.")
-
+        if not self.public_subnets and not names_to_create_public and not PUBLIC_SUBNETS_TO_USE
 
         # --- NAT Gateway Creation/Lookup ---
         self.single_nat_gateway_id = None
 
         nat_gw_id_from_context = SINGLE_NAT_GATEWAY_ID
@@ -234,9 +305,20 @@ class CdkStack(Stack):
         if nat_gw_id_from_context:
             print(f"Using existing NAT Gateway ID from context: {nat_gw_id_from_context}")
             self.single_nat_gateway_id = nat_gw_id_from_context
-        else:
-            # If not in context, create a new one, but only if we have a public subnet.
-            if self.public_subnets:
             print("NAT Gateway ID not found in context. Creating a new one.")
             # Place the NAT GW in the first available public subnet
             first_public_subnet = self.public_subnets[0]
@@ -248,7 +330,7 @@ class CdkStack(Stack):
                 nat_gateway_id_context_key=SINGLE_NAT_GATEWAY_ID
             )
         else:
-            print("WARNING: No public subnets available. Cannot create a NAT Gateway.")
 
 
         # --- 4. Process Private Subnets ---
@@ -280,17 +362,24 @@ class CdkStack(Stack):
         else:
             print("No private subnets specified for creation in context ('private_subnets_to_create').")
 
-        if not self.private_subnets:
             raise Exception("No private subnets found or created, exiting.")
 
         # --- 5. Sanity Check and Output ---
-
-        # Output the single NAT Gateway ID for verification
         if self.single_nat_gateway_id:
             CfnOutput(self, "SingleNatGatewayId", value=self.single_nat_gateway_id,
-                      description="ID of the single NAT Gateway used for private subnets.")
         else:
-            raise Exception("No single NAT Gateway was created or resolved.")
 
         # --- Outputs for other stacks/regions ---
         # These are crucial for cross-stack, cross-region referencing
@@ -303,34 +392,6 @@ class CdkStack(Stack):
         self.params["public_route_tables"] = self.public_route_tables_cfn
 
 
-        #class CdkStackMain(Stack):
-        #    def __init__(self, scope: Construct, construct_id: str, private_subnets:List[ec2.ISubnet]=[], private_route_tables: List[ec2.CfnRouteTable]=[], public_subnets:List[ec2.ISubnet]=[], public_route_tables: List[ec2.CfnRouteTable]=[], **kwargs) -> None:
-        #        super().__init__(scope, construct_id, **kwargs)
-
-        # --- Helper to get context values ---
-        # def get_context_bool(key: str, default: bool = False) -> bool:
-        #     return self.node.try_get_context(key) or default
-
-        # def get_context_str(key: str, default: str = None) -> str:
-        #     return self.node.try_get_context(key) or default
-
-        # def get_context_dict(key: str, default: dict = None) -> dict:
-        #     return self.node.try_get_context(key) or default
-
-        # def get_context_list_of_dicts(key: str) -> List[Dict[str, Any]]:
-        #     ctx_value = self.node.try_get_context(key)
-
-        #     if not isinstance(ctx_value, list):
-        #         print(f"Warning: Context key '{key}' not found or not a list. Returning empty list.")
-        #         return []
-        #     # Optional: Add validation that all items in the list are dicts
-        #     return ctx_value
-
-        # self.private_subnets: List[ec2.ISubnet] = private_subnets
-        # self.private_route_tables_cfn: List[ec2.CfnRouteTable] = private_route_tables
-        # self.public_subnets: List[ec2.ISubnet] = public_subnets
-        # self.public_route_tables_cfn: List[ec2.CfnRouteTable] = public_route_tables
-
         private_subnet_selection = ec2.SubnetSelection(subnets=self.private_subnets)
         public_subnet_selection = ec2.SubnetSelection(subnets=self.public_subnets)
 
@@ -340,16 +401,6 @@ class CdkStack(Stack):
         for sub in public_subnet_selection.subnets:
            print("public subnet:", sub.subnet_id, "is in availability zone:", sub.availability_zone)
 
-        # try:
-        #     vpc = ec2.Vpc.from_lookup(
-        #         self,
-        #         "VPC",
-        #         vpc_name=VPC_NAME
-        #     )
-        #     print("Successfully looked up VPC")
-        # except Exception as e:
-        #     raise Exception(f"Could not look up VPC with name '{VPC_NAME}' due to: {e}")
-
        print("Private subnet route tables:", self.private_route_tables_cfn)
 
        # Add the S3 Gateway Endpoint to the VPC
@@ -368,31 +419,60 @@ class CdkStack(Stack):
                   description="The id for the S3 Gateway Endpoint.") # Specify the S3 service
 
        # --- IAM Roles ---
-       try:
-           codebuild_role_name = CODEBUILD_ROLE_NAME
-           custom_sts_kms_policy = """{
-               "Version": "2012-10-17",
-               "Statement": [
-                   {
-                       "Sid": "STSCallerIdentity",
-                       "Effect": "Allow",
-                       "Action": [
-                           "sts:GetCallerIdentity"
-                       ],
-                       "Resource": "*"
-                   },
-                   {
-                       "Sid": "KMSAccess",
-                       "Effect": "Allow",
-                       "Action": [
-                           "kms:Encrypt",
-                           "kms:Decrypt",
-                           "kms:GenerateDataKey"
-                       ],
-                       "Resource": "*"
                    }
-               ]
-           }"""
 
        if get_context_bool(f"exists:{codebuild_role_name}"):
            # If exists, lookup/import the role using ARN from context
@@ -458,13 +538,21 @@ class CdkStack(Stack):
            bucket = s3.Bucket.from_bucket_name(self, "LogConfigBucket", bucket_name=log_bucket_name)
            print("Using existing S3 bucket", log_bucket_name)
        else:
-           bucket = s3.Bucket(self, "LogConfigBucket", bucket_name=log_bucket_name,
-               versioned=False, # Set to True if you need versioning
-               # IMPORTANT: Set removal_policy to DESTROY
-               removal_policy=RemovalPolicy.DESTROY,
-               # IMPORTANT: Set auto_delete_objects to True to empty the bucket before deletion
-               auto_delete_objects=True
-           ) # Explicitly set bucket_name
            print("Created S3 bucket", log_bucket_name)
 
        # Add policies - this will apply to both created and imported buckets
@@ -491,18 +579,31 @@ class CdkStack(Stack):
            output_bucket = s3.Bucket.from_bucket_name(self, "OutputBucket", bucket_name=output_bucket_name)
            print("Using existing Output bucket", output_bucket_name)
        else:
-           output_bucket = s3.Bucket(self, "OutputBucket", bucket_name=output_bucket_name,
-               lifecycle_rules=[
                    s3.LifecycleRule(
                        expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
                    )
                ],
-               versioned=False, # Set to True if you need versioning
-               # IMPORTANT: Set removal_policy to DESTROY
                removal_policy=RemovalPolicy.DESTROY,
-               # IMPORTANT: Set auto_delete_objects to True to empty the bucket before deletion
                auto_delete_objects=True
-           )
            print("Created Output bucket:", output_bucket_name)
 
        # Add policies to output bucket
@@ -602,14 +703,7 @@ class CdkStack(Stack):
        # --- Security Groups ---
        try:
            ecs_security_group_name = ECS_SECURITY_GROUP_NAME
-           # Following checks by name don't really work
-           # Use CDK's from_lookup_by_name which handles lookup or throws an error if not found
-           #try:
-           #    ecs_security_group = ec2.SecurityGroup.from_lookup_by_name(
-           #        self, "ECSSecurityGroup", vpc=vpc, security_group_name=ecs_security_group_name
-           #    )
-           #    print(f"Using existing Security Group: {ecs_security_group_name}")
-           #except Exception: # If lookup fails, create
            try:
                ecs_security_group = ec2.SecurityGroup(
                    self,
@@ -622,12 +716,7 @@ class CdkStack(Stack):
                print("Failed to create ECS security group due to:", e)
 
            alb_security_group_name = ALB_NAME_SECURITY_GROUP_NAME
-           # try:
-           #     alb_security_group = ec2.SecurityGroup.from_lookup_by_name(
-           #         self, "ALBSecurityGroup", vpc=vpc, security_group_name=alb_security_group_name
-           #     )
-           #     print(f"Using existing Security Group: {alb_security_group_name}")
-           # except Exception: # If lookup fails, create
            try:
                alb_security_group = ec2.SecurityGroup(
                    self,
@@ -717,8 +806,6 @@ class CdkStack(Stack):
                print("Successfully created new Application Load Balancer")
            except Exception as e:
                raise Exception("Could not handle application load balancer due to:", e)
-
-
 
        # --- Cognito User Pool ---
        try:
@@ -738,7 +825,7 @@ class CdkStack(Stack):
            print(f"Created new user pool {user_pool.user_pool_id}.")
 
            # If you're using a certificate, assume that you will be using the ALB Cognito login features. You need different redirect URLs to accept the token that comes from Cognito authentication.
-           if ACM_CERTIFICATE_ARN:
                redirect_uris = [COGNITO_REDIRECTION_URL, COGNITO_REDIRECTION_URL + "/oauth2/idpresponse"]
            else:
                redirect_uris = [COGNITO_REDIRECTION_URL]
@@ -786,31 +873,39 @@ class CdkStack(Stack):
 
        # --- Secrets Manager Secret ---
        try:
-           secret_name = COGNITO_USER_POOL_CLIENT_SECRET_NAME
-           if get_context_bool(f"exists:{secret_name}"):
               # Lookup by name
               secret = secretsmanager.Secret.from_secret_name_v2(self, "CognitoSecret", secret_name=secret_name)
               print(f"Using existing Secret {secret_name}.")
-           else:
-               secret = secretsmanager.Secret(self, "CognitoSecret", # Logical ID
-                   secret_name=secret_name, # Explicit resource name
-                   secret_object_value={
-                       "REDACTION_USER_POOL_ID": SecretValue.unsafe_plain_text(user_pool.user_pool_id), # Use the CDK attribute
-                       "REDACTION_CLIENT_ID": SecretValue.unsafe_plain_text(user_pool_client.user_pool_client_id), # Use the CDK attribute
-                       "REDACTION_CLIENT_SECRET": user_pool_client.user_pool_client_secret # Use the CDK attribute
-                   }
-               )
-               print(f"Created new secret {secret_name}.")
 
        except Exception as e:
            raise Exception("Could not handle Secrets Manager secret due to:", e)
 
        # --- Fargate Task Definition ---
        try:
-           # For task definitions, re-creating with the same logical ID creates new revisions.
-           # If you want to use a *specific existing revision*, you'd need to look it up by ARN.
-           # If you want to update the latest revision, defining it here is the standard.
-           # Let's assume we always define it here to get revision management.
           fargate_task_definition_name = FARGATE_TASK_DEFINITION_NAME
 
           read_only_file_system = ECS_READ_ONLY_FILE_SYSTEM == 'True'
@@ -906,8 +1001,8 @@ class CdkStack(Stack):
 
           cdk_managed_log_group = logs.LogGroup(self, "MyTaskLogGroup", # CDK Logical ID
               log_group_name=log_group_name_from_config,
-              retention=logs.RetentionDays.ONE_MONTH, # Example: set retention
-              removal_policy=RemovalPolicy.DESTROY # If you want it deleted when stack is deleted
           )
 
           epheremal_storage_volume_cdk_obj = ecs.Volume(
@@ -926,10 +1021,7 @@ class CdkStack(Stack):
               cpu_architecture=ecs.CpuArchitecture.X86_64,
               operating_system_family=ecs.OperatingSystemFamily.LINUX
               ),
-              # 1. Specify the total ephemeral storage for the task
               ephemeral_storage_gib=21, # Minimum is 21 GiB
-              # 2. Define the volume at the task level
-              # This volume will use the ephemeral storage configured above.
               volumes=[epheremal_storage_volume_cdk_obj]
           )
           print("Fargate task definition defined.")
@@ -1093,7 +1185,7 @@ class CdkStack(Stack):
           print(f"ALB listener on port {listener_port} defined.")
 
 
-          if ACM_CERTIFICATE_ARN:
              http_listener.add_action(
                  "DefaultAction", # Logical ID for the default action
                  action=elbv2.ListenerAction.redirect(protocol='HTTPS',
@@ -1135,7 +1227,7 @@ class CdkStack(Stack):
          print("Added targets and actions to ALB HTTP listener.")
 
          # Now the same for HTTPS if you have an ACM certificate
-         if ACM_CERTIFICATE_ARN:
             listener_port_https = 443
             # Check if Listener exists - from_listener_arn or lookup by port/ALB
 
@@ -1143,7 +1235,7 @@ class CdkStack(Stack):
             https_listener = add_alb_https_listener_with_cert(
                 self,
                 "MyHttpsListener", # Logical ID for the HTTPS listener
                 alb,
-                acm_certificate_arn=ACM_CERTIFICATE_ARN,
                 default_target_group=target_group,
                 enable_cognito_auth=True,
                 cognito_user_pool=user_pool,
297
+ raise Exception("No public subnets found or created, exiting.")
 
298
 
299
  # --- NAT Gateway Creation/Lookup ---
300
+ print("Creating NAT gateway/located existing")
301
  self.single_nat_gateway_id = None
302
 
303
  nat_gw_id_from_context = SINGLE_NAT_GATEWAY_ID
 
305
  if nat_gw_id_from_context:
306
  print(f"Using existing NAT Gateway ID from context: {nat_gw_id_from_context}")
307
  self.single_nat_gateway_id = nat_gw_id_from_context
308
+
309
+ elif new_vpc_created and new_vpc_nat_gateways > 0 and hasattr(vpc, 'nat_gateways') and vpc.nat_gateways:
310
+ self.single_nat_gateway_id = vpc.nat_gateways[0].gateway_id
311
+ print(f"Using NAT Gateway {self.single_nat_gateway_id} created by the new VPC construct.")
312
+
313
+ if not self.single_nat_gateway_id:
314
+ print("Creating a new NAT gateway")
315
+
316
+ if hasattr(vpc, 'nat_gateways') and vpc.nat_gateways:
317
+ print("Existing NAT gateway found in vpc")
318
+ pass
319
+
320
+ # If not in context, create a new one, but only if we have a public subnet.
321
+ elif self.public_subnets:
322
  print("NAT Gateway ID not found in context. Creating a new one.")
323
  # Place the NAT GW in the first available public subnet
324
  first_public_subnet = self.public_subnets[0]
 
330
  nat_gateway_id_context_key=SINGLE_NAT_GATEWAY_ID
331
  )
332
  else:
333
+ print("WARNING: No public subnets available and NAT gateway not found in existing VPC. Cannot create a NAT Gateway.")
334
 
335
 
336
  # --- 4. Process Private Subnets ---
 
362
  else:
363
  print("No private subnets specified for creation in context ('private_subnets_to_create').")
364
 
365
+ # if not self.private_subnets:
366
+ # raise Exception("No private subnets found or created, exiting.")
367
+
368
+ if not self.private_subnets and not names_to_create_private and not PRIVATE_SUBNETS_TO_USE:
369
+ # This condition might need adjustment for new VPCs.
370
  raise Exception("No private subnets found or created, exiting.")
371
 
372
  # --- 5. Sanity Check and Output ---
373
+ # Output the single NAT Gateway ID for verification
 
374
  if self.single_nat_gateway_id:
375
  CfnOutput(self, "SingleNatGatewayId", value=self.single_nat_gateway_id,
376
+ description="ID of the single NAT Gateway resolved or created.")
377
+ elif NEW_VPC_DEFAULT_NAME and (self.node.try_get_context("new_vpc_nat_gateways") or 1) > 0:
378
+ print("INFO: A new VPC was created with NAT Gateway(s). Their routing is handled by the VPC construct. No single_nat_gateway_id was explicitly set for separate output.")
379
  else:
380
+ out_message = "WARNING: No single NAT Gateway was resolved or created explicitly by the script's logic after VPC setup."
381
+ print(out_message)
382
+ raise Exception(out_message)
383
 
384
  # --- Outputs for other stacks/regions ---
385
  # These are crucial for cross-stack, cross-region referencing
 
392
  self.params["public_route_tables"] = self.public_route_tables_cfn
393
 
394
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
395
  private_subnet_selection = ec2.SubnetSelection(subnets=self.private_subnets)
396
  public_subnet_selection = ec2.SubnetSelection(subnets=self.public_subnets)
397
 
 
401
  for sub in public_subnet_selection.subnets:
402
  print("public subnet:", sub.subnet_id, "is in availability zone:", sub.availability_zone)
403
 
 
 
 
 
 
 
 
 
 
 
404
  print("Private subnet route tables:", self.private_route_tables_cfn)
405
 
406
  # Add the S3 Gateway Endpoint to the VPC
 
419
  description="The id for the S3 Gateway Endpoint.") # Specify the S3 service
420
 
421
  # --- IAM Roles ---
422
+ if USE_CUSTOM_KMS_KEY == '1':
423
+ kms_key = kms.Key(self, "RedactionSharedKmsKey", alias=S3_KMS_KEY_NAME, removal_policy=RemovalPolicy.DESTROY)
424
+
425
+ custom_sts_kms_policy_dict = {
426
+ "Version": "2012-10-17",
427
+ "Statement": [
428
+ {
429
+ "Sid": "STSCallerIdentity",
430
+ "Effect": "Allow",
431
+ "Action": [
432
+ "sts:GetCallerIdentity"
433
+ ],
434
+ "Resource": "*"
435
+ },
436
+ {
437
+ "Sid": "KMSAccess",
438
+ "Effect": "Allow",
439
+ "Action": [
440
+ "kms:Encrypt",
441
+ "kms:Decrypt",
442
+ "kms:GenerateDataKey"
443
+ ],
444
+ "Resource": kms_key.key_arn # Use key_arn, as it's the full ARN, safer than key_id
445
+ }
446
+ ]
447
  }
448
+ else:
449
+ kms_key = None
450
+
451
+ custom_sts_kms_policy_dict = {
452
+ "Version": "2012-10-17",
453
+ "Statement": [
454
+ {
455
+ "Sid": "STSCallerIdentity",
456
+ "Effect": "Allow",
457
+ "Action": [
458
+ "sts:GetCallerIdentity"
459
+ ],
460
+ "Resource": "*"
461
+ } ,
462
+ {
463
+ "Sid": "KMSSecretsManagerDecrypt", # Explicitly add decrypt for default key
464
+ "Effect": "Allow",
465
+ "Action": [
466
+ "kms:Decrypt"
467
+ ],
468
+ "Resource": f"arn:aws:kms:{AWS_REGION}:{AWS_ACCOUNT_ID}:key/aws/secretsmanager"
469
+ }
470
+ ]
471
+ }
472
+ custom_sts_kms_policy = json.dumps(custom_sts_kms_policy_dict, indent=4)
473
+
474
+ try:
475
+ codebuild_role_name = CODEBUILD_ROLE_NAME
476
 
477
  if get_context_bool(f"exists:{codebuild_role_name}"):
478
  # If exists, lookup/import the role using ARN from context
 
538
  bucket = s3.Bucket.from_bucket_name(self, "LogConfigBucket", bucket_name=log_bucket_name)
539
  print("Using existing S3 bucket", log_bucket_name)
540
  else:
541
+ if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
542
+ bucket = s3.Bucket(self, "LogConfigBucket", bucket_name=log_bucket_name,
543
+ versioned=False,
544
+ removal_policy=RemovalPolicy.DESTROY,
545
+ auto_delete_objects=True,
546
+ encryption=s3.BucketEncryption.KMS,
547
+ encryption_key=kms_key
548
+ )
549
+ else:
550
+ bucket = s3.Bucket(self, "LogConfigBucket", bucket_name=log_bucket_name,
551
+ versioned=False,
552
+ removal_policy=RemovalPolicy.DESTROY,
553
+ auto_delete_objects=True
554
+ )
555
+
556
  print("Created S3 bucket", log_bucket_name)
557
 
558
  # Add policies - this will apply to both created and imported buckets
 
579
  output_bucket = s3.Bucket.from_bucket_name(self, "OutputBucket", bucket_name=output_bucket_name)
580
  print("Using existing Output bucket", output_bucket_name)
581
  else:
582
+ if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
583
+ output_bucket = s3.Bucket(self, "OutputBucket", bucket_name=output_bucket_name,
584
+ lifecycle_rules=[
585
  s3.LifecycleRule(
586
  expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
587
  )
588
  ],
589
+ versioned=False,
590
+ removal_policy=RemovalPolicy.DESTROY,
591
+ auto_delete_objects=True,
592
+ encryption=s3.BucketEncryption.KMS,
593
+ encryption_key=kms_key
594
+ )
595
+ else:
596
+ output_bucket = s3.Bucket(self, "OutputBucket", bucket_name=output_bucket_name,
597
+ lifecycle_rules=[
598
+ s3.LifecycleRule(
599
+ expiration=Duration.days(int(DAYS_TO_DISPLAY_WHOLE_DOCUMENT_JOBS))
600
+ )
601
+ ],
602
+ versioned=False,
603
  removal_policy=RemovalPolicy.DESTROY,
 
604
  auto_delete_objects=True
605
+ )
606
+
607
  print("Created Output bucket:", output_bucket_name)
608
 
609
  # Add policies to output bucket
 
703
  # --- Security Groups ---
704
  try:
705
  ecs_security_group_name = ECS_SECURITY_GROUP_NAME
706
+
 
 
 
 
 
 
 
707
  try:
708
  ecs_security_group = ec2.SecurityGroup(
709
  self,
 
716
  print("Failed to create ECS security group due to:", e)
717
 
718
  alb_security_group_name = ALB_NAME_SECURITY_GROUP_NAME
719
+
 
 
 
 
 
720
  try:
721
  alb_security_group = ec2.SecurityGroup(
722
  self,
 
806
  print("Successfully created new Application Load Balancer")
807
  except Exception as e:
808
  raise Exception("Could not handle application load balancer due to:", e)
 
 
809
 
810
  # --- Cognito User Pool ---
811
  try:
 
825
  print(f"Created new user pool {user_pool.user_pool_id}.")
826
 
827
  # If you're using a certificate, assume that you will be using the ALB Cognito login features. You need different redirect URLs to accept the token that comes from Cognito authentication.
828
+ if ACM_SSL_CERTIFICATE_ARN:
829
  redirect_uris = [COGNITO_REDIRECTION_URL, COGNITO_REDIRECTION_URL + "/oauth2/idpresponse"]
830
  else:
831
  redirect_uris = [COGNITO_REDIRECTION_URL]
 
873
 
874
  # --- Secrets Manager Secret ---
875
  try:
876
+ secret_name = COGNITO_USER_POOL_CLIENT_SECRET_NAME
877
+ if get_context_bool(f"exists:{secret_name}"):
878
  # Lookup by name
879
  secret = secretsmanager.Secret.from_secret_name_v2(self, "CognitoSecret", secret_name=secret_name)
880
  print(f"Using existing Secret {secret_name}.")
881
+ else:
882
+ if USE_CUSTOM_KMS_KEY == '1' and isinstance(kms_key, kms.Key):
883
+ secret = secretsmanager.Secret(self, "CognitoSecret", # Logical ID
884
+ secret_name=secret_name, # Explicit resource name
885
+ secret_object_value={
886
+ "REDACTION_USER_POOL_ID": SecretValue.unsafe_plain_text(user_pool.user_pool_id), # Use the CDK attribute
887
+ "REDACTION_CLIENT_ID": SecretValue.unsafe_plain_text(user_pool_client.user_pool_client_id), # Use the CDK attribute
888
+ "REDACTION_CLIENT_SECRET": user_pool_client.user_pool_client_secret # Use the CDK attribute
889
+ },
890
+ encryption_key=kms_key
891
+ )
892
+ else:
893
+ secret = secretsmanager.Secret(self, "CognitoSecret", # Logical ID
894
+ secret_name=secret_name, # Explicit resource name
895
+ secret_object_value={
896
+ "REDACTION_USER_POOL_ID": SecretValue.unsafe_plain_text(user_pool.user_pool_id), # Use the CDK attribute
897
+ "REDACTION_CLIENT_ID": SecretValue.unsafe_plain_text(user_pool_client.user_pool_client_id), # Use the CDK attribute
898
+ "REDACTION_CLIENT_SECRET": user_pool_client.user_pool_client_secret # Use the CDK attribute
899
+ }
900
+ )
901
+
902
+ print(f"Created new secret {secret_name}.")
903
 
904
  except Exception as e:
905
  raise Exception("Could not handle Secrets Manager secret due to:", e)
906
 
907
  # --- Fargate Task Definition ---
908
  try:
 
 
 
 
909
  fargate_task_definition_name = FARGATE_TASK_DEFINITION_NAME
910
 
911
  read_only_file_system = ECS_READ_ONLY_FILE_SYSTEM == 'True'
 
1001
 
1002
  cdk_managed_log_group = logs.LogGroup(self, "MyTaskLogGroup", # CDK Logical ID
1003
  log_group_name=log_group_name_from_config,
1004
+ retention=logs.RetentionDays.ONE_MONTH,
1005
+ removal_policy=RemovalPolicy.DESTROY
1006
  )
1007
 
1008
  epheremal_storage_volume_cdk_obj = ecs.Volume(
 
1021
  cpu_architecture=ecs.CpuArchitecture.X86_64,
1022
  operating_system_family=ecs.OperatingSystemFamily.LINUX
1023
  ),
 
1024
  ephemeral_storage_gib=21, # Minimum is 21 GiB
 
 
1025
  volumes=[epheremal_storage_volume_cdk_obj]
1026
  )
1027
  print("Fargate task definition defined.")
 
1185
  print(f"ALB listener on port {listener_port} defined.")
1186
 
1187
 
1188
+ if ACM_SSL_CERTIFICATE_ARN:
1189
  http_listener.add_action(
1190
  "DefaultAction", # Logical ID for the default action
1191
  action=elbv2.ListenerAction.redirect(protocol='HTTPS',
 
1227
  print("Added targets and actions to ALB HTTP listener.")
1228
 
1229
  # Now the same for HTTPS if you have an ACM certificate
1230
+ if ACM_SSL_CERTIFICATE_ARN:
1231
  listener_port_https = 443
1232
  # Check if Listener exists - from_listener_arn or lookup by port/ALB
1233
 
 
1235
  self,
1236
  "MyHttpsListener", # Logical ID for the HTTPS listener
1237
  alb,
1238
+ ACM_SSL_CERTIFICATE_ARN=ACM_SSL_CERTIFICATE_ARN,
1239
  default_target_group=target_group,
1240
  enable_cognito_auth=True,
1241
  cognito_user_pool=user_pool,
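The conditional STS/KMS policy assembled in the stack diff above can be sketched as a standalone helper. This is an illustrative reconstruction, not the stack's actual code: the function name and the default region/account values are hypothetical, but the two branches mirror the diff (custom key grants Encrypt/Decrypt/GenerateDataKey on the key ARN; otherwise only Decrypt on the AWS-managed Secrets Manager key).

```python
import json

def build_sts_kms_policy(kms_key_arn=None, region="eu-west-1", account_id="123456789012"):
    """Build the STS/KMS policy document conditionally, as in the stack diff.

    If a customer-managed key ARN is supplied, grant Encrypt/Decrypt/
    GenerateDataKey on that key; otherwise fall back to Decrypt on the
    AWS-managed Secrets Manager key. Defaults here are placeholders.
    """
    statements = [{
        "Sid": "STSCallerIdentity",
        "Effect": "Allow",
        "Action": ["sts:GetCallerIdentity"],
        "Resource": "*",
    }]
    if kms_key_arn:
        statements.append({
            "Sid": "KMSAccess",
            "Effect": "Allow",
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": kms_key_arn,  # full ARN, safer than a bare key id
        })
    else:
        statements.append({
            "Sid": "KMSSecretsManagerDecrypt",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": f"arn:aws:kms:{region}:{account_id}:key/aws/secretsmanager",
        })
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=4)
```

Factoring the branch into a function like this keeps the two policy variants from drifting apart, since the shared STS statement is written only once.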
cdk/check_resources.py CHANGED
@@ -49,150 +49,151 @@ def check_and_set_context():
     context_data = {}
 
     # --- Find the VPC ID first ---
-    print("VPC_NAME:", VPC_NAME)
-    vpc_id, nat_gateways = get_vpc_id_by_name(VPC_NAME)
-
-    # If you expect only one, or one per AZ and you're creating one per AZ in CDK:
-    if nat_gateways:
-        # For simplicity, let's just check if *any* NAT exists in the VPC.
-        # A more robust check would match by subnet, AZ, or a specific tag.
-        context_data["exists:NatGateway"] = True
-        context_data["id:NatGateway"] = nat_gateways[0]['NatGatewayId'] # Store the ID of the first one found
-    else:
-        context_data["exists:NatGateway"] = False
-        context_data["id:NatGateway"] = None
-
-    if not vpc_id:
-        # If the VPC doesn't exist, you might not be able to check/create subnets.
-        # Decide how to handle this: raise an error, set a flag, etc.
-        raise RuntimeError(f"Required VPC '{VPC_NAME}' not found. Cannot proceed with subnet checks.")
-
-    context_data["vpc_id"] = vpc_id # Store VPC ID in context
-
-    # SUBNET CHECKS
-    context_data: Dict[str, Any] = {}
-    all_proposed_subnets_data: List[Dict[str, str]] = []
-
-    # Flag to indicate if full validation mode (with CIDR/AZs) is active
-    full_validation_mode = False
-
-    # Determine if full validation mode is possible/desired.
-    # It's 'desired' if CIDRs/AZs are provided and their lengths match the name lists.
-    public_ready_for_full_validation = (
-        len(PUBLIC_SUBNETS_TO_USE) > 0 and
-        len(PUBLIC_SUBNET_CIDR_BLOCKS) == len(PUBLIC_SUBNETS_TO_USE) and
-        len(PUBLIC_SUBNET_AVAILABILITY_ZONES) == len(PUBLIC_SUBNETS_TO_USE)
-    )
-    private_ready_for_full_validation = (
-        len(PRIVATE_SUBNETS_TO_USE) > 0 and
-        len(PRIVATE_SUBNET_CIDR_BLOCKS) == len(PRIVATE_SUBNETS_TO_USE) and
-        len(PRIVATE_SUBNET_AVAILABILITY_ZONES) == len(PRIVATE_SUBNETS_TO_USE)
-    )
-
-    # Activate full validation if *any* type of subnet (public or private) has its full details provided.
-    # You might adjust this logic if you require ALL subnet types to have CIDRs, or NONE.
-    if public_ready_for_full_validation or private_ready_for_full_validation:
-        full_validation_mode = True
-
-        # If some are ready but others aren't, print a warning or raise an error based on your strictness
-        if public_ready_for_full_validation and not private_ready_for_full_validation and PRIVATE_SUBNETS_TO_USE:
-            print("Warning: Public subnets have CIDRs/AZs, but private subnets do not. Only public will be fully validated/created with CIDRs.")
-        if private_ready_for_full_validation and not public_ready_for_full_validation and PUBLIC_SUBNETS_TO_USE:
-            print("Warning: Private subnets have CIDRs/AZs, but public subnets do not. Only private will be fully validated/created with CIDRs.")
-
-    # Prepare data for validate_subnet_creation_parameters for all subnets that have full details
-    if public_ready_for_full_validation:
-        for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
-            all_proposed_subnets_data.append({
-                'name': name,
-                'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
-                'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i]
-            })
-    if private_ready_for_full_validation:
-        for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
-            all_proposed_subnets_data.append({
-                'name': name,
-                'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
-                'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i]
-            })
-
-    print(f"Target VPC ID for Boto3 lookup: {vpc_id}")
-
-    # Fetch all existing subnets in the target VPC once to avoid repeated API calls
-    try:
-        existing_aws_subnets = _get_existing_subnets_in_vpc(vpc_id)
-    except Exception as e:
-        print(f"Failed to fetch existing VPC subnets. Aborting. Error: {e}")
-        raise SystemExit(1) # Exit immediately if we can't get baseline data
-
-    print("\n--- Running Name-Only Subnet Existence Check Mode ---")
-    # Fallback: check only by name using the existing data
-    checked_public_subnets = {}
-    if PUBLIC_SUBNETS_TO_USE:
-        for subnet_name in PUBLIC_SUBNETS_TO_USE:
-            print("subnet_name:", subnet_name)
-            exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
-            checked_public_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
-
-            # If the subnet exists, remove it from the proposed subnets list
-            if checked_public_subnets[subnet_name]["exists"] == True:
-                all_proposed_subnets_data = [
-                    subnet for subnet in all_proposed_subnets_data
-                    if subnet['name'] != subnet_name
-                ]
-
-    context_data["checked_public_subnets"] = checked_public_subnets
-
-    checked_private_subnets = {}
-    if PRIVATE_SUBNETS_TO_USE:
-        for subnet_name in PRIVATE_SUBNETS_TO_USE:
-            print("subnet_name:", subnet_name)
-            exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
-            checked_private_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
-
-            # If the subnet exists, remove it from the proposed subnets list
-            if checked_private_subnets[subnet_name]["exists"] == True:
-                all_proposed_subnets_data = [
-                    subnet for subnet in all_proposed_subnets_data
-                    if subnet['name'] != subnet_name
-                ]
-
-    context_data["checked_private_subnets"] = checked_private_subnets
-
-    print("\nName-only existence subnet check complete.\n")
-
-    if full_validation_mode:
-        print("\n--- Running in Full Subnet Validation Mode (CIDR/AZs provided) ---")
-        try:
-            validate_subnet_creation_parameters(vpc_id, all_proposed_subnets_data, existing_aws_subnets)
-            print("\nPre-synth validation successful. Proceeding with CDK synth.\n")
-
-            # Populate context_data for downstream CDK construct creation
-            context_data["public_subnets_to_create"] = []
-            if public_ready_for_full_validation:
-                for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
-                    context_data["public_subnets_to_create"].append({
-                        'name': name,
-                        'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
-                        'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i],
-                        'is_public': True
-                    })
-            context_data["private_subnets_to_create"] = []
-            if private_ready_for_full_validation:
-                for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
-                    context_data["private_subnets_to_create"].append({
-                        'name': name,
-                        'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
-                        'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i],
-                        'is_public': False
-                    })
-
-        except (ValueError, Exception) as e:
-            print(f"\nFATAL ERROR: Subnet parameter validation failed: {e}\n")
-            raise SystemExit(1) # Exit if validation fails
+    if VPC_NAME:
+        print("VPC_NAME:", VPC_NAME)
+        vpc_id, nat_gateways = get_vpc_id_by_name(VPC_NAME)
+
+        # If you expect only one, or one per AZ and you're creating one per AZ in CDK:
+        if nat_gateways:
+            # For simplicity, let's just check if *any* NAT exists in the VPC.
+            # A more robust check would match by subnet, AZ, or a specific tag.
+            context_data["exists:NatGateway"] = True
+            context_data["id:NatGateway"] = nat_gateways[0]['NatGatewayId'] # Store the ID of the first one found
+        else:
+            context_data["exists:NatGateway"] = False
+            context_data["id:NatGateway"] = None
+
+        if not vpc_id:
+            # If the VPC doesn't exist, you might not be able to check/create subnets.
+            # Decide how to handle this: raise an error, set a flag, etc.
+            raise RuntimeError(f"Required VPC '{VPC_NAME}' not found. Cannot proceed with subnet checks.")
+
+        context_data["vpc_id"] = vpc_id # Store VPC ID in context
+
+        # SUBNET CHECKS
+        context_data: Dict[str, Any] = {}
+        all_proposed_subnets_data: List[Dict[str, str]] = []
+
+        # Flag to indicate if full validation mode (with CIDR/AZs) is active
+        full_validation_mode = False
+
+        # Determine if full validation mode is possible/desired.
+        # It's 'desired' if CIDRs/AZs are provided and their lengths match the name lists.
+        public_ready_for_full_validation = (
+            len(PUBLIC_SUBNETS_TO_USE) > 0 and
+            len(PUBLIC_SUBNET_CIDR_BLOCKS) == len(PUBLIC_SUBNETS_TO_USE) and
+            len(PUBLIC_SUBNET_AVAILABILITY_ZONES) == len(PUBLIC_SUBNETS_TO_USE)
+        )
+        private_ready_for_full_validation = (
+            len(PRIVATE_SUBNETS_TO_USE) > 0 and
+            len(PRIVATE_SUBNET_CIDR_BLOCKS) == len(PRIVATE_SUBNETS_TO_USE) and
+            len(PRIVATE_SUBNET_AVAILABILITY_ZONES) == len(PRIVATE_SUBNETS_TO_USE)
+        )
+
+        # Activate full validation if *any* type of subnet (public or private) has its full details provided.
+        # You might adjust this logic if you require ALL subnet types to have CIDRs, or NONE.
+        if public_ready_for_full_validation or private_ready_for_full_validation:
+            full_validation_mode = True
+
+            # If some are ready but others aren't, print a warning or raise an error based on your strictness
+            if public_ready_for_full_validation and not private_ready_for_full_validation and PRIVATE_SUBNETS_TO_USE:
+                print("Warning: Public subnets have CIDRs/AZs, but private subnets do not. Only public will be fully validated/created with CIDRs.")
+            if private_ready_for_full_validation and not public_ready_for_full_validation and PUBLIC_SUBNETS_TO_USE:
+                print("Warning: Private subnets have CIDRs/AZs, but public subnets do not. Only private will be fully validated/created with CIDRs.")
+
+        # Prepare data for validate_subnet_creation_parameters for all subnets that have full details
+        if public_ready_for_full_validation:
+            for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
+                all_proposed_subnets_data.append({
+                    'name': name,
+                    'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
+                    'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i]
+                })
+        if private_ready_for_full_validation:
+            for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
+                all_proposed_subnets_data.append({
+                    'name': name,
+                    'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
+                    'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i]
+                })
+
+        print(f"Target VPC ID for Boto3 lookup: {vpc_id}")
+
+        # Fetch all existing subnets in the target VPC once to avoid repeated API calls
+        try:
+            existing_aws_subnets = _get_existing_subnets_in_vpc(vpc_id)
+        except Exception as e:
+            print(f"Failed to fetch existing VPC subnets. Aborting. Error: {e}")
+            raise SystemExit(1) # Exit immediately if we can't get baseline data
+
+        print("\n--- Running Name-Only Subnet Existence Check Mode ---")
+        # Fallback: check only by name using the existing data
+        checked_public_subnets = {}
+        if PUBLIC_SUBNETS_TO_USE:
+            for subnet_name in PUBLIC_SUBNETS_TO_USE:
+                print("subnet_name:", subnet_name)
+                exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
+                checked_public_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
+
+                # If the subnet exists, remove it from the proposed subnets list
+                if checked_public_subnets[subnet_name]["exists"] == True:
+                    all_proposed_subnets_data = [
+                        subnet for subnet in all_proposed_subnets_data
+                        if subnet['name'] != subnet_name
+                    ]
+
+        context_data["checked_public_subnets"] = checked_public_subnets
+
+        checked_private_subnets = {}
+        if PRIVATE_SUBNETS_TO_USE:
+            for subnet_name in PRIVATE_SUBNETS_TO_USE:
+                print("subnet_name:", subnet_name)
+                exists, subnet_id = check_subnet_exists_by_name(subnet_name, existing_aws_subnets)
+                checked_private_subnets[subnet_name] = {"exists": exists, "id": subnet_id}
+
+                # If the subnet exists, remove it from the proposed subnets list
+                if checked_private_subnets[subnet_name]["exists"] == True:
+                    all_proposed_subnets_data = [
+                        subnet for subnet in all_proposed_subnets_data
+                        if subnet['name'] != subnet_name
+                    ]
+
+        context_data["checked_private_subnets"] = checked_private_subnets
+
+        print("\nName-only existence subnet check complete.\n")
+
+        if full_validation_mode:
+            print("\n--- Running in Full Subnet Validation Mode (CIDR/AZs provided) ---")
+            try:
+                validate_subnet_creation_parameters(vpc_id, all_proposed_subnets_data, existing_aws_subnets)
+                print("\nPre-synth validation successful. Proceeding with CDK synth.\n")
+
+                # Populate context_data for downstream CDK construct creation
+                context_data["public_subnets_to_create"] = []
+                if public_ready_for_full_validation:
+                    for i, name in enumerate(PUBLIC_SUBNETS_TO_USE):
+                        context_data["public_subnets_to_create"].append({
+                            'name': name,
+                            'cidr': PUBLIC_SUBNET_CIDR_BLOCKS[i],
+                            'az': PUBLIC_SUBNET_AVAILABILITY_ZONES[i],
+                            'is_public': True
+                        })
+                context_data["private_subnets_to_create"] = []
+                if private_ready_for_full_validation:
+                    for i, name in enumerate(PRIVATE_SUBNETS_TO_USE):
+                        context_data["private_subnets_to_create"].append({
+                            'name': name,
+                            'cidr': PRIVATE_SUBNET_CIDR_BLOCKS[i],
+                            'az': PRIVATE_SUBNET_AVAILABILITY_ZONES[i],
+                            'is_public': False
+                        })
+
+            except (ValueError, Exception) as e:
+                print(f"\nFATAL ERROR: Subnet parameter validation failed: {e}\n")
+                raise SystemExit(1) # Exit if validation fails
 
     # Example checks and setting context values
     # IAM Roles
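The de-duplication step in `check_and_set_context` — dropping any proposed subnet whose Name tag already exists in the VPC — can be factored into a small pure function. This is a sketch of that filtering logic, not the repo's code; the function name is illustrative, and `checked` mirrors the shape of `checked_public_subnets` / `checked_private_subnets` above.

```python
def drop_existing_subnets(proposed, checked):
    """Remove proposed subnets that already exist in the VPC.

    proposed: list of dicts with at least a 'name' key (plus 'cidr'/'az').
    checked:  mapping of subnet name -> {"exists": bool, "id": str | None},
              as produced by the name-only existence check.
    Returns the proposed list with already-existing subnets filtered out.
    """
    return [
        subnet for subnet in proposed
        if not checked.get(subnet["name"], {}).get("exists", False)
    ]
```

Pulling the filter out like this makes the public and private branches share one code path and makes the behaviour easy to unit-test without calling AWS.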
cdk/post_cdk_build_quickstart.py CHANGED
@@ -13,10 +13,10 @@ start_codebuild_build(PROJECT_NAME=CODEBUILD_PROJECT_NAME)
 
 # Upload config.env file to S3 bucket
 upload_file_to_s3(local_file_paths="config/config.env", s3_key="", s3_bucket=S3_LOG_CONFIG_BUCKET_NAME)
 
-total_seconds = 480 # 8 minutes * 60 seconds/minute
+total_seconds = 450 # 7.5 minutes
 update_interval = 1 # Update every second
 
-print("Waiting eight minutes for the CodeBuild container to build.")
+print("Waiting 7.5 minutes for the CodeBuild container to build.")
 
 # tqdm iterates over a range, and you perform a small sleep in each iteration
 for i in tqdm(range(total_seconds), desc="Building container"):
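The fixed-duration wait above (now 450 one-second ticks under a tqdm progress bar) can be factored into a testable helper by injecting the sleep function. This is an illustrative sketch, not the script's actual structure; `wait_for_build` and its parameters are hypothetical names.

```python
def wait_for_build(total_seconds, sleep_fn, progress_fn=None):
    """Tick once per second for total_seconds, optionally reporting progress.

    sleep_fn is injected (time.sleep in real use) so the loop can be
    exercised in tests without real delays; progress_fn, if given, is
    called with (elapsed, total) after each tick.
    """
    for i in range(total_seconds):
        sleep_fn(1)  # one-second tick, mirroring the tqdm loop in the script
        if progress_fn:
            progress_fn(i + 1, total_seconds)
    return total_seconds
```

In the real script the same effect comes from `for i in tqdm(range(total_seconds)): time.sleep(1)`; a polling loop against the CodeBuild job status would avoid the hard-coded duration entirely, at the cost of extra API calls.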
cdk/requirements.txt CHANGED
@@ -1,5 +1,5 @@
-aws-cdk-lib==2.200.2
-boto3==1.38.35
-pandas==2.2.3
+aws-cdk-lib==2.202.0
+boto3==1.38.41
+pandas==2.3.0
 nodejs==0.1.1
 python-dotenv==1.0.1
index.qmd CHANGED
@@ -2,7 +2,7 @@
 title: "Home"
 ---
 
-version: 0.7.0
+version: 0.7.1
 
 Welcome to the Document Redaction App documentation. This site provides comprehensive documentation for the Document Redaction App.
 
pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "doc_redaction"
-version = "0.7.0"
+version = "0.7.1"
 description = "Redact PDF/image-based documents, or CSV/XLSX files using a Gradio-based GUI interface"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -18,13 +18,13 @@ dependencies = [
     "presidio_anonymizer==2.2.358",
     "presidio-image-redactor==0.0.56",
     "pikepdf==9.5.2",
-    "pandas==2.2.3",
+    "pandas==2.3.0",
     "scikit-learn==1.6.1",
-    "spacy==3.8.4",
+    "spacy==3.8.7",
     # Direct URL dependency for spacy model
     "en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0.tar.gz",
-    "gradio==5.34.0",
-    "boto3==1.38.35",
+    "gradio==5.34.2",
+    "boto3==1.38.46",
     "pyarrow==19.0.1",
     "openpyxl==3.1.5",
     "Faker==36.1.1",
requirements.txt CHANGED
@@ -6,12 +6,12 @@ presidio_analyzer==2.2.358
 presidio_anonymizer==2.2.358
 presidio-image-redactor==0.0.56
 pikepdf==9.5.2
-pandas==2.2.3
+pandas==2.3.0
 scikit-learn==1.6.1
-spacy==3.8.4
+spacy==3.8.7
 en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0.tar.gz
-gradio==5.34.0
-boto3==1.38.35
+gradio==5.34.2
+boto3==1.38.46
 pyarrow==19.0.1
 openpyxl==3.1.5
 Faker==36.1.1
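The pin bumps above (pandas, spacy, gradio, boto3) can be sanity-checked programmatically. A minimal sketch of a `==` pin parser — the function name is illustrative, and direct-URL dependencies such as the spaCy model line (which uses `@`) are deliberately skipped:

```python
def parse_pins(lines):
    """Parse 'name==version' pins from requirements-style lines.

    Skips blanks and direct-URL dependencies (lines containing '@',
    like the en_core_web_lg model entry). Returns {name: version}.
    """
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or "@" in line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins
```

Comparing the dicts parsed from `requirements.txt`, `cdk/requirements.txt`, and the `pyproject.toml` dependency list would catch the version drift this commit is fixing by hand.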
src/installation_guide.qmd CHANGED
@@ -1,5 +1,5 @@
  ---
- title: "App installation guide (with CDK)"
  format:
  html:
  toc: true # Enable the table of contents
@@ -7,17 +7,17 @@ format:
  toc-title: "On this page" # Optional: Title for your TOC
  ---
 
- # Introduction
 
- This guide gives an overview of how to install the app in an AWS environment using the code in the cdk/ folder of this Github repo. The most important thing you need is some familiarity with AWS and how to use it via console or command line, as well as administrator access to at least one region. Then follow the below steps.
 
  ## Prerequisites
 
  * Install git on your computer from: [https://git-scm.com](https://git-scm.com)
- * You will also need to install nodejs and npm: [https://docs.npmjs.com/downloading-and-installing-node-js-and-npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm)
- * You will need an AWS Administrator account in your desired region to install.
- * You will need AWS CDK v2 installed: [https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
- * You will need to bootstrap the environment with CDK in both your primary region, and `us-east-1` if installing CloudFront and associated WAF.
  ```bash
  # Bootstrap your primary region
  cdk bootstrap aws://<YOUR_AWS_ACCOUNT>/eu-west-1
@@ -30,27 +30,23 @@ This guide gives an overview of how to install the app in an AWS environment usi
  git clone https://github.com/seanpedrick-case/doc_redaction.git
  ```
 
- # VPC ACM Certificate
 
- This CDK code is designed to work within an existing VPC. The code does not create a new VPC if it doesn't exist. So you will need to do that yourself.
 
- Additionally, to get full HTTPS data transfer through the app, you will need an SSL certificate registered with AWS Certificate Manager.
-
- You can either use the SSL certificate from a domain, or import an existing certificate into Certificate Manager. Ask your IT admin if you need help with this.
-
- ## If getting an SSL certificate for an existing domain
-
- Make sure to point the certificate to `*.<domain-name>`.
 
  Update your DNS records to include the CNAME record given by AWS. After your stack has been created, you will also need to create a CNAME DNS record for your domain pointing to your load balancer DNS with a subdomain, e.g., `redaction.<domain-name>`.
 
- 1. Create a python environment, load in packages from `requirements.txt`.
 
- Need a `cdk.json` in the `cdk` folder. It should contain the following:
 
  ```json
  {
- "app": "<PATH TO PYTHON ENVIRONMENT FOLDER WHERE REQUIREMENTS HAVE BEEN LOADED>python.exe app.py",
  "context": {
  "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
  "@aws-cdk/core:stackRelativeExports": true,
@@ -71,99 +67,129 @@ Update your DNS records to include the CNAME record given by AWS. After your sta
  }
  }
  ```
 
- 2. Create a `cdk_config.env` file in the `config` subfolder. Here as a minimum it would be useful to put the following details in the env file (below are example values, other possible variables to use here can be seen in the `cdk` folder/`cdk_config.py`).
 
  ```ini
- CDK_PREFIX=example-prefix # Prefix to most created elements in your stack
- VPC_NAME=example-vpc-name # Name of the VPC within which all the other elements will be created
- AWS_REGION=us-west-1 # Region where elements will be created
  AWS_ACCOUNT_ID=1234567890 # AWS account ID that has administrator access that you will use for deploying the stack
  CDK_FOLDER=C:/path_to_cdk_folder/ # The place where the cdk folder code is located
  CONTEXT_FILE=C:/path_to_cdk_folder/cdk.context.json
- EXISTING_IGW_ID=igw-1234567890 # (optional) The ID for an existing internet gateway that you want to use instead of creating a new one
- SINGLE_NAT_GATEWAY_ID=nat-123456789 # (optional) The ID for an existing NAT gateway that you want to use instead of creating a new one
- COGNITO_USER_POOL_DOMAIN_PREFIX=lambeth-redaction-37924 # The prefix of the login / user sign up domain that you want to use with Cognito login. Should not contain the terms amazon, aws, or cognito.
  RUN_USEAST_STACK=False # Set this to True only if you have permissions to create a Cloudfront distribution and web ACL on top of it in the us-east-1 region. If you don't, the section below shows how you can create the CloudFront resource manually and map it to your application load balancer (as you should have permissions for that if you are admin in your region).
  ```
 
- **Note: If you are using an SSL certificate with Cognito login on the application load balancer, you can set COGNITO_AUTH to 0 above, as you don't need the second login step to get to the app**
 
- # Subnets
 
- ### NOTE: I would generally advise creating new subnets as then you will be sure about connectivity between AWS resources that underpin your app.
 
- * If you set no subnets, the app will try to use existing private and public subnets. This approach is risky as the app may overlap with IP addresses assigned to existing AWS resources. It is advised to at least specify existing subnets that you know are available, or create your own using one of the below methods.
 
  * If you want to use existing subnets, you can list them in the following environment variables:
- * `PUBLIC_SUBNETS_TO_USE=["PublicSubnet1", "PublicSubnet2", "PublicSubnet3"]`
- * `PRIVATE_SUBNETS_TO_USE=["PrivateSubnet1", "PrivateSubnet2", "PrivateSubnet3"]`
 
  * If you want to create new subnets, you need to also specify CIDR blocks and availability zones for the new subnets. The app will check with you upon deployment whether these CIDR blocks are available before trying to create.
- * `PUBLIC_SUBNET_CIDR_BLOCKS=['10.222.333.0/28', '10.222.333.16/28', '10.222.333.32/28']`
- * `PUBLIC_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']`
- * `PRIVATE_SUBNET_CIDR_BLOCKS=['10.222.333.48/28', '10.222.333.64/28', '10.222.333.80/28']`
- * `PRIVATE_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']`
 
  If you try to create subnets in invalid CIDR blocks / availability zones, the console output will tell you and it will show you the currently occupied CIDR blocks to help find a space for new subnets you want to create.
 
- 3. In command line in console, go to your `cdk` folder in the redaction app folder. Run `cdk deploy --all`. This should try to deploy the first stack in the `app.py` file.
 
- Hopefully everything will deploy successfully and you will be able to see your new stack in CloudFormation in the AWS console.
 
- 4. Tasks for after CDK deployment
 
- # Tasks performed by `post_cdk_build_quickstart.py`
 
- **Note:** The following tasks are done by the `post_cdk_build_quickstart.py` file that you can find in the `cdk` folder. You will need to run this when logged in with AWS SSO through command line. I will describe how to do this in AWS console just in case the `.py` file doesn't work for you.
 
- ## Codebuild
 
- Need to build CodeBuild project after stack has finished building, as there will be no container in ECR.
 
- Go to CodeBuild -> your project -> click Start build. Check the logs, the build should be progressing.
 
- ## Create a `config.env` file and upload to S3
 
- The Fargate task definition references a `config.env` file.
 
- Need to create a `config.env` file to upload to the S3 bucket that has the variables:
 
  ```ini
- COGNITO_AUTH=1
- RUN_AWS_FUNCTIONS=1
- SESSION_OUTPUT_FOLDER=True # If this is False it currently seems to fail to allow for writable log directories
  ```
 
- Go to S3 and choose the new `...-logs` bucket that you created. Upload the `config.env` file into this bucket.
 
- ## Update Elastic Container Service
 
  Now that the app container is in Elastic Container Registry, you can proceed to run the app on a Fargate server.
-
- Go to your new cluster, your new service, and select 'Update service'.
 
  Select 'Force new deployment', and then set 'Desired number of tasks' to 1.
 
- # Additional Manual Tasks
-
- # Update DNS records for your domain (If using a domain for the SSL certificate)
-
- To do this, you need to create a CNAME DNS record for your domain pointing to your load balancer DNS from a subdomain of your main domain registration, e.g., `redaction.<domain-name>`.
 
- # Cognito
 
- Go to Cognito and create a user with your own email address. Generate a password.
 
- Go to Cognito -> App clients -> Login pages -> View login page.
 
- Enter the email and temporary password details that come in the email (don't include the last full stop!).
 
- Change your password.
 
- ## Set MFA (optional)
- On the Cognito user pool page you can also enable MFA, if you are using an SSL certificate with Cognito login on the Application Load Balancer. Go to Cognito -> your user pool -> Sign in -> Multi-factor authentication
 
- # Create CloudFront distribution
  **Note: this is only relevant if you set `RUN_USEAST_STACK` to 'False' during CDK deployment**
 
  If you were not able to create a CloudFront distribution via CDK, you should be able to do it through console. I would advise using CloudFront as the front end to the app.
@@ -186,25 +212,19 @@ Create a new CloudFront distribution.
  * **For Behavior (modify default behavior):**
  * Under Viewer protocol policy choose 'HTTP and HTTPS'.
 
- ## Security features
 
- In your CloudFront distribution, under 'Security' -> Edit -> Enable security protections.
 
- Choose rate limiting (default is fine).
 
- Create.
-
- In CloudFront geographic restrictions -> Countries -> choose an Allow list of countries.
-
- Click again on Edit.
-
- AWS WAF protection enabled you should see a link titled 'View details of your configuration'.
-
- Go to Rules -> `AWS-AWSManagedRulesCommonRuleSet`, click Edit.
-
- Under `SizeRestrictions_BODY` choose rule action override 'Override to Allow'. This is needed to allow for file upload to the app.
-
- # Change Cognito redirection URL to your CloudFront distribution
 
  Go to Cognito -> your user pool -> App Clients -> Login pages -> Managed login configuration.
 
@@ -215,7 +235,7 @@ Ensure that the callback URL is:
  * `https://<CloudFront domain name>/oauth2/idpresponse`
  * `https://<CloudFront domain name>/oauth/idpresponse`
 
- # Force traffic to come from specific CloudFront distribution (optional)
 
  Note that this only potentially helps with security if you are not using an SSL certificate with Cognito login on your application load balancer.
 
@@ -231,3 +251,5 @@ Go to EC2 - Load Balancers -> Your load balancer -> Listeners -> Your listener -
  Then, change the default listener rule.
 
  * Under Routing action change to 'Return fixed response'.
  ---
+ title: "App installation guide (with CDK or locally on Windows)"
  format:
  html:
  toc: true # Enable the table of contents
 
  toc-title: "On this page" # Optional: Title for your TOC
  ---
 
+ # Installation with CDK
 
+ This guide gives an overview of how to install the app in an AWS environment using the code in the 'cdk/' folder of this GitHub repo. The most important thing you need is some familiarity with AWS and how to use it via console or command line, as well as administrator access to at least one region. Then follow the below steps.
 
  ## Prerequisites
 
+ * Ensure you have an AWS Administrator account in your desired region to be able to deploy all the resources mentioned in cdk_stack.py.
  * Install git on your computer from: [https://git-scm.com](https://git-scm.com)
+ * Install nodejs and npm: [https://docs.npmjs.com/downloading-and-installing-node-js-and-npm](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm). If using Windows, it may be easiest to install from the .msi installer at the bottom of the page [here](https://nodejs.org/en/download/).
+ * Install AWS CDK v2: [https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
+ * Bootstrap the environment with CDK in both your primary region, and `us-east-1` if installing CloudFront and associated WAF.
  ```bash
  # Bootstrap your primary region
  cdk bootstrap aws://<YOUR_AWS_ACCOUNT>/eu-west-1
  git clone https://github.com/seanpedrick-case/doc_redaction.git
  ```
 
+ ## Note on ACM Certificates
 
+ To get full HTTPS data transfer through the app, you will need an SSL certificate registered with AWS Certificate Manager.
 
+ You can either use the SSL certificate from a domain, or import an existing certificate into Certificate Manager. If you're not sure, ask your IT admin for help with this. If getting an SSL certificate for an existing domain, make sure to point the certificate to `*.<domain-name>`.
 
  Update your DNS records to include the CNAME record given by AWS. After your stack has been created, you will also need to create a CNAME DNS record for your domain pointing to your load balancer DNS with a subdomain, e.g., `redaction.<domain-name>`.
 
+ ## Steps to install the app using CDK
+
+ ### 1. Create a Python environment and load in packages from `requirements.txt`
 
+ You need a `cdk.json` in the `cdk` folder. It should contain the following:
 
  ```json
  {
+ "app": "<PATH TO PYTHON ENVIRONMENT FOLDER WHERE REQUIREMENTS HAVE BEEN LOADED>/python.exe app.py",
  "context": {
  "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
  "@aws-cdk/core:stackRelativeExports": true,
  }
  }
  ```
 
+ ### 2. Create a `cdk_config.env` file in the `config` subfolder
+
+ Depending on which environment variables you put in this file, you can choose whether to install the app in a completely new VPC, or in an existing VPC. The following shows you example config files that you could use.
+
+ #### Deploying the app in a brand new VPC
+
+ Here as a minimum it would be useful to put the following details in the cdk_config.env file (below are all example values; other possible variables to use here can be seen in `cdk_config.py` in the `cdk` folder).
 
  ```ini
+ CDK_PREFIX=example-prefix # This prefix will be added to the name of most of the created elements in your stack
+ NEW_VPC_CIDR=10.0.0.0/24 # The CIDR range for your newly created VPC
+ AWS_REGION=<your-region> # Region where elements will be created
  AWS_ACCOUNT_ID=1234567890 # AWS account ID that has administrator access that you will use for deploying the stack
  CDK_FOLDER=C:/path_to_cdk_folder/ # The place where the cdk folder code is located
  CONTEXT_FILE=C:/path_to_cdk_folder/cdk.context.json
+
+ COGNITO_USER_POOL_DOMAIN_PREFIX=redaction-12345 # The prefix of the login / user sign up domain that you want to use with Cognito login. Should not contain the terms amazon, aws, or cognito.
+ COGNITO_AUTH=1 # Whether to use in-app authentication (username and password only; not necessary if you are using an SSL certificate as recommended below)
+ USE_CLOUDFRONT=True # Recommended if you intend to use CloudFront as the front URL to your application load balancer (ALB). This has some extra security features that you won't get with just an ALB, e.g. limiting app access by country.
  RUN_USEAST_STACK=False # Set this to True only if you have permissions to create a Cloudfront distribution and web ACL on top of it in the us-east-1 region. If you don't, the section below shows how you can create the CloudFront resource manually and map it to your application load balancer (as you should have permissions for that if you are admin in your region).
+ CLOUDFRONT_DOMAIN=<example>.cloudfront.net # If you already know the domain of the CloudFront distribution that you want to use, you can add this here.
+ # If you are using an SSL certificate with your ALB (highly recommended):
+ ACM_SSL_CERTIFICATE_ARN=<SSL Certificate ARN> # The ARN of the SSL certificate that you have installed in AWS Certificate Manager
+ SSL_CERTIFICATE_DOMAIN=redaction.example.com # The domain of the SSL certificate that you have installed in AWS Certificate Manager
  ```
 
+ **Note: If you are using an SSL certificate with Cognito login on the application load balancer (strongly advised), you can set COGNITO_AUTH to 0 above, as you don't need the second login step to get to the app.**
 
+ #### In an existing VPC
 
+ From the above example, remove the variable 'NEW_VPC_CIDR' and replace it with the below:
 
+ ```ini
+ VPC_NAME=example-vpc-name # Name of the VPC within which all the other elements will be created
+ EXISTING_IGW_ID=igw-1234567890 # (optional) The ID for an existing internet gateway that you want to use instead of creating a new one
+ SINGLE_NAT_GATEWAY_ID=nat-123456789 # (optional) The ID for an existing NAT gateway that you want to use instead of creating a new one
+ ```
+
+ ##### Subnets
+
+ If you are using an existing VPC then you may want to deploy the app within existing subnets rather than creating new ones:
+
+ * If you define no subnets in environment variables, the app will try to use existing private and public subnets. Bear in mind the app may overlap with IP addresses assigned to existing AWS resources. It is advised to at least specify existing subnets that you know are available, or create your own using one of the below methods.
 
  * If you want to use existing subnets, you can list them in the following environment variables:
+ ```ini
+ PUBLIC_SUBNETS_TO_USE=["PublicSubnet1", "PublicSubnet2", "PublicSubnet3"]
+ PRIVATE_SUBNETS_TO_USE=["PrivateSubnet1", "PrivateSubnet2", "PrivateSubnet3"]
+ ```
 
  * If you want to create new subnets, you need to also specify CIDR blocks and availability zones for the new subnets. The app will check with you upon deployment whether these CIDR blocks are available before trying to create.
+
+ ```ini
+ PUBLIC_SUBNET_CIDR_BLOCKS=['10.222.333.0/28', '10.222.333.16/28', '10.222.333.32/28']
+ PUBLIC_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']
+ PRIVATE_SUBNET_CIDR_BLOCKS=['10.222.333.48/28', '10.222.333.64/28', '10.222.333.80/28']
+ PRIVATE_SUBNET_AVAILABILITY_ZONES=['eu-east-1a', 'eu-east-1b', 'eu-east-1c']
+ ```
 
  If you try to create subnets in invalid CIDR blocks / availability zones, the console output will tell you and it will show you the currently occupied CIDR blocks to help find a space for new subnets you want to create.
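The availability check described above comes down to testing whether each proposed CIDR block overlaps any block already occupied in the VPC. As a rough illustration only (this is not the app's actual code, and the subnet values below are made up), Python's standard `ipaddress` module can perform the same overlap test:

```python
import ipaddress

def find_conflicts(proposed_cidrs, existing_cidrs):
    """Return (proposed, existing) pairs whose IP ranges overlap."""
    conflicts = []
    for proposed in proposed_cidrs:
        p_net = ipaddress.ip_network(proposed)
        for existing in existing_cidrs:
            if p_net.overlaps(ipaddress.ip_network(existing)):
                conflicts.append((proposed, existing))
    return conflicts

# Hypothetical example: one proposed block clashes with an occupied one
existing = ["10.0.0.0/28", "10.0.0.16/28"]
proposed = ["10.0.0.16/28", "10.0.0.48/28"]
print(find_conflicts(proposed, existing))  # [('10.0.0.16/28', '10.0.0.16/28')]
```

Any pair returned here would correspond to a block the deployment would refuse to create; an empty list means the proposed subnets are free.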
 
+ ### 3. Deploy your AWS stack using cdk deploy --all
+
+ In the command line, go to the `cdk` folder in the redaction app folder. Run `cdk deploy --all`. This should try to deploy the first stack in the `app.py` file.
+
+ Hopefully everything will deploy successfully and you will be able to see your new stack in CloudFormation in the AWS console.
+
+ ### 4. Tasks for after CDK deployment
 
+ The CDK deployment will create all the AWS resources needed to run the redaction app. However, there are some remaining setup tasks in AWS that need to be done afterwards, described below.
 
+ #### Run `post_cdk_build_quickstart.py`
 
+ The following tasks are done by the `post_cdk_build_quickstart.py` file that you can find in the `cdk` folder. You will need to run this when logged in with AWS SSO through the command line. I will describe how to do this in the AWS console just in case the `.py` file doesn't work for you.
 
+ ##### Codebuild
 
+ You need to run the CodeBuild project after your CDK stack has finished deploying, as there will be no container in ECR yet.
 
+ If you don't want to run the 'post_cdk_build_quickstart.py' file, in console, go to CodeBuild -> your project -> click Start build. Check the logs; the build should complete in about 6-7 minutes.
 
+ ##### Create a `config.env` file and upload to S3
 
+ The 'post_cdk_build_quickstart' file will upload a config file to S3, as the Fargate task definition references a `config.env` file.
 
+ If you want to do this manually:
 
+ Create a `config.env` file to upload to the S3 bucket that has at least the following variables:
 
  ```ini
+ COGNITO_AUTH=1 # If you are using an SSL certificate with your application load balancer, you will be logging in there. Set this to 0 to turn off the default login screen.
+ RUN_AWS_FUNCTIONS=1 # This will enable the app to communicate with AWS services.
+ SESSION_OUTPUT_FOLDER=True # This will put outputs for each user in separate output folders.
  ```
 
+ * Then, go to S3 and choose the new `...-logs` bucket that you created. Upload the `config.env` file into this bucket.
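If you prefer to script this step, a minimal boto3 sketch could look like the following. This is not taken from `post_cdk_build_quickstart.py`; the function names are made up, and the bucket name and region are placeholders you would replace with your own:

```python
def build_config_env() -> str:
    """Assemble the minimal config.env contents described above."""
    lines = [
        "COGNITO_AUTH=1",
        "RUN_AWS_FUNCTIONS=1",
        "SESSION_OUTPUT_FOLDER=True",
    ]
    return "\n".join(lines) + "\n"

def upload_config_env(bucket_name: str, region: str) -> None:
    """Write config.env to the root of the logs bucket."""
    import boto3  # kept local so build_config_env() works without boto3 installed
    s3 = boto3.client("s3", region_name=region)
    s3.put_object(Bucket=bucket_name, Key="config.env",
                  Body=build_config_env().encode("utf-8"))

# Placeholder bucket name and region: replace with your own '...-logs' bucket
# upload_config_env("example-prefix-logs", "eu-west-1")
```

Run this with an active AWS SSO session so boto3 can pick up your credentials.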
 
+ ##### Update Elastic Container Service
 
  Now that the app container is in Elastic Container Registry, you can proceed to run the app on a Fargate server.
+ The 'post_cdk_build_quickstart.py' file will do this for you, but you can also try this in the console. In ECS, go to your new cluster, then your new service, and select 'Update service'.
 
  Select 'Force new deployment', and then set 'Desired number of tasks' to 1.
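The same two console settings map directly onto boto3's ECS `update_service` call. A hedged sketch (not the app's actual script; cluster and service names are placeholders):

```python
def build_update_args(cluster: str, service: str) -> dict:
    """Arguments mirroring the console steps: force a new deployment
    and set the desired number of tasks to 1."""
    return {
        "cluster": cluster,
        "service": service,
        "desiredCount": 1,
        "forceNewDeployment": True,
    }

def start_app_service(cluster: str, service: str) -> None:
    import boto3  # kept local so the builder above stays dependency-free
    ecs = boto3.client("ecs")
    ecs.update_service(**build_update_args(cluster, service))

# Placeholder names: replace with your stack's cluster and service
# start_app_service("example-prefix-cluster", "example-prefix-service")
```

As in the console, ECS will pull the freshly built image from ECR when the new task starts.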
 
+ ## Additional Manual Tasks
 
+ ### Update DNS records for your domain (If using a domain for the SSL certificate)
 
+ If the SSL certificate you are using is associated with a domain, you will need to update the DNS records for your domain registered with the AWS SSL certificate. To do this, you need to create a CNAME DNS record for your domain pointing to your load balancer DNS from a subdomain of your main domain registration, e.g., `redaction.<domain-name>`.
 
+ ### Create a user in Cognito
 
+ You will next need to create a user in Cognito to be able to log into the app.
 
+ * Go to Cognito and create a user with your own email address. Generate a password.
+ * Go to Cognito -> App clients -> Login pages -> View login page.
+ * Enter the email and temporary password details that come in the email (don't include the last full stop!).
+ * Change your password on the screen that pops up. You should now be able to log into the app.
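User creation can also be scripted with boto3's `admin_create_user`. The sketch below is illustrative only (the helper names are made up; the pool ID and email are placeholders): when no temporary password is supplied, Cognito generates one and emails it, matching the console flow above.

```python
def build_create_user_args(user_pool_id: str, email: str) -> dict:
    """Arguments for admin_create_user; Cognito emails a generated
    temporary password because none is supplied here."""
    return {
        "UserPoolId": user_pool_id,
        "Username": email,
        "UserAttributes": [
            {"Name": "email", "Value": email},
            {"Name": "email_verified", "Value": "true"},
        ],
        "DesiredDeliveryMediums": ["EMAIL"],
    }

def create_app_user(user_pool_id: str, email: str) -> None:
    import boto3  # kept local so the builder above stays dependency-free
    cognito = boto3.client("cognito-idp")
    cognito.admin_create_user(**build_create_user_args(user_pool_id, email))

# Placeholder values: replace with your user pool ID and email address
# create_app_user("eu-west-1_EXAMPLE", "you@example.com")
```

The first login then prompts for a password change, exactly as in the console steps above.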
 
+ ### Set Multi-Factor Authentication for Cognito logins (optional but recommended)
+ On the Cognito user pool page you can also enable MFA, if you are using an SSL certificate with Cognito login on the Application Load Balancer. Go to Cognito -> your user pool -> Sign in -> Multi-factor authentication.
 
+ ### Create CloudFront distribution
  **Note: this is only relevant if you set `RUN_USEAST_STACK` to 'False' during CDK deployment**
 
  If you were not able to create a CloudFront distribution via CDK, you should be able to do it through console. I would advise using CloudFront as the front end to the app.
 
  * **For Behavior (modify default behavior):**
  * Under Viewer protocol policy choose 'HTTP and HTTPS'.
 
+ #### Security features
 
+ You can add security features to your CloudFront distribution (recommended). If you use WAF, you will also need to change the default settings to allow for file upload to the app.
 
+ * In your CloudFront distribution, under 'Security' -> Edit -> Enable security protections.
+ * Choose rate limiting (default is fine). Then click Create.
+ * In CloudFront geographic restrictions -> Countries -> choose an Allow list of countries.
+ * Click again on Edit.
+ * Once AWS WAF protection is enabled, you should see a link titled 'View details of your configuration'.
+ * Go to Rules -> `AWS-AWSManagedRulesCommonRuleSet`, click Edit.
+ * Under `SizeRestrictions_BODY` choose rule action override 'Override to Allow'. This is needed to allow for file upload to the app.
 
+ ### Change Cognito redirection URL to your CloudFront distribution
 
  Go to Cognito -> your user pool -> App Clients -> Login pages -> Managed login configuration.
 
  * `https://<CloudFront domain name>/oauth2/idpresponse`
  * `https://<CloudFront domain name>/oauth/idpresponse`
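Both callback URLs follow a fixed pattern derived from your CloudFront domain, so a small helper makes the substitution explicit. This is illustrative only (the function name and domain are made up, not part of the app's code):

```python
def cognito_callback_urls(cloudfront_domain: str) -> list[str]:
    """Build the two callback URLs Cognito needs for the login flow."""
    base = f"https://{cloudfront_domain}"
    return [f"{base}/oauth2/idpresponse", f"{base}/oauth/idpresponse"]

# Placeholder domain: substitute your distribution's domain name
print(cognito_callback_urls("d123example.cloudfront.net"))
```

Paste the two resulting URLs into the managed login configuration page described above.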
 
+ ### Force traffic to come from a specific CloudFront distribution (optional)
 
  Note that this only potentially helps with security if you are not using an SSL certificate with Cognito login on your application load balancer.
 
  Then, change the default listener rule.
 
  * Under Routing action change to 'Return fixed response'.
+
+ You should now have successfully installed the document redaction app in an AWS environment using CDK.