Nechba committed on
Commit 72eb62e · verified · 1 Parent(s): 8922a70

Update utils.py

Files changed (1)
  1. utils.py +21 -18
utils.py CHANGED
@@ -44,24 +44,27 @@ def process_local_pdf(pdf_bytes: bytes):
  """
  # Configure Gemini
  prompt ="""Please analyze the provided images of the real estate document set and perform the following actions:
-
- 1. *Identify Parties:* Determine and list Seller 1, Seller 2 (if applicable), Buyer 1, and Buyer 2.
- 2. *Identify Missing Items:* Locate and list all instances of missing signatures and missing initials for all parties across all documents.
- 3. *Identify Checked Boxes:* Locate and list all checkboxes that have been marked or checked.
- 4. *Generate Secondary Questions:* For checkboxes that indicate significant waivers (e.g., home warranty, inspection rights, lead paint assessment), specific conditions (e.g., cash sale, contingency status), potential conflicts, or reference other documents, formulate a relevant 'Secondary Question' designed to prompt confirmation or clarification from the user/parties involved.
- 5. *Check for Required Paperwork:* Based only on the checkboxes identified in step 3 that explicitly state or strongly imply a specific addendum or disclosure document should be attached (e.g., "Lead Based Paint Disclosure Addendum attached", "See Counter Offer Addendum", "Seller's Disclosure...Addendum attached", "Retainer Addendum attached", etc.), check if a document matching that description appears to be present within the provided image set. Note whether this implied paperwork is 'Found', 'Missing', or 'Potentially Missing/Ambiguous' within the provided images.
- 6. *Identify Conflicts:* Specifically look for and note any directly contradictory information or conflicting checked boxes (like the conflicting inspection clauses found previously).
- 7. *Provide Location:* For every identified item (missing signature/initial, checked box, required paperwork status, party identification, conflict), specify the approximate line number(s) or clear location on the page (e.g., Bottom Right Initials, Seller Signature Block).
- 8. *Format Output:* Present all findings comprehensively in CSV format. The CSV columns should be:
- * Category (e.g., Parties, Missing Item, Checked Box, Required Paperwork, Conflict)
- * Image number (just make this number {})
- * Item Type (e.g., Seller Initials, Home Warranty Waiver, Lead Paint Addendum Check, Lead Paint Addendum Document)
- * Status (e.g., Identified, Missing, Checked, Found, Potentially Missing, Conflict Detected)
- * Details (Specifics like names, text of the checkbox, description of the issue or document status)
- * Secondary Question (if applicable) (The question generated in step 4)
-
- Please apply this analysis to the entire set of documents provided.
- """
+
+ 1. Identify Parties: Determine and list all present parties involved in the transaction. Include Seller 1, Seller 2 (only if mentioned), Buyer 1, and Buyer 2 (only if mentioned). Omit any party that is not clearly identified in the documents.
+
+ 2. Identify Missing Items: Locate and list all instances of missing signatures and missing initials for each identified party across all documents.
+
+ 3. Identify Checked Boxes: Locate and list all checkboxes that have been marked or checked.
+
+ 4. Generate Secondary Questions: For checkboxes that indicate significant waivers (e.g., home warranty, inspection rights, lead paint assessment), specific conditions (e.g., cash sale, contingency status), potential conflicts, or reference other documents, formulate a relevant 'Secondary Question' designed to prompt confirmation or clarification from the user/parties involved.
+
+ 5. Check for Required Paperwork: Based only on the checkboxes identified in step 3 that explicitly state or strongly imply a specific addendum or disclosure document should be attached (e.g., "Lead Based Paint Disclosure Addendum attached", "See Counter Offer Addendum", "Seller's Disclosure...Addendum attached", "Retainer Addendum attached", etc.), check if a document matching that description appears to be present within the provided image set. Note whether this implied paperwork is 'Found', 'Missing', or 'Potentially Missing/Ambiguous'.
+
+ 6. Identify Conflicts: Specifically look for and note any directly contradictory information or conflicting checked boxes (like the conflicting inspection clauses found previously).
+
+ 7. Provide Location: For every identified item (missing signature/initial, checked box, required paperwork status, party identification, conflict), specify the approximate line number(s) or clear location on the page (e.g., Bottom Right Initials, Seller Signature Block).
+
+ 8. Format Output: Present all findings comprehensively in CSV format. The CSV columns should be:
+ * Category (e.g., Parties, Missing Item, Checked Box, Required Paperwork, Conflict)
+ * Image number (just make this number {})
+ * Item Type (e.g., Seller Initials, Home Warranty Waiver, Lead Paint Addendum Check, Lead Paint Addendum Document)
+ * Status (e.g., Identified, Missing, Checked, Found, Potentially Missing, Conflict)
+ """
 
  # Convert to images
  images = pdf_to_images(pdf_bytes)
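
Note on the prompt template: the revised prompt keeps a single `{}` slot ("just make this number {}") and asks for four CSV columns. The rest of `process_local_pdf` is not shown in this hunk, so the following is only a minimal sketch of how such a template could be filled per page and how a CSV reply could be read back. It assumes the slot is filled with `str.format` (which works here because the prompt contains no other braces). The names `build_prompt`, `parse_csv_reply`, and `COLUMNS` are hypothetical and do not come from `utils.py`.

```python
import csv
import io

# Assumption: the four columns listed in the revised prompt, in that order.
COLUMNS = ["Category", "Image number", "Item Type", "Status"]


def build_prompt(template: str, image_number: int) -> str:
    """Fill the single '{}' slot in the prompt with the page/image number."""
    return template.format(image_number)


def parse_csv_reply(reply: str) -> list[dict[str, str]]:
    """Parse a CSV reply into dicts keyed by COLUMNS.

    Blank lines and a repeated header row are skipped; short rows are
    padded with empty strings, and cells beyond the fourth are dropped.
    """
    rows = []
    for row in csv.reader(io.StringIO(reply.strip())):
        cells = [cell.strip() for cell in row]
        if not cells or cells == COLUMNS:
            continue  # skip blank lines and the header row
        cells += [""] * (len(COLUMNS) - len(cells))
        rows.append(dict(zip(COLUMNS, cells)))
    return rows
```

For example, `build_prompt(prompt, 3)` would yield the prompt text with "just make this number 3", and each parsed row can then be grouped by its "Image number" value when results from several pages are merged.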