Sompote committed · verified · Commit 2c200f8 · 1 Parent(s): 8a11eb7

Upload 17 files
Dockerfile CHANGED
@@ -1,21 +1,31 @@
- FROM python:3.9-slim
-
- WORKDIR /app

  RUN apt-get update && apt-get install -y \
-     build-essential \
-     curl \
-     software-properties-common \
-     git \
      && rm -rf /var/lib/apt/lists/*

- COPY requirements.txt ./
- COPY src/ ./src/
-
- RUN pip3 install -r requirements.txt
-
- EXPOSE 8501
-
- HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

+ FROM python:3.11-slim
+
+ # Install system dependencies (curl is kept for the HEALTHCHECK below)
  RUN apt-get update && apt-get install -y \
+     poppler-utils \
+     tesseract-ocr \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     curl \
      && rm -rf /var/lib/apt/lists/*
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements first for better caching
+ COPY requirements_hf.txt .
+ RUN pip install --no-cache-dir -r requirements_hf.txt
+
+ # Copy application files
+ COPY . .
+
+ # Create environment template
+ RUN if [ ! -f .env ]; then cp .env_template .env; fi
+
+ # Expose Streamlit port
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
+
+ # Run the application
+ CMD ["streamlit", "run", "app_hf.py", "--server.port=7860", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -1,19 +1,98 @@
- ---
- title: Soil Profile
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
-   - streamlit
- pinned: false
- short_description: soil profile analysis
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, check out our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).

+ # 🏗️ Soil Boring Log Analyzer
+
+ An AI-powered application for analyzing soil boring logs using multiple LLM providers. Upload PDF or image files of soil boring logs to automatically extract and analyze soil layers with professional geotechnical insights.
+
+ ## ✨ Features
+
+ - **Multi-LLM Support**: Choose from OpenRouter, Anthropic Claude, or Google Gemini
+ - **Document Processing**: Upload PDF or image files of soil boring logs
+ - **AI Analysis**: Three analysis methods (CrewAI, LangGraph, Unified Workflow)
+ - **Soil Classification**: Automatic soil type classification with strength parameters
+ - **Interactive Visualizations**: Professional soil profile charts with units
+ - **Layer Processing**: Smart layer merging and splitting capabilities
+
+ ## 🚀 Quick Start
+
+ 1. **Configure LLM Provider**: Add your API key for at least one provider (a sample `.env` is shown after this list):
+    - **OpenRouter**: Get key from [openrouter.ai/keys](https://openrouter.ai/keys)
+    - **Anthropic**: Get key from [console.anthropic.com](https://console.anthropic.com/)
+    - **Google AI Studio**: Get key from [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
+
+ 2. **Upload Document**: Choose a soil boring log file (PDF, PNG, JPG)
+
+ 3. **Select Analysis Method**:
+    - **CrewAI**: Two-agent system with quality control
+    - **LangGraph**: Single agent workflow
+    - **Unified Workflow**: Streamlined processing
+
+ 4. **Configure Options**: Set layer merging and splitting preferences
+
+ 5. **Analyze**: Get detailed soil analysis with interactive visualizations
+
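+ A minimal `.env` (the app copies `.env_template` to `.env` on first run; the values below are placeholders, set whichever providers you use):
+
+ ```
+ OPENROUTER_API_KEY=your-openrouter-key
+ ANTHROPIC_API_KEY=your-anthropic-key
+ GOOGLE_API_KEY=your-google-key
+ ```
+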
+ ## 🔧 Supported File Formats
+
+ - **PDF**: Soil boring log reports
+ - **Images**: PNG, JPG, JPEG of soil boring logs
+
+ ## 🤖 LLM Providers
+
+ ### OpenRouter
+ - Access to multiple models through one API
+ - Recommended: Claude-4.0 Sonnet, GPT-4 Turbo
+
+ ### Anthropic Direct
+ - Direct access to Claude models
+ - Excellent for technical analysis
+
+ ### Google AI Studio
+ - Direct access to Gemini models
+ - Advanced multimodal capabilities
+
+ ## 📊 Analysis Methods
+
+ ### CrewAI (Recommended)
+ - **Soil Expert Agent**: Specializes in soil classification
+ - **Geotechnical Agent**: Focuses on engineering parameters
+ - **Quality Control**: Two-agent validation system
+
+ ### LangGraph
+ - Single agent with structured workflow
+ - Good for straightforward analyses
+
+ ### Unified Workflow
+ - Streamlined processing pipeline
+ - Fast analysis with comprehensive validation
+
+ ## 🔬 Technical Features
+
+ - **Su Detection**: Comprehensive undrained shear strength value extraction
+ - **Layer Optimization**: Smart merging of similar layers
+ - **Thick Layer Splitting**: Automatic subdivision of thick layers
+ - **Unit Conversions**: Proper handling of different measurement units (worked example below)
+ - **Professional Charts**: Publication-ready soil profile visualizations
+
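+ For example, an Su reading of 3.0 t/m² is stored as 3.0 × 9.81 = 29.43 kPa; the conversion factor is 9.81, not 10.
+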
+ ## 🎯 Use Cases
+
+ - **Geotechnical Engineering**: Foundation design analysis
+ - **Site Investigation**: Soil characterization studies
+ - **Construction Planning**: Subsurface condition assessment
+ - **Academic Research**: Soil mechanics studies
+ - **Consulting**: Client report generation
+
+ ## 🔒 Privacy & Security
+
+ - **No Data Storage**: Documents are processed in memory only
+ - **API Key Security**: Keys are handled securely in environment variables
+ - **Local Processing**: All analysis happens in your session
+
+ ## 🛠️ Technical Stack
+
+ - **Frontend**: Streamlit
+ - **AI Framework**: LangGraph, CrewAI
+ - **LLM Integration**: OpenAI API, Anthropic API, Google AI
+ - **Visualization**: Plotly, Matplotlib
+ - **Document Processing**: PyPDF2, PIL
+
+ ## 📝 License
+
+ This project is developed for geotechnical engineering applications with AI-powered analysis capabilities.
app.py ADDED
@@ -0,0 +1,62 @@
+ #!/usr/bin/env python3
+ """
+ Soil Boring Log Analyzer - Hugging Face Spaces Version
+ Optimized for deployment on Hugging Face Spaces with Streamlit
+ """
+
+ import streamlit as st
+ import os
+ import shutil
+ from pathlib import Path
+
+ # Hugging Face Spaces Setup
+ def setup_hf_environment():
+     """Set up environment for Hugging Face Spaces"""
+     # Create .env file from template if it doesn't exist
+     if not os.path.exists('.env') and os.path.exists('.env_template'):
+         shutil.copy('.env_template', '.env')
+         st.info("🔧 Environment template created. Please configure your API keys in the sidebar.")
+
+ # Initialize HF environment
+ setup_hf_environment()
+
+ # Import main app after environment setup
+ from app import main
+
+ # Hugging Face Spaces Configuration
+ st.set_page_config(
+     page_title="🏗️ Soil Boring Log Analyzer",
+     page_icon="🏗️",
+     layout="wide",
+     initial_sidebar_state="expanded",
+     menu_items={
+         'Get Help': 'https://huggingface.co/spaces/your-username/soil-boring-analyzer',
+         'Report a bug': 'https://huggingface.co/spaces/your-username/soil-boring-analyzer/discussions',
+         'About': """
+         # 🏗️ Soil Boring Log Analyzer
+
+         An AI-powered application for analyzing soil boring logs using multiple LLM providers.
+
+         **Features:**
+         - Multi-LLM Support (OpenRouter, Anthropic, Google)
+         - PDF/Image document processing
+         - Professional soil analysis
+         - Interactive visualizations
+
+         **Powered by:** Streamlit, LangGraph, CrewAI
+         """
+     }
+ )
+
+ # Add Hugging Face Spaces header
+ if __name__ == "__main__":
+     with st.container():
+         st.markdown("""
+         <div style='text-align: center; padding: 1rem; background: linear-gradient(90deg, #ff6b6b, #4ecdc4); color: white; border-radius: 10px; margin-bottom: 1rem;'>
+             <h2>🏗️ Soil Boring Log Analyzer</h2>
+             <p>AI-Powered Geotechnical Analysis | Powered by Multiple LLM Providers</p>
+         </div>
+         """, unsafe_allow_html=True)
+
+     # Run main application
+     main()
config.py ADDED
@@ -0,0 +1,216 @@
+ import os
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ # LLM Provider Configuration
+ LLM_PROVIDERS = {
+     "openrouter": {
+         "name": "OpenRouter",
+         "base_url": "https://openrouter.ai/api/v1",
+         "api_key_env": "OPENROUTER_API_KEY",
+         "description": "Access to multiple models through OpenRouter",
+         "supports_all_models": True
+     },
+     "anthropic": {
+         "name": "Anthropic Direct",
+         "base_url": "https://api.anthropic.com",
+         "api_key_env": "ANTHROPIC_API_KEY",
+         "description": "Direct access to Claude models",
+         "supports_all_models": False,
+         "supported_models": ["anthropic/claude-sonnet-4", "anthropic/claude-3.5-sonnet-20241022", "anthropic/claude-3-sonnet-20240229", "anthropic/claude-3-haiku-20240307", "anthropic/claude-3-opus-20240229"]
+     },
+     "google": {
+         "name": "Google AI Studio",
+         "base_url": "https://generativelanguage.googleapis.com",
+         "api_key_env": "GOOGLE_API_KEY",
+         "description": "Direct access to Gemini models",
+         "supports_all_models": False,
+         "supported_models": ["google/gemini-2.5-pro-preview-05-06", "google/gemini-pro-vision"]
+     }
+ }
+
+ # Default provider and model (can be None if no API key is set)
+ DEFAULT_PROVIDER = None
+ DEFAULT_MODEL = None
+
+ def get_api_key(provider):
+     """Get API key for specified provider"""
+     return os.getenv(LLM_PROVIDERS[provider]["api_key_env"])
+
+ def get_available_providers():
+     """Get list of providers with valid API keys"""
+     available = []
+     for provider_id, provider_info in LLM_PROVIDERS.items():
+         if get_api_key(provider_id):
+             available.append(provider_id)
+     return available
+
+ def get_models_for_provider(provider_id):
+     """Get available models for a specific provider"""
+     available_models = {}
+     for model_id, model_info in AVAILABLE_MODELS.items():
+         if provider_id in model_info.get("providers", []):
+             available_models[model_id] = model_info
+     return available_models
+
+ def get_default_provider_and_model():
+     """Get default provider and model based on available API keys"""
+     try:
+         available_providers = get_available_providers()
+
+         if not available_providers:
+             return None, None
+
+         # Prefer providers in order: anthropic, openrouter, google
+         preferred_order = ["anthropic", "openrouter", "google"]
+
+         selected_provider = None
+         for provider in preferred_order:
+             if provider in available_providers:
+                 selected_provider = provider
+                 break
+
+         if not selected_provider:
+             selected_provider = available_providers[0]
+
+         # Get a recommended model for this provider
+         available_models = get_models_for_provider(selected_provider)
+         recommended_models = {k: v for k, v in available_models.items() if v.get("recommended", False)}
+
+         if recommended_models:
+             selected_model = list(recommended_models.keys())[0]
+         elif available_models:
+             selected_model = list(available_models.keys())[0]
+         else:
+             selected_model = None
+
+         return selected_provider, selected_model
+     except Exception:
+         # If anything fails, return None values
+         return None, None
+
+ # Available models for soil analysis (recommended for structured outputs)
+ AVAILABLE_MODELS = {
+     # Claude Models (Excellent for technical analysis)
+     "anthropic/claude-sonnet-4": {
+         "name": "Claude-4.0 Sonnet",
+         "description": "Latest Claude model with superior reasoning and technical analysis",
+         "cost": "Medium",
+         "recommended": True,
+         "supports_images": True,
+         "providers": ["openrouter", "anthropic"]
+     },
+     "anthropic/claude-3.5-sonnet-20241022": {
+         "name": "Claude-3.5 Sonnet",
+         "description": "Previous Claude model, excellent reasoning and technical analysis",
+         "cost": "Medium",
+         "recommended": True,
+         "supports_images": True,
+         "providers": ["openrouter", "anthropic"]
+     },
+     "anthropic/claude-3-sonnet-20240229": {
+         "name": "Claude-3 Sonnet (Legacy)",
+         "description": "Previous version, balanced performance",
+         "cost": "Medium",
+         "recommended": False,
+         "supports_images": True,
+         "providers": ["openrouter", "anthropic"]
+     },
+     "anthropic/claude-3-haiku-20240307": {
+         "name": "Claude-3 Haiku",
+         "description": "Faster and cheaper, good for basic analysis",
+         "cost": "Low",
+         "recommended": False,
+         "supports_images": True,
+         "providers": ["openrouter", "anthropic"]
+     },
+     "anthropic/claude-3-opus-20240229": {
+         "name": "Claude-3 Opus",
+         "description": "Most capable legacy model, best for complex analysis",
+         "cost": "High",
+         "recommended": True,
+         "supports_images": True,
+         "providers": ["openrouter", "anthropic"]
+     },
+
+     # GPT Models (Good structured output)
+     "openai/gpt-4-turbo": {
+         "name": "GPT-4 Turbo",
+         "description": "Fast and capable, good JSON output",
+         "cost": "Medium",
+         "recommended": True,
+         "supports_images": True,
+         "providers": ["openrouter"]
+     },
+     "openai/gpt-3.5-turbo": {
+         "name": "GPT-3.5 Turbo",
+         "description": "Fast and cheap, basic analysis",
+         "cost": "Low",
+         "recommended": False,
+         "supports_images": False,
+         "providers": ["openrouter"]
+     },
+
+     # Specialized Models
+     "meta-llama/llama-3.1-70b-instruct": {
+         "name": "Llama-3.1 70B",
+         "description": "Open source, good performance",
+         "cost": "Low",
+         "recommended": False,
+         "supports_images": False,
+         "providers": ["openrouter"]
+     },
+     "mistralai/mixtral-8x7b-instruct": {
+         "name": "Mixtral 8x7B",
+         "description": "Good multilingual support",
+         "cost": "Low",
+         "recommended": False,
+         "supports_images": False,
+         "providers": ["openrouter"]
+     },
+
+     # xAI Models
+     "x-ai/grok-3-beta": {
+         "name": "xAI Grok 3",
+         "description": "Latest xAI model with advanced reasoning capabilities (text-only)",
+         "cost": "Medium",
+         "recommended": True,
+         "supports_images": False,
+         "providers": ["openrouter"]
+     },
+
+     # Google Models
+     "google/gemini-2.5-pro-preview-05-06": {
+         "name": "Gemini 2.5 Pro Preview",
+         "description": "Latest Google Gemini model with advanced multimodal capabilities",
+         "cost": "Medium",
+         "recommended": True,
+         "supports_images": True,
+         "providers": ["openrouter", "google"]
+     },
+     "google/gemini-pro-vision": {
+         "name": "Gemini Pro Vision",
+         "description": "Google's multimodal model optimized for vision tasks",
+         "cost": "Medium",
+         "recommended": False,
+         "supports_images": True,
+         "providers": ["openrouter", "google"]
+     }
+ }
+
+ SOIL_TYPES = {
+     "clay": ["soft clay", "medium clay", "stiff clay", "very stiff clay", "hard clay"],
+     "sand": ["loose sand", "medium dense sand", "dense sand", "very dense sand"],
+     "silt": ["soft silt", "medium silt", "stiff silt"],
+     "gravel": ["loose gravel", "dense gravel"],
+     "rock": ["weathered rock", "soft rock", "hard rock"]
+ }
+
+ STRENGTH_PARAMETERS = {
+     "clay": "Su (kPa)",
+     "sand": "SPT-N",
+     "silt": "SPT-N",
+     "gravel": "SPT-N",
+     "rock": "UCS (MPa)"
+ }
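+
+ # Illustrative usage only (not part of the original file; the values shown are
+ # hypothetical and depend on which API keys are set in the environment):
+ #
+ #   >>> get_available_providers()
+ #   ['openrouter']
+ #   >>> get_default_provider_and_model()
+ #   ('openrouter', 'anthropic/claude-sonnet-4')
+ #   >>> sorted(get_models_for_provider('google'))
+ #   ['google/gemini-2.5-pro-preview-05-06', 'google/gemini-pro-vision']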
crewai_agents.py ADDED
@@ -0,0 +1,553 @@
+ from crewai import Agent, Task, Crew, Process
+ from typing import Dict, Any, List
+ import json
+ import os
+ from llm_client import LLMClient
+ from soil_analyzer import SoilLayerAnalyzer
+ from pydantic import BaseModel, Field
+ from config import LLM_PROVIDERS, get_default_provider_and_model
+
+ class CrewAIGeotechSystem:
+     def __init__(self, model=None, api_key=None):
+         # Handle API key - if explicitly passed as empty string, use that (for mock mode)
+         if api_key == "":
+             self.api_key = ""
+         else:
+             self.api_key = api_key or ""
+
+         _, default_model = get_default_provider_and_model()
+         self.model = model or default_model
+
+         # Check if we have a valid API key
+         self.has_api_key = bool(self.api_key and self.api_key.strip())
+
+         if self.has_api_key:
+             # Initialize our working LLMClient for actual LLM calls
+             self.llm_client = LLMClient(model=self.model, api_key=self.api_key)
+
+             # We'll use direct LLM calls instead of CrewAI agents due to compatibility issues
+             self.geotech_agent = "Geotechnical Engineer (Direct LLM)"
+             self.senior_geotech_agent = "Senior Geotechnical Engineer (Direct LLM)"
+         else:
+             # No API key available - set to None to trigger mock mode
+             self.llm_client = None
+             self.geotech_agent = None
+             self.senior_geotech_agent = None
+
+     def _run_geotech_analysis_direct(self, soil_data: Dict[str, Any]) -> str:
+         """Run geotechnical analysis using direct LLM calls"""
+
+         analysis_prompt = f"""
+         As an experienced geotechnical engineer, analyze the following soil boring log data and provide a comprehensive geotechnical assessment:
+
+         DATA:
+         {json.dumps(soil_data, indent=2)}
+
+         CRITICAL UNIT CONVERSION REQUIREMENTS:
+         1. **Su (Undrained Shear Strength) Unit Conversions:**
+            - t/m² → kPa: MULTIPLY BY 9.81 (NOT 10!)
+            - ksc (kg/cm²) → kPa: multiply by 98.0
+            - psi → kPa: multiply by 6.895
+            - MPa → kPa: multiply by 1000
+            - tsf (tons/ft²) → kPa: multiply by 95.76
+
+         2. **Common Unit Errors to Check:**
+            - If Su values seem unusually high (>500 kPa for soft clay), check if units are incorrectly converted
+            - If Su values seem unusually low (<10 kPa for stiff clay), check if conversion factor was applied
+            - t/m² is commonly misconverted using factor of 10 instead of 9.81 - BE CAREFUL!
+
+         CRITICAL LAYER ANALYSIS REQUIREMENTS:
+         3. **Su Value Consistency Within Layers:**
+            - **EXAMINE each layer for Su value variations**
+            - If multiple Su values exist within a single layer, check for consistency
+            - **LAYER SPLITTING CRITERIA:**
+              * If Su values within a layer vary by >30%, consider splitting the layer
+              * If Su values show clear trend (increasing/decreasing), split at transition points
+              * If one Su value is >2x another Su value in same layer, MUST split the layer
+
+         4. **Layer Splitting Protocol:**
+            - Identify depth ranges where Su values are similar (within 20-30% variation)
+            - Create new layer boundaries at points where Su values change significantly
+            - Each new sub-layer should have consistent Su values (average or representative value)
+            - Update soil descriptions to reflect the new layer characteristics
+
+         5. **Su Value Assignment for Layers:**
+            - If Su values are consistent within layer: use average value
+            - If Su values vary significantly: split layer and assign representative values
+            - Document the original Su readings and how they were processed
+
+         Your responsibilities:
+         1. **CAREFULLY validate ALL unit conversions** - pay special attention to Su values
+         2. Check if any Su values are in t/m² and need conversion to kPa using factor 9.81
+         3. **ANALYZE Su value consistency within each layer**
+         4. **SPLIT layers where Su values vary significantly (>30% variation or >2x difference)**
+         5. Validate all geotechnical parameters for consistency and reasonableness
+         6. Check layer classifications and transitions
+         7. Verify strength parameter correlations (Su vs water content, SPT vs consistency)
+         8. Calculate layer statistics and identify any outliers
+         9. Ensure ALL unit conversions are correct (kPa, degrees, etc.)
+         10. Check depth continuity and layer boundaries
+
+         Focus on:
+         - **UNIT CONVERSION ACCURACY** (especially Su values from t/m² to kPa)
+         - **Su VALUE CONSISTENCY within layers** - split if values vary significantly
+         - Parameter consistency (e.g., soft clay should have low Su, high water content)
+         - Reasonable strength ranges for each soil type
+         - Proper calculation methodology
+         - Layer transition logic
+
+         LAYER SPLITTING DECISION TREE:
+         ✓ Check if multiple Su values exist in each layer
+         ✓ Calculate variation between Su values (max-min)/average
+         ✓ If variation >30% OR max/min ratio >2.0: SPLIT the layer
+         ✓ Create new layers with consistent Su values
+         ✓ Assign average Su value to each new sub-layer
+         ✓ Update layer descriptions and boundaries
+
+         UNIT CONVERSION VERIFICATION CHECKLIST:
+         ✓ Check if any Su values are in t/m² - if yes, multiply by 9.81 to get kPa
+         ✓ Verify Su values are reasonable for soil consistency (soft clay: 10-30 kPa, stiff clay: 50-100 kPa)
+         ✓ Check SPT N-values are reasonable for soil description
+         ✓ Ensure depth units are in meters
+
+         Provide detailed analysis with any concerns noted, especially unit conversion issues and layer splitting recommendations.
+         """
+
+         try:
+             # Use our working LLMClient
+             response = self.llm_client.client.chat.completions.create(
+                 model=self.model,
+                 messages=[
+                     {"role": "system", "content": "You are an experienced geotechnical engineer with expertise in soil mechanics and foundation design. You are particularly careful about unit conversions and have seen many errors caused by incorrect unit conversion factors."},
+                     {"role": "user", "content": analysis_prompt}
+                 ],
+                 temperature=0.1,
+                 max_tokens=2000
+             )
+
+             return response.choices[0].message.content
+
+         except Exception as e:
+             return f"Analysis failed: {str(e)}"
+
+     def _run_senior_review_direct(self, analysis_result: str) -> str:
+         """Run senior engineer review using direct LLM calls"""
+
+         review_prompt = f"""
+         As a senior geotechnical engineer with 20+ years of experience, review the following geotechnical analysis for consistency, accuracy, and engineering reasonableness:
+
+         ANALYSIS TO REVIEW:
+         {analysis_result}
+
+         CRITICAL REVIEW FOCUS - UNIT CONVERSIONS:
+         Your PRIMARY responsibility is to catch unit conversion errors that can lead to catastrophic design failures.
+
+         1. **Su (Undrained Shear Strength) CRITICAL CHECKS:**
+            - Are any Su values still in t/m²? If yes, they MUST be converted to kPa using factor 9.81
+            - Are Su values reasonable for the soil consistency described?
+            - Soft clay: 10-30 kPa | Medium clay: 30-60 kPa | Stiff clay: 60-120 kPa | Very stiff clay: >120 kPa
+            - If Su > 500 kPa for soft clay → MAJOR RED FLAG - likely unit error
+            - If Su < 10 kPa for stiff clay → MAJOR RED FLAG - likely unit error
+
+         2. **COMMON UNIT CONVERSION ERRORS TO IDENTIFY:**
+            - Using factor 10 instead of 9.81 for t/m² → kPa conversion
+            - Missing unit conversions (values still in original units)
+            - Wrong conversion factors applied
+
+         CRITICAL LAYER SPLITTING REVIEW:
+         3. **Su Value Consistency Within Layers:**
+            - **EXAMINE if layers with varying Su values were properly split**
+            - Check if any single layer contains Su values that vary by >30%
+            - **LAYER SPLITTING VALIDATION:**
+              * Were layers with Su variation >30% properly split?
+              * Were layers with Su ratio >2.0 (max/min) properly split?
+              * Are the new layer boundaries logical and well-defined?
+              * Does each sub-layer have consistent Su values?
+
+         4. **Layer Splitting Quality Control:**
+            - Verify that split layers have representative Su values (average or appropriate)
+            - Check that layer descriptions match the Su values assigned
+            - Ensure depth boundaries are clearly defined for split layers
+            - Validate that soil consistency matches the Su values in each sub-layer
+
+         Your senior engineer review responsibilities:
+
+         1. **UNIT CONVERSION VALIDATION (HIGHEST PRIORITY):**
+            - Su vs water content relationships for clay soils
+            - SPT N-values vs soil consistency correlations
+            - Strength ranges within expected bounds for each soil type
+            - Unit conversion accuracy (kPa, degrees, m) - ESPECIALLY t/m² to kPa using 9.81
+
+         2. **LAYER SPLITTING VALIDATION (HIGH PRIORITY):**
+            - Check if layers with varying Su values were appropriately split
+            - Verify consistency of Su values within each layer
+            - Validate that layer boundaries make engineering sense
+
+         3. **PARAMETER CONSISTENCY CHECKS:**
+            - Layer boundaries and transitions are logical
+            - Classification consistency across depth
+            - Parameter ranges match soil descriptions
+
+         4. **RED FLAGS TO IDENTIFY:**
+            - **CRITICAL:** Su values in wrong units (t/m² not converted to kPa)
+            - **CRITICAL:** Su values unreasonable for consistency (high Su with soft clay, low Su with stiff clay)
+            - **CRITICAL:** Single layer with highly variable Su values (>30% variation) not split
+            - High water content with very high Su (unusual for clay)
+            - Low water content with very low Su
+            - Soft consistency with high SPT N-values
+            - Hard consistency with low SPT N-values
+            - Strength values that appear incorrectly converted
+
+         5. **DECISION CRITERIA:**
+            - If you find ANY unit conversion errors: **REJECT and require re-investigation**
+            - If Su values are inconsistent with soil consistency: **REJECT and require re-investigation**
+            - If layers with varying Su values were not properly split: **REJECT and require re-investigation**
+            - If parameters look reasonable: Approve with confidence assessment
+            - If minor concerns exist: Approve with notes
+
+         LAYER SPLITTING REVIEW CHECKLIST:
+         ✓ Are there any layers with multiple different Su values?
+         ✓ Were layers with Su variation >30% properly split?
+         ✓ Were layers with Su ratio >2.0 properly split?
+         ✓ Do split layers have consistent Su values within each sub-layer?
+         ✓ Are layer descriptions updated to match the split layers?
+
+         UNIT CONVERSION VERIFICATION CHECKLIST FOR REVIEW:
+         ✓ Are ALL Su values properly converted to kPa?
+         ✓ Are Su values reasonable for the described soil consistency?
+         ✓ Has the correct factor (9.81) been used for t/m² → kPa conversion?
+         ✓ Are SPT N-values consistent with soil descriptions?
+
+         Provide your professional judgment on whether this analysis is acceptable or requires revision.
+         Be EXTREMELY specific about any unit conversion issues and layer splitting issues found and provide clear guidance for correction.
+
+         REMEMBER: Unit conversion errors and improper layer definition can lead to foundation failures. Be thorough.
+         """
+
+         try:
+             # Use our working LLMClient
+             response = self.llm_client.client.chat.completions.create(
+                 model=self.model,
+                 messages=[
+                     {"role": "system", "content": "You are a senior geotechnical engineer with extensive experience in complex foundation projects and rigorous quality control. You have seen foundation failures caused by unit conversion errors and are extremely vigilant about this issue."},
+                     {"role": "user", "content": review_prompt}
+                 ],
+                 temperature=0.1,
+                 max_tokens=2000
+             )
+
+             return response.choices[0].message.content
+
+         except Exception as e:
+             return f"Review failed: {str(e)}"
+
+     def _run_reinvestigation_direct(self, original_analysis: str, review_feedback: str) -> str:
+         """Run re-investigation using direct LLM calls"""
+
+         reinvestigation_prompt = f"""
+         Based on the senior engineer's review, re-investigate and address the following issues:
+
+         ORIGINAL ANALYSIS:
+         {original_analysis}
+
+         SENIOR REVIEW FEEDBACK:
+         {review_feedback}
+
+         RE-INVESTIGATION REQUIREMENTS:
+
+         **PRIORITY 1: UNIT CONVERSION CORRECTIONS**
+         If the senior engineer identified unit conversion issues:
+         1. **Su (Undrained Shear Strength) Corrections:**
+            - If values are in t/m², convert to kPa by multiplying by 9.81
+            - Double-check ALL conversion factors used
+            - Verify final Su values are reasonable for soil consistency
+
+         2. **Verify Conversion Factors:**
+            - t/m² → kPa: multiply by 9.81 (NOT 10)
+            - ksc → kPa: multiply by 98.0
+            - psi → kPa: multiply by 6.895
+            - MPa → kPa: multiply by 1000
+
+         **PRIORITY 2: LAYER SPLITTING CORRECTIONS**
+         If the senior engineer identified layer splitting issues:
+         1. **Su Value Variation Analysis:**
+            - Re-examine each layer for Su value consistency
+            - Calculate variation: (max Su - min Su) / average Su
+            - Calculate ratio: max Su / min Su
+
+         2. **Layer Splitting Protocol:**
+            - If Su variation >30% OR ratio >2.0: SPLIT the layer
+            - Create new layer boundaries at points where Su values change significantly
+            - Assign consistent Su values to each new sub-layer (use average within range)
+            - Update layer descriptions to reflect new boundaries
+
+         3. **New Layer Definition:**
+            - Each sub-layer should have Su values within 20-30% variation
+            - Update depth ranges for each new sub-layer
+            - Ensure soil consistency descriptions match Su values
+            - Verify layer transitions are logical
+
+         **GENERAL RE-INVESTIGATION:**
+         1. Address each specific concern raised by the senior engineer
+         2. Re-examine parameter relationships and correlations
+         3. Double-check ALL unit conversions and calculations
+         4. Provide revised analysis with explanations for changes
+         5. Justify any assumptions or interpretations made
+
+         **LAYER SPLITTING EXAMPLE:**
+         Original Layer: 2.0-6.0m, Clay, Su values: 15, 25, 45 kPa
+         Analysis: Variation = (45-15)/28 = 107% > 30%, Ratio = 45/15 = 3.0 > 2.0
+         Action: SPLIT INTO:
+         - Layer 2a: 2.0-4.0m, Soft Clay, Su = 20 kPa (average of 15, 25)
+         - Layer 2b: 4.0-6.0m, Medium Clay, Su = 45 kPa
+
+         **UNIT CONVERSION RE-CHECK PROTOCOL:**
+         ✓ Identify original units for each parameter
+         ✓ Apply correct conversion factors
+         ✓ Verify converted values are reasonable
+         ✓ Check consistency with soil descriptions
+
+         **LAYER SPLITTING RE-CHECK PROTOCOL:**
+         ✓ Check Su value variation within each layer
+         ✓ Split layers with >30% variation or >2.0 ratio
+         ✓ Assign representative Su values to sub-layers
+         ✓ Update layer descriptions and boundaries
+         ✓ Verify consistency between Su and soil consistency
+
+         Focus specifically on the issues identified in the review.
+         Provide a comprehensive revised analysis that addresses ALL concerns.
+
+         **Show your work:** For any corrections, clearly state:
+         - Original layer configuration
+         - Su values and their variation/ratio
+         - Splitting decision and rationale
+         - New layer boundaries and Su assignments
+         - Unit conversion details (original value, factor, final value)
+         - Verification that results are reasonable
+         """
+
+         try:
+             # Use our working LLMClient
+             response = self.llm_client.client.chat.completions.create(
+                 model=self.model,
+                 messages=[
+                     {"role": "system", "content": "You are an experienced geotechnical engineer conducting a thorough re-investigation based on senior engineer feedback. You are particularly focused on correcting any unit conversion errors identified."},
+                     {"role": "user", "content": reinvestigation_prompt}
+                 ],
+                 temperature=0.1,
+                 max_tokens=2000
+             )
+
+             return response.choices[0].message.content
+
+         except Exception as e:
+             return f"Re-investigation failed: {str(e)}"
+
+     def _run_final_review_direct(self, revised_analysis: str, previous_concerns: str) -> str:
+         """Run final review using direct LLM calls"""
+
+         final_review_prompt = f"""
+         Conduct final review of the re-investigated analysis:
+
+         REVISED ANALYSIS:
+         {revised_analysis}
+
+         PREVIOUS CONCERNS:
+         {previous_concerns}
+
+         Final validation requirements:
+         1. Confirm all previous concerns have been adequately addressed
+         2. Verify parameter consistency and engineering reasonableness
+         3. Check that explanations are technically sound
+         4. Provide final approval or additional guidance if still needed
+
+         Make final determination: APPROVED or REQUIRES FURTHER WORK
+         """
+
+         try:
+             # Use our working LLMClient
+             response = self.llm_client.client.chat.completions.create(
+                 model=self.model,
+                 messages=[
+                     {"role": "system", "content": "You are a senior geotechnical engineer conducting final validation with authority to approve or reject the analysis."},
+                     {"role": "user", "content": final_review_prompt}
+                 ],
+                 temperature=0.1,
+                 max_tokens=2000
+             )
+
+             return response.choices[0].message.content
+
+         except Exception as e:
+             return f"Final review failed: {str(e)}"
+
+     def run_geotechnical_analysis(self, soil_data: Dict[str, Any]) -> Dict[str, Any]:
+         """Run the complete two-agent geotechnical analysis workflow using direct LLM calls"""
+
+         # Handle case where no LLM is available (testing mode)
+         if not self.has_api_key:
+             return self._mock_analysis_for_testing(soil_data)
+
+         try:
+             # Run initial analysis using direct LLM call
+             analysis_result = self._run_geotech_analysis_direct(soil_data)
+
+             if "failed:" in analysis_result:
+                 return {
+                     "error": f"Initial analysis failed: {analysis_result}",
+                     "status": "error",
+                     "workflow": "failed"
+                 }
+
+             # Run review using direct LLM call
+             review_result = self._run_senior_review_direct(analysis_result)
+
+             if "failed:" in review_result:
+                 return {
+                     "error": f"Review failed: {review_result}",
+                     "status": "error",
+                     "workflow": "failed"
+                 }
+
+             # Check if re-investigation is needed based on review content
+             review_text = review_result.lower()
+             needs_reinvestigation = any(keyword in review_text for keyword in [
+                 "re-investigate", "reject", "inconsistent", "unusual", "verify", "additional testing",
+                 "requires revision", "not acceptable", "re-examination", "concerning", "requires further work"
+             ])
+
+             if needs_reinvestigation:
+                 # Run re-investigation
+                 final_analysis = self._run_reinvestigation_direct(analysis_result, review_result)
+
+                 if "failed:" in final_analysis:
+                     return {
+                         "error": f"Re-investigation failed: {final_analysis}",
+                         "status": "error",
+                         "workflow": "failed"
+                     }
+
+                 # Final review of re-investigation
+                 final_review = self._run_final_review_direct(final_analysis, review_result)
+
+                 if "failed:" in final_review:
+                     return {
+                         "error": f"Final review failed: {final_review}",
+                         "status": "error",
+                         "workflow": "failed"
+                     }
+
+                 return {
+                     "initial_analysis": analysis_result,
+                     "initial_review": review_result,
+                     "reinvestigation": final_analysis,
+                     "final_review": final_review,
+                     "status": "completed_with_revision",
+                     "workflow": "two_stage_review_direct_llm"
+                 }
+
+             else:
+                 return {
+                     "analysis": analysis_result,
+                     "review": review_result,
+                     "status": "approved",
+                     "workflow": "single_stage_review_direct_llm"
+                 }
+
+         except Exception as e:
+             return {
+                 "error": f"CrewAI analysis failed: {str(e)}",
+                 "status": "error",
+                 "workflow": "failed"
+             }
+
+     def _mock_analysis_for_testing(self, soil_data: Dict[str, Any]) -> Dict[str, Any]:
+         """Provides mock analysis for testing when no API key is available"""
+
+         # Simulate comprehensive analysis based on input data
+         project_name = soil_data.get('project_info', {}).get('project_name', 'Unknown Project')
+         boring_id = soil_data.get('project_info', {}).get('boring_id', 'Unknown Boring')
+         soil_layers = soil_data.get('soil_layers', [])
+
+         # Generate realistic mock analysis
+         mock_analysis = f"""
+         GEOTECHNICAL ANALYSIS REPORT - {project_name} ({boring_id})
+
+         EXECUTIVE SUMMARY:
+         Analyzed {len(soil_layers)} soil layers from the boring log data.
+         Overall soil conditions appear consistent with typical soil behavior patterns.
+
+         LAYER ANALYSIS:
+         """
+
+         for i, layer in enumerate(soil_layers, 1):
+             soil_type = layer.get('soil_type', 'Unknown')
+             consistency = layer.get('consistency', 'Unknown')
+             depth_from = layer.get('depth_from', 0)
+             depth_to = layer.get('depth_to', 0)
+             strength_value = layer.get('strength_value', 'N/A')
+             strength_unit = layer.get('strength_unit', '')
+
+             mock_analysis += f"""
+         Layer {i} ({depth_from}-{depth_to}m): {soil_type.title()}
+         - Consistency: {consistency}
+         - Strength: {strength_value} {strength_unit}
+         - Classification appears reasonable for {consistency} {soil_type}
+         - Depth continuity validated ✓
+         """
+
+         mock_analysis += """
+
+         VALIDATION CHECKS:
+         ✓ Layer depth continuity confirmed
+         ✓ Strength parameters within expected ranges
+         ✓ Soil classification consistency verified
+         ✓ Unit conversions validated
+
+         RECOMMENDATIONS:
+         - Data appears consistent and suitable for preliminary design
+         - Standard geotechnical correlations apply
+         - Consider additional testing for final design if required
+         """
+
+         mock_review = f"""
+         SENIOR ENGINEER REVIEW - {project_name}
+
+         I have reviewed the geotechnical analysis and findings:
+
+         TECHNICAL REVIEW:
+         - Analysis methodology is sound and follows standard practices
+         - Parameter correlations are reasonable and well-documented
+         - Soil classification is consistent with strength parameters
+         - Depth boundaries and layer transitions are appropriate
+
+         QUALITY ASSURANCE:
+         - All calculations have been verified
+         - Unit conversions are correct
+         - Data consistency checks passed
+         - Engineering correlations within acceptable ranges
+
+         APPROVAL STATUS: ✅ APPROVED
+
+         The analysis meets professional standards and is suitable for use in geotechnical design.
+         No re-investigation required at this time.
+
+         Senior Geotechnical Engineer Review Complete.
+         """
+
+         return {
+             "status": "approved",
+             "workflow": "two_agent_review",
+             "analysis": mock_analysis.strip(),
+             "review": mock_review.strip(),
+             "summary": f"Mock analysis completed for {len(soil_layers)} soil layers. All parameters validated.",
+             "timestamp": "2024-06-26T10:00:00Z",
+             "system": "CrewAI Mock System (No API Key)",
+             "reinvestigation_required": False
+         }
+
+ # Example usage function
+ def analyze_soil_with_crewai(soil_data: Dict[str, Any]) -> Dict[str, Any]:
+     """Main function to run CrewAI-based geotechnical analysis"""
+     system = CrewAIGeotechSystem()
+     return system.run_geotechnical_analysis(soil_data)
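+
+ # Minimal sketch (not part of the original file): the two numeric rules the
+ # prompts above insist on, written out as code. The names SU_TO_KPA,
+ # su_to_kpa, and should_split_layer are hypothetical helpers shown only to
+ # make the arithmetic concrete; the agents apply these rules via the prompts.
+ SU_TO_KPA = {"t/m2": 9.81, "ksc": 98.0, "psi": 6.895, "MPa": 1000.0, "tsf": 95.76}
+
+ def su_to_kpa(value: float, unit: str) -> float:
+     """Convert an Su reading to kPa, e.g. su_to_kpa(3.0, "t/m2") == 29.43."""
+     return value * SU_TO_KPA[unit]
+
+ def should_split_layer(su_values: List[float]) -> bool:
+     """Split when variation exceeds 30% or max/min exceeds 2.0.
+     e.g. [15, 25, 45] kPa: variation about 106%, ratio 3.0, so split."""
+     avg = sum(su_values) / len(su_values)
+     return (max(su_values) - min(su_values)) / avg > 0.30 or max(su_values) / min(su_values) > 2.0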
document_processor.py ADDED
@@ -0,0 +1,111 @@
+ import PyPDF2
+ from PIL import Image
+ import base64
+ import io
+ import streamlit as st
+
+ try:
+     from pdf2image import convert_from_path
+     PDF2IMAGE_AVAILABLE = True
+ except ImportError:
+     PDF2IMAGE_AVAILABLE = False
+     st.warning("⚠️ pdf2image not available. PDF to image conversion will be limited.")
+
+ class DocumentProcessor:
+     def __init__(self):
+         pass
+
+     def extract_text_from_pdf(self, pdf_file):
+         """Extract text content from PDF file"""
+         try:
+             pdf_reader = PyPDF2.PdfReader(pdf_file)
+             text = ""
+             for page in pdf_reader.pages:
+                 text += page.extract_text() + "\n"
+             return text
+         except Exception as e:
+             st.error(f"Error extracting text from PDF: {str(e)}")
+             return None
+
+     def convert_pdf_to_images(self, pdf_file):
+         """Convert PDF pages to images"""
+         if not PDF2IMAGE_AVAILABLE:
+             st.warning("PDF to image conversion not available. Install poppler-utils and pdf2image.")
+             return None
+
+         try:
+             images = convert_from_path(pdf_file, dpi=200)
+             return images
+         except Exception as e:
+             st.error(f"Error converting PDF to images: {str(e)}")
+             return None
+
+     def image_to_base64(self, image):
+         """Convert PIL image to base64 string for API"""
+         try:
+             if isinstance(image, str):
+                 with open(image, "rb") as img_file:
+                     return base64.b64encode(img_file.read()).decode('utf-8')
+             else:
+                 buffered = io.BytesIO()
+                 image.save(buffered, format="PNG")
+                 return base64.b64encode(buffered.getvalue()).decode('utf-8')
+         except Exception as e:
+             st.error(f"Error converting image to base64: {str(e)}")
+             return None
+
+     def process_uploaded_file(self, uploaded_file):
+         """Process uploaded file (PDF or image)"""
+         if uploaded_file is None:
+             return None, None, None
+
+         file_type = uploaded_file.type
+
+         if file_type == "application/pdf":
+             # Extract text
+             text_content = self.extract_text_from_pdf(uploaded_file)
+
+             # Convert to images for visual analysis (if available)
+             images = None
+             image_base64 = None
+
+             if PDF2IMAGE_AVAILABLE:
+                 try:
+                     import tempfile
+                     import os
+
+                     # Use temporary file to avoid conflicts
+                     with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as temp_pdf:
+                         temp_pdf.write(uploaded_file.getbuffer())
+                         temp_pdf_path = temp_pdf.name
+
+                     try:
+                         images = self.convert_pdf_to_images(temp_pdf_path)
+
+                         # Convert first page to base64 for LLM analysis
+                         if images and len(images) > 0:
+                             image_base64 = self.image_to_base64(images[0])
+                     finally:
+                         # Clean up temporary file
+                         if os.path.exists(temp_pdf_path):
+                             os.unlink(temp_pdf_path)
+
+                 except Exception as e:
+                     st.warning(f"PDF to image conversion failed: {str(e)}. Using text analysis only.")
+
+             return text_content, images, image_base64
+
+         elif file_type in ["image/jpeg", "image/png", "image/jpg"]:
+             # For image files
+             try:
+                 image = Image.open(uploaded_file)
+                 image_base64 = self.image_to_base64(image)
+
+                 return None, [image], image_base64
+             except Exception as e:
+                 st.error(f"Error processing image file: {str(e)}")
+                 return None, None, None
+
+         else:
+             st.error("Unsupported file type. Please upload PDF or image files.")
+             return None, None, None
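+
+ # Illustrative call pattern (hypothetical, mirroring how a Streamlit app
+ # would use this class):
+ #   processor = DocumentProcessor()
+ #   text, images, image_b64 = processor.process_uploaded_file(uploaded_file)
+ #   # PDFs yield (text, page_images, first_page_base64); images yield (None, [image], base64).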
langgraph_agent.py ADDED
@@ -0,0 +1,214 @@
+ from langgraph.graph import StateGraph, END
+ from langchain.schema import BaseMessage, HumanMessage, AIMessage
+ from typing import TypedDict, List, Dict, Any
+ import json
+ from llm_client import LLMClient
+ from soil_analyzer import SoilLayerAnalyzer
+
+ class AgentState(TypedDict):
+     messages: List[BaseMessage]
+     soil_data: Dict[str, Any]
+     analysis_results: Dict[str, Any]
+     user_feedback: str
+     current_task: str
+     iteration_count: int
+     text_content: str
+     image_base64: str
+
+ class SoilAnalysisAgent:
+     def __init__(self):
+         # Initialize with None client - will be set when needed
+         self.llm_client = None
+         self.soil_analyzer = SoilLayerAnalyzer()
+         self.graph = self._create_graph()
+
+     def _create_graph(self):
+         """Create the LangGraph workflow"""
+         workflow = StateGraph(AgentState)
+
+         # Add nodes
+         workflow.add_node("analyze_document", self._analyze_document)
+         workflow.add_node("validate_layers", self._validate_layers)
+         workflow.add_node("optimize_layers", self._optimize_layers)
+         workflow.add_node("generate_insights", self._generate_insights)
+         workflow.add_node("handle_feedback", self._handle_feedback)
+
+         # Add edges
+         workflow.add_edge("analyze_document", "validate_layers")
+         workflow.add_edge("validate_layers", "optimize_layers")
+         workflow.add_edge("optimize_layers", "generate_insights")
+         workflow.add_conditional_edges(
+             "generate_insights",
+             self._should_handle_feedback,
+             {
+                 "feedback": "handle_feedback",
+                 "end": END
+             }
+         )
+         workflow.add_edge("handle_feedback", "validate_layers")
+
+         # Set entry point
+         workflow.set_entry_point("analyze_document")
+
+         return workflow.compile()
+
+     def _analyze_document(self, state: AgentState) -> AgentState:
+         """Analyze the soil boring log document"""
+         # Extract document content from state
+         document_content = state.get("text_content")
+         image_content = state.get("image_base64")
+
+         # Analyze using LLM
+         soil_data = self.llm_client.analyze_soil_boring_log(
+             text_content=document_content,
+             image_base64=image_content
+         )
+
+         state["soil_data"] = soil_data
+         state["current_task"] = "document_analysis"
+         state["messages"].append(AIMessage(content="Document analysis completed"))
+
+         return state
+
+     def _validate_layers(self, state: AgentState) -> AgentState:
+         """Validate soil layer continuity and consistency"""
+         soil_data = state["soil_data"]
+
+         if "soil_layers" in soil_data:
+             # Validate layer continuity
+             validated_layers = self.soil_analyzer.validate_layer_continuity(
+                 soil_data["soil_layers"]
+             )
+
+             soil_data["soil_layers"] = validated_layers
+
+             # Calculate statistics
+             stats = self.soil_analyzer.calculate_layer_statistics(validated_layers)
+             state["analysis_results"] = {"validation_stats": stats}
+
+         state["current_task"] = "layer_validation"
+         state["messages"].append(AIMessage(content="Layer validation completed"))
+
+         return state
+
+     def _optimize_layers(self, state: AgentState) -> AgentState:
+         """Optimize layer division by merging/splitting as needed"""
+         soil_data = state["soil_data"]
+
+         if "soil_layers" in soil_data:
+             optimization_results = self.soil_analyzer.optimize_layer_division(
+                 soil_data["soil_layers"]
+             )
+
+             state["analysis_results"]["optimization"] = optimization_results
+
+         state["current_task"] = "layer_optimization"
+         state["messages"].append(AIMessage(content="Layer optimization completed"))
+
+         return state
+
+     def _generate_insights(self, state: AgentState) -> AgentState:
+         """Generate insights and recommendations"""
+         soil_data = state["soil_data"]
+         analysis_results = state["analysis_results"]
+
+         # Generate insights using LLM
+         insights_prompt = f"""
+         Based on the soil boring log analysis, provide geotechnical insights and recommendations:
+
+         Soil Data: {json.dumps(soil_data, indent=2)}
+         Analysis Results: {json.dumps(analysis_results, indent=2)}
+
+         Please provide:
+         1. Key geotechnical findings
+         2. Foundation recommendations
+         3. Construction considerations
+         4. Potential risks or concerns
+         5. Recommended additional testing
+         """
+
+         try:
+             response = self.llm_client.client.chat.completions.create(
+                 model=self.llm_client.model,
+                 messages=[{"role": "user", "content": insights_prompt}],
+                 max_tokens=1000,
+                 temperature=0.3
+             )
+
+             insights = response.choices[0].message.content
+             state["analysis_results"]["insights"] = insights
+
+         except Exception as e:
+             state["analysis_results"]["insights"] = f"Error generating insights: {str(e)}"
+
+         state["current_task"] = "insight_generation"
+         state["messages"].append(AIMessage(content="Insights generation completed"))
+
+         return state
+
+     def _handle_feedback(self, state: AgentState) -> AgentState:
+         """Handle user feedback and refine analysis"""
+         user_feedback = state.get("user_feedback", "")
+         soil_data = state["soil_data"]
+
+         if user_feedback:
+             # Refine soil layers based on feedback
+             refined_data = self.llm_client.refine_soil_layers(soil_data, user_feedback)
+
+             if "error" not in refined_data:
+                 state["soil_data"] = refined_data
+
+         state["current_task"] = "feedback_handling"
+         state["iteration_count"] = state.get("iteration_count", 0) + 1
+         state["messages"].append(AIMessage(content=f"Feedback processed (iteration {state['iteration_count']})"))
+
+         return state
+
+     def _should_handle_feedback(self, state: AgentState) -> str:
+         """Determine if feedback should be handled"""
+         if state.get("user_feedback") and state.get("iteration_count", 0) < 3:
+             return "feedback"
+         return "end"
+
+     def run_analysis(self, text_content=None, image_base64=None, user_feedback=None):
+         """Run the complete soil analysis workflow"""
+
+         # Prepare initial state - store content in state instead of message
+         initial_message = HumanMessage(content="Starting soil boring log analysis")
+
+         initial_state = {
+             "messages": [initial_message],
+             "soil_data": {},
+             "analysis_results": {},
+             "user_feedback": user_feedback or "",
+             "current_task": "initialization",
+             "iteration_count": 0,
+             "text_content": text_content,
+             "image_base64": image_base64
+         }
+
+         # Run the graph
+         result = self.graph.invoke(initial_state)
+
+         return {
+             "soil_data": result["soil_data"],
+             "analysis_results": result["analysis_results"],
+             "messages": result["messages"],
+             "current_task": result["current_task"],
+             "iteration_count": result["iteration_count"]
+         }
+
+     def process_feedback(self, current_state, feedback):
+         """Process user feedback and continue analysis"""
+         current_state["user_feedback"] = feedback
+
+         # Continue from feedback handling
+         result = self.graph.invoke(current_state, {"recursion_limit": 10})
+
+         return {
+             "soil_data": result["soil_data"],
+             "analysis_results": result["analysis_results"],
+             "messages": result["messages"],
+             "current_task": result["current_task"],
+             "iteration_count": result["iteration_count"]
+         }
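+
+ # Workflow shape compiled in _create_graph, for reference:
+ #   analyze_document -> validate_layers -> optimize_layers -> generate_insights
+ #   generate_insights -> handle_feedback -> validate_layers   (when user_feedback is set and iteration_count < 3)
+ #   generate_insights -> END                                  (otherwise)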
llm_client.py ADDED
@@ -0,0 +1,588 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import openai
2
+ import json
3
+ import streamlit as st
4
+ from config import LLM_PROVIDERS, AVAILABLE_MODELS, get_default_provider_and_model
5
+ from soil_calculations import SoilCalculations
6
+
7
+ class LLMClient:
8
+ def __init__(self, model=None, api_key=None, provider=None):
9
+ # Get defaults if not provided
10
+ if not provider or not model:
11
+ default_provider, default_model = get_default_provider_and_model()
12
+ self.provider = provider or default_provider
13
+ self.model = model or default_model
14
+ else:
15
+ self.provider = provider
16
+ self.model = model
17
+
18
+ self.api_key = api_key
19
+
20
+ # Only create client if we have API key and provider
21
+ if not self.api_key or not self.provider:
22
+ self.client = None
23
+ self.calculator = SoilCalculations()
24
+ return
25
+
26
+ # Get provider configuration
27
+ provider_config = LLM_PROVIDERS.get(self.provider, {})
28
+ base_url = provider_config.get("base_url", "https://openrouter.ai/api/v1")
29
+
30
+ self.client = openai.OpenAI(
31
+ base_url=base_url,
32
+ api_key=self.api_key,
33
+ )
34
+ self.calculator = SoilCalculations()
35
+
36
+ def _supports_images(self) -> bool:
37
+ """Check if the current model supports image inputs"""
38
+ model_info = AVAILABLE_MODELS.get(self.model, {})
39
+ return model_info.get('supports_images', False)
40
+
+     def analyze_soil_boring_log(self, text_content=None, image_base64=None):
+         """Analyze soil boring log using LLM"""
+
+         # Guard: fail fast when no API key/provider was configured
+         if self.client is None:
+             st.error("❌ No LLM client configured. Please provide an API key and provider in the sidebar.")
+             return {"error": "LLM client not configured"}
+
+         # Standardize units in text content before analysis
+         if text_content:
+             text_content, unit_conversions = self.calculator.standardize_units(text_content)
+             if unit_conversions:
+                 st.info(f"📏 Converted units: {', '.join([f'{k}→{v}' for k, v in unit_conversions.items()])}")
+
+         system_prompt = """You are an expert geotechnical engineer specializing in soil boring log interpretation.
+
+ IMPORTANT: You must respond with ONLY valid JSON data. Do not include any text before or after the JSON.
+
+ SAMPLE TYPE IDENTIFICATION (CRITICAL - FOLLOW EXACT ORDER):
+
+ **STEP 1 - FIRST COLUMN STRATIFICATION SYMBOLS (ABSOLUTE HIGHEST PRIORITY):**
+ ALWAYS look at the FIRST COLUMN of each layer for stratification symbols:
+
+ - **SS-1, SS-2, SS-18, SS18, SS-5** → SS (Split Spoon) sample
+ - **ST-1, ST-2, ST-5, ST5, ST-12** → ST (Shelby Tube) sample
+ - **SS1, SS2, SS3** (without dash) → SS sample
+ - **ST1, ST2, ST3** (without dash) → ST sample
+ - **Look for pattern: [SS|ST][-]?[0-9]+** in first column
+
+ **EXAMPLES of First Column Recognition:**
+ ```
+ SS-18 | Brown clay, N=8        → sample_type="SS" (SS-18 in first column)
+ ST-5  | Gray clay, Su=45 kPa   → sample_type="ST" (ST-5 in first column)
+ SS12  | Sandy clay, SPT test   → sample_type="SS" (SS12 in first column)
+ ST3   | Soft clay, unconfined  → sample_type="ST" (ST3 in first column)
+ ```
+
+ **STEP 2 - If NO first column symbols, then check description keywords:**
+ - SS indicators: "split spoon", "SPT", "standard penetration", "disturbed"
+ - ST indicators: "shelby", "tube", "undisturbed", "UT", "unconfined compression"
+
+ **STEP 3 - If still unclear, use strength parameter type:**
+ - SPT-N values present → likely SS sample
+ - Su values from unconfined test → likely ST sample
+
+ CRITICAL SOIL CLASSIFICATION RULES (MANDATORY):
+
+ **SAND LAYER CLASSIFICATION REQUIREMENTS:**
+ 1. **Sand layers MUST have sieve analysis evidence** - Look for:
+    - "Sieve #200: X% passing" or "#200 passing: X%"
+    - "Fines content: X%" (same as sieve #200)
+    - "Particle size analysis" or "gradation test"
+    - "% passing 0.075mm" (equivalent to #200 sieve)
+
+ 2. **Classification Rules**:
+    - Sieve #200 >50% passing → CLAY (fine-grained)
+    - Sieve #200 <50% passing → SAND/GRAVEL (coarse-grained)
+
+ 3. **NO SIEVE ANALYSIS = ASSUME CLAY (MANDATORY)**:
+    - If no sieve analysis data found → ALWAYS classify as CLAY
+    - Include note: "Assumed clay - no sieve analysis data available"
+    - Set sieve_200_passing: null (not a number)
+
+ **CRITICAL**: Never classify as sand/silt without explicit sieve analysis evidence
+ **CRITICAL**: Always look for sieve #200 data before classifying as sand
+
+ CRITICAL SS/ST SAMPLE RULES (MUST FOLLOW):
+
+ FOR SS (Split Spoon) SAMPLES:
+ 1. ALWAYS use RAW N-VALUE (not N-corrected, N-correction, or adjusted N)
+ 2. Look for: "N = 15", "SPT-N = 8", "raw N = 20", "field N = 12"
+ 3. IGNORE: "N-corrected = 25", "N-correction = 18", "adjusted N = 30"
+ 4. For clay: Use SPT-N parameter (will be converted to Su using Su=5*N)
+ 5. For sand/silt: Use SPT-N parameter (will be converted to friction angle)
+ 6. NEVER use unconfined compression Su values for SS samples - ONLY use N values
+
+ FOR ST (Shelby Tube) SAMPLES:
+ 1. ALWAYS USE DIRECT Su values from unconfined compression test
+ 2. If ST sample has Su value (e.g., "Su = 25 kPa"), use that EXACT value
+ 3. NEVER convert SPT-N to Su for ST samples when direct Su is available
+ 4. Priority: Direct Su measurement > any other value
+
+ EXTRACTION PRIORITY FOR SS SAMPLES:
+ 1. Raw N, Field N, Measured N (highest priority)
+ 2. N-value without "corrected" or "correction" terms
+ 3. General SPT-N value (lowest priority)
+ 4. NEVER use Su from unconfined compression for SS samples
+
+ CRITICAL UNIT CONVERSION REQUIREMENTS (MUST APPLY):
+
+ **MANDATORY SU UNIT CONVERSION - READ FROM IMAGE/FILE:**
+ When extracting Su values from images or text, you MUST convert to kPa BEFORE using the value:
+
+ 1. **ksc or kg/cm²**: Su_kPa = Su_ksc × 98.0
+    Example: "Su = 2.5 ksc" → strength_value: 245 (not 2.5)
+
+ 2. **t/m² (tonnes/m²)**: Su_kPa = Su_tonnes × 9.81
+    Example: "Su = 3.0 t/m²" → strength_value: 29.43 (not 3.0)
+
+ 3. **psi**: Su_kPa = Su_psi × 6.895
+    Example: "Su = 50 psi" → strength_value: 344.75 (not 50)
+
+ 4. **psf**: Su_kPa = Su_psf × 0.048
+    Example: "Su = 1000 psf" → strength_value: 48 (not 1000)
+
+ 5. **kPa**: Use directly (no conversion needed)
+    Example: "Su = 75 kPa" → strength_value: 75
+
+ 6. **MPa**: Su_kPa = Su_MPa × 1000
+    Example: "Su = 0.1 MPa" → strength_value: 100 (not 0.1)
+
+ **IMPORTANT**: Always include original unit in description for verification
+ **SPT-N values**: Keep as-is (no unit conversion needed)
+
+ CRITICAL SU-WATER CONTENT VALIDATION (MANDATORY):
+
+ **EXTRACT WATER CONTENT WHEN AVAILABLE:**
+ Always extract water content (w%) when mentioned in the description:
+ - "water content = 25%" → water_content: 25
+ - "w = 30%" → water_content: 30
+ - "moisture content 35%" → water_content: 35
+
+ **VALIDATE SU-WATER CONTENT CORRELATION:**
+ For clay layers, Su and water content should correlate reasonably:
+ - Very soft clay: Su < 25 kPa, w% > 40%
+ - Soft clay: Su 25-50 kPa, w% 30-40%
+ - Medium clay: Su 50-100 kPa, w% 20-30%
+ - Stiff clay: Su 100-200 kPa, w% 15-25%
+ - Very stiff clay: Su 200-400 kPa, w% 10-20%
+ - Hard clay: Su > 400 kPa, w% < 15%
+
+ **CRITICAL UNIT CHECK SCENARIOS:**
+ - If Su > 1000 kPa with w% > 20%: CHECK if Su is in wrong units (psi, psf?)
+ - If Su < 5 kPa with w% < 15%: CHECK if Su is in wrong units (MPa, bar?)
+ - If correlation seems very off: VERIFY unit conversion was applied correctly
+
+ CRITICAL OUTPUT FORMAT (MANDATORY):
+
+ You MUST respond with ONLY a valid JSON object. Do not include:
+ - Explanatory text before or after the JSON
+ - Markdown formatting (```json ```)
+ - Comments or notes
+ - Multiple JSON objects
+
+ Start your response directly with { and end with }
+
+ LAYER GROUPING REQUIREMENTS:
+ 1. MAXIMUM 7 LAYERS TOTAL - Group similar adjacent layers to achieve this limit
+ 2. CLAY AND SAND MUST BE SEPARATE - Never combine clay layers with sand layers
+ 3. Group adjacent layers with similar properties (same soil type and similar consistency)
+ 4. Prioritize engineering significance over minor variations
+
+ Analyze the provided soil boring log and extract the following information in this exact JSON format:
+
+ {
+     "project_info": {
+         "project_name": "string",
+         "boring_id": "string",
+         "location": "string",
+         "date": "string",
+         "depth_total": 10.0
+     },
+     "soil_layers": [
+         {
+             "layer_id": 1,
+             "depth_from": 0.0,
+             "depth_to": 2.5,
+             "soil_type": "clay",
+             "description": "Brown silty clay, ST sample, Su = 25 kPa",
+             "sample_type": "ST",
+             "strength_parameter": "Su",
+             "strength_value": 25,
+             "sieve_200_passing": 65,
+             "water_content": 35.5,
+             "color": "brown",
+             "moisture": "moist",
+             "consistency": "soft",
+             "su_source": "Unconfined Compression Test"
+         }
+     ],
+     "water_table": {
+         "depth": 3.0,
+         "date_encountered": "2024-01-01"
+     },
+     "notes": "Additional observations"
+ }
+
+ EXAMPLES OF CORRECT PROCESSING WITH UNIT CONVERSION AND SOIL CLASSIFICATION:
+
+ **SS SAMPLE EXAMPLES:**
+ 1. "SS-18: Clay layer, N = 8, Su = 45 kPa from unconfined test"
+    → Use: sample_type="SS", strength_parameter="SPT-N", strength_value=8
+    → IGNORE the Su=45 kPa value for SS samples
+
+ 2. "SS18: Soft clay, field N = 6, N-corrected = 10"
+    → Use: sample_type="SS", strength_parameter="SPT-N", strength_value=6 (raw N)
+    → IGNORE N-corrected value
+
+ **ST SAMPLE EXAMPLES WITH UNIT CONVERSION:**
+ 1. "ST-5: Stiff clay, Su = 85 kPa from unconfined compression"
+    → Use: sample_type="ST", strength_parameter="Su", strength_value=85
+
+ 2. "ST-12: Medium clay, Su = 2.5 ksc from unconfined test"
+    → Convert: 2.5 × 98 = 245 kPa
+    → Use: sample_type="ST", strength_parameter="Su", strength_value=245
+
+ 3. "ST sample: Clay, unconfined strength = 3.0 t/m²"
+    → Convert: 3.0 × 9.81 = 29.43 kPa
+    → Use: sample_type="ST", strength_parameter="Su", strength_value=29.43
+
+ **SOIL CLASSIFICATION EXAMPLES:**
+ 1. "Brown silty clay, no sieve analysis data"
+    → soil_type="clay", sieve_200_passing=null
+    → Note: "Assumed clay - no sieve analysis data available"
+
+ 2. "Sandy clay, sieve #200: 75% passing"
+    → soil_type="clay", sieve_200_passing=75
+    → Classification: Clay (>50% passing)
+
+ 3. "Medium sand, gradation test shows 25% passing #200"
+    → soil_type="sand", sieve_200_passing=25
+    → Classification: Sand (<50% passing)
+
+ 4. "Dense sand layer" (NO sieve data mentioned)
+    → soil_type="clay", sieve_200_passing=null
+    → Note: "Assumed clay - no sieve analysis data available"
+    → NEVER classify as sand without sieve data
+
+ CRITICAL LAYER GROUPING RULES:
+ 1. MAXIMUM 7 LAYERS - If you identify more than 7 distinct zones, group adjacent similar layers
+ 2. SEPARATE CLAY/SAND - Never group clay with sand, silt, or gravel layers
+ 3. Group similar adjacent layers:
+    - Combine "soft clay" + "soft clay" into one "soft clay" layer
+    - Combine "medium sand" + "medium sand" into one "medium sand" layer
+    - Combine layers with similar strength values (within 30% difference)
+ 4. Maintain engineering significance:
+    - Keep layers with significantly different strength parameters separate
+    - Preserve important transitions (e.g., clay to sand interface)
+    - Maintain water table interfaces as layer boundaries when significant
+
+ TECHNICAL RULES:
+ 1. All numeric values must be numbers, not strings
+ 2. For soil_type, use basic terms: "clay", "sand", "silt", "gravel" - do NOT include consistency
+ 3. Include sample_type field: "SS" (Split Spoon) or "ST" (Shelby Tube)
+ 4. Include sieve_200_passing field when available (percentage passing sieve #200)
+ 5. Include water_content field when available (percentage water content for clay consistency checks)
+ 6. Include su_source field: "Unconfined Compression Test" for direct measurements, or "Calculated from SPT-N" for conversions
+ 7. Strength parameters:
+    - SS samples: ALWAYS use "SPT-N" with RAW N-value (will be converted based on soil type)
+    - ST samples with clay: Use "Su" with DIRECT value in kPa from unconfined compression test
+    - For sand/gravel: Always use "SPT-N" with N-value
+    - NEVER use Su for SS samples, NEVER calculate Su from SPT-N for ST samples that have direct Su
+ 8. Put consistency separately in "consistency" field: "soft", "medium", "stiff", "loose", "dense", etc.
+ 9. Ensure continuous depths (no gaps or overlaps)
+ 10. All depths in meters, strength values as numbers
+ 11. Return ONLY the JSON object, no additional text
+
+ GROUPING EXAMPLES:
+ - Original: [0-2m soft clay, 2-4m soft clay, 4-6m medium sand, 6-8m medium sand]
+ - Grouped: [0-4m soft clay, 4-8m medium sand] (4 layers reduced to 2)
+
+ STRENGTH PARAMETER EXAMPLES:
+ - SS sample: "Clay, N = 8 blows, Su = 40 kPa unconfined" → Use SPT-N = 8 (IGNORE Su for SS)
+ - ST sample: "Clay, Su = 45 kPa from unconfined test" → Use Su = 45 (DIRECT measurement)
+ - SS sample: "Clay, field N = 12, N-corrected = 18" → Use SPT-N = 12 (raw N, IGNORE corrected)"""
+
+         messages = [{"role": "system", "content": system_prompt}]
+
+         # Check if model supports images
+         supports_images = self._supports_images()
+
+         if text_content:
+             messages.append({
+                 "role": "user",
+                 "content": f"Please analyze this soil boring log text:\n\n{text_content}"
+             })
+
+         if image_base64 and supports_images:
+             messages.append({
+                 "role": "user",
+                 "content": [
+                     {
+                         "type": "text",
+                         "text": "Please analyze this soil boring log image:"
+                     },
+                     {
+                         "type": "image_url",
+                         "image_url": {
+                             "url": f"data:image/png;base64,{image_base64}"
+                         }
+                     }
+                 ]
+             })
+         elif image_base64 and not supports_images:
+             # Model doesn't support images; notify the user and continue with text only
+             model_name = AVAILABLE_MODELS.get(self.model, {}).get('name', self.model)
+             st.warning(f"⚠️ {model_name} doesn't support image analysis. Using text content only.")
+             if not text_content:
+                 st.error("❌ No text content available for analysis. Please ensure your document has extractable text or use a model that supports images.")
+                 return {"error": "No text content available and model doesn't support images"}
+
+         try:
+             response = self.client.chat.completions.create(
+                 model=self.model,
+                 messages=messages,
+                 max_tokens=2000,
+                 temperature=0.1
+             )
+
+             content = response.choices[0].message.content
+
+             # Try to extract JSON from the response
+             try:
+                 json_str = content.strip()
+
+                 # Remove markdown code blocks if present
+                 if "```json" in json_str:
+                     json_start = json_str.find("```json") + 7
+                     json_end = json_str.find("```", json_start)
+                     json_str = json_str[json_start:json_end].strip()
+                 elif "```" in json_str:
+                     # Remove any code blocks
+                     json_start = json_str.find("```") + 3
+                     json_end = json_str.rfind("```")
+                     if json_end > json_start:
+                         json_str = json_str[json_start:json_end].strip()
+
+                 # Find JSON object boundaries
+                 if not json_str.startswith("{"):
+                     start_idx = json_str.find("{")
+                     if start_idx != -1:
+                         json_str = json_str[start_idx:]
+
+                 if not json_str.endswith("}"):
+                     end_idx = json_str.rfind("}")
+                     if end_idx != -1:
+                         json_str = json_str[:end_idx + 1]
+
+                 # Parse JSON
+                 result = json.loads(json_str)
+
+                 # Validate required structure
+                 if "soil_layers" not in result:
+                     result["soil_layers"] = []
+                 if "project_info" not in result:
+                     result["project_info"] = {}
+
+                 # Validate and enhance soil classification
+                 result = self.calculator.validate_soil_classification(result)
+
+                 # Enhance layers with calculated parameters
+                 if result["soil_layers"]:
+                     result["soil_layers"] = self.calculator.enhance_soil_layers(result["soil_layers"])
+
+                 # Process with SS/ST classification
+                 result = self.calculator.process_with_ss_st_classification(result)
+
+                 # Enforce 7-layer limit and clay/sand separation
+                 result["soil_layers"] = self._enforce_layer_grouping_rules(result["soil_layers"])
+
+                 return result
+
+             except json.JSONDecodeError as e:
+                 st.error(f"Failed to parse LLM response as JSON: {str(e)}")
+                 # Try to create a basic structure from the response
+                 return self._fallback_parse(content)
+
+         except Exception as e:
+             error_msg = str(e)
+
+             # Check for model availability error
+             if "not a valid model ID" in error_msg:
+                 st.error(f"❌ Model '{self.model}' is not available on OpenRouter")
+                 st.info("💡 Try switching to a different model in the sidebar (Claude-3.5 Sonnet or GPT-4 Turbo are recommended)")
+                 return {"error": f"Model not available: {self.model}"}
+             else:
+                 st.error(f"Error calling LLM API: {error_msg}")
+                 return {"error": error_msg}
+
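+     # Illustrative note on the extraction above: a response such as
+     #   ```json\n{"project_info": {...}, "soil_layers": [...]}\n```
+     # is reduced to the bare JSON object before json.loads is attempted.
+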
+     def _fallback_parse(self, content):
+         """Fallback parser when JSON parsing fails"""
+         try:
+             # Try to extract basic information using regex
+             layers = []
+
+             # Look for depth patterns like "0-2m", "2-5m", etc.
+             depth_pattern = r'(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)m?\s*[:|]?\s*([^,\n]+)'
+             matches = re.findall(depth_pattern, content, re.IGNORECASE)
+
+             for i, match in enumerate(matches):
+                 depth_from = float(match[0])
+                 depth_to = float(match[1])
+                 description = match[2].strip()
+
+                 # Extract soil type from description
+                 soil_type = "unknown"
+                 if "clay" in description.lower():
+                     if "soft" in description.lower():
+                         soil_type = "soft clay"
+                     elif "stiff" in description.lower():
+                         soil_type = "stiff clay"
+                     else:
+                         soil_type = "medium clay"
+                 elif "sand" in description.lower():
+                     if "loose" in description.lower():
+                         soil_type = "loose sand"
+                     elif "dense" in description.lower():
+                         soil_type = "dense sand"
+                     else:
+                         soil_type = "medium dense sand"
+
+                 layers.append({
+                     "layer_id": i + 1,
+                     "depth_from": depth_from,
+                     "depth_to": depth_to,
+                     "soil_type": soil_type,
+                     "description": description,
+                     "strength_parameter": "Su" if "clay" in soil_type else "SPT-N",
+                     "strength_value": 50,  # Default placeholder value
+                     "color": "unknown",
+                     "moisture": "unknown",
+                     "consistency": "unknown"
+                 })
+
+             return {
+                 "project_info": {
+                     "project_name": "Unknown",
+                     "boring_id": "Unknown",
+                     "location": "Unknown",
+                     "date": "Unknown",
+                     "depth_total": max([layer["depth_to"] for layer in layers]) if layers else 0
+                 },
+                 "soil_layers": layers,
+                 "water_table": {"depth": None, "date_encountered": None},
+                 "notes": "Parsed using fallback method - original response: " + content[:200] + "..."
+             }
+         except Exception as e:
+             return {"error": f"Fallback parsing failed: {str(e)}", "raw_response": content}
+
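+     # Illustrative: the fallback regex above matches lines such as
+     #   "0-2.5m: soft brown clay"
+     # yielding depth_from=0.0, depth_to=2.5, description="soft brown clay",
+     # which the keyword checks then classify as soil_type="soft clay".
+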
+     def _enforce_layer_grouping_rules(self, layers):
+         """Enforce 7-layer maximum and clay/sand separation rules"""
+
+         if not layers or len(layers) <= 7:
+             return layers
+
+         st.info(f"📊 Grouping layers: {len(layers)} layers found, grouping to meet 7-layer limit")
+
+         # Group similar adjacent layers to reduce the count to 7 or fewer
+         grouped_layers = []
+         i = 0
+
+         while i < len(layers) and len(grouped_layers) < 7:
+             current_layer = layers[i].copy()
+
+             # Check if we can group with the next layer
+             if i < len(layers) - 1 and len(grouped_layers) < 6:  # Leave room for at least one more layer
+                 next_layer = layers[i + 1]
+
+                 # Group if same soil type and similar consistency (but never clay with sand)
+                 can_group = (
+                     current_layer.get('soil_type') == next_layer.get('soil_type') and
+                     current_layer.get('consistency') == next_layer.get('consistency') and
+                     not (current_layer.get('soil_type') == 'clay' and next_layer.get('soil_type') == 'sand') and
+                     not (current_layer.get('soil_type') == 'sand' and next_layer.get('soil_type') == 'clay')
+                 )
+
+                 if can_group:
+                     # Merge the layers
+                     current_layer['depth_to'] = next_layer.get('depth_to', current_layer['depth_to'])
+                     current_layer['description'] = f"Grouped: {current_layer.get('description', '')} + {next_layer.get('description', '')}"
+
+                     # Average strength values
+                     curr_strength = current_layer.get('strength_value', 0) or 0
+                     next_strength = next_layer.get('strength_value', 0) or 0
+                     if curr_strength and next_strength:
+                         current_layer['strength_value'] = (curr_strength + next_strength) / 2
+                     elif next_strength:
+                         current_layer['strength_value'] = next_strength
+
+                     # Skip the next layer since it has been merged
+                     i += 2
+                 else:
+                     i += 1
+             else:
+                 i += 1
+
+             grouped_layers.append(current_layer)
+
+         # If still too many layers, fold the remaining similar layers into existing ones
+         if i < len(layers):
+             for remaining_layer in layers[i:]:
+                 # Find a compatible layer to merge with
+                 merged = False
+                 for existing_layer in grouped_layers:
+                     if (existing_layer.get('soil_type') == remaining_layer.get('soil_type') and
+                             existing_layer.get('consistency') == remaining_layer.get('consistency')):
+                         existing_layer['depth_to'] = max(existing_layer['depth_to'], remaining_layer.get('depth_to', 0))
+                         existing_layer['description'] += f" + {remaining_layer.get('description', '')}"
+                         merged = True
+                         break
+
+                 if not merged and len(grouped_layers) < 7:
+                     grouped_layers.append(remaining_layer)
+
+         # Update layer IDs
+         for idx, layer in enumerate(grouped_layers):
+             layer['layer_id'] = idx + 1
+
+         # Add a note about grouping
+         if len(grouped_layers) < len(layers):
+             st.success(f"✅ Grouped {len(layers)} layers into {len(grouped_layers)} layers (7-layer limit)")
+
+         return grouped_layers[:7]  # Ensure maximum 7 layers
+
+     def refine_soil_layers(self, soil_data, user_feedback):
+         """Refine soil layer interpretation based on user feedback"""
+
+         # Guard: fail fast when no API key/provider was configured
+         if self.client is None:
+             return {"error": "LLM client not configured"}
+
+         system_prompt = """You are an expert geotechnical engineer. The user has provided feedback on the initial soil boring log analysis.
+ Please refine the soil layer interpretation based on their input and return the updated JSON in the same format."""
+
+         messages = [
+             {"role": "system", "content": system_prompt},
+             {"role": "user", "content": f"Original analysis: {json.dumps(soil_data, indent=2)}"},
+             {"role": "user", "content": f"User feedback: {user_feedback}"}
+         ]
+
+         try:
+             response = self.client.chat.completions.create(
+                 model=self.model,
+                 messages=messages,
+                 max_tokens=2000,
+                 temperature=0.1
+             )
+
+             content = response.choices[0].message.content
+
+             try:
+                 if "```json" in content:
+                     json_start = content.find("```json") + 7
+                     json_end = content.find("```", json_start)
+                     json_str = content[json_start:json_end].strip()
+                 else:
+                     json_str = content
+
+                 return json.loads(json_str)
+             except json.JSONDecodeError:
+                 return {"error": "Invalid JSON response", "raw_response": content}
+
+         except Exception as e:
+             return {"error": str(e)}
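+
+ # Minimal usage sketch (illustrative only; not part of the app's entry point).
+ # The model ID, provider key, and boring-log text below are assumptions, and a
+ # real API key would be required for the call to succeed.
+ if __name__ == "__main__":
+     client = LLMClient(
+         model="anthropic/claude-3.5-sonnet",  # assumed model ID
+         api_key="sk-or-...",                  # placeholder key
+         provider="openrouter",                # assumed provider key in LLM_PROVIDERS
+     )
+     result = client.analyze_soil_boring_log(
+         text_content="SS-1 | 0-2.5m: Brown silty clay, N = 8\nST-2 | 2.5-5.0m: Gray clay, Su = 45 kPa"
+     )
+     print(json.dumps(result, indent=2))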
nearest_neighbor_grouping.py ADDED
@@ -0,0 +1,297 @@
+ import numpy as np
+ import pandas as pd
+ import streamlit as st
+ from sklearn.neighbors import NearestNeighbors
+ from sklearn.preprocessing import StandardScaler
+ from sklearn.cluster import DBSCAN
+ from sklearn.metrics.pairwise import euclidean_distances
+ from typing import List, Dict, Any, Tuple
+
+ class NearestNeighborGrouping:
+     def __init__(self):
+         self.scaler = StandardScaler()
+         self.feature_weights = {
+             'depth_mid': 0.05,            # Depth position (less important for similarity)
+             'thickness': 0.05,            # Layer thickness (less important)
+             'soil_type_encoded': 0.35,    # Soil type (most important)
+             'consistency_encoded': 0.30,  # Consistency/density (very important)
+             'strength_value': 0.15,       # Strength parameter
+             'moisture_encoded': 0.05,     # Moisture content
+             'color_encoded': 0.05         # Color
+         }
+
+     def encode_categorical_features(self, layers: List[Dict]) -> pd.DataFrame:
+         """Convert categorical features to numerical for clustering"""
+
+         # Create DataFrame from layers
+         df_data = []
+         for i, layer in enumerate(layers):
+             layer_data = {
+                 'layer_index': i,
+                 'layer_id': layer.get('layer_id', i + 1),
+                 'depth_from': layer.get('depth_from', 0),
+                 'depth_to': layer.get('depth_to', 0),
+                 'depth_mid': (layer.get('depth_from', 0) + layer.get('depth_to', 0)) / 2,
+                 'thickness': layer.get('depth_to', 0) - layer.get('depth_from', 0),
+                 'soil_type': layer.get('soil_type', 'unknown').lower(),
+                 'consistency': layer.get('consistency', 'unknown').lower(),
+                 'strength_value': layer.get('strength_value', 0) or layer.get('calculated_su', 0) or 0,
+                 'moisture': layer.get('moisture', 'unknown').lower(),
+                 'color': layer.get('color', 'unknown').lower(),
+                 'description': layer.get('description', '')
+             }
+             df_data.append(layer_data)
+
+         df = pd.DataFrame(df_data)
+
+         # Encode soil types
+         soil_type_mapping = {
+             'clay': 1, 'silt': 2, 'sand': 3, 'gravel': 4, 'rock': 5, 'unknown': 0
+         }
+         df['soil_type_encoded'] = df['soil_type'].map(soil_type_mapping).fillna(0)
+
+         # Encode consistency/density
+         consistency_mapping = {
+             'very soft': 1, 'soft': 2, 'medium': 3, 'stiff': 4, 'very stiff': 5, 'hard': 6,
+             'very loose': 1, 'loose': 2, 'medium dense': 3, 'dense': 4, 'very dense': 5,
+             'unknown': 0
+         }
+         df['consistency_encoded'] = df['consistency'].map(consistency_mapping).fillna(0)
+
+         # Encode moisture
+         moisture_mapping = {
+             'dry': 1, 'moist': 2, 'wet': 3, 'saturated': 4, 'unknown': 0
+         }
+         df['moisture_encoded'] = df['moisture'].map(moisture_mapping).fillna(0)
+
+         # Encode colors (simplified)
+         color_mapping = {
+             'brown': 1, 'gray': 2, 'black': 3, 'red': 4, 'yellow': 5, 'white': 6, 'unknown': 0
+         }
+         df['color_encoded'] = df['color'].map(color_mapping).fillna(0)
+
+         return df
+
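+     # Illustrative encoding (values follow the mappings above): a layer such as
+     #   {"soil_type": "clay", "consistency": "stiff", "moisture": "moist", "color": "gray"}
+     # becomes soil_type_encoded=1, consistency_encoded=4, moisture_encoded=2, color_encoded=2.
+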
+     def calculate_layer_similarity(self, df: pd.DataFrame) -> Tuple[np.ndarray, np.ndarray]:
+         """Calculate the similarity matrix between layers using weighted features;
+         returns (similarity_matrix, scaled_feature_matrix)"""
+
+         # Select features for similarity calculation
+         feature_columns = [
+             'depth_mid', 'thickness', 'soil_type_encoded',
+             'consistency_encoded', 'strength_value', 'moisture_encoded', 'color_encoded'
+         ]
+
+         # Prepare feature matrix
+         features = df[feature_columns].copy()
+
+         # Handle missing values
+         features = features.fillna(0)
+
+         # Apply feature weights
+         for col in feature_columns:
+             if col in self.feature_weights:
+                 features[col] = features[col] * self.feature_weights[col]
+
+         # Standardize features
+         features_scaled = self.scaler.fit_transform(features)
+
+         # Convert pairwise euclidean distance to a similarity in (0, 1]
+         distance_matrix = euclidean_distances(features_scaled)
+         similarity_matrix = 1 / (1 + distance_matrix)
+
+         return similarity_matrix, features_scaled
+
+     def find_nearest_neighbors(self, df: pd.DataFrame, k: int = 3) -> List[Dict]:
+         """Find k nearest neighbors for each soil layer"""
+
+         similarity_matrix, features_scaled = self.calculate_layer_similarity(df)
+
+         # Use NearestNeighbors to find the k nearest neighbors
+         nn_model = NearestNeighbors(n_neighbors=min(k + 1, len(df)), metric='euclidean')
+         nn_model.fit(features_scaled)
+
+         distances, indices = nn_model.kneighbors(features_scaled)
+
+         nearest_neighbors = []
+         for i, (layer_distances, layer_indices) in enumerate(zip(distances, indices)):
+             neighbors = []
+             for dist, idx in zip(layer_distances[1:], layer_indices[1:]):  # Skip self
+                 neighbor_info = {
+                     'neighbor_index': int(idx),
+                     'neighbor_id': df.iloc[idx]['layer_id'],
+                     'distance': float(dist),
+                     'similarity_score': float(similarity_matrix[i, idx]),
+                     'soil_type': df.iloc[idx]['soil_type'],
+                     'consistency': df.iloc[idx]['consistency'],
+                     'depth_range': f"{df.iloc[idx]['depth_from']:.1f}-{df.iloc[idx]['depth_to']:.1f}m"
+                 }
+                 neighbors.append(neighbor_info)
+
+             layer_nn = {
+                 'layer_index': i,
+                 'layer_id': df.iloc[i]['layer_id'],
+                 'soil_type': df.iloc[i]['soil_type'],
+                 'consistency': df.iloc[i]['consistency'],
+                 'depth_range': f"{df.iloc[i]['depth_from']:.1f}-{df.iloc[i]['depth_to']:.1f}m",
+                 'nearest_neighbors': neighbors
+             }
+             nearest_neighbors.append(layer_nn)
+
+         return nearest_neighbors
+
+     def group_similar_layers(self, df: pd.DataFrame, similarity_threshold: float = 0.7) -> Tuple[List[List[int]], np.ndarray]:
+         """Group layers using DBSCAN clustering based on similarity;
+         returns (multi-layer groups, per-layer cluster labels)"""
+
+         similarity_matrix, features_scaled = self.calculate_layer_similarity(df)
+
+         # Convert similarity to distance for DBSCAN
+         distance_matrix = 1 - similarity_matrix
+
+         # Use DBSCAN for clustering
+         eps = 1 - similarity_threshold  # Convert similarity threshold to distance
+         clustering = DBSCAN(eps=eps, min_samples=1, metric='precomputed')
+         cluster_labels = clustering.fit_predict(distance_matrix)
+
+         # Group layers by cluster
+         clusters = {}
+         for i, label in enumerate(cluster_labels):
+             if label not in clusters:
+                 clusters[label] = []
+             clusters[label].append(i)
+
+         # Convert to a list of groups, keeping only multi-layer groups
+         layer_groups = []
+         for cluster_id, layer_indices in clusters.items():
+             if len(layer_indices) > 1:
+                 layer_groups.append(layer_indices)
+
+         return layer_groups, cluster_labels
+
+     def analyze_group_properties(self, df: pd.DataFrame, group_indices: List[int]) -> Dict:
+         """Analyze properties of a group of similar layers"""
+
+         group_layers = df.iloc[group_indices]
+
+         analysis = {
+             'group_size': len(group_indices),
+             'depth_range': {
+                 'min': group_layers['depth_from'].min(),
+                 'max': group_layers['depth_to'].max(),
+                 'total_thickness': group_layers['thickness'].sum()
+             },
+             'soil_types': group_layers['soil_type'].value_counts().to_dict(),
+             'consistencies': group_layers['consistency'].value_counts().to_dict(),
+             'strength_stats': {
+                 'mean': group_layers['strength_value'].mean(),
+                 'min': group_layers['strength_value'].min(),
+                 'max': group_layers['strength_value'].max(),
+                 'std': group_layers['strength_value'].std()
+             },
+             'layer_ids': group_layers['layer_id'].tolist(),
+             'depth_ranges': [f"{row['depth_from']:.1f}-{row['depth_to']:.1f}m"
+                              for _, row in group_layers.iterrows()]
+         }
+
+         return analysis
+
+     def suggest_layer_merging(self, layers: List[Dict], similarity_threshold: float = 0.8) -> Dict:
+         """Suggest which layers should be merged based on nearest neighbor analysis"""
+
+         if len(layers) < 2:
+             return {"groups": [], "recommendations": []}
+
+         # Encode features
+         df = self.encode_categorical_features(layers)
+
+         # Find similar layer groups
+         layer_groups, cluster_labels = self.group_similar_layers(df, similarity_threshold)
+
+         # Analyze each group
+         group_analyses = []
+         recommendations = []
+
+         for i, group_indices in enumerate(layer_groups):
+             group_analysis = self.analyze_group_properties(df, group_indices)
+             group_analysis['group_id'] = i + 1
+             group_analyses.append(group_analysis)
+
+             # Check if the layers are adjacent or close
+             group_df = df.iloc[group_indices].sort_values('depth_from')
+             is_adjacent = self._check_adjacency(group_df)
+
+             if is_adjacent:
+                 dominant_soil_type = max(group_analysis['soil_types'].items(), key=lambda x: x[1])[0]
+                 dominant_consistency = max(group_analysis['consistencies'].items(), key=lambda x: x[1])[0]
+
+                 recommendation = {
+                     'group_id': i + 1,
+                     'action': 'merge',
+                     'reason': f'Similar {dominant_consistency} {dominant_soil_type} layers in adjacent depths',
+                     'layer_ids': group_analysis['layer_ids'],
+                     'depth_ranges': group_analysis['depth_ranges'],
+                     'merged_properties': {
+                         'soil_type': dominant_soil_type,
+                         'consistency': dominant_consistency,
+                         'depth_from': group_analysis['depth_range']['min'],
+                         'depth_to': group_analysis['depth_range']['max'],
+                         'thickness': group_analysis['depth_range']['total_thickness'],
+                         'avg_strength': group_analysis['strength_stats']['mean']
+                     }
+                 }
+                 recommendations.append(recommendation)
+
+         return {
+             'groups': group_analyses,
+             'recommendations': recommendations,
+             'cluster_labels': cluster_labels.tolist()
+         }
+
+     def _check_adjacency(self, group_df: pd.DataFrame, max_gap: float = 0.5) -> bool:
+         """Check if the layers in a group are adjacent or nearly adjacent"""
+
+         if len(group_df) <= 1:
+             return True
+
+         # Sort by depth
+         sorted_df = group_df.sort_values('depth_from')
+
+         # Check gaps between consecutive layers
+         for i in range(len(sorted_df) - 1):
+             current_end = sorted_df.iloc[i]['depth_to']
+             next_start = sorted_df.iloc[i + 1]['depth_from']
+             gap = next_start - current_end
+
+             if gap > max_gap:
+                 return False
+
+         return True
+
+     def get_layer_neighbors_report(self, layers: List[Dict], k: int = 3) -> str:
+         """Generate a detailed report of the nearest neighbors for each layer"""
+
+         if len(layers) < 2:
+             return "Insufficient layers for neighbor analysis."
+
+         df = self.encode_categorical_features(layers)
+         nearest_neighbors = self.find_nearest_neighbors(df, k)
+
+         report_lines = [
+             "NEAREST NEIGHBOR ANALYSIS REPORT",
+             "=" * 50,
+             ""
+         ]
+
+         for layer_info in nearest_neighbors:
+             report_lines.append(f"Layer {layer_info['layer_id']}: {layer_info['consistency']} {layer_info['soil_type']} ({layer_info['depth_range']})")
+             report_lines.append("  Nearest Neighbors:")
+
+             for i, neighbor in enumerate(layer_info['nearest_neighbors'][:k], 1):
+                 similarity_pct = neighbor['similarity_score'] * 100
+                 report_lines.append(
+                     f"    {i}. Layer {neighbor['neighbor_id']}: {neighbor['consistency']} {neighbor['soil_type']} "
+                     f"({neighbor['depth_range']}) - Similarity: {similarity_pct:.1f}%"
+                 )
+
+             report_lines.append("")
+
+         return "\n".join(report_lines)
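+
+ # Minimal usage sketch (illustrative; the two layers below are made up):
+ if __name__ == "__main__":
+     demo_layers = [
+         {"layer_id": 1, "depth_from": 0.0, "depth_to": 2.0, "soil_type": "clay",
+          "consistency": "soft", "strength_value": 30, "color": "brown", "moisture": "moist"},
+         {"layer_id": 2, "depth_from": 2.0, "depth_to": 4.0, "soil_type": "clay",
+          "consistency": "soft", "strength_value": 35, "color": "brown", "moisture": "moist"},
+     ]
+     nng = NearestNeighborGrouping()
+     print(nng.get_layer_neighbors_report(demo_layers, k=1))
+     print(nng.suggest_layer_merging(demo_layers, similarity_threshold=0.8))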
packages.txt ADDED
@@ -0,0 +1,4 @@
+ poppler-utils
+ tesseract-ocr
+ libgl1-mesa-glx
+ libglib2.0-0
requirements.txt CHANGED
@@ -1,3 +1,19 @@
- altair
- pandas
- streamlit
+ streamlit>=1.28.0
+ openai>=1.3.0
+ PyPDF2>=3.0.0
+ pdf2image>=1.16.0
+ Pillow>=10.0.0
+ matplotlib>=3.8.0
+ plotly>=5.17.0
+ pandas>=2.1.0
+ numpy>=1.24.0
+ langgraph>=0.0.20
+ langchain>=0.1.0
+ langchain-core>=0.1.0
+ langchain-openai>=0.0.5
+ python-dotenv>=1.0.0
+ scikit-learn>=1.3.0
+ crewai>=0.22.0
+ crewai-tools>=0.4.0
+ typing-extensions>=4.8.0
+ pydantic>=2.0.0
soil_analyzer.py ADDED
@@ -0,0 +1,305 @@
+ import numpy as np
+ import streamlit as st
+ from typing import List, Dict, Any
+ from nearest_neighbor_grouping import NearestNeighborGrouping
+
+ class SoilLayerAnalyzer:
+     def __init__(self):
+         self.consistency_mapping = {
+             "soft": 1, "loose": 1,
+             "medium": 2, "medium dense": 2,
+             "stiff": 3, "dense": 3,
+             "very stiff": 4, "very dense": 4,
+             "hard": 5
+         }
+         self.nn_grouping = NearestNeighborGrouping()
+
+     def validate_layer_continuity(self, layers: List[Dict]) -> List[Dict]:
+         """Validate and fix layer depth continuity"""
+         if not layers:
+             return layers
+
+         # Sort layers by depth_from
+         sorted_layers = sorted(layers, key=lambda x: x.get("depth_from", 0))
+
+         validated_layers = []
+         for i, layer in enumerate(sorted_layers):
+             if i == 0:
+                 # First layer starts from the ground surface
+                 layer["depth_from"] = 0
+             else:
+                 # Each layer starts where the previous one ends
+                 layer["depth_from"] = validated_layers[-1]["depth_to"]
+
+             validated_layers.append(layer)
+
+         return validated_layers
+
+     def identify_similar_layers(self, layers: List[Dict], similarity_threshold: float = 0.8) -> List[List[int]]:
+         """Identify layers that could potentially be grouped together"""
+         similar_groups = []
+
+         for i, layer1 in enumerate(layers):
+             for j, layer2 in enumerate(layers[i + 1:], i + 1):
+                 similarity_score = self._calculate_layer_similarity(layer1, layer2)
+
+                 if similarity_score >= similarity_threshold:
+                     # Check if either layer is already in a group
+                     group_found = False
+                     for group in similar_groups:
+                         if i in group:
+                             if j not in group:
+                                 group.append(j)
+                             group_found = True
+                             break
+                         elif j in group:
+                             if i not in group:
+                                 group.append(i)
+                             group_found = True
+                             break
+
+                     if not group_found:
+                         similar_groups.append([i, j])
+
+         return similar_groups
+
+     def _calculate_layer_similarity(self, layer1: Dict, layer2: Dict) -> float:
+         """Calculate a weighted similarity score between two layers.
+         Every applicable criterion adds to the total weight whether or not it
+         matches, so mismatches actually lower the score."""
+         score = 0.0
+         total_weight = 0.0
+
+         # Soil type similarity (weight: 0.4)
+         total_weight += 0.4
+         if layer1.get("soil_type", "").lower() == layer2.get("soil_type", "").lower():
+             score += 0.4
+
+         # Strength parameter similarity (weight: 0.3), only when both values exist
+         strength1 = layer1.get("strength_value")
+         strength2 = layer2.get("strength_value")
+         if strength1 is not None and strength2 is not None and max(strength1, strength2) > 0:
+             total_weight += 0.3
+             if abs(strength1 - strength2) / max(strength1, strength2) < 0.3:
+                 score += 0.3
+
+         # Consistency similarity (weight: 0.2)
+         total_weight += 0.2
+         consistency1 = self._extract_consistency(layer1.get("soil_type", ""))
+         consistency2 = self._extract_consistency(layer2.get("soil_type", ""))
+         if consistency1 == consistency2:
+             score += 0.2
+
+         # Color similarity (weight: 0.1)
+         total_weight += 0.1
+         color1 = layer1.get("color") or ""
+         color2 = layer2.get("color") or ""
+         if color1.lower() == color2.lower():
+             score += 0.1
+
+         return score / total_weight if total_weight > 0 else 0.0
+
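+     # Worked example (illustrative): two "soft clay" layers with strengths 40 and
+     # 50 kPa (|40-50|/50 = 0.2 < 0.3) and matching color score
+     # (0.4 + 0.3 + 0.2 + 0.1) / 1.0 = 1.0; if only the colors differ, the score
+     # drops to 0.9 / 1.0 = 0.9.
+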
+     def _extract_consistency(self, soil_type: str) -> str:
+         """Extract consistency from a soil type description"""
+         soil_type_lower = soil_type.lower()
+         # Check longer terms first so "very stiff" is not matched as "stiff"
+         for consistency in sorted(self.consistency_mapping, key=len, reverse=True):
+             if consistency in soil_type_lower:
+                 return consistency
+         return ""
+
+     def suggest_layer_merging(self, layers: List[Dict]) -> Dict[str, Any]:
+         """Suggest which layers could be merged"""
+         similar_groups = self.identify_similar_layers(layers)
+         suggestions = []
+
+         for group in similar_groups:
+             if len(group) >= 2:
+                 group_layers = [layers[i] for i in group]
+
+                 # Check if the layers are adjacent or close
+                 depths = [(layer["depth_from"], layer["depth_to"]) for layer in group_layers]
+                 depths.sort()
+
+                 # Check for adjacency
+                 is_adjacent = True
+                 for i in range(len(depths) - 1):
+                     if abs(depths[i][1] - depths[i + 1][0]) > 0.5:  # 0.5 m tolerance
+                         is_adjacent = False
+                         break
+
+                 if is_adjacent:
+                     suggestions.append({
+                         "layer_indices": group,
+                         "reason": "Similar soil properties and adjacent depths",
+                         "merged_layer": self._create_merged_layer(group_layers)
+                     })
+
+         return {"suggestions": suggestions}
+
+     def _create_merged_layer(self, layers: List[Dict]) -> Dict:
+         """Create a merged layer from multiple similar layers"""
+         if not layers:
+             return {}
+
+         merged = {
+             "layer_id": f"merged_{layers[0]['layer_id']}_{layers[-1]['layer_id']}",
+             "depth_from": min(layer["depth_from"] for layer in layers),
+             "depth_to": max(layer["depth_to"] for layer in layers),
+             "soil_type": layers[0]["soil_type"],  # Use the first layer's type
+             "description": f"Merged layer: {', '.join([layer.get('description', '') for layer in layers])}",
+             "strength_parameter": layers[0].get("strength_parameter", ""),
+             "strength_value": np.mean([layer.get("strength_value", 0) for layer in layers if layer.get("strength_value") is not None]),
+             "color": layers[0].get("color", ""),
+             "moisture": layers[0].get("moisture", ""),
+             "consistency": layers[0].get("consistency", "")
+         }
+
+         return merged
+
+     def suggest_layer_splitting(self, layers: List[Dict]) -> Dict[str, Any]:
+         """Suggest which layers should be split based on thickness and variability"""
+         suggestions = []
+
+         for i, layer in enumerate(layers):
+             thickness = layer["depth_to"] - layer["depth_from"]
+
+             # Suggest splitting very thick layers (>5 m)
+             if thickness > 5.0:
+                 suggested_splits = int(thickness / 2.5)  # Split into ~2.5 m sublayers
+
+                 suggestions.append({
+                     "layer_index": i,
+                     "reason": f"Layer is very thick ({thickness:.1f}m) - consider splitting into {suggested_splits} sublayers",
+                     "suggested_depths": np.linspace(layer["depth_from"], layer["depth_to"], suggested_splits + 1).tolist()
+                 })
+
+             # Check for indications of significant strength variation
+             description = layer.get("description", "").lower()
+             if any(word in description for word in ["varying", "variable", "interbedded", "alternating"]):
+                 suggestions.append({
+                     "layer_index": i,
+                     "reason": "Description indicates variable conditions - consider splitting based on detailed log",
+                     "suggested_depths": [layer["depth_from"], (layer["depth_from"] + layer["depth_to"]) / 2, layer["depth_to"]]
+                 })
+
+         return {"suggestions": suggestions}
+
+     def optimize_layer_division(self, layers: List[Dict], merge_similar=True, split_thick=True) -> Dict[str, Any]:
+         """Optimize layer division by merging similar layers and splitting thick ones"""
+         optimized_layers = layers.copy()
+         changes_made = []
+
+         # Traditional merge suggestions
+         merge_suggestions = {"suggestions": []}
+         if merge_similar:
+             merge_suggestions = self.suggest_layer_merging(optimized_layers)
+             for suggestion in merge_suggestions["suggestions"]:
+                 changes_made.append(f"Merged layers {suggestion['layer_indices']}: {suggestion['reason']}")
+
+         # Nearest neighbor analysis
+         nn_analysis = self.analyze_nearest_neighbors(optimized_layers)
+
+         # Split suggestions
+         split_suggestions = {"suggestions": []}
+         if split_thick:
+             split_suggestions = self.suggest_layer_splitting(optimized_layers)
+             for suggestion in split_suggestions["suggestions"]:
+                 changes_made.append(f"Suggested splitting layer {suggestion['layer_index']}: {suggestion['reason']}")
+
+         return {
+             "optimized_layers": optimized_layers,
+             "changes_made": changes_made,
+             "merge_suggestions": merge_suggestions,
+             "split_suggestions": split_suggestions,
+             "nearest_neighbor_analysis": nn_analysis
+         }
+
+     def analyze_nearest_neighbors(self, layers: List[Dict], k: int = 3, similarity_threshold: float = 0.55) -> Dict[str, Any]:
+         """Perform nearest neighbor analysis on soil layers"""
+
+         if len(layers) < 2:
+             return {"message": "Insufficient layers for neighbor analysis"}
+
+         try:
+             # Get nearest neighbor analysis
+             nn_suggestions = self.nn_grouping.suggest_layer_merging(layers, similarity_threshold)
+
+             # Get detailed neighbor report
+             neighbor_report = self.nn_grouping.get_layer_neighbors_report(layers, k)
+
+             return {
+                 "neighbor_groups": nn_suggestions.get("groups", []),
+                 "merge_recommendations": nn_suggestions.get("recommendations", []),
+                 "cluster_labels": nn_suggestions.get("cluster_labels", []),
+                 "neighbor_report": neighbor_report,
+                 "analysis_parameters": {
+                     "similarity_threshold": similarity_threshold,
+                     "k_neighbors": k,
+                     "total_layers": len(layers)
+                 }
+             }
+
+         except Exception as e:
+             st.error(f"Error in nearest neighbor analysis: {str(e)}")
+             return {"error": str(e)}
+
+     def get_grouping_summary(self, layers: List[Dict]) -> Dict[str, Any]:
+         """Get a comprehensive summary of the layer grouping analysis"""
+
+         nn_analysis = self.analyze_nearest_neighbors(layers)
+
+         if "error" in nn_analysis:
+             return nn_analysis
+
+         summary = {
+             "total_layers": len(layers),
+             "identified_groups": len(nn_analysis.get("neighbor_groups", [])),
+             "merge_recommendations": len(nn_analysis.get("merge_recommendations", [])),
+             "group_details": []
+         }
+
+         # Add details for each group
+         for i, group in enumerate(nn_analysis.get("neighbor_groups", [])):
+             group_detail = {
+                 "group_id": group.get("group_id", i + 1),
+                 "layers_in_group": group.get("group_size", 0),
+                 "depth_range": f"{group.get('depth_range', {}).get('min', 0):.1f}-{group.get('depth_range', {}).get('max', 0):.1f}m",
+                 "total_thickness": group.get('depth_range', {}).get('total_thickness', 0),
+                 "dominant_soil_type": max(group.get('soil_types', {}).items(), key=lambda x: x[1])[0] if group.get('soil_types') else "unknown",
+                 "layer_ids": group.get("layer_ids", [])
+             }
+             summary["group_details"].append(group_detail)
+
+         return summary
+
+     def calculate_layer_statistics(self, layers: List[Dict]) -> Dict[str, Any]:
+         """Calculate statistics for the soil profile"""
+         if not layers:
+             return {}
+
+         total_depth = max(layer["depth_to"] for layer in layers)
+         layer_count = len(layers)
+
+         # Soil type distribution by thickness
+         soil_types = {}
+         for layer in layers:
+             soil_type = layer.get("soil_type", "unknown")
+             thickness = layer["depth_to"] - layer["depth_from"]
+             if soil_type in soil_types:
+                 soil_types[soil_type] += thickness
+             else:
+                 soil_types[soil_type] = thickness
+
+         # Convert to percentages
+         soil_type_percentages = {k: (v / total_depth) * 100 for k, v in soil_types.items()}
+
+         # Average layer thickness
+         thicknesses = [layer["depth_to"] - layer["depth_from"] for layer in layers]
+         avg_thickness = np.mean(thicknesses)
+
+         return {
+             "total_depth": total_depth,
+             "layer_count": layer_count,
+             "average_layer_thickness": avg_thickness,
+             "soil_type_distribution": soil_type_percentages,
+             "thickest_layer": max(thicknesses),
+             "thinnest_layer": min(thicknesses)
+         }
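+
+ # Minimal usage sketch (illustrative; the layers below are made up):
+ if __name__ == "__main__":
+     analyzer = SoilLayerAnalyzer()
+     layers = [
+         {"layer_id": 1, "depth_from": 0.0, "depth_to": 3.0, "soil_type": "soft clay",
+          "strength_value": 30, "color": "brown", "description": "soft brown clay"},
+         {"layer_id": 2, "depth_from": 3.0, "depth_to": 9.0, "soil_type": "medium sand",
+          "strength_value": 15, "color": "gray", "description": "medium gray sand"},
+     ]
+     print(analyzer.calculate_layer_statistics(layers))
+     print(analyzer.suggest_layer_splitting(layers))  # flags the 6 m sand layer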
soil_boring_analyzer_hf_ready.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0945b4363e97b930f6cbaa9913888b5d63329830f6a6c05d3aa18be194e92f3d
+ size 69885
soil_calculations.py ADDED
@@ -0,0 +1,350 @@
+ import numpy as np
+ import re
+ import streamlit as st
+ from typing import Dict, List, Any, Tuple
+
+ class SoilCalculations:
+     def __init__(self):
+         # Peck correlation coefficients for friction angle calculation
+         self.peck_coefficients = {
+             "fine_sand": {"a": 27.1, "b": 0.3},
+             "medium_sand": {"a": 27.1, "b": 0.3},
+             "coarse_sand": {"a": 27.1, "b": 0.3},
+             "silty_sand": {"a": 25.4, "b": 0.3},
+             "clayey_sand": {"a": 25.4, "b": 0.3}
+         }
+
+     def calculate_su_from_n(self, n_value: float, correlation_factor: float = 5.0) -> float:
+         """Calculate undrained shear strength from SPT-N value for clay
+         Su = correlation_factor * N (typically 5-7 for clay)"""
+         if n_value is None or n_value <= 0:
+             return None
+         return correlation_factor * n_value
+
+     def calculate_friction_angle_peck(self, n_value: float, sand_type: str = "medium_sand",
+                                       effective_stress: float = 100.0) -> float:
+         """Calculate friction angle using the Peck, Hanson & Thornburn correlation:
+         φ ≈ a + b·N1(60) - 0.00054·N1(60)², with a = 27.1 and b = 0.3 for clean sand"""
+         if n_value is None or n_value <= 0:
+             return None
+
+         # Overburden correction (Liao & Whitman form): CN = (100 / σ'v) ** 0.5,
+         # so the corrected N decreases with increasing effective stress
+         n60_corrected = n_value * (100.0 / max(effective_stress, 1.0)) ** 0.5
+         n60_corrected = min(n60_corrected, 50)  # Cap at a reasonable value
+
+         coeffs = self.peck_coefficients.get(sand_type, self.peck_coefficients["medium_sand"])
+         friction_angle = coeffs["a"] + coeffs["b"] * n60_corrected - 0.00054 * n60_corrected ** 2
+
+         return min(friction_angle, 45)  # Cap at a reasonable maximum
+
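+     # Worked example (illustrative, using the correlation above): N = 20 at
+     # σ'v = 100 kPa gives N1(60) = 20 × (100/100)**0.5 = 20 and
+     # φ ≈ 27.1 + 0.3×20 - 0.00054×20² ≈ 32.9°.
+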
+     def classify_soil_consistency(self, soil_type: str, n_value: float = None, su_value: float = None) -> str:
+         """Classify soil consistency based on strength parameters"""
+
+         if "clay" in soil_type.lower() or "silt" in soil_type.lower():
+             # Use Su for clay classification
+             if su_value is not None:
+                 if su_value < 25:
+                     return "very soft"
+                 elif su_value < 50:
+                     return "soft"
+                 elif su_value < 100:
+                     return "medium"
+                 elif su_value < 200:
+                     return "stiff"
+                 elif su_value < 400:
+                     return "very stiff"
+                 else:
+                     return "hard"
+             # Use the N-value for clay if Su is not available
+             elif n_value is not None:
+                 if n_value < 2:
+                     return "very soft"
+                 elif n_value < 4:
+                     return "soft"
+                 elif n_value < 8:
+                     return "medium"
+                 elif n_value < 15:
+                     return "stiff"
+                 elif n_value < 30:
+                     return "very stiff"
+                 else:
+                     return "hard"
+
+         elif "sand" in soil_type.lower() or "gravel" in soil_type.lower():
+             # Use the N-value for sand classification
+             if n_value is not None:
+                 if n_value < 4:
+                     return "very loose"
+                 elif n_value < 10:
+                     return "loose"
+                 elif n_value < 30:
+                     return "medium dense"
+                 elif n_value < 50:
+                     return "dense"
+                 else:
+                     return "very dense"
+
+         return "unknown"
+
+     def standardize_units(self, text: str) -> Tuple[str, Dict[str, str]]:
+         """Standardize units in soil boring log text before LLM processing"""
+
+         unit_conversions = {}
+         standardized_text = text
+
+         def convert(pattern: str, factor: float, unit_suffix: str):
+             # Replace each full match (number plus unit, including any spacing),
+             # so conversions are not silently skipped when a space separates them
+             nonlocal standardized_text
+             for m in list(re.finditer(pattern, standardized_text, re.IGNORECASE)):
+                 value = float(m.group(1))
+                 new_text = f"{value * factor:.1f}{unit_suffix}"
+                 standardized_text = standardized_text.replace(m.group(0), new_text)
+                 unit_conversions[m.group(0)] = new_text
+
+         # Standardize depth ranges first, so "5 - 10 ft" becomes "1.5-3.0m"
+         # instead of being half-converted by the single-value feet rule below
+         depth_pattern = r'(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*(?:ft|feet|\')'
+         standardized_text = re.sub(depth_pattern,
+                                    lambda m: f"{float(m.group(1))*0.3048:.1f}-{float(m.group(2))*0.3048:.1f}m",
+                                    standardized_text, flags=re.IGNORECASE)
+
+         # Convert feet to meters
+         convert(r'(\d+(?:\.\d+)?)\s*(?:ft|feet|\')', 0.3048, "m")
+         # Convert psf to kPa
+         convert(r'(\d+(?:\.\d+)?)\s*(?:psf|lbs?/ft²?)', 0.047880259, "kPa")
+         # Convert psi to kPa
+         convert(r'(\d+(?:\.\d+)?)\s*(?:psi|lbs?/in²?)', 6.89476, "kPa")
+         # Convert ksc (kg/cm²) to kPa
+         convert(r'(\d+(?:\.\d+)?)\s*(?:ksc|kg/cm²?|kg/cm2)', 98.0, "kPa")
+         # Convert t/m² (tonnes per square meter) to kPa
+         convert(r'(\d+(?:\.\d+)?)\s*(?:t/m²?|ton/m²?|tonnes?/m²?)', 9.81, "kPa")
+
+         return standardized_text, unit_conversions
+
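+     # Illustrative behavior of the method above:
+     #   standardize_units("Su = 2.5 ksc at 16.4 ft")
+     # returns ("Su = 245.0kPa at 5.0m",
+     #          {"2.5 ksc": "245.0kPa", "16.4 ft": "5.0m"}).
+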
158
+ def enhance_soil_layers(self, soil_layers: List[Dict]) -> List[Dict]:
159
+ """Enhance soil layers with calculated parameters"""
160
+
161
+ enhanced_layers = []
162
+
163
+ for layer in soil_layers:
164
+ enhanced_layer = layer.copy()
165
+
166
+ # Extract values
167
+ n_value = layer.get("strength_value") if layer.get("strength_parameter") == "SPT-N" else None
168
+ su_value = layer.get("strength_value") if layer.get("strength_parameter") == "Su" else None
169
+ soil_type = layer.get("soil_type", "").lower()
170
+ depth_from = layer.get("depth_from", 0)
171
+ depth_to = layer.get("depth_to", 0)
172
+ sample_type = layer.get("sample_type", "")
173
+ su_source = layer.get("su_source", "")
174
+
175
+ # CRITICAL RULE: For SS samples, ALWAYS use Su=5*N calculation, IGNORE unconfined compression Su
176
+ if sample_type == "SS" and "clay" in soil_type:
177
+ # For SS samples, we MUST use N-value to calculate Su, regardless of any other Su data
178
+ if n_value is not None:
179
+ calculated_su = self.calculate_su_from_n(n_value)
180
+ enhanced_layer["strength_parameter"] = "Su"
181
+ enhanced_layer["strength_value"] = calculated_su
182
+ enhanced_layer["su_source"] = f"SS Sample: Calculated from raw N={n_value} (Su=5*N)"
183
+ enhanced_layer["original_spt"] = n_value
184
+
185
+ # Override any existing Su values for SS samples
186
+ if su_value is not None and su_value != calculated_su:
187
+ enhanced_layer["ignored_unconfined_su"] = su_value
188
+ st.warning(f"⚠️ SS Sample Layer {enhanced_layer.get('layer_id', 'Unknown')}: Ignored unconfined Su={su_value:.0f}, using calculated Su={calculated_su:.0f} kPa from N={n_value}")
189
+
190
+ st.success(f"βœ… SS Sample Layer {enhanced_layer.get('layer_id', 'Unknown')}: Su = 5 Γ— {n_value} = {calculated_su:.0f} kPa")
191
+ else:
192
+ st.error(f"❌ SS Sample Layer {enhanced_layer.get('layer_id', 'Unknown')}: No N-value found for Su calculation")
193
+
194
+ # For ST samples, preserve direct Su measurements
195
+ elif sample_type == "ST" and su_value is not None:
196
+ enhanced_layer["su_source"] = su_source or "ST Sample: Direct measurement from Unconfined Compression Test"
197
+ st.success(f"βœ… ST Sample Layer {enhanced_layer.get('layer_id', 'Unknown')}: Using direct Su={su_value:.0f} kPa")
198
+
199
+ # For other cases (no sample type specified), use previous logic but prioritize sample identification
200
+ else:
201
+ # Try to identify sample type from available data
202
+ if n_value is not None and su_value is None and "clay" in soil_type:
203
+ # Only calculate Su from N-value if no direct Su available (likely SS sample)
204
+ calculated_su = self.calculate_su_from_n(n_value)
205
+ enhanced_layer["calculated_su"] = calculated_su
206
+ enhanced_layer["su_source"] = f"Calculated from N={n_value} (Su=5*N) - assumed SS sample"
207
+ st.info(f"πŸ”¬ Layer {enhanced_layer.get('layer_id', 'Unknown')}: Calculated Su={calculated_su:.0f} kPa from N={n_value} (assumed SS)")
208
+ elif su_value is not None:
209
+ # Preserve direct Su values (likely ST sample)
210
+ enhanced_layer["su_source"] = su_source or "Direct measurement - assumed ST sample"
211
+ st.success(f"βœ… Layer {enhanced_layer.get('layer_id', 'Unknown')}: Using direct Su={su_value:.0f} kPa (assumed ST)")
212
+
213
+ # Handle sand/silt friction angle calculation
214
+ if "sand" in soil_type and n_value is not None:
215
+ # Calculate friction angle for sand
216
+ mid_depth = (depth_from + depth_to) / 2
217
+ effective_stress = 20 * mid_depth # Approximate effective stress (kPa)
218
+
219
+ sand_type_classification = "medium_sand"
220
+ if "fine" in soil_type:
221
+ sand_type_classification = "fine_sand"
222
+ elif "coarse" in soil_type:
223
+ sand_type_classification = "coarse_sand"
224
+ elif "silt" in soil_type:
225
+ sand_type_classification = "silty_sand"
226
+
227
+ friction_angle = self.calculate_friction_angle_peck(
228
+ n_value, sand_type_classification, effective_stress
229
+ )
230
+ enhanced_layer["friction_angle"] = friction_angle
231
+ enhanced_layer["friction_angle_source"] = f"Peck method from raw N={n_value}"
232
+
233
+ if sample_type == "SS":
234
+ st.success(f"βœ… SS Sample Layer {enhanced_layer.get('layer_id', 'Unknown')}: Ο† = {friction_angle:.1f}Β° from N={n_value}")
235
+ else:
236
+ st.info(f"πŸ“Š Layer {enhanced_layer.get('layer_id', 'Unknown')}: Ο† = {friction_angle:.1f}Β° from N={n_value}")
237
+
238
+ # Update consistency classification
239
+ consistency = self.classify_soil_consistency(soil_type, n_value, su_value)
240
+ if consistency != "unknown":
241
+ enhanced_layer["consistency"] = consistency
242
+
243
+ # Keep soil_type as basic type (clay, sand, silt)
244
+ base_soil = "clay" if "clay" in soil_type else \
245
+ "sand" if "sand" in soil_type else \
246
+ "silt" if "silt" in soil_type else \
247
+ "gravel" if "gravel" in soil_type else soil_type
248
+
249
+ # Remove any existing consistency terms from soil_type
250
+ for consistency_term in ["very soft", "soft", "medium", "stiff", "very stiff", "hard",
251
+ "very loose", "loose", "medium dense", "dense", "very dense"]:
252
+ base_soil = base_soil.replace(consistency_term, "").strip()
253
+
254
+ enhanced_layer["soil_type"] = base_soil
255
+
256
+ enhanced_layers.append(enhanced_layer)
257
+
258
+ return enhanced_layers
259
+
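The SS override above is the crux of this method: for split-spoon samples in clay, Su is always recomputed as 5·N and any lab-reported Su is ignored. A minimal standalone sketch of that decision rule (plain Python, no Streamlit; field names are assumed to match the app's layer dicts):

```python
# Illustrative sketch only - mirrors the SS/ST decision above without Streamlit
def resolve_su(layer: dict):
    """Return (su_kpa, source_note) for a clay layer, or None if no usable data."""
    n = layer.get("strength_value") if layer.get("strength_parameter") == "SPT-N" else None
    su = layer.get("strength_value") if layer.get("strength_parameter") == "Su" else None
    if layer.get("sample_type") == "SS" and n is not None:
        # SS rule: calculated Su = 5*N always wins over any lab Su
        return 5.0 * n, f"calculated from N={n} (Su=5*N)"
    if su is not None:
        return su, "direct measurement (ST / unconfined compression)"
    return None

print(resolve_su({"sample_type": "SS", "strength_parameter": "SPT-N", "strength_value": 12}))
# (60.0, 'calculated from N=12 (Su=5*N)')
```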
260
+ def validate_soil_classification(self, soil_data: Dict) -> Dict:
261
+ """Validate and improve soil classification"""
262
+
263
+ if "soil_layers" not in soil_data:
264
+ return soil_data
265
+
266
+ layers = soil_data["soil_layers"]
267
+ validated_layers = []
268
+
269
+ for layer in layers:
270
+ validated_layer = layer.copy()
271
+
272
+ # Check consistency between soil type and strength parameters
273
+ soil_type = layer.get("soil_type", "").lower()
274
+ strength_param = layer.get("strength_parameter", "")
275
+ strength_value = layer.get("strength_value")
276
+
277
+ # Fix parameter mismatches
278
+ if "clay" in soil_type and strength_param == "SPT-N" and strength_value:
279
+ # Clay should use Su, but if only N is available, calculate Su
280
+ calculated_su = self.calculate_su_from_n(strength_value)
281
+ validated_layer["calculated_su"] = calculated_su
282
+ validated_layer["su_source"] = f"Calculated from N={strength_value}"
283
+
284
+ elif "sand" in soil_type and strength_param == "Su":
285
+ # Sand should not have Su parameter
286
+ validated_layer["strength_parameter"] = "SPT-N"
287
+ validated_layer["parameter_note"] = "Corrected from Su to SPT-N for sand"
288
+
289
+ # Validate depth ranges
290
+ if validated_layer.get("depth_from") >= validated_layer.get("depth_to"):
291
+ # Fix invalid depth ranges
292
+ depth_from = validated_layer.get("depth_from", 0)
293
+ validated_layer["depth_to"] = depth_from + 1.0 # Default 1m thickness
294
+ validated_layer["depth_note"] = "Corrected invalid depth range"
295
+
296
+ validated_layers.append(validated_layer)
297
+
298
+ soil_data["soil_layers"] = validated_layers
299
+ return soil_data
300
+
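The depth check above defaults an invalid range to a 1 m thick layer; a quick standalone illustration with a hypothetical layer dict:

```python
def fix_depth_range(layer: dict, default_thickness: float = 1.0) -> dict:
    # Mirrors the validation above: enforce depth_to > depth_from
    if layer.get("depth_from", 0) >= layer.get("depth_to", 0):
        layer["depth_to"] = layer.get("depth_from", 0) + default_thickness
        layer["depth_note"] = "Corrected invalid depth range"
    return layer

print(fix_depth_range({"depth_from": 3.0, "depth_to": 3.0}))
# {'depth_from': 3.0, 'depth_to': 4.0, 'depth_note': 'Corrected invalid depth range'}
```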
301
+ def process_with_ss_st_classification(self, soil_data: Dict[str, Any]) -> Dict[str, Any]:
302
+ """
303
+ Process soil data with SS/ST sample classification
304
+ """
305
+ try:
306
+ from soil_classification import SoilClassificationProcessor
307
+
308
+ if "soil_layers" not in soil_data:
309
+ return soil_data
310
+
311
+ # Initialize the enhanced processor
312
+ processor = SoilClassificationProcessor()
313
+
314
+ # Process layers with SS/ST classification
315
+ enhanced_layers = processor.process_soil_layers(soil_data["soil_layers"])
316
+
317
+ # Update soil data
318
+ soil_data["soil_layers"] = enhanced_layers
319
+
320
+ # Add processing summary
321
+ processing_summary = processor.get_processing_summary(enhanced_layers)
322
+ soil_data["processing_summary"] = processing_summary
323
+
324
+ # Display processing summary
325
+ st.subheader("πŸ“Š SS/ST Processing Summary")
326
+ col1, col2, col3, col4 = st.columns(4)
327
+
328
+ with col1:
329
+ st.metric("Total Layers", processing_summary['total_layers'])
330
+ st.metric("ST Samples", processing_summary['st_samples'])
331
+
332
+ with col2:
333
+ st.metric("SS Samples", processing_summary['ss_samples'])
334
+ st.metric("Clay Layers", processing_summary['clay_layers'])
335
+
336
+ with col3:
337
+ st.metric("Sand/Silt Layers", processing_summary['sand_layers'])
338
+ st.metric("Su Calculated", processing_summary['su_calculated'])
339
+
340
+ with col4:
341
+ st.metric("Ο† Calculated", processing_summary['phi_calculated'])
342
+
343
+ return soil_data
344
+
345
+ except ImportError as e:
346
+ st.warning(f"⚠️ Enhanced SS/ST classification not available: {str(e)}")
347
+ return soil_data
348
+ except Exception as e:
349
+ st.error(f"❌ Error in SS/ST processing: {str(e)}")
350
+ return soil_data
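A hypothetical driver for the method above, assuming `soil_classification.py` from this commit is on the import path (the Streamlit calls inside the processor simply print to the console when run outside an app):

```python
from soil_classification import SoilClassificationProcessor

# Hypothetical layers shaped like the app's extraction output
layers = [
    {"description": "SS-18 | grey clay, N=12, w=35%", "depth_from": 0, "depth_to": 2},
    {"description": "ST-3 | stiff clay, Su=120 kPa, w=22%", "depth_from": 2, "depth_to": 4},
]
processor = SoilClassificationProcessor()
enhanced = processor.process_soil_layers(layers)  # SS/ST classification + SI conversion
for layer in enhanced:
    print(layer["sample_type"], layer.get("strength_parameter"), layer.get("strength_value"))
```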
soil_classification.py ADDED
@@ -0,0 +1,1434 @@
1
+ import re
2
+ import numpy as np
3
+ import streamlit as st
4
+ from typing import Dict, List, Any, Tuple, Optional
5
+
6
+ class SoilClassificationProcessor:
7
+ """
8
+ Advanced soil classification processor that handles SS and ST samples
9
+ with proper unit conversions and soil parameter calculations
10
+ """
11
+
12
+ def __init__(self):
13
+ # Enhanced unit conversion factors to SI units
14
+ self.unit_conversions = {
15
+ # Pressure/Stress units to kPa
16
+ 'psi': 6.895,
17
+ 'psf': 0.04788,
18
+ 'kpa': 1.0,
19
+ 'kn/m2': 1.0,
20
+ 'kn/mΒ²': 1.0,
21
+ 'knm2': 1.0,
22
+ 'mpa': 1000.0,
23
+ 'pa': 0.001,
24
+ 'n/m2': 0.001,
25
+ 'n/mΒ²': 0.001,
26
+ 'nm2': 0.001,
27
+ 'ksf': 47.88,
28
+ 'tsf': 95.76,
29
+ 'kg/cm2': 98.0,
30
+ 'kg/cmΒ²': 98.0,
31
+ 'kgcm2': 98.0,
32
+ 'ksc': 98.0, # kilograms per square centimeter (same as kg/cmΒ²)
33
+ 'bar': 100.0,
34
+ 'atm': 101.325, # atmosphere to kPa
35
+ 'mmhg': 0.133322, # mmHg to kPa
36
+ 'inhg': 3.386, # inHg to kPa
37
+
38
+ # Enhanced tonnes/tons per square meter conversions
39
+ 't/m2': 9.81, # tonnes per square meter to kPa
40
+ 't/mΒ²': 9.81, # tonnes per square meter to kPa
41
+ 'tm2': 9.81, # tm2 variant
42
+ 'ton/m2': 9.81, # ton per square meter to kPa
43
+ 'ton/mΒ²': 9.81, # ton per square meter to kPa
44
+ 'tonm2': 9.81, # tonm2 variant
45
+ 'tonnes/m2': 9.81, # tonnes per square meter to kPa
46
+ 'tonnes/mΒ²': 9.81, # tonnes per square meter to kPa
47
+ 'tonnesm2': 9.81, # tonnesm2 variant
48
+ 'tonne/m2': 9.81, # tonne per square meter to kPa
49
+ 'tonne/mΒ²': 9.81, # tonne per square meter to kPa
50
+ 'tonnem2': 9.81, # tonnem2 variant
51
+
52
+ # Additional international pressure units
53
+ 'kgf/cm2': 98.0, # kilogram-force per cmΒ²
54
+ 'kgf/cmΒ²': 98.0, # kilogram-force per cmΒ²
55
+ 'kgfcm2': 98.0, # variant without symbols
56
+ 'lbf/in2': 6.895, # pound-force per square inch (same as psi)
57
+ 'lbf/ft2': 0.04788, # pound-force per square foot (same as psf)
58
+ 'lbfin2': 6.895, # variant without symbols
59
+ 'lbfft2': 0.04788, # variant without symbols
60
+
61
+ # Length units to meters (enhanced)
62
+ 'ft': 0.3048,
63
+ 'feet': 0.3048,
64
+ 'foot': 0.3048,
65
+ "'": 0.3048, # foot symbol
66
+ 'in': 0.0254,
67
+ 'inch': 0.0254,
68
+ 'inches': 0.0254,
69
+ '"': 0.0254, # inch symbol
70
+ 'cm': 0.01,
71
+ 'mm': 0.001,
72
+ 'km': 1000.0,
73
+ 'm': 1.0,
74
+ 'meter': 1.0,
75
+ 'metre': 1.0,
76
+ 'meters': 1.0,
77
+ 'metres': 1.0,
78
+ 'yd': 0.9144, # yard to meters
79
+ 'yard': 0.9144,
80
+ 'yards': 0.9144,
81
+
82
+ # Weight/Force units (for completeness)
83
+ 'n': 1.0, # Newton (SI base)
84
+ 'kn': 1000.0, # kilonewton to Newton
85
+ 'kgf': 9.81, # kilogram-force to Newton
86
+ 'lbf': 4.448, # pound-force to Newton
87
+ 'lb': 4.448, # pound (assuming force context)
88
+ 'kg': 9.81, # kilogram (assuming force context, kg*g)
89
+ }
90
+
91
+ # Soil classification criteria
92
+ self.sieve_200_threshold = 50.0 # % passing sieve #200 for clay classification
93
+
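The conversion table above follows standard engineering conventions: 1 t/m² is one tonne-force per square metre = 9.80665 kN/m² ≈ 9.81 kPa, and 1 ksc = 1 kgf/cm² ≈ 98.07 kPa. A quick sanity check of a few entries (factors copied from the table, values approximate):

```python
# Approximate conversion factors to kPa, as used in the table above
KPA_PER = {"psi": 6.895, "ksc": 98.0, "t/m2": 9.81, "tsf": 95.76, "bar": 100.0}
for unit, factor in KPA_PER.items():
    print(f"1 {unit} = {factor} kPa ;  2.5 {unit} = {2.5 * factor:.1f} kPa")
```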
94
+ def process_soil_layers(self, layers: List[Dict]) -> List[Dict]:
95
+ """
96
+ Process soil layers with SS/ST sample classification and parameter calculation
97
+ """
98
+ processed_layers = []
99
+
100
+ st.info("πŸ”¬ Processing soil layers with SS/ST sample classification...")
101
+
102
+ for i, layer in enumerate(layers):
103
+ processed_layer = layer.copy()
104
+
105
+ # Step 1: Identify sample type (SS or ST)
106
+ sample_type = self._identify_sample_type(layer)
107
+ processed_layer['sample_type'] = sample_type
108
+
109
+ # Step 2: Classify soil type if not already classified
110
+ soil_type = self._classify_soil_type(layer)
111
+ processed_layer['soil_type'] = soil_type
112
+
113
+ # Step 3: Process based on sample type
114
+ if sample_type == 'ST':
115
+ processed_layer = self._process_st_sample(processed_layer)
116
+ elif sample_type == 'SS':
117
+ processed_layer = self._process_ss_sample(processed_layer)
118
+ else:
119
+ # Default processing for unidentified samples
120
+ processed_layer = self._process_default_sample(processed_layer)
121
+
122
+ # Step 4: Ensure all units are in SI
123
+ processed_layer = self._convert_to_si_units(processed_layer)
124
+
125
+ # Step 5: Validate and add engineering parameters
126
+ processed_layer = self._add_engineering_parameters(processed_layer)
127
+
128
+ # Step 6: Check clay consistency (water content vs Su)
129
+ processed_layer = self._check_clay_consistency(processed_layer)
130
+
131
+ processed_layers.append(processed_layer)
132
+
133
+ # Progress feedback
134
+ st.write(f" βœ… Layer {i+1}: {sample_type} sample, {soil_type} - {processed_layer.get('strength_parameter', 'N/A')}")
135
+
136
+ st.success(f"βœ… Processed {len(processed_layers)} soil layers with SS/ST classification")
137
+ return processed_layers
138
+
139
+ def _identify_sample_type(self, layer: Dict) -> str:
140
+ """
141
+ Identify if sample is Split Spoon (SS) or Shelby Tube (ST)
142
+ CRITICAL: Look at FIRST COLUMN stratification symbols with ABSOLUTE HIGHEST PRIORITY
143
+ """
144
+ description = layer.get('description', '').lower()
145
+
146
+ # ABSOLUTE HIGHEST PRIORITY: Check for first column stratification symbols
147
+ # Patterns for first column recognition: SS-18, ST-5, SS18, ST3, etc.
148
+ first_column_patterns = [
149
+ # High precision patterns for first column symbols
150
+ r'^[^|]*\b(ss[-]?\d+)\b', # SS-18, SS18 at start or before pipe
151
+ r'^[^|]*\b(st[-]?\d+)\b', # ST-5, ST5 at start or before pipe
152
+ r'^\s*(ss[-]?\d+)', # SS-number at very beginning
153
+ r'^\s*(st[-]?\d+)', # ST-number at very beginning
154
+ r'\|(.*?)(ss[-]?\d+)', # After pipe separator
155
+ r'\|(.*?)(st[-]?\d+)', # After pipe separator
156
+ r'\b(ss[-]?\d+)\s*[|:]', # SS-number followed by pipe or colon
157
+ r'\b(st[-]?\d+)\s*[|:]', # ST-number followed by pipe or colon
158
+ ]
159
+
160
+ for pattern in first_column_patterns:
161
+ match = re.search(pattern, description, re.IGNORECASE)
162
+ if match:
163
+ # Get the SS/ST part (could be in different groups)
164
+ matched_groups = [g for g in match.groups() if g and ('ss' in g.lower() or 'st' in g.lower())]
165
+ if matched_groups:
166
+ matched_text = matched_groups[0].lower().strip()
167
+ if matched_text.startswith('ss'):
168
+ st.success(f"🎯 FIRST COLUMN DETECTED: {matched_text.upper()} β†’ SS sample (HIGHEST PRIORITY)")
169
+ return 'SS'
170
+ elif matched_text.startswith('st'):
171
+ st.success(f"🎯 FIRST COLUMN DETECTED: {matched_text.upper()} β†’ ST sample (HIGHEST PRIORITY)")
172
+ return 'ST'
173
+
174
+ # FALLBACK: Check for standalone SS/ST symbols (lower priority)
175
+ standalone_patterns = [
176
+ r'\bss\b(?!\w)', # Just SS (not part of another word)
177
+ r'\bst\b(?!\w)' # Just ST (not part of another word)
178
+ ]
179
+
180
+ for pattern in standalone_patterns:
181
+ match = re.search(pattern, description, re.IGNORECASE)
182
+ if match:
183
+ matched_text = match.group(0).lower()
184
+ if matched_text == 'ss':
185
+ st.info(f"πŸ“Š Standalone symbol detected: SS β†’ SS sample")
186
+ return 'SS'
187
+ elif matched_text == 'st':
188
+ st.info(f"πŸ“Š Standalone symbol detected: ST β†’ ST sample")
189
+ return 'ST'
190
+
191
+ # SECOND: Check for keywords in description
192
+ # Keywords for ST samples
193
+ st_keywords = ['shelby', 'tube', 'undisturbed', 'ut', 'unconfined', 'uu test', 'ucs']
194
+
195
+ # Keywords for SS samples
196
+ ss_keywords = ['split spoon', 'spt', 'standard penetration', 'disturbed', 'n-value']
197
+
198
+ # Check for ST indicators
199
+ if any(keyword in description for keyword in st_keywords):
200
+ return 'ST'
201
+
202
+ # Check for SS indicators
203
+ if any(keyword in description for keyword in ss_keywords):
204
+ return 'SS'
205
+
206
+ # THIRD: Check strength parameter types
207
+ # Check if SPT-N value is present (indicates SS)
208
+ if layer.get('strength_parameter') == 'SPT-N' or 'spt' in description:
209
+ return 'SS'
210
+
211
+ # Check if Su value is present (could indicate ST)
212
+ if layer.get('strength_parameter') == 'Su' or 'su' in description.lower():
213
+ return 'ST'
214
+
215
+ # FOURTH: Default assumption based on available data
216
+ if layer.get('strength_value') and layer.get('strength_value') > 50:
217
+ return 'SS' # High values typically SPT-N
218
+ else:
219
+ return 'ST' # Lower values typically Su
220
+
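A compact illustration of the first-column detection logic above, using a simplified form of the patterns (the example strings are hypothetical):

```python
import re

samples = ["SS-18 | brown silty clay", "ST-5: soft grey clay", "clay with shells, split spoon, N=9"]
first_col = re.compile(r'^\s*(ss|st)[-]?\d+', re.IGNORECASE)  # reduced version of the patterns above
for text in samples:
    m = first_col.match(text)
    print(f"{text!r} -> {m.group(1).upper() if m else 'no symbol; fall back to keyword search'}")
```

The third string has no first-column symbol, so in the real method the keyword fallback ("split spoon") would classify it as SS.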
221
+ def _classify_soil_type(self, layer: Dict) -> str:
222
+ """
223
+ Enhanced soil type classification with MANDATORY sieve analysis requirement for sand
224
+ CRITICAL: Sand layers MUST have sieve analysis evidence - otherwise assume clay
225
+ """
226
+ # Check if soil type is already specified and validate it
227
+ existing_type = layer.get('soil_type', '').lower()
228
+ if existing_type and existing_type != 'unknown':
229
+ # If it's sand/gravel, verify sieve analysis exists
230
+ if existing_type in ['sand', 'silt', 'gravel']:
231
+ sieve_200_passing = self._extract_sieve_200_data(layer)
232
+ if sieve_200_passing is None:
233
+ st.warning(f"⚠️ '{existing_type}' classification without sieve analysis data. OVERRIDING to 'clay' per requirements.")
234
+ layer['classification_override'] = f"Changed from '{existing_type}' to 'clay' - no sieve analysis data"
235
+ return 'clay'
236
+ else:
237
+ st.success(f"βœ… '{existing_type}' classification confirmed with sieve #200: {sieve_200_passing}% passing")
238
+ return existing_type
239
+ else:
240
+ return existing_type
241
+
242
+ description = layer.get('description', '').lower()
243
+
244
+ # CRITICAL: Check for sieve analysis data FIRST before any classification
245
+ sieve_200_passing = self._extract_sieve_200_data(layer)
246
+
247
+ if sieve_200_passing is not None:
248
+ # Sieve analysis data available - use it for classification
249
+ if sieve_200_passing > self.sieve_200_threshold:
250
+ classification = 'clay' # Fine-grained soil
251
+ st.success(f"βœ… Classified as CLAY: {sieve_200_passing}% passing #200 (>50%)")
252
+ else:
253
+ classification = 'sand' # Coarse-grained soil
254
+ st.success(f"βœ… Classified as SAND: {sieve_200_passing}% passing #200 (<50%)")
255
+
256
+ layer['sieve_200_passing'] = sieve_200_passing
257
+ layer['classification_basis'] = f"Sieve analysis: {sieve_200_passing}% passing #200"
258
+ return classification
259
+
260
+ # NO SIEVE ANALYSIS DATA - Check for explicit mentions but apply strict rules
261
+ potential_classifications = []
262
+
263
+ if any(clay_word in description for clay_word in ['clay', 'clayey', 'ch', 'cl']):
264
+ potential_classifications.append('clay')
265
+
266
+ if any(sand_word in description for sand_word in ['sand', 'sandy', 'sp', 'sw', 'sm', 'sc']):
267
+ potential_classifications.append('sand')
268
+
269
+ if any(silt_word in description for silt_word in ['silt', 'silty', 'ml', 'mh']):
270
+ potential_classifications.append('silt')
271
+
272
+ if any(gravel_word in description for gravel_word in ['gravel', 'gp', 'gw', 'gm', 'gc']):
273
+ potential_classifications.append('gravel')
274
+
275
+ # ENFORCE MANDATORY RULE: No sand/silt/gravel without sieve analysis
276
+ if any(coarse_type in potential_classifications for coarse_type in ['sand', 'silt', 'gravel']):
277
+ st.error(f"❌ CRITICAL: Found potential {potential_classifications} classification but NO sieve analysis data!")
278
+ st.warning(f"πŸ”§ ENFORCING RULE: Classifying as 'clay' - sand/silt/gravel requires sieve analysis evidence")
279
+ layer['classification_override'] = f"Forced clay classification - found {potential_classifications} terms but no sieve data"
280
+ layer['sieve_200_passing'] = None
281
+ layer['classification_basis'] = "Assumed clay - no sieve analysis data available (mandatory requirement)"
282
+ return 'clay'
283
+
284
+ # Default to clay if only clay terms found or no clear classification
285
+ if 'clay' in potential_classifications or not potential_classifications:
286
+ st.info(f"πŸ’‘ Classified as CLAY: {potential_classifications if potential_classifications else 'No explicit soil type found'}")
287
+ layer['sieve_200_passing'] = None
288
+ layer['classification_basis'] = "Assumed clay - no sieve analysis data available"
289
+ return 'clay'
290
+
291
+ # Final fallback - should not reach here
292
+ st.warning(f"⚠️ Unclear classification. Defaulting to 'clay' per mandatory requirements.")
293
+ layer['sieve_200_passing'] = None
294
+ layer['classification_basis'] = "Default clay classification - unclear soil type and no sieve data"
295
+ return 'clay'
296
+
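Once sieve data exists, the rule above reduces to a single threshold; a sketch of just that decision (threshold value taken from `self.sieve_200_threshold`):

```python
def classify_by_fines(percent_passing_200, threshold=50.0):
    """Fine-grained (clay) if more than `threshold` % passes the #200 sieve, else coarse (sand).
    Per the mandatory rule above, missing sieve data always means 'clay'."""
    if percent_passing_200 is None:
        return "clay"  # no sieve evidence -> assume clay
    return "clay" if percent_passing_200 > threshold else "sand"

for fines in (None, 72.0, 35.0):
    print(fines, "->", classify_by_fines(fines))
```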
297
+ def _extract_sieve_200_data(self, layer: Dict) -> Optional[float]:
298
+ """
299
+ Enhanced sieve #200 passing percentage extraction with comprehensive pattern recognition
300
+ """
301
+ description = layer.get('description', '')
302
+
303
+ # Enhanced patterns to catch all possible sieve analysis formats
304
+ patterns = [
305
+ # Standard #200 sieve patterns
306
+ r'#200[:\s]*(\d+(?:\.\d+)?)%',
307
+ r'sieve\s*#?200[:\s]*(\d+(?:\.\d+)?)%',
308
+ r'no\.?\s*200[:\s]*(\d+(?:\.\d+)?)%',
309
+ r'passing\s*#?200[:\s]*(\d+(?:\.\d+)?)%',
310
+ r'(\d+(?:\.\d+)?)%\s*passing\s*#?200',
311
+
312
+ # Fines content (equivalent to #200 passing)
313
+ r'fines[:\s]*(\d+(?:\.\d+)?)%',
314
+ r'fine[s]?\s*content[:\s]*(\d+(?:\.\d+)?)%',
315
+ r'(\d+(?:\.\d+)?)%\s*fines',
316
+
317
+ # 0.075mm equivalent (same as #200)
318
+ r'0\.075\s*mm[:\s]*(\d+(?:\.\d+)?)%\s*passing',
319
+ r'(\d+(?:\.\d+)?)%\s*passing\s*0\.075\s*mm',
320
+ r'0\.075[:\s]*(\d+(?:\.\d+)?)%',
321
+
322
+ # Particle size analysis patterns
323
+ r'particle\s*size[:\s]*(\d+(?:\.\d+)?)%\s*fines',
324
+ r'gradation[:\s]*(\d+(?:\.\d+)?)%\s*passing\s*#?200',
325
+ r'grain\s*size[:\s]*(\d+(?:\.\d+)?)%\s*fines',
326
+
327
+ # Sieve analysis results patterns
328
+ r'sieve\s*analysis[:\s].*?(\d+(?:\.\d+)?)%\s*passing\s*#?200',
329
+ r'sieve\s*analysis[:\s].*?#?200[:\s]*(\d+(?:\.\d+)?)%',
330
+
331
+ # ASTM/Standard method references
332
+ r'astm\s*d422[:\s].*?(\d+(?:\.\d+)?)%\s*passing\s*#?200',
333
+ r'astm\s*d6913[:\s].*?(\d+(?:\.\d+)?)%\s*passing\s*#?200',
334
+
335
+ # Alternative formats
336
+ r'(\d+(?:\.\d+)?)%\s*<\s*0\.075\s*mm', # Percent less than 0.075mm
337
+ r'minus\s*#?200[:\s]*(\d+(?:\.\d+)?)%', # Minus #200
338
+ r'(\d+(?:\.\d+)?)%\s*minus\s*#?200', # Percent minus #200
339
+ ]
340
+
341
+ for pattern in patterns:
342
+ match = re.search(pattern, description, re.IGNORECASE)
343
+ if match:
344
+ percentage = float(match.group(1))
345
+ st.success(f"βœ… Found sieve #200 data: {percentage}% passing from '{match.group(0)}'")
346
+
347
+ # Validate percentage range
348
+ if 0 <= percentage <= 100:
349
+ return percentage
350
+ else:
351
+ st.warning(f"⚠️ Invalid percentage ({percentage}%) found. Should be 0-100%.")
352
+ return None
353
+
354
+ # Check if explicitly mentioned in layer data
355
+ if 'sieve_200_passing' in layer and layer['sieve_200_passing'] is not None:
356
+ percentage = float(layer['sieve_200_passing'])
357
+ st.success(f"βœ… Found sieve #200 data in layer field: {percentage}% passing")
358
+ return percentage
359
+
360
+ # Check for related field names
361
+ for field_name in ['fines_content', 'percent_fines', 'fine_content', 'passing_200']:
362
+ if field_name in layer and layer[field_name] is not None:
363
+ percentage = float(layer[field_name])
364
+ st.success(f"βœ… Found sieve #200 equivalent in '{field_name}': {percentage}% passing")
365
+ return percentage
366
+
367
+ # Log that no sieve analysis was found
368
+ st.info(f"πŸ” No sieve #200 analysis data found in layer description or fields")
369
+ return None
370
+
371
+ def _process_st_sample(self, layer: Dict) -> Dict:
372
+ """
373
+ Process Shelby Tube (ST) sample - use unconfined compression test (Su) values
374
+ """
375
+ layer['processing_method'] = 'ST - Unconfined Compression Test'
376
+
377
+ # Look for Su values in the data
378
+ su_value = self._extract_su_value(layer)
379
+
380
+ if su_value is not None:
381
+ layer['strength_parameter'] = 'Su'
382
+ layer['strength_value'] = su_value
383
+ layer['su_source'] = 'Unconfined Compression Test'
384
+ else:
385
+ # If no Su value found, check for SPT and convert
386
+ spt_value = self._extract_spt_value(layer)
387
+ if spt_value is not None:
388
+ su_calculated = self._convert_spt_to_su(spt_value)
389
+ layer['strength_parameter'] = 'Su'
390
+ layer['strength_value'] = su_calculated
391
+ layer['su_source'] = f'Calculated from SPT-N={spt_value} (Su=5*N)'
392
+ layer['original_spt'] = spt_value
393
+
394
+ return layer
395
+
396
+ def _process_ss_sample(self, layer: Dict) -> Dict:
397
+ """
398
+ Process Split Spoon (SS) sample - ALWAYS use SPT values and convert to Su using Su=5*N
399
+ FOR SS SAMPLES: IGNORE any unconfined compression test Su values, ONLY use calculated Su=5*N
400
+ """
401
+ layer['processing_method'] = 'SS - SPT Conversion (Su=5*N)'
402
+
403
+ # CRITICAL: For SS samples, extract the raw SPT-N value and calculate Su from it
404
+ spt_value = self._extract_spt_value(layer)
405
+ soil_type = layer.get('soil_type', 'clay')
406
+
407
+ if spt_value is not None:
408
+ if soil_type == 'clay':
409
+ # MANDATORY: Convert SPT to undrained shear strength using Su = 5*N
410
+ # IGNORE any existing Su values from unconfined compression tests
411
+ calculated_su = self._convert_spt_to_su(spt_value)
412
+
413
+ # Override any existing Su values for SS samples
414
+ layer['strength_parameter'] = 'Su'
415
+ layer['strength_value'] = calculated_su
416
+ layer['su_source'] = f'Calculated from raw N={spt_value} (Su=5*N) - SS Sample'
417
+ layer['original_spt'] = spt_value
418
+
419
+ # Clear any conflicting unconfined compression data for SS samples
420
+ if 'unconfined_su' in layer:
421
+ layer['unconfined_su_ignored'] = layer.pop('unconfined_su')
422
+ st.warning(f"⚠️ SS Sample: Ignored unconfined compression Su, using calculated Su={calculated_su:.0f} kPa from N={spt_value}")
423
+
424
+ st.success(f"βœ… SS Sample: Su = 5 Γ— {spt_value} = {calculated_su:.0f} kPa")
425
+
426
+ elif soil_type in ['sand', 'silt']:
427
+ # Convert SPT to friction angle for granular soils
428
+ phi_value = self._convert_spt_to_friction_angle(spt_value)
429
+ layer['strength_parameter'] = 'Ο†'
430
+ layer['strength_value'] = phi_value
431
+ layer['friction_angle'] = phi_value
432
+ layer['phi_source'] = f'Calculated from raw N={spt_value} (Peck method) - SS Sample'
433
+ layer['original_spt'] = spt_value
434
+
435
+ st.success(f"βœ… SS Sample: Ο† = {phi_value:.1f}Β° from N={spt_value}")
436
+
437
+ else:
438
+ # Keep SPT value for other soil types
439
+ layer['strength_parameter'] = 'SPT-N'
440
+ layer['strength_value'] = spt_value
441
+ layer['original_spt'] = spt_value
442
+
443
+ st.info(f"πŸ“Š SS Sample: Using raw N={spt_value} for {soil_type}")
444
+
445
+ else:
446
+ st.error(f"❌ SS Sample: No SPT-N value found in layer data")
447
+
448
+ return layer
449
+
450
+ def _process_default_sample(self, layer: Dict) -> Dict:
451
+ """
452
+ Process sample with unknown type - use available data intelligently
453
+ """
454
+ layer['processing_method'] = 'Default - Based on available data'
455
+
456
+ # Try to identify and process based on existing parameters
457
+ existing_param = layer.get('strength_parameter', '').lower()
458
+
459
+ if 'su' in existing_param:
460
+ # Already has Su value
461
+ return self._process_st_sample(layer)
462
+ elif 'spt' in existing_param or 'n' in existing_param:
463
+ # Has SPT value
464
+ return self._process_ss_sample(layer)
465
+ else:
466
+ # Make best guess based on strength value
467
+ strength_val = layer.get('strength_value', 0)
468
+ if strength_val and strength_val > 50:
469
+ # Likely SPT value
470
+ layer['strength_parameter'] = 'SPT-N'
471
+ return self._process_ss_sample(layer)
472
+ else:
473
+ # Likely Su value
474
+ layer['strength_parameter'] = 'Su'
475
+ return self._process_st_sample(layer)
476
+
477
+ def _extract_su_value(self, layer: Dict) -> Optional[float]:
478
+ """
479
+ Enhanced Su (undrained shear strength) extraction with MANDATORY unit conversion checking
480
+ CRITICAL: All Su values must be converted to kPa before processing
481
+ """
482
+ # Check direct Su field first - but validate units
483
+ if layer.get('strength_parameter') == 'Su' and layer.get('strength_value') is not None:
484
+ su_value = float(layer['strength_value'])
485
+ # Check if this value needs unit conversion (warn if suspiciously low/high)
486
+ if su_value < 5:
487
+ st.warning(f"⚠️ Su value {su_value} seems low - verify it's in kPa, not MPa or other units")
488
+ elif su_value > 2000:
489
+ st.warning(f"⚠️ Su value {su_value} seems high - verify it's in kPa, not psi or other units")
490
+ return su_value
491
+
492
+ # Look in description for Su values with enhanced unit detection
493
+ description = layer.get('description', '')
494
+
495
+ # CRITICAL: Enhanced patterns with explicit unit capture for conversion
496
+ patterns = [
497
+ # Direct Su values with units - CAPTURE UNITS EXPLICITLY
498
+ r'su[:\s=]*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
499
+ r'undrained[:\s]*shear[:\s]*strength[:\s]*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
500
+ r'shear\s*strength[:\s]*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
501
+ r'ucs[:\s]*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
502
+ r'unconfined[:\s]*compression[:\s]*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
503
+
504
+ # Equation-style patterns
505
+ r'su\s*=\s*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
506
+ r'strength\s*=\s*(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²|psi|psf|ksc|kg/cm2|kg/cmΒ²|t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²|mpa)',
507
+
508
+ # Embedded unit patterns
509
+ r'(\d+(?:\.\d+)?)\s*(kpa|kn/m2|kn/mΒ²)\s*(?:su|strength)',
510
+ r'(\d+(?:\.\d+)?)\s*(ksc|kg/cm2|kg/cmΒ²)\s*(?:su|strength)',
511
+ r'(\d+(?:\.\d+)?)\s*(t/m2|t/mΒ²|ton/m2|ton/mΒ²|tonnes?/m2|tonnes?/mΒ²)\s*(?:su|strength)',
512
+ r'(\d+(?:\.\d+)?)\s*(psi|psf)\s*(?:su|strength)',
513
+ r'(\d+(?:\.\d+)?)\s*(mpa)\s*(?:su|strength)',
514
+
515
+ # Common non-SI units that need conversion
516
+ r'(\d+(?:\.\d+)?)\s*ksc\b', # ksc without explicit "su"
517
+ r'(\d+(?:\.\d+)?)\s*t/mΒ²?\b', # tonnes/mΒ²
518
+ r'(\d+(?:\.\d+)?)\s*psi\b', # psi
519
+ ]
520
+
521
+ for pattern in patterns:
522
+ match = re.search(pattern, description, re.IGNORECASE)
523
+ if match:
524
+ value = float(match.group(1))
525
+ unit = match.group(2).lower() if len(match.groups()) > 1 and match.group(2) else 'kpa'
526
+
527
+ # CRITICAL: Alert if unit conversion is needed
528
+ if unit != 'kpa':
529
+ st.warning(f"πŸ”§ UNIT CONVERSION REQUIRED: Found Su = {value} {unit.upper()}")
530
+
531
+ # Convert to kPa with detailed logging
532
+ converted_value = self._convert_pressure_to_kpa(value, unit)
533
+
534
+ # Store original values for verification
535
+ layer['original_su_value'] = value
536
+ layer['original_su_unit'] = unit.upper()
537
+ layer['converted_su_note'] = f"Converted from {value} {unit.upper()} to {converted_value:.1f} kPa"
538
+
539
+ # Enhanced validation with context-aware warnings
540
+ if converted_value < 1:
541
+ st.error(f"❌ Very low Su = {converted_value:.3f} kPa after conversion. Check original value: {value} {unit}")
542
+ elif converted_value > 2000:
543
+ st.warning(f"⚠️ Very high Su = {converted_value:.0f} kPa after conversion from {value} {unit}. Verify this is correct.")
544
+ elif 1 <= converted_value <= 1000:
545
+ st.success(f"βœ… Su = {converted_value:.1f} kPa (converted from {value} {unit.upper()})")
546
+ else:
547
+ st.info(f"πŸ“Š Su = {converted_value:.1f} kPa (converted from {value} {unit.upper()}) - unusual but accepted")
548
+
549
+ return converted_value
550
+
551
+ # Check for unitless Su values (assume kPa but warn)
552
+ unitless_patterns = [
553
+ r'su[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[a-zA-Z])', # Su value not followed by units
554
+ r'shear\s*strength[:\s]*(\d+(?:\.\d+)?)\b(?!\s*[a-zA-Z])',
555
+ r'unconfined[:\s]*(\d+(?:\.\d+)?)\b(?!\s*[a-zA-Z])',
556
+ ]
557
+
558
+ for pattern in unitless_patterns:
559
+ match = re.search(pattern, description, re.IGNORECASE)
560
+ if match:
561
+ value = float(match.group(1))
562
+ st.warning(f"⚠️ Found Su = {value} WITHOUT UNITS! Assuming kPa - please verify.")
563
+ layer['assumed_unit_warning'] = f"Assumed {value} is in kPa (no units specified)"
564
+ return value
565
+
566
+ # Check for explicit Su field in layer data
567
+ if 'su_value' in layer and layer['su_value'] is not None:
568
+ value = float(layer['su_value'])
569
+ st.info(f"πŸ“Š Using Su = {value:.1f} from field 'su_value' (assumed kPa)")
570
+ return value
571
+
572
+ # Check for other strength-related fields that might contain Su
573
+ for field_name in ['undrained_strength', 'unconfined_strength', 'cohesion']:
574
+ if field_name in layer and layer[field_name] is not None:
575
+ value = float(layer[field_name])
576
+ st.info(f"πŸ“Š Using Su = {value:.1f} kPa from field '{field_name}' (assumed kPa)")
577
+ return value
578
+
579
+ return None
580
+
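The unit-capturing patterns above boil down to "number + pressure unit, converted to kPa". A reduced sketch of that extraction step (the pattern and unit table are simplified from the full lists):

```python
import re

KPA_PER = {"kpa": 1.0, "ksc": 98.0, "t/m2": 9.81, "psi": 6.895, "mpa": 1000.0}
su_pattern = re.compile(r'su[:\s=]*(\d+(?:\.\d+)?)\s*(kpa|ksc|t/m2|psi|mpa)', re.IGNORECASE)

for text in ("Su = 3.2 t/m2", "su: 120 kPa", "stiff clay, Su=1.5 ksc"):
    m = su_pattern.search(text)
    value, unit = float(m.group(1)), m.group(2).lower()
    print(f"{text!r} -> {value * KPA_PER[unit]:.1f} kPa")
```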
581
+ def _extract_spt_value(self, layer: Dict) -> Optional[float]:
582
+ """
583
+ Enhanced SPT-N value extraction for SS samples - USE RAW N VALUE ONLY, NOT N-CORRECTED
584
+ Improved pattern matching for better SS layer division
585
+ """
586
+ # Check direct SPT field
587
+ if layer.get('strength_parameter') == 'SPT-N' and layer.get('strength_value'):
588
+ return float(layer['strength_value'])
589
+
590
+ # Look in description for SPT values - PRIORITIZE RAW N VALUES
591
+ description = layer.get('description', '')
592
+
593
+ # ENHANCED: Look for raw N value patterns with better precision
594
+ raw_n_patterns = [
595
+ # High priority patterns for raw N values
596
+ r'\braw[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Raw N value
597
+ r'\bfield[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Field N value
598
+ r'\bmeasured[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Measured N value
599
+ r'\bactual[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Actual N value
600
+ r'\bobserved[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Observed N value
601
+
602
+ # Standard N patterns NOT followed by correction terms
603
+ r'\bn[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))', # N value NOT corrected
604
+ r'\bspt[:\s]*n[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))', # SPT-N NOT corrected
605
+ r'\bn[-\s]?value[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))', # N-value NOT corrected
606
+ r'\bn\s*=\s*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))', # N = value NOT corrected
607
+
608
+ # Blow count patterns
609
+ r'\bblow[s]?[:\s]*count[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))',
610
+ r'\bblows[:\s]*per[:\s]*foot[:\s=]*(\d+(?:\.\d+)?)',
611
+ r'\bblow[s]?[:\s=]*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))',
612
+
613
+ # SS sample specific patterns
614
+ r'\bss[-\s]*\d*[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # SS sample with N
615
+ r'\bsplit[:\s]*spoon[:\s]*n[:\s=]*(\d+(?:\.\d+)?)', # Split spoon N
616
+ ]
617
+
618
+ # First try to find raw N values with enhanced logging
619
+ for i, pattern in enumerate(raw_n_patterns):
620
+ match = re.search(pattern, description, re.IGNORECASE)
621
+ if match:
622
+ n_value = float(match.group(1))
623
+ pattern_type = ["Raw N", "Field N", "Measured N", "Actual N", "Observed N",
624
+ "Standard N", "SPT-N", "N-value", "N=", "Blow count",
625
+ "Blows/ft", "Blows", "SS N", "Split spoon N"][min(i, 13)]
626
+ st.success(f"βœ… SS Sample: Using {pattern_type} = {n_value} from: '{match.group(0)}'")
627
+
628
+ # Additional validation for SS samples
629
+ if n_value > 100:
630
+ st.warning(f"⚠️ Very high N value ({n_value}) detected. Please verify this is correct.")
631
+ elif n_value == 0:
632
+ st.warning(f"⚠️ Zero N value detected. May indicate very soft soil or measurement issue.")
633
+
634
+ return n_value
635
+
636
+ # Enhanced fallback patterns with warnings
637
+ fallback_patterns = [
638
+ r'\bn[:\s=]*(\d+(?:\.\d+)?)',
639
+ r'\bspt[:\s]*(\d+(?:\.\d+)?)',
640
+ r'(\d+(?:\.\d+)?)\s*(?:blow|n)',
641
+ r'penetration[:\s]*(\d+(?:\.\d+)?)',
642
+ r'resistance[:\s]*(\d+(?:\.\d+)?)'
643
+ ]
644
+
645
+ for pattern in fallback_patterns:
646
+ match = re.search(pattern, description, re.IGNORECASE)
647
+ if match:
648
+ n_value = float(match.group(1))
649
+
650
+ # Enhanced warnings for SS samples
651
+ warning_indicators = ['corr', 'correct', 'adj', 'adjust', 'modified', 'norm']
652
+ has_correction_indicator = any(indicator in description.lower() for indicator in warning_indicators)
653
+
654
+ if has_correction_indicator:
655
+ st.error(f"❌ SS Sample: Found N = {n_value} but description contains correction terms. This may be corrected N, not raw N!")
656
+ st.info("πŸ’‘ For SS samples, use only raw field N values (not corrected). Check original field logs.")
657
+ # Still return the value but flag it
658
+ layer['n_value_warning'] = f"Potentially corrected N value: {n_value}"
659
+ else:
660
+ st.info(f"πŸ“Š SS Sample: Using N = {n_value} from: '{match.group(0)}' (fallback pattern)")
661
+
662
+ return n_value
663
+
664
+ # If no N value found, provide specific guidance for SS samples
665
+ st.error(f"❌ SS Sample: No SPT-N value found in layer data")
666
+ st.info("πŸ’‘ SS samples require SPT-N values. Look for: N=X, SPT-N=X, raw N=X, field N=X, or blow count.")
667
+
668
+ return None
669
+
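The key behaviour above is the negative lookahead that rejects corrected N values. A minimal demonstration on hypothetical log strings:

```python
import re

# Accept "N = 12" but not "N = 15 corrected" (simplified from the patterns above)
raw_n = re.compile(r'\bn\s*=\s*(\d+(?:\.\d+)?)\b(?!\s*[-]?(?:corr|correct|adj|adjust))', re.IGNORECASE)

for text in ("grey clay, N = 12", "N = 15 corrected for overburden"):
    m = raw_n.search(text)
    print(f"{text!r} -> {m.group(1) if m else 'rejected (looks corrected)'}")
```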
670
+ def _convert_spt_to_su(self, spt_n: float) -> float:
671
+ """
672
+ Convert SPT-N to undrained shear strength (Su) using Su = 5*N correlation
673
+ Enhanced for SS samples with validation
674
+ """
675
+ if spt_n <= 0:
676
+ st.warning(f"⚠️ Invalid N value ({spt_n}) for Su calculation. Using N=1 as minimum.")
677
+ spt_n = 1.0
678
+
679
+ su_calculated = 5.0 * spt_n
680
+
681
+ # Add validation and guidance for SS clay samples
682
+ if su_calculated < 10:
683
+ st.info(f"πŸ’‘ Very low Su = {su_calculated:.0f} kPa from N={spt_n}. Indicates very soft clay.")
684
+ elif su_calculated > 500:
685
+ st.warning(f"⚠️ Very high Su = {su_calculated:.0f} kPa from N={spt_n}. Verify N value is raw (not corrected).")
686
+
687
+ return su_calculated
688
+
689
+ def _convert_spt_to_friction_angle(self, spt_n: float) -> float:
690
+ """
691
+ Enhanced SPT-N to friction angle conversion for sand/silt layers in SS samples
692
+ Uses improved Peck method with soil type considerations
693
+ """
694
+ if spt_n <= 0:
695
+ st.warning(f"⚠️ Invalid N value ({spt_n}) for friction angle calculation. Using N=1 as minimum.")
696
+ spt_n = 1.0
697
+
698
+ # Enhanced Peck correlation with improvements:
699
+ # Ο† = 27.1 + 0.3 * N - 0.00054 * NΒ² (for fine to medium sand)
700
+ # Valid for N up to 50, with adjustments for different sand types
701
+
702
+ n_limited = min(spt_n, 50) # Cap at 50 for correlation validity
703
+
704
+ # Base Peck correlation
705
+ phi = 27.1 + 0.3 * n_limited - 0.00054 * (n_limited ** 2)
706
+
707
+ # Ensure reasonable minimum
708
+ phi_final = max(phi, 28) # Minimum reasonable friction angle for sand
709
+ phi_final = min(phi_final, 45) # Maximum reasonable friction angle
710
+
711
+ # Add validation and guidance for SS sand samples
712
+ if phi_final < 30:
713
+ st.info(f"πŸ’‘ Low Ο† = {phi_final:.1f}Β° from N={spt_n}. Indicates loose sand or silty sand.")
714
+ elif phi_final > 40:
715
+ st.info(f"πŸ’‘ High Ο† = {phi_final:.1f}Β° from N={spt_n}. Indicates dense, well-graded sand.")
716
+
717
+ # Special handling for very low or high N values
718
+ if spt_n < 4:
719
+ st.warning(f"⚠️ Very low N={spt_n} for sand. May indicate loose sand or silt. Consider checking soil classification.")
720
+ elif spt_n > 40:
721
+ st.info(f"πŸ’‘ Very high N={spt_n} for sand. Indicates very dense sand or possible gravel content.")
722
+
723
+ return phi_final
724
+
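The correlation above is a quadratic fit, φ ≈ 27.1 + 0.3·N − 0.00054·N², with N capped at 50 and φ clamped to the 28–45° range. Tabulating a few values makes the behaviour obvious:

```python
def peck_phi(n: float) -> float:
    """Peck-style SPT-N to friction angle correlation, as implemented above."""
    n = max(min(n, 50), 1)                      # clamp N to the correlation's valid range
    phi = 27.1 + 0.3 * n - 0.00054 * n ** 2
    return max(28.0, min(phi, 45.0))            # keep within reasonable sand limits

for n in (2, 5, 10, 20, 30, 50):
    print(f"N={n:>2} -> phi = {peck_phi(n):.1f} deg")
# N= 2 -> 28.0, N=10 -> 30.0, N=30 -> 35.6, N=50 -> 40.8
```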
725
+ def _convert_pressure_to_kpa(self, value: float, unit: str) -> float:
726
+ """
727
+ Enhanced pressure value conversion to kPa with comprehensive unit support
728
+ """
729
+ if not unit or unit.lower() in ['', 'none', 'null']:
730
+ return value # Assume already in kPa if no unit specified
731
+
732
+ # Normalize unit string for better matching
733
+ unit_clean = unit.lower().replace('/', '').replace(' ', '').replace('Β²', '2').replace('Β³', '3')
734
+
735
+ # Remove common punctuation and extra characters
736
+ unit_clean = unit_clean.replace('.', '').replace('-', '').replace('_', '')
737
+
738
+ # Handle specific variations that need special processing
739
+ special_cases = {
740
+ # Tonne/ton variations
741
+ 'tm2': 9.81, 'tonm2': 9.81, 'tonnesm2': 9.81, 'tonnem2': 9.81,
742
+ # kg/cmΒ² variations
743
+ 'kgcm2': 98.0, 'kgfcm2': 98.0,
744
+ # kN/mΒ² variations
745
+ 'knm2': 1.0,
746
+ # Other common variations
747
+ 'psig': 6.895, # psi gauge
748
+ 'psia': 6.895, # psi absolute
749
+ 'psfa': 0.04788, # psf absolute
750
+ 'torr': 0.133322, # torr (same as mmHg)
751
+ }
752
+
753
+ # Check special cases first
754
+ if unit_clean in special_cases:
755
+ conversion_factor = special_cases[unit_clean]
756
+ else:
757
+ # Standard conversion using enhanced dictionary
758
+ conversion_factor = self.unit_conversions.get(unit_clean, None)
759
+
760
+ # If no exact match found, try intelligent partial matching
761
+ if conversion_factor is None:
762
+ for known_unit, factor in self.unit_conversions.items():
763
+ # Try various normalization approaches
764
+ known_normalized = known_unit.replace('/', '').replace('Β²', '2').replace(' ', '')
765
+ if known_normalized == unit_clean:
766
+ conversion_factor = factor
767
+ break
768
+
769
+ # Check if the unit string contains a known unit (compound units); short keys such as 'm' or 'n' can false-match here, so the exact matches above take precedence
770
+ if known_unit != unit_clean and known_unit in unit_clean:
771
+ conversion_factor = factor
772
+ break
773
+
774
+ # Final fallback - assume kPa if still no match found
775
+ if conversion_factor is None:
776
+ st.warning(f"⚠️ Unknown pressure unit '{unit}'. Assuming kPa - please verify.")
777
+ conversion_factor = 1.0
778
+
779
+ converted_value = value * conversion_factor
780
+
781
+ # Enhanced logging with validation
782
+ if conversion_factor != 1.0:
783
+ st.success(f"πŸ”§ Unit conversion: {value} {unit} = {converted_value:.1f} kPa (Γ—{conversion_factor})")
784
+
785
+ # Add validation warnings for unusual results
786
+ if converted_value > 10000:
787
+ st.warning(f"⚠️ Very high pressure result ({converted_value:.0f} kPa). Please verify unit conversion.")
788
+ elif converted_value < 0.1 and value > 0:
789
+ st.warning(f"⚠️ Very low pressure result ({converted_value:.3f} kPa). Please verify unit conversion.")
790
+
791
+ return converted_value
792
+
793
+ def _convert_to_si_units(self, layer: Dict) -> Dict:
794
+ """
795
+ Convert all measurements to SI units
796
+ """
797
+ # Convert depths to meters
798
+ for depth_field in ['depth_from', 'depth_to']:
799
+ if depth_field in layer:
800
+ depth_val, depth_unit = self._extract_value_and_unit(
801
+ str(layer[depth_field]), default_unit='m'
802
+ )
803
+ layer[depth_field] = self._convert_length_to_meters(depth_val, depth_unit)
804
+
805
+ # Convert strength values to appropriate SI units
806
+ if 'strength_value' in layer and 'strength_parameter' in layer:
807
+ param = layer['strength_parameter'].lower()
808
+
809
+ if param == 'su':
810
+ # Convert Su to kPa
811
+ strength_val, strength_unit = self._extract_value_and_unit(
812
+ str(layer['strength_value']), default_unit='kpa'
813
+ )
814
+ layer['strength_value'] = self._convert_pressure_to_kpa(strength_val, strength_unit)
815
+ layer['strength_unit'] = 'kPa'
816
+
817
+ # Validate Su value against water content if available
818
+ validation_result = self._validate_su_with_water_content(layer)
819
+ if validation_result.get('needs_unit_check'):
820
+ st.warning(f"⚠️ Su-water content validation: {validation_result['message']}")
821
+ layer['unit_validation_warning'] = validation_result['message']
822
+ if validation_result['recommendations']:
823
+ st.info("πŸ’‘ Recommendations: " + "; ".join(validation_result['recommendations']))
824
+
825
+ elif param in ['Ο†', 'phi', 'friction_angle']:
826
+ # Friction angle should be in degrees (already SI)
827
+ layer['strength_unit'] = 'degrees'
828
+
829
+ elif param == 'spt-n':
830
+ # SPT-N is dimensionless
831
+ layer['strength_unit'] = 'blows/30cm'
832
+
833
+ return layer
834
+
835
+ def _extract_value_and_unit(self, value_str: str, default_unit: str = '') -> Tuple[float, str]:
836
+ """
837
+ Extract numeric value and unit from a string
838
+ """
839
+ # Remove extra spaces and convert to lowercase
840
+ clean_str = value_str.strip().lower()
841
+
842
+ # Pattern to match number followed by optional unit
843
+ pattern = r'(\d+(?:\.\d+)?)\s*([a-zA-Z/Β²]+)?'
844
+ match = re.search(pattern, clean_str)
845
+
846
+ if match:
847
+ value = float(match.group(1))
848
+ unit = match.group(2) if match.group(2) else default_unit
849
+ return value, unit
850
+
851
+ try:
852
+ return float(clean_str), default_unit
853
+ except ValueError:
854
+ return 0.0, default_unit
855
+
856
+ def _convert_length_to_meters(self, value: float, unit: str) -> float:
857
+ """
858
+ Convert length value to meters
859
+ """
860
+ unit_clean = unit.lower().replace(' ', '')
861
+ conversion_factor = self.unit_conversions.get(unit_clean, 1.0)
862
+ return value * conversion_factor
863
+
864
+ def _detect_t_m2_unit_error(self, layer: Dict) -> Dict:
865
+ """
866
+ Detect if LLM failed to convert t/mΒ² units to kPa
867
+ This is the most common unit conversion error
868
+ """
869
+ result = {"needs_conversion": False, "critical_error": False}
870
+
871
+ # Only check layers with Su values
872
+ if layer.get("strength_parameter") != "Su" or not layer.get("strength_value"):
873
+ return result
874
+
875
+ su = float(layer["strength_value"])
876
+ wc = layer.get("water_content", 0)
877
+ description = layer.get("description", "")
878
+
879
+ # Critical detection: Su values that are likely t/mΒ² but not converted
880
+ # Typical t/mΒ² values are 1-8, typical kPa values are 10-400 for clay
881
+
882
+ # Pattern 1: Su 1-8 with reasonable water content (15-50%)
883
+ if 1.0 <= su <= 8.0 and 15 <= wc <= 50:
884
+ converted_su = su * 9.81
885
+ result.update({
886
+ "needs_conversion": True,
887
+ "critical_error": True,
888
+ "original_su": su,
889
+ "converted_su": converted_su,
890
+ "unit_error": "t/mΒ²",
891
+ "message": f"⚠️ CRITICAL: Su={su:.2f} appears to be in t/m² units, should be {converted_su:.1f} kPa",
892
+ "correction": f"{su:.2f} t/mΒ² Γ— 9.81 = {converted_su:.1f} kPa"
893
+ })
894
+
895
+ # Pattern 2: Very low Su (<5) with low water content - could be t/mΒ²
896
+ elif su < 5.0 and wc > 0 and wc < 25:
897
+ converted_su = su * 9.81
898
+ result.update({
899
+ "needs_conversion": True,
900
+ "critical_error": True,
901
+ "original_su": su,
902
+ "converted_su": converted_su,
903
+ "unit_error": "t/mΒ²",
904
+ "message": f"⚠️ POSSIBLE: Su={su:.2f} might be in t/m² units, check if should be {converted_su:.1f} kPa",
905
+ "correction": f"{su:.2f} t/mΒ² Γ— 9.81 = {converted_su:.1f} kPa"
906
+ })
907
+
908
+ # Pattern 3: Check description for t/mΒ² mentions
909
+ if any(unit in description.lower() for unit in ['t/mΒ²', 't/m2', 'ton/mΒ²', 'ton/m2', 'tonnes/mΒ²']):
910
+ if su < 10: # If description mentions t/mΒ² but Su is low, likely not converted
911
+ converted_su = su * 9.81
912
+ result.update({
913
+ "needs_conversion": True,
914
+ "critical_error": True,
915
+ "original_su": su,
916
+ "converted_su": converted_su,
917
+ "unit_error": "t/mΒ² (found in description)",
918
+ "message": f"⚠️ CRITICAL: Description mentions t/m² but Su={su:.2f} appears unconverted, should be {converted_su:.1f} kPa",
919
+ "correction": f"{su:.2f} t/mΒ² Γ— 9.81 = {converted_su:.1f} kPa"
920
+ })
921
+
922
+ return result
923
+
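The heuristic above flags Su values that are numerically plausible in t/m² but implausibly low in kPa for the reported water content. A reduced sketch of the core check:

```python
def flag_tm2(su_kpa: float, water_content: float) -> str:
    """Flag a clay Su that was probably reported in t/m2 and never converted."""
    if 1.0 <= su_kpa <= 8.0 and 15 <= water_content <= 50:
        return f"suspect t/m2: {su_kpa:.2f} t/m2 x 9.81 = {su_kpa * 9.81:.1f} kPa"
    return "looks consistent"

print(flag_tm2(3.5, 32))   # suspect t/m2: 3.50 t/m2 x 9.81 = 34.3 kPa
print(flag_tm2(85.0, 28))  # looks consistent
```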
924
+ def _validate_su_with_water_content(self, layer: Dict) -> Dict:
925
+ """
926
+ ENHANCED Su-water content validation with comprehensive unit checking
927
+
928
+ Standard correlations for clay (empirical relationships):
929
+ - Very soft clay: Su < 25 kPa, w% > 40%
930
+ - Soft clay: Su 25-50 kPa, w% 30-40%
931
+ - Medium clay: Su 50-100 kPa, w% 20-30%
932
+ - Stiff clay: Su 100-200 kPa, w% 15-25%
933
+ - Very stiff clay: Su 200-400 kPa, w% 10-20%
934
+ - Hard clay: Su > 400 kPa, w% < 15%
935
+
936
+ Key unit conversions to check:
937
+ - t/mΒ² β†’ kPa: Γ—9.81 (CRITICAL)
938
+ - ksc β†’ kPa: Γ—98.0
939
+ - psi β†’ kPa: Γ—6.895
940
+ - MPa β†’ kPa: Γ—1000
941
+ """
942
+ validation_result = {
943
+ 'valid': True,
944
+ 'needs_unit_check': False,
945
+ 'critical_unit_error': False,
946
+ 'suggested_conversion': None,
947
+ 'message': '',
948
+ 'recommendations': [],
949
+ 'recheck_image': False
950
+ }
951
+
952
+ su_value = layer.get('strength_value')
953
+ water_content = layer.get('water_content')
954
+ soil_type = layer.get('soil_type', '')
955
+ description = layer.get('description', '')
956
+
957
+ # Only validate for clay layers with both Su and water content
958
+ if soil_type != 'clay' or not su_value or not water_content:
959
+ return validation_result
960
+
961
+ try:
962
+ su = float(su_value)
963
+ wc = float(water_content)
964
+
965
+ # STEP 1: Check for t/mΒ² unit errors first (most common issue)
966
+ t_m2_check = self._detect_t_m2_unit_error(layer)
967
+ if t_m2_check.get('critical_error'):
968
+ validation_result.update({
969
+ 'critical_unit_error': True,
970
+ 'needs_conversion': True,
971
+ 'original_value': t_m2_check['original_su'],
972
+ 'suggested_value': t_m2_check['converted_su'],
973
+ 'unit_error_type': t_m2_check['unit_error'],
974
+ 'suggested_conversion': t_m2_check['correction'],
975
+ 'message': t_m2_check['message'],
976
+ 'recheck_image': True,
977
+ 'reload_picture': True
978
+ })
979
+ return validation_result
980
+
981
+ # STEP 2: Check for other unit conversion errors
982
+ unit_check_results = self._check_su_unit_conversions(su, wc, description)
983
+ if unit_check_results['needs_conversion']:
984
+ validation_result.update(unit_check_results)
985
+ validation_result['critical_unit_error'] = True
986
+ validation_result['recheck_image'] = True
987
+ return validation_result
988
+
989
+ # STEP 3: Detailed correlation analysis
990
+ inconsistencies = []
991
+ correlation_score = self._calculate_correlation_score(su, wc)
992
+
993
+ # Very specific clay consistency checks
994
+ if su < 25 and wc < 30:
995
+ inconsistencies.append(f"Very soft clay (Su={su:.0f}kPa) typically has w%>30%, found {wc:.1f}%")
996
+ if wc < 20:
997
+ validation_result['recheck_image'] = True
998
+ inconsistencies.append("VERIFY: Water content seems too low for very soft clay")
999
+
1000
+ if su > 400 and wc > 30:
1001
+ inconsistencies.append(f"Hard clay (Su={su:.0f}kPa) typically has w%<20%, found {wc:.1f}%")
1002
+ validation_result['recheck_image'] = True
1003
+ inconsistencies.append("VERIFY: Water content seems too high for hard clay")
1004
+
1005
+ # Medium-range mismatches
1006
+ if 50 <= su <= 200 and (wc > 45 or wc < 10):
1007
+ inconsistencies.append(f"Medium-stiff clay (Su={su:.0f}kPa) with unusual w%={wc:.1f}%")
1008
+ validation_result['recheck_image'] = True
1009
+
1010
+ # STEP 4: Empirical correlation bounds (Terzaghi-Peck relationships)
1011
+ expected_su_range = self._get_expected_su_range(wc)
1012
+ if su < expected_su_range['min'] * 0.2 or su > expected_su_range['max'] * 5:
1013
+ validation_result['needs_unit_check'] = True
1014
+ validation_result['recheck_image'] = True
1015
+ inconsistencies.append(f"Su-w% correlation severely off: Expected {expected_su_range['min']:.0f}-{expected_su_range['max']:.0f}kPa for w%={wc:.1f}%, got {su:.0f}kPa")
1016
+
1017
+ # STEP 5: Finalize results
1018
+ if inconsistencies:
1019
+ validation_result['valid'] = False
1020
+ validation_result['message'] = '; '.join(inconsistencies)
1021
+
1022
+ # Enhanced recommendations
1023
+ if validation_result['needs_unit_check']:
1024
+ validation_result['recommendations'].extend([
1025
+ "⚠️ CRITICAL: Check Su unit conversion carefully",
1026
+ "t/mΒ² β†’ kPa: multiply by 9.81",
1027
+ "ksc β†’ kPa: multiply by 98.0",
1028
+ "psi β†’ kPa: multiply by 6.895",
1029
+ "MPa β†’ kPa: multiply by 1000",
1030
+ "πŸ” Re-examine the original image/document"
1031
+ ])
1032
+
1033
+ if validation_result['recheck_image']:
1034
+ validation_result['recommendations'].extend([
1035
+ "πŸ“· RECHECK IMAGE: Values seem inconsistent",
1036
+ "πŸ”„ Consider reloading the image",
1037
+ "πŸ“‹ Verify both Su and water content readings"
1038
+ ])
1039
+ else:
1040
+ validation_result['message'] = f"Su-water content correlation acceptable (score: {correlation_score:.1f})"
1041
+
1042
+ except (ValueError, TypeError) as e:
1043
+ validation_result['valid'] = False
1044
+ validation_result['message'] = f"Could not validate Su-water content: {str(e)}"
1045
+ validation_result['recheck_image'] = True
1046
+
1047
+ return validation_result
1048
+
1049
+ def _check_su_unit_conversions(self, su: float, wc: float, description: str) -> Dict:
1050
+ """Check for specific unit conversion errors"""
1051
+ result = {
1052
+ 'needs_conversion': False,
1053
+ 'suggested_conversion': None,
1054
+ 'critical_unit_error': False,
1055
+ 'message': ''
1056
+ }
1057
+
1058
+ # Check for t/mΒ² that wasn't converted (very common error)
1059
+ if 2 <= su <= 10 and 15 <= wc <= 40:
1060
+ suggested_su = su * 9.81
1061
+ result.update({
1062
+ 'needs_conversion': True,
1063
+ 'suggested_conversion': f"{su} t/mΒ² β†’ {suggested_su:.1f} kPa (Γ—9.81)",
1064
+ 'critical_unit_error': True,
1065
+ 'message': f"CRITICAL: Su={su:.1f} appears to be in t/mΒ² (should be {suggested_su:.1f} kPa)"
1066
+ })
1067
+ return result
1068
+
1069
+ # Check for ksc that wasn't converted
1070
+ if 0.5 <= su <= 5 and 15 <= wc <= 50:
1071
+ suggested_su = su * 98.0
1072
+ result.update({
1073
+ 'needs_conversion': True,
1074
+ 'suggested_conversion': f"{su} ksc β†’ {suggested_su:.1f} kPa (Γ—98)",
1075
+ 'critical_unit_error': True,
1076
+ 'message': f"CRITICAL: Su={su:.1f} appears to be in ksc (should be {suggested_su:.1f} kPa)"
1077
+ })
1078
+ return result
1079
+
1080
+ # Check for psi that wasn't converted (high values)
1081
+ if 50 <= su <= 500 and 10 <= wc <= 35:
1082
+ suggested_su = su * 6.895
1083
+ result.update({
1084
+ 'needs_conversion': True,
1085
+ 'suggested_conversion': f"{su} psi β†’ {suggested_su:.1f} kPa (Γ—6.895)",
1086
+ 'critical_unit_error': True,
1087
+ 'message': f"CRITICAL: Su={su:.0f} appears to be in psi (should be {suggested_su:.1f} kPa)"
1088
+ })
1089
+ return result
1090
+
1091
+ # Check for MPa that wasn't converted (very low values)
1092
+ if 0.01 <= su <= 0.5 and 10 <= wc <= 40:
1093
+ suggested_su = su * 1000
1094
+ result.update({
1095
+ 'needs_conversion': True,
1096
+ 'suggested_conversion': f"{su} MPa β†’ {suggested_su:.1f} kPa (Γ—1000)",
1097
+ 'critical_unit_error': True,
1098
+ 'message': f"CRITICAL: Su={su:.2f} appears to be in MPa (should be {suggested_su:.1f} kPa)"
1099
+ })
1100
+ return result
1101
+
1102
+ return result
1103
+
1104
+ def _get_expected_su_range(self, water_content: float) -> Dict[str, float]:
1105
+ """Get expected Su range based on water content (empirical correlations)"""
1106
+ wc = water_content
1107
+
1108
+ # Conservative empirical relationships
1109
+ if wc >= 50:
1110
+ return {'min': 5, 'max': 20} # Very soft clay
1111
+ elif wc >= 40:
1112
+ return {'min': 10, 'max': 35} # Soft clay
1113
+ elif wc >= 30:
1114
+ return {'min': 20, 'max': 60} # Medium clay
1115
+ elif wc >= 20:
1116
+ return {'min': 40, 'max': 150} # Stiff clay
1117
+ elif wc >= 15:
1118
+ return {'min': 80, 'max': 250} # Very stiff clay
1119
+ else:
1120
+ return {'min': 150, 'max': 500} # Hard clay
1121
+
1122
+ def _calculate_correlation_score(self, su: float, wc: float) -> float:
1123
+ """Calculate correlation score (0-10, higher is better)"""
1124
+ # Simple scoring based on typical relationships
1125
+ expected_range = self._get_expected_su_range(wc)
1126
+
1127
+ if expected_range['min'] <= su <= expected_range['max']:
1128
+ return 10.0 # Perfect correlation
1129
+ elif expected_range['min'] * 0.5 <= su <= expected_range['max'] * 2:
1130
+ return 7.0 # Good correlation
1131
+ elif expected_range['min'] * 0.2 <= su <= expected_range['max'] * 5:
1132
+ return 4.0 # Acceptable correlation
1133
+ else:
1134
+ return 1.0 # Poor correlation
1135
+
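Together, `_get_expected_su_range` and `_calculate_correlation_score` grade how well Su matches the water content; a standalone sketch of the same banding and scoring:

```python
def expected_su(wc: float):
    # Same water-content bands as above (Su range in kPa)
    bands = [(50, (5, 20)), (40, (10, 35)), (30, (20, 60)),
             (20, (40, 150)), (15, (80, 250)), (0, (150, 500))]
    return next(rng for lo, rng in bands if wc >= lo)

def score(su: float, wc: float) -> float:
    lo, hi = expected_su(wc)
    if lo <= su <= hi:            return 10.0  # perfect correlation
    if lo * 0.5 <= su <= hi * 2:  return 7.0   # good
    if lo * 0.2 <= su <= hi * 5:  return 4.0   # acceptable
    return 1.0                                  # poor - likely a unit error

print(expected_su(35), score(45, 35))   # (20, 60) 10.0
print(expected_su(22), score(3.5, 22))  # (40, 150) 1.0 -> suspect unconverted t/m2
```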
+     def _add_engineering_parameters(self, layer: Dict) -> Dict:
+         """
+         Add additional engineering parameters based on soil properties
+         """
+         soil_type = layer.get('soil_type', '')
+
+         # Add typical engineering properties based on soil type and strength
+         if soil_type == 'clay':
+             su_value = layer.get('strength_value', 0)
+             if su_value > 0:
+                 # Estimate consistency based on Su
+                 if su_value < 25:
+                     layer['consistency'] = 'very soft'
+                 elif su_value < 50:
+                     layer['consistency'] = 'soft'
+                 elif su_value < 100:
+                     layer['consistency'] = 'medium'
+                 elif su_value < 200:
+                     layer['consistency'] = 'stiff'
+                 elif su_value < 400:
+                     layer['consistency'] = 'very stiff'
+                 else:
+                     layer['consistency'] = 'hard'
+
+                 # Estimate unit weight (kN/m³)
+                 layer['unit_weight'] = 16 + su_value / 50  # Empirical correlation
+                 layer['unit_weight_unit'] = 'kN/m³'
+
+         elif soil_type in ['sand', 'silt']:
+             # For sand/silt, use SPT-N or friction angle
+             if 'original_spt' in layer:
+                 spt_n = layer['original_spt']
+                 # Estimate relative density based on SPT-N
+                 if spt_n < 4:
+                     layer['consistency'] = 'very loose'
+                 elif spt_n < 10:
+                     layer['consistency'] = 'loose'
+                 elif spt_n < 30:
+                     layer['consistency'] = 'medium dense'
+                 elif spt_n < 50:
+                     layer['consistency'] = 'dense'
+                 else:
+                     layer['consistency'] = 'very dense'
+
+                 # Estimate unit weight (kN/m³)
+                 layer['unit_weight'] = 14 + spt_n / 5  # Empirical correlation
+                 layer['unit_weight_unit'] = 'kN/m³'
+
+         return layer
+
+     def _check_clay_consistency(self, layer: Dict) -> Dict:
+         """
+         Check consistency between water content and Su for clay soils
+         """
+         soil_type = layer.get('soil_type', '')
+         if soil_type != 'clay':
+             return layer
+
+         su_value = layer.get('strength_value')
+         water_content = self._extract_water_content(layer)
+
+         if su_value and water_content:
+             # Perform consistency check
+             consistency_result = self._validate_clay_water_content_su_relationship(
+                 water_content, su_value
+             )
+
+             layer['water_content'] = water_content
+             layer['water_content_unit'] = '%'
+             layer['clay_consistency_check'] = consistency_result
+
+             # Add consistency notes
+             if consistency_result['is_consistent']:
+                 layer['consistency_note'] = f"✅ Water content ({water_content}%) consistent with Su ({su_value} kPa)"
+             else:
+                 layer['consistency_note'] = f"⚠️ {consistency_result['warning']}"
+
+         return layer
+
+     def _extract_water_content(self, layer: Dict) -> Optional[float]:
+         """
+         Extract water content from layer data
+         """
+         # Check if water content is directly specified
+         if 'water_content' in layer:
+             return float(layer['water_content'])
+
+         # Look in description for water content values
+         description = layer.get('description', '')
+
+         patterns = [
+             r'w[:\s=]*(\d+(?:\.\d+)?)\s*%',
+             r'water\s*content[:\s]*(\d+(?:\.\d+)?)\s*%',
+             r'moisture\s*content[:\s]*(\d+(?:\.\d+)?)\s*%',
+             r'wc[:\s=]*(\d+(?:\.\d+)?)\s*%',
+             r'(\d+(?:\.\d+)?)\s*%\s*moisture',
+             r'(\d+(?:\.\d+)?)\s*%\s*water'
+         ]
+
+         for pattern in patterns:
+             match = re.search(pattern, description, re.IGNORECASE)
+             if match:
+                 return float(match.group(1))
+
+         return None
+
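A minimal demonstration of the first regex pattern above against a typical boring-log description (the description string is made up for illustration):

```python
import re

desc = "Soft gray CLAY, trace silt, w = 42.5 %, Su = 18 kPa"
m = re.search(r'w[:\s=]*(\d+(?:\.\d+)?)\s*%', desc, re.IGNORECASE)
print(float(m.group(1)))   # 42.5 - the captured water content in percent
```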
+     def _validate_clay_water_content_su_relationship(self, water_content: float, su_value: float) -> Dict:
+         """
+         Validate the relationship between water content and undrained shear strength for clay
+
+         Enhanced analysis for ST layer soil division based on water content and unconfined test results:
+         - Higher water content generally corresponds to lower Su
+         - Different clay types have different relationships
+         - Consider stress history and plasticity effects
+         """
+
+         # Enhanced empirical relationships for clay consistency with expanded ranges
+         consistency_ranges = {
+             'very_soft': {'w_range': (40, 150), 'su_range': (0, 25), 'description': 'High plasticity, organic clays'},
+             'soft': {'w_range': (25, 70), 'su_range': (25, 50), 'description': 'Normally consolidated clays'},
+             'medium': {'w_range': (18, 40), 'su_range': (50, 100), 'description': 'Lightly overconsolidated clays'},
+             'stiff': {'w_range': (12, 28), 'su_range': (100, 200), 'description': 'Overconsolidated clays'},
+             'very_stiff': {'w_range': (8, 20), 'su_range': (200, 400), 'description': 'Heavily overconsolidated clays'},
+             'hard': {'w_range': (5, 15), 'su_range': (400, 1000), 'description': 'Desiccated or cemented clays'}
+         }
+
+         # Determine expected consistency based on Su
+         su_consistency = None
+         for consistency, ranges in consistency_ranges.items():
+             if ranges['su_range'][0] <= su_value <= ranges['su_range'][1]:
+                 su_consistency = consistency
+                 break
+
+         # Determine expected consistency based on water content
+         w_consistency = None
+         for consistency, ranges in consistency_ranges.items():
+             if ranges['w_range'][0] <= water_content <= ranges['w_range'][1]:
+                 w_consistency = consistency
+                 break
+
+         # Check consistency
+         result = {
+             'water_content': water_content,
+             'su_value': su_value,
+             'w_consistency': w_consistency,
+             'su_consistency': su_consistency,
+             'is_consistent': False,
+             'warning': '',
+             'note': ''
+         }
+
+         if su_consistency and w_consistency:
+             if su_consistency == w_consistency:
+                 result['is_consistent'] = True
+                 result['note'] = f"Water content and Su both indicate {su_consistency.replace('_', ' ')} clay"
+             else:
+                 result['warning'] = f"Inconsistent: Water content suggests {w_consistency.replace('_', ' ')} clay, but Su suggests {su_consistency.replace('_', ' ')} clay"
+         elif su_consistency and not w_consistency:
+             if water_content > 60:
+                 result['warning'] = f"Very high water content ({water_content}%) for Su = {su_value} kPa. Check if clay is highly plastic or organic."
+             elif water_content < 10:
+                 result['warning'] = f"Very low water content ({water_content}%) for clay. Check if sample was dried or is highly over-consolidated."
+             else:
+                 result['note'] = f"Water content outside typical ranges but Su indicates {su_consistency.replace('_', ' ')} clay"
+         elif w_consistency and not su_consistency:
+             result['warning'] = f"Su value ({su_value} kPa) outside typical ranges for clay with {water_content}% water content"
+         else:
+             result['warning'] = f"Both water content ({water_content}%) and Su ({su_value} kPa) outside typical clay ranges"
+
+         # Enhanced empirical correlation checks for ST layer division
+         if water_content and su_value:
+             # Advanced correlation analysis for ST samples
+
+             # Check for high plasticity clay indicators
+             if water_content > 80:
+                 if su_value < 25:
+                     result['note'] = f"High plasticity clay indicated: w={water_content}%, Su={su_value} kPa. Possible CH or organic clay."
+                 elif su_value > 50:
+                     result['warning'] = f"Inconsistent: Very high water content ({water_content}%) with moderate/high Su ({su_value} kPa). Check sample integrity or clay type."
+
+             # Check for low plasticity clay indicators
+             elif water_content < 15:
+                 if su_value > 200:
+                     result['note'] = f"Low plasticity, overconsolidated clay: w={water_content}%, Su={su_value} kPa. Possible CL or aged clay."
+                 elif su_value < 100:
+                     result['warning'] = f"Low water content ({water_content}%) with low Su ({su_value} kPa). Unusual - check if sample was dried."
+
+             # Check stress history indicators
+             ocr_estimate = self._estimate_overconsolidation_ratio(water_content, su_value)
+             if ocr_estimate > 1.5:
+                 result['note'] = result.get('note', '') + f" Estimated OCR ≈ {ocr_estimate:.1f} (overconsolidated)"
+             elif ocr_estimate < 0.8:
+                 result['note'] = result.get('note', '') + f" Estimated OCR ≈ {ocr_estimate:.1f} (possibly underconsolidated)"
+
+             # Soil division recommendations for ST samples
+             result['st_division_recommendation'] = self._recommend_st_layer_division(water_content, su_value)
+
+         return result
+
+     def _estimate_overconsolidation_ratio(self, water_content: float, su_value: float) -> float:
+         """
+         Estimate overconsolidation ratio (OCR) from water content and Su
+         Based on empirical correlations for ST samples
+         """
+         # Simplified correlation: OCR ≈ (Su_measured / Su_normally_consolidated)
+         # For normally consolidated clays: Su ≈ 0.22 * σ'v
+         # Approximate σ'v from water content using typical correlations
+
+         if water_content > 50:
+             # High water content suggests normally consolidated or slightly overconsolidated
+             expected_su_nc = max(15, 100 - water_content)  # Simplified correlation
+         else:
+             # Lower water content suggests overconsolidation
+             expected_su_nc = max(50, 150 - 2 * water_content)
+
+         ocr_estimate = su_value / expected_su_nc if expected_su_nc > 0 else 1.0
+         return max(0.5, min(ocr_estimate, 10.0))  # Reasonable bounds
+
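Tracing the arithmetic above with sample numbers (values chosen only to illustrate the branches):

```python
# Worked example of the simplified OCR estimate:
wc, su = 35.0, 120.0                       # water content %, Su in kPa
expected_su_nc = max(50, 150 - 2 * wc)     # wc <= 50 branch: max(50, 80) = 80 kPa
ocr = su / expected_su_nc                  # 120 / 80 = 1.5
# 1.5 > 1.5 is False here, but anything above that threshold is flagged overconsolidated
```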
+     def _recommend_st_layer_division(self, water_content: float, su_value: float) -> Dict:
+         """
+         Recommend layer division strategy for ST samples based on water content and Su results
+         """
+         recommendation = {
+             'division_strategy': 'single_layer',
+             'reason': 'Uniform properties',
+             'subdivision_criteria': []
+         }
+
+         # Check for significant property variations that suggest subdivision
+         if water_content > 60 and su_value > 75:
+             recommendation['division_strategy'] = 'check_variation'
+             recommendation['reason'] = 'Conflicting water content and strength - check for property variations'
+             recommendation['subdivision_criteria'].append('Water content variation > 10%')
+             recommendation['subdivision_criteria'].append('Su variation > 30%')
+
+         elif water_content < 20 and su_value < 80:
+             recommendation['division_strategy'] = 'check_variation'
+             recommendation['reason'] = 'Both low water content and Su - check for soil type variations'
+             recommendation['subdivision_criteria'].append('Plasticity index variations')
+             recommendation['subdivision_criteria'].append('Sieve analysis variations')
+
+         elif abs(water_content - 30) > 20 or su_value > 300:
+             recommendation['division_strategy'] = 'subdivide_recommended'
+             recommendation['reason'] = 'Extreme properties suggest heterogeneous layer'
+             recommendation['subdivision_criteria'].append('Test at multiple depths')
+             recommendation['subdivision_criteria'].append('Check for interbedded materials')
+
+         return recommendation
+
+     def get_processing_summary(self, layers: List[Dict]) -> Dict[str, Any]:
+         """
+         Generate a summary of the soil layer processing
+         """
+         summary = {
+             'total_layers': len(layers),
+             'st_samples': 0,
+             'ss_samples': 0,
+             'clay_layers': 0,
+             'sand_layers': 0,
+             'su_calculated': 0,
+             'phi_calculated': 0,
+             'clay_consistency_checks': 0,
+             'consistent_clays': 0,
+             'inconsistent_clays': 0,
+             'unit_conversions': [],
+             'processing_notes': []
+         }
+
+         for layer in layers:
+             # Count sample types
+             sample_type = layer.get('sample_type', '')
+             if sample_type == 'ST':
+                 summary['st_samples'] += 1
+             elif sample_type == 'SS':
+                 summary['ss_samples'] += 1
+
+             # Count soil types
+             soil_type = layer.get('soil_type', '')
+             if soil_type == 'clay':
+                 summary['clay_layers'] += 1
+             elif soil_type in ['sand', 'silt']:
+                 summary['sand_layers'] += 1
+
+             # Count calculated parameters
+             if 'su_source' in layer and 'Calculated' in layer['su_source']:
+                 summary['su_calculated'] += 1
+             if 'phi_source' in layer and 'Calculated' in layer['phi_source']:
+                 summary['phi_calculated'] += 1
+
+             # Count clay consistency checks
+             if 'clay_consistency_check' in layer:
+                 summary['clay_consistency_checks'] += 1
+                 consistency_result = layer['clay_consistency_check']
+                 if consistency_result.get('is_consistent', False):
+                     summary['consistent_clays'] += 1
+                 else:
+                     summary['inconsistent_clays'] += 1
+
+         return summary
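A hypothetical usage sketch (the `processor` and `layers` names are assumptions; `layers` would come from the extraction step upstream):

```python
# Sketch: summarizing processed layers and printing the headline counts.
summary = processor.get_processing_summary(layers)
print(f"{summary['total_layers']} layers: "
      f"{summary['clay_layers']} clay, {summary['sand_layers']} sand/silt, "
      f"{summary['st_samples']} ST / {summary['ss_samples']} SS samples")
```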
soil_visualizer.py ADDED
@@ -0,0 +1,285 @@
+ import matplotlib.pyplot as plt
+ import plotly.graph_objects as go
+ import plotly.express as px
+ import pandas as pd
+ import numpy as np
+ import streamlit as st
+ from config import SOIL_TYPES, STRENGTH_PARAMETERS
+
+ class SoilProfileVisualizer:
+     def __init__(self):
+         self.soil_colors = {
+             "soft clay": "#8B4513",
+             "medium clay": "#A0522D",
+             "stiff clay": "#D2691E",
+             "very stiff clay": "#CD853F",
+             "hard clay": "#DEB887",
+             "loose sand": "#F4A460",
+             "medium dense sand": "#DAA520",
+             "dense sand": "#B8860B",
+             "very dense sand": "#CD853F",
+             "soft silt": "#DDA0DD",
+             "medium silt": "#BA55D3",
+             "stiff silt": "#9370DB",
+             "loose gravel": "#696969",
+             "dense gravel": "#2F4F4F",
+             "weathered rock": "#708090",
+             "soft rock": "#2F4F4F",
+             "hard rock": "#36454F"
+         }
+
+     def create_soil_profile_plot(self, soil_data):
+         """Create interactive soil profile visualization"""
+         if not soil_data or "soil_layers" not in soil_data:
+             return None
+
+         layers = soil_data["soil_layers"]
+
+         fig = go.Figure()
+
+         # Add soil layers
+         for i, layer in enumerate(layers):
+             depth_from = layer.get("depth_from", 0)
+             depth_to = layer.get("depth_to", 0)
+             soil_type = layer.get("soil_type", "unknown")
+             description = layer.get("description", "")
+             strength_value = layer.get("strength_value", "N/A")
+             strength_param = layer.get("strength_parameter", "")
+
+             # Get color
+             color = self.soil_colors.get(soil_type.lower(), "#CCCCCC")
+
+             # Create layer rectangle
+             fig.add_shape(
+                 type="rect",
+                 x0=0, x1=1,
+                 y0=-depth_to, y1=-depth_from,
+                 fillcolor=color,
+                 line=dict(color="black", width=1),
+                 opacity=0.8
+             )
+
+             # Add layer text with enhanced parameters
+             mid_depth = -(depth_from + depth_to) / 2
+
+             # Build text with available parameters
+             text_lines = [f"{layer.get('consistency', '')} {soil_type}".strip()]
+
+             # Add strength parameters
+             if strength_param and strength_value is not None:
+                 text_lines.append(f"{strength_param}: {strength_value}")
+
+             # Add calculated Su if available
+             if layer.get("calculated_su"):
+                 text_lines.append(f"Su: {layer['calculated_su']:.0f} kPa*")
+
+             # Add friction angle if available
+             if layer.get("friction_angle"):
+                 text_lines.append(f"φ: {layer['friction_angle']:.1f}°*")
+
+             fig.add_annotation(
+                 x=0.5, y=mid_depth,
+                 text="<br>".join(text_lines),
+                 showarrow=False,
+                 font=dict(size=9, color="white"),
+                 bgcolor="rgba(0,0,0,0.6)",
+                 bordercolor="white",
+                 borderwidth=1
+             )
+
+         # Add depth markers
+         max_depth = max([layer.get("depth_to", 0) for layer in layers])
+         depth_ticks = list(range(0, int(max_depth) + 5, 5))
+
+         fig.update_layout(
+             title="Soil Profile",
+             xaxis=dict(
+                 range=[0, 1],
+                 showticklabels=False,
+                 showgrid=False,
+                 zeroline=False
+             ),
+             yaxis=dict(
+                 title="Depth (m)",
+                 range=[-max_depth - 2, 2],
+                 tickvals=[-d for d in depth_ticks],
+                 ticktext=[str(d) for d in depth_ticks],
+                 showgrid=True,
+                 gridcolor="lightgray"
+             ),
+             width=400,
+             height=600,
+             margin=dict(l=50, r=50, t=50, b=50)
+         )
+
+         # Add water table if present
+         if "water_table" in soil_data and soil_data["water_table"].get("depth"):
+             wt_depth = soil_data["water_table"]["depth"]
+             fig.add_hline(
+                 y=-wt_depth,
+                 line_dash="dash",
+                 line_color="blue",
+                 annotation_text="Water Table",
+                 annotation_position="right"
+             )
+
+         return fig
+
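A minimal usage sketch for rendering the figure in a Streamlit page (the `soil_data` variable is an assumption; it would come from the analysis workflow):

```python
import streamlit as st

viz = SoilProfileVisualizer()
fig = viz.create_soil_profile_plot(soil_data)
if fig is not None:                                # None means no "soil_layers" key
    st.plotly_chart(fig, use_container_width=True)
```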
+     def create_strength_profile_plot(self, soil_data):
+         """Create strength parameter vs depth plot"""
+         if not soil_data or "soil_layers" not in soil_data:
+             return None
+
+         layers = soil_data["soil_layers"]
+
+         depths = []
+         strengths = []
+         soil_types = []
+
+         for layer in layers:
+             depth_from = layer.get("depth_from", 0)
+             depth_to = layer.get("depth_to", 0)
+             strength_value = layer.get("strength_value")
+             soil_type = layer.get("soil_type", "")
+
+             if strength_value is not None:
+                 mid_depth = (depth_from + depth_to) / 2
+                 depths.append(mid_depth)
+                 strengths.append(strength_value)
+                 soil_types.append(soil_type)
+
+         if not depths:
+             return None
+
+         fig = go.Figure()
+
+         # Group by parameter type
+         clay_depths = []
+         clay_strengths = []
+         sand_depths = []
+         sand_strengths = []
+
+         for i, soil_type in enumerate(soil_types):
+             if "clay" in soil_type.lower():
+                 clay_depths.append(depths[i])
+                 clay_strengths.append(strengths[i])
+             else:
+                 sand_depths.append(depths[i])
+                 sand_strengths.append(strengths[i])
+
+         # Add traces
+         if clay_depths:
+             # Create custom hover text for Su values
+             clay_hover_text = [f"Depth: {d:.1f}m<br>Su: {s:.1f} kPa" for d, s in zip(clay_depths, clay_strengths)]
+
+             fig.add_trace(go.Scatter(
+                 x=clay_strengths,
+                 y=clay_depths,
+                 mode='markers+lines',
+                 name='Su (kPa)',
+                 marker=dict(color='brown', size=8),
+                 line=dict(color='brown'),
+                 hovertemplate='%{customdata}<extra></extra>',
+                 customdata=clay_hover_text
+             ))
+
+         if sand_depths:
+             # Create custom hover text for SPT-N values
+             sand_hover_text = [f"Depth: {d:.1f}m<br>SPT-N: {s:.0f} blows/30cm" for d, s in zip(sand_depths, sand_strengths)]
+
+             fig.add_trace(go.Scatter(
+                 x=sand_strengths,
+                 y=sand_depths,
+                 mode='markers+lines',
+                 name='SPT-N (blows/30cm)',
+                 marker=dict(color='gold', size=8),
+                 line=dict(color='gold'),
+                 hovertemplate='%{customdata}<extra></extra>',
+                 customdata=sand_hover_text
+             ))
+
+         # Determine primary axis title based on data
+         if clay_depths and sand_depths:
+             xaxis_title = "Strength Value (Su in kPa / SPT-N)"
+         elif clay_depths:
+             xaxis_title = "Undrained Shear Strength, Su (kPa)"
+         elif sand_depths:
+             xaxis_title = "SPT-N Value (blows/30cm)"
+         else:
+             xaxis_title = "Strength Value"
+
+         fig.update_layout(
+             title="Strength Parameters vs Depth",
+             xaxis_title=xaxis_title,
+             yaxis_title="Depth (m)",
+             yaxis=dict(autorange='reversed'),
+             width=500,
+             height=600,
+             showlegend=True,
+             legend=dict(
+                 yanchor="top",
+                 y=0.99,
+                 xanchor="left",
+                 x=0.01
+             )
+         )
+
+         return fig
+
+     def create_layer_summary_table(self, soil_data):
+         """Create summary table of soil layers"""
+         if not soil_data or "soil_layers" not in soil_data:
+             return None
+
+         layers = soil_data["soil_layers"]
+
+         df_data = []
+         for layer in layers:
+             # Build strength info with units
+             strength_info = ""
+             if layer.get("strength_parameter") and layer.get("strength_value") is not None:
+                 param = layer['strength_parameter']
+                 value = layer['strength_value']
+
+                 # Add units based on parameter type
+                 if param == "Su":
+                     strength_info = f"Su: {value:.1f} kPa"
+                 elif param == "SPT-N":
+                     strength_info = f"SPT-N: {value:.0f} blows/30cm"
+                 else:
+                     strength_info = f"{param}: {value}"
+
+             # Add calculated parameters
+             calc_params = []
+             if layer.get("calculated_su"):
+                 calc_params.append(f"Su: {layer['calculated_su']:.0f} kPa (calc)")
+             if layer.get("friction_angle"):
+                 calc_params.append(f"φ: {layer['friction_angle']:.1f}° (calc)")
+
+             if calc_params:
+                 strength_info += f" | {' | '.join(calc_params)}"
+
+             df_data.append({
+                 "Layer": layer.get("layer_id", ""),
+                 "Depth From (m)": layer.get("depth_from", ""),
+                 "Depth To (m)": layer.get("depth_to", ""),
+                 "Soil Type": f"{layer.get('consistency', '')} {layer.get('soil_type', '')}".strip(),
+                 "Description": layer.get("description", ""),
+                 "Strength Parameters": strength_info,
+                 "Color": layer.get("color", ""),
+                 "Moisture": layer.get("moisture", ""),
+                 "Notes": layer.get("su_source", "") or layer.get("friction_angle_source", "") or ""
+             })
+
+         return pd.DataFrame(df_data)
+
+     def export_profile_data(self, soil_data, format="csv"):
+         """Export soil profile data"""
+         df = self.create_layer_summary_table(soil_data)
+         # Guard against empty input: create_layer_summary_table returns None
+         # when soil_data has no "soil_layers" key
+         if df is None:
+             return None
+
+         if format == "csv":
+             return df.to_csv(index=False)
+         elif format == "json":
+             return df.to_json(orient="records", indent=2)
+         else:
+             return df.to_string(index=False)
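A hypothetical export-and-download sketch wiring this into Streamlit (the button label and file name are assumptions):

```python
# Sketch: export the summary table as CSV and offer it for download.
csv_text = viz.export_profile_data(soil_data, format="csv")
if csv_text is not None:
    st.download_button("Download CSV", csv_text, file_name="soil_profile.csv")
```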
unified_soil_workflow.py ADDED
@@ -0,0 +1,1287 @@
+ """
+ Unified Soil Analysis Workflow using LangGraph
+ Combines LLM classification and SS/ST processing into a single controlled workflow
+ """
+
+ import json
+ import re
+ from typing import Dict, List, Any, Optional, TypedDict, Annotated
+ import streamlit as st
+ from langgraph.graph import StateGraph, START, END
+ from langgraph.graph.message import add_messages
+ from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
+ import openai
+ from soil_classification import SoilClassificationProcessor
+ from soil_calculations import SoilCalculations
+ from config import LLM_PROVIDERS, AVAILABLE_MODELS, get_default_provider_and_model, get_api_key
+
+
+ class SoilAnalysisState(TypedDict):
+     """State for the unified soil analysis workflow"""
+     # Input data
+     text_content: Optional[str]
+     image_base64: Optional[str]
+     model: str
+     api_key: str
+
+     # Processing flags
+     merge_similar: bool
+     split_thick: bool
+
+     # LLM Analysis results
+     raw_llm_response: Optional[str]
+     llm_extraction_success: bool
+     extraction_errors: List[str]
+     retry_count: int  # Retry counter for the extraction step
+
+     # Soil data (from LLM)
+     project_info: Dict[str, Any]
+     raw_soil_layers: List[Dict[str, Any]]
+     water_table: Dict[str, Any]
+     notes: str
+
+     # Processing results
+     processed_layers: List[Dict[str, Any]]
+     processing_summary: Dict[str, Any]
+     validation_stats: Dict[str, Any]
+     optimization_results: Dict[str, Any]
+
+     # Final output
+     final_soil_data: Dict[str, Any]
+     workflow_status: str
+     workflow_messages: Annotated[List[BaseMessage], add_messages]
+
+
+ class UnifiedSoilWorkflow:
+     """
+     Unified LangGraph workflow for soil analysis
+     Combines LLM extraction and SS/ST processing into one controlled flow
+     """
+
+     def __init__(self):
+         self.soil_processor = SoilClassificationProcessor()
+         self.soil_calculator = SoilCalculations()
+         self.workflow = self._build_workflow()
+
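A sketch of an initial state for this workflow. The key set is taken from `SoilAnalysisState` above; the model name and the `get_api_key` call pattern are assumptions (only the import of `get_api_key` is visible in this diff):

```python
# Hypothetical initial state passed into the compiled graph.
initial_state = {
    "text_content": boring_log_text,        # extracted PDF/OCR text, or None
    "image_base64": None,                   # or a base64-encoded PNG of the log
    "model": "anthropic/claude-sonnet-4",   # model id mentioned elsewhere in this file
    "api_key": get_api_key("openrouter"),   # signature assumed
    "merge_similar": True,
    "split_thick": True,
    "retry_count": 0,
    "workflow_messages": [],
}
```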
+     def _get_provider_from_model(self, model: str) -> str:
+         """Determine provider from model name"""
+         for model_id, model_info in AVAILABLE_MODELS.items():
+             if model_id == model:
+                 # Return the first provider that supports this model
+                 providers = model_info.get("providers", [])
+                 if providers:
+                     return providers[0]
+
+         # Default fallback logic based on model prefix
+         if model.startswith("anthropic/"):
+             return "anthropic"
+         elif model.startswith("google/"):
+             return "google"
+         else:
+             return "openrouter"  # Default to OpenRouter for other models
+
+     def _build_workflow(self) -> StateGraph:
+         """Build the unified LangGraph workflow"""
+
+         # Create workflow graph
+         workflow = StateGraph(SoilAnalysisState)
+
+         # Add nodes
+         workflow.add_node("validate_inputs", self._validate_inputs)
+         workflow.add_node("extract_with_llm", self._extract_with_llm)
+         workflow.add_node("validate_extraction", self._validate_extraction)
+         workflow.add_node("process_ss_st_classification", self._process_ss_st_classification)
+         workflow.add_node("apply_unit_conversions", self._apply_unit_conversions)
+         workflow.add_node("validate_soil_classification", self._validate_soil_classification)
+         workflow.add_node("calculate_parameters", self._calculate_parameters)
+         workflow.add_node("optimize_layers", self._optimize_layers)
+         workflow.add_node("finalize_results", self._finalize_results)
+         workflow.add_node("handle_errors", self._handle_errors)
+
+         # Define workflow edges
+         workflow.add_edge(START, "validate_inputs")
+
+         # Conditional routing based on validation
+         workflow.add_conditional_edges(
+             "validate_inputs",
+             self._should_continue_after_validation,
+             {
+                 "continue": "extract_with_llm",
+                 "error": "handle_errors"
+             }
+         )
+
+         workflow.add_edge("extract_with_llm", "validate_extraction")
+
+         # Simplified routing - no retry loop to prevent recursion
+         workflow.add_conditional_edges(
+             "validate_extraction",
+             self._should_continue_after_extraction,
+             {
+                 "continue": "process_ss_st_classification",
+                 "error": "handle_errors"
+             }
+         )
+
+         workflow.add_edge("process_ss_st_classification", "apply_unit_conversions")
+         workflow.add_edge("apply_unit_conversions", "validate_soil_classification")
+         workflow.add_edge("validate_soil_classification", "calculate_parameters")
+         workflow.add_edge("calculate_parameters", "optimize_layers")
+         workflow.add_edge("optimize_layers", "finalize_results")
+         workflow.add_edge("finalize_results", END)
+         workflow.add_edge("handle_errors", END)
+
+         return workflow.compile()
+
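A minimal sketch of running the compiled graph (LangGraph's compiled graphs expose `invoke()`; the `initial_state` dict is the one sketched above):

```python
workflow = UnifiedSoilWorkflow()
final_state = workflow.workflow.invoke(initial_state)   # runs all 9 steps in order
soil_data = final_state.get("final_soil_data", {})
print(final_state.get("workflow_status"))               # "completed" or "failed"
```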
+     def _validate_inputs(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Validate input data and configuration"""
+         st.info("🔍 Step 1: Validating inputs...")
+
+         errors = []
+
+         # Validate API key
+         if not state.get("api_key"):
+             errors.append("No API key provided")
+
+         # Validate content
+         if not state.get("text_content") and not state.get("image_base64"):
+             errors.append("No text or image content provided")
+
+         # Validate model (allow custom models not in AVAILABLE_MODELS)
+         _, default_model = get_default_provider_and_model()
+         model = state.get("model", default_model)
+         if not model or not isinstance(model, str):
+             errors.append(f"Invalid model format: {model}")
+         elif model not in AVAILABLE_MODELS:
+             # Allow custom models - just log info
+             st.info(f"📋 Using custom model: {model} (not in pre-configured list)")
+
+         if errors:
+             state["extraction_errors"] = errors
+             state["workflow_status"] = "validation_failed"
+             state["workflow_messages"] = [HumanMessage(content=f"Validation errors: {', '.join(errors)}")]
+         else:
+             state["workflow_status"] = "validated"
+             state["workflow_messages"] = [HumanMessage(content="Input validation passed")]
+             st.success("✅ Input validation passed")
+
+         return state
+
+     def _extract_with_llm(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Extract soil data using LLM with enhanced prompts"""
+         retry_count = state.get("retry_count", 0)
+         st.info(f"🤖 Step 2: Extracting soil data with LLM... (attempt {retry_count + 1})")
+
+         try:
+             # Determine provider and base URL from model
+             provider_id = self._get_provider_from_model(state["model"])
+             base_url = LLM_PROVIDERS[provider_id]["base_url"]
+
+             # Initialize OpenAI client with correct provider
+             client = openai.OpenAI(
+                 base_url=base_url,
+                 api_key=state["api_key"]
+             )
+
+             # Enhanced system prompt with all requirements - use safer version for Gemini
+             if "gemini" in state["model"].lower():
+                 system_prompt = self._get_gemini_safe_prompt()
+                 st.info("🔧 Using Gemini-optimized prompt to avoid content filtering")
+             else:
+                 system_prompt = self._get_unified_system_prompt()
+
+             # Build messages
+             messages = [{"role": "system", "content": system_prompt}]
+
+             # Add content
+             if state.get("text_content"):
+                 messages.append({
+                     "role": "user",
+                     "content": f"Please analyze this soil boring log text:\n\n{state['text_content']}"
+                 })
+
+             # Add image if supported and available
+             model_info = AVAILABLE_MODELS.get(state["model"], {})
+             if state["model"] not in AVAILABLE_MODELS:
+                 # For custom models, assume image support (user responsibility)
+                 supports_images = True
+             else:
+                 supports_images = model_info.get('supports_images', False)
+
+             if state.get("image_base64") and supports_images:
+                 messages.append({
+                     "role": "user",
+                     "content": [
+                         {"type": "text", "text": "Please analyze this soil boring log image:"},
+                         {
+                             "type": "image_url",
+                             "image_url": {"url": f"data:image/png;base64,{state['image_base64']}"}
+                         }
+                     ]
+                 })
+
+             # Call LLM with detailed error handling
+             st.info(f"🔗 Making API call to {state['model']}...")
+             st.info(f"📝 Message count: {len(messages)}, Max tokens: 3000")
+
+             try:
+                 response = client.chat.completions.create(
+                     model=state["model"],
+                     messages=messages,
+                     max_tokens=3000,
+                     temperature=0.1
+                 )
+
+                 # Debug response structure
+                 st.info(f"🔍 Response received - Choices count: {len(response.choices) if response and response.choices else 0}")
+
+                 # Check if response is valid
+                 if not response or not response.choices:
+                     raise Exception("No response received from LLM API")
+
+                 raw_response = response.choices[0].message.content
+
+                 # Debug response content
+                 if raw_response is None:
+                     raise Exception("Response content is None")
+                 elif not raw_response.strip():
+                     # Check if it's just whitespace/newlines
+                     if len(raw_response) > 0:
+                         whitespace_chars = [repr(c) for c in raw_response[:10]]
+                         raise Exception(f"Response contains only whitespace (length: {len(raw_response)}, chars: {whitespace_chars})")
+                     else:
+                         raise Exception("Completely empty response from LLM API")
+
+                 # Check for very short responses that might indicate filtering
+                 elif len(raw_response.strip()) < 10:
+                     st.warning(f"⚠️ Very short response ({len(raw_response)} chars): '{raw_response[:50]}'")
+                     st.info("💡 This might indicate content filtering. Try a simpler prompt or different model.")
+
+                 state["raw_llm_response"] = raw_response
+                 st.success(f"📥 Received response: {len(raw_response)} characters")
+
+             except Exception as api_error:
+                 # Enhanced API error handling
+                 error_msg = str(api_error)
+                 st.error(f"❌ API call failed: {error_msg}")
+
+                 # Check if it's a model-specific issue
+                 if "not a valid model ID" in error_msg:
+                     st.error(f"🚫 Model '{state['model']}' is not available on OpenRouter")
+                     st.info("💡 Try using a different model like 'anthropic/claude-sonnet-4'")
+                 elif "rate limit" in error_msg.lower():
+                     st.error("⏰ Rate limit exceeded. Please wait and try again.")
+                 elif "empty" in error_msg.lower() or "none" in error_msg.lower():
+                     st.error("📭 Model returned empty response. This might be due to:")
+                     st.info(" • Content filtering by the model")
+                     st.info(" • Model configuration issues")
+                     st.info(" • Input content triggering safety filters")
+                     st.info("💡 Try a different model or simpler input text")
+
+                 raise api_error
+
+             # Parse JSON response with enhanced error handling
+             soil_data = self._parse_llm_response(raw_response)
+
+             if "error" in soil_data:
+                 state["llm_extraction_success"] = False
+                 state["extraction_errors"] = [soil_data["error"]]
+                 state["workflow_status"] = "extraction_failed"
+                 st.error(f"❌ JSON parsing failed: {soil_data['error']}")
+             else:
+                 # Validate that we have basic required data
+                 layers = soil_data.get("soil_layers", [])
+                 if not layers:
+                     state["llm_extraction_success"] = False
+                     state["extraction_errors"] = ["No soil layers found in LLM response"]
+                     state["workflow_status"] = "extraction_failed"
+                     st.error("❌ No soil layers found in LLM response")
+                 else:
+                     state["llm_extraction_success"] = True
+                     state["project_info"] = soil_data.get("project_info", {})
+                     state["raw_soil_layers"] = layers
+                     state["water_table"] = soil_data.get("water_table", {})
+                     state["notes"] = soil_data.get("notes", "")
+                     state["workflow_status"] = "extracted"
+
+                     st.success(f"✅ LLM extraction completed: {len(layers)} layers found")
+
+         except Exception as e:
+             state["llm_extraction_success"] = False
+             state["extraction_errors"] = [str(e)]
+             state["workflow_status"] = "extraction_error"
+             st.error(f"❌ LLM extraction failed: {str(e)}")
+
+         state["workflow_messages"] = state.get("workflow_messages", []) + [
+             AIMessage(content=f"LLM extraction: {'success' if state['llm_extraction_success'] else 'failed'}")
+         ]
+
+         return state
+
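The method above calls `self._parse_llm_response()`, whose body does not appear in this part of the diff. Purely as a sketch of what such a step might do (an assumption, not the actual implementation), it would typically strip markdown fences and isolate the outermost JSON object:

```python
import json, re

def parse_llm_response(raw: str) -> dict:
    # Remove markdown code fences, then locate the outermost JSON object.
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        return {"error": "No JSON object found in LLM response"}
    try:
        return json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError as e:
        return {"error": f"JSON parsing failed: {e}"}
```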
+     def _validate_extraction(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Validate LLM extraction results"""
+         st.info("🔍 Step 3: Validating extraction results...")
+
+         if not state["llm_extraction_success"]:
+             return state
+
+         validation_errors = []
+
+         # Check for required data
+         if not state["raw_soil_layers"]:
+             validation_errors.append("No soil layers extracted")
+
+         # Validate layer structure
+         for i, layer in enumerate(state["raw_soil_layers"]):
+             if "depth_from" not in layer or "depth_to" not in layer:
+                 validation_errors.append(f"Layer {i+1}: Missing depth information")
+             if "soil_type" not in layer:
+                 validation_errors.append(f"Layer {i+1}: Missing soil type")
+
+         if validation_errors:
+             state["extraction_errors"] = validation_errors
+             state["workflow_status"] = "extraction_failed"  # Use consistent status name
+             st.warning(f"⚠️ Validation issues found: {len(validation_errors)} errors")
+         else:
+             state["workflow_status"] = "extraction_validated"
+             st.success("✅ Extraction validation passed")
+
+         return state
+
+     def _process_ss_st_classification(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Process SS/ST sample classification"""
+         st.info("🧪 Step 4: Processing SS/ST sample classification...")
+
+         try:
+             processed_layers = self.soil_processor.process_soil_layers(state["raw_soil_layers"])
+             state["processed_layers"] = processed_layers
+             state["workflow_status"] = "ss_st_processed"
+
+             st.success(f"✅ SS/ST processing completed: {len(processed_layers)} layers processed")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"SS/ST processing error: {str(e)}"]
+             state["workflow_status"] = "ss_st_error"
+             st.error(f"❌ SS/ST processing failed: {str(e)}")
+
+         return state
+
+     def _apply_unit_conversions(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Apply unit conversions to all measurements"""
+         st.info("🔧 Step 5: Applying unit conversions...")
+
+         try:
+             converted_layers = [
+                 self.soil_processor._convert_to_si_units(layer)
+                 for layer in state["processed_layers"]
+             ]
+
+             state["processed_layers"] = converted_layers
+             state["workflow_status"] = "units_converted"
+
+             # Track different types of validation issues
+             unit_errors = []
+             recheck_needed = []
+             critical_errors = []
+
+             for layer in converted_layers:
+                 validation_warning = layer.get('unit_validation_warning', '')
+                 if validation_warning:
+                     layer_id = layer.get('layer_id', '?')
+
+                     # Check if this layer needs image recheck
+                     if hasattr(self.soil_processor, '_validate_su_with_water_content'):
+                         detailed_validation = self.soil_processor._validate_su_with_water_content(layer)
+
+                         if detailed_validation.get('critical_unit_error'):
+                             critical_errors.append(f"Layer {layer_id}: {detailed_validation.get('suggested_conversion', 'Unit error')}")
+
+                         if detailed_validation.get('recheck_image'):
+                             recheck_needed.append(f"Layer {layer_id}: {validation_warning}")
+                         else:
+                             unit_errors.append(f"Layer {layer_id}: {validation_warning}")
+                     else:
+                         # Processor lacks detailed validation - still surface the warning
+                         unit_errors.append(f"Layer {layer_id}: {validation_warning}")
+
+             # Display different types of issues with appropriate severity
+             if critical_errors:
+                 st.error("🚨 CRITICAL UNIT CONVERSION ERRORS DETECTED:")
+                 for error in critical_errors:
+                     st.error(f" • {error}")
+                 st.error("⚠️ These values appear to be in wrong units - conversion may be needed!")
+
+             if recheck_needed:
+                 st.warning("📷 IMAGE RECHECK RECOMMENDED:")
+                 for recheck in recheck_needed:
+                     st.warning(f" • {recheck}")
+                 st.info("💡 Su-water content values seem inconsistent - consider reloading the image")
+
+             if unit_errors:
+                 st.warning("⚠️ Su-water content validation issues:")
+                 for error in unit_errors:
+                     st.info(f" • {error}")
+
+             # Store all warnings for later reference
+             all_warnings = critical_errors + recheck_needed + unit_errors
+             if all_warnings:
+                 state["unit_validation_warnings"] = all_warnings
+                 state["needs_image_recheck"] = len(recheck_needed) > 0
+                 state["has_critical_unit_errors"] = len(critical_errors) > 0
+
+                 # Add to final results for user action
+                 state["validation_recommendations"] = {
+                     "critical_unit_errors": critical_errors,
+                     "recheck_image": recheck_needed,
+                     "general_warnings": unit_errors
+                 }
+             else:
+                 st.success("✅ Unit conversions applied - all Su-water content correlations look reasonable")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"Unit conversion error: {str(e)}"]
+             state["workflow_status"] = "conversion_error"
+             st.error(f"❌ Unit conversion failed: {str(e)}")
+
+         return state
+
+     def _validate_soil_classification(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Validate soil classification with sieve analysis requirements"""
+         st.info("🎯 Step 6: Validating soil classification...")
+
+         try:
+             validated_layers = []
+             classification_warnings = []
+
+             for layer in state["processed_layers"]:
+                 # Apply enhanced soil classification validation
+                 validated_layer = layer.copy()
+
+                 # Re-classify with strict sieve analysis requirements
+                 soil_type = self.soil_processor._classify_soil_type(validated_layer)
+                 validated_layer["soil_type"] = soil_type
+
+                 # Track classification changes
+                 if layer.get("soil_type") != soil_type:
+                     classification_warnings.append(
+                         f"Layer {layer.get('layer_id', '?')}: Changed from '{layer.get('soil_type')}' to '{soil_type}'"
+                     )
+
+                 validated_layers.append(validated_layer)
+
+             state["processed_layers"] = validated_layers
+             state["workflow_status"] = "classification_validated"
+
+             if classification_warnings:
+                 st.warning(f"⚠️ Classification changes: {len(classification_warnings)} layers updated")
+                 for warning in classification_warnings:
+                     st.info(f" • {warning}")
+             else:
+                 st.success("✅ Soil classification validation passed")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"Classification validation error: {str(e)}"]
+             state["workflow_status"] = "classification_error"
+             st.error(f"❌ Classification validation failed: {str(e)}")
+
+         return state
+
+     def _calculate_parameters(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Calculate engineering parameters (Su, φ, etc.)"""
+         st.info("📊 Step 7: Calculating engineering parameters...")
+
+         try:
+             enhanced_layers = self.soil_calculator.enhance_soil_layers(state["processed_layers"])
+
+             # Enhanced post-processing for multiple Su values
+             enhanced_layers = self._process_multiple_su_values(enhanced_layers)
+
+             state["processed_layers"] = enhanced_layers
+             state["workflow_status"] = "parameters_calculated"
+
+             st.success("✅ Engineering parameters calculated")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"Parameter calculation error: {str(e)}"]
+             state["workflow_status"] = "calculation_error"
+             st.error(f"❌ Parameter calculation failed: {str(e)}")
+
+         return state
+
+     def _process_multiple_su_values(self, layers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+         """Process layers that may have multiple Su values and decide on subdivision"""
+         enhanced_layers = []
+
+         for layer in layers:
+             # Check if layer description mentions multiple Su values
+             description = layer.get('description', '').lower()
+
+             # Look for patterns indicating multiple Su values (re is imported at module level)
+
+             # Pattern to find multiple Su values in description
+             su_pattern = r'su[=\s]*(\d+(?:\.\d+)?)\s*(?:kpa|kPa|t/m²|ksc|psi)'
+             su_values = re.findall(su_pattern, description)
+
+             # Pattern to find Su ranges
+             range_pattern = r'su\s*(?:ranges?|from)\s*(\d+(?:\.\d+)?)\s*(?:-|to)\s*(\d+(?:\.\d+)?)\s*(?:kpa|kPa)'
+             range_match = re.search(range_pattern, description)
+
+             # Pattern to find averaged Su values
+             avg_pattern = r'su\s*(?:averaged|average|mean)\s*(?:from)?\s*(?:\d+\s*measurements?)?\s*[:\s]*(\d+(?:\.\d+)?)'
+             avg_match = re.search(avg_pattern, description)
+
+             if len(su_values) > 1:
+                 # Multiple Su values found - decide on subdivision or averaging
+                 su_nums = [float(val) for val in su_values]
+
+                 # Check variation
+                 min_su = min(su_nums)
+                 max_su = max(su_nums)
+                 avg_su = sum(su_nums) / len(su_nums)
+                 variation = (max_su - min_su) / avg_su if avg_su > 0 else 0
+
+                 if variation > 0.5 or max_su / min_su > 2.0:
+                     # High variation - suggest layer subdivision
+                     layer['subdivision_suggested'] = True
+                     layer['su_variation_high'] = True
+                     layer['su_values_found'] = su_nums
+                     layer['su_variation_ratio'] = max_su / min_su if min_su > 0 else 0
+                     layer['subdivision_reason'] = f"High Su variation: {min_su:.1f}-{max_su:.1f} kPa (ratio: {max_su/min_su:.1f}x)"
+
+                     # Update description to highlight the issue
+                     layer['description'] += f" [SUBDIVISION RECOMMENDED: Su varies {min_su:.1f}-{max_su:.1f} kPa]"
+
+                     st.warning(f"🔄 Layer {layer.get('layer_id', '?')}: High Su variation detected - subdivision recommended")
+
+                 else:
+                     # Low variation - use average
+                     layer['su_averaged'] = True
+                     layer['su_values_found'] = su_nums
+                     layer['su_average_used'] = avg_su
+                     layer['strength_value'] = avg_su
+                     layer['description'] += f" [Su averaged from {len(su_nums)} values: {', '.join([f'{v:.1f}' for v in su_nums])} kPa → {avg_su:.1f} kPa]"
+
+                     st.info(f"📊 Layer {layer.get('layer_id', '?')}: Averaged {len(su_nums)} Su values: {avg_su:.1f} kPa")
+
+             elif range_match:
+                 # Su range found
+                 min_su = float(range_match.group(1))
+                 max_su = float(range_match.group(2))
+                 avg_su = (min_su + max_su) / 2
+
+                 layer['su_range_found'] = True
+                 layer['su_range'] = [min_su, max_su]
+                 layer['su_range_average'] = avg_su
+                 layer['strength_value'] = avg_su
+                 layer['description'] += f" [Su range {min_su:.1f}-{max_su:.1f} kPa, using average {avg_su:.1f} kPa]"
+
+                 st.info(f"📊 Layer {layer.get('layer_id', '?')}: Su range processed, using average {avg_su:.1f} kPa")
+
+             elif avg_match:
+                 # Averaged Su value already mentioned
+                 avg_su = float(avg_match.group(1))
+                 layer['su_pre_averaged'] = True
+                 layer['su_average_value'] = avg_su
+                 layer['strength_value'] = avg_su
+
+             # Add metadata for tracking
+             layer['su_processing_applied'] = True
+
+             enhanced_layers.append(layer)
+
+         return enhanced_layers
+
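Tracing the subdivision heuristic above with the sample values the prompt itself uses:

```python
# Worked example of the variation check:
su_nums = [25.0, 45.0, 80.0]                      # kPa, found within one layer
avg = sum(su_nums) / len(su_nums)                 # 50.0
variation = (max(su_nums) - min(su_nums)) / avg   # 1.1  > 0.5
ratio = max(su_nums) / min(su_nums)               # 3.2x > 2.0
# -> subdivision_suggested = True; close values like [35, 40, 38] would be averaged instead
```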
+     def _optimize_layers(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Optimize layer division and grouping"""
+         st.info("⚙️ Step 8: Optimizing layer division...")
+
+         try:
+             from soil_analyzer import SoilLayerAnalyzer
+             analyzer = SoilLayerAnalyzer()
+
+             # Validate layer continuity
+             validated_layers = analyzer.validate_layer_continuity(state["processed_layers"])
+
+             # Calculate statistics
+             stats = analyzer.calculate_layer_statistics(validated_layers)
+             state["validation_stats"] = stats
+
+             # Optimize layer division
+             optimization = analyzer.optimize_layer_division(
+                 validated_layers,
+                 merge_similar=state.get("merge_similar", True),
+                 split_thick=state.get("split_thick", True)
+             )
+             state["optimization_results"] = optimization
+
+             # Use optimized layers
+             state["processed_layers"] = optimization.get("optimized_layers", validated_layers)
+             state["workflow_status"] = "optimized"
+
+             st.success("✅ Layer optimization completed")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"Optimization error: {str(e)}"]
+             state["workflow_status"] = "optimization_error"
+             st.error(f"❌ Layer optimization failed: {str(e)}")
+
+         return state
+
+     def _finalize_results(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Finalize and package results"""
+         st.info("📦 Step 9: Finalizing results...")
+
+         try:
+             # Generate processing summary
+             processing_summary = self.soil_processor.get_processing_summary(state["processed_layers"])
+             state["processing_summary"] = processing_summary
+
+             # Package final results
+             final_soil_data = {
+                 "project_info": state["project_info"],
+                 "soil_layers": state["processed_layers"],
+                 "water_table": state["water_table"],
+                 "notes": state["notes"],
+                 "processing_summary": processing_summary,
+                 "validation_stats": state.get("validation_stats", {}),
+                 "optimization_results": state.get("optimization_results", {}),
+                 "workflow_metadata": {
+                     "model_used": state["model"],
+                     "processing_steps": 9,
+                     "total_layers": len(state["processed_layers"]),
+                     "ss_samples": processing_summary.get("ss_samples", 0),
+                     "st_samples": processing_summary.get("st_samples", 0)
+                 }
+             }
+
+             state["final_soil_data"] = final_soil_data
+             state["workflow_status"] = "completed"
+
+             st.success("🎉 Unified soil analysis workflow completed successfully!")
+
+         except Exception as e:
+             state["extraction_errors"] = state.get("extraction_errors", []) + [f"Finalization error: {str(e)}"]
+             state["workflow_status"] = "finalization_error"
+             st.error(f"❌ Result finalization failed: {str(e)}")
+
+         return state
+
+     def _handle_errors(self, state: SoilAnalysisState) -> SoilAnalysisState:
+         """Handle workflow errors"""
+         st.error("❌ Workflow encountered errors")
+
+         errors = state.get("extraction_errors", [])
+         for error in errors:
+             st.error(f" • {error}")
+
+         state["workflow_status"] = "failed"
+         state["final_soil_data"] = {
+             "error": "Workflow failed",
+             "errors": errors,
+             "raw_response": state.get("raw_llm_response", "")
+         }
+
+         return state
+
+     # Conditional routing functions
+     def _should_continue_after_validation(self, state: SoilAnalysisState) -> str:
+         """Determine next step after input validation"""
+         if state["workflow_status"] == "validated":
+             return "continue"
+         else:
+             return "error"
+
+     def _should_continue_after_extraction(self, state: SoilAnalysisState) -> str:
+         """Determine next step after LLM extraction - simplified without retry loops"""
+         workflow_status = state.get("workflow_status", "unknown")
+
+         if workflow_status == "extraction_validated":
+             st.info("✅ Proceeding to SS/ST classification...")
+             return "continue"
+         else:
+             st.error(f"❌ Extraction validation failed with status: {workflow_status}")
+             return "error"
+
+     def _get_gemini_safe_prompt(self) -> str:
+         """Get a simplified, safer prompt for Gemini models to avoid content filtering"""
+         return """You are a geotechnical engineer analyzing soil data.
+
+ Extract information from soil boring logs and return ONLY valid JSON.
+
+ Required JSON format:
+ {
+   "project_info": {
+     "project_name": "string",
+     "boring_id": "string",
+     "location": "string",
+     "date": "string",
+     "depth_total": 10.0
+   },
+   "soil_layers": [
+     {
+       "layer_id": 1,
+       "depth_from": 0.0,
+       "depth_to": 2.0,
+       "soil_type": "clay",
+       "description": "description text",
+       "sample_type": "SS",
+       "strength_parameter": "SPT-N",
+       "strength_value": 15,
+       "water_content": 25,
+       "color": "brown",
+       "consistency": "soft"
+     }
+   ],
+   "water_table": {"depth": 3.0, "date_encountered": "2024-01-01"},
+   "notes": "Additional notes"
+ }
+
+ Key rules:
+ 1. Look for SS-* or ST-* sample identifiers in first column
+ 2. SS samples use SPT-N values, ST samples use Su values
+ 3. **CRITICAL - READ COLUMN HEADERS FOR UNITS**:
+    Look at table headers to identify Su units:
+    - If header shows "Su t/m²" or "Su (t/m²)" → Units are t/m²
+    - If header shows "Su kPa" or "Su (kPa)" → Units are kPa
+    - If header shows "Su ksc" or "Su (ksc)" → Units are ksc
+ 4. **CAREFULLY convert Su units to kPa BASED ON HEADER**:
+    - t/m² → kPa: multiply by 9.81 (CRITICAL - MOST COMMON ERROR)
+    - ksc/kg/cm² → kPa: multiply by 98.0
+    - psi → kPa: multiply by 6.895
+    - MPa → kPa: multiply by 1000
+    - kPa → kPa: no conversion (use directly)
+ 5. Extract water content when available
+ 6. Check Su-water content correlation (soft clay: Su<50kPa, w%>30%)
+ 7. Group similar layers (maximum 7 layers total)
+ 8. Return ONLY the JSON object, no explanatory text
+ 9. Start response with { and end with }"""
+
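The conversion table in the prompt maps directly to a lookup. A small helper mirroring it (a sketch only; in this app the actual conversions are performed by `SoilClassificationProcessor._convert_to_si_units`, not by this function):

```python
# Su unit -> kPa multipliers, matching the prompt's conversion rules.
SU_TO_KPA = {"kPa": 1.0, "t/m²": 9.81, "ksc": 98.0, "psi": 6.895, "MPa": 1000.0}

def su_to_kpa(value: float, unit: str) -> float:
    return value * SU_TO_KPA[unit]

print(su_to_kpa(3.5, "t/m²"))   # 34.335 - the most commonly missed conversion
```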
759
+ def _get_unified_system_prompt(self) -> str:
760
+ """Get the comprehensive system prompt for unified processing"""
761
+ return """You are an expert geotechnical engineer specializing in soil boring log interpretation.
762
+
763
+ IMPORTANT: You must respond with ONLY valid JSON data. Do not include any text before or after the JSON.
764
+
765
+ SAMPLE TYPE IDENTIFICATION (CRITICAL - FOLLOW EXACT ORDER):
766
+
767
+ **STEP 1 - FIRST COLUMN STRATIFICATION SYMBOLS (ABSOLUTE HIGHEST PRIORITY):**
768
+ ALWAYS look at the FIRST COLUMN of each layer for stratification symbols:
769
+
770
+ - **SS-1, SS-2, SS-18, SS18, SS-5** β†’ SS (Split Spoon) sample
771
+ - **ST-1, ST-2, ST-5, ST5, ST-12** β†’ ST (Shelby Tube) sample
772
+ - **SS1, SS2, SS3** (without dash) β†’ SS sample
773
+ - **ST1, ST2, ST3** (without dash) β†’ ST sample
774
+ - **Look for pattern: [SS|ST][-]?[0-9]+** in first column
775
+
776
+ **EXAMPLES of First Column Recognition:**
777
+ ```
778
+ SS-18 | Brown clay, N=8 β†’ sample_type="SS" (SS-18 in first column)
779
+ ST-5 | Gray clay, Su=45 kPa β†’ sample_type="ST" (ST-5 in first column)
780
+ SS12 | Sandy clay, SPT test β†’ sample_type="SS" (SS12 in first column)
781
+ ST3 | Soft clay, unconfined β†’ sample_type="ST" (ST3 in first column)
782
+ ```
783
+
784
+ **STEP 2 - If NO first column symbols, then check description keywords:**
785
+ - SS indicators: "split spoon", "SPT", "standard penetration", "disturbed"
786
+ - ST indicators: "shelby", "tube", "undisturbed", "UT", "unconfined compression"
787
+
788
+ **STEP 3 - If still unclear, use strength parameter type:**
789
+ - SPT-N values present β†’ likely SS sample
790
+ - Su values from unconfined test β†’ likely ST sample
791
+
792
+ CRITICAL SOIL CLASSIFICATION RULES (MANDATORY):
793
+
794
+ **SAND LAYER CLASSIFICATION REQUIREMENTS:**
795
+ 1. **Sand layers MUST have sieve analysis evidence** - Look for:
796
+ - "Sieve #200: X% passing" or "#200 passing: X%"
797
+ - "Fines content: X%" (same as sieve #200)
798
+ - "Particle size analysis" or "gradation test"
799
+ - "% passing 0.075mm" (equivalent to #200 sieve)
800
+
801
+ 2. **Classification Rules**:
802
+ - Sieve #200 >50% passing β†’ CLAY (fine-grained)
803
+ - Sieve #200 <50% passing β†’ SAND/GRAVEL (coarse-grained)
804
+
805
+ 3. **NO SIEVE ANALYSIS = ASSUME CLAY (MANDATORY)**:
806
+ - If no sieve analysis data found β†’ ALWAYS classify as CLAY
807
+ - Include note: "Assumed clay - no sieve analysis data available"
808
+ - Set sieve_200_passing: null (not a number)
809
+
810
+ **CRITICAL**: Never classify as sand/silt without explicit sieve analysis evidence
811
+ **CRITICAL**: Always look for sieve #200 data before classifying as sand
812
+
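
The classification rules above reduce to a single branch on the fines content. A sketch, assuming `sieve_200_passing` is `None` whenever no gradation data was found:

```
from typing import Optional, Tuple

def classify_from_sieve(sieve_200_passing: Optional[float]) -> Tuple[str, str]:
    """Apply the sieve #200 rules; returns (soil_type, note)."""
    if sieve_200_passing is None:
        # Mandatory default: no gradation data means the layer is reported as clay
        return "clay", "Assumed clay - no sieve analysis data available"
    if sieve_200_passing > 50:
        return "clay", f"Fine-grained: {sieve_200_passing}% passing #200"
    return "sand", f"Coarse-grained: {sieve_200_passing}% passing #200"
```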
813
+ CRITICAL SS/ST SAMPLE RULES (MUST FOLLOW):
814
+
815
+ FOR SS (Split Spoon) SAMPLES:
816
+ 1. ALWAYS use RAW N-VALUE (not N-corrected, N-correction, or adjusted N)
817
+ 2. Look for: "N = 15", "SPT-N = 8", "raw N = 20", "field N = 12"
818
+ 3. IGNORE: "N-corrected = 25", "N-correction = 18", "adjusted N = 30"
819
+ 4. For clay: Use SPT-N parameter (will be converted to Su using Su=5*N)
820
+ 5. For sand/silt: Use SPT-N parameter (will be converted to friction angle)
821
+ 6. NEVER use unconfined compression Su values for SS samples - ONLY use N values
822
+
823
+ FOR ST (Shelby Tube) SAMPLES:
824
+ 1. ALWAYS USE DIRECT Su values from unconfined compression test
825
+ 2. If ST sample has Su value (e.g., "Su = 25 kPa"), use that EXACT value
826
+ 3. NEVER convert SPT-N to Su for ST samples when direct Su is available
827
+ 4. Priority: Direct Su measurement > any other value
828
+
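
The SS/ST rules above boil down to choosing which column feeds `strength_parameter`. A hedged sketch (the field names follow the JSON schema used in this prompt; the Su=5*N conversion for SS clay happens later in the workflow, not here):

```
def select_strength(sample_type: str, raw_n=None, su_kpa=None) -> dict:
    """Pick the strength parameter per the SS/ST rules above (sketch only)."""
    if sample_type == "SS":
        # SS: always the raw field N-value; any Su printed on the row is ignored
        return {"strength_parameter": "SPT-N", "strength_value": raw_n}
    if sample_type == "ST" and su_kpa is not None:
        # ST: the direct unconfined-compression Su takes priority over everything
        return {"strength_parameter": "Su", "strength_value": su_kpa,
                "su_source": "Unconfined Compression Test"}
    # ST sample without a direct Su measurement: fall back to the N-value
    return {"strength_parameter": "SPT-N", "strength_value": raw_n}
```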
829
+ CRITICAL SU VALUE EXTRACTION - MULTIPLE VALUES PER LAYER:
830
+
831
+ **EXTRACT ALL SU VALUES IN COLUMN (CRITICAL ENHANCEMENT):**
832
+
833
+ **STEP 1 - SCAN ENTIRE SU COLUMN FOR EACH LAYER:**
834
+ 1. Look for ALL Su values that fall within each layer's depth range
835
+ 2. Extract EVERY Su value found in the Su column for that depth interval
836
+ 3. Record ALL values with their exact depths if specified
837
+ 4. Note: A single layer may have multiple Su measurements at different depths
838
+
839
+ **STEP 2 - HANDLE MULTIPLE SU VALUES PER LAYER:**
840
+ For layers with multiple Su values, you have several options:
841
+
842
+ Option A - **LAYER SUBDIVISION (PREFERRED for significant variation):**
843
+ - If Su values vary by >50% or have >2x ratio β†’ Split into sublayers
844
+ - Example: Layer 2.0-6.0m has Su values [25, 45, 80] kPa
845
+ - Split into: Layer 2.0-3.5m (Su=25kPa), Layer 3.5-5.0m (Su=45kPa), Layer 5.0-6.0m (Su=80kPa)
846
+
847
+ Option B - **AVERAGE SU VALUES (for similar values):**
848
+ - If Su values are within Β±30% of mean β†’ Use average
849
+ - Example: Layer 1.0-3.0m has Su values [35, 40, 38] kPa β†’ Use Su=37.7kPa
850
+ - Include note: "Su averaged from 3 measurements: 35, 40, 38 kPa"
851
+
852
+ Option C - **REPRESENTATIVE VALUE (for clusters):**
853
+ - If multiple similar values with one outlier β†’ Use cluster average
854
+ - Example: Su values [25, 28, 26, 45] β†’ Use 26.3kPa (ignore outlier 45)
855
+
856
+ **STEP 3 - DOCUMENT ALL VALUES FOUND:**
857
+ Always include in description:
858
+ - "Su values found: 25, 35, 42 kPa (averaged to 34 kPa)"
859
+ - "Multiple Su measurements: 30, 28, 32 kPa at depths 2.1, 2.5, 2.8m"
860
+ - "Su ranges from 40-60 kPa, used average 50 kPa"
861
+
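
Options A and B can be approximated with the stated thresholds (>2x ratio or outside Β±30% of the mean β†’ subdivide; otherwise average). A sketch; the exact aggregation logic in the module may differ:

```
from statistics import mean

def resolve_multiple_su(values: list) -> dict:
    """Choose Option A or B for a layer with several Su readings (all in kPa)."""
    if len(values) == 1:
        return {"strength_value": values[0], "subdivision_suggested": False}
    avg = mean(values)
    spread = max(values) / min(values)            # e.g. [25, 45, 80] -> 3.2
    within_30pct = all(abs(v - avg) / avg <= 0.30 for v in values)
    if spread > 2 or not within_30pct:
        # Option A: high variation -> flag for subdivision, report the middle value
        mid = sorted(values)[len(values) // 2]
        return {"strength_value": mid, "subdivision_suggested": True}
    # Option B: similar values -> use the average
    return {"strength_value": round(avg, 1), "subdivision_suggested": False}

# [35, 40, 38] -> 37.7 averaged; [25, 45, 80] -> 45 with subdivision flag
```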
862
+ CRITICAL UNIT CONVERSION REQUIREMENTS (MUST APPLY):
863
+
864
+ **MANDATORY SU UNIT CONVERSION - READ COLUMN HEADERS FIRST:**
865
+
866
+ **STEP 1 - IDENTIFY UNITS FROM TABLE HEADERS (CRITICAL):**
867
+ ALWAYS look at the column headers to identify Su units:
868
+ - "Su t/mΒ²" or "Su (t/mΒ²)" in header β†’ Values are in t/mΒ²
869
+ - "Su kPa" or "Su (kPa)" in header β†’ Values are in kPa
870
+ - "Su ksc" or "Su (ksc)" in header β†’ Values are in ksc
871
+ - "Su psi" or "Su (psi)" in header β†’ Values are in psi
872
+ - Just "Su" with units below β†’ Look at unit row (e.g., "t/mΒ²")
873
+
874
+ **STEP 2 - CONVERT TO kPa BASED ON IDENTIFIED UNITS:**
875
+ When extracting Su values from images or text, you MUST convert to kPa BEFORE using the value:
876
+
877
+ 1. **ksc or kg/cmΒ²**: Su_kPa = Su_ksc Γ— 98.0
878
+ Example: "Su = 2.5 ksc" β†’ strength_value: 245 (not 2.5)
879
+
880
+ 2. **t/mΒ² (tonnes/mΒ²)**: Su_kPa = Su_tonnes Γ— 9.81
881
+ Example: "Su = 3.0 t/mΒ²" β†’ strength_value: 29.43 (not 3.0)
882
+ **CRITICAL**: This is the MOST COMMON unit in boring logs!
883
+
884
+ 3. **psi**: Su_kPa = Su_psi Γ— 6.895
885
+ Example: "Su = 50 psi" β†’ strength_value: 344.75 (not 50)
886
+
887
+ 4. **psf**: Su_kPa = Su_psf Γ— 0.048
888
+ Example: "Su = 1000 psf" β†’ strength_value: 48 (not 1000)
889
+
890
+ 5. **kPa**: Use directly (no conversion needed)
891
+ Example: "Su = 75 kPa" β†’ strength_value: 75
892
+
893
+ 6. **MPa**: Su_kPa = Su_MPa Γ— 1000
894
+ Example: "Su = 0.1 MPa" β†’ strength_value: 100 (not 0.1)
895
+
896
+ **CRITICAL EXAMPLES FROM BORING LOGS:**
897
+ - Table header shows "Su t/mΒ²", value 1.41 β†’ strength_value: 13.83 (1.41 Γ— 9.81)
898
+ - Table header shows "Su t/mΒ²", value 2.41 β†’ strength_value: 23.64 (2.41 Γ— 9.81)
899
+ - Table header shows "Su kPa", value 75 β†’ strength_value: 75 (no conversion)
900
+
901
+ **IMPORTANT**: Always include original unit in description for verification
902
+ **SPT-N values**: Keep as-is (no unit conversion needed)
903
+
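
The header-driven conversion is a table lookup. A minimal helper reproducing the factors above (the normalization of the header string is an assumption):

```
# Conversion factors to kPa, keyed by the unit string read from the column header
SU_TO_KPA = {
    "kpa": 1.0,
    "t/m2": 9.81,      # tonnes/m2 - the most common unit in these logs
    "ksc": 98.0,       # kg/cm2
    "kg/cm2": 98.0,
    "psi": 6.895,
    "psf": 0.048,
    "mpa": 1000.0,
}

def su_to_kpa(value: float, header_unit: str) -> float:
    """Convert a raw Su reading to kPa using the unit taken from the table header."""
    key = header_unit.lower().replace("Β²", "2").strip()
    return round(value * SU_TO_KPA[key], 2)

assert su_to_kpa(1.41, "t/mΒ²") == 13.83   # header "Su t/mΒ²"
assert su_to_kpa(2.5, "ksc") == 245.0
assert su_to_kpa(75, "kPa") == 75.0
```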
904
+ CRITICAL SU-WATER CONTENT VALIDATION (MANDATORY):
905
+
906
+ **EXTRACT WATER CONTENT WHEN AVAILABLE:**
907
+ Always extract water content (w%) when mentioned in the description:
908
+ - \"water content = 25%\" β†’ water_content: 25
909
+ - \"w = 30%\" β†’ water_content: 30
910
+ - \"moisture content 35%\" β†’ water_content: 35
911
+
912
+ **VALIDATE SU-WATER CONTENT CORRELATION:**
913
+ For clay layers, Su and water content should correlate reasonably:
914
+ - Very soft clay: Su < 25 kPa, w% > 40%
915
+ - Soft clay: Su 25-50 kPa, w% 30-40%
916
+ - Medium clay: Su 50-100 kPa, w% 20-30%
917
+ - Stiff clay: Su 100-200 kPa, w% 15-25%
918
+ - Very stiff clay: Su 200-400 kPa, w% 10-20%
919
+ - Hard clay: Su > 400 kPa, w% < 15%
920
+
921
+ **CRITICAL UNIT CHECK SCENARIOS:**
922
+ - If Su > 1000 kPa with w% > 20%: CHECK if Su is in wrong units (psi, psf?)
923
+ - If Su < 5 kPa with w% < 15%: CHECK if Su is in wrong units (MPa, bar?)
924
+ - If correlation seems very off: VERIFY unit conversion was applied correctly
925
+
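
The correlation table above can be encoded as plausibility bands; the Β±10% tolerance below is an illustrative choice, not taken from the prompt:

```
# Expected (Su kPa range, w% range) bands for clay, per the correlation table above
CLAY_BANDS = [
    ((0, 25),       (40, 100)),  # very soft
    ((25, 50),      (30, 40)),   # soft
    ((50, 100),     (20, 30)),   # medium
    ((100, 200),    (15, 25)),   # stiff
    ((200, 400),    (10, 20)),   # very stiff
    ((400, 10_000), (0, 15)),    # hard
]

def su_water_content_plausible(su_kpa: float, w_pct: float, tol: float = 10.0) -> bool:
    """Rough plausibility check; a large mismatch usually means a unit error."""
    for (su_lo, su_hi), (w_lo, w_hi) in CLAY_BANDS:
        if su_lo <= su_kpa < su_hi:
            return (w_lo - tol) <= w_pct <= (w_hi + tol)
    return True  # outside the tabulated range: do not flag

# Su = 1500 kPa with w = 35% fails the check -> suspect psi/psf left unconverted
```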
926
+ CRITICAL OUTPUT FORMAT (MANDATORY):
927
+
928
+ You MUST respond with ONLY a valid JSON object. Do not include:
929
+ - Explanatory text before or after the JSON
930
+ - Markdown formatting (```json ```)
931
+ - Comments or notes
932
+ - Multiple JSON objects
933
+
934
+ Start your response directly with { and end with }
935
+
936
+ EXAMPLE CORRECT RESPONSE FORMAT:
937
+ {
938
+ "project_info": {
939
+ "project_name": "Sample Project",
940
+ "boring_id": "BH-01",
941
+ "location": "Sample Location",
942
+ "date": "2024-06-25",
943
+ "depth_total": 10.0
944
+ },
945
+ "soil_layers": [
946
+ {
947
+ "layer_id": 1,
948
+ "depth_from": 0.0,
949
+ "depth_to": 2.0,
950
+ "soil_type": "clay",
951
+ "description": "Brown clay, soft, SS-1 sample",
952
+ "sample_type": "SS",
953
+ "strength_parameter": "SPT-N",
954
+ "strength_value": 4,
955
+ "water_content": 35,
956
+ "color": "brown",
957
+ "consistency": "soft"
958
+ }
959
+ ],
960
+ "water_table": {"depth": 3.0, "date_encountered": "2024-06-25"},
961
+ "notes": "Standard soil boring analysis"
962
+ }
963
+
964
+ LAYER GROUPING REQUIREMENTS:
965
+ 1. MAXIMUM 7 LAYERS TOTAL - Group similar adjacent layers to achieve this limit
966
+ 2. CLAY AND SAND MUST BE SEPARATE - Never combine clay layers with sand layers
967
+ 3. Group adjacent layers with similar properties (same soil type and similar consistency)
968
+ 4. Prioritize engineering significance over minor variations
969
+
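
Requirements 1-3 suggest a merge pass over adjacent layers. A sketch that never crosses a soil-type boundary (so clay and sand stay separate) and only warns when the 7-layer cap cannot be met; it does not re-average strength values across merges:

```
def group_layers(layers: list, max_layers: int = 7) -> list:
    """Merge adjacent layers that share soil type and consistency (sketch only)."""
    grouped = []
    for layer in layers:
        prev = grouped[-1] if grouped else None
        if (prev is not None
                and prev["soil_type"] == layer["soil_type"]
                and prev.get("consistency") == layer.get("consistency")):
            prev["depth_to"] = layer["depth_to"]   # extend the previous layer
        else:
            grouped.append(dict(layer))
    if len(grouped) > max_layers:
        print(f"Warning: {len(grouped)} layers remain after grouping (cap {max_layers})")
    return grouped
```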
970
+ Analyze the provided soil boring log and extract the following information in this exact JSON format:
971
+
972
+ {
973
+ "project_info": {
974
+ "project_name": "string",
975
+ "boring_id": "string",
976
+ "location": "string",
977
+ "date": "string",
978
+ "depth_total": 10.0
979
+ },
980
+ "soil_layers": [
981
+ {
982
+ "layer_id": 1,
983
+ "depth_from": 0.0,
984
+ "depth_to": 2.5,
985
+ "soil_type": "clay",
986
+ "description": "Brown silty clay, ST sample, Su = 25 kPa",
987
+ "sample_type": "ST",
988
+ "strength_parameter": "Su",
989
+ "strength_value": 25,
990
+ "sieve_200_passing": 65,
991
+ "water_content": 35.5,
992
+ "color": "brown",
993
+ "moisture": "moist",
994
+ "consistency": "soft",
995
+ "su_source": "Unconfined Compression Test"
996
+ }
997
+ ],
998
+ "water_table": {
999
+ "depth": 3.0,
1000
+ "date_encountered": "2024-01-01"
1001
+ },
1002
+ "notes": "Additional observations"
1003
+ }
1004
+
1005
+ **CRITICAL EXAMPLES - MULTIPLE SU VALUES PER LAYER:**
1006
+
1007
+ **EXAMPLE 1 - Multiple Su Values (SUBDIVISION CASE):**
1008
+ Layer depth 2.0-6.0m with Su column showing:
1009
+ - "Su at 2.5m = 25 kPa"
1010
+ - "Su at 4.0m = 45 kPa"
1011
+ - "Su at 5.5m = 80 kPa"
1012
+
1013
+ PROCESSING: High variation (25-80 kPa, ratio 3.2x) β†’ SUBDIVISION RECOMMENDED
1014
+ β†’ Include ALL values in description: "Multiple Su values: 25, 45, 80 kPa [SUBDIVISION RECOMMENDED: High variation]"
1015
+ β†’ Use representative value (middle): strength_value=45
1016
+ β†’ Add metadata: subdivision_suggested=true, su_variation_high=true
1017
+
1018
+ **EXAMPLE 2 - Multiple Similar Su Values (AVERAGING CASE):**
1019
+ Layer depth 1.0-3.0m with Su column showing:
1020
+ - "Su = 35 kPa"
1021
+ - "Su = 40 kPa"
1022
+ - "Su = 38 kPa"
1023
+
1024
+ PROCESSING: Low variation (Β±7% from mean) β†’ USE AVERAGE
1025
+ β†’ Description: "Su averaged from 3 measurements: 35, 40, 38 kPa β†’ 37.7 kPa"
1026
+ β†’ Use: strength_value=37.7
1027
+
1028
+ **EXAMPLE 3 - Su Range Detection:**
1029
+ Layer with Su column: "Su ranges 40-60 kPa"
1030
+ β†’ Description: "Su range 40-60 kPa, using average 50 kPa"
1031
+ β†’ Use: strength_value=50
1032
+
1033
+ EXAMPLES OF CORRECT FIRST COLUMN SYMBOL RECOGNITION:
1034
+
1035
+ **SS SAMPLE EXAMPLES (First Column Priority):**
1036
+ 1. "SS-18 | Clay layer, N = 8, Su = 45 kPa from unconfined test"
1037
+ β†’ First column: SS-18 β†’ sample_type="SS" (HIGHEST PRIORITY)
1038
+ β†’ Use: strength_parameter="SPT-N", strength_value=8
1039
+ β†’ IGNORE the Su=45 kPa value for SS samples
1040
+
1041
+ 2. "SS18 | Soft clay, field N = 6, N-corrected = 10"
1042
+ β†’ First column: SS18 β†’ sample_type="SS" (HIGHEST PRIORITY)
1043
+ β†’ Use: strength_parameter="SPT-N", strength_value=6 (raw N)
1044
+ β†’ IGNORE N-corrected value
1045
+
1046
+ 3. "SS-5 | Brown clay, split spoon test, N=12"
1047
+ β†’ First column: SS-5 β†’ sample_type="SS" (HIGHEST PRIORITY)
1048
+ β†’ Use: strength_parameter="SPT-N", strength_value=12
1049
+
1050
+ **ST SAMPLE EXAMPLES (First Column Priority):**
1051
+ 1. "ST-5 | Stiff clay, Su = 85 kPa from unconfined compression"
1052
+ β†’ First column: ST-5 β†’ sample_type="ST" (HIGHEST PRIORITY)
1053
+ β†’ Use: strength_parameter="Su", strength_value=85
1054
+
1055
+ 2. "ST-12 | Medium clay, Su = 2.5 ksc from unconfined test"
1056
+ β†’ First column: ST-12 β†’ sample_type="ST" (HIGHEST PRIORITY)
1057
+ β†’ Convert: 2.5 Γ— 98 = 245 kPa
1058
+ β†’ Use: strength_parameter="Su", strength_value=245
1059
+
1060
+ 3. "ST3 | Clay, unconfined strength = 3.0 t/mΒ²"
1061
+ β†’ First column: ST3 β†’ sample_type="ST" (HIGHEST PRIORITY)
1062
+ β†’ Convert: 3.0 Γ— 9.81 = 29.43 kPa
1063
+ β†’ Use: strength_parameter="Su", strength_value=29.43
1064
+
1065
+ 4. "ST-8 | Gray clay, shelby tube, Su = 120 kPa"
1066
+ β†’ First column: ST-8 β†’ sample_type="ST" (HIGHEST PRIORITY)
1067
+ β†’ Use: strength_parameter="Su", strength_value=120
1068
+
1069
+ 5. "ST-10 | Gray clay, depth 3.0-6.0m, Su values: 35, 42, 39 kPa"
1070
+ β†’ First column: ST-10 β†’ sample_type="ST" (HIGHEST PRIORITY)
1071
+ β†’ Multiple values detected: variation <30% β†’ Use average
1072
+ β†’ Use: strength_parameter="Su", strength_value=38.7
1073
+ β†’ Description: "Gray clay, shelby tube, Su averaged from 3 measurements: 35, 42, 39 kPa β†’ 38.7 kPa"
1074
+
1075
+ 6. "ST-15 | Stiff clay, Su measurements: 45, 85, 120 kPa at different depths"
1076
+ β†’ First column: ST-15 β†’ sample_type="ST" (HIGHEST PRIORITY)
1077
+ β†’ High variation detected: ratio 2.7x β†’ SUBDIVISION RECOMMENDED
1078
+ β†’ Use: strength_parameter="Su", strength_value=85 (middle value)
1079
+ β†’ Description: "Stiff clay, multiple Su values: 45, 85, 120 kPa [SUBDIVISION RECOMMENDED: High variation]"
1080
+
1081
+ **SOIL CLASSIFICATION EXAMPLES:**
1082
+ 1. "Brown silty clay, no sieve analysis data"
1083
+ β†’ soil_type="clay", sieve_200_passing=null
1084
+ β†’ Note: "Assumed clay - no sieve analysis data available"
1085
+
1086
+ 2. "Sandy clay, sieve #200: 75% passing"
1087
+ β†’ soil_type="clay", sieve_200_passing=75
1088
+ β†’ Classification: Clay (>50% passing)
1089
+
1090
+ 3. "Medium sand, gradation test shows 25% passing #200"
1091
+ β†’ soil_type="sand", sieve_200_passing=25
1092
+ β†’ Classification: Sand (<50% passing)
1093
+
1094
+ 4. "Dense sand layer" (NO sieve data mentioned)
1095
+ β†’ soil_type="clay", sieve_200_passing=null
1096
+ β†’ Note: "Assumed clay - no sieve analysis data available"
1097
+ β†’ NEVER classify as sand without sieve data
1098
+
1099
+ TECHNICAL RULES:
1100
+ 1. All numeric values must be numbers, not strings
1101
+ 2. For soil_type, use basic terms: "clay", "sand", "silt", "gravel" - do NOT include consistency
1102
+ 3. Include sample_type field: "SS" (Split Spoon) or "ST" (Shelby Tube)
1103
+ 4. Include sieve_200_passing field when available (percentage passing sieve #200)
1104
+ 5. Include water_content field when available (percentage water content for clay consistency checks)
1105
+ 6. Include su_source field: "Unconfined Compression Test" for direct measurements, or "Calculated from SPT-N" for conversions
1106
+ 7. Strength parameters:
1107
+ - SS samples: ALWAYS use "SPT-N" with RAW N-value (will be converted based on soil type)
1108
+ - ST samples with clay: Use "Su" with DIRECT value in kPa from unconfined compression test
1109
+ - For sand/gravel: Always use "SPT-N" with N-value
1110
+ - NEVER use Su for SS samples, NEVER calculate Su from SPT-N for ST samples that have direct Su
1111
+ 8. Put consistency separately in "consistency" field: "soft", "medium", "stiff", "loose", "dense", etc.
1112
+ 9. Ensure continuous depths (no gaps or overlaps)
1113
+ 10. All depths in meters, strength values as numbers
1114
+ 11. Return ONLY the JSON object, no additional text"""
1115
+
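
Rule 9 (continuous depths) is easy to verify mechanically. A small checker; the 1 cm tolerance is an illustrative assumption:

```
def check_depth_continuity(layers: list, tol: float = 0.01) -> list:
    """Return gap/overlap messages for rule 9 (continuous depths, in meters)."""
    problems = []
    for upper, lower in zip(layers, layers[1:]):
        step = lower["depth_from"] - upper["depth_to"]
        if step > tol:
            problems.append(f"Gap of {step:.2f} m below {upper['depth_to']} m")
        elif step < -tol:
            problems.append(f"Overlap of {-step:.2f} m below {lower['depth_from']} m")
    return problems
```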
1116
+ def _parse_llm_response(self, response: str) -> Dict[str, Any]:
1117
+ """Parse LLM JSON response with enhanced error handling"""
1118
+
1119
+ # First check if response is empty or None
1120
+ if not response or not response.strip():
1121
+ return {"error": "Empty response from LLM", "raw_response": response or ""}
1122
+
1123
+ try:
1124
+ # Clean response
1125
+ json_str = response.strip()
1126
+
1127
+ # Log raw response for debugging (first 500 chars)
1128
+ st.info(f"πŸ“ Raw LLM response preview: {json_str[:500]}{'...' if len(json_str) > 500 else ''}")
1129
+
1130
+ # Remove markdown code blocks if present
1131
+ if "```json" in json_str:
1132
+ json_start = json_str.find("```json") + 7
1133
+ json_end = json_str.find("```", json_start)
1134
+ if json_end == -1:
1135
+ json_end = len(json_str)
1136
+ json_str = json_str[json_start:json_end].strip()
1137
+ st.info("πŸ”§ Extracted JSON from markdown code block")
1138
+ elif "```" in json_str:
1139
+ json_start = json_str.find("```") + 3
1140
+ json_end = json_str.rfind("```")
1141
+ if json_end > json_start:
1142
+ json_str = json_str[json_start:json_end].strip()
1143
+ st.info("πŸ”§ Extracted content from code block")
1144
+
1145
+ # Handle cases where LLM includes explanatory text before/after JSON
1146
+ # Look for JSON object boundaries more aggressively
1147
+ brace_start = json_str.find("{")
1148
+ brace_end = json_str.rfind("}")
1149
+
1150
+ if brace_start != -1 and brace_end != -1 and brace_end > brace_start:
1151
+ json_str = json_str[brace_start:brace_end + 1]
1152
+ st.info(f"πŸ”§ Extracted JSON object: {len(json_str)} characters")
1153
+ elif not json_str.startswith("{"):
1154
+ # No JSON found
1155
+ return {
1156
+ "error": f"No JSON object found in response. Response appears to be: {json_str[:200]}",
1157
+ "raw_response": response
1158
+ }
1159
+
1160
+ # Try to parse JSON
1161
+ result = json.loads(json_str)
1162
+
1163
+ # Validate structure
1164
+ if not isinstance(result, dict):
1165
+ return {"error": f"Expected JSON object, got {type(result)}", "raw_response": response}
1166
+
1167
+ if "soil_layers" not in result:
1168
+ result["soil_layers"] = []
1169
+ st.warning("⚠️ No 'soil_layers' found in response, using empty list")
1170
+
1171
+ if "project_info" not in result:
1172
+ result["project_info"] = {}
1173
+ st.warning("⚠️ No 'project_info' found in response, using empty dict")
1174
+
1175
+ st.success(f"βœ… JSON parsed successfully: {len(result.get('soil_layers', []))} layers found")
1176
+ return result
1177
+
1178
+ except json.JSONDecodeError as e:
1179
+ error_msg = f"JSON parsing failed: {str(e)}"
1180
+ st.error(f"❌ {error_msg}")
1181
+ st.error(f"πŸ“ Problematic content: {json_str[:300] if 'json_str' in locals() else 'N/A'}")
1182
+ return {"error": error_msg, "raw_response": response}
1183
+ except Exception as e:
1184
+ error_msg = f"Response parsing failed: {str(e)}"
1185
+ st.error(f"❌ {error_msg}")
1186
+ return {"error": error_msg, "raw_response": response}
1187
+
1188
+ def get_workflow_visualization(self) -> str:
1189
+ """Get a visual representation of the workflow steps"""
1190
+ return """
1191
+ πŸš€ **Unified Soil Analysis Workflow** πŸš€
1192
+
1193
+ **Step 1** πŸ” **Validate Inputs** β†’ Check API key, content, model
1194
+ **Step 2** πŸ€– **Extract with LLM** β†’ Use enhanced prompts for SS/ST classification
1195
+ **Step 3** βœ… **Validate Extraction** β†’ Check layer structure and data quality
1196
+ **Step 4** πŸ§ͺ **Process SS/ST Classification** β†’ Apply sample-specific processing
1197
+ **Step 5** πŸ”§ **Apply Unit Conversions** β†’ Convert all values to SI units (kPa)
1198
+ **Step 6** 🎯 **Validate Soil Classification** β†’ Enforce sieve analysis requirements
1199
+ **Step 7** πŸ“Š **Calculate Parameters** β†’ Compute Su, Ο†, and other properties
1200
+ **Step 8** βš™οΈ **Optimize Layers** β†’ Group and validate layer continuity
1201
+ **Step 9** πŸ“¦ **Finalize Results** β†’ Package complete analysis results
1202
+
1203
+ **Key Features:**
1204
+ β€’ **Unified Processing**: Single workflow handles all steps
1205
+ β€’ **SS/ST Classification**: Automatic sample type identification
1206
+ β€’ **Unit Conversion**: Su values extracted from images or text are converted to kPa
1207
+ β€’ **Sieve Analysis Enforcement**: Sand layers require #200 sieve data
1208
+ β€’ **Error Handling**: Comprehensive validation and recovery
1209
+ β€’ **State Management**: Complete workflow state tracking
1210
+ """
1211
+
1212
+ def analyze_soil_boring_log(self,
1213
+ text_content: Optional[str] = None,
1214
+ image_base64: Optional[str] = None,
1215
+ model: str = None,
1216
+ api_key: str = None,
1217
+ merge_similar: bool = True,
1218
+ split_thick: bool = True) -> Dict[str, Any]:
1219
+ """
1220
+ Run the unified soil analysis workflow
1221
+
1222
+ Args:
1223
+ text_content: Extracted text from document
1224
+ image_base64: Base64 encoded image
1225
+ model: LLM model to use
1226
+ api_key: API key for the selected LLM provider
1227
+ merge_similar: Whether to merge similar layers
1228
+ split_thick: Whether to split thick layers
1229
+
1230
+ Returns:
1231
+ Complete soil analysis results
1232
+ """
1233
+
1234
+ # Initialize state
1235
+ initial_state = SoilAnalysisState(
1236
+ text_content=text_content,
1237
+ image_base64=image_base64,
1238
+ model=model or get_default_provider_and_model()[1],
1239
+ api_key=api_key or "",
1240
+ merge_similar=merge_similar,
1241
+ split_thick=split_thick,
1242
+ llm_extraction_success=False,
1243
+ extraction_errors=[],
1244
+ retry_count=0, # Initialize retry counter
1245
+ project_info={},
1246
+ raw_soil_layers=[],
1247
+ processed_layers=[],
1248
+ water_table={},
1249
+ notes="",
1250
+ processing_summary={},
1251
+ validation_stats={},
1252
+ optimization_results={},
1253
+ final_soil_data={},
1254
+ workflow_status="initializing",
1255
+ workflow_messages=[]
1256
+ )
1257
+
1258
+ # Run workflow
1259
+ st.info("πŸš€ Starting unified soil analysis workflow...")
1260
+
1261
+ try:
1262
+ # Execute the workflow with recursion limit protection
1263
+ final_state = self.workflow.invoke(
1264
+ initial_state,
1265
+ config={"recursion_limit": 50} # Set explicit recursion limit
1266
+ )
1267
+
1268
+ # Return results
1269
+ if final_state["workflow_status"] == "completed":
1270
+ st.success("πŸŽ‰ Unified workflow completed successfully!")
1271
+ return final_state["final_soil_data"]
1272
+ else:
1273
+ st.error(f"❌ Workflow failed with status: {final_state['workflow_status']}")
1274
+ return final_state["final_soil_data"]
1275
+
1276
+ except Exception as e:
1277
+ error_msg = str(e)
1278
+ if "recursion limit" in error_msg.lower():
1279
+ st.error("❌ Workflow execution failed: Recursion limit reached. This may indicate a configuration issue with the model or workflow logic.")
1280
+ st.info("πŸ’‘ Try using a different model or check your input data format.")
1281
+ else:
1282
+ st.error(f"❌ Workflow execution failed: {error_msg}")
1283
+
1284
+ return {
1285
+ "error": f"Workflow execution failed: {error_msg}",
1286
+ "workflow_status": "execution_failed"
1287
+ }
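
For reference, a hypothetical end-to-end call of this method; the class name, import path, and model id below are placeholders, not names from this commit:

```
# Illustrative driver, assuming this method lives on a workflow class defined
# earlier in the module (class name and import path are placeholders).
from unified_workflow import UnifiedSoilAnalysisWorkflow  # hypothetical import

workflow = UnifiedSoilAnalysisWorkflow()
result = workflow.analyze_soil_boring_log(
    text_content=open("boring_log.txt").read(),
    model="anthropic/claude-3.5-sonnet",   # any supported provider model id
    api_key="sk-...",                      # key for the selected provider
    merge_similar=True,
    split_thick=True,
)

if "error" in result:
    print("Analysis failed:", result["error"])
else:
    for layer in result.get("soil_layers", []):
        print(layer["depth_from"], "-", layer["depth_to"], "m:", layer["soil_type"])
```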