Upload 9 files
- Dockerfile +24 -0
- README.md +172 -11
- analysis_service.py +288 -0
- app.py +237 -0
- docker-compose.yml +20 -0
- models.py +68 -0
- requirements.txt +10 -0
- twitter_service.py +945 -0
- vercel.json +15 -0
Dockerfile
ADDED
@@ -0,0 +1,24 @@
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create logs directory for application logs
RUN mkdir -p logs

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV LOG_LEVEL=INFO

# Expose the port the app runs on
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
README.md
CHANGED
@@ -1,11 +1,172 @@
# WesternFront: India-Pakistan Conflict Tracker API

A FastAPI application that leverages unofficial Twitter access via Twikit and Google's Gemini AI to monitor and analyze India-Pakistan tensions in real-time.

## Overview

WesternFront is an AI-powered conflict tracker that:

1. Collects tweets from reliable news sources covering India-Pakistan relations without using the official Twitter API
2. Analyzes these tweets using Google's Gemini AI to assess the current conflict situation
3. Provides RESTful endpoints to access the analysis
4. Updates the analysis periodically and on demand

## Core Components

### Twitter Data Collection
- Uses [Twikit](https://github.com/d60/twikit) for unofficial Twitter access
- Fetches tweets from a predefined list of reliable sources
- Implements caching to avoid unnecessary requests

### AI Analysis with Gemini
- Analyzes collected tweets to assess India-Pakistan tensions
- Generates comprehensive reports including:
  - Current situation summary
  - Key developments in the last 24-48 hours
  - Information reliability assessment
  - Regional stability implications
  - Tension level classification (Low/Medium/High/Critical)

### FastAPI Server
- Endpoint for on-demand analysis updates
- Endpoint to get the latest analysis
- Background task system for periodic updates
- Health check endpoint
- Source list and keyword management

## Getting Started

### Prerequisites

- Python 3.9+
- Docker (optional)

### Environment Setup

1. Clone the repository
2. Copy `.env.example` to `.env` and fill in the required values:

```
# Twitter Credentials
TWITTER_USERNAME=your_twitter_username
TWITTER_PASSWORD=your_twitter_password
TWITTER_EMAIL=your_twitter_email

# Google Gemini API Key
GEMINI_API_KEY=your_gemini_api_key

# Application Settings
UPDATE_INTERVAL_MINUTES=60
CACHE_EXPIRY_MINUTES=120
LOG_LEVEL=INFO
```

### Installation

#### Local Development

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
uvicorn app:app --reload
```

#### Docker Deployment

```bash
# Build the Docker image
docker build -t westernfront .

# Run the container
docker run -p 8000:8000 --env-file .env westernfront
```

## API Endpoints

### Root Endpoint
- `GET /`: Basic API information

### Health Check
- `GET /health`: Check the health of the API and its components

### Analysis
- `GET /analysis`: Get the latest conflict analysis
- `POST /analysis/update`: Trigger an analysis update
  - Request Body: `{ "force": boolean }` (optional, defaults to false)

### News Sources
- `GET /sources`: Get the current list of news sources
- `POST /sources`: Update the list of news sources
  - Request Body: Array of NewsSource objects

### Keywords
- `GET /keywords`: Get the current search keywords
- `POST /keywords`: Update the search keywords
  - Request Body: Array of strings

### Tension Levels
- `GET /tension-levels`: Get the available tension levels

## Data Models

### News Source
```json
{
  "name": "BBC News",
  "twitter_handle": "BBCWorld",
  "country": "UK",
  "reliability_score": 0.9,
  "is_active": true
}
```

### Conflict Analysis
```json
{
  "analysis_id": "uuid",
  "generated_at": "2023-05-01T12:00:00Z",
  "latest_status": "...",
  "situation_summary": "...",
  "key_developments": [
    {
      "title": "Development 1",
      "description": "...",
      "sources": ["@BBCWorld", "@Reuters"],
      "timestamp": "2023-05-01T10:30:00Z"
    }
  ],
  "reliability_assessment": "...",
  "regional_implications": "...",
  "tension_level": "Medium",
  "source_tweets": [],
  "update_triggered_by": "scheduled"
}
```

## Implementation Notes

- The application uses asyncio for handling concurrent requests
- Implements in-memory caching (can be extended to Redis)
- Rate limiting and throttling for Twitter scraping to avoid blocking
- Proper error handling and logging via loguru
- Secure credential management via environment variables

## Future Enhancements

- Redis integration for more robust caching
- User authentication for API access
- Email/notification alerts for critical tension levels
- Historical data storage and trend analysis
- Additional data sources beyond Twitter

## License

MIT License

## Disclaimer

This application is designed for educational and research purposes. The analysis provided should not be used as the sole source for critical decision-making related to regional conflicts.
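For illustration, here is a minimal client sketch for the source, keyword, and tension-level endpoints documented in the README above. It is not part of the uploaded files and assumes the API is already running locally on port 8000.

```python
# Minimal sketch of the management endpoints described in the README above.
# Assumes the API is reachable at http://localhost:8000; not part of the repository.
import httpx

BASE_URL = "http://localhost:8000"

sources = [
    {
        "name": "BBC News",
        "twitter_handle": "BBCWorld",
        "country": "UK",
        "reliability_score": 0.9,
        "is_active": True,
    }
]

with httpx.Client(base_url=BASE_URL, timeout=30.0) as client:
    print(client.post("/sources", json=sources).json())        # {"message": "News sources updated", "count": 1}
    print(client.post("/keywords", json=["Kashmir", "LOC"]).json())
    print(client.get("/tension-levels").json())                 # ["Low", "Medium", "High", "Critical"]
```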
analysis_service.py
ADDED
@@ -0,0 +1,288 @@
import os
import uuid
from datetime import datetime
from typing import Dict, List

import google.generativeai as genai
from loguru import logger
from tenacity import RetryError, retry, stop_after_attempt, wait_exponential

from models import ConflictAnalysis, KeyDevelopment, TensionLevel, Tweet


class AnalysisService:
    """Service for analyzing tweets using Google's Gemini AI."""

    def __init__(self):
        self.api_key = os.getenv("GEMINI_API_KEY")
        self.model = None
        self.search_keywords = [
            "India Pakistan", "Kashmir", "LOC", "Line of Control",
            "border tension", "ceasefire", "military", "diplomatic relations",
            "India-Pakistan", "cross-border", "terrorism", "bilateral relations"
        ]
        self.initialize()

    def initialize(self) -> bool:
        """Initialize the Gemini AI client."""
        if not self.api_key:
            logger.error("GEMINI_API_KEY not provided")
            return False

        try:
            logger.info("Initializing Gemini AI")
            genai.configure(api_key=self.api_key)
            # Configure model with lower temperature for more factual responses
            generation_config = {
                "temperature": 0.1,
                "top_p": 0.95,
                "top_k": 40
            }
            self.model = genai.GenerativeModel('gemini-2.0-flash', generation_config=generation_config)
            logger.info("Gemini AI initialized successfully")
            return True
        except Exception as e:
            logger.error(f"Failed to initialize Gemini AI: {str(e)}")
            return False

    def _prepare_prompt(self, tweets: List[Tweet]) -> str:
        """Prepare the prompt for analysis with intelligence sources data."""
        # Sort tweets by recency to help with latest status identification
        sorted_tweets = sorted(tweets, key=lambda x: x.created_at if hasattr(x, 'created_at') else datetime.now(), reverse=True)

        source_entries = [
            f"DATA POINT {i+1}: [TIMESTAMP: {tweet.created_at if hasattr(tweet, 'created_at') else 'unknown'}, SOURCE: @{tweet.author}]\n{tweet.text}"
            for i, tweet in enumerate(sorted_tweets)
        ]
        intelligence_data = "\n\n".join(source_entries)

        prompt = f"""
INTELLIGENCE BRIEF: INDIA-PAKISTAN SITUATION ANALYSIS
DATE: {datetime.now().strftime("%Y-%m-%d")}
CLASSIFICATION: STRATEGIC ASSESSMENT

SOURCE DATA:
{intelligence_data}

ANALYTICAL PARAMETERS:
- Analyze the data points objectively without commentary
- Identify factual developments and official statements
- Assess tension levels based on concrete actions and statements
- Maintain professional, analytical tone throughout
- Cite specific data points in all assessments
- Do not introduce information not present in the data points
- Include exact timestamps when available

REQUIRED OUTPUT FORMAT:
{{
    "latest_status": "Most recent significant development with exact timestamp and source citation",
    "situation_summary": "Precise assessment of current Indo-Pak situation with timestamps and citations",
    "key_developments": [
        {{
            "title": "Precise event designation",
            "description": "Factual account with supporting evidence and timestamps",
            "sources": ["@source1", "@source2"]
        }}
    ],
    "reliability_assessment": {{
        "source_credibility": "Assessment of source authority and reliability",
        "information_gaps": "Specific identification of intelligence gaps",
        "confidence_rating": "HIGH|MEDIUM|LOW based on data quality"
    }},
    "regional_implications": {{
        "security": "Concrete security implications based on factual developments",
        "diplomatic": "Diplomatic consequences with specific references",
        "economic": "Economic impacts if applicable to current situation"
    }},
    "tension_level": "LOW|MEDIUM|HIGH|CRITICAL",
    "tension_rationale": "Specific evidence supporting tension level assessment"
}}

IMPORTANT DIRECTIVES:
- Return ONLY valid JSON without any additional text or markdown formatting
- Do not use conversational language or first-person perspective
- Focus on factual analysis, not speculation
- Prioritize verified information from official channels
- Highlight the most recent developments in the latest_status section
"""
        return prompt

    @retry(wait=wait_exponential(min=1, max=10), stop=stop_after_attempt(3))
    async def _call_gemini(self, prompt: str) -> Dict:
        """Call the Gemini API with retry logic and improved parsing."""
        if not self.model:
            if not self.initialize():
                logger.error("Could not analyze tweets, Gemini AI not initialized")
                raise Exception("Gemini AI initialization failed")

        try:
            logger.info("Calling Gemini API for conflict analysis")
            response = await self.model.generate_content_async(prompt)
            result = response.text

            import json
            import re

            # Better JSON extraction with multiple patterns
            json_match = re.search(r'```(?:json)?\n(.*?)\n```', result, re.DOTALL)
            if json_match:
                result = json_match.group(1)
            else:
                # Try to find JSON objects with or without formatting
                json_pattern = r'({[\s\S]*})'
                json_match = re.search(json_pattern, result)
                if json_match:
                    result = json_match.group(1)

            # Clean the result of any non-JSON content
            result = re.sub(r'```', '', result).strip()

            # Parse JSON with error handling
            try:
                analysis_data = json.loads(result)
                logger.info("Successfully received and parsed Gemini response")
                return analysis_data
            except json.JSONDecodeError as e:
                logger.error(f"JSON parsing error: {str(e)}")
                # Attempt cleanup and retry parsing
                result = re.sub(r'[\n\r\t]', ' ', result)
                result = re.search(r'({.*})', result).group(1) if re.search(r'({.*})', result) else result
                analysis_data = json.loads(result)
                logger.info("Successfully parsed Gemini response after cleanup")
                return analysis_data

        except Exception as e:
            logger.error(f"Error calling Gemini API: {str(e)}")
            logger.debug(f"Raw response content: {result if 'result' in locals() else 'No response'}")
            raise

    def _extract_tension_level(self, level_text: str) -> TensionLevel:
        """Extract tension level enum from text."""
        level_text = level_text.lower()
        if "critical" in level_text:
            return TensionLevel.CRITICAL
        elif "high" in level_text:
            return TensionLevel.HIGH
        elif "medium" in level_text:
            return TensionLevel.MEDIUM
        else:
            return TensionLevel.LOW

    def _process_key_developments(self, developments_data: List[Dict]) -> List[KeyDevelopment]:
        """Process key developments from API response."""
        key_developments = []
        for dev in developments_data:
            key_developments.append(
                KeyDevelopment(
                    title=dev.get("title", "Unnamed Development"),
                    description=dev.get("description", "No description provided"),
                    sources=dev.get("sources", []),
                    timestamp=datetime.now()
                )
            )
        return key_developments

    def _format_reliability_assessment(self, reliability_data: Dict) -> str:
        """Format reliability assessment data into a structured string."""
        if isinstance(reliability_data, str):
            return reliability_data

        if isinstance(reliability_data, dict):
            sections = []
            if "source_credibility" in reliability_data:
                sections.append(f"SOURCE CREDIBILITY: {reliability_data['source_credibility']}")
            if "information_gaps" in reliability_data:
                sections.append(f"INFORMATION GAPS: {reliability_data['information_gaps']}")
            if "confidence_rating" in reliability_data:
                sections.append(f"CONFIDENCE: {reliability_data['confidence_rating']}")

            if sections:
                return "\n\n".join(sections)

        return str(reliability_data)

    def _format_regional_implications(self, implications_data: Dict) -> str:
        """Format regional implications data into a structured string."""
        if isinstance(implications_data, str):
            return implications_data

        if isinstance(implications_data, dict):
            sections = []
            if "security" in implications_data:
                sections.append(f"SECURITY: {implications_data['security']}")
            if "diplomatic" in implications_data:
                sections.append(f"DIPLOMATIC: {implications_data['diplomatic']}")
            if "economic" in implications_data:
                sections.append(f"ECONOMIC: {implications_data['economic']}")

            if sections:
                return "\n\n".join(sections)

        return str(implications_data)

    async def analyze_tweets(self, tweets: List[Tweet], trigger: str = "scheduled") -> ConflictAnalysis:
        """Analyze tweets using Gemini AI and generate a conflict analysis."""
        if not tweets:
            logger.warning("No tweets provided for analysis")
            return None

        try:
            prompt = self._prepare_prompt(tweets)
            analysis_data = await self._call_gemini(prompt)

            # Process and extract data with proper error handling
            key_developments = self._process_key_developments(analysis_data.get("key_developments", []))

            # Format complex nested structures if present
            reliability_assessment = self._format_reliability_assessment(
                analysis_data.get("reliability_assessment", "No reliability assessment provided")
            )

            regional_implications = self._format_regional_implications(
                analysis_data.get("regional_implications", "No regional implications provided")
            )

            # Extract tension rationale if available
            tension_info = analysis_data.get("tension_level", "Low")
            tension_rationale = analysis_data.get("tension_rationale", "")

            # Combine tension level and rationale if both exist
            if tension_rationale:
                tension_display = f"{tension_info} - {tension_rationale}"
            else:
                tension_display = tension_info

            # Get the latest status
            latest_status = analysis_data.get("latest_status", "No recent status update available")

            analysis = ConflictAnalysis(
                analysis_id=str(uuid.uuid4()),
                generated_at=datetime.now(),
                situation_summary=analysis_data.get("situation_summary", "No summary provided"),
                key_developments=key_developments,
                reliability_assessment=reliability_assessment,
                regional_implications=regional_implications,
                tension_level=self._extract_tension_level(tension_display),
                source_tweets=tweets,
                update_triggered_by=trigger,
                latest_status=latest_status  # Added new parameter
            )

            logger.info(f"Generated conflict analysis with ID: {analysis.analysis_id}")
            return analysis

        except RetryError as e:
            logger.error(f"Failed to generate analysis after multiple retries: {str(e)}")
            return None
        except Exception as e:
            logger.error(f"Unexpected error in tweet analysis: {str(e)}")
            return None

    def get_search_keywords(self) -> List[str]:
        """Get the current search keywords."""
        return self.search_keywords

    def update_search_keywords(self, keywords: List[str]) -> None:
        """Update the search keywords."""
        self.search_keywords = keywords
        logger.info(f"Updated search keywords. New count: {len(keywords)}")
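The `_call_gemini` method above strips markdown fences from the model reply before parsing it as JSON. The standalone sketch below, which is not part of the upload and uses an invented sample reply, replays that extraction step so the tolerated formats are easier to see.

```python
# Standalone replay of the JSON-extraction step in _call_gemini above.
# The sample reply is invented; it mimics a Gemini answer wrapped in a markdown fence.
import json
import re

fence = "`" * 3
sample_reply = (
    "Here is the assessment:\n"
    f"{fence}json\n"
    '{"tension_level": "MEDIUM", "tension_rationale": "Routine statements only"}\n'
    f"{fence}"
)

result = sample_reply
# First try a fenced ```json block, as the service does.
match = re.search(r'```(?:json)?\n(.*?)\n```', result, re.DOTALL)
if match:
    result = match.group(1)
else:
    # Otherwise fall back to the first brace-delimited object.
    match = re.search(r'({[\s\S]*})', result)
    if match:
        result = match.group(1)

result = re.sub(r'```', '', result).strip()
data = json.loads(result)
print(data["tension_level"])  # MEDIUM
```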
app.py
ADDED
@@ -0,0 +1,237 @@
import asyncio
import os
from datetime import datetime
from typing import Dict, List, Optional

from dotenv import load_dotenv
from fastapi import BackgroundTasks, Depends, FastAPI, HTTPException, status
from fastapi.middleware.cors import CORSMiddleware
from loguru import logger

from analysis_service import AnalysisService
from models import (ConflictAnalysis, HealthCheck, NewsSource, TensionLevel,
                    Tweet, UpdateRequest)
from twitter_service import TwitterService

# Load environment variables from .env file
load_dotenv()

# Configure logging
os.makedirs("logs", exist_ok=True)
logger.add("logs/app.log", rotation="500 MB", level=os.getenv("LOG_LEVEL", "INFO"))

# Create FastAPI application
app = FastAPI(
    title="WesternFront API",
    description="AI-powered conflict tracker for monitoring India-Pakistan tensions",
    version="1.0.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Adjust this for production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Services
twitter_service = TwitterService()
analysis_service = AnalysisService()

# In-memory store for latest analysis
latest_analysis: Optional[ConflictAnalysis] = None
last_update_time: Optional[datetime] = None


async def get_twitter_service() -> TwitterService:
    """Dependency to get the Twitter service."""
    return twitter_service


async def get_analysis_service() -> AnalysisService:
    """Dependency to get the Analysis service."""
    return analysis_service


@app.on_event("startup")
async def startup_event():
    """Initialize services on startup."""
    logger.info("Starting up WesternFront API")

    # Initialize Twitter service
    initialized = await twitter_service.initialize()
    if not initialized:
        logger.warning("Twitter service initialization failed. Some features may not work.")

    # Schedule first update
    background_tasks = BackgroundTasks()
    background_tasks.add_task(update_analysis_task)

    # Set up periodic update task
    asyncio.create_task(periodic_update())


@app.on_event("shutdown")
async def shutdown_event():
    """Clean up resources on shutdown."""
    logger.info("Shutting down WesternFront API")
    if twitter_service and hasattr(twitter_service, 'close'):
        await twitter_service.close()


async def update_analysis_task(trigger: str = "scheduled") -> None:
    """Task to update the conflict analysis."""
    global latest_analysis, last_update_time

    try:
        logger.info(f"Starting analysis update ({trigger})")

        # Get tweets related to India-Pakistan conflict
        keywords = analysis_service.get_search_keywords()
        tweets = await twitter_service.get_related_tweets(keywords, days_back=2)

        if not tweets:
            logger.warning("No relevant tweets found for analysis")
            return

        logger.info(f"Found {len(tweets)} relevant tweets for analysis")

        # Analyze tweets
        analysis = await analysis_service.analyze_tweets(tweets, trigger)

        if analysis:
            latest_analysis = analysis
            last_update_time = datetime.now()
            logger.info(f"Analysis updated successfully. Tension level: {analysis.tension_level}")
        else:
            logger.error("Failed to generate analysis")

    except Exception as e:
        logger.error(f"Error in update_analysis_task: {str(e)}")


async def periodic_update() -> None:
    """Periodically update the analysis."""
    update_interval = int(os.getenv("UPDATE_INTERVAL_MINUTES", 60))

    while True:
        try:
            await asyncio.sleep(update_interval * 60)  # Convert to seconds
            await update_analysis_task("scheduled")
        except Exception as e:
            logger.error(f"Error in periodic_update: {str(e)}")
            await asyncio.sleep(300)  # Wait 5 minutes if there was an error


@app.get("/", response_model=Dict)
async def root():
    """Root endpoint with basic information about the API."""
    return {
        "name": "WesternFront API",
        "description": "AI-powered conflict tracker for India-Pakistan tensions",
        "version": "1.0.0"
    }


@app.get("/health", response_model=HealthCheck)
async def health_check():
    """Health check endpoint."""
    twitter_initialized = hasattr(twitter_service, 'http_client') and twitter_service.http_client is not None
    gemini_initialized = analysis_service.model is not None

    return HealthCheck(
        status="healthy",
        version="1.0.0",
        timestamp=datetime.now(),
        last_update=last_update_time,
        components_status={
            "twitter_service": twitter_initialized,
            "analysis_service": gemini_initialized
        }
    )


@app.get("/analysis", response_model=Optional[ConflictAnalysis])
async def get_latest_analysis():
    """Get the latest conflict analysis."""
    if not latest_analysis:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="No analysis available yet. Try triggering an update."
        )
    return latest_analysis


@app.post("/analysis/update", response_model=Dict)
async def trigger_update(
    request: UpdateRequest,
    background_tasks: BackgroundTasks
):
    """Trigger an analysis update."""
    if request.force:
        # Clear cache to force fresh tweets
        twitter_service.tweet_cache.clear()

    # Add update task to background tasks
    background_tasks.add_task(update_analysis_task, "manual")

    return {
        "message": "Analysis update triggered",
        "timestamp": datetime.now(),
        "force_refresh": request.force
    }


@app.get("/sources", response_model=List[NewsSource])
async def get_news_sources(
    twitter: TwitterService = Depends(get_twitter_service)
):
    """Get the current list of news sources."""
    return twitter.get_sources()


@app.post("/sources", response_model=Dict)
async def update_news_sources(
    sources: List[NewsSource],
    twitter: TwitterService = Depends(get_twitter_service)
):
    """Update the list of news sources."""
    twitter.update_sources(sources)
    return {
        "message": "News sources updated",
        "count": len(sources)
    }


@app.get("/keywords", response_model=List[str])
async def get_search_keywords(
    analysis: AnalysisService = Depends(get_analysis_service)
):
    """Get the current search keywords."""
    return analysis.get_search_keywords()


@app.post("/keywords", response_model=Dict)
async def update_search_keywords(
    keywords: List[str],
    analysis: AnalysisService = Depends(get_analysis_service)
):
    """Update the search keywords."""
    analysis.update_search_keywords(keywords)
    return {
        "message": "Search keywords updated",
        "count": len(keywords)
    }


@app.get("/tension-levels", response_model=List[str])
async def get_tension_levels():
    """Get the available tension levels."""
    return [level.value for level in TensionLevel]


if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
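Because `update_analysis_task` runs as a FastAPI background task, `GET /analysis` keeps returning 404 until the first update has finished. The rough sketch below, not part of the upload and assuming a locally running server, shows one way a caller can trigger an update and then poll for the result.

```python
# Rough polling sketch against a locally running app.py; not part of the upload.
# POST /analysis/update only schedules the work, so poll /analysis until it exists.
import asyncio
from typing import Optional

import httpx


async def wait_for_analysis(base_url: str = "http://localhost:8000", attempts: int = 10) -> Optional[dict]:
    async with httpx.AsyncClient(base_url=base_url, timeout=30.0) as client:
        await client.post("/analysis/update", json={"force": False})
        for _ in range(attempts):
            resp = await client.get("/analysis")
            if resp.status_code == 200:
                return resp.json()
            await asyncio.sleep(30)  # scraping plus Gemini analysis can take a while
    return None


if __name__ == "__main__":
    print(asyncio.run(wait_for_analysis()))
```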
docker-compose.yml
ADDED
@@ -0,0 +1,20 @@
version: '3'

services:
  westernfront-api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./logs:/app/logs
    env_file:
      - .env
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
models.py
ADDED
@@ -0,0 +1,68 @@
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional

from pydantic import BaseModel, Field


class TensionLevel(str, Enum):
    """Enum for tension levels between India and Pakistan."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    CRITICAL = "Critical"


class NewsSource(BaseModel):
    """Model for a news source."""
    name: str
    twitter_handle: str
    country: str
    reliability_score: float = Field(ge=0.0, le=1.0)
    is_active: bool = True


class Tweet(BaseModel):
    """Model for a tweet."""
    id: str
    text: str
    author: str
    created_at: datetime
    engagement: Dict[str, int] = {"likes": 0, "retweets": 0, "replies": 0, "views": 0}
    url: str


class KeyDevelopment(BaseModel):
    """Model for a key development in the conflict."""
    title: str
    description: str
    sources: List[str]
    timestamp: Optional[datetime] = None


class ConflictAnalysis(BaseModel):
    """Model for a conflict analysis."""
    analysis_id: str
    generated_at: datetime
    latest_status: str  # Added this field
    situation_summary: str
    key_developments: List[KeyDevelopment]
    reliability_assessment: str
    regional_implications: str
    tension_level: TensionLevel
    source_tweets: List[Tweet]
    update_triggered_by: str


class UpdateRequest(BaseModel):
    """Model for an update request."""
    force: bool = False


class HealthCheck(BaseModel):
    """Model for a health check response."""
    status: str
    version: str
    timestamp: datetime
    last_update: Optional[datetime] = None
    components_status: Dict[str, bool]
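As a quick check of how these models fit together, here is a sketch that builds and serializes a `ConflictAnalysis` with Pydantic v2 (the version pinned in requirements.txt). It is not part of the upload and every field value is invented for illustration.

```python
# Sketch showing how the models above validate and serialize; not part of the upload.
# All field values are invented for illustration.
from datetime import datetime, timezone

from models import ConflictAnalysis, KeyDevelopment, TensionLevel

analysis = ConflictAnalysis(
    analysis_id="demo-0001",
    generated_at=datetime.now(timezone.utc),
    latest_status="No significant change reported in the sampled sources.",
    situation_summary="Routine diplomatic statements; no new incidents in the data points.",
    key_developments=[
        KeyDevelopment(
            title="Example development",
            description="Placeholder description.",
            sources=["@BBCWorld"],
        )
    ],
    reliability_assessment="CONFIDENCE: LOW",
    regional_implications="SECURITY: No concrete implications identified.",
    tension_level=TensionLevel.LOW,
    source_tweets=[],
    update_triggered_by="manual",
)

print(analysis.model_dump_json(indent=2))  # the same shape served by GET /analysis
```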
requirements.txt
ADDED
@@ -0,0 +1,10 @@
fastapi==0.103.1
uvicorn[standard]==0.23.2
python-dotenv==1.0.0
loguru==0.7.0
google-generativeai==0.3.0
tenacity==8.2.2
cachetools==5.3.0
pydantic==2.3.0
httpx==0.24.1
beautifulsoup4==4.12.2
twitter_service.py
ADDED
@@ -0,0 +1,945 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import asyncio
|
2 |
+
import json
|
3 |
+
import os
|
4 |
+
import re
|
5 |
+
import time
|
6 |
+
import random
|
7 |
+
from datetime import datetime, timedelta
|
8 |
+
from typing import Dict, List, Optional, Tuple
|
9 |
+
from urllib.parse import urlparse, quote
|
10 |
+
|
11 |
+
import httpx
|
12 |
+
from bs4 import BeautifulSoup
|
13 |
+
from cachetools import TTLCache
|
14 |
+
from fastapi import HTTPException
|
15 |
+
from loguru import logger
|
16 |
+
|
17 |
+
from models import NewsSource, Tweet
|
18 |
+
|
19 |
+
|
20 |
+
class FingerprintRandomizer:
|
21 |
+
"""Randomizes browser fingerprints to evade detection"""
|
22 |
+
|
23 |
+
def __init__(self):
|
24 |
+
# Common screen resolutions
|
25 |
+
self.resolutions = [
|
26 |
+
(1920, 1080), (1366, 768), (1280, 720),
|
27 |
+
(1440, 900), (1536, 864), (2560, 1440),
|
28 |
+
(1680, 1050), (1920, 1200), (1024, 768)
|
29 |
+
]
|
30 |
+
|
31 |
+
# Common color depths
|
32 |
+
self.color_depths = [24, 30, 32]
|
33 |
+
|
34 |
+
# Common platforms
|
35 |
+
self.platforms = [
|
36 |
+
"Win32", "MacIntel", "Linux x86_64",
|
37 |
+
"Linux armv8l", "iPhone", "iPad"
|
38 |
+
]
|
39 |
+
|
40 |
+
# Browser variants
|
41 |
+
self.browsers = ["Chrome", "Firefox", "Safari", "Edge"]
|
42 |
+
|
43 |
+
# Common languages
|
44 |
+
self.languages = [
|
45 |
+
"en-US", "en-GB", "en-CA", "fr-FR", "de-DE",
|
46 |
+
"es-ES", "it-IT", "pt-BR", "ja-JP", "zh-CN"
|
47 |
+
]
|
48 |
+
|
49 |
+
# Common timezone offsets
|
50 |
+
self.timezone_offsets = [-60, -120, -180, -240, 0, 60, 120, 180, 330, 480, 540]
|
51 |
+
|
52 |
+
def generate_headers(self):
|
53 |
+
"""Generate randomized headers that mimic a real browser"""
|
54 |
+
browser = random.choice(self.browsers)
|
55 |
+
platform = random.choice(self.platforms)
|
56 |
+
language = random.choice(self.languages)
|
57 |
+
|
58 |
+
user_agent = self._generate_user_agent(browser, platform)
|
59 |
+
|
60 |
+
headers = {
|
61 |
+
"User-Agent": user_agent,
|
62 |
+
"Accept": self._generate_accept_header(browser),
|
63 |
+
"Accept-Language": f"{language},en;q=0.9",
|
64 |
+
"Accept-Encoding": "gzip, deflate, br",
|
65 |
+
"Connection": "keep-alive",
|
66 |
+
}
|
67 |
+
|
68 |
+
# Add browser-specific headers
|
69 |
+
if browser == "Chrome" or browser == "Edge":
|
70 |
+
headers["sec-ch-ua"] = f'"Google Chrome";v="{random.randint(90, 110)}", "Chromium";v="{random.randint(90, 110)}"'
|
71 |
+
headers["sec-ch-ua-mobile"] = "?0"
|
72 |
+
headers["sec-ch-ua-platform"] = f'"{platform}"'
|
73 |
+
|
74 |
+
# Randomize header order (matters for fingerprinting)
|
75 |
+
return dict(sorted(headers.items(), key=lambda x: random.random()))
|
76 |
+
|
77 |
+
def _generate_user_agent(self, browser, platform):
|
78 |
+
"""Generate a realistic user agent string"""
|
79 |
+
if browser == "Chrome":
|
80 |
+
chrome_version = f"{random.randint(90, 110)}.0.{random.randint(1000, 9999)}.{random.randint(10, 999)}"
|
81 |
+
if "Win" in platform:
|
82 |
+
return f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_version} Safari/537.36"
|
83 |
+
elif "Mac" in platform:
|
84 |
+
return f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_{random.randint(11, 15)}_{random.randint(1, 7)}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_version} Safari/537.36"
|
85 |
+
else:
|
86 |
+
return f"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_version} Safari/537.36"
|
87 |
+
elif browser == "Firefox":
|
88 |
+
ff_version = f"{random.randint(80, 100)}.0"
|
89 |
+
if "Win" in platform:
|
90 |
+
return f"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:{ff_version}) Gecko/20100101 Firefox/{ff_version}"
|
91 |
+
elif "Mac" in platform:
|
92 |
+
return f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.{random.randint(11, 15)}; rv:{ff_version}) Gecko/20100101 Firefox/{ff_version}"
|
93 |
+
else:
|
94 |
+
return f"Mozilla/5.0 (X11; Linux i686; rv:{ff_version}) Gecko/20100101 Firefox/{ff_version}"
|
95 |
+
elif browser == "Safari":
|
96 |
+
webkit_version = f"605.1.{random.randint(1, 15)}"
|
97 |
+
safari_version = f"{random.randint(13, 16)}.{random.randint(0, 1)}"
|
98 |
+
return f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_{random.randint(11, 15)}_{random.randint(1, 7)}) AppleWebKit/{webkit_version} (KHTML, like Gecko) Version/{safari_version} Safari/{webkit_version}"
|
99 |
+
elif browser == "Edge":
|
100 |
+
edge_version = f"{random.randint(90, 110)}.0.{random.randint(1000, 9999)}.{random.randint(10, 999)}"
|
101 |
+
return f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{edge_version} Safari/537.36 Edg/{edge_version}"
|
102 |
+
|
103 |
+
def _generate_accept_header(self, browser):
|
104 |
+
"""Generate browser-specific Accept header"""
|
105 |
+
if browser == "Chrome" or browser == "Edge":
|
106 |
+
return "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
|
107 |
+
elif browser == "Firefox":
|
108 |
+
return "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"
|
109 |
+
elif browser == "Safari":
|
110 |
+
return "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
|
111 |
+
|
112 |
+
|
113 |
+
class CookieManager:
|
114 |
+
"""Intelligently manages cookies to maintain sessions"""
|
115 |
+
|
116 |
+
def __init__(self):
|
117 |
+
self.cookies_by_domain = {}
|
118 |
+
self.cookie_jar_path = os.path.join(os.path.dirname(__file__), '.cookie_store')
|
119 |
+
os.makedirs(self.cookie_jar_path, exist_ok=True)
|
120 |
+
self.load_cookies()
|
121 |
+
|
122 |
+
def load_cookies(self):
|
123 |
+
"""Load cookies from storage"""
|
124 |
+
try:
|
125 |
+
for filename in os.listdir(self.cookie_jar_path):
|
126 |
+
if filename.endswith('.json'):
|
127 |
+
domain = filename[:-5] # Remove .json
|
128 |
+
file_path = os.path.join(self.cookie_jar_path, filename)
|
129 |
+
with open(file_path, 'r') as f:
|
130 |
+
try:
|
131 |
+
cookie_data = json.load(f)
|
132 |
+
self.cookies_by_domain[domain] = cookie_data
|
133 |
+
except json.JSONDecodeError:
|
134 |
+
logger.warning(f"Invalid cookie file for {domain}, skipping")
|
135 |
+
except Exception as e:
|
136 |
+
logger.error(f"Error loading cookies: {e}")
|
137 |
+
|
138 |
+
def save_cookies(self):
|
139 |
+
"""Save cookies to storage"""
|
140 |
+
for domain, cookies in self.cookies_by_domain.items():
|
141 |
+
file_path = os.path.join(self.cookie_jar_path, f"{domain}.json")
|
142 |
+
try:
|
143 |
+
with open(file_path, 'w') as f:
|
144 |
+
json.dump(cookies, f)
|
145 |
+
except Exception as e:
|
146 |
+
logger.error(f"Error saving cookies for {domain}: {e}")
|
147 |
+
|
148 |
+
def update_cookies(self, url, response_cookies):
|
149 |
+
"""Update cookies from a response"""
|
150 |
+
domain = urlparse(url).netloc
|
151 |
+
|
152 |
+
if domain not in self.cookies_by_domain:
|
153 |
+
self.cookies_by_domain[domain] = {}
|
154 |
+
|
155 |
+
# Update with new cookies
|
156 |
+
for name, value in response_cookies.items():
|
157 |
+
self.cookies_by_domain[domain][name] = value
|
158 |
+
|
159 |
+
# Save updated cookies
|
160 |
+
self.save_cookies()
|
161 |
+
|
162 |
+
def get_cookies_for_url(self, url):
|
163 |
+
"""Get cookies for a specific URL"""
|
164 |
+
domain = urlparse(url).netloc
|
165 |
+
return self.cookies_by_domain.get(domain, {})
|
166 |
+
|
167 |
+
def clear_cookies_for_domain(self, domain):
|
168 |
+
"""Clear cookies for a specific domain"""
|
169 |
+
if domain in self.cookies_by_domain:
|
170 |
+
del self.cookies_by_domain[domain]
|
171 |
+
file_path = os.path.join(self.cookie_jar_path, f"{domain}.json")
|
172 |
+
if os.path.exists(file_path):
|
173 |
+
os.remove(file_path)
|
174 |
+
|
175 |
+
|
176 |
+
class NitterBypass:
|
177 |
+
"""Advanced Nitter rate limit bypass system"""
|
178 |
+
|
179 |
+
def __init__(self, fingerprint_randomizer, cookie_manager):
|
180 |
+
# Expanded list of Nitter instances for better rotation
|
181 |
+
self.instances = [
|
182 |
+
"https://nitter.net",
|
183 |
+
"https://nitter.lacontrevoie.fr",
|
184 |
+
"https://nitter.1d4.us",
|
185 |
+
"https://nitter.poast.org",
|
186 |
+
"https://nitter.unixfox.eu",
|
187 |
+
"https://nitter.kavin.rocks",
|
188 |
+
"https://nitter.privacydev.net",
|
189 |
+
"https://nitter.projectsegfau.lt",
|
190 |
+
"https://nitter.pussthecat.org",
|
191 |
+
"https://nitter.42l.fr",
|
192 |
+
"https://nitter.fdn.fr",
|
193 |
+
"https://nitter.cz",
|
194 |
+
"https://bird.habedieehre.com",
|
195 |
+
"https://tweet.lambda.dance",
|
196 |
+
"https://nitter.cutelab.space",
|
197 |
+
"https://nitter.fly.dev",
|
198 |
+
"https://notabird.site",
|
199 |
+
"https://nitter.weiler.dev",
|
200 |
+
"https://nitter.sethforprivacy.com",
|
201 |
+
"https://nitter.mask.sh",
|
202 |
+
"https://nitter.space",
|
203 |
+
"https://nitter.hu",
|
204 |
+
"https://nitter.moomoo.me",
|
205 |
+
"https://nitter.grimneko.de",
|
206 |
+
]
|
207 |
+
|
208 |
+
self.fingerprint_randomizer = fingerprint_randomizer
|
209 |
+
self.cookie_manager = cookie_manager
|
210 |
+
|
211 |
+
# Tracking usage statistics per instance
|
212 |
+
self.usage_counts = {instance: 0 for instance in self.instances}
|
213 |
+
self.success_counts = {instance: 0 for instance in self.instances}
|
214 |
+
self.failure_counts = {instance: 0 for instance in self.instances}
|
215 |
+
self.response_times = {instance: [] for instance in self.instances}
|
216 |
+
|
217 |
+
# Track banned instances with timeout
|
218 |
+
self.banned_instances = set()
|
219 |
+
self.banned_time = {}
|
220 |
+
self.ban_duration = 3600 # Default 1 hour ban time
|
221 |
+
|
222 |
+
# Client collection, one per instance
|
223 |
+
self.clients = {}
|
224 |
+
|
225 |
+
# Request flow control
|
226 |
+
self.last_request_time = 0
|
227 |
+
self.min_request_interval = 2.0
|
228 |
+
self.request_jitter = True # Add random jitter to requests
|
229 |
+
|
230 |
+
# Dynamic proxy rotation (if available)
|
231 |
+
self.proxies = self._load_proxies()
|
232 |
+
self.proxy_index = 0
|
233 |
+
|
234 |
+
def _load_proxies(self):
|
235 |
+
"""Load proxy list if available"""
|
236 |
+
proxies = []
|
237 |
+
try:
|
238 |
+
proxy_file = os.path.join(os.path.dirname(__file__), 'proxies.txt')
|
239 |
+
if os.path.exists(proxy_file):
|
240 |
+
with open(proxy_file, 'r') as f:
|
241 |
+
for line in f:
|
242 |
+
line = line.strip()
|
243 |
+
if line and not line.startswith('#'):
|
244 |
+
proxies.append(line)
|
245 |
+
logger.info(f"Loaded {len(proxies)} proxies")
|
246 |
+
except Exception as e:
|
247 |
+
logger.error(f"Error loading proxies: {e}")
|
248 |
+
return proxies
|
249 |
+
|
250 |
+
def _get_next_proxy(self):
|
251 |
+
"""Get next proxy in rotation"""
|
252 |
+
if not self.proxies:
|
253 |
+
return None
|
254 |
+
|
255 |
+
proxy = self.proxies[self.proxy_index]
|
256 |
+
self.proxy_index = (self.proxy_index + 1) % len(self.proxies)
|
257 |
+
return proxy
|
258 |
+
|
259 |
+
async def initialize(self):
|
260 |
+
"""Initialize Nitter bypass system"""
|
261 |
+
# Create clients for each instance
|
262 |
+
for instance in self.instances:
|
263 |
+
await self._initialize_client(instance)
|
264 |
+
|
265 |
+
# Test instances to determine which are working
|
266 |
+
await self._test_instances()
|
267 |
+
|
268 |
+
async def _initialize_client(self, instance):
|
269 |
+
"""Create an HTTP client for an instance"""
|
270 |
+
headers = self.fingerprint_randomizer.generate_headers()
|
271 |
+
|
272 |
+
# Get proxy if available
|
273 |
+
proxy = self._get_next_proxy()
|
274 |
+
proxies = {"all://": proxy} if proxy else None
|
275 |
+
|
276 |
+
# Create client with unique settings for this instance
|
277 |
+
self.clients[instance] = httpx.AsyncClient(
|
278 |
+
timeout=30.0,
|
279 |
+
follow_redirects=True,
|
280 |
+
headers=headers,
|
281 |
+
http2=True,
|
282 |
+
limits=httpx.Limits(max_connections=5, max_keepalive_connections=2),
|
283 |
+
proxies=proxies
|
284 |
+
)
|
285 |
+
|
286 |
+
# Initialize with cookies if we have any
|
287 |
+
domain = urlparse(instance).netloc
|
288 |
+
cookies = self.cookie_manager.get_cookies_for_url(instance)
|
289 |
+
if cookies:
|
290 |
+
for name, value in cookies.items():
|
291 |
+
self.clients[instance].cookies.set(name, value, domain=domain)
|
292 |
+
|
293 |
+
async def _test_instances(self):
|
294 |
+
"""Test all instances to check availability"""
|
295 |
+
for instance in self.instances:
|
296 |
+
try:
|
297 |
+
start_time = time.time()
|
298 |
+
client = self.clients[instance]
|
299 |
+
|
300 |
+
# Add custom parameter to avoid caches
|
301 |
+
params = {"_": str(int(time.time()))}
|
302 |
+
|
303 |
+
response = await client.get(f"{instance}/", params=params, timeout=5.0)
|
304 |
+
end_time = time.time()
|
305 |
+
|
306 |
+
if response.status_code == 200:
|
307 |
+
logger.debug(f"Instance {instance} is available, response time: {end_time - start_time:.2f}s")
|
308 |
+
|
309 |
+
# Update cookies from response
|
310 |
+
self.cookie_manager.update_cookies(instance, dict(client.cookies))
|
311 |
+
|
312 |
+
# Track response time for prioritization
|
313 |
+
self.response_times[instance].append(end_time - start_time)
|
314 |
+
if len(self.response_times[instance]) > 5:
|
315 |
+
self.response_times[instance].pop(0) # Keep only last 5 measurements
|
316 |
+
|
317 |
+
else:
|
318 |
+
logger.warning(f"Instance {instance} returned status {response.status_code}")
|
319 |
+
if response.status_code in [429, 403, 503]:
|
320 |
+
self.banned_instances.add(instance)
|
321 |
+
self.banned_time[instance] = time.time()
|
322 |
+
except Exception as e:
|
323 |
+
logger.warning(f"Instance {instance} test failed: {e}")
|
324 |
+
self.banned_instances.add(instance)
|
325 |
+
self.banned_time[instance] = time.time()
|
326 |
+
|
327 |
+
# Add delay between tests
|
328 |
+
await asyncio.sleep(random.uniform(0.5, 1.5))
|
329 |
+
|
330 |
+
def _get_best_instance(self):
|
331 |
+
"""Select the best instance based on health metrics"""
|
332 |
+
now = time.time()
|
333 |
+
|
334 |
+
# Unban instances that have served their time
|
335 |
+
for instance in list(self.banned_instances):
|
336 |
+
if instance in self.banned_time and now - self.banned_time[instance] > self.ban_duration:
|
337 |
+
self.banned_instances.remove(instance)
|
338 |
+
logger.info(f"Unbanned instance {instance} after timeout")
|
339 |
+
|
340 |
+
# Get available instances
|
341 |
+
available = [i for i in self.instances if i not in self.banned_instances]
|
342 |
+
if not available:
|
343 |
+
# If all are banned, try the least recently banned one
|
344 |
+
if self.banned_time:
|
345 |
+
instance = min(self.banned_time.items(), key=lambda x: x[1])[0]
|
346 |
+
logger.warning(f"All instances banned, trying least recent: {instance}")
|
347 |
+
return instance
|
348 |
+
else:
|
349 |
+
# Fallback to any instance
|
350 |
+
return random.choice(self.instances)
|
351 |
+
|
352 |
+
# Calculate a health score for each instance
|
353 |
+
health_scores = {}
|
354 |
+
for instance in available:
|
355 |
+
# Base score
|
356 |
+
score = 100
|
357 |
+
|
358 |
+
# Adjust for success rate
|
359 |
+
total_requests = self.success_counts[instance] + self.failure_counts[instance]
|
360 |
+
if total_requests > 0:
|
361 |
+
success_rate = self.success_counts[instance] / total_requests
|
362 |
+
score *= (0.5 + 0.5 * success_rate) # Weight success rate as 50% of score
|
363 |
+
|
364 |
+
# Adjust for response time
|
365 |
+
if self.response_times[instance]:
|
366 |
+
avg_response_time = sum(self.response_times[instance]) / len(self.response_times[instance])
|
367 |
+
# Faster responses get higher scores (up to 1.5x bonus for fast responses)
|
368 |
+
speed_factor = min(1.5, max(0.5, 1.0 / (avg_response_time / 2)))
|
369 |
+
score *= speed_factor
|
370 |
+
|
371 |
+
# Adjust for usage count (prefer less used instances)
|
372 |
+
usage_penalty = min(0.9, 0.5 + 0.5 / (1 + self.usage_counts[instance] / 5))
|
373 |
+
score *= usage_penalty
|
374 |
+
|
375 |
+
health_scores[instance] = score
|
376 |
+
|
377 |
+
# Select from top 3 instances with probability weighted by health score
|
378 |
+
top_instances = sorted(health_scores.items(), key=lambda x: x[1], reverse=True)[:3]
|
379 |
+
if not top_instances:
|
380 |
+
return random.choice(available)
|
381 |
+
|
382 |
+
# Extract instances and scores
|
383 |
+
instances = [i[0] for i in top_instances]
|
384 |
+
scores = [i[1] for i in top_instances]
|
385 |
+
|
386 |
+
# Normalize scores for weighted random selection
|
387 |
+
total_score = sum(scores)
|
388 |
+
if total_score > 0:
|
389 |
+
probabilities = [score / total_score for score in scores]
|
390 |
+
chosen = random.choices(instances, weights=probabilities, k=1)[0]
|
391 |
+
else:
|
392 |
+
chosen = random.choice(instances)
|
393 |
+
|
394 |
+
# Update usage count
|
395 |
+
self.usage_counts[chosen] += 1
|
396 |
+
|
397 |
+
return chosen
|
398 |
+
|
399 |
+
    async def request(self, path, params=None):
        """Make an intelligent request to a Nitter instance"""
        if params is None:
            params = {}

        # Add random parameter to avoid caching
        params["_nonce"] = str(random.randint(10000, 99999999))

        # Rate limiting with jitter
        now = time.time()
        since_last = now - self.last_request_time
        if since_last < self.min_request_interval:
            if self.request_jitter:
                # Add jitter to make request patterns less predictable
                jitter = random.uniform(1.0, 3.0)
                delay = self.min_request_interval - since_last + jitter
            else:
                delay = self.min_request_interval - since_last
            await asyncio.sleep(delay)

        # Get the best instance
        instance = self._get_best_instance()
        client = self.clients[instance]

        # Update headers with new fingerprint to avoid detection
        client.headers.update(self.fingerprint_randomizer.generate_headers())

        # Update cookies
        domain = urlparse(instance).netloc
        cookies = self.cookie_manager.get_cookies_for_url(instance)
        for name, value in cookies.items():
            client.cookies.set(name, value, domain=domain)

        # Update request timestamp
        self.last_request_time = time.time()

        url = f"{instance}{path}"

        try:
            # Make the request with timing
            start_time = time.time()
            response = await client.get(url, params=params)
            end_time = time.time()
            response_time = end_time - start_time

            # Update cookies from response
            if len(response.cookies) > 0:
                self.cookie_manager.update_cookies(url, dict(response.cookies))

            # Update response time tracking
            self.response_times[instance].append(response_time)
            if len(self.response_times[instance]) > 5:
                self.response_times[instance].pop(0)

            # Handle response based on status code
            if response.status_code == 200:
                # Success
                self.success_counts[instance] += 1
                return response
            elif response.status_code in [429, 403, 503]:
                # Rate limited or banned
                logger.warning(f"Rate limit detected on {instance}: {response.status_code}")
                self.failure_counts[instance] += 1
                self.banned_instances.add(instance)
                self.banned_time[instance] = time.time()

                # Different ban durations based on response
                if response.status_code == 429:  # Rate limited
                    self.ban_duration = min(self.ban_duration * 2, 7200)  # Max 2 hour ban, increasing
                else:  # Other error
                    self.ban_duration = 1800  # 30 minute ban

                # Retry with a different instance
                return await self.request(path, params)
            else:
                # Other error
                logger.error(f"Error with {instance}: HTTP {response.status_code}")
                self.failure_counts[instance] += 1

                # Don't immediately ban for non-rate-limit errors
                if self.failure_counts[instance] > 3:  # After 3 failures, ban temporarily
                    self.banned_instances.add(instance)
                    self.banned_time[instance] = time.time()
                    self.ban_duration = 900  # 15 minute ban

                # Retry with a different instance
                return await self.request(path, params)

        except httpx.HTTPError as e:
            logger.error(f"HTTP error with {instance}: {str(e)}")
            self.failure_counts[instance] += 1

            # Ban instance after HTTP errors
            self.banned_instances.add(instance)
            self.banned_time[instance] = time.time()

            # Retry with a different instance
            return await self.request(path, params)

        except Exception as e:
            logger.error(f"Error with {instance}: {str(e)}")
            self.failure_counts[instance] += 1

            # Ban instance after errors
            self.banned_instances.add(instance)
            self.banned_time[instance] = time.time()

            # Retry with a different instance
            return await self.request(path, params)

    async def close(self):
        """Close all HTTP clients"""
        for client in self.clients.values():
            await client.aclose()


class TwitterService:
    """Service for collecting tweets via web scraping using Nitter alternative frontends."""

    def __init__(self):
        self.cache_expiry = int(os.getenv("CACHE_EXPIRY_MINUTES", 120))

        # Initialize advanced components for rate limit bypass
        self.fingerprint_randomizer = FingerprintRandomizer()
        self.cookie_manager = CookieManager()
        self.nitter_bypass = None  # Will be initialized later

        # Enhanced cache with TTL and persistence
        self.tweet_cache_dir = os.path.join(os.path.dirname(__file__), ".tweet_cache")
        os.makedirs(self.tweet_cache_dir, exist_ok=True)
        self.in_memory_cache = TTLCache(maxsize=100, ttl=self.cache_expiry * 60)

        # Statistics and monitoring
        self.stats = {
            "requests": 0,
            "cache_hits": 0,
            "rate_limits": 0,
            "errors": 0,
            "success": 0
        }
        self.last_stats_reset = time.time()

        # Default trusted news sources - focused on India-Pakistan relations
        self.news_sources = [
            NewsSource(name="Shiv Aroor", twitter_handle="ShivAroor", country="India", reliability_score=0.85),
            NewsSource(name="Sidhant Sibal", twitter_handle="sidhant", country="India", reliability_score=0.85),
            NewsSource(name="Indian Air Force", twitter_handle="IAF_MCC", country="India", reliability_score=0.95),
            NewsSource(name="Indian Army", twitter_handle="adgpi", country="India", reliability_score=0.95),
            NewsSource(name="Indian Defence Ministry", twitter_handle="SpokespersonMoD", country="India", reliability_score=0.95),
            NewsSource(name="MIB India", twitter_handle="MIB_India", country="India", reliability_score=0.95),
            NewsSource(name="Indian External Affairs Minister", twitter_handle="DrSJaishankar", country="India", reliability_score=0.95),
        ]

    async def initialize(self) -> bool:
        """Initialize the Twitter service."""
        try:
            logger.info("Initializing Twitter service with advanced bypass techniques")

            # Initialize the Nitter bypass engine
            self.nitter_bypass = NitterBypass(self.fingerprint_randomizer, self.cookie_manager)
            await self.nitter_bypass.initialize()

            # Schedule background health checks for instances
            asyncio.create_task(self._background_maintenance())

            logger.info("Twitter service initialized successfully with bypass capabilities")
            return True

        except Exception as e:
            logger.error(f"Failed to initialize Twitter service: {str(e)}")
            return False

    async def _background_maintenance(self):
        """Run background maintenance tasks"""
        while True:
            try:
                # Wait between maintenance cycles
                await asyncio.sleep(900)  # 15 minutes

                # Log statistics
                self._log_statistics()

                # Clean up cache files
                self._cleanup_expired_cache()

                # Reset statistics periodically
                if time.time() - self.last_stats_reset > 3600:  # Reset every hour
                    self.stats = {key: 0 for key in self.stats}
                    self.last_stats_reset = time.time()

            except Exception as e:
                logger.error(f"Error in background maintenance: {str(e)}")

    def _log_statistics(self):
        """Log service statistics"""
        total_requests = max(1, self.stats["requests"])
        cache_hit_rate = self.stats["cache_hits"] / total_requests * 100
        error_rate = (self.stats["errors"] + self.stats["rate_limits"]) / total_requests * 100

        logger.info(f"TwitterService stats - Requests: {total_requests}, " +
                    f"Cache hits: {self.stats['cache_hits']} ({cache_hit_rate:.1f}%), " +
                    f"Rate limits: {self.stats['rate_limits']}, " +
                    f"Errors: {self.stats['errors']} ({error_rate:.1f}%)")

    def _cleanup_expired_cache(self):
        """Clean up expired cache files"""
        now = time.time()
        expiry_time = self.cache_expiry * 60

        try:
            for filename in os.listdir(self.tweet_cache_dir):
                if not filename.endswith('.json'):
                    continue

                file_path = os.path.join(self.tweet_cache_dir, filename)

                try:
                    file_modified_time = os.path.getmtime(file_path)
                    if now - file_modified_time > expiry_time:
                        os.remove(file_path)
                        logger.debug(f"Removed expired cache file: {filename}")
                except Exception as e:
                    logger.error(f"Error cleaning up cache file {filename}: {e}")
        except Exception as e:
            logger.error(f"Error during cache cleanup: {e}")

    def _get_cache_path(self, key):
        """Get filesystem path for a cache key"""
        # Create a safe filename from the cache key
        safe_key = re.sub(r'[^a-zA-Z0-9_-]', '_', key)
        return os.path.join(self.tweet_cache_dir, f"{safe_key}.json")

    def _get_from_cache(self, cache_key):
        """Get tweets from cache (memory or disk)"""
        # Check memory cache first
        if cache_key in self.in_memory_cache:
            self.stats["cache_hits"] += 1
            return self.in_memory_cache[cache_key]

        # Check disk cache
        cache_path = self._get_cache_path(cache_key)
        if os.path.exists(cache_path):
            try:
                with open(cache_path, 'r') as f:
                    cache_data = json.load(f)

                # Check if cache is still valid
                if time.time() - cache_data['timestamp'] < self.cache_expiry * 60:
                    # Convert dictionaries back to Tweet objects
                    tweets = []
                    for tweet_dict in cache_data['tweets']:
                        # Parse created_at back to datetime if it's stored as a string
                        if 'created_at' in tweet_dict and isinstance(tweet_dict['created_at'], str):
                            try:
                                tweet_dict['created_at'] = datetime.fromisoformat(tweet_dict['created_at'])
                            except ValueError:
                                tweet_dict['created_at'] = datetime.now()

                        tweets.append(Tweet(**tweet_dict))

                    # Restore to memory cache and return
                    self.in_memory_cache[cache_key] = tweets
                    self.stats["cache_hits"] += 1
                    return tweets
                else:
                    # Cache expired, remove file
                    os.remove(cache_path)
            except Exception as e:
                logger.error(f"Error reading cache file {cache_path}: {e}")

        return None

    def _save_to_cache(self, cache_key, tweets):
        """Save tweets to cache (memory and disk)"""
        # Save to memory cache
        self.in_memory_cache[cache_key] = tweets

        # Convert tweets to dictionaries for JSON serialization
        tweet_dicts = []
        for tweet in tweets:
            tweet_dicts.append({
                'id': tweet.id,
                'text': tweet.text,
                'author': tweet.author,
                'created_at': tweet.created_at.isoformat() if hasattr(tweet.created_at, 'isoformat') else str(tweet.created_at),
                'engagement': tweet.engagement,
                'url': tweet.url
            })

        # Save to disk cache
        cache_path = self._get_cache_path(cache_key)
        try:
            with open(cache_path, 'w') as f:
                json.dump({
                    'tweets': tweet_dicts,
                    'timestamp': time.time()
                }, f)
        except Exception as e:
            logger.error(f"Error writing to cache file {cache_path}: {e}")

    async def get_tweets_from_source(self, source: NewsSource, limit: int = 20, retries: int = 3) -> List[Tweet]:
        """Get tweets from a specific Twitter source using advanced bypass techniques."""
        cache_key = f"{source.twitter_handle}_{limit}"

        # Check cache first
        cached_tweets = self._get_from_cache(cache_key)
        if cached_tweets:
            logger.debug(f"Returning cached tweets for {source.twitter_handle}")
            return cached_tweets

        self.stats["requests"] += 1

        # Extract tweets with retry logic
        all_attempts = retries + 1
        tweets = []

        for attempt in range(all_attempts):
            try:
                logger.info(f"Fetching tweets from {source.twitter_handle} (attempt {attempt + 1}/{all_attempts})")

                # Build path with randomization to avoid caching patterns
                path = f"/{source.twitter_handle}"
                params = {
                    "f": "tweets",  # Filter to tweets only
                    "r": str(random.randint(10000, 99999))  # Random param to bypass caches
                }

                # Get the response using our bypass system
                response = await self.nitter_bypass.request(path, params)

                if response.status_code == 200:
                    # Success - extract tweets
                    self.stats["success"] += 1

                    # Parse the HTML
                    soup = BeautifulSoup(response.text, "html.parser")

                    # Find tweet containers
                    tweet_containers = soup.select(".timeline-item")

                    for container in tweet_containers[:limit]:
                        try:
                            # Extract tweet ID from the permalink
                            permalink_element = container.select_one(".tweet-link")
                            if not permalink_element:
                                continue

                            permalink = permalink_element.get("href", "")
                            tweet_id = permalink.split("/")[-1]

                            # Extract tweet text
                            text_element = container.select_one(".tweet-content")
                            tweet_text = text_element.get_text().strip() if text_element else ""

                            # Extract timestamp
                            time_element = container.select_one(".tweet-date")
                            timestamp = time_element.find("a").get("title") if time_element and time_element.find("a") else None

                            if timestamp:
                                try:
                                    created_at = datetime.strptime(timestamp, "%d/%m/%Y, %H:%M:%S")
                                except ValueError:
                                    created_at = datetime.now()
                            else:
                                created_at = datetime.now()

                            # Extract engagement metrics
                            stats_container = container.select_one(".tweet-stats")
                            engagement = {"likes": 0, "retweets": 0, "replies": 0, "views": 0}

                            if stats_container:
                                stats = stats_container.select(".icon-container")
                                for stat in stats:
                                    stat_text = stat.get_text().strip()
                                    if "retweet" in stat.get("class", []):
                                        engagement["retweets"] = self._parse_count(stat_text)
                                    elif "heart" in stat.get("class", []):
                                        engagement["likes"] = self._parse_count(stat_text)
                                    elif "comment" in stat.get("class", []):
                                        engagement["replies"] = self._parse_count(stat_text)

                            tweet_url = f"https://x.com/{source.twitter_handle}/status/{tweet_id}"

                            tweets.append(
                                Tweet(
                                    id=tweet_id,
                                    text=tweet_text,
                                    author=source.twitter_handle,
                                    created_at=created_at,
                                    engagement=engagement,
                                    url=tweet_url
                                )
                            )
                        except Exception as e:
                            logger.error(f"Error processing tweet from {source.twitter_handle}: {str(e)}")

                    # Cache the results
                    if tweets:
                        self._save_to_cache(cache_key, tweets)
                        logger.info(f"Fetched and cached {len(tweets)} tweets from {source.twitter_handle}")

                    return tweets

                elif response.status_code == 429:
                    # Rate limited
                    self.stats["rate_limits"] += 1
                    logger.warning(f"Rate limited (429) when fetching tweets from {source.twitter_handle}")

                    if attempt < retries:
                        backoff_time = min(30 * (2 ** attempt), 300)  # Exponential backoff, max 5 minutes
                        logger.info(f"Retrying in {backoff_time}s...")
                        await asyncio.sleep(backoff_time)
                    else:
                        logger.error(f"Failed to fetch tweets from {source.twitter_handle} after {retries} retries: HTTP 429")
                        return []

                else:
                    # Other error
                    self.stats["errors"] += 1
                    logger.error(f"Failed to fetch tweets from {source.twitter_handle}: HTTP {response.status_code}")

                    if attempt < retries:
                        await asyncio.sleep(5)
                        continue
                    else:
                        return []

            except Exception as e:
                self.stats["errors"] += 1
                logger.error(f"Error fetching tweets from {source.twitter_handle}: {str(e)}")

                if attempt < retries:
                    await asyncio.sleep(5)
                    continue

        return []  # Return empty list if all retries failed

    def _parse_count(self, count_text: str) -> int:
        """Parse count text like '1.2K' into integer value."""
        try:
            count_text = count_text.strip()
            if not count_text:
                return 0

            if 'K' in count_text:
                return int(float(count_text.replace('K', '')) * 1000)
            elif 'M' in count_text:
                return int(float(count_text.replace('M', '')) * 1000000)
            else:
                return int(count_text)
        except (ValueError, TypeError):
            return 0

    async def get_related_tweets(self, keywords: List[str], days_back: int = 2) -> List[Tweet]:
        """
        Get tweets related to specific keywords from trusted news sources only.
        Uses intelligent batching and failover strategies.
        """
        all_tweets = []
        cutoff_date = datetime.now() - timedelta(days=days_back)

        # Process sources in smaller batches with smart ordering
        active_sources = [source for source in self.news_sources if source.is_active]

        # Sort sources by reliability score (prioritize higher scores)
        active_sources.sort(key=lambda s: s.reliability_score, reverse=True)

        # Dynamic batch size - larger when we have fewer sources to optimize throughput
        source_count = len(active_sources)
        batch_size = max(1, min(3, 10 // source_count if source_count > 0 else 3))

        logger.info(f"Collecting tweets from {len(active_sources)} trusted news sources")

        for i in range(0, len(active_sources), batch_size):
            batch_sources = active_sources[i:i+batch_size]

            # Process batch with smart concurrency
            tasks = []
            for source in batch_sources:
                # Adaptive limit based on source reliability
                fetch_limit = int(50 * min(1.5, source.reliability_score))
                tasks.append(self.get_tweets_from_source(source, limit=fetch_limit))

            source_tweets_list = await asyncio.gather(*tasks)

            # Process batch results
            batch_tweets = []
            for source_tweets in source_tweets_list:
                # Filter tweets by keywords and date
                for tweet in source_tweets:
                    if (tweet.created_at >= cutoff_date and
                            any(keyword.lower() in tweet.text.lower() for keyword in keywords)):
                        batch_tweets.append(tweet)

            all_tweets.extend(batch_tweets)

            # Dynamic delay between batches based on results
            # If we got fewer tweets than expected, slow down more
            batch_delay = random.uniform(2.0, 5.0)
            if len(batch_tweets) < batch_size * 3:  # Fewer than 3 tweets per source
                batch_delay += random.uniform(3.0, 7.0)  # Add extra delay

            await asyncio.sleep(batch_delay)

        # If we have very few results, try with more relaxed filtering
        if len(all_tweets) < 5 and active_sources:
            logger.info("Few relevant tweets found, trying more relaxed filtering")

            # Take top 3 most reliable sources
            key_sources = active_sources[:min(3, len(active_sources))]
            tasks = [self.get_tweets_from_source(source, limit=100, retries=5) for source in key_sources]
            more_tweets_list = await asyncio.gather(*tasks)

            # Process with more relaxed keyword matching
            for source_tweets in more_tweets_list:
                for tweet in source_tweets:
                    # Use partial keyword matching
                    if tweet.created_at >= cutoff_date:
                        for keyword in keywords:
                            # Split keyword into parts and check if any part matches
                            keyword_parts = keyword.lower().split()
                            if any(part in tweet.text.lower() for part in keyword_parts if len(part) > 3):
                                if tweet.id not in [t.id for t in all_tweets]:
                                    all_tweets.append(tweet)
                                break

        # Sort by recency
        all_tweets.sort(key=lambda x: x.created_at, reverse=True)

        logger.info(f"Found {len(all_tweets)} tweets from trusted sources related to keywords: {keywords}")
        return all_tweets

    def update_sources(self, sources: List[NewsSource]) -> None:
        """Update the list of trusted news sources."""
        self.news_sources = sources
        # Clear cache when sources are updated
        self.in_memory_cache.clear()
        logger.info(f"Updated trusted news sources. New count: {len(sources)}")

    def get_sources(self) -> List[NewsSource]:
        """Get the current list of trusted news sources."""
        return self.news_sources

    async def close(self):
        """Clean up resources."""
        if self.nitter_bypass:
            await self.nitter_bypass.close()
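As a quick illustration of how the pieces above fit together, here is a minimal, hypothetical driver script (the keywords, printed fields, and entry point are only examples, not part of the service itself); it assumes twitter_service.py is importable from the working directory:

    import asyncio

    from twitter_service import TwitterService


    async def main():
        service = TwitterService()
        if not await service.initialize():
            raise RuntimeError("TwitterService failed to initialize")
        try:
            # Illustrative keywords; callers supply whatever terms they track.
            tweets = await service.get_related_tweets(["India", "Pakistan", "ceasefire"], days_back=2)
            for tweet in tweets[:5]:
                print(tweet.created_at, tweet.author, tweet.text[:80])
        finally:
            await service.close()


    asyncio.run(main())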
vercel.json
ADDED
@@ -0,0 +1,15 @@
{
  "version": 2,
  "builds": [
    {
      "src": "/app.py",
      "use": "@vercel/python"
    }
  ],
  "routes": [
    {
      "src": "/(.*)",
      "dest": "/app.py"
    }
  ]
}
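This config builds app.py with the @vercel/python runtime and rewrites every request path to it, leaving URL dispatch to the FastAPI app itself. The route "src" value is treated as a regular expression; a tiny sketch (the sample paths are illustrative) of what the catch-all pattern accepts:

    import re

    # "/(.*)" captures everything after the leading slash, so any path is forwarded to app.py.
    route_src = re.compile(r"/(.*)")
    for path in ["/", "/docs", "/any/nested/path"]:
        match = route_src.match(path)
        print(path, "->", repr(match.group(1)) if match else "no match")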