digopala committed on
Commit e6dc6ea · verified · 1 Parent(s): 701495c

Upload 4 files

Files changed (4)
  1. README.md +33 -38
  2. docker-compose.yaml +13 -13
  3. hpa.yaml +18 -0
  4. k8s.yaml +12 -18
README.md CHANGED
@@ -1,58 +1,53 @@
- ---
- title: AI Inference Architecture for Healthcare
- emoji: 🧠
- colorFrom: blue
- colorTo: green
- sdk: static
- app_file: index.html
- pinned: false
- tags:
-   - healthcare
-   - docker
-   - fastapi
-   - kubernetes
-   - triton-inference-server
-   - llm-inference
-   - production-ready
- ---
-
  # AI Inference Architecture for Healthcare

- This project provides a scalable, production-ready AI inference architecture designed for healthcare and pharmaceutical applications. It integrates Triton Inference Server, FastAPI, Kubernetes, and Torch/ONNX models, allowing for secure, reliable, and fast deployment of AI workloads such as LLMs, image segmentation, or biomedical predictions.
-
- ## Key Features

- - Modular container-based architecture
- - Routing layer using FastAPI or NGINX
- - LLM model support via TorchScript / ONNX
- - Optional user auth, billing hooks, and monitoring
- - Designed for HIPAA-compliant environments

- ## Deployment Options

- - **Standalone (Local)**: via `docker-compose.yaml`
- - **Production (Kubernetes)**: via `k8s.yaml`

- ---

- ## Quickstart (Docker Compose)

  ```bash
  docker compose up --build
  ```

- ## Kubernetes
-
  ```bash
  kubectl apply -f k8s.yaml
  ```

- ---

- ## Who is this for?

- Healthcare ML teams, pharma startups, or infrastructure engineers looking to fast-track AI deployment pipelines with production best practices.

- ## License

- Apache 2.0

  # AI Inference Architecture for Healthcare

+ This project provides a scalable, production-ready AI inference architecture designed for healthcare and pharmaceutical applications. It integrates Triton Inference Server, FastAPI, and Kubernetes to support high-throughput model inference.

+ ## 🚀 Key Features

+ - Modular container-based architecture with a FastAPI gateway
+ - Supports NLP and CV models with optional preprocessing
+ - Inference via Triton Inference Server using ONNX or TorchScript models
+ - GitHub Actions-powered CI/CD pipeline to auto-deploy model updates
+ - Kubernetes-based pod management, autoscaling, and volume mounting
+ - Full observability stack: Prometheus + Grafana for metrics and monitoring
+ - Designed for HIPAA-aligned environments: secure APIs, audit logging, encryption

+ ## 🧱 Architecture Overview

+ ```
+ Healthcare/Pharma Clients → FastAPI Gateway → Optional Preprocessor → Triton Pod
+            ↓                       ↓                     ↓                 ↓
+    Model Registry  ←  GitHub CI/CD Pipeline  ←  Kubernetes  ←  Monitoring (Prometheus + Grafana)
+ ```

+ ## ⚙️ Deployment Options

+ ### ▶️ Local (Docker Compose)

  ```bash
  docker compose up --build
  ```

+ ### ☸️ Kubernetes (Production)

  ```bash
  kubectl apply -f k8s.yaml
+ kubectl apply -f hpa.yaml
  ```

+ ## 📦 Model Lifecycle
+
+ 1. Train the model locally or in a pipeline (e.g., PyTorch/ONNX)
+ 2. Push the model to a GitHub repository
+ 3. The GitHub Actions CI/CD pipeline triggers and pushes the model to the Model Registry
+ 4. Kubernetes mounts the model volume into the Triton pod
+ 5. Triton automatically reloads the model (see the poll-mode sketch below)
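Step 5 holds only if Triton is started in poll mode; a minimal sketch of the container args that would enable it, with the poll interval as an illustrative value:

```yaml
# Hypothetical excerpt of the Triton container spec in k8s.yaml.
# --model-control-mode=poll tells Triton to re-scan /models periodically,
# so a new model version dropped onto the mounted volume is loaded
# without restarting the pod. The 30s interval is an assumption.
containers:
  - name: triton
    image: nvcr.io/nvidia/tritonserver:22.10-py3
    args:
      - "tritonserver"
      - "--model-repository=/models"
      - "--model-control-mode=poll"
      - "--repository-poll-secs=30"
```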
 
+ ## 🔍 Monitoring and Observability
+
+ - Metrics via a Prometheus sidecar scraping port 8002 on the Triton pod
+ - Grafana dashboards tracking latency, throughput, and failure rates (a scrape-config sketch follows this list)
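A minimal Prometheus scrape job honoring the pod annotations added in k8s.yaml might look like this; the job name is an assumption:

```yaml
# Hypothetical prometheus.yml excerpt. Triton serves Prometheus-format
# metrics on port 8002 by default; the relabel rule keeps only pods that
# opted in via the prometheus.io/scrape: "true" annotation.
scrape_configs:
  - job_name: triton-metrics      # assumed job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```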
 
+ ## 🧪 Sample Inference Request
+
+ ```bash
+ curl -X POST http://localhost:8000/infer \
+   -H "Content-Type: application/json" \
+   -d '{"input": "Patient data or image here"}'
+ ```

docker-compose.yaml CHANGED
@@ -1,19 +1,19 @@
- version: "3.9"
  services:
-   inference:
-     image: nvcr.io/nvidia/tritonserver:23.03-py3
      ports:
        - "8000:8000"
        - "8001:8001"
      volumes:
        - ./models:/models
-     command: [
-       "tritonserver",
-       "--model-repository=/models"
-     ]
-   api:
-     image: tiangolo/uvicorn-gunicorn-fastapi:python3.9
-     volumes:
-       - ./app:/app
-     ports:
-       - "8080:80"

+ version: '3.8'
+
  services:
+   fastapi:
+     image: tiangolo/uvicorn-gunicorn-fastapi:python3.9
+     ports:
+       - "8000:80"
+     volumes:
+       - ./app:/app
+
+   triton:
+     image: nvcr.io/nvidia/tritonserver:22.10-py3
+     command: ["tritonserver", "--model-repository=/models"]
      ports:
+       - "8003:8000"   # host 8003: host port 8000 is already taken by the fastapi gateway
        - "8001:8001"
+       - "8002:8002"
      volumes:
        - ./models:/models
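Inside the compose network, the gateway reaches Triton by service name on the container port, not through the host mapping above. A hypothetical way to wire that up (the variable name, and the assumption that the app in ./app reads it, are illustrative):

```yaml
# Hypothetical addition under the fastapi service in docker-compose.yaml.
# Service-to-service traffic resolves the "triton" hostname and uses the
# container port (8000), regardless of the published host port.
    environment:
      - TRITON_URL=http://triton:8000   # assumed variable, read by ./app
```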
 
 
 
 
 
 
 
 
 
 
hpa.yaml ADDED
@@ -0,0 +1,18 @@
+ apiVersion: autoscaling/v2
+ kind: HorizontalPodAutoscaler
+ metadata:
+   name: triton-hpa
+ spec:
+   scaleTargetRef:
+     apiVersion: apps/v1
+     kind: Deployment
+     name: triton-deployment
+   minReplicas: 1
+   maxReplicas: 5
+   metrics:
+     - type: Resource
+       resource:
+         name: cpu
+         target:
+           type: Utilization
+           averageUtilization: 70
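One caveat: a CPU Utilization target is computed against the container's CPU request, and the Deployment in k8s.yaml does not declare one, so the HPA cannot act as written (it also needs metrics-server running in the cluster). A minimal sketch of the missing block, with all values illustrative:

```yaml
# Hypothetical resources block for the triton container in k8s.yaml;
# averageUtilization: 70 means 70% of the CPU request declared below.
resources:
  requests:
    cpu: "1"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 8Gi
```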
k8s.yaml CHANGED
@@ -1,7 +1,7 @@
  apiVersion: apps/v1
  kind: Deployment
  metadata:
-   name: triton-inference
  spec:
    replicas: 1
    selector:
@@ -11,28 +11,22 @@ spec:
      metadata:
        labels:
          app: triton
      spec:
        containers:
          - name: triton
-           image: nvcr.io/nvidia/tritonserver:23.03-py3
            ports:
              - containerPort: 8000
-           args: ["tritonserver", "--model-repository=/models"]
            volumeMounts:
-             - mountPath: /models
-               name: model-volume
        volumes:
          - name: model-volume
-           emptyDir: {}
- ---
- apiVersion: v1
- kind: Service
- metadata:
-   name: triton-service
- spec:
-   selector:
-     app: triton
-   ports:
-     - protocol: TCP
-       port: 80
-       targetPort: 8000

  apiVersion: apps/v1
  kind: Deployment
  metadata:
+   name: triton-deployment
  spec:
    replicas: 1
    selector:

      metadata:
        labels:
          app: triton
+       annotations:
+         prometheus.io/scrape: "true"
+         prometheus.io/port: "8002"
      spec:
        containers:
          - name: triton
+           image: nvcr.io/nvidia/tritonserver:22.10-py3
+           args: ["tritonserver", "--model-repository=/models"]
            ports:
              - containerPort: 8000
+             - containerPort: 8001
+             - containerPort: 8002
            volumeMounts:
+             - name: model-volume
+               mountPath: /models
        volumes:
          - name: model-volume
+           persistentVolumeClaim:
+             claimName: model-pvc
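The Deployment now mounts its model repository from a `model-pvc` claim that this commit does not define; a minimal sketch of what that claim might look like (access mode and size are assumptions):

```yaml
# Hypothetical PersistentVolumeClaim backing the Triton model volume;
# k8s.yaml references claimName: model-pvc but the claim is not included.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
    - ReadWriteOnce        # assumption: a single Triton replica mounts it
  resources:
    requests:
      storage: 10Gi        # illustrative capacity
```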