digopala committed on
Commit e3674cb · verified · 1 Parent(s): 7145d38

Upload 4 files


Add production-ready AI inference system assets for healthcare architecture

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ A_flowchart_in_the_image_illustrates_an_AI_inferen.png filter=lfs diff=lfs merge=lfs -text
A_flowchart_in_the_image_illustrates_an_AI_inferen.png ADDED

Git LFS Details

  • SHA256: e6c1d647d18f2bc6375b574c643189e0805eec7d5aeafaa3176b474b29e27f6c
  • Pointer size: 131 Bytes
  • Size of remote file: 639 kB
README.md CHANGED
@@ -1,12 +1,40 @@
- ---
- title: Ai Inference Architecture Healthcare
- emoji: 👁
- colorFrom: pink
- colorTo: gray
- sdk: static
- pinned: false
- license: apache-2.0
- short_description: AI Inference Architecture for Healthcare & LLMs
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # AI Inference Architecture for Healthcare
+
+ This project provides a scalable, production-ready AI inference architecture designed for healthcare and pharmaceutical applications. It integrates Triton Inference Server, FastAPI, Kubernetes, and Torch/ONNX models, allowing for secure, reliable, and fast deployment of AI workloads such as LLMs, image segmentation, or biomedical predictions.
+
+ ## Key Features
+
+ - Modular container-based architecture
+ - Routing layer using FastAPI or NGINX
+ - LLM model support via TorchScript / ONNX
+ - Optional user auth, billing hooks, and monitoring
+ - Designed for HIPAA-compliant environments
+
+ ## Deployment Options
+
+ - **Standalone (Local)**: via `docker-compose.yaml`
+ - **Production (Kubernetes)**: via `k8s.yaml`
+
+ ---
+
+ ## Quickstart (Docker Compose)
+
+ ```bash
+ docker compose up --build
+ ```
+
+ ## Kubernetes
+
+ ```bash
+ kubectl apply -f k8s.yaml
+ ```
+
+ ---
+
+ ## Who is this for?
+
+ Healthcare ML teams, pharma startups, or infrastructure engineers looking to fast-track AI deployment pipelines with production best practices.
+
+ ## License
+
+ Apache 2.0
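
The README references a FastAPI routing layer, but this commit ships only the containers; the application code mounted at `./app` is not included. Below is a minimal sketch of what `app/main.py` could look like, assuming Triton's KServe-v2 HTTP API served by the `inference` service on port 8000. The model name `segmentation`, the tensor name `INPUT__0`, and the `httpx` dependency are illustrative assumptions, not part of this commit.

```python
# Illustrative app/main.py -- NOT part of this commit. The compose file mounts
# ./app into tiangolo/uvicorn-gunicorn-fastapi, which looks for a FastAPI
# object named `app`. Model and tensor names below are placeholders.
import os
from typing import List

import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# "inference" is the Triton service name from docker-compose.yaml.
TRITON_URL = os.getenv("TRITON_URL", "http://inference:8000")

app = FastAPI(title="Healthcare AI inference router")


class PredictRequest(BaseModel):
    shape: List[int]     # e.g. [1, 3, 224, 224]
    inputs: List[float]  # flattened FP32 tensor data


@app.post("/predict/{model_name}")
async def predict(model_name: str, req: PredictRequest):
    # Build a KServe-v2 inference request for Triton's HTTP endpoint.
    payload = {
        "inputs": [{
            "name": "INPUT__0",  # must match the model's config.pbtxt
            "shape": req.shape,
            "datatype": "FP32",
            "data": req.inputs,
        }]
    }
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{TRITON_URL}/v2/models/{model_name}/infer", json=payload
        )
    if resp.status_code != 200:
        raise HTTPException(status_code=502, detail=resp.text)
    return resp.json()
```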
docker-compose.yaml ADDED
@@ -0,0 +1,19 @@
+ version: "3.9"
+ services:
+   inference:
+     image: nvcr.io/nvidia/tritonserver:23.03-py3
+     ports:
+       - "8000:8000"
+       - "8001:8001"
+     volumes:
+       - ./models:/models
+     command: [
+       "tritonserver",
+       "--model-repository=/models"
+     ]
+   api:
+     image: tiangolo/uvicorn-gunicorn-fastapi:python3.9
+     volumes:
+       - ./app:/app
+     ports:
+       - "8080:80"
k8s.yaml ADDED
@@ -0,0 +1,38 @@
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: triton-inference
+ spec:
+   replicas: 1
+   selector:
+     matchLabels:
+       app: triton
+   template:
+     metadata:
+       labels:
+         app: triton
+     spec:
+       containers:
+         - name: triton
+           image: nvcr.io/nvidia/tritonserver:23.03-py3
+           ports:
+             - containerPort: 8000
+           args: ["tritonserver", "--model-repository=/models"]
+           volumeMounts:
+             - mountPath: /models
+               name: model-volume
+       volumes:
+         - name: model-volume
+           emptyDir: {}
+ ---
+ apiVersion: v1
+ kind: Service
+ metadata:
+   name: triton-service
+ spec:
+   selector:
+     app: triton
+   ports:
+     - protocol: TCP
+       port: 80
+       targetPort: 8000
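
Two caveats with this manifest: the `emptyDir` volume starts empty, so models still have to be provisioned into the pod (for example via a PersistentVolumeClaim or an init container) before inference can work, and the Service is cluster-internal (ClusterIP by default). For local testing, `kubectl port-forward svc/triton-service 8000:80` maps it to localhost; from inside the cluster, a readiness probe could look like the sketch below, where the `default` namespace and the model name are assumptions.

```python
# In-cluster readiness check (illustrative). "triton-service" and port 80 come
# from k8s.yaml; the "default" namespace and the model name are assumptions.
# This returns False until models are actually provisioned into the emptyDir.
import requests

SERVICE = "http://triton-service.default.svc.cluster.local"


def model_ready(name: str) -> bool:
    # Triton's KServe-v2 per-model readiness endpoint returns 200 when loaded.
    resp = requests.get(f"{SERVICE}/v2/models/{name}/ready", timeout=5)
    return resp.status_code == 200


if __name__ == "__main__":
    server = requests.get(f"{SERVICE}/v2/health/ready", timeout=5)
    print("server ready:", server.status_code == 200)
    print("model ready:", model_ready("segmentation"))
```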