Feedback JM

#3
by Jaiaid123 - opened

Hello digopala,

Checked your architecture diagram and other files. Looks great for the initial attempt.

The input part is pretty clear about which protocol will be used to pass inference requests to the ML server. The ML server diagram makes it clear that it will be a Triton Inference Server. That is a solid choice to make from the beginning, since it is a state-of-the-art inference solution. It is also good to see Kubernetes chosen from the start, given the potential need for scalability in the future. However, it is not clear why the replicas field is set to 1.
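
For reference, this is roughly how I am reading your setup; I do not have your exact k8s.yaml in front of me, so the names, image tag, and ports below are my guesses, not your actual values:

```yaml
# Minimal sketch of how I am reading your k8s.yaml (assumed names/image/ports)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server
spec:
  replicas: 1            # <-- this is the field I am asking about
  selector:
    matchLabels:
      app: triton
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.01-py3   # assumed image tag
          ports:
            - containerPort: 8000   # HTTP
            - containerPort: 8001   # gRPC
            - containerPort: 8002   # metrics
```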

Unfortunately, I am unclear about the rest of the diagram. I have the following comments and suggestions:

Regarding Diagram

  1. All the arrows should have annotations. For example, the arrows on the input side and from the model-load side give the impression that they indicate data flow. But if that is the case, why are the output receiver entities shown while the input entities are not? Who will be the user of this system?

  2. Why does the Pharma industry entity have an undirected connection to Kubernetes (which I guess is used to spawn a new inference server instance in a pod for scalability)? For consistency and clarity, I suggest annotating all the arrows with the triggering event and giving each arrow a direction.

Regarding YAMLs

  1. In k8s.yaml, the replicas field is set to 1. If I am not mistaken, that ensures only one pod exists (and is respawned if it fails for some reason). How does that provide scalability? Isn't it only doing auto-restart? There should also be a HorizontalPodAutoscaler configuration file, and replicas set to 2 or more for fault tolerance, right? See the sketch below.

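To make the suggestion in point 1 concrete, this is roughly what I mean; the deployment name, replica bounds, and utilization threshold are placeholders you would tune for your workload, not recommendations:

```yaml
# Rough sketch of the suggestion in point 1: keep at least 2 replicas for
# fault tolerance and let an HPA scale beyond that as load grows.
# All numbers here are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-inference-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference-server   # must match the Deployment name in your k8s.yaml
  minReplicas: 2        # >= 2 so a single pod failure does not take the service down
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that once an HPA manages the Deployment, it is common to either drop the replicas field from the Deployment or set it equal to minReplicas, since the autoscaler takes over the pod count.
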
Thank you,
Good luck with designing your system :)

