1 point | by uzunenes 12 hours ago
1 comment
I built this guide after struggling to find a complete tutorial for scaling Triton Inference Server based on GPU metrics.
It covers the full stack: NVIDIA Triton Inference Server + AI Model + DCGM Exporter + Prometheus + Kubernetes HPA.
Happy to answer any questions!
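For readers who want a feel for the HPA end of that stack, here is a minimal, hypothetical sketch of an autoscaling/v2 HorizontalPodAutoscaler that scales a Triton Deployment on average GPU utilization. It assumes a Deployment named "triton-server" and that the Prometheus Adapter exposes the DCGM Exporter metric DCGM_FI_DEV_GPU_UTIL as a per-pod custom metric; the names, replica bounds, and 80% target are placeholders, not values from the guide.

    # Hypothetical HPA sketch (assumes Prometheus Adapter serves
    # DCGM_FI_DEV_GPU_UTIL as a pods custom metric for the Triton pods)
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: triton-gpu-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: triton-server          # assumed Deployment name
      minReplicas: 1
      maxReplicas: 4
      metrics:
        - type: Pods
          pods:
            metric:
              name: DCGM_FI_DEV_GPU_UTIL   # GPU utilization from DCGM Exporter
            target:
              type: AverageValue
              averageValue: "80"           # scale out above ~80% average GPU utilization

The guide itself walks through how the metric gets from the GPU to the HPA (DCGM Exporter -> Prometheus -> Prometheus Adapter -> custom metrics API), so treat the above only as a shape of the final manifest.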