BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
CALSCALE:GREGORIAN
PRODID:-//WordPress - MECv7.28.0//EN
X-ORIGINAL-URL:https://stackconf.eu/
X-WR-CALNAME:stackconf
X-WR-CALDESC:Cloud Native Infrastructure Solutions
X-WR-TIMEZONE:Europe/Berlin
BEGIN:VTIMEZONE
TZID:Europe/Berlin
X-LIC-LOCATION:Europe/Berlin
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T020000
RRULE:FREQ=YEARLY;BYMONTH=03;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-PUBLISHED-TTL:PT1H
X-MS-OLK-FORCEINSPECTOROPEN:TRUE
BEGIN:VEVENT
CLASS:PUBLIC
UID:MEC-16b0ad329cb228a9007353d31da0dad5@stackconf.eu
DTSTART;TZID=Europe/Berlin:20260428T091500
DTEND;TZID=Europe/Berlin:20260428T094500
DTSTAMP:20251125T115403Z
CREATED:20251125
LAST-MODIFIED:20260327
PRIORITY:5
SEQUENCE:5
TRANSP:OPAQUE
SUMMARY:Combining Kubernetes and vLLM to Deliver Scalable\, Distributed Inference with llm-d
DESCRIPTION:Effectively managing and scaling modern AI applications requires rethinking how we use Kubernetes\, as traditional load balancing and scheduling fall short for diverse inference workloads. This session introduces llm-d\, a joint open-source\, Kubernetes-native framework designed specifically for distributed LLM inference. We will explore the core challenges of scalability\, reliability\, and hardware mapping in AI\, and demonstrate how llm-d solves them through intelligent inference scheduling and a pluggable architecture.\nBy moving beyond standard Kubernetes primitives\, llm-d optimizes performance across various hardware accelerators while avoiding vendor lock-in. We will dive into the lifecycle of an inference request and explore advanced production techniques\, including precise KV-cache-aware routing and Prefill-Decode disaggregation. Join us to learn how llm-d’s specialized worker topologies maximize GPU utilization and how this collaborative project is reshaping the future of high-performance\, cost-effective AI deployments.
URL:https://stackconf.eu/talks/cloud-native-model-serving-vllms-lifecycle-in-kubernetes/
END:VEVENT
END:VCALENDAR