BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
CALSCALE:GREGORIAN
PRODID:-//WordPress - MECv7.28.0//EN
X-ORIGINAL-URL:https://stackconf.eu/
X-WR-CALNAME:stackconf
X-WR-CALDESC:Cloud Native Infrastructure Solutions
X-WR-TIMEZONE:Europe/Berlin
BEGIN:VTIMEZONE
TZID:Europe/Berlin
X-LIC-LOCATION:Europe/Berlin
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T020000
RRULE:FREQ=YEARLY;BYMONTH=03;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-PUBLISHED-TTL:PT1H
X-MS-OLK-FORCEINSPECTOROPEN:TRUE
BEGIN:VEVENT
CLASS:PUBLIC
UID:MEC-3e3dab60e3a76214aed4a502603e8636@stackconf.eu
DTSTART;TZID=Europe/Berlin:20260429T110000
DTEND;TZID=Europe/Berlin:20260429T113000
DTSTAMP:20251125T164247Z
CREATED:20251125T000000Z
LAST-MODIFIED:20260108T000000Z
PRIORITY:5
SEQUENCE:4
TRANSP:OPAQUE
SUMMARY:Stop Treating LLMs Like REST APIs
DESCRIPTION:Why do PoCs run smoothly while launch day implodes? Because LLM traffic is a streaming\, state-heavy beast that breaks every REST assumption: requests aren’t stateless\, payloads snowball with context\, and GPU memory melts under token floods. We’ll map the three checkpoints where most projects stall (context explosion\, batch backfires\, cache chaos) and show how LLM-D’s open-source sharding plus a hybrid NVIDIA/AMD node pool turns each choke point into a green light. You’ll see live before-and-after dashboards\, get a YAML ladder you can drop into any cluster\, and learn a back-of-the-napkin formula to keep cost per 1\,000 tokens under control.
URL:https://stackconf.eu/talks/stop-treating-llms-like-rest-apis/
END:VEVENT
END:VCALENDAR