Better Living by Changing Less – IncrativeOps

DevOps has always been about dramatic changes to improve IT. You don’t just need a different set of tools; you need to change your entire IT culture! It’s all exhausting, really. Worse, this imperative to change never goes away. Will we ever actually be done and “be like Google”? Instead of carrying the flag of “change or die,” this talk proposes an alternative, more practical, sustainable, and comforting approach to improving: IncrativeOps.


Dynamic Image Optimization with imgproxy at Schwarz IT

Images account for 42% of the LCP (Largest Contentful Paint) elements across all websites. Yet in 2023 we still see oversized images delivered on websites: even on the Stackconf speakers page, I currently see a total size of 15 MB, including one 5 MB image. Even though the techniques to resize and optimize images have been available for quite some time, they are still not used everywhere. In this talk, I’ll give you an overview of the capabilities of the open-source image optimizer imgproxy. At Schwarz, we use it as a dynamic image optimization engine for our Digital Leaflets, delivered in all Lidl & Kaufland countries. imgproxy not only resizes images but also delivers modern image formats such as WebP and AVIF. Ironically, the app itself also scales up and down, in STACKIT Appcloud.
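imgproxy encodes the processing options and the source image in the URL path, and production deployments usually require signed URLs. The sketch below follows the signing scheme described in imgproxy’s documentation (HMAC-SHA256 over salt plus path, URL-safe Base64 without padding). The key, salt, and image URL are hypothetical placeholders, and this is not Schwarz IT’s implementation.

```python
import base64
import hashlib
import hmac

# Hypothetical hex-encoded key and salt; real values come from the
# IMGPROXY_KEY and IMGPROXY_SALT settings of your imgproxy deployment.
KEY = bytes.fromhex("943b421c9eb07c830af81030552c86009268de4e532ba2ee2eab8247c6da0881")
SALT = bytes.fromhex("520f986b998545b4785e0defbc4f3c1203f22de2374a3d53cb7a7fe9fea309c5")


def b64url(data: bytes) -> str:
    """URL-safe Base64 without '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def signed_path(source_url: str, width: int, height: int, fmt: str = "webp") -> str:
    """Build a signed imgproxy path that resizes the image and converts it to `fmt`."""
    # Processing options first, then the Base64-encoded source URL plus target extension.
    path = f"/rs:fit:{width}:{height}/{b64url(source_url.encode())}.{fmt}"
    digest = hmac.new(KEY, SALT + path.encode(), hashlib.sha256).digest()
    return f"/{b64url(digest)}{path}"


print(signed_path("https://example.com/leaflet/page-1.jpg", 600, 800, "avif"))
```

Prepend your imgproxy host to the returned path. Because the signature covers the options, clients cannot request arbitrary sizes or formats and make the server do unbounded work.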


How to reduce expenses on monitoring with VictoriaMetrics

Given recent economic changes, cost efficiency has become a top priority for many businesses. This is especially important for monitoring, because telemetry data tends to grow exponentially. Many monitoring solutions are now shifting their focus to cost optimization. The talk will cover open-source tools from the VictoriaMetrics ecosystem for improving the cost efficiency of monitoring.
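Much of a time-series database’s cost tracks the number of active series, and that number grows multiplicatively with label cardinality. The sketch below shows both the worst-case arithmetic and the idea of aggregating away a high-cardinality label before storage, similar in spirit to vmagent’s stream aggregation. It is plain Python, not a VictoriaMetrics API, and the label names and values are hypothetical.

```python
from collections import defaultdict
from math import prod

# A series is identified by its (label, value) pairs.
Labels = tuple[tuple[str, str], ...]


def worst_case_series(cardinalities: dict[str, int]) -> int:
    """Upper bound on series count: the product of per-label cardinalities."""
    return prod(cardinalities.values())


def aggregate_without(samples: dict[Labels, float], drop: set[str]) -> dict[Labels, float]:
    """Sum samples after removing the labels in `drop`, merging series that collide."""
    out: dict[Labels, float] = defaultdict(float)
    for labels, value in samples.items():
        key = tuple((k, v) for k, v in labels if k not in drop)
        out[key] += value
    return dict(out)


# Hypothetical request counters from three instances serving the same path.
samples = {(("instance", f"10.0.0.{i}"), ("path", "/api")): 10.0 for i in range(3)}

print(worst_case_series({"instance": 100, "path": 50, "status": 5}))  # 25000
print(aggregate_without(samples, {"instance"}))  # {(('path', '/api'),): 30.0}
```

Dropping the `instance` label in this example cuts the worst case from 25,000 to 250 series, at the price of no longer seeing per-instance values.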


Keys To An Accessibility Mindset

Accessible design is good design. In this talk, we will discover why web accessibility matters and highlight how we all benefit from the accessible design choices all around us. Gain an understanding of the four principles of web accessibility before exploring three keys for applying those principles naturally in your design and development process. This presentation will include experiences of advocating for and leading company accessibility initiatives, as well as practical examples showing the before and after of applying our keys to an accessibility mindset.
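One concrete, checkable example of the “perceivable” principle is text contrast. The sketch below computes the WCAG 2.x contrast ratio between two sRGB colors and checks it against the 4.5:1 minimum for normal-size text at level AA. The colors used are arbitrary examples, and the talk’s own examples may rely on different criteria.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> bool:
    return contrast_ratio(fg, bg) >= 4.5


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))        # 21.0
print(passes_aa_normal_text((0x77, 0x77, 0x77), (255, 255, 255)))  # False (about 4.48:1)
```

A check like this can run in a design-token pipeline or a component test, so that low-contrast combinations are caught before they ship.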


Scaling a Collaboration Service like Nextcloud to 20 Million users

We are heading into a world where the files of most users are hosted by 4 big companies in the US. This is the case for most home users and companies, but also for education and research institutions. If we want to keep sovereignty over our data, protect our privacy, and prevent vendor lock-in, then we need open-source, self-hosted, and federated alternatives. The internet and the web use a distributed and federated architecture; now we have to make sure that cloud services follow the same model. This talk covers how this can be implemented in a real-world example: how a 20-million-user Nextcloud instance can be scaled across different hosting centers and continents. It covers high-level concepts but also concrete Kubernetes, Ceph, and MariaDB clustering setups.
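Spreading millions of users across hosting centers requires a deterministic way to decide which cluster serves which user. The sketch below uses rendezvous (highest-random-weight) hashing. This is a generic technique chosen for illustration, not necessarily how Nextcloud’s own setup assigns users, and the cluster names are hypothetical.

```python
import hashlib


def _score(cluster: str, user_id: str) -> int:
    """Stable pseudo-random weight for a (cluster, user) pair."""
    digest = hashlib.sha256(f"{cluster}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_cluster(user_id: str, clusters: list[str]) -> str:
    """Rendezvous hashing: the cluster with the highest score owns the user."""
    return max(clusters, key=lambda c: _score(c, user_id))


clusters = ["eu-fra-1", "eu-ams-1", "us-nyc-1"]  # hypothetical hosting centers
print(assign_cluster("alice", clusters))
```

The useful property here is that removing a cluster only moves the users that lived on it, and adding one only pulls over roughly its fair share. This keeps data migrations proportional to the change.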



Analyzing Public Conversation using LDA and Topic Modeling, in Python

In this talk, we will build “Choir”, an OSINT (open-source intelligence) project focused on gathering context-based connections between social profiles using AI techniques such as LDA (Latent Dirichlet Allocation) and topic modeling. Written in Python, it aims to explain what the world discusses within a specific domain, including what that domain’s high-ranking influencers discuss, and to focus on what’s going on at the margins.
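To show what LDA actually does, the sketch below implements a minimal collapsed Gibbs sampler in pure Python. Each word is repeatedly reassigned to a topic in proportion to how much its document uses that topic and how much that topic uses the word. It is not Choir’s code: a real project would more likely use a library such as gensim or scikit-learn, and the toy posts are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs: list[list[str]], k: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA.

    Returns (theta, top_words): theta[d][j] estimates the share of topic j in
    document d, and top_words[j] lists the three words most assigned to topic j.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    ndk = [[0] * k for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(k)]  # topic-word counts
    nk = [0] * k                                # tokens assigned to each topic
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics

    def update(d: int, w: str, t: int, delta: int) -> None:
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take the token out of the counts
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], +1)

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    top_words = [sorted(vocab, key=lambda w: -nkw[j][w])[:3] for j in range(k)]
    return theta, top_words


posts = [  # hypothetical, pre-tokenized social posts
    "kubernetes cluster pod deploy",
    "pod cluster kubernetes node",
    "election vote campaign poll",
    "vote poll election debate",
]
theta, top_words = lda_gibbs([p.split() for p in posts], k=2)
print(top_words)
```

The per-document topic shares in `theta` are what make cross-profile comparisons possible, for example grouping accounts that talk about the same mix of topics.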


Open-Source: Open Choice – A DevOps Guide for OSS Adoption

Choosing the right open-source project can be quite challenging: you don’t know whether it’s going to be the right fit or how it will behave, and you may end up wasting time trying to make it all work. We’ve all been there. But what if I told you there’s a practical way to gain a clear understanding of how to incorporate an OSS project into your environment? In this talk, I’m going to speak about the DevOps perspective on open source and the challenges infra-focused engineers face when choosing the right project for their environment. As a DevOps engineer, I’ve seen a lot of things and stumbled upon a lot of decisions that were not based on evidence, so I will present practical advice on how to choose an OSS project for your dev/prod environment and talk about the business mindset you need to evaluate the key indicators based on your needs and specific pain points.
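One way to make “key indicators based on your needs” explicit is a weighted scoring matrix. The sketch below is a generic technique, not the speaker’s framework. The criteria, weights, project names, and 0 to 5 scores are all hypothetical and should come from your own pain points.

```python
# Hypothetical weights reflecting what matters most in this environment.
WEIGHTS = {
    "community_activity": 0.25,  # recent commits, releases, responsive maintainers
    "documentation": 0.20,
    "operability": 0.30,         # metrics, logs, upgrades, HA story
    "license_fit": 0.15,
    "integration_effort": 0.10,  # higher score = less effort
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of 0-5 scores; every weighted criterion must be scored."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights) / sum(weights.values())


candidates = {
    "project-a": {"community_activity": 4, "documentation": 3, "operability": 5,
                  "license_fit": 5, "integration_effort": 2},
    "project-b": {"community_activity": 5, "documentation": 4, "operability": 2,
                  "license_fit": 5, "integration_effort": 4},
}
ranking = sorted(candidates, key=lambda n: weighted_score(candidates[n]), reverse=True)
print(ranking)  # ['project-a', 'project-b']
```

The value lies less in the final number than in forcing the team to agree on the weights before looking at the candidates.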


Observing Minecraft

This talk will cover adding observability to your Java Minecraft server. We’ll go through exporting important metrics and logs from the server to track things like player uptime and the efficiency of any mods. We’ll look into monitoring server health and alerting on issues in real time. Participants should expect to leave knowing some observability basics and how to use monitoring to understand system health and avoid potential incidents.
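Exporting metrics usually means exposing them in a format a monitoring system can scrape. The sketch below renders gauges in the Prometheus text exposition format using only the standard library. The metric names, labels, and values are hypothetical, and a real Java server would more likely use a plugin or a Java metrics library. Because Minecraft targets 20 ticks per second, the tick rate is a natural health signal to alert on.

```python
def render_gauge(name: str, help_text: str,
                 samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge family in the Prometheus text exposition format.

    Note: label values are not escaped here, so they must not contain quotes,
    backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings taken from the server.
body = render_gauge("minecraft_players_online", "Players currently connected.",
                    [({"world": "overworld"}, 7)])
body += render_gauge("minecraft_ticks_per_second", "Server tick rate (target 20).",
                     [({}, 19.6)])
print(body)
```

Serving this text on an HTTP endpoint is enough for a Prometheus-compatible scraper, and an alert rule such as “tick rate below 15 for five minutes” can then page before players notice lag.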




IGNITE: What RomComs didn’t teach me about Incident Management

RomComs are beautifully hilarious in the way their characters actively misunderstand each other at every turn, and formulaic to a fault. Incidents are not. As an avid reader and consumer of romantic stories and a former SRE, I’ll go over what not to do in incidents, learned from watching way too many RomComs. This talk will cover what would happen if you responded to an incident like a RomCom main character.


IGNITE: A blaming culture is not your fault

A must-have for any incident strategy is blameless collaboration. Tooling and wording are a great start, but people’s mindsets must also be adjusted. Although no one officially points fingers, we all carry our own history and experience with errors. The presentation outlines key elements for creating a blameless, collaborative culture and also explains why it is not so easy after all.


IGNITE: How to build and deploy ethical AI features

This talk covers why ethical, open-source, and local AI systems are needed, and how Nextcloud is able to build and ship several innovative AI features that run completely on-premises.


IGNITE: Terraform Practice to Enable Infrastructure Scaling

Terraform is a GREAT tool, but like a lot of other things in life, it has its pitfalls and bad practices. In this Ignite talk, I’ll cover the foundational practice you should think about with Terraform, structuring its code base, and explain how this decision affects the scaling of Terraform infrastructure and teams, as well as having a crucial effect on the core behaviour of Terraform itself.


GitOps in Kubernetes

Argo CD is a popular Cloud Native Computing Foundation (CNCF) open-source GitOps Kubernetes Operator for declarative configuration on Kubernetes clusters. Argo CD follows the GitOps pattern of using Git repositories as the source of truth for the desired application state. Kubernetes manifests can be specified in several ways, such as Helm charts, Kustomize, and plain JSON or YAML, among others. Argo CD automates deploying applications to multiple customers by syncing Kubernetes manifests to the target clusters and making sure the clusters stay in the desired state.
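At its core, the GitOps pattern is a reconciliation loop: compare the desired state from Git with the live state in the cluster, then act on the difference. The sketch below computes that difference for manifests keyed by kind, namespace, and name. It is a simplified illustration, not Argo CD’s diffing logic, and the manifests are hypothetical. Deletions happen only when pruning is requested, which loosely mirrors Argo CD’s prune option.

```python
Key = tuple[str, str, str]  # (kind, namespace, name)


def key_of(manifest: dict) -> Key:
    meta = manifest["metadata"]
    return manifest["kind"], meta.get("namespace", "default"), meta["name"]


def plan_sync(desired: list[dict], live: list[dict], prune: bool = False) -> dict[str, list[Key]]:
    """Return the create/update/delete actions needed to make live match desired.

    Equality is naive: real live objects carry status and server-set fields
    that a production diff would have to ignore.
    """
    want = {key_of(m): m for m in desired}
    have = {key_of(m): m for m in live}
    return {
        "create": sorted(want.keys() - have.keys()),
        "update": sorted(k for k in want.keys() & have.keys() if want[k] != have[k]),
        "delete": sorted(have.keys() - want.keys()) if prune else [],
    }


desired = [  # as rendered from the Git repository
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}, "spec": {"replicas": 3}},
    {"kind": "Service", "metadata": {"name": "web", "namespace": "shop"},
     "spec": {"ports": [{"port": 80}]}},
]
live = [  # as read from the cluster
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop"}, "spec": {"replicas": 2}},
    {"kind": "ConfigMap", "metadata": {"name": "old", "namespace": "shop"}, "data": {}},
]
print(plan_sync(desired, live, prune=True))
```

Running this comparison continuously, instead of once per pipeline run, is what lets a GitOps controller detect and correct drift.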



Gain SRE Superpowers with K8sGPT

Have you ever spent hours finding production issues in your Kubernetes environments? Were you really upset when you found out that trivial things were holding you back? In this talk, you will learn about some common issues found in Kubernetes clusters and how AI can help you troubleshoot them. Finally, you will learn how to gain SRE superpowers with K8sGPT, your AI-driven troubleshooting buddy, which helps you find and solve problems in your K8s environment.
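Many of the trivial but time-consuming issues show up directly in pod status. The sketch below scans a pod list shaped like the Kubernetes API’s JSON (for example, what `kubectl get pods -o json` returns) for a few common container waiting reasons. It only illustrates the kind of check an analyzer performs and is not K8sGPT’s code; K8sGPT can additionally pass its findings to an AI backend for an explanation. The sample pods are hypothetical.

```python
COMMON_WAITING_REASONS = {
    "CrashLoopBackOff": "container keeps crashing; check its logs and recent config changes",
    "ImagePullBackOff": "image cannot be pulled; check the image name, tag and pull secrets",
    "ErrImagePull": "image pull failed; check the registry and credentials",
    "CreateContainerConfigError": "missing ConfigMap/Secret or invalid container config",
}


def find_pod_issues(pod_list: dict) -> list[str]:
    """Return one human-readable finding per container stuck in a known waiting state."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if reason in COMMON_WAITING_REASONS:
                findings.append(f"{meta['namespace']}/{meta['name']} [{cs['name']}] "
                                f"{reason}: {COMMON_WAITING_REASONS[reason]}")
    return findings


pods = {"items": [
    {"metadata": {"namespace": "shop", "name": "web-7d9f"},
     "status": {"containerStatuses": [
         {"name": "web", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"namespace": "shop", "name": "db-0"},
     "status": {"containerStatuses": [
         {"name": "db", "state": {"running": {"startedAt": "2023-06-01T10:00:00Z"}}}]}},
]}
for finding in find_pod_issues(pods):
    print(finding)
```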


Continuous Deployment Workflows

Releasing small, incremental updates to production multiple times a day is the pinnacle of productivity that a software team can achieve. In this talk, I present the main advantages of continuous deployment over traditional release processes, explain the essential components of a continuous deployment infrastructure, and discuss typical challenges as well as strategies to overcome them.
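One essential component of continuous deployment is an automated gate that decides whether a release may proceed. The sketch below shows a progressive rollout that raises the canary’s traffic share step by step and rolls back as soon as its error rate exceeds the baseline by more than a tolerance. The step sizes, the tolerance, and the `error_rate_at` callback are hypothetical stand-ins for your deployment tooling and metrics.

```python
from typing import Callable

STEPS = (1, 5, 25, 50, 100)  # hypothetical percentages of traffic on the new version


def progressive_rollout(
    error_rate_at: Callable[[int], tuple[float, float]],
    tolerance: float = 0.01,
) -> tuple[str, int]:
    """Advance through STEPS; return ("promoted", 100) or ("rolled_back", step).

    error_rate_at(step) must return (canary_error_rate, baseline_error_rate)
    as observed while `step` percent of traffic is on the canary.
    """
    for step in STEPS:
        canary, baseline = error_rate_at(step)
        if canary > baseline + tolerance:
            return "rolled_back", step
    return "promoted", STEPS[-1]


# A hypothetical regression that only shows up once real traffic volume arrives.
print(progressive_rollout(lambda step: (0.05 if step >= 25 else 0.001, 0.001)))
# ('rolled_back', 25)
```

Comparing against a baseline measured at the same time, rather than against a fixed threshold, keeps the gate from blaming a release for an incident that affects both versions.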


Measuring Reliability in Production

Measuring Reliability in Production uses an example application to describe how to define SLIs (service level indicators) and SLOs (service level objectives). It includes an overview of the application architecture, a how-to for developing SLOs, and suggestions for implementing SLOs in Cloud Operations. There’s also a focus on how to identify CUJs (Critical User Journeys), along with recommendations for implementing the metrics used as SLIs and for setting SLO targets.
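A request-based availability SLI and its error budget come down to a little arithmetic. The sketch below illustrates this with assumed numbers (a 99.9% SLO over a hypothetical window of one million requests). These are not the example application’s actual SLOs.

```python
def availability_sli(good: int, total: int) -> float:
    """Fraction of requests that were good (e.g. non-5xx and under a latency threshold)."""
    return good / total if total else 1.0


def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Share of the window's error budget left: 1.0 = untouched, below 0 = SLO violated."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    allowed_bad = (1 - slo) * total
    if allowed_bad == 0:
        return 1.0
    return 1 - (total - good) / allowed_bad


total, good = 1_000_000, 999_600
print(availability_sli(good, total))                           # 0.9996
print(round(error_budget_remaining(0.999, good, total), 3))    # 0.6
```

With a 99.9% target, the window allows about 1,000 bad requests. Having 400 of them means 40% of the budget is spent, which is a signal that is easier to act on than the raw 99.96%.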



Cooking up o11y w/ Feature Flagging

Feature flags allow you to enable and disable code without changing or deploying any source code, and to selectively route traffic to certain users or to a percentage of users, along with other great tricks. It’s powerful stuff … but when you combine it with observability (the ability to understand the inner workings of your complex systems and other unknown-unknowns), what you get is a supercharged, superpowered version of both. With o11y and feature flags, you and your teams get deep technical and business insights in real time about how your code is working, what changed with your last deploy, and how changes are impacting different users, apps, or groups in fine-grained detail. This is the best, easiest way to understand your systems like never before. You’ll have to see it to believe it. 👀
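Percentage rollouts are usually deterministic so that each user consistently sees the same variant, and the evaluated variant is attached to telemetry so that errors and latency can be sliced by flag. The sketch below hashes the flag key and user ID into a bucket from 0 to 99 and emits a structured event. The flag name, the event fields, and the `print`-based emitter are hypothetical stand-ins for a feature-flag SDK and an observability pipeline.

```python
import hashlib
import json


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    return bucket(flag_key, user_id) < rollout_percent


def handle_checkout(user_id: str) -> dict:
    enabled = is_enabled("new-checkout", user_id, rollout_percent=25)
    # Recording the variant on every event lets you compare both variants side by side.
    event = {"event": "checkout", "user_id": user_id,
             "flag": "new-checkout", "variant": "on" if enabled else "off"}
    print(json.dumps(event))
    return event


handle_checkout("user-1")
```

Hashing the flag key together with the user ID means different flags enroll different 25% slices of users, so one experiment does not always land on the same people.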


Elastic ❤️ Terraform

Elastic, the company behind Elasticsearch, has grown in popularity over the years, and so have its configuration options. Right now, we are focusing mostly on Terraform and have given up on Ansible and other tools. This talk dives into the “why” and into how we are splitting up our providers to give our large user base the best possible experience, while also retaining our own sanity.


It’s time to rebuild DevOps

It has been almost 15 years since the inception of DevOps. The core value of DevOps was to break down silos and improve communication to achieve stability, reliability, availability, and security. With the boom of the ecosystem since then, it sometimes feels as if our tooling has created more silos and stifled communication in every way. What if we take the lessons we’ve learned along the way and try to reimagine DevOps tooling to fulfill the original promise of the DevOps movement? What if we could remove the 200% problem (the need to know a specialist language AND a cloud framework) from our tooling? What if we could focus on delivery in a collaborative manner rather than communicating through a series of pull-request handoffs? It’s time for a second wave of DevOps tools.
In this talk, Paul is going to reflect on the lessons we’ve learned along the DevOps journey, such as Infrastructure as Code, and talk about the work System Initiative is doing to revolutionize how people collaborate to build and maintain complex infrastructure. System Initiative is the beginning of an ecosystem for a real-time, multiplayer, multi-modal reinvention of DevOps tooling. It provides a modern, state-of-the-art approach to infrastructure management that increases productivity with its simulation-based workflow.
Once you see what it’s possible to achieve, you won’t want to settle any longer.


SCS: Building Open Source Cloud and Container Infrastructure

Linux is everywhere. Open Source has won! It has not. While Open Source components are all over the place, the big IT players use them to build platforms that are not fully open but designed to lock their users in. The question to ask these days is not “Are you building on top of open source?”, because everyone is. The questions should be “Do you allow others to rebuild your whole platform?” and “Do you allow others to contribute to it and shape its future?” Sounds utopian? Sovereign Cloud Stack (SCS) tries to do exactly this: build a network of operators to define common standards together, implement them in a complete, openly developed, and fully open-source manner, and then even collaborate on operating it well, which can be harder than building it. The speaker will discuss the vision behind the SCS project, how it has built the community and the technology stack, what it has achieved so far, and where it will go next.


What the Heck is Edge Computing Anyway?

The Edge is the new frontier of computing possibilities, offering promises, opportunities, and its own set of challenges. In this talk, we’ll break down what it is, why it’s awesome, and how it fits into your application architecture. We’ll cover things like:

  • What are the benefits
  • What are the limitations
  • When it makes sense
  • When it doesn’t make sense
  • How to get started


Practical introduction to OpenTelemetry tracing

Tracking a request’s flow across the different components of a distributed system is essential, and with the rise of microservices its importance has risen to critical levels. Some dedicated tracing tools are already in use: Jaeger and Zipkin naturally come to mind. Observability is built on three pillars: logging, metrics, and tracing. OpenTelemetry is a joint effort to bring an open standard to all three. Jaeger and Zipkin joined the effort, so they are now OpenTelemetry-compatible. In this talk, I’ll describe the above in more detail and showcase a (simple) use case to demo how you could benefit from OpenTelemetry in your distributed architecture.
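What ties a request’s spans together across services is context propagation. By default, OpenTelemetry propagates context with the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`. The sketch below creates and continues such a header using only the standard library, to show the mechanics. In practice an OpenTelemetry SDK does this for you, and the service names in the comments are hypothetical.

```python
import re
import secrets

# Only version 00 of the header is handled in this sketch.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{'01' if sampled else '00'}"


def child_traceparent(incoming: str) -> str:
    """Continue the incoming trace: keep trace ID and flags, mint a new span ID."""
    match = TRACEPARENT.match(incoming)
    if not match:
        raise ValueError(f"invalid traceparent: {incoming!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()            # e.g. created by a "frontend" service
outgoing = child_traceparent(incoming)  # sent by "checkout" on its call to "payments"
print(incoming, outgoing, sep="\n")
```

Because every hop keeps the same trace ID, a backend such as Jaeger or Zipkin can stitch the spans from all services into one trace.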


Architecting an open observability stack

Microservices divide the complexity of the application code. Hence, it is easier to debug a single service that has a specific, narrow purpose. On the other hand, microservices add complexity to the communication and infrastructure layer. Therefore, when discussing software architecture, considering only the application design is not enough: we need to consider how we manage the complexity of the infrastructure components as well. We have learned over the years that it is good practice to design software to be open and flexible to change. What if we took the same approach with our observability stack? Let’s look at how an observability stack built with open-source tools can help infrastructure teams gain flexibility and prevent vendor lock-in. Let’s find out why OpenTelemetry is so exciting in helping achieve this and how we can instrument application code with it. The talk presents a concept for instrumenting code based on the application’s functions and finally looks at different open-source backends, such as Prometheus and Grafana, for storing and visualizing telemetry data.
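Instrumenting code based on the application’s functions can be as simple as wrapping each business function so that its calls, errors, and durations are recorded. The sketch below does this with a standard-library decorator and an in-memory registry. The registry stands in for an OpenTelemetry meter or tracer, and the metric layout and the `reserve_stock` function are hypothetical.

```python
import functools
import time
from collections import defaultdict

# In-memory stand-in for a metrics backend: function name -> counters and timings.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})


def instrumented(func):
    """Record call count, error count, and cumulative duration for `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__qualname__]
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            stats["seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def reserve_stock(sku: str, qty: int) -> bool:
    if qty <= 0:
        raise ValueError("qty must be positive")
    return True


reserve_stock("sku-42", 2)
print(dict(METRICS))
```

Keeping instrumentation in one decorator means the export target can be swapped, for example from this dictionary to an OpenTelemetry exporter feeding Prometheus, without touching business code.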


How the Network Protocols You Choose Ultimately Affect Your Applications

In a microservices architecture, there are many components that need to communicate through different technology layers in order to deliver the business value we seek. On each of these layers we choose communication protocols, and those choices ultimately have a fundamental impact on system performance, reliability, and the troubleshooting process. Some of us make these choices after due research, while others opt for the default configuration in the hope that it fits their needs.
The thing is, network protocols affect the development and maintenance of production services in many ways: cost, performance, network throughput, traffic security, authentication, and the list goes on (really, it does…). To tackle this one step at a time, we can separate them into three distinct layers: application logic, transport, and system level. In each of these layers, we can then make a well-informed decision to monitor, fine-tune, or even replace a selected protocol in favour of a better one. A special case arises when we can’t change a protocol used by a third party, yet even then, do not be alarmed! We still have techniques at hand that can maximize system performance.
In this talk, we’ll walk through some of the most popular protocols used in cloud operations today and discuss the performance, cost, security, and other trade-offs we need to consider when making protocol decisions based on our SLAs and performance commitments to our customers.
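Handshake round trips are one of the simplest protocol trade-offs to quantify. The sketch below estimates the time until the first response byte on a new connection, under textbook assumptions: the TCP handshake costs 1 RTT, a full TLS 1.2 handshake 2 more, and TLS 1.3 1 more; QUIC combines transport and TLS 1.3 setup into 1 RTT, or 0 RTT with a resumed session sending early data; there is no packet loss and no TCP Fast Open. Real numbers also depend on loss, congestion control, and server processing time.

```python
# Round trips spent on connection setup before the request can be sent,
# under the simplifying assumptions stated above.
SETUP_RTTS = {
    "tcp+tls1.2": 1 + 2,
    "tcp+tls1.3": 1 + 1,
    "quic": 1,
    "quic-0rtt": 0,  # resumed session sending early data
}


def time_to_first_byte_ms(stack: str, rtt_ms: float, server_ms: float = 0.0) -> float:
    """Setup round trips plus one round trip for the request itself."""
    return (SETUP_RTTS[stack] + 1) * rtt_ms + server_ms


for stack in SETUP_RTTS:
    print(f"{stack:12} {time_to_first_byte_ms(stack, rtt_ms=50):6.0f} ms")
```

At a 50 ms round-trip time this gives 200, 150, 100, and 50 ms respectively. This is why connection reuse and protocol choice matter most for chatty services talking over long distances.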


How to survive Cloud – An Ops perspective

Cloud computing is creeping into our daily business more and more, and this has brought changes to the ops role as well. Is it even needed at all? Even after years of cloud computing, defining the scope of the ops role, and even more so identifying the demand for it, remains a relevant topic. I’ll share my experience of coming from classical ops and following our customers into the cloud. I’ll explain our main challenges and how the scope of our tasks has shifted during this process, which finally led us to a well-functioning collaboration between dev and ops.


Bringing Order to Chaos: Make Your Systems More Resilient with Chaos Engineering

Chaos Engineering is a relatively new approach that helps identify & address weaknesses in software systems by intentionally introducing controlled failures. This talk covers the principles & practices of chaos engineering, using real-world examples to show how it has improved resiliency and performance and saved costs. You’ll learn how to design & execute chaos experiments, interpret the results, and implement chaos engineering within your organization. The goal is to create highly resilient systems that can withstand the challenges of today’s fast-paced digital landscape.
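A chaos experiment states a steady-state hypothesis, injects a controlled fault, and checks whether the hypothesis still holds. The sketch below does this in miniature: it injects random failures into a hypothetical dependency and checks whether a retrying caller keeps its success rate above a target. The failure rate, retry count, and 95% target are assumptions chosen for illustration.

```python
import random


class FlakyDependency:
    """Hypothetical downstream service with an injected failure rate."""

    def __init__(self, failure_rate: float, rng: random.Random):
        self.failure_rate = failure_rate
        self.rng = rng

    def call(self) -> str:
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return "ok"


def call_with_retries(dep: FlakyDependency, attempts: int = 3) -> bool:
    for _ in range(attempts):
        try:
            dep.call()
            return True
        except ConnectionError:
            continue
    return False


def run_experiment(failure_rate: float = 0.3, attempts: int = 3, requests: int = 1000,
                   target: float = 0.95, seed: int = 42) -> tuple[float, bool]:
    """Return (observed_success_rate, steady_state_hypothesis_holds)."""
    dep = FlakyDependency(failure_rate, random.Random(seed))
    successes = sum(call_with_retries(dep, attempts) for _ in range(requests))
    rate = successes / requests
    return rate, rate >= target


print(run_experiment())            # retries absorb the fault (about 97% success)
print(run_experiment(attempts=1))  # without retries the hypothesis fails (about 70%)
```

Comparing a run with and without the mitigation turns “we have retries” from an assumption into a measured property. Real experiments apply the same structure to production-like systems with a bounded blast radius.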


Climbing high — getting started with cloud native security through open source

Security-specific tools are often overlooked until they become a requirement or a necessity, or until things have gone terribly wrong. While many organisations will build a security team to address related issues, many smaller organisations and individual contributors do not have this option. This talk is divided into two sections. In the first, Anais will share the similarities between climbing and establishing a security-centric mindset. What happens if we do not have security specialists supporting our team? Free-climbing might be an option for experts with years of experience, but not for most cluster admins. The second part will go over security-specific tools in the cloud native ecosystem. A live demo will focus on Trivy, an open source tool with 11k+ stars on GitHub. Anais will showcase how we can get started with, and benefit from, integrating cloud native security tools such as Trivy into our existing processes and monitoring stack. The goal is to provide Kubernetes cluster admins and engineers with the tools and knowledge to take ownership of securing their resources without having to become security experts.
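Trivy can already fail a pipeline on its own through its `--severity` and `--exit-code` flags. When findings also need to flow into an existing process or monitoring stack, a small script can post-process the JSON report (`trivy image --format json ...`). The sketch below counts vulnerabilities per severity from the report’s `Results[].Vulnerabilities[]` entries and applies a deploy gate. The sample report is hypothetical and heavily trimmed, and the CVE IDs are placeholders.

```python
from collections import Counter


def severity_counts(report: dict) -> Counter:
    """Count vulnerabilities per severity in a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def gate(report: dict, blocking: tuple[str, ...] = ("CRITICAL", "HIGH")) -> bool:
    """True if the image may be deployed, i.e. it has no blocking-severity findings."""
    counts = severity_counts(report)
    return all(counts[s] == 0 for s in blocking)


report = {"Results": [
    {"Target": "app (alpine 3.18)", "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "LOW"}]},
    {"Target": "requirements.txt", "Vulnerabilities": None},
]}
print(dict(severity_counts(report)), gate(report))  # {'HIGH': 1, 'LOW': 1} False
```

The same counts can be exported as gauges, so security findings appear on the dashboards and alerts that cluster admins already watch.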


Ceph for Public Cloud Workloads

Public clouds were initially popularized on the premise that workloads are dynamic and that you could easily match available compute resources to the peaks and troughs in your consumption, rather than having to maintain mostly idle buffer capacity to meet peak user demand. However, it has become more apparent that this isn’t necessarily true for storage. What is typically observed in production environments is continual growth across all data sets: those actively used for decision-making or transactional processing, those maintained as training data for AI/ML, those kept for archival purposes, and simple backups of critical data. During this talk, we will discuss how Ceph can be deployed cost-effectively adjacent to public clouds and investigate the financial implications of both approaches.
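Part of the financial comparison is simple capacity arithmetic. Ceph stores more raw bytes than it exposes: a replicated pool with `size=3` keeps three copies, while an erasure-coded pool with k data chunks and m coding chunks needs (k+m)/k times the usable capacity. The sketch below combines that overhead with a compound-growth projection. The starting volume, growth rate, and prices are hypothetical placeholders in unspecified cost units, to be replaced with real quotes.

```python
def raw_tb_replicated(usable_tb: float, size: int = 3) -> float:
    """Raw capacity for a replicated pool keeping `size` copies."""
    return usable_tb * size


def raw_tb_erasure_coded(usable_tb: float, k: int = 4, m: int = 2) -> float:
    """Raw capacity for an erasure-coded pool with k data and m coding chunks."""
    return usable_tb * (k + m) / k


def project_usable_tb(start_tb: float, monthly_growth: float, months: int) -> list[float]:
    """Data volume at the start of each month under compound growth."""
    return [start_tb * (1 + monthly_growth) ** i for i in range(months)]


# Hypothetical inputs: 200 TB growing 3% per month, over three years.
sizes = project_usable_tb(200, 0.03, 36)
CLOUD_PRICE_PER_TB_MONTH = 20.0    # hypothetical object-storage price
CEPH_COST_PER_RAW_TB_MONTH = 8.0   # hypothetical amortized hardware, power, and ops

cloud_total = sum(tb * CLOUD_PRICE_PER_TB_MONTH for tb in sizes)
ceph_total = sum(raw_tb_erasure_coded(tb) * CEPH_COST_PER_RAW_TB_MONTH for tb in sizes)
print(f"cloud: {cloud_total:,.0f}  ceph (EC 4+2): {ceph_total:,.0f}")
```

This model deliberately ignores that self-hosted capacity must be bought ahead of demand, and that cloud bills also include request and egress charges. Both effects belong in a real comparison.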


IGNITE: Are you still doing microservices?

A tongue-in-cheek talk about the virtues, but mainly the vices, of microservices, and why we should be discussing more interesting topics than the unit of deployment.


IGNITE: Unleashing the Magic: Building SaaS Platforms with Cloud-Native Multi-Tenancy

Multi-tenancy plays a fundamental role in delivering your applications to customers, especially when building SaaS environments. In this Ignite talk, you will discover architectural considerations from IaC to GitOps, the benefits, and real-world use cases of building such platforms. We will also delve into some challenges and discuss strategies for conquering them. Join me for a quick, insightful journey into the wonderland of cloud-native multi-tenancy.
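In a namespace-per-tenant model, which is one common option among several, onboarding a tenant can mean generating a Namespace plus a ResourceQuota and committing them to the Git repository that a GitOps controller syncs. The sketch below builds those two manifests as Python dictionaries that serialize to JSON Kubernetes accepts. The naming scheme, labels, and quota sizes are hypothetical.

```python
import json
import re

# Namespace names must be RFC 1123 DNS labels (lowercase alphanumerics and '-', max 63).
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def tenant_manifests(tenant: str, cpu: str = "4", memory: str = "8Gi", pods: int = 50) -> list[dict]:
    """Namespace + ResourceQuota for one tenant (hypothetical naming and limits)."""
    ns = f"tenant-{tenant}"
    if len(ns) > 63 or not DNS_LABEL.fullmatch(ns):
        raise ValueError(f"{ns!r} is not a valid namespace name")
    labels = {"tenant": tenant}
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns, "labels": labels}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "tenant-quota", "namespace": ns, "labels": labels},
         "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory,
                           "pods": str(pods)}}},
    ]


print(json.dumps(tenant_manifests("acme"), indent=2))
```

Generating tenant resources from one function keeps every tenant’s isolation and limits consistent. Committing the output to Git leaves an auditable trail of who was onboarded and when.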



IGNITE: How to collect Telemetry data using

Telemetry is a technology that enables remote data collection and transmission from sensors or instruments, facilitating real-time monitoring and control across diverse industries. In this short session, we will demonstrate how we built a telemetry pipeline in a few hours using


IGNITE: Schrödinger’s backups: how to avoid uncertainty

Sometimes, when you have to restore a database from backup, you discover that backups were not created regularly or that the available backups are corrupted. To avoid this issue, every service owner has to restore from backups on a regular basis. But what if you provide a managed database service in the cloud with automated backups? How can you guarantee that your customers’ database backups are restorable? In this Ignite talk, we will share our experience with automated backup and restore testing.
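The essence of automated restore testing is to actually restore each backup somewhere disposable and verify it, rather than trusting that a backup file exists. The sketch below uses SQLite from Python’s standard library purely as a stand-in for a managed database. It takes a backup with `sqlite3.Connection.backup`, restores it into a scratch database, runs `PRAGMA integrity_check`, and compares row counts. The table and the expected counts are hypothetical; a managed service would restore into scratch instances of the real engine.

```python
import os
import sqlite3
import tempfile


def backup_to_file(src: sqlite3.Connection, path: str) -> None:
    dst = sqlite3.connect(path)
    try:
        src.backup(dst)
    finally:
        dst.close()


def restore_and_verify(path: str, expected_counts: dict[str, int]) -> bool:
    """Restore the backup into a scratch in-memory DB and run basic checks."""
    backup = sqlite3.connect(path)
    scratch = sqlite3.connect(":memory:")
    try:
        backup.backup(scratch)
        if scratch.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        for table, expected in expected_counts.items():
            # Table names come from our own test config, not from user input.
            (count,) = scratch.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            if count != expected:
                return False
        return True
    except sqlite3.DatabaseError:
        return False  # unreadable file, missing table, and similar failures
    finally:
        backup.close()
        scratch.close()


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
src.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (24.50,)])
src.commit()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nightly.db")
    backup_to_file(src, path)
    print(restore_and_verify(path, {"orders": 2}))  # True
```

Run on a schedule for every customer backup, a check like this turns “backups are restorable” from a hope into a monitored signal.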


Database Infrastructure with Open Source Kubernetes Database Operators

In this talk, we’ll explore how Kubernetes Operators are revolutionizing the way we manage and deploy database infrastructure in modern, cloud-native environments. During this session, we’ll delve into the key features and benefits of using open source Kubernetes database Operators, such as their ability to automate tasks like backup and recovery, scaling, and monitoring. We’ll also discuss how Operators provide a standardized approach to managing different types of databases, from traditional relational databases to newer NoSQL and cloud-native options.


Integrating cloud native security into the SRE culture

When we talk about DevSecOps, we often focus on security for developers or security for workload management and deployments. While the DevOps versus SRE debate will likely continue until the end of time, we can agree that SRE focuses more on the culture and processes put in place to build reliable and efficient infrastructure for our deployments. If we simply bolt security tools onto our SRE workflows, we risk introducing decoupled processes.
This talk will showcase how we can integrate open source security solutions and a security-centric mindset into the SRE culture and our existing monitoring stack. Anaïs Urlichs will first provide an overview of the top security risks that we face when managing and deploying cloud native infrastructure, and then highlight how we can adapt our workflows to become security-centric.


Infrastructure-From-Code and the end of Microservices

Infrastructure-from-Code (IfC) is the newest frontier in cloud development: a novel approach that is superseding Infrastructure-as-Code and creating new capabilities and generational productivity gains.
We’ll investigate the 4 emerging approaches to IfC: SDK-based (Ampt, Nitric), in-code annotations based (Klotho), a combination of the two (Encore, Shuttle), and explicitly defined through a new programming language (Wing, DarkLang).
We’ll compare these approaches to the existing generation of tools, discuss their trade-offs and draw parallels to other disciplines that have used similar approaches. How will the wave of open source IfC technologies impact the current technologies and platforms, and will it force organizations to revisit the DevOps movement altogether? Join us and see what you think!
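What the SDK-based and annotation-based approaches share is that infrastructure requirements are inferred from application code instead of being declared separately. The sketch below imitates that idea with plain Python decorators that register routes and queue consumers, from which a deployment plan could be synthesized. The decorator names and the plan format are hypothetical and do not reflect the APIs of Ampt, Nitric, Klotho, Encore, Shuttle, Wing, or DarkLang.

```python
from dataclasses import dataclass, field


@dataclass
class InfraRegistry:
    """Collects infrastructure needs declared by application code."""
    routes: list[tuple[str, str, str]] = field(default_factory=list)  # (method, path, handler)
    queues: dict[str, str] = field(default_factory=dict)              # queue -> consumer

    def api(self, method: str, path: str):
        def register(fn):
            self.routes.append((method, path, fn.__name__))
            return fn
        return register

    def consumer(self, queue: str):
        def register(fn):
            self.queues[queue] = fn.__name__
            return fn
        return register

    def plan(self) -> list[str]:
        """A toy synthesis step: one resource line per inferred need."""
        lines = [f"http_route {m} {p} -> function:{h}" for m, p, h in self.routes]
        lines += [f"queue {q} -> function:{h}" for q, h in sorted(self.queues.items())]
        return lines


app = InfraRegistry()


@app.api("POST", "/orders")
def create_order(payload: dict) -> dict:
    return {"status": "accepted"}


@app.consumer("order-events")
def ship_order(message: dict) -> None:
    pass


print("\n".join(app.plan()))
```

The trade-off is visible even at this size: nobody writes the route or queue definitions by hand, but the synthesis step now decides how they are provisioned.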


Rolling Updates: Database Version Migrations with minimum Disruption

YDB is an open-source distributed SQL database available under the Apache 2.0 license. It scales easily across thousands of nodes and is known for being highly available. Version migration is a natural process for every software system. In the majority of use cases, YDB is used as a mission-critical OLTP database that cannot afford maintenance windows and must remain available during version migrations. In this talk, we will briefly describe YDB’s layered architecture and share some tricks for minimizing database unavailability during minor and major version migrations, from the point of view of both the YDB server and the applications.
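A common way to keep a cluster available during a version migration is to upgrade it in small batches and move on only when the upgraded nodes report healthy. The sketch below plans such batches under a hypothetical `max_unavailable` budget and halts at the first unhealthy batch. The node names and the `upgrade` and `is_healthy` callbacks are placeholders; YDB’s actual procedure, which the talk describes, involves more than this generic pattern.

```python
from typing import Callable


def batches(nodes: list[str], max_unavailable: int) -> list[list[str]]:
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [nodes[i:i + max_unavailable] for i in range(0, len(nodes), max_unavailable)]


def rolling_update(nodes: list[str], max_unavailable: int,
                   upgrade: Callable[[str], None],
                   is_healthy: Callable[[str], bool]) -> list[str]:
    """Upgrade nodes batch by batch; stop at the first batch that is not healthy.

    Returns the nodes that were upgraded and confirmed healthy.
    """
    done: list[str] = []
    for batch in batches(nodes, max_unavailable):
        for node in batch:
            upgrade(node)  # e.g. drain, restart on the new version
        if not all(is_healthy(n) for n in batch):
            break          # leave the remaining nodes on the old version
        done.extend(batch)
    return done


nodes = [f"ydb-node-{i}" for i in range(1, 6)]  # hypothetical node names
upgraded: list[str] = []
done = rolling_update(nodes, 2, upgrade=upgraded.append,
                      is_healthy=lambda n: n != "ydb-node-3")
print(done)  # ['ydb-node-1', 'ydb-node-2']
```

The batch size bounds how much capacity is missing at any moment. Stopping on the first failure limits a bad release to a few nodes instead of the whole cluster.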


Implementing holistic security for containers and Kubernetes with Calico and NeuVector

Zero-trust workload security is a critical aspect of any Kubernetes cluster. In this talk, we will explore how Calico and the NeuVector Container Security Platform can be used together to provide a comprehensive solution for securing your containers and Kubernetes clusters. We will cover topics such as IP address management, workload segmentation, secure egress access, and encryption, as well as vulnerability management, threat detection, and defense.
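Zero-trust workload segmentation usually starts from default deny: traffic is dropped unless a policy explicitly allows it. The sketch below evaluates that rule with label selectors in a deliberately simplified model. It is not Calico’s policy engine, and the labels, ports, and policy format are hypothetical. Real Calico and Kubernetes NetworkPolicy objects have richer semantics, such as ordering, namespaces, and egress rules.

```python
from dataclasses import dataclass


@dataclass
class Allow:
    """Allow traffic from pods matching `src` to pods matching `dst` on `port`."""
    src: dict
    dst: dict
    port: int


def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector: every selector label must be present with that value."""
    return all(labels.get(k) == v for k, v in selector.items())


def is_allowed(policies: list[Allow], src_labels: dict, dst_labels: dict, port: int) -> bool:
    """Default deny: permit only if some rule matches source, destination, and port."""
    return any(
        p.port == port and matches(p.src, src_labels) and matches(p.dst, dst_labels)
        for p in policies
    )


policies = [
    Allow(src={"app": "frontend"}, dst={"app": "api"}, port=8080),
    Allow(src={"app": "api"}, dst={"app": "db"}, port=5432),
]
print(is_allowed(policies, {"app": "frontend"}, {"app": "db"}, 5432))  # False
```

Because everything not listed is denied, a compromised frontend pod cannot reach the database directly, which is the segmentation property the talk’s zero-trust framing relies on.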