An Essential Guide to Kubernetes Observability Challenges with Pixie


The observability of distributed systems has always been a challenge. Handling latency, distributed transactions, failures, and so on has become increasingly complex. The more abstract a distributed system is, the harder it is to reason about, debug, and troubleshoot.

Debugging on K8s is difficult

The main reason observability on Kubernetes is so difficult is the volatile and dynamic nature of its workloads and resources. Instead of dealing with a single server, we are now dealing with an unknown number of servers (due to autoscaling). Rather than having a monolithic application, we now have multiple distributed services. The same goes for databases, which often reside outside the cluster.

Let’s say you make an HTTP(S) call to an API running on a Kubernetes cluster hosted on a cloud provider. Here is a simplified sequence diagram showing the critical points where observability data can be gathered along the way.

At any point in this communication chain, things can go wrong, performance can degrade, security issues can arise, and so on. Knowledge of what is happening on the cluster and detailed information on each step of the communication chain are essential for operational performance.

Where to observe

Now we know what to observe, but the question is how and where to place our points of observability, the gateways of insight.

There are several options:

  • Embed observability in the service code. This gives a high degree of control but is cumbersome to maintain and does not scale.
  • Use a sidecar pattern to inject observability logic into each pod. Better, but it can lead to performance issues and is difficult to scale, since different workloads require different metrics, which can change over time.
  • Use low-level system calls to monitor common protocols, stdout, and stderr. This means installing something on the cluster itself, which is better, but it often means giving that something extended privileges and deep hooks into the Linux kernel.
  • Use low-level system calls with eBPF probes. This brings high scalability and low overhead.

Combine that with exporting longer-term metrics to Prometheus and you’re good to go.

Underlying technology

Such a level of granular observability is possible thanks to eBPF (extended Berkeley Packet Filter), a technology that makes the kernel programmable in a safe and efficient way.

eBPF is a breakthrough technology that can run sandboxed programs in an operating system kernel. It is used to safely and efficiently extend kernel capabilities without modifying kernel source code or loading kernel modules. To learn more about eBPF, see the introduction at https://ebpf.io/what-is-ebpf/, which is also the source of this description.

Here is a good video with Liz Rice explaining eBPF in detail.

The diagram below shows how eBPF works at a high level.

[Diagram: how eBPF works at a high level]
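To make this more concrete, here is a minimal illustration of the kind of kernel-level visibility eBPF provides without touching the application. It uses bpftrace, which is not part of the Pixie setup and is shown only as an example; the one-liner prints every file opened on the host, together with the process that opened it:

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s\n", comm, str(args->filename)); }'

Pixie relies on the same mechanism, attaching probes to system calls and protocol tracepoints so that telemetry can be captured without any changes to the workload.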

Use cases and demo

Earlier we saw a diagram with an example of traffic flow in Kubernetes. Each step of this traffic should yield valuable insights into our workloads.

Here is a list of typical information that is of interest to both Dev and Ops:

  • how pods are performing
  • what the latency is between different calls
  • what the HTTP payloads look like
  • how the cluster behaves under load

In the demo part, we will look at HTTP traffic on a sample application.

The Pixie CLI comes with pre-built demo applications that can be installed directly from the command line. However, these demos take a while to get up and running, so we will use a different app instead.
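For completeness, this is roughly how the bundled demos are managed once the px CLI is set up in the next section (a sketch based on the Pixie docs; px-sock-shop is one of the bundled demo names):

px demo list
px demo deploy px-sock-shop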

Preconditions

To follow the demo, you will need to install the following components:

  • docker engine
  • minikube
  • kubectl
  • helm

Install Pixie

Pixie is an open source observability tool for Kubernetes applications. Pixie uses eBPF to automatically capture telemetry data without the need for manual instrumentation.

We will choose the docker option for Pixie CLI to minimize system clutter.

alias px="docker run -i --rm -v ${HOME}/.pixie:/root/.pixie pixielabs/px"
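A quick sanity check that the alias works (the first run will pull the pixielabs/px image):

px --help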

Installing Minikube

For local clusters, Pixie currently only supports minikube. The following installation instructions are for Debian-based Linux; instructions for other platforms are available on the Minikube page.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb

Start minikube

This will start minikube with a KVM driver.

minikube start --driver=kvm2 --cni=flannel --cpus=4 --memory=8000 -p=pixie-cluster

If you are using macOS, use the --driver=hyperkit option; on Windows, use --driver=hyperv.
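Once minikube finishes, you can confirm that the profile is running and that kubectl is pointed at the new cluster:

minikube status -p pixie-cluster
kubectl get nodes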

Create an account with Pixie Cloud

It is possible to self-host Pixie, but for demonstration purposes we will create a free account to access the Metrics UI.

px auth login

Get the deployment key

px deploy-key create
export PIXIE_DEPLOY_KEY=

Install Pixie on the cluster
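If your Helm client does not know about Pixie’s chart repository yet, add it first. The repository URL below is only a placeholder; check Pixie’s install docs for the current one:

helm repo add pixie-operator <pixie-helm-repo-url>
helm repo update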

helm install pixie pixie-operator/pixie-operator-chart --set deployKey=$PIXIE_DEPLOY_KEY --set clusterName=pixie-cluster --namespace pl --create-namespace

Installation may take a few minutes.
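You can watch the Pixie components come up in the pl namespace; once all pods are Running, the cluster should show up as healthy in the Pixie UI:

kubectl get pods -n pl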

Install Kuard

Kuard is a Kubernetes demo application from the book “Kubernetes Up and Running”.

kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
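kubectl can block until the pod reports Ready, which saves polling by hand:

kubectl wait --for=condition=Ready pod/kuard --timeout=120s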

Once the pod is ready, forward the port and go to the web UI

kubectl port-forward kuard 8080:8080
open http://localhost:8080/

Explore data

It’s possible to run a query directly from a command line, but we’ll jump straight to a live UI.
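For reference, the command-line equivalent would look something like this, using the same px/http_data script we are about to open in the UI (a sketch; script names may change between Pixie releases):

px run px/http_data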

Navigate to https://work.withpixie.ai/live/

From the cluster menu, select your cluster.


Click the script drop-down menu and select px/http_data.


In the destination filter, type kuard to show only traffic to our pod.


Refresh the kuard page a few times, then run the script again using the RUN button on the right side of the Pixie UI.
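If you prefer the terminal, you can generate some traffic with a simple loop instead of refreshing by hand (this assumes the port-forward from the previous step is still running):

for i in $(seq 1 20); do curl -s http://localhost:8080/ > /dev/null; done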

Feel free to explore Pixie’s user interface further and find the metrics that interest you.

Architectural concerns

If you are considering Pixie for your workloads, there may be architectural considerations you want to address. Here are some facts about how Pixie works that might help solve some of them.

  • The data stays on the cluster; you decide whether to export anything, for example as Prometheus metrics.
  • Data is retained for 24 hours, and the footprint is small enough to work well on edge devices.
  • Queries are extensible and can be written in the PxL language (a derivative of Python); see the sketch after this list.
  • Integration with OpenTelemetry is on the roadmap, to enable seamless data export to well-known tools like Prometheus or Jaeger.
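As a rough idea of what that extensibility looks like in practice, the CLI can list the bundled PxL scripts and run a custom one from a local file. This is a sketch: the exact subcommands and flags may differ between CLI versions, and my_queries.pxl is a hypothetical file name.

px scripts list
px run -f my_queries.pxl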

Learn more

Pixie is a CNCF sandbox project.


To learn more about Pixie, check out their website, which has more examples and in-depth explanations, along with their GitHub repository.

