Monitoring Proxmox VE deployments with Elastic Observability

In this blog post, you will learn how to leverage Elastic Observability to monitor Proxmox VE and the software running on top of it, both in the form of Linux Containers (LXCs) and Virtual Machines (VMs).

Why use Elastic Observability with Proxmox?

Here at Elastic, we are passionate about efficiently managing and monitoring infrastructure and applications. Many of us enjoy tinkering with homelabs, often running Proxmox VE, a powerful open-source virtualization platform for running virtual machines and Linux Containers (LXCs). While Proxmox provides robust tools for managing virtualized resources, gaining deep insight into the performance and health of your LXCs, VMs, and hosts requires a dedicated monitoring solution. This blog post will guide you through using Elastic Observability, together with Elastic Agent, to monitor your Proxmox VE deployment and to catch issues proactively with Kibana Alerts.

The homelab setup

Our homelab setup centers around an Intel N100 mini PC, serving as the host for Proxmox VE. This setup is simple and minimal, yet effective for showcasing a few interesting capabilities. On top of this mini PC, we run several Linux Containers (LXCs) for various services, along with a dedicated virtual machine for Home Assistant.

Elastic Agent installation and configuration

Before beginning, it is worth noting that there are numerous ways to install and configure the Elastic Agent. For the sake of simplicity, we will showcase a setup in which only one instance of the Elastic Agent is running on the host machine. The Elastic Agent reports to an Elastic Cloud Observability deployment and is managed via Fleet, which makes it tremendously easy to upgrade and re-configure it whenever needed.

Diving into the host

Kibana offers various panes that make it nice and easy to learn about a system's health at a quick glance.

As a first step, let's take a look at the Infrastructure > Hosts page in Kibana:

Here we can see various information about our Proxmox VE host (i.e. the mini PC). The top processes running on it are presented, including processes running in LXCs such as pia-daemon. We can also see a kvm process, which runs the Home Assistant virtual machine, and a Proxmox pve-firewall process.

Let's now take a look at Universal Profiling > Flamegraph. This graph shows how much CPU time is consumed by different stack traces from processes running on the host system. You can drill down into specific processes using the search bar at the top. For instance, you can filter by kvm to only see information regarding this specific process.
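To make the idea behind a flamegraph more concrete, here is a minimal sketch of how sampled stack traces can be folded into box widths, with an optional per-process filter similar in spirit to typing kvm in the search bar. This is an illustration only and not how Universal Profiling is implemented; the function name fold_stacks, the sample format, and the frame names are all hypothetical.

```python
from collections import Counter


def fold_stacks(samples, process=None):
    """Count samples per call-stack prefix, optionally for one process only.

    `samples` is an iterable of (process_name, stack) pairs, where `stack`
    lists frames from outermost to innermost. This format is an assumption
    made for the sketch.
    """
    widths = Counter()
    for proc, stack in samples:
        if process is not None and proc != process:
            continue
        # Every prefix of a stack is one box in the flamegraph;
        # the box width is the number of samples that share that prefix.
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths


# Hypothetical samples, made up for illustration.
samples = [
    ("kvm", ["main", "vcpu_run", "ioctl"]),
    ("kvm", ["main", "vcpu_run", "ioctl"]),
    ("kvm", ["main", "io_thread"]),
    ("pve-firewall", ["main", "update_rules"]),
]

kvm_only = fold_stacks(samples, process="kvm")
for path, count in sorted(kvm_only.items()):
    print(";".join(path), count)
```

Wider boxes correspond to prefixes that appear in more samples, which is why the widest parts of a flamegraph point at where CPU time is being spent.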

The Observability AI Assistant

All the Kibana panes we visited so far have proved to be highly interesting, but they struggle to answer urgent questions such as:

Did anything happen in our mini PC recently?
Was there any significant change in functionality?
Is there any precious information hidden among the thousands of data points collected?

The Elastic Observability AI Assistant helps us by answering these questions in natural language. By default, on Elastic Cloud, it uses the Elastic-managed LLM connector, which means users do not need to configure anything to get started with it. It just works!

Let's go to the Observability > AI Assistant pane in Kibana and enter a generic prompt such as: "please give me an overview of the health of my prox host".

Let's give it a moment to dig into the data... et voilà, here comes lots of relevant information in the form of graphs and natural language explanations. The Observability AI Assistant understood our question, went through all the data for our Proxmox host, ran data analytics on it, and reported back in a matter of seconds!

Alerting upon disruption with Kibana Alerts

As a final step, let's define a Kibana Alert to help us understand whether our host is overloaded. Let's head to Observability > Alerts > Rules and create a new rule. We will create a Custom Threshold rule that fires if the host's CPU usage averages above 80% over the last 15 minutes. Kibana will send us an email when the rule fires. The rule is also configured to fire if no data arrives during the last 15 minutes. This is extremely helpful, because missing data points to a problem that needs debugging: a broken network or a power outage in the house, a faulty Agent deployment, or even a hardware issue with the mini PC.
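The logic of such a rule can be sketched in a few lines of Python. This is only an illustration of the two conditions described above (average CPU above 80% over 15 minutes, or no data at all in that window), not Kibana's actual rule engine; the function name evaluate_cpu_rule and the sample format (timestamp, CPU usage as a fraction between 0 and 1) are assumptions made for the example.

```python
from datetime import datetime, timedelta


def evaluate_cpu_rule(samples, now, window=timedelta(minutes=15), threshold=0.80):
    """Return "alert", "no_data", or "ok" for a list of (timestamp, cpu) samples.

    `cpu` is assumed to be a fraction between 0 and 1 (hypothetical format).
    """
    # Keep only the samples that fall inside the look-back window.
    recent = [cpu for ts, cpu in samples if now - window <= ts <= now]
    if not recent:
        # Nothing reported in the window: host, Agent, or network may be down.
        return "no_data"
    average = sum(recent) / len(recent)
    return "alert" if average > threshold else "ok"


# Hypothetical data: two busy samples within the last 15 minutes.
now = datetime(2024, 1, 1, 12, 0)
samples = [
    (now - timedelta(minutes=10), 0.85),
    (now - timedelta(minutes=5), 0.90),
]
print(evaluate_cpu_rule(samples, now))  # prints "alert"
```

Treating "no data" as its own outcome, rather than as a CPU value of zero, is what lets the rule notify us when the host goes silent instead of quietly reporting that everything is fine.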

Conclusion

In this blog post, we showcased how to effectively use the Elastic Stack to monitor Proxmox VE deployments. If you would like to try out such a setup first-hand, you are more than welcome to enjoy Elastic Cloud's 14-day free trial.

In future blog posts, we will investigate how to dig deeper into LXCs and VMs to gather even more information from our homelab and create more tailored alerts. Stay tuned!