Saturday, 15 January 2022

eBPF on ExaCC

 

Recently I had to answer a seemingly simple question for a customer:
Which processes suffer from having their memory in swap?
Having some memory pages swapped out, with the physical memory used for something else, is not a problem in itself. Only when the program needs these pages again does it have to wait. This turns the question into one that is easier to answer:

Which processes have to wait for memory pages to be read from swap (every second)?
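
For contrast, the first form of the question (which processes have pages in swap at all) can be answered without any tracing. The following is only a minimal sketch: it relies on the VmSwap field in /proc/<pid>/status, and the function names are made up for this illustration. It shows how much is swapped out per process, not whether anyone is waiting for it.

```python
import glob


def parse_status(status_text):
    """Return (name, swap_kb) from the text of a /proc/<pid>/status file.

    Kernel threads have no VmSwap line; they are reported with 0 kB.
    """
    name, swap_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("VmSwap:"):
            swap_kb = int(line.split()[1])  # value is given in kB
    return name, swap_kb


def swapped_processes():
    """Return a list of (swap_kb, pid, name), largest swap usage first."""
    result = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were looking
        name, swap_kb = parse_status(text)
        if swap_kb > 0:
            result.append((swap_kb, int(path.split("/")[2]), name))
    return sorted(result, reverse=True)


if __name__ == "__main__":
    for swap_kb, pid, name in swapped_processes():
        print("%10d kB  %-7d %s" % (swap_kb, pid, name))
```

A process high up in this list is not necessarily suffering; it only suffers if it has to read those pages back in.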

The processes do not know that they are waiting for a memory page to be read in. It is not a syscall they make; the process simply waits. So tools like strace do not provide any information here. 

Luckily, there is a quite clever engine for instrumentation (and other useful extensions such as networking, security, ...) available in the modern Linux kernel, called eBPF. Brendan Gregg wrote a suite of Linux tracing tools. Among them is a tool called swapin (manpage, examples). 

Unfortunately, Oracle doesn't seem to think performance tools are of any use in Exadata Cloud at Customer (ExaCC), so bcc/eBPF isn't part of their installation image. Luckily, in the end it's a plain Oracle Linux, and the rpms are available in the public repositories. 
I just had to install 
the packages
llvm-private 
python-netaddr 
bcc  python-bcc  bcc-tools 

and the packages 
libdtrace-ctf 
kernel-uek-devel matching the running kernel (uname -r)
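
The package list can be turned into a single install command. This is just a sketch under a few assumptions: that yum is the package manager, that the repositories holding these rpms are already enabled, and that the devel package can be addressed as kernel-uek-devel-<release>. The helper name is invented for this example, and the script only prints the command instead of running it.

```python
import platform

# Package names as listed above.
BCC_PACKAGES = ["llvm-private", "python-netaddr", "bcc", "python-bcc", "bcc-tools"]


def build_install_command(kernel_release):
    """Build the yum command line for the given kernel release string."""
    packages = BCC_PACKAGES + [
        "libdtrace-ctf",
        # Assumption: pin the devel package to the running kernel
        # by appending the release string to the package name.
        "kernel-uek-devel-" + kernel_release,
    ]
    return ["yum", "install", "-y"] + packages


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(" ".join(build_install_command(platform.release())))
```

Printing the command first makes it easy to check that the kernel-uek-devel version really matches the running kernel before anything gets installed.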

Unfortunately, I didn't find the swapin script in /usr/share/bcc/tools, so I had to create it there. 
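
To give an idea of what such a script does, here is a minimal sketch in the spirit of swapin; it is not the original tool. It assumes that swap-ins go through the kernel function swap_readpage (this can differ between kernel versions), that the bcc Python bindings are installed, and that it runs as root. The helper format_row is invented for this illustration.

```python
from time import sleep, strftime

# eBPF program: count calls to swap_readpage() per process.
# Assumption: swap_readpage is the swap-in entry point on this kernel.
BPF_TEXT = r"""
#include <linux/sched.h>

struct key_t {
    u32 pid;
    char comm[TASK_COMM_LEN];
};

BPF_HASH(counts, struct key_t, u64);

int kprobe__swap_readpage(struct pt_regs *ctx)
{
    struct key_t key = {};
    key.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&key.comm, sizeof(key.comm));
    counts.increment(key);
    return 0;
}
"""


def format_row(comm, pid, count):
    """Format one output line: command name, PID, swap-in count."""
    return "%-16s %-7d %d" % (comm, pid, count)


def main(interval=1):
    try:
        from bcc import BPF  # provided by the python-bcc package
    except ImportError:
        print("bcc Python bindings not found (package python-bcc)")
        return
    b = BPF(text=BPF_TEXT)  # compiles the program and attaches the kprobe
    counts = b["counts"]
    print("Counting swap-ins per process. Ctrl-C to end.")
    try:
        while True:
            sleep(interval)
            print(strftime("%H:%M:%S"))
            print("%-16s %-7s %s" % ("COMM", "PID", "COUNT"))
            for key, val in sorted(counts.items(), key=lambda kv: kv[1].value):
                comm = key.comm.decode("utf-8", "replace")
                print(format_row(comm, key.pid, val.value))
            counts.clear()  # start the next interval from zero
    except KeyboardInterrupt:
        pass


if __name__ == "__main__":
    main()
```

Every interval it prints a timestamp and one line per process that had to read pages back from swap, which is exactly the reformulated question from above.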

But now, after these steps, we can answer whether having memory in swap on a system is bad at all, and which processes are affected. 
(Of course, after all these preparations, the symptoms were gone and there was nothing to observe. But now everything is prepared!)