2023-09-05

Identifying programs which get swapped

Swapping is seen as a nasty effect on modern systems. When a system's performance goes down, there are a bunch of scenarios in which significant swapping is observed. Unfortunately, the root cause is hard to identify. The processes that get part of their memory paged out are not the ones that create the memory pressure. Also, the memory demand can be quite short-lived and therefore hard to observe with sample-based monitoring.

Still, I sometimes want to see at least which processes are affected by paged-out memory. If they never need to access that memory, it isn't a problem at all. But of course, we mostly have to look at systems that already show some kind of issue.

Unfortunately, I did not find any tool that shows a useful per-process figure for paged-out memory, so I had to come up with my own.
My simple script checks the /proc/pid/status "file" of every process. There, the VmSwap line should help out. The proc(5) man page describes it as:

VmSwap Swapped-out virtual memory size by anonymous
       private pages; shmem swap usage is not included
       (since Linux 2.6.34).  This value is inaccurate;
       see /proc/pid/statm above.

The warning doesn't sound promising, but the explanation given for statm is acceptable, at least for me:

Some of these values are inaccurate because of a kernel-internal scalability optimization. If accurate values are required, use /proc/pid/smaps or /proc/pid/smaps_rollup instead, which are much slower but provide accurate, detailed information.

For my purpose that's good enough! I need an overview and don't want to put even more load on a system by parsing all smaps entries.
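If a single process looks suspicious and an accurate number is worth the extra cost, the man page points to /proc/pid/smaps_rollup. The following is a minimal sketch, not part of the original script. It assumes Linux 4.14 or later, where smaps_rollup exists, and enough privileges: reading another process's smaps files generally requires ptrace-level access, so in practice you need root for other users' processes. It prints the cheap VmSwap estimate next to the Swap line from smaps_rollup for one PID. The function name swap_compare is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): compare the fast VmSwap
# estimate from /proc/PID/status with the accurate "Swap:" total from
# /proc/PID/smaps_rollup for a single process.
# Assumes Linux >= 4.14 (smaps_rollup) and permission to read both files.
swap_compare() {
  local pid=$1 status_kb rollup_kb
  # The status line looks like "VmSwap:     1234 kB"; kernel threads have none.
  status_kb=$(awk '/^VmSwap:/ { print $2 }' "/proc/${pid}/status" 2>/dev/null)
  # smaps_rollup sums all mappings; "Swap:" does not match the "SwapPss:" line.
  rollup_kb=$(awk '/^Swap:/ { print $2 }' "/proc/${pid}/smaps_rollup" 2>/dev/null)
  # "n/a" means the file was unreadable, the process is gone, or the field is missing.
  printf "%7d  VmSwap(status)=%s kB  Swap(smaps_rollup)=%s kB\n" \
    "$pid" "${status_kb:-n/a}" "${rollup_kb:-n/a}"
}

# Usage: pass a PID, or default to the current shell.
swap_compare "${1:-$$}"
```

Calling it only for the few PIDs from the overview keeps the slower smaps parsing limited to the processes that actually matter.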

So my script is quite short:

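(echo "        SWAP        PID       USER PROGRAM"
echo
# List every process; the trailing "=" suppresses the ps header line.
ps -ew -o pid=,user=,comm= \
  | while read -r pid user cmd ; do
  # VmSwap is reported in kB. Processes that exit in the meantime and kernel
  # threads (which have no VmSwap line) yield an empty value, treated as 0.
  VmSwap=$(awk '/^VmSwap:/ { print $2 }' "/proc/${pid}/status" 2>/dev/null)
  if [ "${VmSwap:-0}" -gt 0 ]; then
    printf "%12d kB %7d %10s %s\n" "${VmSwap}" "$pid" "$user" "$cmd"
  fi
done | sort -n    # smallest first, so the biggest swap users end up at the bottom
echo
echo "        SWAP        PID       USER PROGRAM"
echo )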

The result is simple but useful:

        SWAP        PID       USER PROGRAM
 
       55640 kB   51304       root crsd.bin
       59068 kB  385709     oracle ora_lgwr_zzz1
       60672 kB   94092     oracle ora_lg04_zzz3
       97668 kB  396323       root java
      115136 kB  385731     oracle ora_lg00_zzz1
      119728 kB  385879     oracle ora_mmon_zzz1
      121484 kB  385841     oracle ora_lg04_zzz1
      185296 kB   14656       root ohasd.bin
      431344 kB   53850       grid java
      564416 kB   19273     oracle java

        SWAP        PID       USER PROGRAM

Again, these processes are not responsible for the swapping; they are just its victims. Still, I want to see the symptoms clearly, and this is one way to do that.
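The VmSwap description quoted above says that shmem swap usage is not included. Because of that, the per-process numbers do not have to add up to the system's used swap. A rough cross-check, sketched here as an assumption rather than as part of the original method, is to sum VmSwap over all processes and compare the result with SwapTotal minus SwapFree from /proc/meminfo. The gap may contain swapped shared memory or tmpfs pages, among other things, so treat it as a hint only. The function name swap_totals is hypothetical.

```shell
#!/bin/bash
# Hypothetical cross-check: how much of the used swap shows up as VmSwap?
# VmSwap leaves out shmem swap, so any remainder may include shared memory
# or tmpfs pages, among other things. Treat the comparison as a hint only.
swap_totals() {
  local sum_kb used_kb
  # Sum VmSwap (kB) over all processes. Processes that exit while we read
  # are skipped silently, and kernel threads have no VmSwap line at all.
  sum_kb=$(for f in /proc/[0-9]*/status; do cat "$f" 2>/dev/null; done \
    | awk '/^VmSwap:/ { s += $2 } END { print s + 0 }')
  # System-wide used swap according to the kernel: SwapTotal - SwapFree (kB).
  used_kb=$(awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' /proc/meminfo)
  printf "sum of VmSwap: %12d kB\n" "$sum_kb"
  printf "used swap:     %12d kB\n" "$used_kb"
}

swap_totals
```

Like the main script, this reads only the cheap status files, so it stays suitable for a quick overview on a system that is already under pressure.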
