
2023-09-10

ExaWatcher custom sampler

Oracle's ExaWatcher is a nice tool to sample some OS-related measurements and keep them for a short period (normally some days) for later analysis. As its name indicates, it exists only on Oracle's Engineered Systems. A more general tool for non-Engineered Systems is OSWatcher, which is similar in many areas, but not in the topic of this post.

ExaWatcher collects a lot of default information about the system. But sometimes (or on some systems) it would be good to collect additional details. Of course I can always write my own script, schedule it in cron and then handle the results myself. But I am lucky: ExaWatcher provides an interface to extend its collection, so custom samplers can be implemented easily.

Speaking of "easily": the documentation is correct, but a little sparse (or I'm just not capable of reading it, which is also likely).

I created a script ProcSwap.sh, based on my previous post, and stored it in /opt/oracle.ExaWatcher/

To use it, the documentation provides the parameter --customcmd.

-u | --customcmd 'sample_name ;; "custom_command;... " '



To include a custom collection module in the current group.

Example: --customcmd 'Lsl; "/bin/ls -l"'


My first attempt failed with a quite useless error message: 
  
/opt/oracle.ExaWatcher/ExaWatcher.sh --group --start "now" --end "never" --interval 5 --count 360 \
    --command_mode "SELECTED"   --customcmd 'ProcSwap;;"ProcSwap.sh"'

Can't exec "ProcSwap.sh": No such file or directory at /opt/oracle.ExaWatcher//ExaWatcherParserElements.pm line 2086, <gen3> line 24.
[1693909750][2023-09-05 12:29:10][WARNING][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] The custom command "ProcSwap.sh" is not supported by your system. It will be skipped.
It's good to know that it cannot be executed, but no reason is given.
At least the message names the file and line, so after making that code slightly more verbose the reason became clear: ExaWatcher needs the script or binary either to be found in $PATH (which does not contain /opt/oracle.ExaWatcher) or to be given with its full path. Quite obvious once analysed.
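
A quick pre-flight check along these lines would have shown the problem before starting ExaWatcher. It is only a sketch: the function name check_sampler_cmd is made up for illustration and is not part of ExaWatcher. It merely tests whether the shell can resolve the command to an executable file, either through $PATH or through an explicit path.

```shell
# Hypothetical helper (not part of ExaWatcher): report whether a custom
# sampler command can be resolved to an executable file.
check_sampler_cmd() {
  local cmd="$1" resolved
  # command -v prints the resolved path for anything found in $PATH and
  # echoes an explicit path back if that file is executable.
  if resolved=$(command -v "$cmd" 2>/dev/null) && [ -x "$resolved" ]; then
    echo "OK: $cmd -> $resolved"
  else
    echo "NOT FOUND: $cmd (not in \$PATH and not an executable path)"
    return 1
  fi
}
```

Called as check_sampler_cmd ProcSwap.sh and then as check_sampler_cmd /opt/oracle.ExaWatcher/ProcSwap.sh, it should reproduce the difference between my first and second attempt.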
 
The next iteration, now with the full path, accepted my ProcSwap sampler. This time I also let ExaWatcher create a sample config file named bx. By analysing this file, I wanted to find out how to include ProcSwap permanently in the default ExaWatcher.conf:

/opt/oracle.ExaWatcher/ExaWatcher.sh --group --start "now" --end "never" --interval 5 --count 360 \
    --command_mode "SELECTED"   --customcmd 'ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"' --createconf bx
The result looks promising:

...
[1693909952][2023-09-05 12:32:32][INFO][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] CCMDInfo: ProcSwap "/opt/oracle.ExaWatcher/ProcSwap.sh". - scalar: 2

[1693909952][2023-09-05 12:32:38][INFO][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] ExaWatcher will automatically generate a name for custom commmand "/opt/oracle.ExaWatcher/ProcSwap.sh".
The previous custom command name "ProcSwap" will be replaced by a new name "CustomCMD0_ProcSwap".  
The config file also contains the expected lines:

...
<Group>
<Start> now
<End> never
<Interval:s> 5
<Count> 360
<CommandMode> SELECTED
<CustomCMD> CustomCMD0_ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"

<RunEnd>

The last steps are easy: add the line
<CustomCMD> ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"
to ExaWatcher.conf and restart ExaWatcher. Once ProcSwap.sh has been executed for the first time, there is a new directory /opt/oracle.ExaWatcher/archive/CustomCMD.ExaWatcher/CustomCMD0_ProcSwap containing a file 2023_09_10_10_59_59_CustomCMD0_ProcSwap_<hostname>.dat. The header of this file looks promising, as it is identical to that of all the other data collections by ExaWatcher:

############################################################
# Starting Time:        09/10/2023 10:59:59
# Sample Interval(s):   5
# Archive Count:        360
# Collection Module:    CustomCMD0_ProcSwap
# Collection Command:   /opt/oracle.ExaWatcher/ProcSwap.sh
# Misc Info:
############################################################
zzz <09/10/2023 10:59:59> Count:0
         112 kB       1       root systemd
          12 kB     815       root lvmetad  
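
For later analysis, such an archive file can be condensed with a few lines of awk. The following is a sketch under assumptions: the function name procswap_totals is invented here, and it relies only on the layout visible in the excerpt above, where each sample starts with a "zzz <date time> Count:n" line and each data line begins with a size followed by "kB".

```shell
# Hypothetical helper: print one line per sample with the total swapped-out
# memory, reading a CustomCMD0_ProcSwap .dat file (or stdin).
procswap_totals() {
  awk '
    # A line starting with "zzz" opens a new sample; flush the previous one.
    /^zzz / {
      if (seen) printf "%s %s %d kB\n", day, clock, total
      day   = substr($2, 2)                   # strip the leading "<"
      clock = substr($3, 1, length($3) - 1)   # strip the trailing ">"
      total = 0; seen = 1
      next
    }
    # Data lines look like: <size> kB <pid> <user> <program>
    seen && $2 == "kB" && $1 ~ /^[0-9]+$/ { total += $1 }
    END { if (seen) printf "%s %s %d kB\n", day, clock, total }
  ' "$@"
}
```

Fed with the .dat file as its argument, it prints one total per sample; for the two data lines shown above that would be 124 kB for the sample taken at 10:59:59.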

As always: once the method is clear, it's quite easy!

2023-09-05

Identifying programs which get swapped

Swapping is seen as a nasty effect on modern systems. When a system's performance goes down, there are a number of scenarios in which significant swapping is observed. Unfortunately, the root cause is hard to identify: the processes which get part of their memory paged out are not the ones which create the memory pressure. The memory demand can also be quite short-lived and therefore hard to observe in sample-based monitoring.

Still, sometimes I'd like to see at least which processes are affected by paged-out memory. If they don't need to access this memory, for whatever reason, it is no problem at all. But of course, we mostly look at systems which show some kind of issue.

Unfortunately I did not find any tool which shows me a useful figure for the memory paged out per process, so I had to come up with my own.
My simple script checks the /proc/pid/status "file" of every process. There, the line named VmSwap should help:

VmSwap Swapped-out virtual memory size by anonymous
       private pages; shmem swap usage is not included
       (since Linux 2.6.34).  This value is inaccurate;
       see /proc/pid/statm above.

The warning doesn't sound promising, but the explanation given for statm is acceptable (for me):

Some of these values are inaccurate because of a kernel-internal scalability optimization. If accurate values are required, use /proc/pid/smaps or /proc/pid/smaps_rollup instead, which are much slower but provide accurate, detailed information.

For my purpose that's good enough! I need an overview and don't want to put even more load on a system by parsing all smaps entries.

So my script is quite short:

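# List all processes with swapped-out memory, sorted ascending by size.
(echo "        SWAP        PID       USER PROGRAM"
echo
# The ps header line is harmless: /proc/PID/status does not exist,
# so VmSwap stays empty and the line is skipped.
ps -ewo pid,user,comm \
  | while read -r pid user cmd ; do
  # Kernel threads have no VmSwap line; vanished processes give no output.
  VmSwap=$(awk ' /^VmSwap:/ { print $2 } '  "/proc/${pid}/status" 2>/dev/null)
  if [ "${VmSwap:-0}" -gt 0 ]; then
    printf "%12d kB %7d %10s %s \n" "${VmSwap:-0}" "$pid" "$user" "$cmd"
  fi
done | sort -n
echo
echo "        SWAP        PID       USER PROGRAM"
echo )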
The result is simple, but useful:

        SWAP        PID       USER PROGRAM
 
       55640 kB   51304       root crsd.bin
       59068 kB  385709     oracle ora_lgwr_zzz1
       60672 kB   94092     oracle ora_lg04_zzz3
       97668 kB  396323       root java
      115136 kB  385731     oracle ora_lg00_zzz1
      119728 kB  385879     oracle ora_mmon_zzz1
      121484 kB  385841     oracle ora_lg04_zzz1
      185296 kB   14656       root ohasd.bin
      431344 kB   53850       grid java
      564416 kB   19273     oracle java

        SWAP        PID       USER PROGRAM
Again, those processes are not responsible for the swapping; they are just its victims. But I want to see the symptoms bright and clear, and this is one way to do so.
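
Should the VmSwap figure of one particular process ever need a cross-check, the slower but accurate smaps_rollup source quoted above could be read roughly like this. It is a sketch, not something I run on every process: the function name swap_kb_accurate is invented, the second argument exists only to point the function at a different /proc root for testing, and I assume a kernel recent enough to provide smaps_rollup plus sufficient privileges to read it for the target process.

```shell
# Hypothetical helper: print the swap usage (kB) of one process as
# reported by smaps_rollup. Usage: swap_kb_accurate <pid> [proc_root]
swap_kb_accurate() {
  local pid="$1" root="${2:-/proc}"
  # Match the field exactly so that the "SwapPss:" line is not picked up.
  # The exit status is non-zero if no Swap: line could be read.
  awk '$1 == "Swap:" { print $2; found = 1 } END { exit !found }' \
    "${root}/${pid}/smaps_rollup" 2>/dev/null
}
```

For example, swap_kb_accurate 19273 would give the accurate value for the largest java process in the list above, as long as that process still exists.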