2023-09-23

changes in SQL real time monitor report in 23c free

I like Oracle SQL real time monitor reports as they contain a huge amount of valuable information and can be viewed and transferred very simple. 
It's content increased over time - and as oracle updates the java script libraries to generate a nice view, sometimes the same report reveals even more details when you open the same old report later! So it was a kind of downer when I saw the amount of content decreasing since 19c! Here some screenshots what changed: 

19c:


21c:

Here the report misses Activity and Metrics. 

23c free: 

This report misses Activity, Metrics and Outlines - compared to 19c


That's quite sad! 
And this information is really missing in the html file - so we can't blame the java script library for this. 

The data is stored as a XML structure in the html report. But to be readable for humans, we need to decode it first  - at least I can't read this directly:



  <script id="fxtmodel" type="text/xml">
   <!--FXTMODEL-->
   <report db_version="19.0.0.0.0" elapsed_time="3.23" cpu_time="3.21" cpu_cores="4" hyperthread="N" con_id="3" con_name="PDB1" timezone_offset="0" packs="2" service_type="0" encode="base64" compress="zlib">
    <report_id><![CDATA[/orarep/sqlmonitor/main%3fsql_exec_id%3d16777216%26sql_exec_start%3d09%3a23%3a2023%2012%3a42%3a10%26sql_id%3d2u4jc5hhwhc7n]]></report_id>
                eAHtffl348aR8O/+K/AxTnZmP9siDl6OxLc6OI4cHWMdSbz79vFRFCTRpkiZBOfI
X79VfaAPNIAGCFIaDbz7MiLQ6Kvu6qrq3eUf0+HjfDaJ5ovhInyaLyLnQ7hYTuaz
vUbwQ7PhLD8vb0dRuNdo9nY8f8drer7jej8G3o9ut9H/xnF26WfDp9Fi9BhG8DE+
hefY9eS2762C38ath4ePD+PObHeHPRVNwk/hGNu57U6n47lt2oQ/1doto9Ei6ien
...

Luckily it claims what it is doing: encode="base64" compress="zlib" And so Franck Pachot already cared about a way to decode it.

I slightly adapted it to my laptops needs:

xmllint --xpath '/html/body/script/report' 19c.html  | grep -v report | base64 -id | \
python3 -c "import zlib,sys;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"  > 19c.xml


The missing sections in 21c and 23cfree are


Activity:

   <activity_sampled>
    <activity class="Cpu">1</activity>
  </activity_sampled>
  <activity_detail start_time="09/23/2023 12:42:10" end_time="09/23/2023 12:42:19" first_sample_time="09/23/2023 12:42:10" last_sample_time="09/23/2023 12:42:10" duration="1" sample_interval="1" bucket_interval="1" bucket_count="10" bucket_duration="10" cpu_cores="4" total_cpu_cores="4" hyperthread="N">
    <bucket number="1">
      <activity class="Cpu" line="77">1</activity>
    </bucket>
  </activity_detail>


Metrics:

  <stattype name="metrics" cpu_cores="4" hyperthread="N">
    <stat_info>
      <stat id="1" name="nb_cpu"/>
      <stat id="2" name="nb_sess"/>
      <stat id="3" name="reads" unit="per_sec"/>
      <stat id="4" name="writes" unit="per_sec"/>
      <stat id="5" name="read_kb" unit="bytes_per_sec" factor="1024"/>
      <stat id="6" name="write_kb" unit="bytes_per_sec" factor="1024"/>
      <stat id="7" name="interc_kb" unit="bytes_per_sec" factor="1024"/>
      <stat id="8" name="cache_kb" unit="bytes_per_sec" factor="1024"/>
      <stat id="9" name="pga_kb" unit="bytes" factor="1024"/>
      <stat id="10" name="tmp_kb" unit="bytes" factor="1024"/>
    </stat_info>
    <buckets bucket_interval="1" bucket_count="10" start_time="09/23/2023 12:42:10" end_time="09/23/2023 12:42:19" duration="10">
      <bucket bucket_id="1">
        <stat id="1" value=".26"/>
        <stat id="3" value="494"/>
        <stat id="5" value="4008"/>
        <stat id="7" value="4008"/>
        <stat id="8" value="154224"/>
        <stat id="9" value="12586"/>
      </bucket>
    </buckets>
  </stattype>

Outlines:

        <outline_data>
          <hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint>
          <hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('19.1.0')]]></hint>
          <hint><![CDATA[DB_VERSION('19.1.0')]]></hint>
          <hint><![CDATA[ALL_ROWS]]></hint>
          <hint><![CDATA[OUTLINE_LEAF(@"SEL$3")]]></hint>
          <hint><![CDATA[OUTLINE_LEAF(@"SEL$4")]]></hint>
          <hint><![CDATA[OUTLINE_LEAF(@"SEL$11")]]></hint>
...
  



It would be a quite depressing post when the story ends here. 
But there is hope:

In 23c_free Oracle introduced new parameters for _fix_control
This one is really helpful: 

350691421QKSFM_CBO_35069142use ADVANCED format for plan display in SQL Monitor

With a simple 

alter session set "_fix_control"='35069142:1';

My Report shows much more content now:


Activity, Metrics and Outlines are back and new content is visible also: SQL Analysis. 

This contains of 2 sections: 

SQL Analysis

Line ID
Query Block
Message
 1
SEL$43The query block has 2 cartesian products which may be expensive. Consider adding join conditions or removing the disconnected tables or views.
 13
SEL$E029B2FF / "O"@"SEL$5"The following columns have predicates which preclude their use as keys in index range scan. Consider rewriting the predicates.
  "NAME"
  "TYPE#"
 45
SEL$D0BBB93D / "O"@"SEL$19"The following columns have predicates which preclude their use as keys in index range scan. Consider rewriting the predicates.
  "NAME"
  "TYPE#"
 77
SEL$B2BBC534 / "O"@"SEL$33"The following columns have predicates which preclude their use as keys in index range scan. Consider rewriting the predicates.
  "NAME"
  "TYPE#"
selected


which comes from the XML section 
      <sqla>
        <sec id="1">
          <n><![CDATA[SEL$43]]></n>
          <msg>
            <m li="1"><![CDATA[The query block has 2 cartesian products which may be ]]></m>
            <m li="2"><![CDATA[expensive. Consider adding join conditions or removing the ]]></m>
            <m li="3"><![CDATA[disconnected tables or views.]]></m>
          </msg>
        </sec>
        <sec id="13">
          <n><![CDATA[SEL$E029B2FF / "O"@"SEL$5"]]></n>
          <msg>
            <m li="1"><![CDATA[The following columns have predicates which preclude their ]]></m>
            <m li="2"><![CDATA[use as keys in index range scan. Consider rewriting the ]]></m>
            <m li="3"><![CDATA[predicates.]]></m>
            <m li="4" list="Y"><![CDATA["NAME"]]></m>
            <m li="5" list="Y"><![CDATA["TYPE#"]]></m>
          </msg>
        </sec>
        <sec id="45">
          <n><![CDATA[SEL$D0BBB93D / "O"@"SEL$19"]]></n>
          <msg>
            <m li="1"><![CDATA[The following columns have predicates which preclude their ]]></m>
            <m li="2"><![CDATA[use as keys in index range scan. Consider rewriting the ]]></m>
            <m li="3"><![CDATA[predicates.]]></m>
            <m li="4" list="Y"><![CDATA["NAME"]]></m>
            <m li="5" list="Y"><![CDATA["TYPE#"]]></m>
          </msg>
        </sec>
        <sec id="77">
          <n><![CDATA[SEL$B2BBC534 / "O"@"SEL$33"]]></n>
          <msg>
            <m li="1"><![CDATA[The following columns have predicates which preclude their ]]></m>
            <m li="2"><![CDATA[use as keys in index range scan. Consider rewriting the ]]></m>
            <m li="3"><![CDATA[predicates.]]></m>
            <m li="4" list="Y"><![CDATA["NAME"]]></m>
            <m li="5" list="Y"><![CDATA["TYPE#"]]></m>
          </msg>
        </sec>
      </sqla>



Hint Usage Issues

Line ID
Query Block
Hint
Hint Status
 2
SEL$1materialize Applied Successfully
 34
SEL$15materialize Applied Successfully
 66
SEL$29materialize Applied Successfully

with the XML  
      <hreport>
        <leg>
          <t>3</t>
        </leg>
        <sec id="2">
          <n><![CDATA[SEL$1]]></n>
          <h><![CDATA[materialize]]></h>
        </sec>
        <sec id="34">
          <n><![CDATA[SEL$15]]></n>
          <h><![CDATA[materialize]]></h>
        </sec>
        <sec id="66">
          <n><![CDATA[SEL$29]]></n>
          <h><![CDATA[materialize]]></h>
        </sec>
      </hreport>

We will see how 35069142 evolves over time.




2023-09-10

ExaWatcher custom sampler

Oracles ExaWatcher is a nice tool to sample some OS related measurements and keep them for a short period (normally some days) for later analysis. As its name indicates, it exists only on Oracles Engineered systems. A more general tool for non-Engineered Systems is OSWatcher - similar in many areas, but not in the content of this post. 

ExaWatcher collect a lot of default information about the system. But sometimes (or somewhere) additional details would be fine to be collected. Of course I can always write my own script to be scheduled in cron and then handle the results. But I am lucky: ExaWatcher provides an interface to extend it's collection: cusom samplers can be implemented easily. 

Speaking of easily, the documentation is correct, but somehow a little sparse (Or I'm just not capable of reading it, which is also likely).  

I created a script PsSwap.sh based on my previous post stored in /opt/oracle.ExaWatcher/

To use it, the documentation provides the parameter --customcmd.

-u | --customcmd 'sample_name ;; "custom_command;... " '



To include a custom collection module in the current group.

Example: --customcmd 'Lsl; "/bin/ls -l"'


My first attempt failed with a quite useless error message: 
  
/opt/oracle.ExaWatcher/ExaWatcher.sh --group --start "now" --end "never" --interval 5 --count 360 \
    --command_mode "SELECTED"   --customcmd 'ProcSwap;;"ProcSwap.sh"'

Can't exec "ProcSwap.sh": No such file or directory at /opt/oracle.ExaWatcher//ExaWatcherParserElements.pm line 2086, <gen3> line 24.
[1693909750][2023-09-05 12:29:10][WARNING][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] The custom command "ProcSwap.sh" is not supported by your system. It will be skipped.
It's fine to know it can not be executed, but no reason was given.
But at least I know the file and location, so after changing the code to be slightly more verbose the reason was obvious: ExaWatcher needs the scripts/binaries to execute in the $PATH (which does not contain /opt/oracle.ExaWatcher ) or the full path added. quite obvious once analysed. 
 
In the next iteration which accepted my ProcSwap sampler now I created a sample config file: bx. By analysing this sample file, I want to know how to include my ProcSwap into the default ExaWatcher.conf permanently:

/opt/oracle.ExaWatcher/ExaWatcher.sh --group --start "now" --end "never" --interval 5 --count 360 \
    --command_mode "SELECTED"   --customcmd 'ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"' --createconf bx
the result looks promising:

...
[1693909952][2023-09-05 12:32:32][INFO][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] CCMDInfo: ProcSwap "/opt/oracle.ExaWatcher/ProcSwap.sh". - scalar: 2

[1693909952][2023-09-05 12:32:38][INFO][/opt/oracle.ExaWatcher/ParserExaWatcher.pl][ExaWatcherParserElements::format_custom_CMDs][] ExaWatcher will automatically generate a name for custom commmand "/opt/oracle.ExaWatcher/ProcSwap.sh".
The previous custom command name "ProcSwap" will be replaced by a new name "CustomCMD0_ProcSwap".  
and also the config file contains the expected lines:

...
<Group>
<Start> now
<End> never
<Interval:s> 5
<Count> 360
<CommandMode> SELECTED
<CustomCMD> CustomCMD0_ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"

<RunEnd>

The last steps are easy: Adding the line 
<CustomCMD> ProcSwap;;"/opt/oracle.ExaWatcher/ProcSwap.sh"
to ExaWatcher.conf and restart ExaWatcher.  After ProcSwap.sh is executed once, there is a new directory in /opt/oracle.ExaWatcher/archive/CustomCMD.ExaWatcher/CustomCMD0_ProcSwap and a file there 2023_09_10_10_59_59_CustomCMD0_ProcSwap_<hostname>.dat The header of this file looks promising as it's identical to all the other data collections by ExaWatcher:

############################################################
# Starting Time:        09/10/2023 10:59:59
# Sample Interval(s):   5
# Archive Count:        360
# Collection Module:    CustomCMD0_ProcSwap
# Collection Command:   /opt/oracle.ExaWatcher/ProcSwap.sh
# Misc Info: ############################################################
zzz <09/10/2023 10:59:59> Count:0
         112 kB       1       root systemd
          12 kB     815       root lvmetad  

As always: once the method is clear, it's quite easy!

2023-09-05

Identifying programs which get swapped

Swapping is seen as a nasty effect on modern systems. When the systems performance goes down there is a bunch of scenarios where significant swapping is observed. Unfortunately a root cause is hard to identify. Those processes which get part of their memory paged out are not those which create memory pressure. Also the memory demand can be quite short lived and therefore hard to observe in sample based monitoring. 

Still sometimes I'd like to see at least which processes are affected from paged out memory. If they don't need to access this memory for whatever reason, that doesn't mean any problem at all. But of course, we mostly have to look at those systems which show any kind of issue.

Unfortunately I did not find any tool which shows me a useful number of memory paged out per process, so I had to come up with my own. 
My simple script checks all processes /proc/pid/status "file". There the line with name should help out: 

VmSwap Swapped-out virtual memory size by anonymous
       private pages; shmem swap usage is not included
       (since Linux 2.6.34).  This value is inaccurate;
       see /proc/pid/statm above.

The warning doesn't sound promising, but the explanation given for statm is somehow acceptable (for me):

Some of these values are inaccurate because of a kernel- internal scalability optimization. If accurate values are required, use /proc/pid/smaps or /proc/pid/smaps_rollup instead, which are much slower but provide accurate, detailed information.

For my purpose that's good enough! I need an overview and doesn't want to put even more load on a system by parsing all smaps entries.

So my script is quite short:

(echo "        SWAP        PID       USER PROGRAM"
echo
ps -ewo pid,user,comm \
  | while read -r pid user cmd ; do
  VmSwap=$(awk ' /VmSwap/ { print $2 } '  /proc/${pid}/status 2>/dev/null)
  if [ ${VmSwap:-0} -gt 0 ]; then 
    printf "%12d kB %7d %10s %s \n" "${VmSwap:-0}" "$pid" "$user" "$cmd"
  fi
done | sort -n
echo
echo "        SWAP        PID       USER PROGRAM"
echo )
and the result is simple, but useful:

        SWAP        PID       USER PROGRAM
 
       55640 kB   51304       root crsd.bin
       59068 kB  385709     oracle ora_lgwr_zzz1
       60672 kB   94092     oracle ora_lg04_zzz3
       97668 kB  396323       root java
      115136 kB  385731     oracle ora_lg00_zzz1
      119728 kB  385879     oracle ora_mmon_zzz1
      121484 kB  385841     oracle ora_lg04_zzz1
      185296 kB   14656       root ohasd.bin
      431344 kB   53850       grid java
      564416 kB   19273     oracle java

        SWAP        PID       USER PROGRAM
Again, those processes are not responsible for swapping, they are just victims. But for me I try to see the symptoms bright and clear, and this is one possibility.