2020-06-23

sampling everything

Oracle ASH is a great tool! 
It's on by default, and (when properly licensed) can give a lot of information about a system.
Unfortunately there are some drawbacks. One is the limited information for users experience, when part of the work is spent outside of the instance (and it's subsystems like disk-IO). Normally this part is spent on application side. 
Oracle indicates this with wait event SQL*Net message from client. As this wait event belongs to the wait class idle, it's not visible in ASH for a simple reason: ASH stands for Active Session History.  And idle just doesn't count as active. 
In my previous post I shortly explained why idle stati can be of some interest from users experience perspective. 

There is a parameter which transforms ASH into ALL Session History:

_ash_sample_all = TRUE

With this undocumented (and unsupported) setting, all sessions (not only those not idle) are sampled and the sample are written into ASH memory ring buffer. 
(There are more ash related undocumented parameters

On my sandbox system a small test created these entries:

SAMPLE_ID SQL_ID EVENT SEQ# WAIT_CLASS
485161   ON CPU 8  
485162 30kanbhk5wg4m SQL*Net message from client 32 Idle
485163 30kanbhk5wg4m SQL*Net message from client 32 Idle
485164 817dp83nj72zp ON CPU 32  
485165   SQL*Net message from client 36 Idle
485166   SQL*Net message from client 36 Idle


This corresponds quite well with the matching tracefile (filtered for SQL*Net message form client

WAIT #139691636527808: nam='SQL*Net message from client' ela= 22382 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=740940045
WAIT #139691636521672: nam='SQL*Net message from client' ela= 217489 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=741185838
WAIT #139691636521672: nam='SQL*Net message from client' ela= 11484 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=741200014
WAIT #139691637134592: nam='SQL*Net message from client' ela= 2241071 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=743459220
WAIT #139691636521672: nam='SQL*Net message from client' ela= 11994 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=743475271
WAIT #0: nam='SQL*Net message from client' ela= 2223365 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=745699337
  
In this specific case we can match the event of sample_id 485162 and 485163 (it is the same event, as seq# doesn't change) matches the 4th line of the trace excerpt (the one which lasts 2241071 ). 
The event of samles 485165 and 485166 match to the last line in the trace excerpt - ela= 2223365



Of course putting additional load onto the system comes at some risk. Let's consider how risky it is - especially if enabled for short time for specific analysis.

First a busy system where most sessions are active doing some work. 
In this instance most of the sessions are sampled anyhow - because active. The additional idle samples only add a relative small amount of lines - little additional load. 

Next a system where most sessions are idle. 
In this instance, much more lines will appear in ASH. Initially this can be seend as dangerous load. 
But all these idle sessions can become active anyhow - in worst case all together. 
If the instance is sized for the load of most sessions active, writing ALL sessions information to ASH will not hurt while it's idle. If the instance is NOT sized for the sessions coming active, the system is in high structural danger anyhow. 
 
Still another detail sould be cared of: ASH is written to AWR at some time - which will lead to more storage used when enabled for longer time. 


When compared with DBMS_MONITOR.DATABASE_TRACE_ENABLE, both methods have slightly different impact. When the system is busy, it will have a lot of quite fast wait events. A lot will be written to tracefiles and many lines will appear in ASH. 
But when the system has events which last quite long (as the 2 events in my example ago) they will create only one entry in the tracefile, but more lines in ASH. 

Keine Kommentare: