One of those "small & unimportant" test DBs had an interesting IO pattern:
This was something interesting, so a colleague decided to have a deeper look at it. The next obvious step is to see if AAS shows anything interesting. And it does:
A lot of blue (which is used for IO in all tools I'm aware of) was expected, but it's obviously dominated by green (CPU). What a surprise!
After some quick crosschecks if any of the systems show fake data, we agreed it's all correct. Still the question exists: why does system statistics show a lot of IO, whereas ASH looks very different?
More data is needed, so one question was: out of all the direct path read IOs we saw in this timespawn, how fast were they? The first source for a check is DBA_HIST_SYSTEM_EVENT:
SELECT sysev.event_name, sysev.wait_class, MAX(total_waits) - MIN(total_waits) AS sum_total_waits, MAX(time_waited_micro) - MIN(time_waited_micro) AS sum_time_waited, ( MAX(time_waited_micro) - MIN(time_waited_micro) ) / ( MAX(total_waits) - MIN(total_waits) + 1) time_per_wait FROM dba_hist_snapshot snap, dba_hist_system_event sysev WHERE snap.snap_id = sysev.snap_id AND begin_interval_time BETWEEN to_timestamp('2019-05-03 06:00:00', 'YYYY-MM-DD HH24:MI:SS') AND to_timestamp('2019-05-03 20:00:00', 'YYYY-MM-DD HH24:MI:SS') GROUP BY GROUPING SETS ( ( sysev.event_name, sysev.wait_class ), ( sysev.wait_class ) ) ORDER BY 3 DESC, 2, 1;
EVENT_NAME WAIT_CLASS SUM_TOTAL_WAITS SUM_TIME_WAITED TIME_PER_WAIT ------------------------------ --------------- --------------- --------------- -------------- User I/O 156659399 87378811924 557.76 System I/O 134570949 10821645896 80.42 Idle 66603429 ############### 2432741.08 Network 65725910 5330619377 81.10 Administrative 20587618 191843235750 9318.38 direct path read User I/O 10353091 2001455155 193.32 db file sequential read User I/O 1008098 255164480 253.11 Other 947941 10721785435 11310.59 control file sequential read System I/O 472653 11995170 25.38 Application 460657 73815782 160.24 SQL*Net message from client Idle 303956 500628620721 1647037.64
The topmost User I/O event is direct path read. During the 14 hour period, it happened at about 200 times every second. Based on the ASH graph above we can assume it all happened in one session, so in this session every second about 40ms were spent in direct path read.
This now raises the question why these
Short events are underrepresented in sampled data. And as one single direct path read lasted only 193μs on average, the observation was hit by this bias.
With this very simple test-DB it was possible to investigate the situation to a reasonable degree of understanding, But chasing the real producer of high IOs in a busy production system might be misleading when only ASH is used.
This situation is kind of a downer for me as ASH does not even close show the truth. (this is a good example, where tracing is better than sampling). So a reliable source for additional information is required.
In recent Oracle versions (10.1 or later) there is some information available: Of course v$sesstat shows statistics for every session, but the numbers are cumulative and must be processed somehow.
Oracle provides v$sessmetric for this purpose, but it only shows the last x seconds and only a small subset of statistics. A more generic solution is Tanel Põders snapper. But both approaches can only show what's going on at the time of observation.
For something more DBA_HIST_ish, DBA_HIST_SESSMETRIC_HISTORY looks promising, but it is misleading, as This view is populated only if a session metric exceeds a server metric threshold that was configured using the
DBMS_SERVER_ALERT
package.There is much more about these features & their limitations in Kyle Hailys Oracle 10g Performance: chapter 04 new features.
With all that in mind, I hope for additional information about session statistic deltas. (SSD would be a perfect and unambiguous abbreviation 😅)
In short, it should write all changed session stats for all sessions at every snapshot. In addition for every closed session their statistics must be saved, otherwise their data is lost. Tanel implemented something similar already in his Session-level statspack but he decided not to store session statistics by default to save space. Also no logoff-trigger is there (yet).
Let's see if there will ever be a SSD implementation available.