Saturday, June 20, 2020

Comparison between SQL*Trace and ASH - get the right information

I'm quite interested in analyzing and improving performance in Oracle databases. 
There are different approaches, and whether tracing or sampling is more appropriate - namely SQL*Trace or ASH/AWR - has already been discussed a lot. 
I will not favor one side here, just pick one aspect and compare both methods. 

Today's aspect is to compare how a specific user experience can be identified. As samples (ASH) are always collected, the task there is to filter the right data out of this heap of data; with SQL*Trace, it's to scope what or where SQL*Trace should be enabled. 

As I'm a lazy guy, I refer to Cary Millsap's Mastering Oracle Trace Data. In my copy of the book, Chapter 3 (Tracing Oracle) shows an ordered list of tracing preferences. I will follow this list:

  1. If the application sets a distinct client ID for each business task execution ...
    SQL*Trace can be enabled by DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE
    ASH can be filtered on column CLIENT_ID
  2. If the application sets a distinct module and action name ... 
    SQL*Trace can be enabled by DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE
    ASH can be filtered on columns MODULE and ACTION
  3. If the program runs long enough to be identified ... and still runs long enough ... 
    SQL*Trace can be enabled for this session by DBMS_MONITOR.SESSION_TRACE_ENABLE 
    ASH can be filtered on columns SESSION_ID and SESSION_SERIAL# 
  4. If there is any x such that SYS_CONTEXT('USERENV', x) identifies the experience ...
    SQL*Trace can be enabled by an AFTER LOGON trigger
    ASH: use this AFTER LOGON trigger to set CLIENT_ID, MODULE or ACTION, and then filter on any of those
  5. If a new service can be created for the problematic experience ... 
    SQL*Trace can be enabled by DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE
    ASH can be filtered on column SERVICE_HASH (with a little join to DBA_SERVICES)
  6. If nothing else helps ... 
    SQL*Trace can be enabled for everything for a short period of time by DBMS_MONITOR.DATABASE_TRACE_ENABLE, and the user experience can be identified later
    ASH is on all the time, but the user experience must still be identified by the same investigation effort 
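For illustration, here is a sketch of how points 1 and 3 could be scoped in practice. The client ID 'ORDER-4711' and the session identifiers below are made up, and the DBMS_MONITOR calls need the appropriate privileges:

```sql
-- Point 1: the application tags each business task via DBMS_SESSION.SET_IDENTIFIER,
-- so tracing (and ASH filtering) can be scoped to that client ID.
exec DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(client_id => 'ORDER-4711', waits => TRUE, binds => FALSE)

SELECT sample_time, sql_id, event
  FROM v$active_session_history
 WHERE client_id = 'ORDER-4711';

-- Point 3: the session is already identified (SID/SERIAL# taken from V$SESSION).
exec DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 4567, waits => TRUE)

SELECT sample_time, sql_id, event
  FROM v$active_session_history
 WHERE session_id = 123
   AND session_serial# = 4567;
```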

This comparison seems to end in a tie between SQL*Trace and ASH sampling. But I know of a small but specific difference: wherever the user experience cannot be accurately scoped by any setting, it must be filtered by other means. Here SQL*Trace has a big advantage, especially when investigated with a toolset like mrskew in Method R Workbench: some activities can be identified by specific start and stop statements, e.g. getting some parameters from a configuration table with a SELECT at the beginning, and writing a trace line at the end. mrskew can be used to generate an artificial experience ID for every experience between these two markers. In ASH, the sampling might not catch any of these statements for a specific execution. This requires additional triggers on the statements to set a CLIENT_ID, MODULE or ACTION, as in point 4 above. That is certainly more work in this case, especially for SELECTs.
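The trigger approach from point 4 could look roughly like this; the OS user name 'batch_runner' and the module name are purely hypothetical:

```sql
-- Sketch: tag all sessions of a hypothetical OS user at logon, so their
-- activity can later be scoped in SQL*Trace or filtered in ASH.
CREATE OR REPLACE TRIGGER trg_tag_batch_sessions
AFTER LOGON ON DATABASE
BEGIN
  IF SYS_CONTEXT('USERENV', 'OS_USER') = 'batch_runner' THEN
    -- shows up in (G)V$SESSION.CLIENT_IDENTIFIER and in ASH column CLIENT_ID
    DBMS_SESSION.SET_IDENTIFIER('BATCH-' || SYS_CONTEXT('USERENV', 'SID'));
    -- shows up in ASH columns MODULE and ACTION
    DBMS_APPLICATION_INFO.SET_MODULE(module_name => 'nightly_batch',
                                     action_name => 'startup');
  END IF;
END;
/
```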

My high-level summary on identifying the user's experience: both SQL*Trace and ASH are comparable. The big advantage of ASH is its always-on implementation, whereas SQL*Trace has advantages when it comes to specific SQL statements as markers. 

Comments:

Mikhail Velikikh said…

Hi Martin,

You wrote:

If the application set a distinct client ID for each business task execution ...

I have seen such applications. I consider it very unnatural and I highly discourage such usage. Why is that? Oracle already provides the Execution Context ID (ECID) to do the same. That is what I have been using in the shops I have worked for.
It is exposed in GV$SESSION and GV$ACTIVE_SESSION_HISTORY:

SQL> declare
       v_ecid raw(16) := sys_guid();
       v_count int;
     begin
       dbms_session.set_context('E2E_CONTEXT', 'ECID_UID', v_ecid);
       -- run something
       select count(*)
         into v_count
         from dba_objects o1,
              dba_objects o2
        where rownum <= 1e8;
       dbms_session.set_context('E2E_CONTEXT', 'ECID_UID', null);
       dbms_output.put_line(v_ecid);
     end;
     /
A89D7CE0FD3C2F66E053B51F13ACE87E

PL/SQL procedure successfully completed.

SQL> select count(*) from v$active_session_history where ecid='A89D7CE0FD3C2F66E053B51F13ACE87E';

COUNT(*)
----------
6


JDBC applications can set it in addition to the other end-to-end attributes you mentioned: MODULE, ACTION, CLIENT_ID - it does not require an extra network round trip, which looks especially appealing.

Apparently, the ECID functionality was initially introduced for application servers, since I remember it can be used as a search parameter in OC4J application servers.

My current shop uses APEX extensively, and I asked our developers to specifically identify all business operations with an ECID - a business operation is an application-level term, and it can include plenty of database operations, such as SELECTs, INSERTs, etc. It would look a teensy bit clunky to use CLIENT_ID for that. I would rather keep CLIENT_ID as APEX populates it by default, which is something like 'SCOTT:123456', in which 'SCOTT' is the username and '123456' is an APEX session ID.

The ECID is a natural choice when end-to-end flow control is required. For instance, say you have a classic three-tier architecture. The ECID can be generated on the application server side, and all database calls can carry it. When something goes awry and a post-mortem analysis is requested, it is enough to know the ECID of a problem operation to see where it spent most of its time - the app server vs. the DB server.

I used it in one shop to address users complaining about performance - poor performance from the user's perspective is not always a database issue - and ECIDs can be used to identify which components contribute the most time.
A further extension of this idea is to generate ECIDs as close to the end users as possible, i.e. in JavaScript, and pass them through all the layers down to the database.

The other day, I implemented an automatic performance analysis tool based on those ECIDs. The stakeholders wanted to improve user experience, and they asked me to identify any database operations of online users that took longer than 5 seconds. It was a breeze, since I was already using ECIDs to answer intermittent questions related to unusual application slowdowns and higher-than-expected latencies.
Obviously, those ECIDs were generated at the app server first and passed to the database, which was called a 'black box' by the app server team. ECIDs allowed us to shed light on what the database was doing and cross-reference it with the app server logs - ECIDs were written there as well and allowed us to connect the dots between the DB and the app server.
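Such an analysis can be sketched as an ASH query. Assuming roughly one second of active time per V$ACTIVE_SESSION_HISTORY sample, operations above the 5-second threshold bubble up; the one-hour time window is arbitrary:

```sql
-- Approximate DB time per business operation: each ASH sample of an active
-- session represents roughly 1 second of active time.
SELECT ecid,
       COUNT(*)         AS approx_db_seconds,
       MIN(sample_time) AS first_seen,
       MAX(sample_time) AS last_seen
  FROM v$active_session_history
 WHERE ecid IS NOT NULL
   AND sample_time > SYSTIMESTAMP - INTERVAL '1' HOUR
 GROUP BY ecid
HAVING COUNT(*) > 5
 ORDER BY approx_db_seconds DESC;
```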

I think the ECID is one of the most underrated features in the Oracle database.
An instrumentation maturity scale might look as below:
1. no services or the default one - SYS$USERS
2. dedicated services are used
3. as above + modules
4. as above + actions
5. as above + client_ids
6. as above + ECIDs

CLIENT_INFO and Long-Ops are not in the list; they are also nice to have, however, they are not in ASH.

Regards,
Mikhail.

Martin Berger said…

Hi Mikhail,
thank you for your great comment.
I like the way ECID is handed back and forth between applications and databases to indicate an end-to-end flow.
Unfortunately, it's not written into SQL*Trace right now.
That's probably the reason Cary didn't mention it in his list ;-)
There is an entry in the Database Ideas section of Oracle's forum: Write ECID to the tracefile when enabling Oracle Extended Trace.
If Oracle ever decides to implement it, the ECID can be used in both areas: SQL*Trace and ASH/AWR.
best regards,
Martin