2023-12-09

The cost of BIND information in SQL Trace files

 Oracle SQL Trace files are a very valuable source of information - especially when it comes to performance analysis. 

But people often are afraid of the impact of tracing to performance. 

Luckily, Oracle chose quite useful defaults - at least when using DBMS_MONITOR:

WAITS=>TRUE

BINDS=>FALSE

PLAN_STAT=>'FIRST_EXECUTION'


This still leaves the question about the impact of writing BINDS to the tracefile. 

The best information I could find is by Cary Millsap in his book Tracing Oracle - chapter 25: Why We Recommend Not Tracing BINDS:

Here’s why that matters: every time an application binds a value to a placeholder, a BINDS=>TRUE trace causes Oracle to write 5 lines (about 180 bytes) into the trace file, using one write call per line. So, an EXEC that might consume only a few hundred microseconds when BINDS=>FALSE has to write an extra 36KB to the trace file with 2,000 extra write calls when BINDS=>TRUE

200 placeholders × 180 bytes/placeholder = 36,000 bytes

200 placeholders × 5 lines/placeholder × 1 write/line = 1,000 writes

Even this is a great technical explanation, it does not tell us how much time it takes. 

While analyzing some other questions, a tracefile with a lot of BIND entries was generated - and it significantly slowed down the application. Not a problem in this case, but something worth to check. 

The tracefile I generated covers ~117 seconds - 1 minutes and 57 seconds. 

mrskew just doesn't know about some of this time:


CALL-NAME                      DURATION       %   CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  ------  --------  --------  --------
EXEC                          91.022874   89.9%   4,398  0.020696  0.000000  0.093324
SQL*Net message from client    6.423181    6.3%   4,398  0.001460  0.000334  0.518238
log file sync                  3.633550    3.6%   4,397  0.000826  0.000372  0.013874
enq: SQ - contention           0.070351    0.1%      63  0.001117  0.000007  0.003597
library cache: mutex X         0.040133    0.0%       6  0.006689  0.000022  0.015928
cursor: pin S wait on X        0.027345    0.0%       3  0.009115  0.003581  0.019512
SQL*Net message to client      0.015619    0.0%   4,398  0.000004  0.000002  0.000079
CLOSE                          0.014905    0.0%   4,405  0.000003  0.000000  0.000079
enq: TX - index contention     0.012613    0.0%      11  0.001147  0.000036  0.005059
buffer busy waits              0.008619    0.0%     184  0.000047  0.000003  0.002146
14 others                      0.019216    0.0%   4,486  0.000004  0.000000  0.007130
---------------------------  ----------  ------  ------  --------  --------  --------
TOTAL (24)                   101.288406  100.0%  26,749  0.003787  0.000000  0.518238


mrskew only aggregates waits and dbcalls but doesn't care about the time "in between". We can assume the missing 16 seconds are the amount of time used to write the binds - and call it a day. 

mrprof shows some more information in the profile:

SubroutineDurationCallsDuration per call (seconds)
secondsRmeanminmaxskew
1CPU: EXEC dbcalls92.56228078.9%17,7100.005227·0.062486▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆█▁▁▁▁▁
2unaccounted-for within dbcalls9.0623527.7%39,8400.000227−0.0257810.026413▁▁▁▁▁▂▆▁▁▁▁▁▁▁▂█▄▁▁▁▁▁
3SQL*Net message from client6.4231815.5%4,3980.0014600.0003340.518238▁▁▁▄█▁▂▁▁▁▁
4log file sync3.6335503.1%4,3970.0008260.0003720.013874▁▁▁█▄▁▁▁▁▁▁
525 other subroutines5.6635944.8%51,2760.000110·0.019512▁▁▁▁▁▁▁▁▁▁▁▁▁▄█▂▁▁▁▁▁▁
6Total (29)117.344957100.0%117,6210.000998−0.0257810.518238


There 9 seconds are called unaccounted-for within dbcalls - the explanation how this artificial component is calculated is 

mrprof synthesizes an “unaccounted-for between calls” call when there is missing time between dep = 0 database and system calls. Every adjacent pair of such calls will define some non-zero “unaccounted-for between calls” duration


That is somehow useful, but there is also a different approach possible. 
In the tracefile, I don't have timing information for the BINDS, but at least I have some activity before and afterwards. e.g. 


PARSE #1795421110760:c=32,e=32,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=968215712176
BINDS #1795421110760:

 Bind#0
  oacdty=02 mxl=22(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=881f7430  bln=22  avl=06  flg=09
  value=6607304129
  ...
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=06881698  bln=22  avl=02  flg=05
  value=1
EXEC #1795421110760:c=605,e=604,p=0,cr=0,cu=4,mis=0,r=1,dep=1,og=4,plh=0,tim=968215713003  

I see the EXEC completed at 968215713003 and had a duration of  604 microseconds - so it started at 968215712399. The last call before was a PARSE at 968215712176. This leaves a gap of 223 μs. Not a lot, but worth to care. To make this time somehow visible, I have a little perl script : 


#!/usr/bin/perl -w
#bind_synt_wait.pl
# creating a synthetic bind "WAIT" for otherwise UFBCs 
use strict;

my $prev_cursor     = 0 ;
my $cursor          = 0 ;
my $prev_timestamp  = 0 ;
my $timestamp       = 0 ;
my $calc_timestamp  = 0 ;
my $ela             = 0 ;
my $calc_ela        = 0 ;
my $to_be_processed = 0 ; 
# 0 => nothing to do, 
# 1 => BIND seen, but no WAIT, 
# 2 => BIND & WAIT seen ==> process!

#my $tablespace;
#my $block;
#my $object_id;

while (<>) {
  my $line = $_;
  if ($line =~ m/.*#(\d+)[: ].*/) {
    ($cursor = $line) =~ s/.*#(\d+)[: ].*/$1/ ;
  }
  $cursor =~ tr/\n//d;

  if ($line =~ m/.*tim=(\d+).*/) {
    ($timestamp = $line) =~ s/.*tim=(\d+).*/$1/ ;
	if ($to_be_processed == 1) {
	  $to_be_processed = 2;	
	}
  }
  $timestamp =~ tr/\n//d;
#  print " ---- timestamp = $timestamp \n";

  if ($line =~ m/.*ela= (\d+) .*/) {
    ($ela = $line) =~ s/.*ela= (\d+) .*/$1/ ;
  }
  $ela =~ tr/\n//d;
  
  if ($line =~ m/.*,e=(\d+),.*/) {
    ($ela = $line) =~ s/.*,e=(\d+),.*/$1/ ;
  }
  $ela =~ tr/\n//d;
#  print " ---- ela = $ela \n";
  
  if ($line =~ m/^BINDS/) {
    $to_be_processed = 1;
  }
  
  if ($to_be_processed == 2) {
# WAIT #391723720: nam='db file scattered read' ela= 2876 file#=5 block#=129 blocks=127 obj#=71123 tim=90982740602
    $calc_ela = ($timestamp - $ela) - $prev_timestamp;
	if ( $calc_ela < 0) {
		$calc_ela = 0 ; #$timestamp - $prev_timestamp;
	}
	$calc_timestamp = $timestamp-$ela;
    print "WAIT #"; # as the hash is somethign special ...
    print "$cursor: nam='BIND to tracefile write' ela= $calc_ela tim=$calc_timestamp\n";
    $to_be_processed = 0;
  }
  print $line;
  
  # will need these when generating tim & ela for BIND
  $prev_cursor = $cursor;
  $prev_timestamp = $timestamp;
}

which does exactly this calculation and inserts an artificial line into the tracefile:

WAIT #1795421110760: nam='BIND to tracefile write' ela= 223 tim=968215712399
Now mrskew knows about this time:

    
CALL-NAME                      DURATION       %   CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  ------  --------  --------  --------
EXEC                          91.022874   66.3%   4,398  0.020696  0.000000  0.093324
BIND to tracefile write       36.088810   26.3%  13,487  0.002676  0.000000  0.031517
SQL*Net message from client    6.423181    4.7%   4,398  0.001460  0.000334  0.518238
log file sync                  3.633550    2.6%   4,397  0.000826  0.000372  0.013874
enq: SQ - contention           0.070351    0.1%      63  0.001117  0.000007  0.003597
library cache: mutex X         0.040133    0.0%       6  0.006689  0.000022  0.015928
cursor: pin S wait on X        0.027345    0.0%       3  0.009115  0.003581  0.019512
SQL*Net message to client      0.015619    0.0%   4,398  0.000004  0.000002  0.000079
CLOSE                          0.014905    0.0%   4,405  0.000003  0.000000  0.000079
enq: TX - index contention     0.012613    0.0%      11  0.001147  0.000036  0.005059
15 others                      0.027835    0.0%   4,670  0.000006  0.000000  0.007130
---------------------------  ----------  ------  ------  --------  --------  --------
TOTAL (25)                   137.377216  100.0%  40,236  0.003414  0.000000  0.518238

and also the profile shows my artificial wait event: 

SubroutineDurationCallsDuration per call (seconds)
secondsRmeanminmaxskew
1CPU: EXEC dbcalls92.56228078.9%17,7100.005227·0.062486▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆█▁▁▁▁▁
2BIND to tracefile write36.08881030.8%13,4870.002676·0.031517▁▁▁▂▃█▁▁▁▁▁
3SQL*Net message from client6.4231815.5%4,3980.0014600.0003340.518238▁▁▁▄█▁▂▁▁▁▁
4log file sync3.6335503.1%4,3970.0008260.0003720.013874▁▁▁█▄▁▁▁▁▁▁
5unaccounted-for within dbcalls−25.605864−21.8%39,840−0.000643−0.0383240.026413▁▁▁▁▁█▇▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁
625 other subroutines4.2430003.6%64,5220.000066−0.0163850.019512▁▁▁▁▁▁▁▁▁▁▁▁▁▄█▂▁▁▁▁▁▁
7Total (30)117.344957100.0%144,3540.000813−0.0383240.518238


But unfortunately, mrskews TOTAL is now bigger than 117 sec, and mrprof needs to introduce negative unaccounted-for within db calls to compensate that. 

The problem in this situation is the different dept and how their EXEC dbcalls are aggregated - and the artificial BIND to tracefile write doesn't fit well. It would required further modifications which would lead to further problems - and are probably not worth. 

At the end, it would be fine if Oracle could also instrument the code for writing BIND information to the tracefile - this could eliminate all this manual work. Until then, it might sometimes help to slightly improve a tracefile with data already hidden inside. 



Keine Kommentare: