2023-12-24

remote external procedure

 During the migration of an application from an older AIX onto modern Linux based database, I stumbled across a little issue: The application uses external procedure calls. This is not an issue in itself as extproc is supported and nowadays doesn't even require a listener to run the code anymore. Even security is covered if the extproc agent is run as a different user, which can be enforced in extproc.ora

But - as the application is quite old - the application responsible team could not provide any linux libraries up until now. All the oracle documenation always shows a local extproc configuration only. either using an agent process spawned directly from the instance, or a listener, addressed by  (PROTOCOL=ipc)

Still, there is hope! 


In the listener-config, basically it's just a listener which listens to some protocol and provides the extproc binary as a SID_LIST. This could enable me to run the extproc libraries on a dedicated AIX host, and leave the poor RDBMS nodes for RDBMS stuff. 
I did not find any such configuration on the internet but was made aware by a colleague about this article: 
How to call external Procedure from PL/SQL on remote Host ?
If it was promoted for Oracle 8 some years ago, let's give it a try! 

Sitting on my sandbox I only have one linux node with 23c binaries - but I'll just use a different os user to show connectivity is using SQL*Net

To keep the setting simple (and avoid some additional work) I loosely follow the MOS document 10 Steps to Create and Run a Sample External Procedure Program on Unix (Doc ID 312564.1). 

Beside the oracle user, I create a new user extproc_runner (group extproc_runner) and copied the full ORACLE_HOME there. 

my first step is to create the proper shared object file. In /home/extproc_runner 
I create a very simple shell.c


#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void sh(char *command) {
  int num;
  num = system(command);
}
  
and compile it with

gcc -fPIC -c shell.c	
ld -shared -o shell.so shell.o
chmod 755 shell.so

The easy part is done!


the extproc_runners listener.ora is created as

LISTENER_EXTPROC =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1555))
    )
  )

sid_list_listener_extproc =
 (sid_list=
    (sid_desc=
      (sid_name=extproc_run)
      (oracle_home=/home/extproc_runner/product/23c/dbhomeFree)
      (program=extproc)
    )
)

and starts without any problem:

[extproc_runner@localhost admin]$ lsnrctl start listener_extproc

LSNRCTL for Linux: Version 23.0.0.0.0 - Developer-Release on 23-DEC-2023 17:57:25

Copyright (c) 1991, 2023, Oracle.  All rights reserved.

Starting /home/extproc_runner/product/23c/dbhomeFree/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 23.0.0.0.0 - Developer-Release
System parameter file is /home/extproc_runner/product/23c/dbhomeFree/network/admin/listener.ora
Log messages written to /home/extproc_runner/base/diag/tnslsnr/localhost/listener_extproc/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1555)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1555)))
STATUS of the LISTENER
------------------------
Alias                     listener_extproc
Version                   TNSLSNR for Linux: Version 23.0.0.0.0 - Developer-Release
Start Date                23-DEC-2023 17:57:26
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /home/extproc_runner/product/23c/dbhomeFree/network/admin/listener.ora
Listener Log File         /home/extproc_runner/base/diag/tnslsnr/localhost/listener_extproc/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1555)))
Services Summary...
Service "extproc_run" has 1 instance(s).
  Instance "extproc_run", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

That's the easy part. Now let's tell the instance about this available service: tnsnames.ora is extended by this entry:

extproc_run_alias =
  (description=
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1555))
    (connect_data=(sid=extproc_run))
  )

and it can be reached somehow:

[oracle@localhost admin]$ tnsping extproc_run_alias

TNS Ping Utility for Linux: Version 23.0.0.0.0 - Developer-Release on 23-DEC-2023 18:05:54

Copyright (c) 1997, 2023, Oracle.  All rights reserved.

Used parameter files:
/opt/oracle/product/23c/dbhomeFree/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (description= (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1555)) (connect_data=(sid=extproc_run)))
OK (0 msec)
[oracle@localhost admin]$ sqlplus x/x@extproc_run_alias

SQL*Plus: Release 23.0.0.0.0 - Developer-Release on Sat Dec 23 18:05:58 2023
Version 23.2.0.0.0

Copyright (c) 1982, 2023, Oracle.  All rights reserved.

ERROR:
ORA-28547: connection to server failed, probable Oracle Net admin error

Now onto the last part: let's tell a schema there is an extproc it can use: . . .

To keep it simple, I use SYSTEM in my PDB, but any schema with proper permissions should do. 


CREATE DATABASE LINK extproc_agent_link USING 'extproc_run_alias';

Database link EXTPROC_AGENT_LINK created.


CREATE LIBRARY shell_lib is '/home/extproc_runner/shell.so' agent 'extproc_agent_link';

Library SHELL_LIB compiled


CREATE OR REPLACE PROCEDURE shell(command IN char)
AS EXTERNAL
NAME "sh"
LIBRARY shell_lib
LANGUAGE C
PARAMETERS (command string);
/

Procedure SHELL compiled


set serveroutput on
exec shell('/usr/bin/env >/tmp/bla');

ORA-28595: Extproc agent: Invalid DLL Path
ORA-06512: at "SYSTEM.SHELL", line 1
ORA-06512: at line 1
28595. 00000 -  "Extproc agent : Invalid DLL Path"
*Cause:    The path of DLL supplied for the extproc execution is invalid.
*Action:   Check if the DLL path is set properly using the EXTPROC_DLLS
           environment variable.
that was close, but not good enough.
Some more research & reading of 
ORA-28595: Extproc&nsbp;Agent&nsbp;: Invalid DLL Path Error in 12c With Listener&nsbp;Configuration (Doc&nsbp;ID&nsbp;1669896.1)    
External Procedure Calls and ORA-28595 In Versions 9.2 through&nsbp;10gR2 (Doc&nsbp;ID&nsbp;198523.1)  
and some check of tracefiles in /home/extproc_runner/product/23c/dbhomeFree/hs/log
 (at least I kenw the extproc was somehow involved already) led to an extended listener.ora for extproc_runner

LISTENER_EXTPROC =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1555))
    )
  )

sid_list_listener_extproc =
 (sid_list=
    (sid_desc=
      (sid_name=extproc_run)
      (oracle_home=/home/extproc_runner/product/23c/dbhomeFree)
      (program=extproc)
      (ENVS="EXTPROC_DLLS=ANY,
LD_LIBRARY_PATH=/home/extproc_runner/product/23c/dbhomeFree/lib:/home/extproc_runner")
    )
  )

With this settings, I can call my external procedure through SQL*Net: 

exec shell('/usr/bin/env >/tmp/bla');

PL/SQL procedure successfully completed.
No error anymore - and I can check the file /tmp/bla
-rw-rw-r--. 1 extproc_runner extproc_runner 4841 Dec 24 20:23 /tmp/bla

ORACLE_SID=extproc_run
ORACLE_BASE=/home/extproc_runner/base
MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH LD_PRELOAD
_=/usr/bin/env
LANG=en_US.UTF-8
HISTCONTROL=ignoredups
EXTPROC_DLLS=ANY
ORACLE_HOME=/home/extproc_runner/product/23c/dbhomeFree
HOSTNAME=localhost.localdomain
JAVA_HOME=/var/opt/jdk-20.0.1
...
USER=extproc_runner
PWD=/home/extproc_runner/product/23c/dbhomeFree/hs/admin
HOME=/home/extproc_runner

LD_LIBRARY_PATH=/home/extproc_runner/product/23c/dbhomeFree/lib:/home/extproc_runner
ORA_NET2_DESC=18,21
LOADEDMODULES=
...
SHELL=/bin/bash
SHLVL=2
MANPATH=:
MODULEPATH=/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
LOGNAME=extproc_runner
MODULEPATH_modshare=/usr/share/Modules/modulefiles:2:/etc/modulefiles:2:/usr/share/modulefiles:2
PATH=/home/extproc_runner/product/23c/dbhomeFree/bin:/home/extproc_runner/.local/bin:/home/extproc_runner/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/var/opt/jdk-20.0.1/bin:/opt/oracle/product/sqlcl/bin
MODULESHOME=/usr/share/Modules
...

This should be sufficient as a proof of concept.

2023-12-09

The cost of BIND information in SQL Trace files

 Oracle SQL Trace files are a very valuable source of information - especially when it comes to performance analysis. 

But people often are afraid of the impact of tracing to performance. 

Luckily, Oracle chose quite useful defaults - at least when using DBMS_MONITOR:

WAITS=>TRUE

BINDS=>FALSE

PLAN_STAT=>'FIRST_EXECUTION'


This still leaves the question about the impact of writing BINDS to the tracefile. 

The best information I could find is by Cary Millsap in his book Tracing Oracle - chapter 25: Why We Recommend Not Tracing BINDS:

Here’s why that matters: every time an application binds a value to a placeholder, a BINDS=>TRUE trace causes Oracle to write 5 lines (about 180 bytes) into the trace file, using one write call per line. So, an EXEC that might consume only a few hundred microseconds when BINDS=>FALSE has to write an extra 36KB to the trace file with 2,000 extra write calls when BINDS=>TRUE

200 placeholders × 180 bytes/placeholder = 36,000 bytes

200 placeholders × 5 lines/placeholder × 1 write/line = 1,000 writes

Even this is a great technical explanation, it does not tell us how much time it takes. 

While analyzing some other questions, a tracefile with a lot of BIND entries was generated - and it significantly slowed down the application. Not a problem in this case, but something worth to check. 

The tracefile I generated covers ~117 seconds - 1 minutes and 57 seconds. 

mrskew just doesn't know about some of this time:


CALL-NAME                      DURATION       %   CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  ------  --------  --------  --------
EXEC                          91.022874   89.9%   4,398  0.020696  0.000000  0.093324
SQL*Net message from client    6.423181    6.3%   4,398  0.001460  0.000334  0.518238
log file sync                  3.633550    3.6%   4,397  0.000826  0.000372  0.013874
enq: SQ - contention           0.070351    0.1%      63  0.001117  0.000007  0.003597
library cache: mutex X         0.040133    0.0%       6  0.006689  0.000022  0.015928
cursor: pin S wait on X        0.027345    0.0%       3  0.009115  0.003581  0.019512
SQL*Net message to client      0.015619    0.0%   4,398  0.000004  0.000002  0.000079
CLOSE                          0.014905    0.0%   4,405  0.000003  0.000000  0.000079
enq: TX - index contention     0.012613    0.0%      11  0.001147  0.000036  0.005059
buffer busy waits              0.008619    0.0%     184  0.000047  0.000003  0.002146
14 others                      0.019216    0.0%   4,486  0.000004  0.000000  0.007130
---------------------------  ----------  ------  ------  --------  --------  --------
TOTAL (24)                   101.288406  100.0%  26,749  0.003787  0.000000  0.518238


mrskew only aggregates waits and dbcalls but doesn't care about the time "in between". We can assume the missing 16 seconds are the amount of time used to write the binds - and call it a day. 

mrprof shows some more information in the profile:

SubroutineDurationCallsDuration per call (seconds)
secondsRmeanminmaxskew
1CPU: EXEC dbcalls92.56228078.9%17,7100.005227·0.062486▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆█▁▁▁▁▁
2unaccounted-for within dbcalls9.0623527.7%39,8400.000227−0.0257810.026413▁▁▁▁▁▂▆▁▁▁▁▁▁▁▂█▄▁▁▁▁▁
3SQL*Net message from client6.4231815.5%4,3980.0014600.0003340.518238▁▁▁▄█▁▂▁▁▁▁
4log file sync3.6335503.1%4,3970.0008260.0003720.013874▁▁▁█▄▁▁▁▁▁▁
525 other subroutines5.6635944.8%51,2760.000110·0.019512▁▁▁▁▁▁▁▁▁▁▁▁▁▄█▂▁▁▁▁▁▁
6Total (29)117.344957100.0%117,6210.000998−0.0257810.518238


There 9 seconds are called unaccounted-for within dbcalls - the explanation how this artificial component is calculated is 

mrprof synthesizes an “unaccounted-for between calls” call when there is missing time between dep = 0 database and system calls. Every adjacent pair of such calls will define some non-zero “unaccounted-for between calls” duration


That is somehow useful, but there is also a different approach possible. 
In the tracefile, I don't have timing information for the BINDS, but at least I have some activity before and afterwards. e.g. 


PARSE #1795421110760:c=32,e=32,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=968215712176
BINDS #1795421110760:

 Bind#0
  oacdty=02 mxl=22(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=881f7430  bln=22  avl=06  flg=09
  value=6607304129
  ...
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=06881698  bln=22  avl=02  flg=05
  value=1
EXEC #1795421110760:c=605,e=604,p=0,cr=0,cu=4,mis=0,r=1,dep=1,og=4,plh=0,tim=968215713003  

I see the EXEC completed at 968215713003 and had a duration of  604 microseconds - so it started at 968215712399. The last call before was a PARSE at 968215712176. This leaves a gap of 223 μs. Not a lot, but worth to care. To make this time somehow visible, I have a little perl script : 


#!/usr/bin/perl -w
#bind_synt_wait.pl
# creating a synthetic bind "WAIT" for otherwise UFBCs 
use strict;

my $prev_cursor     = 0 ;
my $cursor          = 0 ;
my $prev_timestamp  = 0 ;
my $timestamp       = 0 ;
my $calc_timestamp  = 0 ;
my $ela             = 0 ;
my $calc_ela        = 0 ;
my $to_be_processed = 0 ; 
# 0 => nothing to do, 
# 1 => BIND seen, but no WAIT, 
# 2 => BIND & WAIT seen ==> process!

#my $tablespace;
#my $block;
#my $object_id;

while (<>) {
  my $line = $_;
  if ($line =~ m/.*#(\d+)[: ].*/) {
    ($cursor = $line) =~ s/.*#(\d+)[: ].*/$1/ ;
  }
  $cursor =~ tr/\n//d;

  if ($line =~ m/.*tim=(\d+).*/) {
    ($timestamp = $line) =~ s/.*tim=(\d+).*/$1/ ;
	if ($to_be_processed == 1) {
	  $to_be_processed = 2;	
	}
  }
  $timestamp =~ tr/\n//d;
#  print " ---- timestamp = $timestamp \n";

  if ($line =~ m/.*ela= (\d+) .*/) {
    ($ela = $line) =~ s/.*ela= (\d+) .*/$1/ ;
  }
  $ela =~ tr/\n//d;
  
  if ($line =~ m/.*,e=(\d+),.*/) {
    ($ela = $line) =~ s/.*,e=(\d+),.*/$1/ ;
  }
  $ela =~ tr/\n//d;
#  print " ---- ela = $ela \n";
  
  if ($line =~ m/^BINDS/) {
    $to_be_processed = 1;
  }
  
  if ($to_be_processed == 2) {
# WAIT #391723720: nam='db file scattered read' ela= 2876 file#=5 block#=129 blocks=127 obj#=71123 tim=90982740602
    $calc_ela = ($timestamp - $ela) - $prev_timestamp;
	if ( $calc_ela < 0) {
		$calc_ela = 0 ; #$timestamp - $prev_timestamp;
	}
	$calc_timestamp = $timestamp-$ela;
    print "WAIT #"; # as the hash is somethign special ...
    print "$cursor: nam='BIND to tracefile write' ela= $calc_ela tim=$calc_timestamp\n";
    $to_be_processed = 0;
  }
  print $line;
  
  # will need these when generating tim & ela for BIND
  $prev_cursor = $cursor;
  $prev_timestamp = $timestamp;
}

which does exactly this calculation and inserts an artificial line into the tracefile:

WAIT #1795421110760: nam='BIND to tracefile write' ela= 223 tim=968215712399
Now mrskew knows about this time:

    
CALL-NAME                      DURATION       %   CALLS      MEAN       MIN       MAX
---------------------------  ----------  ------  ------  --------  --------  --------
EXEC                          91.022874   66.3%   4,398  0.020696  0.000000  0.093324
BIND to tracefile write       36.088810   26.3%  13,487  0.002676  0.000000  0.031517
SQL*Net message from client    6.423181    4.7%   4,398  0.001460  0.000334  0.518238
log file sync                  3.633550    2.6%   4,397  0.000826  0.000372  0.013874
enq: SQ - contention           0.070351    0.1%      63  0.001117  0.000007  0.003597
library cache: mutex X         0.040133    0.0%       6  0.006689  0.000022  0.015928
cursor: pin S wait on X        0.027345    0.0%       3  0.009115  0.003581  0.019512
SQL*Net message to client      0.015619    0.0%   4,398  0.000004  0.000002  0.000079
CLOSE                          0.014905    0.0%   4,405  0.000003  0.000000  0.000079
enq: TX - index contention     0.012613    0.0%      11  0.001147  0.000036  0.005059
15 others                      0.027835    0.0%   4,670  0.000006  0.000000  0.007130
---------------------------  ----------  ------  ------  --------  --------  --------
TOTAL (25)                   137.377216  100.0%  40,236  0.003414  0.000000  0.518238

and also the profile shows my artificial wait event: 

SubroutineDurationCallsDuration per call (seconds)
secondsRmeanminmaxskew
1CPU: EXEC dbcalls92.56228078.9%17,7100.005227·0.062486▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆█▁▁▁▁▁
2BIND to tracefile write36.08881030.8%13,4870.002676·0.031517▁▁▁▂▃█▁▁▁▁▁
3SQL*Net message from client6.4231815.5%4,3980.0014600.0003340.518238▁▁▁▄█▁▂▁▁▁▁
4log file sync3.6335503.1%4,3970.0008260.0003720.013874▁▁▁█▄▁▁▁▁▁▁
5unaccounted-for within dbcalls−25.605864−21.8%39,840−0.000643−0.0383240.026413▁▁▁▁▁█▇▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁
625 other subroutines4.2430003.6%64,5220.000066−0.0163850.019512▁▁▁▁▁▁▁▁▁▁▁▁▁▄█▂▁▁▁▁▁▁
7Total (30)117.344957100.0%144,3540.000813−0.0383240.518238


But unfortunately, mrskews TOTAL is now bigger than 117 sec, and mrprof needs to introduce negative unaccounted-for within db calls to compensate that. 

The problem in this situation is the different dept and how their EXEC dbcalls are aggregated - and the artificial BIND to tracefile write doesn't fit well. It would required further modifications which would lead to further problems - and are probably not worth. 

At the end, it would be fine if Oracle could also instrument the code for writing BIND information to the tracefile - this could eliminate all this manual work. Until then, it might sometimes help to slightly improve a tracefile with data already hidden inside.