Sonntag, 2. Mai 2021

tracing network ACLs

If there are problems with ACLs (Access Control Lists) in Oracle, the note SRDC - How to Collect Standard Information for Access Control Lists (ACLs) (Doc ID 1905572.1) is an important starting point.
Basically it suggests to enable these events.

alter session set events '10937 trace name context forever, level 7'; 
alter session set events '24247 trace name errorstack level 3';
But the only suggestion is to open a SR at oracle and send the results in. Sometimes, it already helps to understand parts of the trace file to identify an issue - no need to invest precious time on MOS. First the events itself. Their meaning are:

10937, 00000, "trace name context forever"
// * Cause:
// * Action: When set, this event enables tracing of PL/SQL packages.
// * Comment: Setting this event could result in some performance slowdown.

24247, 00000, "network access denied by access control list (ACL)"
// *Cause:    No access control list (ACL) has been assigned to the target
//            host or the privilege necessary to access the target host has not
//            been granted to the user in the access control list.
// *Action:   Ensure that an access control list (ACL) has been assigned to
//            the target host and the privilege necessary to access the target
//            host has been granted to the user.
Unfortunately I did not find a comprehensive documentation of these events levels, maybe this is area for some other posts.

A first try with a newly created user (grants: connect, resource & alter session) fails as expected:

SQL> select utl_http.request(url => '') from dual;

Error starting at line : 1 in command -
select utl_http.request(url => '') from dual
Error report -
ORA-29273: HTTP request failed
ORA-06512: at "SYS.UTL_HTTP", line 1530
ORA-24247: network access denied by access control list (ACL)
ORA-06512: at "SYS.UTL_HTTP", line 380
ORA-06512: at "SYS.UTL_HTTP", line 1470
ORA-06512: at line 1
With no ACLS at all, it's obvious.
According tho the documentation of UTL_HTTP:

This package is an invoker's rights package. The invoking user will need the connect privilege granted in the access control list assigned to the remote network host to which he wants to connect, as well as the use-client-certificates or the use-passwords privilege to authenticate himself with the remote Web server using the credentials stored in an Oracle wallet.

So connect it is: 

  DBMS_NETWORK_ACL_ADMIN.append_host_ace (
    host       => '', 
    lower_port => 80,
    upper_port => 443,
    ace        => xs$ace_type(privilege_list => xs$name_list('connect'),
                              principal_name => 'BERX',
                              principal_type => xs_acl.ptype_db)); 
With this privilege, let's see what the ACL trace file says. (I have to set event 10937 to level 6 (not 7) in my lab env - lvl7 did not generate anything). 

SQL> alter session set events '10937 trace name context forever, level 6';

Session altered.

SQL> alter session set events '24247 trace name errorstack level 3';

Session altered.

SQL> select utl_http.request(url => '') from dual;

<html code="" itemscope="" itemtype="" lan...
The select is fine now - and what's in the trace?

Trace file /u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_5483.trc
Oracle Database 19c Enterprise Edition Release - Production
Build label:    RDBMS_19.
ORACLE_HOME:    /u01/app/oracle/product/19.0.0/dbhome_1
System name:    Linux
Node name:      localhost.localdomain
Release:        5.4.17-2011.5.3.el8uek.x86_64
Version:        #2 SMP Wed Jul 29 22:09:11 PDT 2020
Machine:        x86_64
Instance name: cdb1
Redo thread mounted by this instance: 1
Oracle process number: 53
Unix process pid: 5483, image: oracle@localhost.localdomain

*** 2021-05-02T07:43:38.309047+00:00 (PDB1(3))
*** SESSION ID:(270.62847) 2021-05-02T07:43:38.309125+00:00
*** CLIENT ID:() 2021-05-02T07:43:38.309135+00:00
*** SERVICE NAME:(pdb1) 2021-05-02T07:43:38.309144+00:00
*** MODULE NAME:(java@localhost.localdomain (TNS V1-V3)) 2021-05-02T07:43:38.309153+00:00
*** ACTION NAME:() 2021-05-02T07:43:38.309162+00:00
*** CLIENT DRIVER:(jdbcoci : 2021-05-02T07:43:38.309170+00:00
*** CONTAINER ID:(3) 2021-05-02T07:43:38.309210+00:00

psdnop: BERX 120 3
psdnopCheckAcl: priv 2147483693 cnt 1
psdnopCheckAcl: aclid 2147493757 status 3
psdnop: denied
psdnop: BERX 120 2
psdnopCheckAcl: priv 2147483690 cnt 1
psdnopCheckAcl: aclid 2147493757 status 1
psdnop: granted
According to the last line, everything is fine - access is granted. But there is more! 
the first line of interest is psdnop: BERX 120 3 The username is obvious (well, not so if I think about PLSQL with PL/SQL AUTHID CURRENT_USER|DEFINER ).
120 is the USER_ID of user BERX. 
I have no idea what the 3rd parameter of psdnop is - any suggestion is very welcome! 
The last entry is the host & port I try to connect to. 
For me it's the summary what will be checked against ACLs. 
The last line of this first block is psdnop: denied. Even in the 2nd block it's granted, I still 

The second line in this block is psdnopCheckAcl: priv 2147483693 cnt 1. It checks if an ACL is granted. But which one? 
It's first parameter tells us it's a priviliege the number is 2147483693. Unfortunately it's not written in the tracefile, so we need to identify this privilege.

  SQL> select name, id  from sys.xs$obj where id = 2147483693 ;
   NAME            ID
_______ _____________
HTTP       2147483693
This means Oracle first checks if the HTTP privilege is present - as I did not grant it, it's not there. 
The last 2 parameters of psdnopCheckAcl are unclear: cnt 1 - I have no explanation. 

In case you don't want to query sys.xs$obj for a given privilege, have a peek in $ORACLE_HOME/rdbms/admin/catts.sql. The 3 important parts there (for privilege HTTP) are 

RESERVED_ID              NUMBER := 2147483648;
PRIV_HTTP                   NUMBER := RESERVED_ID + 45;
-- Create http privilege
insert into xs$obj(name, owner, tenant, id, type, status, flags)

Next line is psdnopCheckAcl: aclid 2147493757 status 3. Again a check, just for an aclid now.  Also this one can be identified, in our case: 
SQL> select * from nacl$_HOST where acl#=2147493757;

             HOST    LOWER_PORT    UPPER_PORT          ACL#
_________________ _____________ _____________ _____________               80           443    2147493757
Now it's clear why the first block is denied: for this host that privilege is just not granted.

To crosscheck on the second (granted) block: 
SQL> select name, id  from sys.xs$obj where id = 2147483690 ;

      NAME            ID
__________ _____________
CONNECT       2147483690

This is by far not a comprehensive introduction into ACL traces, but at least it might help to identify, WHAT is missing when hitting ORA-24247

Sonntag, 7. MĂ€rz 2021

reading PDBs spfile parameters

Last week I was asked by a customer if changes in a PDBs spfile parameter can be monitored easily. 

My first thought was to check if AWR on CDB$ROOT level does this for me already - unfortunately it doesn't. 

So the next idea was to sample PDB_SPFILE$ on my own. 
Unfortunately this table is not documented by Oracle and my colleague Miguel Anjo found records there are not deleted when they are reset on PDB level

To identify such behavior, I like to have a look in all the nice files in $ORACLE_HOME/rdbms/admin.
In e1201000.sql I luckily found this code. Even e1201000.sql is used in case of downgrade, it's quite helpful.

Rem *************************************************************************
Rem BEGIN BUG 23300354 - remove HWM and flags column values in pdb_spfile$
Rem *************************************************************************
update pdb_spfile$ set
db_uniq_name=(select sys_context('userenv', 'DB_UNIQUE_NAME') from dual)
where db_uniq_name='*';
Rem Set HWM value to 0
update pdb_spfile$ set spare1=0;
Rem Delete deleted rows and reset flags to 0
delete from pdb_spfile$ where bitand(nvl(spare2,0),1) = 1;
update pdb_spfile$ set spare2=0;
Another approach is to check what#s Oracle doing at UNPLUGGING a PDB: 

select value$ from pdb_spfile$ where name=:1 and pdb_uid=:2                   and bitand(nvl(spare2,0),1)=0

I tried the probably most obvious method: 

CDB$ROOT> select * from containers(v$spparameter);

But this only show results for CDB$OOT (CON_ID => 1). 

Even a test with x$kspspfile shows only values for CON_ID => 1. 

This is quite sad, especially as x$kspspfile IS used when I query v$spparameter from my PDB1 (which has CON_ID=3 in this test), but there seems to be some additional magic in x$kspspfile which I can not mitigate on SQL layer.

Dienstag, 5. Januar 2021

convert SQL Trace to a list of database calls in chronological order with human readable timestamps

 It all started so easy! 
A customer found in one database one statement which is outstanding high in statspack reports. Unfortunately, as the application is not instrumented, it's hard to understand where the SQL is coming from and what's its purpose. 
To bring it into context, a simple SQL TRACE can do miracles. 

But as SQL Trace is hard to read and has a lot of confusing content, I used mrskew and a slightly modified calls.rc to generate a chronological list of all DB calls. 

The result loks similar to this output

  12568091.341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
  12568091.343769         42               XCTEND    
For everyone used to calculate seconds and miliseconds by just looking at it, and with the little additional information the tracefile starts at
*** 2020-11-26 10:06:01.867
it can be no issue at all to see when exactly a call was executed, and maybe correlate it to otle rlogfiles of the application. 

As I want to provide more value than only relative timestamps, my idea was to let a program do the calculations. As I struggled doing so in pure mrskew commands, I asked for some help at stackoverflow
There Cary confirmed my goal isn't possible with pure mrskew, but mrwhen can be of some help. 

The first aproach was to extend my calls.rc file to create this output:

tim=12568091341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
tim=12568091343769         42               XCTEND    
but mrwhen wasn't happy (yet):

mrwhen: no START value in file 'bx1.trc'
mrwhen: no DEP value in file 'bx1.trc'
mrwhen: file 'bx1.trc' contains no timestamp line
The START value is not documented right now, but it can be easily guessed mrwhen needs a real timestamp to correlate the tim values. The next try is:

*** 2020-11-26 10:06:01.867

tim=12568091341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
tim=12568091343769         42               XCTEND    
and now mrwhen changes it's mind slightly:

mrwhen: no DEP value in file 'bx1.trc'
mrwhen: file 'bx1.trc' contains no tim-containing line  
No complain about START anymore, and it doesn't see any tim-containing line? I'm very sure I see tim= there. 

After playing around by adding dep= ( as it complained about it) and e= (to provide some duration) - all without any success - it seems mrwhen wants the line to look more like a real tracefile. 

My final attempt is a file bx1.trc:

*** 2020-11-26 10:06:01.867

XCTEND tim=12568091341788 e=2 dep=0          36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ..
XCTEND tim=12568091343769 e=1 dep=0          42               XCTEND    
Now mrwhen still complains 

mrwhen: no DEP value in file 'bx1.trc'

But the result is useful now:

2020-11-26 10:06:01.867000  2020-11-26 10:06:01.867000  *** 2020-11-26 10:06:01.867

                                                                  END-TIM       LINE       SQL_ID CALL-NAME STATEMENT-TEXT
2020-11-26T10:06:01.867000  2020-11-26T10:06:01.867000  XCTEND tim=12568091341788 e=2 dep=0          36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
2020-11-26T10:06:01.868981  2020-11-26T10:06:01.868981  XCTEND tim=12568091343769 e=1 dep=0          42               XCTEND

For post-processing I only need to remove everything starting with XCTEND e and 45 characters long.

sed -E 's/XCTEND t.{37}//'
The remaining post-processing can be done manually. 

With little additional help, i can generate the output desired.
Still I have some wishes for mrskew extensions, but they are not part of this post.

Montag, 4. Januar 2021

wrong checksum in VirtualBox due to Windows Defender Credential Guard?

I use Virtualbox a lot on my Laptops for all kind of playgrounds. I install the software to test in the VMs and do whatever is required for my test. 
But in one VM, I could not install a specific package. 
It seemed to me the checksum of the package to install was always wrong. 
I tried several ways to get the package onto the VM, like downloading it directly with wget, using an alternate repository, download it to my Windows laptop first (there the checksum was correct) and transfer it to the VM, ... 
Every time the checksum was wrong - but different! 
It differed depending on the position in the VMs filesystem. 

After some research this led me to a question on stackexchange
And some more clicks later I hit the DG_Readiness_Tool
It really seems, Microsofts Defender somehow affected either the reading of the file in the VM or the calculation of it's checksum. 
However, after going through the articles noted above (and some more) I decided to give it a try with 

DG_Readiness_Tool.ps1 -Disable -AutoReboot

The Laptop rebooted - and everything is fine now. 
(I had to re-load the package into the VM - this gives some clue where the issue happened, But I have no prove so no finger-pointing here)

I can not explain, why (and how) this happened, and I don't recommend to disable ANY security-feature. Just in my case it helped. 
(On another, quite similar, system the same VM with same package and also up-to-date Win10 never showed such issues)

Sonntag, 3. Januar 2021

identify index corruption

X for U

One of my customers faced a problem with false results
The most remarkable detail was the inconsistency of these wrong results. It depended on the queries he run.  So his first assumption was an error in the SQL parser/optimizer/wherever. 
But as I striped down the correct/wrong SQLs more and more to get a simple testcase, maybe something to provide to Oracle Support, it became clear the SQLs are fine, just one physical representation of the data was wrong!

Yes, even in a perfectly normalized relational database, there is a lot of physically duplicated data. One might think of materialized views, but in fact, every index is an implicit duplication of some data for optimized physical access. 

Each RDBMS promises to keep this duplicated in sync (if the word would have been used in IT in last millenium, they would have called it autonomous ), and each RDBMS fails in doing so sometimes. 

I tried to identify those rows and indexes where information in the index and the row differs. Then the customer needs to understand the data and decide, which data is correct, store this data in the block and re-create all indexes which does not match the tables data. 

The basic idea is to select the rows from the index and from the table and compare it's data. 
Even this sounds easy, Oracles Optimizer does a lot of effort to avoid what it sees as unnecessary work. Getting the same column of the same row from 2 different sources is such an avoidable work. 
I first tried some queries where I get the questionable columns (and their ROWIDs) in a sub-query and then join to the table again retrieving the tables row and value. Something like: 

select /*+ qb_name(main) */, t.problematic_column
from TABLE t 
where t.rowid in
  select /*+ qb_name(get_rowid)  */ ti.rowid
  from TABLE ti 
  where ti.problematic_column='123456'
But to avoid the optimizers clevernes, I had to use some hints like NO_MERGE,  NO_UNNEST,  NO_ELIMINATE_SQ(@GET_ROWID), INDEX, MATERIALIZE and there might be even more with higher versions. 

My final approach is maybe not as eligant in pure SQL, but for me it provides more stability:  

  FUNCTION get_problematic_column (rid in varchar2) return number IS
   cid number;
    SELECT problematic_column into cid
    from TABLE t
    where rowid=rid;
    return cid;
  select /*+ INDEX(ti, SUSPICIOUS_INDEX) qb_name(get_rowid) */
    ti.rowid,, ti.problematic_column as problematic_column_in_index,
    get_problematic_column(rowid) as problematic_column_at_ROWID
  from TABLE ti 
  where ti.problematic_column='123456'

This is not the most beautiful query, but it should give a good idea. 
The function get_problematic_column does a TABLE ACCESS BY USER ROWID and we can be very sure this will not change and provide the data as it's in the tables block. 
The query get_rowid should access the TABLE via the SUSPICIOUS_INDEX - I only have to ensure the WHERE clause matches the index columns. 

With little adaptions I could test all indices for invalid columns. 
In this case the tables data was valid and one index needed to be rebuilt. Much easier than opening a Service Request at MOS. 

If anyone wants to know the underlying hardware: it's an Exadata cluster (I was never told exact specifications) with proper ASM Diskgroup configuration. 

Samstag, 19. September 2020

instrumentation in Informatica

 Instrumentation in applications is important. Especially when it comes to troubleshooting performance 😏. Unfortunately some applications seems to miss this feature. 

For me, Informatica Workflow Manager is such an application. I might be wrong - if I am, please tell me! 

But recently I was facing several issues in Oracle databases where I was only told the workflow name and workflow instance. After a lot of guesswork to identify the problematic sessions and SQLs, I was sure there must be something more useful, so I started to browse & search the internet. 

I found the Transaction Environment SQL. Even not explicit stated, for mee it seems a good point to identify a specific activity. Informatica has some Built-in Variables which are valid candidates, e.g. $PMWorkflowName

My first suggestion
    module_name => '$PMWorkflowName',
    action_name => '$PMWorkflowRunInstanceName');

failed with some internal errors:
ORA-06550: line 4, column 35:

PLS-00103: Encountered the symbol "end-of-file" when expecting one of the following:


   := . ( % ;


Database driver error...

Function Name : executeDirect



    module_name => 'WF_SHAREPOINT_MYCOMPLAINCE',

    action_name => 's_M_STG_MC_RUN')

Oracle Fatal Error

Database driver error...

Function Name : ExecuteDirect


Oracle Fatal Error


Some more research and another KB article showed  a small additional requirements: 
Semicolons ( ; ) are seen as termination character. As there are multiple semicolons inside the PL/SQL block, they must be escaped by a backslash ( \ ).

With this knowledge, the working code is 

    module_name => '$PMWorkflowName',
    action_name => '$PMSessionName')\;
DBMS_SESSION.set_identifier( $PMWorkflowRunId )\;
DBMS_SESSION.set_context('E2E_CONTEXT', 'ECID_UID', $PMWorkflowRunId )\;

This solution wil not fit every need, but hopefully it can be seen as a valid startingpoint for better instrumentation in Informatica. 

Sonntag, 28. Juni 2020

the size of a thumb

The size of a thumb was used by humans as a measurement for very long time. Used by craftsmen, the size of body parts like a thumb, hand, forearm, foot and so on were both good enough to be used within a team, and even the ratio between those measurement were acceptable accurate and comparable for each individual. 
But such a system comes to its limits. Different regions had different commonly agreed definitions for the same unit. This makes it hard to properly compare and convert, especially when it comes to written contracts between different regions. 
The metric system was designed to overcome problems in converting units between regions and types of units. 
Still it took some time and effort to show the advantage of this system. As an example, around 1999 NASA invested approx $125 million to teach some engineers in Colorado the difference between pound and newton.    

For me it seems, this problem is also exists in rules of thumb. Some of them are similar, still different. 
My current example is a rule of thumb to decide, if an access path of a SQL statement is acceptable.

The first rule I found by Jeff Holt and Cary Millsap in Optimize Oracle Performance (2003):
High LIO count (more than about 10 LIO calls per nonaggregate row returned per table in the FROM clause)  

It was explained in more detail by Cary in Mastering Oracle Trace Data (2019): 
The quantity 10 is a nice round number that represents the most touches on the buffer cache you should need for plunging an index from root to leaf, plus accessing a data block (even if you have row migration or row chaining). On average, you shouldn’t need more than 10 LIOs to find a row from a single-table query. You shouldn’t need more than 20 LIOs to find a row from a two-table join, more than 30 LIOs to find a row from a three-table join, and so on. You shouldn’t need more than 420 LIOs to retrieve 7 rows from a 6-table join.

A similar, but different rule by Chris Antognini in Troubleshooting Oracle Performance (2019): 

  • Access paths that lead to less than about 5 logical reads per returned row are probably good.
  • Access paths that lead to up to about 10–15 logical reads per returned row are probably acceptable.
  • Access paths that lead to more than about 20 logical reads per returned row are probably inefficient. In other words, there’s probably room for improvement.

These different rules might be based on different assumptions. 
The example of 420 LIOs for 7 rows from a 6 table join is acceptable for Cary, maybe because he accepts the schema which enforces this 6 table join as given, but Chris might suggest a schema redesign or physical design optimization. 

At the end, all rules of thumb must be considered with reasonable care. It's never a law, but can be seen as a starting point. 

Update 2020-06-30: Chris Antognini informed me about a very important error I made:
He is referring to Access paths, not full complete execution plans.
Access paths are defined as
An access path is a technique used by a query to retrieve rows from a row source.

This means there is no big difference between Chris and Carys rules - which makes this post kind of useless. Still it will stay to document the importance of accurate reading!
I have to apologize to Chris and Cary for artificially generating a disagreement.