Dienstag, 5. Januar 2021

convert SQL Trace to a list of database calls in chronological order with human readable timestamps

 It all started so easy! 
A customer found in one database one statement which is outstanding high in statspack reports. Unfortunately, as the application is not instrumented, it's hard to understand where the SQL is coming from and what's its purpose. 
To bring it into context, a simple SQL TRACE can do miracles. 

But as SQL Trace is hard to read and has a lot of confusing content, I used mrskew and a slightly modified calls.rc to generate a chronological list of all DB calls. 

The result loks similar to this output

  12568091.341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
  12568091.343769         42               XCTEND    
For everyone used to calculate seconds and miliseconds by just looking at it, and with the little additional information the tracefile starts at
*** 2020-11-26 10:06:01.867
it can be no issue at all to see when exactly a call was executed, and maybe correlate it to otle rlogfiles of the application. 

As I want to provide more value than only relative timestamps, my idea was to let a program do the calculations. As I struggled doing so in pure mrskew commands, I asked for some help at stackoverflow
There Cary confirmed my goal isn't possible with pure mrskew, but mrwhen can be of some help. 

The first aproach was to extend my calls.rc file to create this output:

tim=12568091341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
tim=12568091343769         42               XCTEND    
but mrwhen wasn't happy (yet):

mrwhen: no START value in file 'bx1.trc'
mrwhen: no DEP value in file 'bx1.trc'
mrwhen: file 'bx1.trc' contains no timestamp line
The START value is not documented right now, but it can be easily guessed mrwhen needs a real timestamp to correlate the tim values. The next try is:

*** 2020-11-26 10:06:01.867

tim=12568091341788         36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
tim=12568091343769         42               XCTEND    
and now mrwhen changes it's mind slightly:

mrwhen: no DEP value in file 'bx1.trc'
mrwhen: file 'bx1.trc' contains no tim-containing line  
No complain about START anymore, and it doesn't see any tim-containing line? I'm very sure I see tim= there. 

After playing around by adding dep= ( as it complained about it) and e= (to provide some duration) - all without any success - it seems mrwhen wants the line to look more like a real tracefile. 

My final attempt is a file bx1.trc:

*** 2020-11-26 10:06:01.867

XCTEND tim=12568091341788 e=2 dep=0          36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ..
XCTEND tim=12568091343769 e=1 dep=0          42               XCTEND    
Now mrwhen still complains 

mrwhen: no DEP value in file 'bx1.trc'

But the result is useful now:

2020-11-26 10:06:01.867000  2020-11-26 10:06:01.867000  *** 2020-11-26 10:06:01.867

                                                                  END-TIM       LINE       SQL_ID CALL-NAME STATEMENT-TEXT
2020-11-26T10:06:01.867000  2020-11-26T10:06:01.867000  XCTEND tim=12568091341788 e=2 dep=0          36 5kpbj024phrws EXEC     /*Begin.Work*/ SELECT ...
2020-11-26T10:06:01.868981  2020-11-26T10:06:01.868981  XCTEND tim=12568091343769 e=1 dep=0          42               XCTEND

For post-processing I only need to remove everything starting with XCTEND e and 45 characters long.

sed -E 's/XCTEND t.{37}//'
The remaining post-processing can be done manually. 

With little additional help, i can generate the output desired.
Still I have some wishes for mrskew extensions, but they are not part of this post.

Montag, 4. Januar 2021

wrong checksum in VirtualBox due to Windows Defender Credential Guard?

I use Virtualbox a lot on my Laptops for all kind of playgrounds. I install the software to test in the VMs and do whatever is required for my test. 
But in one VM, I could not install a specific package. 
It seemed to me the checksum of the package to install was always wrong. 
I tried several ways to get the package onto the VM, like downloading it directly with wget, using an alternate repository, download it to my Windows laptop first (there the checksum was correct) and transfer it to the VM, ... 
Every time the checksum was wrong - but different! 
It differed depending on the position in the VMs filesystem. 

After some research this led me to a question on stackexchange
And some more clicks later I hit the DG_Readiness_Tool
It really seems, Microsofts Defender somehow affected either the reading of the file in the VM or the calculation of it's checksum. 
However, after going through the articles noted above (and some more) I decided to give it a try with 

DG_Readiness_Tool.ps1 -Disable -AutoReboot

The Laptop rebooted - and everything is fine now. 
(I had to re-load the package into the VM - this gives some clue where the issue happened, But I have no prove so no finger-pointing here)

I can not explain, why (and how) this happened, and I don't recommend to disable ANY security-feature. Just in my case it helped. 
(On another, quite similar, system the same VM with same package and also up-to-date Win10 never showed such issues)

Sonntag, 3. Januar 2021

identify index corruption

X for U

One of my customers faced a problem with false results
The most remarkable detail was the inconsistency of these wrong results. It depended on the queries he run.  So his first assumption was an error in the SQL parser/optimizer/wherever. 
But as I striped down the correct/wrong SQLs more and more to get a simple testcase, maybe something to provide to Oracle Support, it became clear the SQLs are fine, just one physical representation of the data was wrong!

Yes, even in a perfectly normalized relational database, there is a lot of physically duplicated data. One might think of materialized views, but in fact, every index is an implicit duplication of some data for optimized physical access. 

Each RDBMS promises to keep this duplicated in sync (if the word would have been used in IT in last millenium, they would have called it autonomous ), and each RDBMS fails in doing so sometimes. 

I tried to identify those rows and indexes where information in the index and the row differs. Then the customer needs to understand the data and decide, which data is correct, store this data in the block and re-create all indexes which does not match the tables data. 

The basic idea is to select the rows from the index and from the table and compare it's data. 
Even this sounds easy, Oracles Optimizer does a lot of effort to avoid what it sees as unnecessary work. Getting the same column of the same row from 2 different sources is such an avoidable work. 
I first tried some queries where I get the questionable columns (and their ROWIDs) in a sub-query and then join to the table again retrieving the tables row and value. Something like: 

select /*+ qb_name(main) */ t.pk, t.problematic_column
from TABLE t 
where t.rowid in
  select /*+ qb_name(get_rowid)  */ ti.rowid
  from TABLE ti 
  where ti.problematic_column='123456'
But to avoid the optimizers clevernes, I had to use some hints like NO_MERGE,  NO_UNNEST,  NO_ELIMINATE_SQ(@GET_ROWID), INDEX, MATERIALIZE and there might be even more with higher versions. 

My final approach is maybe not as eligant in pure SQL, but for me it provides more stability:  

  FUNCTION get_problematic_column (rid in varchar2) return number IS
   cid number;
    SELECT problematic_column into cid
    from TABLE t
    where rowid=rid;
    return cid;
  select /*+ INDEX(ti, SUSPICIOUS_INDEX) qb_name(get_rowid) */
    ti.rowid, ti.pk, ti.problematic_column as problematic_column_in_index,
    get_problematic_column(rowid) as problematic_column_at_ROWID
  from TABLE ti 
  where ti.problematic_column='123456'

This is not the most beautiful query, but it should give a good idea. 
The function get_problematic_column does a TABLE ACCESS BY USER ROWID and we can be very sure this will not change and provide the data as it's in the tables block. 
The query get_rowid should access the TABLE via the SUSPICIOUS_INDEX - I only have to ensure the WHERE clause matches the index columns. 

With little adaptions I could test all indices for invalid columns. 
In this case the tables data was valid and one index needed to be rebuilt. Much easier than opening a Service Request at MOS. 

If anyone wants to know the underlying hardware: it's an Exadata cluster (I was never told exact specifications) with proper ASM Diskgroup configuration.