2021-01-03

identify index corruption

X for U

One of my customers faced a problem with false results
The most remarkable detail was the inconsistency of these wrong results. It depended on the queries he run.  So his first assumption was an error in the SQL parser/optimizer/wherever. 
But as I striped down the correct/wrong SQLs more and more to get a simple testcase, maybe something to provide to Oracle Support, it became clear the SQLs are fine, just one physical representation of the data was wrong!

Yes, even in a perfectly normalized relational database, there is a lot of physically duplicated data. One might think of materialized views, but in fact, every index is an implicit duplication of some data for optimized physical access. 

Each RDBMS promises to keep this duplicated in sync (if the word would have been used in IT in last millenium, they would have called it autonomous ), and each RDBMS fails in doing so sometimes. 

I tried to identify those rows and indexes where information in the index and the row differs. Then the customer needs to understand the data and decide, which data is correct, store this data in the block and re-create all indexes which does not match the tables data. 

The basic idea is to select the rows from the index and from the table and compare it's data. 
Even this sounds easy, Oracles Optimizer does a lot of effort to avoid what it sees as unnecessary work. Getting the same column of the same row from 2 different sources is such an avoidable work. 
I first tried some queries where I get the questionable columns (and their ROWIDs) in a sub-query and then join to the table again retrieving the tables row and value. Something like: 

select /*+ qb_name(main) */ t.pk, t.problematic_column
from TABLE t 
where t.rowid in
  (
  select /*+ qb_name(get_rowid)  */ ti.rowid
  from TABLE ti 
  where ti.problematic_column='123456'
  );
But to avoid the optimizers clevernes, I had to use some hints like NO_MERGE,  NO_UNNEST,  NO_ELIMINATE_SQ(@GET_ROWID), INDEX, MATERIALIZE and there might be even more with higher versions. 

My final approach is maybe not as eligant in pure SQL, but for me it provides more stability:  

with
  FUNCTION get_problematic_column (rid in varchar2) return number IS
   cid number;
  begin
    SELECT problematic_column into cid
    from TABLE t
    where rowid=rid;
    return cid;
  end;
  select /*+ INDEX(ti, SUSPICIOUS_INDEX) qb_name(get_rowid) */
    ti.rowid, ti.pk, ti.problematic_column as problematic_column_in_index,
    get_problematic_column(rowid) as problematic_column_at_ROWID
  from TABLE ti 
  where ti.problematic_column='123456'
;

This is not the most beautiful query, but it should give a good idea. 
The function get_problematic_column does a TABLE ACCESS BY USER ROWID and we can be very sure this will not change and provide the data as it's in the tables block. 
The query get_rowid should access the TABLE via the SUSPICIOUS_INDEX - I only have to ensure the WHERE clause matches the index columns. 

With little adaptions I could test all indices for invalid columns. 
In this case the tables data was valid and one index needed to be rebuilt. Much easier than opening a Service Request at MOS. 

If anyone wants to know the underlying hardware: it's an Exadata cluster (I was never told exact specifications) with proper ASM Diskgroup configuration. 

Keine Kommentare: