berxblog: materialize


2021-01-03

identify index corruption

One of my customers faced a problem with wrong query results.
The most remarkable detail was the inconsistency of these wrong results: they depended on the queries he ran. So his first assumption was an error in the SQL parser, the optimizer, or somewhere similar.
But as I stripped down the correct and wrong SQLs more and more to get a simple test case, maybe something to provide to Oracle Support, it became clear the SQLs were fine; just one physical representation of the data was wrong!

Yes, even in a perfectly normalized relational database, there is a lot of physically duplicated data. One might think of materialized views, but in fact, every index is an implicit duplication of some data for optimized physical access.

Each RDBMS promises to keep this duplicated data in sync (if the word had been used in IT in the last millennium, they would have called it autonomous), and each RDBMS sometimes fails in doing so.

I tried to identify those rows and indexes where the information in the index and in the row differs. Then the customer needs to understand the data and decide which data is correct, store this data in the table block, and re-create all indexes which do not match the table's data.

The basic idea is to select the rows from the index and from the table and compare their data.
Even though this sounds easy, Oracle's optimizer puts a lot of effort into avoiding what it sees as unnecessary work. Getting the same column of the same row from 2 different sources is such avoidable work.
I first tried some queries where I get the questionable columns (and their ROWIDs) in a sub-query and then join to the table again, retrieving the table's row and value. Something like:

select /*+ qb_name(main) */ t.pk, t.problematic_column
from TABLE t 
where t.rowid in
  (
  select /*+ qb_name(get_rowid)  */ ti.rowid
  from TABLE ti 
  where ti.problematic_column='123456'
  );

But to avoid the optimizer's cleverness, I had to use some hints like NO_MERGE, NO_UNNEST, NO_ELIMINATE_SQ(@GET_ROWID), INDEX, MATERIALIZE, and there might be even more with higher versions.
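To illustrate that hinted direction, here is a minimal sketch that uses only two of the hints named above, INDEX and MATERIALIZE. TABLE and SUSPICIOUS_INDEX are this post's placeholder names and must be replaced with real object names; the WITH block name and the column aliases are made up for the sketch. Whether the optimizer really keeps both access paths separate depends on the version, so the execution plan has to be checked.

```sql
-- Sketch only: TABLE and SUSPICIOUS_INDEX are placeholders, replace them.
-- Step 1: fetch ROWID and column value through the index and
--         materialize the result so it is not merged with step 2.
with from_index as (
  select /*+ materialize index(ti SUSPICIOUS_INDEX) */
         ti.rowid              as rid,
         ti.problematic_column as col_in_index
  from TABLE ti
  where ti.problematic_column = '123456'
)
-- Step 2: visit the table row by ROWID and show both values side by side.
select t.pk,
       fi.col_in_index,
       t.problematic_column as col_in_table
from from_index fi
join TABLE t
  on t.rowid = fi.rid;
```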

My final approach is maybe not as elegant in pure SQL, but for me it provides more stability:

with
  FUNCTION get_problematic_column (rid in varchar2) return number IS
   cid number;
  begin
    SELECT problematic_column into cid
    from TABLE t
    where rowid=rid;
    return cid;
  end;
  select /*+ INDEX(ti, SUSPICIOUS_INDEX) qb_name(get_rowid) */
    ti.rowid, ti.pk, ti.problematic_column as problematic_column_in_index,
    get_problematic_column(ti.rowid) as problematic_column_at_ROWID
  from TABLE ti 
  where ti.problematic_column='123456'
;

This is not the most beautiful query, but it should give a good idea.
The function get_problematic_column does a TABLE ACCESS BY USER ROWID, and we can be very sure this will not change and will provide the data as it is in the table's block.
The query get_rowid should access the TABLE via the SUSPICIOUS_INDEX; I only have to ensure the WHERE clause matches the index columns.

With small adaptations I could test all indexes for invalid column values.
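One possible adaptation, as a minimal sketch: wrap the query in an inline view and keep only the rows where the value from the index and the value at the ROWID differ. TABLE and SUSPICIOUS_INDEX are again the post's placeholders; the aliases, the IS NOT NULL predicate (chosen here so a single-column B-tree index can serve the whole scan) and the DECODE comparison are assumptions of this sketch, not the query used at the customer.

```sql
-- Sketch only: TABLE and SUSPICIOUS_INDEX are placeholders, replace them.
with
  -- reads the column directly from the table block via the ROWID
  function get_problematic_column (rid in rowid) return number is
    cid number;
  begin
    select problematic_column into cid
    from TABLE t
    where t.rowid = rid;
    return cid;
  end;
select v.rid, v.col_in_index, v.col_at_rowid
from (
  select /*+ no_merge index(ti SUSPICIOUS_INDEX) */
         ti.rowid                         as rid,
         ti.problematic_column            as col_in_index,
         get_problematic_column(ti.rowid) as col_at_rowid
  from TABLE ti
  where ti.problematic_column is not null
) v
-- DECODE treats two NULLs as equal, so only real differences remain
where decode(v.col_in_index, v.col_at_rowid, 0, 1) = 1
/
```

In SQL*Plus a statement with a PL/SQL declaration in the WITH clause is terminated with a slash on its own line, which is why the sketch ends with `/`.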

In this case the table's data was valid and one index needed to be rebuilt. Much easier than opening a Service Request at MOS.
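A cautious sketch of such a rebuild, assuming the table data has already been confirmed as correct. A plain offline ALTER INDEX ... REBUILD may use the existing index as its source, which could carry wrong entries over; marking the index UNUSABLE first (or dropping and re-creating it) leaves the table as the only source. This order is an assumption of the sketch, not a description of what was run at the customer, and SUSPICIOUS_INDEX is the post's placeholder name.

```sql
-- Sketch only: SUSPICIOUS_INDEX is a placeholder, replace it.
-- An unusable index cannot serve as the source of its own rebuild,
-- so the rebuild has to read the table.
alter index SUSPICIOUS_INDEX unusable;
alter index SUSPICIOUS_INDEX rebuild;
```

If the index enforces a unique or primary key constraint, DML on the table fails while the index is unusable, so this needs a maintenance window.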

If anyone wants to know the underlying hardware: it's an Exadata cluster (I was never told exact specifications) with proper ASM Diskgroup configuration.

2020-03-07

materialized WITH query

Sometimes I have to improve a SQL query where the same (or a similar) subquery is used several times within the whole statement. As a result, the tables need to be visited many times, even for the same rows.
It also makes the query hard to read.

A real-life example I had to deal with this week looks something like this:

SELECT *
FROM ( SELECT columns, aggregate functions
       FROM some tables
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col1
         FROM T1
         UNION 
         SELECT 0, IDb, col col1
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col2
         FROM T1
         UNION 
         SELECT 0, IDb, col col2
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col3
         FROM T1
         UNION 
         SELECT 0, IDb, col col3
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col4
         FROM T1
         UNION 
         SELECT 0, IDb, col col4
         FROM T2 )
       ON some joins
       WHERE some filters
       GROUP BY columns)
WHERE more filters

In this case it's quite visible that there are 4 INNER JOINs to the same UNION of 2 tables; only the columns differ.

Fig 1 original PLAN - begin

The optimizer tried its best, and the beginning of the plan looked like Fig 1. 3 more iterations with SORT - VIEW - SORT - UNION ALL follow. It's amazing the cost is so low, but even with a higher cost there is not much the optimizer could do.

So my idea was to put it into a WITH clause and replace the INNER JOIN select with it.

The WITH clause is

WITH my_inner as (
SELECT IDa, 0 as IDb, col1, col2, col3, col4
FROM T1
UNION ALL
SELECT   0, IDb,      col1, col2, col3, col4
FROM T2
)

And when replacing the first INNER JOIN

INNER JOIN
       ( SELECT * from my_inner )

Fig 2 changed PLAN - 1 replacement

It seems the optimizer did not like this as the cost increased.

But I continued with the next replacement:

Fig 3 changed PLAN - 2 replacements

The optimizer "understands" it can create a temporary object (the one which begins with SYS_TEMP_) and then re-use it later.

With all 4 replacements, the plan is different now:

Fig 4 changed PLAN - 4 replacements

The cost is still higher than in the original plan, but it's very likely the statement is faster than the original one.
In this case, there was no need to add the MATERIALIZE hint; it was done automatically, for very good reasons.
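For cases where the optimizer does not materialize the WITH block on its own, the hint can be tried explicitly. The following is a minimal sketch with made-up demo tables (t1, t2 and their columns are hypothetical, not the customer's tables) and only two references to my_inner. The hint sits in a wrapping SELECT around the UNION ALL, which is a choice of this sketch to keep the hint placement unambiguous. The plan output should then show a TEMP TABLE TRANSFORMATION step with a SYS_TEMP_ object.

```sql
-- Hypothetical demo tables, only for this sketch
create table t1 (ida number, col1 number, col2 number);
create table t2 (idb number, col1 number, col2 number);

explain plan for
with my_inner as (
  -- MATERIALIZE asks for the WITH block to be stored once in a temp table
  select /*+ materialize */ u.*
  from ( select ida, 0 as idb, col1, col2 from t1
         union all
         select 0,   idb,      col1, col2 from t2 ) u
)
select a.ida, a.idb, a.col1, b.col2
from my_inner a
join my_inner b
  on  b.ida = a.ida
  and b.idb = a.idb;

-- show the plan of the statement explained above
select * from table(dbms_xplan.display);
```

Like the WITH clause above, the sketch uses UNION ALL; a plain UNION would additionally remove duplicate rows.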

I like this optimization as it improves both the readability of the query AND its performance!
