2020-03-07

materialized WITH query

Sometimes I have to improve a SQL query where the same (or similar) subquery is used several times within the whole statement. This leads to many times the tables needs to be visited, even for the same rows.
It also makes the query hard to read.

A real life example I had to deal with this week is something like

SELECT *
FROM ( SELECT columns, aggregate functions
       FROM some tables
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col1
         FROM T1
         UNION 
         SELECT 0, IDb, col col1
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col2
         FROM T1
         UNION 
         SELECT 0, IDb, col col2
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col3
         FROM T1
         UNION 
         SELECT 0, IDb, col col3
         FROM T2 )
       ON some joins
       INNER JOIN
       ( SELECT IDa, 0 as IDb, col4
         FROM T1
         UNION 
         SELECT 0, IDb, col col4
         FROM T2 )
       ON some joins
       WHERE some filters
       GROUP BY columns)
WHERE more filters

In this case it's quite visible there are 4 INNER JOINs to the same UNION of 2 tables, only the columns differ.

Fig 1 original PLAN - begin


The optimizer tried it's best and the beginning of the plan looked like Fig1. 3 more iterations with SORT - VIEW - SORT - UNION ALL follow. It's amazing the cost is so low, But even with higher cost there is not much the optimizer could do.


So my idea was to put it into a WITH clause and replace the INNER JOIN select with it.

The WITH clause is

WITH my_inner as (
SELECT IDa, 0 as IDb, col1, col2, col3, col4
FROM T1
UNION ALL
SELECT   0, IDb,      col1, col2, col3, col4
FROM T2
)
 

And when replacing the first INNER JOIN
INNER JOIN
       ( SELECT * from my_inner )

Fig 2 changed PLAN - 1 replacement

It seems the optimizer did not like this as the cost increased.


But I continued with the next replacement:
Fig 3 changed PLAN - 2 replacements

The optimizer "understands" it can created a temporary object (the one which begins with SYS_TEMP_) and then re-uses it later.

With all 4 replacements, the Plan is different now:

Fig 4 changed PLAN - 4 replacements


The cost is still higher than in the original plan, but it's very likely the statement is faster than the original one.
In this case, there was no need to add the MATERIALIZED hint - it was done automatically, for very good reasons.

I like this optimization as it both, improves the readability of the query AND it's performance!