Dienstag, 23. Juni 2015

SQL Plan Directives and result cache affects performance

In my current company we are preparing a migration of our billing application to a new version. During this migration there are the ordinary changes of infrastructure as well. Application servers from HPUX to Linux, database servers from RH5 to RH6, different storage subsystem, Oracle from 10.2 to 12c, different application partitioning, and so on ...
At least from the management perspective the expectation is clear: everything is shiny new and costed a lot of money. So it must be faster than before.

In this case the first test of a billrun was not faster. Our application mate contacted the DBA team as it seems to be significantly slower than on the [old env]

As you can see in the graph most sessions spent their time in wait class Other. It's the wait latch free - so it's a serialization issue. But the latch was Result Cache: RC Latch and it was (at least for some sessions) in exclusive mode, as a shared mode only would not cause the sessions to wait.
The DB has set RESULT_CACHE_MODE = MANUAL and the statements affected where like (slightly simplified)
select distinct ID 
from instance_history 
where base__id = 123456 
  and status_code = 3 
  and start_date >= to_date('20150515', 'yyyymmdd')     
  and start_date <= to_date('20150614', 'yyyymmdd')
As there is no RESULT_CACHE hint MODE is MANUAL and nothing is set in the sessions as well, it was not obvious why and where the latch is required at all. A quick check in ASH showed all sessions with this wait had IN_PARSE=Y.  It seems to be strange to have the result cache involved in parsing of a schema in exclusive mode, as this is only required to enter values to the result cache, and the statement is not executed yet.
But sometimes there is a statement executed in PARSE state: There are recursive SQLs fired by the optimizer to get more informations and deliver better plans. In this case it's dynamic sampling.
In the DB the parameter optimizer_dynamic_sampling is 2. And the table had statistics. The optimizer should not do any dynamic sampling based on this parameter.
But a new 12c feature initiates dynamic sampling regardless the DB settings: SQL Plan Directives. Here the SPD told the optimizer the stats on table instance_history might not be enough and it should do some dynamic sampling there as well.
By some luck I attended a presentation from Christian Antognini about Adaptive Dynamic Sampling some days ago. There he mentioned in the optimizer uses optimizer_dynamic_sampling level 11 regardless th spfile settings.  Even worse, it lies about this in the Note section of explain plan.

With all these informations at hand the quick summary at that time was:
A lot of parse creates a huge amount of (identical) dynamic sampling statements. Those try to utilize the result cache by the hint RESULT_CACHE(SNAPSHOT=3600). As the result cache was configured very small, there were few hits but many misses - and all those misses tried to add their results to the cache. Other sampling statements had to wait for that. And as totally different statements are run at the same time, those wiped out the cache again and again.

As a first reaction we increased the result cache in the database.It looks much better now. latch free is down to 16% (CPU:50%, db file sequential read: 25%). But still sometimes the throughput drops due to serialization.  Of course we could again increase the result cache. But we decided another approach.

Back to the first issue, we have many statements in PARSE. As all of these statements had the same execution plan, there is no need for parsing the statement again and again. But it has literals instead of bind variables.
One suggestion was to set cursor_sharing=force. We could do this on instance-level or via login-trigger for a specific user/host/program. But still I prefer to have a most specific solution with least side effects.
So we asked the colleague responsible for the application tests if the statement could be changed to use bind variables. And to my big surprise it was possible without a big issue!

Now all the latch free waits are gone. The topmost "latch"-event is latch: cache buffer chains.

During these investigations I learned a lot about SQL Plan Directives, dynamic sampling and even result cache latches. It shows it's still true in 12c to avoid unnecessary parses - for one more reason now.

I have to thank Chris Antognini, Franck Pachot, Martin Klier and Stefan Koehler for their blogs, presentations and fast support via twitter.

Update (2016-07-21): In the meantime Oracle created a Note about this behavior: High "Latch Free" Waits with Contention on 'Result Cache: RC Latch' when RESULT_CACHE_MODE = MANUAL on Oracle 12c (Doc ID 2002089.1). There it is called an expected behavior, but the Solution is to disable Automatic Dynamic Statistics using
alter system set "_optimizer_ads_use_result_cache" = FALSE;
Thank you Laurent Leturgez for sharing: