Today a single session made my day!
In the late morning I got a call from a colleague. I would call him a kind of application DBA, even though there is no such role in our company. He asked me why a particular session was showing a lot of sorts in gv$session_longops. He also told me what the session was doing:
This script was started in the early morning by an operations guy:
drop materialized view MV;
CREATE MATERIALIZED VIEW MV
TABLESPACE MVIEW_DATA
NOCACHE
NOLOGGING
NOPARALLEL
REFRESH FAST WITH ROWID
AS
select * from USER.TABLE@LINK;
CREATE INDEX I1 ON MV
(...)
LOGGING;
CREATE INDEX I2 ON MV
(...);
CREATE INDEX I3 ON MV
(...)
;
CREATE INDEX I4 ON MV
(...)
;
CREATE INDEX I4 ON MV
(...)
;
CREATE INDEX I5 ON MV
(...)
;
CREATE INDEX I6 ON MV
(...)
;
CREATE INDEX I7 ON MV
(...)
;
CREATE INDEX I8 ON MV
(...)
;
CREATE INDEX I9 ON MV
(...)
;
CREATE INDEX I10 ON MV
(...)
;
CREATE INDEX I11 ON MV
(...)
;
CREATE INDEX I12 ON MV
(...)
;
CREATE INDEX I13 ON MV
(...)
;
GRANT SELECT ON MV TO USER1;
--currently running as snapadmin:
exec dbms_snapshot.refresh('USER.MV ','c');
In fact, I did not know what was going on. But that's reason enough to start digging.
And as the session had already been running for some time, I was pretty sure it would continue to do so and therefore give me enough time to search and think. (Yes, some services were unavailable, but he agreed not to kill any session or reboot an instance without knowing what was going on.)
My first search was for the actual SQL (v$session sql_hash_value => v$sqltext).
It was something like
INSERT /*+ APPEND */ INTO MV AS
SELECT .... FROM
USER.TABLE@LINK;
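I did not keep the exact lookup, so the following is only a sketch of how the current statement of a session can be found. The bind variable :sid is a placeholder for the session in question, and the join assumes the statement is still in the shared pool.

```sql
-- Sketch: fetch the SQL text currently executed by one session.
-- :sid is a hypothetical bind variable, not taken from the original case.
select t.piece, t.sql_text
from v$session s, v$sqltext t
where s.sid = :sid
and t.address = s.sql_address
and t.hash_value = s.sql_hash_value
order by t.piece
/
```

The text comes back in pieces, one row each, so they have to be read in piece order.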
I told my colleague about this, but he replied: 'Yes, but I checked the MV, it's empty'.
OK, even I could explain this with a simple 'ACID' - or, in this phone call, 'you will see the rows after the commit, and that can take a while'.
This answer was acceptable, but nevertheless, the legitimate question was: 'how long will it run, and where does it stand right now?'
How to answer this?
v$session_longops only gives information about the 'current' long operation, not about the whole transaction.
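As an illustration of what that view can and cannot tell you, a query along these lines shows the progress of whatever long operation is active right now. This is a sketch, not the query I ran; :inst_id and :sid are placeholder bind variables.

```sql
-- Sketch: progress of the long operation(s) currently active for one session.
-- :inst_id and :sid are hypothetical bind variables.
select sid, opname, target, sofar, totalwork, units,
       round(sofar / totalwork * 100, 1) pct_done,
       time_remaining
from gv$session_longops
where inst_id = :inst_id
and sid = :sid
and totalwork > 0
and sofar < totalwork
/
```

The pct_done and time_remaining values refer to the single operation listed (one sort, one scan), so they say nothing about how many further steps the refresh still has ahead of it.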
But as we assumed that indices might be being written right now (v$session_longops showed sorts and sort merges), the next step was to check anything we could get about the indices.
Even though we could not access them directly, we could check for some footprints:
select seg.owner, seg.segment_name, seg.segment_type, (sum(blocks)*8192)/(1024*1024) MB
from dba_segments seg, dba_indexes ind
where ind.table_owner = 'OWNER' and ind.table_name = 'MV'
and ind.owner = seg.owner
and ind.index_name = seg.segment_name
group by seg.owner, seg.segment_name, seg.segment_type
order by MB desc
/
By sampling this query we saw some fully populated indices, and one still growing. So even for us it was clear what was going on: the indices were being calculated and populated.
The next question I had to face was: 'why is the index creation so slow?'
Hmmm. Who can answer this?
Manual sampling of v$session_wait gave some hints:
some gc... events and latches on the buffer were on top.
(Did I mention, this is on a 2-node RAC? ;-) )
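Manual sampling here simply means running the same small query again and again and watching which events show up most often. A minimal sketch, with :sid as a placeholder bind variable:

```sql
-- Sketch: one sample of the current wait of a session; rerun it repeatedly.
-- :sid is a hypothetical bind variable.
select sid, seq#, event, p1text, p1, p2text, p2, state, seconds_in_wait
from v$session_wait
where sid = :sid
/
```

A changing seq# between two samples shows that the session has moved on to a new wait in the meantime.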
It was time to switch to the other node and check for running sessions which also had cache fusion waits and could interfere with our session of the day.
After some sampling and searching (v$session, v$session_wait, later on v$sqltext) and additional clicking in Grid Control (yes, it can be useful!) I found a possible culprit:
USER1 ran a lot of queries on a view which contains USER.MV.
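A search like this can be sketched as a count of waiting sessions per user and event on the other instance. The instance number 2 and the 'gc%' filter are assumptions for illustration; I did not log the queries I actually used.

```sql
-- Sketch: who is waiting on global cache events on the other instance?
-- Instance number 2 and the 'gc%' pattern are assumptions for this example.
select s.username, w.event, count(*) cnt
from gv$session s, gv$session_wait w
where s.inst_id = w.inst_id
and s.sid = w.sid
and w.inst_id = 2
and w.event like 'gc%'
and w.state = 'WAITING'
group by s.username, w.event
order by cnt desc
/
```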
I didn't prove it, but I guess Instance2 held the master for MV, as there were 100 sessions hanging on the view. But I didn't bother with GV$GCSPFMASTER_INFO; I just wanted to get rid of all the cache fusion waits until the MV was created.
Some might have seen the solution for this already, it's not hidden.
Our Silver Bullet of The Day was a simple:
REVOKE SELECT ON MV FROM USER1;
(It had to be run several times, as there were DDL timeouts...)
Afterwards, the index creation ran quite fast - fast enough for us.
It's not the end of the story, but enough for now, just to leave enough space for Part 2 ;-)
Some notes at the end:
Many parts I cannot explain in more detail, simply because I didn't log everything I did and therefore have no exact queries, output or whatever. (Maybe this is the most important area in which to improve myself!)
I also used Tanel Poder's latchprofx and snapper to collect well-sampled data about the session.