Donnerstag, 31. Dezember 2009

USL in excel - without the need to read

Neil Gunther created a nice small excel spreadsheet to calculate the key values sigma and kappa for his Universal Scalability Law - primarily for the use in his GCaP class. Beside some other problems with excels numerical precision and Negative Scalability Coefficients in Excel I dislike the ugly 'create a graphic, let it show a trendline and it's coefficients, add these into some cells and continue' step. Fortunately, Scott Roberts has created some google spreadsheets where he implemented a real formula (as google spreadsheets does not support trendlines and their coefficients). These are mentioned in Neils blog entry Scalability in a Spreadsheet - google style.
As this method is also available in excel, I decided to extend Neils excel by some functions to avoud the read and insert part.
sscalc-class_berx1 It is not just easier to apply, it's also more accurate as the coefficients shown in the graph are shown with small rounding.

With the formula shown here:
sscalc-class_berx2

Don't be confused by the semicolons in the formula, depending on your language settings, excel sometimes use colon or semicolon to separate fields of the formula.
Here the detailed fields I've added:

Field H11:
Field I8:=INDEX(LINEST($G8:$G14;$F8:$F14^{1,2}; FALSE); 1)
Field I9: =INDEX(INDEX(LINEST($G8:$G14;$F8:$F14^{1,2}; FALSE;TRUE); 1);2)
Field I10: =INDEX(INDEX(LINEST($G8:$G14;$F8:$F14^{1,2}; FALSE;TRUE); 1);3)
Field I11: =INDEX(INDEX(LINEST($G8:$G14;$F8:$F14^{1,2}; FALSE;TRUE); 3);1)

I hope i used the LINEST function correct, and this might help others to reduce one step while playing with USL.

Mittwoch, 30. Dezember 2009

MOS can't count

MOS (the flash version) can not even count up to 3:

Can someone show me the hidden element?

Guerrilla Capacity Planning arrived - let's prepare the underground resistance!

Some weeks ago one of my X-mas gifts to myself arrived. (One of those I'm always allowed to buy as none of my relatives would ever imagine to buy me such books for Christmas): Guerrilla Capacity Planning by Neil J. Gunther. In fact, I was only interested in his Universal Scalability Law, which did me a good service in a performance review some weeks ago.
I really like this book and I'm sure it will go with me for some more time; at least every day I go to work by public transport, but also in many other situations.

The idea which gave the book it's name is the Guerrilla Capacity Planning. This is described in the first 2 chapters.
I read it as another iteration of KISS, applied to Capacity Planning. The main goal is to achieve fast and just good enough predictions by the usage of minimal resources and time. It just leads one step further and advises to be well prepared in lean tools and flexible methods to reach the goal when needed. In my current situation, I also see it as a hint to do Capacity Planning in every project, even if it's not calculated within the project plan. But keep it small enough to hide it within the jitter every project contains. The following chapters try to provide some simple weapons to be prepared for all tactical situations which can occur.

Chapter 3 is about significant digits, rounding rules and errors.
These 13 pages where worth reading for me, because I never had a full qualified mathematical or statistical education. It might be just enough to estimate the errors I carry through all my work and provide them (and their meanings to the results) in any discussion, presentation and so on.

Chapters 4 to 6 are the real reason why I ordered this book: the Universal Scalability Model (or Law?). Neil leads in a very consistent way, why scalability is not only limited by contention, as Amdahls law implies, but also by coherency, which leads to a retrograde in scalability beyond p*.
Not only the Model itself is described, but - important for all Guerrilleros - an easy method to gain the parameters for σ and κ out of some measured data with excel is provided. Not mentioned in the book directly but easily to find is the spreadsheet which contains exactly the method Neil provides.
I would like to have also other methods provided within the book to circumvent the problem with excels numerical precision, but Neil provided (and discussed it's difficulties) a glimpse of a implementation in R somewhere else.

In chapter 7 the main focus is on virtualization across all scales, from in-CPU (Hyperthreading) up to Grids and P2P on the other end of the scale. Here it's more about queues, schedules and polling cycles. I had the feeling I should have read Analyzing Computer System Performance with Perl::PDQ before. But I have not. (I will, after the 2nd ed. is published). Also there is not too much ammunition for my underground resistance. More a rough description of that area and some major snares.

Currently I cannot say much about chapters 8 to 11 as I'm in the middle of 8 at the moment. they will also be worth to write about them - in the future.

There is also some criticism outside there which laments, USL does only provide limited practical use for the forecast, until a set of data is measured and σ and κ are derived from these. In comparison, Amdahls s can be measured so much easier as the single threaded phase of a program. This criticism provides its own error inside: If s can be measured, this mean the setup is well instrumented and known and the target hardware is chosen (otherwise, the measurement must be somehow translated to the target hardware) at least for a single process. But, with a well instrumented and know setup, also all code parts which will account to coherency can be spotted and therefore not only Amdahls s leads to σ, but the coherency part leads to κ.
Unfortunately, even in a well instrumented Software like Oracle, nobody sorted all the wait events where they account to.

Sonntag, 22. November 2009

PRKO-2007 is not always correct

This week I had to stop and start an instance in a CRS-cluster. Nothing special so far, I just hit a PRKO-2007 error. For those which does not know it by heart, here the short description:
PRKO-2007: "Invalid instance name: "
Cause: An invalid instance name was entered.
Action: Use the correct instance name for the database. Run 'srvctl config database
-d ' command to find out all instances of a database in the
Cluster Database Configuration.


and here the copy of my commands (solution included):
oracle@aves742p:~/ [EDWP031] srvctl stop instance -d EDWP03 -i EDWP031
PRKO-2007 : Invalid instance name: EDWP031
oracle@aves742p:~/ [EDWP031] srvctl stop instance -d EDWP03_SITE1 -i EDWP031


I'm sure, you see the difference, and it's not the instance!
Just once again: don't trust the error message 100%, sometimes it's still worth to use the brain ;-)

Mittwoch, 11. November 2009

hardware lasts longer than software

2.0
A week ago, Oracle has replaced its good old support platform MetaLink with MyOracleSupport (MOS).
The thing I like most about MOS is the non Flash-based HTML-Version.
Many discussions about the need, target and failures during the last week of transition are ongoing in blogs, mailing lists, forums; I also had to suffer and had to communicate it. But that's not what's this blog is about.

I got into touch with MetaLink in 2000, as Oracle tried to move the customers from calling the Support to a self-service platform. They promoted MetaLink on many events, with everything you can imagine. The transition was not forced, but within a short period of time most customers liked the possibility to get many answers faster than on the phone. A real win-win situation.
Also I learned how to use MetaLink effective and therefore like it.

Metalink iTar

On one of these events, I got the promo-mug. I like it, because I like MetaLink.
Nowadays, everytime I feel without any Support from Oracle, I cling to the mug, hopefully filled with good, strong, hot coffee. This makes me feel a little bit better.
And sometimes, hardware really lasts longer than software, especially if you have the hardware in your hands and take care of.

Dienstag, 10. November 2009

rollback in PL/SQL

I recently came across a slightly, but possible dangerous mismatch between transactions and the PL/SQL code based on it.
Here a cut down example (not very realistic, but easy to follow):

create table mytab (myval number);

create or replace procedure tuwas (
varA in number,
varB in number,
varC OUT number)
is
begin
varC := varA;
insert into mytab values (varA);
if varB = 0 then
commit;
else
rollback;
end if;
end;
/

The Idea behind the code is, to get varA and varB as input parameters, do some calculations (missing here), insert the result into a table and return the calculated value for next steps.
There might be reasons to rollback the DML within the procedure, which is not seen as an error.

And here the result of the procedure:

set serverout on

declare
vA number;
vB number;
vC number;
begin
vA := 1;
vB := 0; -- commit;
tuwas(vA, vB, vC);
dbms_output.put_line('vC: ' ||vC);
vA := 2;
vB := 1; -- rollback;
tuwas(vA, vB, vC);
dbms_output.put_line('vC: ' ||vC);
END;
/

vC: 1
vC: 2

select * from mytab;

MYVAL
----------
1


What's dangerous here? Even with

vA := 2;
vB := 1;
the return value

vC: 2
is given. But t's not visible in the table, as the procedure did a rollback.
I learned from this example never to populate return-values until you are sure it's really stored (if this is a requirement, of course).

Montag, 9. November 2009

Oracle Security Options


Last week I attended a 1 day Seminar about Oracle Security by Dominique Jeunot.
Dominique did a great job to squeeze everything she knows into one day, and it was allways obvious there is a lot more she could tell us in the script and even more in her brain. Nevertheless she created a very good overview with the right level on all topics. It was obvious she could only make a kind of teaser, but it was a very tasty one!
Maybe the most important picture she draw was the first overview about all available features, options and products. I did the effort to draw it for my own joy and will share it here, as it might be helpful for others as well. All options are colored red, so you can see where you have to take extra money.

In general you can say: for nearly every issue there are possibilities to do it without extra money (but extra effort) or you can buy a solution.

If someone wants the visio behind this jpeg, just contact me :)

Dienstag, 27. Oktober 2009

ML-fun

ERR-7620 Could not determine workspace for application ()
sometimes it's really funny to see error messages.
This ony I got today:
I think, I'm missing my workspace (or is it the application, who knows ...)
Maybe this is only because of the ongoing transition phase and within the next days everything will be better...

Freitag, 2. Oktober 2009

xplan 2.1.3 is out

Alberto Dell'Era released a ne version of his xplan utility. Version 2.1.3 is out now.
Why is it's worth to mention?
1st he fixed a problem I faced within extraordinary speed (< 4 hrs!):


declare /* xplan_exec_marker */ *** error before main block (
processing XPLAN_OPTIONS ) *** -- main block
*
ERROR at line 1:
ORA-06550: line 1, column 34:
PLS-00103: Encountered the symbol "*" when expecting one of the following:
begin function pragma procedure subtype type <an identifier>
<a double-quoted delimited-identifier> current cursor delete
exists prior

--in setup.lst it's this error:

declare /* xplan_exec_marker */ -- process options
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at line 28


and 2nd: he didn't even announce it himself, so someone must do it :)

Montag, 14. September 2009

result cache not enabled for owner SYS and SYSTEM

EDIT: the original title of this blog was "result cache not enabled in SYSTEM and SYSAUX Tablespace".
After a totally correct post from Coskan I had to rework it totally. I mixed up the problem and conclusio so much it was totally messed up.

I hope this one is the better one:




During some Tests with result cache in 11gR2 I discovered a small docu bug.
You can still see the bug in the 11gR1 docu:

You cannot cache results when you use the following database objects or functions in your SQL query:

  • Dictionary and temporary tables

  • Sequence CURRVAL and NEXTVAL pseudo columns

  • SQL functions current_date, current_timestamp, local_timestamp, userenv/sys_context (with non-constant variables), sys_guid, sysdate, and sys_timestamp

  • Non-deterministic PL/SQL functions


In 11gR2 it sounds slightly different: 7.6.4.2 Additional Requirements for the Result Cache:

You cannot cache results when the following objects or functions are in a query:

  • Temporary tables and tables in the SYS or SYSTEM schemas

  • Sequence CURRVAL and NEXTVAL pseudo columns

  • SQL functions CURRENT_DATE, CURRENT_TIMESTAMP, LOCAL_TIMESTAMP, USERENV/SYS_CONTEXT (with non-constant variables), SYS_GUID, SYSDATE, andSYS_TIMESTAMP


for those of you who want to run the testcase:

create tablespace berx1 datafile '+DG' SIZE 50M;
create user berx1 identified by berx1;
grant connect to berx1;
grant unlimited tablespace to berx1;

create table sys.berx1_1 tablespace system as select * from dual;
create table sys.berx1_2 tablespace sysaux as select * from dual;
create table sys.berx1_3 tablespace berx1 as select * from dual;
create table system.berx1_4 tablespace system as select * from dual;
create table system.berx1_5 tablespace sysaux as select * from dual;
create table system.berx1_6 tablespace berx1 as select * from dual;
create table berx1.berx1_7 tablespace system as select * from dual;
create table berx1.berx1_8 tablespace sysaux as select * from dual;
create table berx1.berx1_9 tablespace berx1 as select * from dual;
grant select on sys.berx1_1 to berx1;
grant select on sys.berx1_2 to berx1;
grant select on sys.berx1_3 to berx1;
grant select on system.berx1_4 to berx1;
grant select on system.berx1_5 to berx1;
grant select on system.berx1_6 to berx1;

set autotrace on

-- run as user sys, system and berx1:

select /*+ result_cache */ * from sys.berx1_1;
select /*+ result_cache */ * from sys.berx1_2;
select /*+ result_cache */ * from sys.berx1_3;
select /*+ result_cache */ * from system.berx1_4;
select /*+ result_cache */ * from system.berx1_5;
select /*+ result_cache */ * from system.berx1_6;
select /*+ result_cache */ * from berx1.berx1_7;
select /*+ result_cache */ * from berx1.berx1_8;
select /*+ result_cache */ * from berx1.berx1_9;


my results are reflected in this matrix:





berx1_1 berx1_2 berx1_3 berx1_4 berx1_5 berx1_6 berx1_7 berx1_8 berx1_9
sys N N N N N N Y Y Y
system N N N N N N Y Y Y
berx1 N N N N N N Y Y Y


So in my testcase it's not dependend of the schema (or the data dichtionary), but the tablespace.
This issue is addressed in BUG:8558309 and fixed in the latest docu :)

I know, I should never create any objects in SYS or SYSTEM schema. Especially if you want this object to be cached in result cache.

Once again thank you Coskan for showing my ridiculous error.

Mittwoch, 26. August 2009

-prelim and direct_access

I recently asked a question about the 'difference' between /prelim and direct_access on oracle-l.
Tanel Poder gave a great answer. Now it's nearly clear, at least for me:

-prelim allows sysdba to start a preliminary connection to db, which
means that a server process is started for you and it attaches to sga
segments, but it does not allocate anything inside sga (no process
state nor session state object and session structures are allocated
for example). This means tha no latch gets are needed in prelim mode
and if you have an instance hang due these latches your session wont
get stuck. You can do very limited things with prelim mode, mostly
just various dumps with oradebug.

Oradebug direct-access allows you to also query few x dollar tables
with sql-like syntax. thats fake sql though its oradebugs hack, not
real sql. The same is doable with oradebug dumptype as well.


Or, to summarise it (just for me):
  • -prelim allows me to start a sqlplus binary and connect it to the (correct ≈ right $ORACLE_HOME and $ORACLE_SID) SGA without creating anything within this SGA. So for me it's the cheapest way to get access to the shared memory of the specific instance.
  • this prelim state is of no really use in ordinary life. Just for some really rare occasions, it might be of any use. Most of these are oradebug variants. I have not tested any ALTER SYSTEM or ALTER SESSION statement yet. Some of them might be worth testing, but most will fail. Ordinary SQL will fail all the time (I'm sure).
  • oradebug direct-access is a nice gadget to query some x$ tables without really going through all the sql-parse-pin-heap-whatever engine ;-)
  • at last, -prelim is a kind of last resort to access an instance and at least get some informations, until some really hard direct SGA access methods must be considered

Dienstag, 18. August 2009

The truth about NULL

"Do you want something to eat?" - "NO"
"Do you want nothintg to eat?" - "NO"
"What do you want?" - "I don't know" (off she run crying)

That was a conversation between me and my 3 years old daughter.
As she run away, totally overstrained by the situation, her dad and her world over all, I had pity on her.
And one thought further, I recognised the true answer she gave me:
it's not "YES", it's not "NO" (regarding the 1st question), it's "NULL". Or translated into what I think are the thoughts of a 3 years old girl: "I don't know what to do now, so I'm against anything and run away!"
From now on, if any application designer argues with me, why any column can not be set NOT NULL, I will remember this picture of a overstrained, 3 years old girl and just feel pity.

Dienstag, 4. August 2009

cbo costing for pl/sql functions

I rarely create a blog entry just to refer to another article, but this time it's worth to do so:
pl/sql functions and cbo costing from Adrian Billington is worth reading and at the end a must know for anyone, who writes pl/sql functions and considers performance.

Mittwoch, 15. Juli 2009

linesize.sql

For a new set of db-servers I'm currently reworking our sql-script-toolbox.
Many of these scripts are copied and modified from other sources, but this particular is created by myself.
It's target is to set the linesize of sqlplus according the actual size of the unix terminal (yes, it's only for unix systems). So if you change the size of your terminal window, and you have this script in your SQL_PATH, just type @linesize and you use your terminal the best way you can.

It uses a defined mycpid to distinguish different processes at the same time, and I populated this define with a logon.sql which is derived from Tanel Poders init.sql out of his package TPT_public.zip



-- linesize.sql
--
-- mycpid must be defined before - e.g. in logon.sql
--
-- set linesize according to stty
-- handle with care, as stty is not consistent over platform boundaries
--

set termout off

define _linesize_cmd1 = 'echo -n "set linesize " > /tmp/linesize_&mycpid..sql '
-- for solaris:
define _linesize_cmd2 = 'stty -a | awk -F''[ ;]'' ''/columns/ { print $7 }'' >>/tmp/linesize_&mycpid..sql'
-- for linux:
-- define _linesize_cmd2 = 'stty -a | awk -F''[ ;]'' ''/columns/ { print $9 }'' >>/tmp/linesize_&mycpid..sql'

HOST &_linesize_cmd1
HOST &_linesize_cmd2

@ '/tmp/linesize_&mycpid..sql'

undef _linesize_cmd1 _linesize_cmd2

set termout on



Unfortunately the output of stty -a is not defined, so every OS provides a slightly different format and I have to change the parsing for every OS and sometomes even major release.

Might it be useful for someone else, too ;)

Montag, 18. Mai 2009

before select trigger

The whole idea to create a 'before select trigger' is based on a comment Martin Jensen dropped during his Seminar about Advanced Materialized Views in Vienna this week. So this all is not my idea, I just wanted to test the suggestion to get used to it.
The whole functionality utilizes FGAC.
As some others have also noticed, the function is called twice per every execution, so I have to reduce the effective execution of my action.

So let's start with - a log_table:
create table log_table
(datum date,
c varchar2(80)) ;

Now something which can write into this table.

The package is necessary to hold my package variable tot_calls and eliminate every 2nd execution of the function (otherwise it would fire twice, not recommended in most cases, I guess).

The autonomous_transaction might be necessary in most cases, but if someone find business-cases where it's recommended not to be autonomous, just drop this line and have the trigger under full transaction control.

CREATE OR REPLACE PACKAGE my_select_package AS
FUNCTION my_select_trigger (p_schema in varchar2,
p_object in varchar2 ) return varchar2;
END my_select_package;
/

CREATE OR REPLACE PACKAGE BODY my_select_package AS
tot_calls NUMBER := 0;
function my_select_trigger( p_schema in varchar2,
p_object in varchar2 ) return varchar2
as
PRAGMA AUTONOMOUS_TRANSACTION;
begin
tot_calls := tot_calls + 1;
if mod(tot_calls, 2) > 0
then
insert into log_table values (sysdate, 'function was here: ' ||p_schema ||'.' || p_object || ' tot_calls: ' ||tot_calls);
commit;
end if;
return NULL;
end;
END my_select_package;
/
What are we still missing? A table to connect the trigger to, with some data:
create table my_table
( data varchar2(30)
);

insert into my_table ( data ) values ( 'Some Data' );

insert into my_table ( data ) values ( 'Some other Data' );

commit;
Now let's tight these pieces together:
begin
dbms_rls.add_policy
( object_schema => 'SR3',
object_name => 'MY_TABLE',
policy_name => 'MY_SELECT_TRIGGER',
function_schema => 'SR3',
policy_function => 'MY_SELECT_PACKAGE.MY_SELECT_TRIGGER',
statement_types => 'select' ,
update_check => FALSE );
end;
/
and some trivial tests:
select * from my_table;

DATA OWNER
------------------------------ ------------------------------
Some Data SR3
Some Data Owned by SCOTT SCOTT

select * from log_table;

DATUM C
------------------- --------------------------------------------------
2009-05-18 13:19:09 function was here: SR3.MY_TABLE tot_calls: 1

select * from my_table;
-- ...

select * from log_table;

DATUM C
------------------- --------------------------------------------------
2009-05-18 13:19:09 function was here: SR3.MY_TABLE tot_calls: 1
2009-05-18 13:19:15 function was here: SR3.MY_TABLE tot_calls: 3
works like a charm.

Anything more to say?
Of course, as allways, this is only a proove of concept.
As allways, most parts are borrowed from several sites.
Don't use this blindly anywhere. I totally left everything out, which could confuse me. Like exception handling, hard testing, theoretical and practical security checks, performance baselines, etc.
...
and at the end, the most global citation from Martin Jensen: 'If it's not tested, it does not work'

Samstag, 9. Mai 2009

odyssey of a session - Part 2

An hour after I finished my work on explaining and supporting one poor session, I got a call from my honoured colleague again.
'The session stopped doing any sorts in v$session_longops and also no more segments on the indices are written. What's going on?'
My first guess was 'Did it just finish?', but that was not the case. The table still seemed to be empty (so no commit yet). The session was still active, but SQL_HASH_VALUE was NULL. Also sampling v$session_wait showed diferent events, so the process was doing anything. We just did not know, what.
Even I didn't know, what's going on, I knew something is going on. I couldn't pin it down with the knowledge I have using plain oracle methods, so I decided to switch to another method. First OStackProf came to my mind, but unfortunately I was sitting on an Apple Computer at this moment, and I didn't want to care about VBS on Apple. Not a big deal at all, because after a short crawling through Tanles box of marvelous miracles, I grabbed os_explain.
Let's have a look what the process is doing.
After running
pstack 11460 | ./os_explain
some times (5 or 6 where enough) it was clear there is no ordinary execution plan running. We got 2 stacks ( with little changes in the end, but this didn't count). Putting these side by side for easier camparison:
ksupop                  ksupop      
ksudlc ksudlc
ktcrsp ktcrsp
ktursp ktursp
ktubko ktubko
kcbgcur ktundo
kcbzib kqrpre
kcfrbd kqrpre1
ksfdread kslfre
ksfd_odmio
odm_io
ioctl
I highlighted the first diference:
kcbgcur vs. ktundo.
To decipher these names (ok, in this particular case it wasn't too difficult, even for me) we had a look in Note:175982.1 - ORA-600 Lookup Error Categories. kcbgcur will be more or less Kernel Cache Buffer Current (don't know about the g) and ktundo is about undo and rollback management.
This still didn't answer all my questions, but what can a session be doing if it doesn't execute a SQL, (there is no sign for PL/SQL, java or similar currently executed by the session) and there is a lot of work performed in Cache and Undo? The next idea was: Rollback!
v$transaction came into my mind. And a little bit googling led me to this query:


SELECT a.sid, a.username, b.xidusn, b.used_urec, b.used_ublk
FROM v$session a, v$transaction b
WHERE a.saddr = b.ses_addr;

Yes, USED_UREC was decreasing! So the poor session, which worked for mor than 12 hours now, did a rollback.
Some phone calls later, the operations-guy who started the whole script told us 'Yes, about noon there where some hanging situations on my PC, but Toad looks ok again, I'm just waiting for the script to finish'.
Poor guy, and lucky me for enabling Dead connection Detection in sqlnet.ora. Otherwise the whole statement would have run until the end and started the rollback THEN.

This was the last chapter in the odyssey of my little poor session. After the rollback finished, it just closed (as the client connection died before).

Some words at the end?
I assume, the script which caused the problem was never tested properly.
Even with limited knowledge, a more or less systematic aproach is possible.
There is a lot of free information and tools out there. Just try to know about them before you need them.
I should try to learn more about sessions in rollback.
No fancy Oracle-features where needed. No statspack, no AWR (as the DB is 9i - did I mention that?). Not even dtrace.
It was a great day for me, working with a DB again and not only talking about DBs in endless meetings.

Donnerstag, 7. Mai 2009

odyssey of a session - Part 1

Today a single session made my day!
In the later morning I got a call form a colleague. I would call him a kind of ApplicationDBA, even there is no such definition in our company. He asked my, why a particular session is doing a lot of sorts in gv$session_longops. He also tole me, what the session was doing:
this script was started in the early morning by an operations-guy:

drop materialized view MV;

CREATE MATERIALIZED VIEW MV
TABLESPACE MVIEW_DATA
NOCACHE
NOLOGGING
NOPARALLEL
REFRESH FAST WITH ROWID
AS
select * from USER.TABLE@LINK;

CREATE INDEX I1 ON MV
(...)
LOGGING;

CREATE INDEX I2 ON MV
(...);

CREATE INDEX I3 ON MV
(...)
;

CREATE INDEX I4 ON MV
(...)
;

CREATE INDEX I4 ON MV
(...)
;

CREATE INDEX I5 ON MV
(...)
;

CREATE INDEX I6 ON MV
(...)
;

CREATE INDEX I7 ON MV
(...)
;

CREATE INDEX I8 ON MV
(...)
;

CREATE INDEX I9 ON MV
(...)
;

CREATE INDEX I10 ON MV
(...)
;

CREATE INDEX I11 ON MV
(...)
;

CREATE INDEX I12 ON MV
(...)
;

CREATE INDEX I13 ON MV
(...)
;

GRANT SELECT ON MV TO USER1;

--currently running as snapadmin:

exec dbms_snapshot.refresh('USER.MV ','c');
In fact, I did not know what's going on. But that's enough teason to start digging.
And as the session did run for some time now, I was pretty sure it would continue to do so and therefore give me enough time to search and think. (Yes, some Services where unavailable, but he agreed not to kill any session or reboot an instance without knowing what's going on).

My first search was the actual SQL (v$session sql_hash_value => v$sql_text).
It was something like
INSERT /*+ APPEND */ INTO MV AS
SELECT .... FROM
USER.TABLE@LINK;
I told my colleague about this, but he replied 'Yes, but I checked the MV, it's empty'.
Ok, this even I could explain by a simpe 'ASIC' - or in this phone call 'you will see the rows after the commit, and this can last'.
This answer was acceptable, but nevertheless, the legitimate question was 'how long will it run; and wehre is it standing right now?'
How to answer this?
v$session_longops
Does only give informations about the 'current' longop, not about the total transaction.

But as we assumed, there might be indices written right now (vsession_longops showed sort and sort merges), the next step was to check anything we can get about the indices.
Even we could not access them directly, we could check for some footprints:
select seg.owner, seg.segment_name, seg.segment_type, (sum(blocks)*8192)/(1024*1024) MB
from dba_segments seg, dba_indexes ind
where ind.table_owner ='OWNER and ind.table_NAME='MV'
and ind.owner = seg.owner
and ind.index_name = seg.segment_name
group by seg.owner, seg.segment_name, seg.segment_type
order by MB desc
/
By sampling this query we saw some full populated indices, and one just growing. So even for us it was clear what's going on: the indices are calculated and populated.

The next question I had to face was 'why is the index-creation so slow?'
Hmmm. Who can answer this?
manual sampling vsession_wait gave some hints:
some gc... events and latches on the buffer where top.
(Did I mention, this is on a 2-node RAC? ;-) )

It was time to switch to the other node and check for running sessions which has also cache fusion waits, and could interfer with our session of the day.

After some sampling and searching (vsession, vsession_wait, later on vsql_text) and additional clicking in Grid control (yes, it can be useful!) I found a possible culprit:
USER1 run a lot of queries on a view which contains USER.MV.
I didn't proove, but I guess, Instance2 held the master for MV as there where 100 sessions hanging on the View. But I didn't care of - GV$GCSPFMASTER_INFO - I just wanted to get rid of all the cache-fusion waits until the MV is created.
Some might have seen the solution for this yet, it's not hidden.
Our Silver Bullet of The Day was a simple:
REVOKE SELECT ON MV FROM USER1;
(had to be run for several times, as there where DDL-timeouts...))

Afterwards, the index-creation run quite fast - fast enough for us.

It's not the end of the story, but enough for now,
just to leave enough space for Part 2 ;-)

Some infos at the end:
Many parts I can not explain in more detail, simply as I didn't log everything I did and therefore have no exact queries, output or whatever. (Maybe this is the most important area to improve myselve!)
I also used Tanel Poders latchprofx and snapper to collect well sampled data about the session.

New Blogger in Bloggosphere

There is a new Blogger: Mathias Zarick wrote hist first blog entry.
He is a clever guy, with a lot of knowledge and passion and a very systematic aproach.
I'm very glad he shares his findings with us now.

Montag, 2. März 2009

'symbolic link' for Windows ODBC drivers

Maybe you know this situation: (nearly) everyone in your Company has local Admin grants on 'his' (and her) local PC. So they install everything they want, also ODBC-Drivers with any name on any local location.
No problem so far, but if they want to share something with an ODBC connection string, they are doomed. Everyone has another version in another ODBC-location.

What else than to 'fake' any ODBC name to any directory you want?

Here is my solution (I don't know, if it's supported. It just works for me!)

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\<INSERT_ODBC_NAME_HERE>]
"APILevel"="1"
"CPTimeout"="60"
"ConnectFunctions"="YYY"
"Driver"="C:\\PATH\\TO\\YOUR\\DRIVER\\SQORA32.DLL"
"DriverODBCVer"="03.51"
"FileUsage"="0"
"Setup"="C:\\PATH\\TO\\YOUR\\DRIVER\\SQORAS32.DLL"
"SQLLevel"="1"
Just use the Driver and Setup files from an existing ODBC-Source.
Might it work for you also ;-)

Samstag, 28. Februar 2009

please increase my TEMP Tablespace

Today I recieved a call. I was asked to increase a TEMP Tablespace as it was 100% used. I checked the TS and it was correct - 100% full. So I added additional 10% to the tempfile.
So what?
It's only fighting symptoms.
After some minutes I got the same call again.
I suggested the Apps responsible to check the statements or response time. After some minutes he agreed. Because of some new analyses some plans flipped. This caused some more scans and in memory sorts than necessary, which landed in Temp TS.
The solution was obvious: change the objects stats back to 'good' values.
The learnings: once again: don't fight symptoms. It will bring you to no end. Find the initial cause and fix it.

Dienstag, 20. Januar 2009

pipelined function vs. dbms_output

I don't like dbms_output.
So I came to pipelined functions.
here a small example:
create or replace
function dummy
return DBMS_DEBUG_VC2COLL
PIPELINED -- NOTE the pipelined keyword
is
begin
pipe row (to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS' ));
dbms_lock.sleep(15);
pipe row (to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS' ));
return;
end;
/
Make sure you set the arraysize of SQL*PLUS (or whatever) small enough, otherwise you will get a bunch of results at once, not when they occure.
set arraysize 1
select * from table(dummy());
gives
COLUMN_VALUE
--------------------
20-JAN-2009 14:24:17
and after 15 sec.
20-JAN-2009 14:24:32
It might not be a big advantage for anyone, just a thing I like.

Dienstag, 13. Januar 2009

multiple ASM instances on one node (11gR1)

I just searched a way to circumvent the limit of 63 ASM DiskGroups in a storage system.
My first thought was to create a 2nd (or multiple) ASM instances on the same node which manage seperated DGs. This is quite easily possible: Just create a spfile for the 2nd ASM, just make sure to add *.DB_UNIQUE_NAME to avoid ORA-15150. I also recommend to use different ASM_DISKSTRINGs, to avoid multiple mount-attempts of the same DG on all ASM instances.
So I could create different DGs in different ASM-instances.
But now the troubles starts: In my test-RDBMS I could only see the DGs of the first ASM.
So I created a SR(7275580.994) at MetaLink to ask how to do it and wether or not it's supported in single instance or RAC.
To summarize the answers:
  • It is possible to run multiple ASMs on a node
  • One RDBMS can onlybe served by one ASM
  • all the GUIs are not aware of multiple ASMs
  • it's not supported in RAC
Even these answers are worth some forther testcases.
I have 2 ASMs: +ASM (default) and +ASM2 (2nd, with changed DB_UNIQUE_NAME and ASM_DISKSTRING) and one RDBMS. DG ASM1_DG created in +ASM, DG ASM2_DG created in +ASM2.

  1. Test 1
    +ASM up
    +ASM2 up
    => only ASM1_DG1 visible in RDBMS .
  2. Test 2
    +ASM down
    +ASM2 up
    => ASM2_DG1 visible
    ! create tablespace in ASM2_DG1
  3. Test 3
    +ASM up
    +ASM2 up
    - restart RDBMS
    => only ASM2_DG1 visible in RDBMS
  4. Test 4
    +ASM up
    +ASM2 up
    => DBCA only shows ASM1_DG1: 
  5.  Test 5
    • shutdown +ASM
    • switch ASM1_DG to +ASM2 (some fiddling with ASM_DISKSTRING and symlinks)
    • mount ASM1_DG in +ASM2
    • create TS in ASM1_DG
    • shutdown RDBMS and +ASM2
    • switch ASM1_DG back to +ASM
    • startup +ASM, +ASM2 and rdbms
    • this constelation (ASM1_DG mounted on +ASM, ASM2_DG mounted on +ASM2) led to this error-message:

      ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
      ORA-01110: data file 8: '+ASM1_DG/berx2/datafile/asm1_dg.256.676065339'
      ORA-17503: ksfdopn:2 Failed to open file +ASM1_DG/berx2/datafile/asm1_dg.256.67
      6065339
      ORA-15001: diskgroup "ASM1_DG" does not exist or is not mounted
      ORA-15001: diskgroup "ASM1_DG" does not exist or is not mounted
      
Test5 led me to one guess: At startup a RDBMS can potentially access all ASMs, but the first attempt to access a DG pins the RDBMS to the ASM. Unfortunately I do not know how to prove this.
Within the SR Bug:6991236 was mentioned.
This Bug is not visible right now, but interpreting the header I can guess it's an enhancement request to allow or support multiple ASM-instances with differen OS-users in a cluster-environment. It's shown to be fixed (implemented) in 11gR2 and has status 98 (Suggestion Implemented). We will see when 11gR2 is out.

Freitag, 9. Januar 2009

avoid ORA-1555 in long running queries

11.1.0.7 brought an interresting new Feature: Allow ORA-1555 to be ignored during table scan.
From the description for Bug 6688108:

ORA-1555 can occur during long running queries involving a ROWID Range or full table scan if the undo has recycled.
This error is normal and expected.

This fix is a special enhancement to allow an SCN_ASCENDING hint to be supplied in the SQL statement. With this hint the select will take a new snapshot SCN and continue the scan rather than reporting an ORA-1555 to the caller. This can lead to inconsistent results as the SELECT is allowed to see data which is not all consistent to one point in time.

It is somewhat amazing Oracle allows inconsistent results, or it's the 'other end' of READ UNCOMMITTED isolation Level.

I'm not sure where this feature makes any sense, but there will be environments where it's better to get false data than no data.

Donnerstag, 8. Januar 2009

buffer cache poisioning

Out of some tests and discussions I wanted to create a way to pollute the buffercache with blocks not existing there. As a result the BCHR also goes down, so I might produce a contradiction to Connor McDonalds choose_a_hit_ratio. There might be easier ways doing so, feel free to promote, if you want.

How i test the BCHR:
column name format A32;
SELECT id, name, block_size,
 (1-(PHYSICAL_READS/(DB_BLOCK_GETS+CONSISTENT_GETS))) HIT
FROM V$BUFFER_POOL_STATISTICS
ORDER BY NAME, BLOCK_SIZE;
On my test system the script is really slow and I'm not sure about side effects it will have on other systems than the buffer cache.

and my results (before the first run):
ID NAME       BLOCK_SIZE        HIT
-- ---------- ---------- ----------
3 DEFAULT          8192 .920069518
1 KEEP             8192 .158558559
The scripts functionality explained:
  1. a Cursor c_tab provides all Tables of some quality (e.g. not owned by SYS)
  2. for every entry in this cursor, all rowids are collected into a TABLE OF ROWID
  3. every rowid is checked if it's block exists in x$bh, if not, it's directly selected and therefore bushed into bufer cache.
and here the script itselve (a big part just copied with permission from Reinhard Krämer)
DECLARE
counter  number;
obj_id   number;
rel_fno  number;
block_id number;
bh_count number;
bh_stmt  varchar2(200);
rid_stmt varchar2(200);
v_stt    varchar2(50);
v_rid    ROWID;
CURSOR c_tab
IS
  SELECT owner,
         table_name,
            owner
         || '.'
         || table_name owntab
  FROM DBA_tables
  WHERE owner NOT IN ('SYS')
         -- comment next line, if you want to have all tables
         --AND table_name = 'T_KEEP'
         AND BUFFER_POOL NOT IN ('KEEP')
  --       AND owner = UPPER ('TEST')
  ;

g_string VARCHAR2 (4000) := NULL;
-- memory-array
TYPE t_rowid IS TABLE OF ROWID
  INDEX BY PLS_INTEGER;

r_rowid t_rowid;

-- print-procedure (mini-version of MSG)
PROCEDURE PRINT (
  p_text IN VARCHAR2 := NULL
)
IS
BEGIN
-- remove comment for debug
--      DBMS_OUTPUT.put_line (p_text);
 NULL;
END;
BEGIN
-- main
EXECUTE IMMEDIATE 'select b.ksppstvl InstValue
                 from x$ksppi a, x$ksppcv b
                 where a.indx = b.indx
                   and a.ksppinm=''_small_table_threshold''' into v_stt;
EXECUTE IMMEDIATE 'alter session set "_small_table_threshold" = 10';
counter := 0;
bh_stmt := 'select count(*)
          from x$bh
          where obj=:obj_id and dbablk=:block_id and dbarfil=:rel_fno';

FOR crec IN c_tab
LOOP
 PRINT(' tab: ' ||crec.owntab);
  g_string :=    'select rowid the_rowid from '
              || crec.owntab;

  BEGIN
     PRINT (   'using command: '
            || g_string);

     EXECUTE IMMEDIATE (g_string)
     BULK COLLECT INTO r_rowid;
  EXCEPTION
     WHEN OTHERS
     THEN
        PRINT (   SQLERRM
               || ' on '
               || g_string);
  END;

  IF r_rowid.COUNT > 0
  THEN
     PRINT (   'showing rowids for: '
            || crec.owntab);

     FOR i IN r_rowid.FIRST .. r_rowid.LAST
     LOOP
        PRINT (r_rowid (i));
        -- load block if not in X$bh
   
        obj_id   := dbms_rowid.rowid_object(r_rowid(i));
        rel_fno  := dbms_rowid.rowid_relative_fno(r_rowid(i));
        block_id := dbms_rowid.rowid_block_number(r_rowid(i));
        rid_stmt := 'SELECT rowid the_rowid FROM '
              || crec.owntab || ' WHERE rowid = :rid';
        execute immediate bh_stmt into bh_count using obj_id, block_id, rel_fno;
   
        IF bh_count = 0 THEN -- block curently not in buffer
          BEGIN
            execute immediate rid_stmt into v_rid using r_rowid(i);
            counter := counter + 1;
          END;
        END IF;
     END LOOP;
  END IF;
END LOOP;
EXECUTE IMMEDIATE 'alter session set "_small_table_threshold" = :stt' using v_stt;
DBMS_OUTPUT.put_line ('counter: ' || counter);
END;
/


After a 2nd check the BCHR is really lower than before.

ID NAME       BLOCK_SIZE        HIT
-- ---------- ---------- ----------
3 DEFAULT          8192 .856567413
1 KEEP             8192 .158558559


One question might be why I didn't run just a lot of full table scans? I can not be sure how many blocks of this tables exist in buffer cache, and every existing buffer would increse the BCHR. Also in Oracle 11g this might result in direct path reads which will not affect the buffer cache directly (in fact, it will increase the BCHR, as all internal selects in SEG$, TS$ and all the other data dictionary tables needed to access blocks).
One last purpose: If ever anyone argues about the need of ratios to judge the health of a database, just ask if you get additional ressources if the BCHR is above/below any limit - and then grab these resources :-)

Mittwoch, 7. Januar 2009

Instance parameters derived from cpu_count

Today I have a lot of pictures!
We got a brand new SUN T5140, and I managed to get my hands on it to run some tests.

Glenn Fawcett wrote some interresting entries in his blog about these so called chip multithreading CPUs. 2 should be mentioned here, as they had influence to this test-case:
Oracle's vision of multi-core processors on Solaris and Virtual CPUs effect on Oracle SGA allocations.
I wanted to know which parameters are derived from cpu_count.
A fresh installed and patched RDBMS 11.1.0.7.0 was my victim.
I used this script to get my data:
#!/usr/bin/ksh  -x

integer i=1
while ((i <= 128));
do
  print " $i ";

  echo "*.compatible='11.1.0.0.0'
*.control_files='/appl/oracle/oradata/BERX2/control01.ctl','/appl/oracle/oradata/BERX2/control02.ctl','/appl/oracle/oradata/BERX2/control03
.ctl'
*.db_block_size=8192
*.db_name='BERX2'
*.MEMORY_MAX_TARGET=16G
*.SGA_TARGET=13G
*.cpu_count=$i"> initBERX2.ora

  echo "startup pfile='/appl/oracle/product/11.1.0.7/dbs/initBERX2.ora' ;

insert into init_parameters
select $i         ,
       a.ksppinm  ,
       b.ksppstvl ,
       c.ksppstvl
from x\$ksppi a, x\$ksppcv b, x\$ksppsv c
where a.indx = b.indx and a.indx = c.indx;

commit;

select distinct(\"CPUs\") from init_parameters;

shutdown;

exit;"> cpus/doit$i.sql

sqlplus "/ as sysdba" @cpus/doit$i.sql >cpus/log$i.log

sleep 5

  (( i = i + 1));
done

Afterwards I just had to get the interresting data out:
select /*+ PARALLEL */ i."CPUs" || ',' || i."Parameter" || ',' || i."InstanceValue"  
from init_parameters i  
where "Parameter" in (select distinct(i1."Parameter")
    from init_parameters i1,
       init_parameters i2
    where i1."Parameter" = i2."Parameter"
       and i1."CPUs" <> i2."CPUs"
      and i1."InstanceValue" <> i2."InstanceValue" )  
order by i."CPUs", i."Parameter";

and waste my time in excel creating the graphs:
(the whole spool output can be found here.)
I hope everyone enjoys these graphs. I'm sure there can be done much more tests, so if you have an interresting testcase which can be automated such as mine, feel free to contact me.

Montag, 5. Januar 2009

KEEP and DEFAULT buffer really seperated (at least for LRU lists)

I had to answer a really simple question:
If I manage to put a table totally into KEEP buffer (and it's the only table which will ever go there), will it be affected by other operations which are going into DEFAULT buffer cache?
At least if you disable Automated Memory Management the answer is yes (even if some points has to be considered, but that's not part of the primary question).
Here my little testcase:
I have a table t_keep which should go and stay in keep buffer cache (and fits into it).
I have 2 other tables, t and t2, where t is mich bigger than the default pool, t2 also fits into default-pool but together they kick each other out (at least partially).
And now for the tests:
(I had to make _small_table_threshold big enough to load all tables in buffer cache and avoiding direct path reads)

about the tables:
select table_name, buffer_pool, blocks from dba_tables where owner ='TEST';TABLE_NAME                     BUFFER_     BLOCKS
------------------------------ ------- ----------
T2                             DEFAULT        401
T_KEEP                         KEEP           496
T                              DEFAULT      26289

select OBJECT_NAME, DATA_OBJECT_ID 
from dba_objects
 where owner ='TEST';
OBJ_NAME DATA_OBJECT_ID
-------- --------------
T2                64351
T_KEEP            64143
T                 64142
the test itselve:
SELECT ROWID FROM TEST.T_KEEP;

... ROWIDs ...

select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj;

OBJ   COUNT(*)
---------- ----------
64143        452

SELECT ROWID FROM TEST.T;

... ROWIDs ...

select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj;

OBJ   COUNT(*)
---------- ----------
64143        452
64142       1080

select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj; 

OBJ   COUNT(*)
---------- ----------
64351        386
64143        452
64142          1 
So ObjectID:64143 == T_KEEP is still in KEEP buffer cache.

Now the same test with Automated Memory Management:
(only the results)
select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj;
       OBJ   COUNT(*)
---------- ----------
     64143        452

select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj;
       OBJ   COUNT(*)
---------- ----------
     64143        452
     64142      20111

select obj, count(*) from x$bh where obj in (64142, 64143,64351) group by obj;
       OBJ   COUNT(*)
---------- ----------
     64351        386
     64143        452
     64142      19725
So also in this case 64143 survived, whereas the other 2 were fighting for the same blocks in DEFAULT buffer cache.