Donnerstag, 3. Januar 2019

connection manager - what happens at startup & shutdown

Oracle Connection Manager (cman) is a great tool to create a gateway between networks which can not be connected, or filter those who can access to a service.
Most of the time, it's a very robust implementation and doesn't need a lot of attention. But if it's required to dig into it, a basic knowledge about it's components can be useful.

In this post I'll show what happens in a simple startup (& shutdown) of a mostly "default" cman.
The test was done on Oracle Linux 7.5 with cman


In my setup, processes are called this order.
at startup, cmctl spawns several processes. these processes communicate via sockets and has some shared memory segments

The details: 

The results in this post show some lines of interest cenerated by
strace -f -o <some_dir&gt>/cman_$$.strace -y -s 128 cman
and the 3 commands
administer cman_berx

Before each quote of the tracefile I write the line number - this is not an exact offset for your tests, just shows the sequence and sometimes big gaps which I do not show. I will name PIDs and executes together for better readability.

10115 execve("<ORACLE_HOME>/bin/cmctl", ["cmctl"], [/* 118 vars */]) = 0
process 10115 is now cmctl.

10115 open("<ORACLE_HOME>/env.ora", O_RDONLY) = 3
10115 read(3, "# Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.\n# \n# NAME\n#   env.ora\n#\n# FUNCTION\n#   Oracle ENVironme"..., 4096) = 852
The undocumented file <ORACLE_HOME>/env.ora is checked for TNSNAMES_ORA parameter there - my file is default, so nothing is set there.

10115 access("<TNS_ADMIN>/cman.ora", F_OK) = -1 ENOENT (No such file or directory)
10115 access("/etc/cman.ora", F_OK)     = -1 ENOENT (No such file or directory)
10115 access("<ORACLE_HOME>/network/admin/cman.ora", F_OK) = 0
10115 open("<ORACLE_HOME>/network/admin/cman.ora", O_RDONLY) = 3
10115 read(3<<ORACLE_HOME>/network/admin/cman.ora>, "######################################################################\n#\n# Copyright (c) 2001,2002, Oracle Corporation. All righ"..., 4096) = 3996 
10115 close(3<<ORACLE_HOME>/network/admin/cman.ora>) = 0 
10115 (cman) searches for cman.ora in <TNS_ADMIN> (nothing there) /etc (also nothing) and <ORACLE_HOME>/network/admin (there is an entry for cman_berx).

10115 write(1</dev/pts/0>, "CMCTL> ", 7) = 7
10115 read(0</dev/pts/0>, "administer cman_berx\n", 1024) = 21
10115 (cmctl) writes a prompt (CMCTL>) and gets my command line command (administer cman_berx).

10115 connect(4<socket:[1549779484]>, {sa_family=AF_INET, sin_port=htons(1522), sin_addr=inet_addr("x.y.z.78")}, 16) = -1 EINPROGRESS (Operation now in progress)
10115 (cmctl) tries to connect to the address:port provided in cman.ora - and can't connect. I assume that's the moment it realises it must start all related processes.

10115 geteuid()                         = 5001
10115 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b8713915150) = 10286
10115 rt_sigprocmask(SIG_BLOCK, [PIPE], NULL, 8) = 0
10286 set_robust_list(0x2b8713915160, 24 <unfinished ...>
10115 rt_sigaction(SIGPIPE, {0x2b8717669de0, ~[ILL TRAP ABRT BUS FPE SEGV USR2 TERM XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2b8718e9f680},  <unfinished ...>
10286 <... set_robust_list resumed> )   = 0
10115 <... rt_sigaction resumed> {SIG_DFL, [], 0}, 8) = 0
10115 rt_sigprocmask(SIG_UNBLOCK, [PIPE], NULL, 8) = 0
10286 clone( <unfinished ...>
10115 wait4(10286,  <unfinished ...>
10286 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b8713915150) = 10287
10287 set_robust_list(0x2b8713915160, 24) = 0
10287 close(6<pipe:[1549773687]>)       = 0
10287 close(7<pipe:[1549773688]> <unfinished ...>
10286 exit_group(0)                     = ?
10287 <... close resumed> )             = 0
10287 setsid( <unfinished ...>
10286 +++ exited with 0 +++
10115 (cmctl) creates (clone) a new process (10286) - this process immediately clones another one (10287) and then terminates itself (exit_group). By this there is no parent-child relationship between 10115 (cmctl) and 10287. Still they both use the same binary (cmctl) and filehandles etc. - so 10287 closes some filehandles it doesn't require.

10287 execve("/appl/oracle/product/cman_12201/bin/cmadmin", ["/appl/oracle/product/cman_12201/bin/cmadmin", "cman_berx", "-inherit"], [/* 119 vars */]) = 0
process 10287 is now cmadmin.

10287 access("/var/tmp/.oracle", F_OK)  = 0
10287 chmod("/var/tmp/.oracle", 01777)  = -1 EPERM (Operation not permitted)
10287 socket(AF_LOCAL, SOCK_STREAM, 0)  = 6<socket:[1549770245]>
10287 access("/var/tmp/.oracle_500100", F_OK) = -1 ENOENT (No such file or directory)
10287 mkdir("/var/tmp/.oracle_500100", 0700) = 0
10287 chmod("/var/tmp/.oracle_500100", 0700) = 0
10287 access("/var/tmp/.oracle_500100/s#10287.1", F_OK) = -1 ENOENT (No such file or directory)
10287 bind(6<socket:[1549770245]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10287.1"}, 110) = 0
10287 chmod("/var/tmp/.oracle_500100/s#10287.1", 0777) = 0
10287 listen(6<socket:[1549770245]>, 100) = 0
10287 getsockname(6<socket:[1549770245]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10287.1"}, [36]) = 0
10287 chmod("/var/tmp/.oracle_500100/s#10287.1", 0777) = 0 
10287 (cmadmin) checks if /var/tmp/.oracle exists and if it can change it's permission. In my environment the directory exists (as Grid Infrastructure is installed) but it's permissions can't be changed. It belongs to root.  Then /var/tmp/.oracle_500100 is created and a socket is created there. For easier readability, the PID is part of the socket name.

10287 clone(child_stack=0x2b6fd20de570, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2b6fd20e99d0, tls=0x2b6fd20e9700, child_tidptr=0x2b6fd20e99d0) = 10289
10287 (cmadmin) creates another process (10289).

10287 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10290
10287 (cmadmin) creates another process (10290).

10290 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10291
10290 exit_group(0)                     = ?
10290 +++ exited with 0 +++
10290 creates 10291 and terminates itself.

10291 execve("/appl/oracle/product/cman_12201/bin/tnslsnr", ["/appl/oracle/product/cman_12201/bin/tnslsnr", "ifile=/appl/oracle/product/cman_12201/network/admin/cman.ora", "cman_berx", "-inherit", "-mode", "proxy"], [/* 119 vars */]) = 0
10291 becomes tnslsnr

10291 bind(11<socket:[1549780776]>, {sa_family=AF_INET, sin_port=htons(1522), sin_addr=inet_addr("")}, 16) = 0
10291 listen(11<socket:[1549780776]>, 128) = 0
10291 getsockname(11<socket:[1549780776]>, {sa_family=AF_INET, sin_port=htons(1522), sin_addr=inet_addr("")}, [16]) = 0
10291 (tnslsnr) listens on - But in cman.ora a specific IP address is given.

10291 write(12<<DIAG_DEST>/netcman/<hostname>/cman_berx/trace/cman_berx.log>, "Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<server.domain>)(PORT=1522)))\n", 91) = 91
10291 (tnslsnr) writes something different to the logfile (?)

10291 access("/var/tmp/.oracle", F_OK)  = 0
10291 chmod("/var/tmp/.oracle", 01777)  = -1 EPERM (Operation not permitted)
10291 socket(AF_LOCAL, SOCK_STREAM, 0)  = 14<socket:[1549790968]>
10291 access("/var/tmp/.oracle/s#10291.1", F_OK) = -1 ENOENT (No such file or directory)
10291 bind(14<socket:[1549790968]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/s#10291.1"}, 110) = 0
10291 chmod("/var/tmp/.oracle/s#10291.1", 0777) = 0
10291 listen(14<socket:[1549790968]>, 1) = 0
10291 getsockname(14<socket:[1549790968]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/s#10291.1"}, [29]) = 0
10291 chmod("/var/tmp/.oracle/s#10291.1", 0777) = 0
10291 (tnslsnr) creates its own socket file for communication purpose.

10291 connect(16<socket:[1549778909]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/sprocr_local_conn_0_PROL"}, 110) = 0

10291 connect(15<socket:[1549778910]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle/sCRSD_UI_SOCKET"}, 110) = 0
10291 (tnslsnr) checks some sockets which belong to Grid Infrastructure.

10291 clone(child_stack=0x2ac6b580b570, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2ac6b58169d0, tls=0x2ac6b5816700, child_tidptr=0x2ac6b58169d0) = 10292
10291 (tnslsnr) creates PID 10292

10292 clone( <unfinished ...>
10291 <... close resumed> )             = 0
10292 <... clone resumed> child_stack=0x2ac6b584c570, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2ac6b58579d0, tls=0x2ac6b5857700, child_tidptr=0x2ac6b58579d0) = 10293
10293 set_robust_list(0x2ac6b58579e0, 24 <unfinished ...>
10292 creates PID 10293

10293 exit(0)                           = ?
10292 <... poll resumed> )              = 1 ([{fd=15, revents=POLLIN}])
10293 +++ exited with 0 +++
10293 exits - it did not "clone" or "execve" anything

10287 shmget(0x59248000, 2462280, IPC_CREAT|IPC_EXCL|0600 <unfinished ...>
10287 (cmadmin) allocates a shared memory segment with key 0x59248000.

10287 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10294
10294 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10295
10294 exit_group(0)                     = ?
10294 +++ exited with 0 +++
10287 (cmadmin) creates PID 10294, this creates PID 10295 and terminates itself.

10295 execve("/appl/oracle/product/cman_12201/bin/cmgw", ["/appl/oracle/product/cman_12201/bin/cmgw", "cmgw0", "0", "16", "cman_berx", "SNLSM:59248000"], [/* 119 vars */]) = 0
10295 becomes cmgw0 - you can see the parameter SNLSM - it gives the shared memory key from 10287 (cmadmin).

10295 shmget(0x59248000, 1, 0)          = 2099576860
10295 (cmgw0) attaches to shared memory 10287 (cmadmin).

10287 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10296
10296 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b6fd1f80150) = 10297
10296 exit_group(0)                     = ?
10287 (cmadmin) creates PID 10296 which creates 10297 and exits. - same as above

10297 execve("/appl/oracle/product/cman_12201/bin/cmgw", ["/appl/oracle/product/cman_12201/bin/cmgw", "cmgw1", "1", "16", "cman_berx", "SNLSM:59248000"], [/* 119 vars */] <unfinished ...>
10297 becomes cmgw1.

some (later) important sockets:
10295 connect(5<socket:[1549804554]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10287.1"}, 110 <unfinished ...>

10297 connect(5<socket:[1549798977]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10287.1"}, 110 <unfinished ...>

10295 bind(7<socket:[1549804555]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10295.1"}, 110 <unfinished ...>

10297 bind(7<socket:[1549798978]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10297.1"}, 110 <unfinished ...>

10115 connect(5<socket:[1549779550]>, {sa_family=AF_LOCAL, sun_path="/var/tmp/.oracle_500100/s#10287.1"}, 110 <unfinished ...> 

At this stage, cman is started, all processes are in memory and happy to server any requests.

The shutdown- command will be much faster:

10115 <... read resumed> "shutdown\n", 1024) = 9
10115 (cmctl) recieves "shutdown" from command line

10115 write(5, "\0\0\0S\6$\0\0\0\0E\0\0\200\2\0\0\0\0\4\0\0\220_8\2\0\0\0\0\20\0#\0\0\000901A4C4F9FD8-3939-E034-0800208AB384\1\0v\0\0\0\3\0\0\0\0\0", 83) = 83
101115 (cmctl) informs 10287 (cmadmin)

10287 write(5&lt;socket:[1549778933]&gt;, "\0\0\0S\6 \0\0\0\0E\0\0\200\2\0\0\0\0\4\0\0p\344t\2\0\0\0\0\20\0#\0\0\00091986B2B10AC-3001-E034-0800208AB384\1\0l\0\0\0\1\0\0\0\0\0", 83) = 83
10287 times( &lt;unfinished ...&gt;
10295 &lt;... epoll_wait resumed&gt; [{EPOLLIN, {u32=13989488, u64=13989488}}], 1024, -1) = 1
10287 &lt;... times resumed&gt; NULL)         = 1269637186
10287 epoll_wait(7&lt;anon_inode:[eventpoll]&gt;,  &lt;unfinished ...&gt;
10295 read(5&lt;socket:[1549804554]&gt;, "\0\0\0S\6 \0\0\0\0E\0\0\200\2\0\0\0\0\4\0\0p\344t\2\0\0\0\0\20\0#\0\0\00091986B2B10AC-3001-E034-0800208AB384\1\0l\0\0\0\1\0\0\0\0\0", 8208) = 83
10287 (cmadmin) informs 10295 (cmgw0) (and later 10297 (cmgw1) by the same method) to shutdown

some cleanup takes place:

10295 unlink("/var/tmp/.oracle_500100/s#10295.1" <unfinished ...>

10295 rmdir("/var/tmp/.oracle_500100" <unfinished ...>

10297 unlink("/var/tmp/.oracle_500100/s#10297.1" <unfinished ...>

and all the processes terminate:

10289 exit(0)                           = ?
10289 +++ exited with 0 +++

10295 exit_group(0)                     = ?

10297 exit_group(0)                     = ?

10291 unlink("/var/tmp/.oracle/s#10291.1" <unfinished ...>

10291 exit_group(0)                     = ?

10287 exit_group(0)                     = ?

10115 exit_group(0)                     = ?

That's a simple startup and shutdown of connection manager - nothing fancy here, just a lot of processes and sockets which are created - and then deleted.

I hope you never need to read this article to it's end. But if you need to, it should provide some value for you!

Donnerstag, 18. Oktober 2018

preupgrade.jar - enforced recommendations

Recently I upgraded a database from version 12.1 to 12.2.
I take upgrades serious - similar (but not as skilled) to Mike Dietrich. So I downloaded latest and unzipped it [I made a small error by unzipping it NOT to $ORACLE_HOME/rdbms/admin but to a temporary directory - later more about this].
Beside such errors, I try to follow Oracles recommended Upgrade Process.

At some point Oracle recommends to Gather Dictionary Stats "the night before starting the upgrade".

For me this is a recommendation, not an obligation. I had (from my perspective) very good reasons NOT to follow this recommendation.

Still, during the upgrade, the preupgrade.log shows
+ (AUTOFIXUP) Gather stale data dictionary statistics prior to database
     upgrade in off-peak time using:


     Dictionary statistics do not exist or are stale (not up-to-date).

     Dictionary statistics help the Oracle optimizer find efficient SQL
     execution plans and are essential for proper upgrade timing. Oracle
     recommends gathering dictionary statistics in the last 24 hours before
     database upgrade.

     For information on managing optimizer statistics, refer to the
     Oracle Database SQL Tuning Guide.

If they say AUTOFIXUP - treat this as a promise (or treat).

After preupgrade.jar, it's to run preupgrade_fixups.sql.
This calls (among others)
fixup_result := dbms_preup.run_fixup('dictionary_stats');
If you go through the package and other functionality, you will find function dictionary_stats_check in preupgrade_package.sql. And there it runs
     ' select 1 from dual where exists(
              select distinct operation
                from DBA_OPTSTAT_OPERATIONS
                        where operation =''gather_dictionary_stats''
                        and start_time > systimestamp -  INTERVAL ''24''  HOUR) '
      INTO dictionary_stats_recent;

and if no dictionary stats were conpleted recently, this function is executed:

-- *****************************************************************
--     This fixup executes dictionary stats pre upgrade
-- *****************************************************************
FUNCTION dictionary_stats_fixup          (
         result_txt IN OUT VARCHAR2,
         pSqlcode    IN OUT NUMBER) RETURN NUMBER
    stats_result BOOLEAN;
    sys_string varchar2(5):='SYS';
   stats_result := run_int_proc('DBMS_STATS.GATHER_DICTIONARY_STATS', result_txt, pSqlcode);

   IF (stats_result) THEN
       RETURN c_success;
       RETURN c_failure;
   END IF;
END dictionary_stats_fixup;

That's the moment where I recognised how serious Oracle takes gathering of dictionary stats.

A little rant on twitter about this aggressive recommendation gave me a nice reply where the recommendation is stated:


At least I know where to change the code - but it must be done before preupgrade.jar is run!

But there is a 2nd situation when DBMS_STATS.GATHER_DICTIONARY_STATS is run:
After the upgrade itself, postupgrade_fixups.sql must run.
Here the side note to my initial statement why MUST be unzipped to $ORACLE_HOME/rdbms/admin:
VARIABLE admin_preupgrade_dir VARCHAR2(512);

REM    point PREUPGRADE_DIR to OH/rdbms/admin
    oh VARCHAR2(4000);
    dbms_system.get_env('ORACLE_HOME', oh);
    :admin_preupgrade_dir := dbms_assert.enquote_literal(oh || '/rdbms/admin');

    command varchar2(4000);
    command := 'CREATE OR REPLACE DIRECTORY PREUPGRADE_DIR AS ' || :admin_preupgrade_dir;


REM    Execute the preupgrade_package from the PREUPGRADE_DIR
REM    This is needed because the file
REM    lives there too, and is read by preupgrade_package.sql using
COLUMN directory_path NEW_VALUE admin_preupgrade_dir NOPRINT;
select directory_path from dba_directories where directory_name='PREUPGRADE_DIR';
set concat '.';
That's the reason why unzipping it ton another directory, after the upgrade an older version of preupgrade scripts is used - maybe not a desired thing!

Later there another fixup is called:
fixup_result := dbms_preup.run_fixup('post_dictionary');
This in that case that function is run:

-- *****************************************************************
--     This check recommends re-gathering dictionary stats post upgrade
--     The logic in the query is:  Check if statistics has been taken
--     after upgrade, if not report it and generate the fixup in the
--     postupgrade fixup script, after the fixup run, it will not fail
--     and therefore it will report this check as successfull.
-- *****************************************************************
FUNCTION post_dictionary_check          (result_txt OUT VARCHAR2) RETURN NUMBER
dictionary_stats_recent  NUMBER := 0;
correct_version boolean := TRUE;

  IF dbms_registry_extended.compare_versions(db_version_4_dots, C_ORACLE_HIGH_VERSION_4_DOTS, 4) < 0 THEN
     correct_version := FALSE;

     ' select 1 from dual where exists(
              select distinct operation
                from DBA_OPTSTAT_OPERATIONS
                        where operation =''gather_dictionary_stats''
                        and start_time > (select max(OPTIME) from registry$log where cid =''UPGRD_END'')) '
      INTO dictionary_stats_recent;

After all this, I STILL recommend to use LATEST preupgrade.jar,
unzip it to proper $ORACLE_HOME/rdbms/admin,
execute it as documented
and don't mess around with it!
Don't mess around with it unless Oracle Support says so (or you take all the unforseen side effects on your credits).

Donnerstag, 11. Oktober 2018

SQL Real Time Monitoring pure HTML report (thanks to SQLDeveloper 18.3)

Yesterday (from the writing of this post perspective) SQLDeveloper 18.3 came out.
(it seems SQL Developer does not strict follow Oracle Database Release Number schema, otherwise it must be 18.4 already as it's released in 4th quarter of 2018)

There are many bugs fixed on 18.3 and some nice enhancements there as well.

My favorite enhancement is Real Time SQL Monitoring, HTML exports now available - no flash!

As Tanel Poder wrote about it already (Generate Oracle SQL Monitoring Reports as HTML using SQL Developer v18.3 (no Flash needed)) I'll show how you can use it without SQL Developer.
I'm not against SQL Developer, in fact I'm very happy with it!
Just the Real Time SQL Monitor tab seems kind of unresponsive to me, especially on DBs with many active sessions and high load.

So if I don't want to watch a java process drawing circles and showing blur previews, I first go to query v$sql_monitor. Beside all the columns I'll need to identify my SQL of interest, I need the columns SQL_ID, SQL_EXEC_ID, and SQL_EXEC_START.
And, of course at least once you must create a new HTML Report with SQL Developer! Save this report as a template for later use.

When you open the HTML report in an editor, there are 2 lines of special interest:
 var data_sqlId = '3v64dcg0rja6k'; 
 var data_xml = '<report db_version=">"

You can replace them easily with the context of your SQL of interest.
To get the proper XML you can use DBMS_SQLTUNE.report_sql_monitor. But as this function provides a multiline XML, but the HTML expects the XML in one line, the call should be
SELECT XMLSerialize( DOCUMENT xmltype(DBMS_SQLTUNE.report_sql_monitor(
  sql_id         => '3v64dcg0rja6k',
  type           => 'XML',
  SQL_EXEC_ID    => 33554648, 
  SQL_EXEC_START => to_date('2018-10-10 05:36:17'),
  report_level   => 'ALL')) NO INDENT ) AS XML_report
FROM dual;
If you omit SQL_EXEC_ID or SQL_EXEC_START, the report will still compile but you should understand the difference in data you get displayed.

So the minimal count you create a new HTML-Only SQL Real Time Monitoring report with SQL Developer should be once. But if you read this ( and your DBs are properly licensed), spread the good news with all the people who might benefit from these reports!

Please keep in mind that the HTML file is loading CSS and JS files from and URL at Oracles Content Delivery platform - so they are subject to any change without a new release of SQL Developer, RDBMS or anything. So expect unexpected changes every time!

Montag, 1. Oktober 2018

seing your DB as it was some minutes ago

Last week during a discussion with a colleague we thought if it would make sense to have in SQLDeveloper the possibility to see the system "as it was some minutes ago".
Small errors can happen and also resource control isn't always perfect in every company.

Björn Rost suggested to use DBMS_FLASHBACK.ENABLE_AT_TIME:

A very clever suggestion!

Let's first look at the documentation:


Using DBMS_FLASHBACK, you can flash back to a version of the database at a specified time or a specified system change number (SCN).

and in more detail:


DBMS_FLASHBACK provides an interface for the user to view the database at a particular time in the past, with the additional capacity provided by transaction back out features that allow for selective removal of the effects of individual transactions. This is different from a flashback database which moves the database back in time.
When DBMS_FLASHBACK is enabled, the user session uses the Flashback version of the database, and applications can execute against the Flashback version of the database.
DBMS_FLASHBACK is relevant only for the session in which it's called, so it doesn't change the database.
But it's unclear what's meant by database in this context.

Here a small testcase which shows some unexpected results of DBMS_FLASHBACK:
(I slightly edite the text for better readability)

drop table x1;
create table x1 (u number);

insert into x1 (u) values (1);

    i NUMBER
-- Version 1
    IF ( i = 1 ) THEN
        RETURN 'red';
        RETURN 'blue';
    END IF;
END return_something;

select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') now, u, return_something(u) rrr,
     dbms_flashback.get_system_change_number SCN
from x1;

NOW                          U RRR          SCN
------------------- ---------- ----- ----------
2018-10-01 19:37:56          1 red     24042787

exec sys.DBMS_SESSION.sleep(60);
There should be nothing fancy up to this time.
The table and function are created and the select works fine.
I need the DBMS_SESSION.sleep to copy/paste the proper timestamp into the next block of code:

update x1 set u = 2 where u = 1;

1 row updated.

    i NUMBER
-- Version 2
    IF ( i = 1 ) THEN
        RETURN 'oans';
        RETURN 'zwoa';
    END IF;
END return_something;

select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') now, u, return_something(u) rrr
from x1;

NOW                          U RRR  
------------------- ---------- -----
2018-10-01 19:39:32          2 zwoa 

Still nothing spectacular here - just the preparation done.
Now onto the interesting part:

EXEC dbms_flashback.enable_at_time(to_timestamp('2018-10-01 19:38:00','YYYY-MM-DD HH24:MI:SS'));

select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') now, u, return_something(u) rrr
from x1;

NOW                          U RRR  
------------------- ---------- -----
2018-10-01 19:39:33          1 oans 

col text for A50
select text
from dba_source
where owner='BERX'
order by LINE asc;

    i NUMBER
-- Version 1
    IF ( i = 1 ) THEN
        RETURN 'red';
        RETURN 'blue';
    END IF;
END return_something;

11 rows selected. 

Here we can see how DBMS_FLASHBACK.ENABLE_AT_TIME is set to a time between the 1st insert & create function and the 2nd block.
With this setting, the content of table X1 is as expected. Also DBA_SOURCE shows the code of RETURN_SOMETHING.
But the function itself is not changed in memory and works as of it's state NOW, not at the given flashback time.
The flashback time version of RETURN_SOMETHING is even visible when you open it in SQLDeveloper (you have to believe me or test it).
Don't forget to clean up after the tests:

exec dbms_flashback.disable;

select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS') now, u, return_something(u) rrr
from x1;

NOW                          U RRR  
------------------- ---------- -----
2018-10-01 19:39:33          2 zwoa 

For this testcase no COMMIT was used. ;-)

Dienstag, 4. September 2018

TFA reports full dba_feature_usage_statistics to MyOracleSupport

Those who deal with MyOracleSupport (MOS) regularly will have noticed their demand for Trace File Analyzer (TFA) diagnostic collections.
Few will know they are also providing a full content of their DBs DBA_FEATURE_USAGE_STATISTICS view; at least when running the parameters
tfactl diagcollect -database <DB_UNIQUE_NAME>

The report calls tfa_home/resources/sql/db_feature_usage.sql and it's result is stored in the <hostname>.tfa_<timestamp>.zip file.

I did not find any mention about this behavior in the documentation, so it might be worth to post about it.

This is at least true for version (latest version available at 2018-09-04)

RBAL process needs to release FORMER disk

Oracles Automatic Storage Management (ASM) has many nice advantages against other volume managers. One of my favorite is to add and remove or rebalance disks without any interruption for the database. Unfortunately the remove of devices after the rebalance is completed is not as straight forward as expected: 
In a current environment (Grid Infrastructure:, rdbms: and others) after the ALTER DISKGROUP xxx DROP DISK yyy specific discs are in status FORMER
Due to  a specific schema of directories & symlinks it takes some effort to come to the real /dev/dm-xxx device. 
Still several RBAL processes of different DBs (I did not find a derterministic pattern which processes or DBs) hold a file handle on this specific device. To make it more interesting is the fact that the diskgroup (DG) or disk was never used by any of those DBs. We have a strict mapping of DBs to DGs, so it's easy for me to claim that. 

To tell RBAL processes to release this "stale" filehandle, I have to run a
Any DG does the trick, it is not required to be related to the DB of RBAL process at all. 
As diskgroups are normally balanced, this is a kind of NOP. Still it's required to wake up RBAL and makes it release unused filehandles. 

I still have no idea why these RBAL processes open a device it's DB never needs at all; 
but as I have my proper workaround, I don't investigate any further. 

Might this help someone out there, or at least me when I hit this odd behavior again ;-) 

Mittwoch, 8. August 2018

unwanted cleanup in /var/tmp/.oracle

In my previous blog missing oraagent.bin process I raised a question:
why does files in /var/tmp/.oracle disappear? 
At least this question I am now able to answer.

By the help of Linux kernel's audit system and a proper rule:
-a always,exit -F arch=b64 -S rename,rmdir,unlink,unlinkat,renameat -F path=/var/tmp/.oracle -F key=ora.delete

I was able to see the culprit process:
type=PROCTITLE msg=audit(08/06/2018 17:13:48.382:58122) : proctitle=/usr/bin/systemd-tmpfiles --clean 
type=PATH msg=audit(08/06/2018 17:13:48.382:58122) : item=1 name=sCRSD_IPC_SOCKET_11 inode=50924979 dev=00:25 mode=socket,777 ouid=root ogid=root rdev=00:00 nametype=DELETE 
type=PATH msg=audit(08/06/2018 17:13:48.382:58122) : item=0 name=/ inode=45518 dev=00:25 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 nametype=PARENT 
type=CWD msg=audit(08/06/2018 17:13:48.382:58122) :  cwd=/ 
type=SYSCALL msg=audit(08/06/2018 17:13:48.382:58122) : arch=x86_64 syscall=unlinkat success=yes exit=0 a0=0x5 a1=0x556fb11fb17b a2=0x0 a3=0x3 items=2 ppid=1 pid=25307 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=systemd-tmpfile exe=/usr/bin/systemd-tmpfiles key=ora.delete 

So it was a feature of systemd: systemd-tmpfiles.

In detail it comes from
this particular line:
v /var/tmp 1777 root root 30dy

The fix for this is easy: create a proper file (copy) in
and add these lines at the end:
# Exclude /var/tmp/.oracle
x /var/tmp/.oracle
But as a research at MOS and google did not come up with any issue with systemd-tmpfiles and crsd.bin, oraagent.bin or any other keyword, something else must be a problem here as well.
This (and the other question from my previous post) are questions for further investigations.

So much for now.