Posts with the label ASM are shown. Show all posts

2023-07-13

ASM directories size and age

Automatic Storage Management is a very powerful method of managing Oracle's datafiles (and other files such as the spfile, controlfile, archivelogs, flashback logs, ...). 
Unfortunately (or for good reasons) Oracle never made ASM diskgroups directly mountable in an OS - instead the tool asmcmd is provided, which can be used to navigate the structure and manipulate files. 
Normally there is little need to tinker with files there, as all files are oracle-managed by default. 
Still, sometimes I have to do a cleanup: most of the time someone suspects there are leftovers which consume significant space but are not required anymore. Unfortunately asmcmd does not help me a lot in finding those directories which contain files of significant size but at the same time hold files which were not touched for some time. So I created a little report to help me. 
The report has about 70 lines, so I put it into my git as ASM_agg_dirs.sql. It contains 3 sub-selects which do the work:
all_directories to get the path of every directory, 
byte_to_every_dir to attribute the size and dates of each file to every directory above it, up from the leaves, and 
dir_sizes_dates to aggregate byte_to_every_dir per directory.
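To illustrate what these sub-selects do, here is a small stand-alone sketch (not the script itself): given lines of `path bytes mtime` - roughly what a parsed asmcmd listing could provide - an awk program can credit every file's size and modify date to all of its ancestor directories. The helper name agg_dirs and the input format are assumptions for this illustration.

```shell
#!/bin/sh
# Illustration only: roll "path bytes mtime" lines up to every ancestor directory,
# mimicking what byte_to_every_dir / dir_sizes_dates do inside the SQL report.
agg_dirs() {
awk '{
  path=$1; bytes=$2; mtime=$3
  # walk up the path and credit every ancestor directory
  while (match(path, /\/[^\/]+$/)) {
    path = substr(path, 1, RSTART-1)
    sum[path] += bytes
    if (earliest[path]=="" || mtime < earliest[path]) earliest[path]=mtime
    if (latest[path]==""   || mtime > latest[path])   latest[path]=mtime
  }
}
END { for (d in sum) printf "%s %d %s %s\n", d, sum[d], earliest[d], latest[d] }' "$@" | LC_ALL=C sort
}
```

Feeding it two files under +DG/DB/DATAFILE prints one aggregated line per directory level, with summed bytes and the earliest/latest modify date propagated upwards.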

All together it shows, for every directory, the size of all files it contains, including those in sub-directories. For the age of the files, the earliest and latest modify dates are aggregated and propagated. Its result can be something like 

FULL_ALIAS_PATH                                                 SUM_BYTES        GB LVL EARLIEST_MODIFY_ LATEST_MODIFY_DA
------------------------------------------------------------- ----------- --------- --- ---------------- ----------------
+DATA_DG/ASM                                                          6656       .00   1 2020-12-10 16:00 2020-12-10 16:00
+DATA_DG/ASM/PASSWORD                                                 6656       .00   2 2020-12-10 16:00 2020-12-10 16:00
+DATA_DG/DB_UNKNOWN                                                  36864       .00   1 2022-06-08 09:00 2022-06-08 09:00
+DATA_DG/DB_UNKNOWN/PASSWORD                                         36864       .00   2 2022-06-08 09:00 2022-06-08 09:00
+DATA_DG/my_cluster                                              354004480       .33   1 2020-12-10 16:00 2023-07-13 11:00
+DATA_DG/my_cluster/ASMPARAMETERFILE                                  3584       .00   2 2020-12-10 16:00 2020-12-10 16:00
+DATA_DG/my_cluster/OCRBACKUP                                    188026880       .18   2 2021-02-23 18:00 2023-07-13 11:00
+DATA_DG/my_cluster/OCRFILE                                      165974016       .15   2 2023-06-11 11:00 2023-06-11 11:00
+DATA_DG/DB_UNIQ_1                                            610545300480    568.61   1 2022-04-13 08:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_1/BC3B5B02E2AAAD7BE0532401210A1E60           492219031552    458.41   2 2023-06-11 10:00 2023-07-04 22:00
+DATA_DG/DB_UNIQ_1/BC3B5B02E2AAAD7BE0532401210A1E60/DATAFILE  491145281536    457.41   3 2023-07-04 21:00 2023-07-04 22:00
+DATA_DG/DB_UNIQ_1/BC3B5B02E2AAAD7BE0532401210A1E60/TEMPFILE    1073750016      1.00   3 2023-06-11 10:00 2023-06-11 10:00
+DATA_DG/DB_UNIQ_1/B50214B4214037B1E053C40DD10A4E69             7231012864      6.73   2 2023-06-11 10:00 2023-07-04 22:00
+DATA_DG/DB_UNIQ_1/B50214B4214037B1E053C40DD10A4E69/DATAFILE    6182428672      5.76   3 2023-07-04 22:00 2023-07-04 22:00
+DATA_DG/DB_UNIQ_1/B50214B4214037B1E053C40DD10A4E69/TEMPFILE    1048584192       .98   3 2023-06-11 10:00 2023-06-11 10:00
+DATA_DG/DB_UNIQ_1/CONTROLFILE                                   676773888       .63   2 2023-06-11 10:00 2023-07-12 23:00
+DATA_DG/DB_UNIQ_1/DATAFILE                                    31809601536     29.63   2 2023-07-04 22:00 2023-07-04 22:00
+DATA_DG/DB_UNIQ_1/DATAGUARDCONFIG                                   32768       .00   2 2023-07-13 14:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_1/DG                                                32768       .00   2 2023-07-13 14:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_1/ONLINELOG                                   77309429760     72.00   2 2022-12-10 08:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_1/PARAMETERFILE                                     35840       .00   2 2023-07-13 12:00 2023-07-13 12:00
+DATA_DG/DB_UNIQ_1/PASSWORD                                           5120       .00   2 2022-04-13 08:00 2022-04-13 08:00
+DATA_DG/DB_UNIQ_1/pdbseed                                                             2
+DATA_DG/DB_UNIQ_1/pdbseed/DATAFILE                                                    3
+DATA_DG/DB_UNIQ_1/pdb1                                                                2
+DATA_DG/DB_UNIQ_1/pdb1/DATAFILE                                                       3
+DATA_DG/DB_UNIQ_1/TEMPFILE                                     1073750016      1.00   2 2023-06-11 10:00 2023-06-11 10:00
+DATA_DG/DB_UNIQ_2                                            780968243712    727.33   1 2021-09-21 15:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_2/CONTROLFILE                                   627933184       .58   2 2023-06-11 10:00 2023-07-13 02:00
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165             8196775936      7.63   2 2021-09-21 15:00 2023-07-04 02:00
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165/DATAFILE    8069890048      7.52   3 2021-09-21 15:00 2023-07-04 02:00
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165/TEMPFILE     126885888       .12   3 2023-06-11 10:00 2023-06-11 10:00
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9           705188782080    656.76   2 2021-09-21 15:00 2023-07-13 02:00
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9/DATAFILE  686935162880    639.76   3 2021-09-21 15:00 2023-07-13 02:00
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9/TEMPFILE   18253619200     17.00   3 2023-06-11 10:00 2023-06-11 10:00
+DATA_DG/DB_UNIQ_2/DATAFILE                                    55903256576     52.06   2 2021-09-21 15:00 2023-07-13 02:00
+DATA_DG/DB_UNIQ_2/DATAGUARDCONFIG                                   24576       .00   2 2023-07-13 14:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_2/DG                                                24576       .00   2 2023-07-13 14:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_2/ONLINELOG                                    9663694848      9.00   2 2022-12-10 10:00 2023-07-13 14:00
+DATA_DG/DB_UNIQ_2/PARAMETERFILE                                     27648       .00   2 2023-07-13 12:00 2023-07-13 12:00
+DATA_DG/DB_UNIQ_2/PASSWORD                                           5120       .00   2 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/pdbseed                                                             2
+DATA_DG/DB_UNIQ_2/pdbseed/DATAFILE                                                    3
+DATA_DG/DB_UNIQ_2/pdb1                                                                2
+DATA_DG/DB_UNIQ_2/pdb1/DATAFILE                                                       3
+DATA_DG/DB_UNIQ_2/TEMPFILE                                     1073750016      1.00   2 2023-06-11 10:00 2023-06-11 10:00

 
If you have a close look, there are some directories with an EARLIEST_MODIFY_DATE from 2021 or 2022 - so they might be good candidates to have a look at and eliminate some files. To see how much data might be released, a small modification can be made to the SQL:

In lines 51 & 52 of the script the projection should be commented out and instead the block between lines 55 and 59 activated. There I only aggregate data for files last modified 90 days or more in the past. 
The result provides a much clearer picture of possible savings: 


+DATA_DG/DB_UNIQ_2                                             408085720576    380.06   1 2021-09-21 15:00 2022-12-10 10:00
+DATA_DG/DB_UNIQ_2/CONTROLFILE                                                          2
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165              1887461376      1.76   2 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165/DATAFILE     1887461376      1.76   3 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/C7CAD0D3231198B1E0532401210A7165/TEMPFILE                            3
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9            378750590976    352.74   2 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9/DATAFILE   378750590976    352.74   3 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/C7CAF2050A22AC66E0532601210A9FF9/TEMPFILE                            3
+DATA_DG/DB_UNIQ_2/DATAFILE                                     25300172800     23.56   2 2021-09-21 15:00 2021-09-21 15:00
+DATA_DG/DB_UNIQ_2/DATAGUARDCONFIG                                                      2
+DATA_DG/DB_UNIQ_2/DG                                                                   2
+DATA_DG/DB_UNIQ_2/ONLINELOG                                     2147487744      2.00   2 2022-12-10 10:00 2022-12-10 10:00
+DATA_DG/DB_UNIQ_2/PARAMETERFILE                                                        2
+DATA_DG/DB_UNIQ_2/PASSWORD                                            5120       .00   2 2021-09-21 15:00 2021-09-21 15:00
Somewhere in the PDB with GUID C7CAF2050A22AC66E0532601210A9FF9 about 352 GB of "old" files may be hidden. 

Checking the files the situation is obvious: 

 ls -l
Type      Redund  Striped  Time             Sys  Name
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    ABC_DATA.2828.1083856881
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    ABC_DATA.2850.1083857223
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    XYZ.2829.1083856889
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    XYZ.2851.1083857229
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    DEF.2827.1083856881
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    DEF.2849.1083857221
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    GHI.2826.1083856879
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    GHI.2848.1083857221
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    SYSAUX.2819.1083856867
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    SYSAUX.2841.1083857211
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    SYSTEM.2818.1083856865
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    SYSTEM.2840.1083857209
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    UNDOTBS1.2820.1083856871
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    UNDOTBS1.2842.1083857213
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    UNDO_4.2821.1083856873
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    UNDO_4.2843.1083857215
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    UNDO_5.2822.1083856873
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    UNDO_5.2844.1132579919
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    UNDO_6.2823.1083856875
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    UNDO_6.2845.1132579917
DATAFILE  HIGH    COARSE   SEP 21 2021      Y    UVW.2824.1083856877
DATAFILE  HIGH    COARSE   JUL 13 02:00:00  Y    UVW.2846.1083857217

It seems that in September 2021 all the files were somehow copied into ASM, but the step was done twice for whatever reason. Unfortunately the old files were never cleaned up, but now I can release some space in the diskgroup (of course first crosschecking with the CDB that the datafiles are really not needed anymore). 
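That crosscheck can be sketched as a simple list comparison; a bash sketch under assumptions: the first file holds the file names seen via asmcmd, the second holds the names the CDB still reports (e.g. spooled from v$datafile/v$tempfile), and the file names and helper name are hypothetical.

```shell
#!/bin/bash
# comm -23 prints lines only present in the first (sorted) list:
# files ASM has but no database claims - the deletion candidates to double-check.
orphans() {   # orphans <asm_list> <db_list>
  comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}
```

Anything printed by orphans is only a candidate; it still needs a manual check before an asmcmd rm.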




2021-08-20

ASM lsof with process ID

Oracle ASM can list files which belong to a currently active instance. 


The command is lsof. Unfortunately it does not provide the process ID which holds the file open. lsof gets its data from the internal view X$KFFOF, which contains additional information.

Based on all I found (which is close to nothing) I can provide this SQL - may it help to find the holder of an open ASM file from now on.

SELECT lsof.GROUP_KFFOF        as GROUP_NUMBER
     , lsof.NUMBER_KFFOF       as FILE_NUMBER
     , lsof.INSTANCENAME_KFFOF as INSTANCE_NAME
     , lsof.DBNAME_KFFOF       as DB_NAME
     , lsof.CLUSTERNAME_KFFOF  as CLUSTER_NAME
     , lsof.OSID_KFFOF         as OS_PID
     , lsof.PATH_KFFOF         as PATH
     , lsof.FLAGS_KFFOF        as FLAGS 
     , lsof.TIMESTAMP_KFFOF    as TIMESTAMP
FROM X$KFFOF lsof
WHERE 1=1
  AND bitand(flags_kffof, to_number('00004000', 'XXXXXXXX')) = 0  -- kffilf_invalid = 0x00004000
;   

2019-03-30

limit IOs by instances

This is a follow up to my previous post.

I had to answer whether I can limit the (IO) resources a DB instance can utilize. Unfortunately, for a plain instance I can not do so. It can be done for PDBs, but right now PDBs are out of scope.
So a simpler approach was developed: limiting IOs by creating cgroups.

There are 2 steps:

  1. disks
    1. create a proper cgroup
    2. get all related devices for a given ASM diskgroup
    3. set proper limits for all the devices
  2. regularly add all matching processes to this cgroup

In the cgroups mountpoint, a new directory must be created which is used as the "root" for all our cgroups, so it does not collide with other cgroup implementations on the system.

This leads to a structure like
/sys/fs/cgroup/blkio/ORACLE/BX1/
where ORACLE is the "root" and "BX1" is the name of a specific cgroup.
In this group limits can be set, e.g. in blkio.throttle.read_bps_device or blkio.throttle.read_iops_device.
As the limit is per device, the total limit is divided by the number of devices.
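The steps above can be sketched in bash; this is only an illustration under assumptions (cgroup v1 blkio as in the post; the cgroup name, the 5 MB/s total and the function names are examples, not the actual script):

```shell
#!/bin/bash
# Sketch: create a cgroup under the ORACLE "root" and spread a total read limit
# evenly over all devices of the diskgroup.
per_device_bps() {   # divide the total limit by the number of devices
  echo $(( $1 / $2 ))
}
maj_min() {          # blkio wants decimal "major:minor"; stat reports them in hex
  local hex; hex=$(stat -c '%t %T' "$1")
  printf '%d:%d' "0x${hex% *}" "0x${hex#* }"
}
setup_cgroup() {     # e.g. setup_cgroup BX1 5242880 /dev/dm-3 /dev/dm-4
  local name=$1 total=$2; shift 2
  local cg=/sys/fs/cgroup/blkio/ORACLE/$name
  mkdir -p "$cg"
  local bps dev; bps=$(per_device_bps "$total" $#)
  for dev in "$@"; do
    echo "$(maj_min "$dev") $bps" > "$cg/blkio.throttle.read_bps_device"
  done
}
```

With 4 devices and a 5 MB/s total, each device gets 1310720 bytes/s written into blkio.throttle.read_bps_device.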

To have any effect on processes, all processes PIDs which are under the regime of this cgroup are added to /sys/fs/cgroup/blkio/ORACLE/BX1/tasks.
For more flexibility, a list of patterns (to match cmd in ps -eo pid,cmd) is defined for each cgroup individually, e.g. all processes which either match a foreground process connected via listener BXTST02.*LOCAL=NO or any parallel process ora_p[[:digit:]].{2}_BXTST02.
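The PID sweep can be sketched in bash like this; the tasks-file location follows the structure above, while the function names are assumptions for this illustration:

```shell
#!/bin/bash
# Sketch: collect PIDs whose command matches one of the cgroup's patterns
# and append them to the cgroup's tasks file.
match_pids() {   # match_pids <"PID CMD" lines> <ERE> -> print matching PIDs
  grep -E "$2" <<< "$1" | awk '{ print $1 }'
}
sweep() {        # e.g. sweep BX1 'BXTST02.*LOCAL=NO' 'ora_p[[:digit:]].{2}_BXTST02'
  local cg=/sys/fs/cgroup/blkio/ORACLE/$1; shift
  local ps_out re
  ps_out=$(ps -eo pid,cmd --no-headers)
  for re in "$@"; do
    match_pids "$ps_out" "$re" >> "$cg/tasks"
  done
}
```

Writing a PID into tasks moves that process into the cgroup; re-adding an already-member PID is harmless, which keeps the periodic sweep simple.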

In my configuration (crontab), the disks are added to the cgroup once per hour, whereas processes are added every minute.
This can lead to some delay when disks are added, and every single process can live for up to one minute without any limits.
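The scheduling described above could look like this in a crontab; the script names are purely hypothetical:

```
# hypothetical crontab entries for the oracle user
0 * * * *   /usr/local/bin/cg_refresh_disks.sh   # re-add devices and limits, hourly
* * * * *   /usr/local/bin/cg_sweep_tasks.sh     # add matching PIDs, every minute
```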

But over a longer period it should be quite stable (or at least the best I can do in the short time given).

The effect is acceptable. In a generic test with SLOB the picture is obvious:
When the processes were added to the cgroup, MB/s dropped down to the 5MB/s configured.

Of course, with these limits the average response time (or wait time) goes up.
In another test, where SLOB was in the cgroup all the time, the MEAN response time was 0.027 sec.
But a histogram shows more than 50% of the calls finish within 10 ms (a reasonable value for the storage system in this test), while the peak between 50 and 100 ms dominates the total response time.
RANGE {min ≤ e < max}    DURATION       %   CALLS      MEAN       MIN       MAX
---------------------  ----------  ------  ------  --------  --------  --------
 1.     0ms      5ms     6.010682    2.0%   2,290  0.002625  0.000252  0.004999
 2.     5ms     10ms    26.712672    9.0%   3,662  0.007295  0.005002  0.009997
 3.    10ms     15ms    12.935713    4.4%   1,090  0.011868  0.010003  0.014998
 4.    15ms     20ms     6.828035    2.3%     398  0.017156  0.015003  0.019980
 5.    20ms     25ms     4.846490    1.6%     218  0.022232  0.020039  0.024902
 6.    25ms     50ms    17.002454    5.7%     471  0.036099  0.025085  0.049976
 7.    50ms    100ms   182.104408   61.6%   2,338  0.077889  0.050053  0.099991
 8.   100ms  1,000ms    39.354627   13.3%     326  0.120720  0.100008  0.410570
 9. 1,000ms       +∞
---------------------  ----------  ------  ------  --------  --------  --------
TOTAL (9)              295.795081  100.0%  10,793  0.027406  0.000252  0.410570

This also can be seen in a graph:

The system is working and stable; probably not perfect but good enough for its requirements.
There was a discussion whether this should be achieved on the storage layer instead. That would limit every process immediately, but would also be a much stricter setting. As an example, I can exclude the logwriter from any cgroup and let it work as fast as possible, whereas IO limits on the storage side would put the logwriter and the foreground processes under the same limits.

The script can be found on github, but it has some prerequisites and might not work on systems other than my current ones without adaption. Don't hesitate to contact me if you really want to give it a try 😉

2019-02-17

shared resources

Some days ago I had one of these calls from an application colleague:
The database was slow yesterday, can you please check what's the problem?
Of course, I first had a short discussion whether he really means the DB, or whether it should rather be called "the application is slow". Some other questions needed to be asked as well, e.g. whether it's a response time or a throughput issue, when it was "good" the last time, what "bad" and "good" mean in numbers (seconds per experience, requests per second), whether it affects all experiences on this DB or only a subset, and whether it can be reproduced or at least occurs in a known pattern. I also had to ask for the DB name and the users affected.
Some day I should create a comprehensive checklist of these questions, as I regularly miss some of them. But that's not today's topic.

As the answers were very vague (at least I got a DB & username with a rough timestamp of when it might have started) I checked the DB. I admit it would be more precise to focus on the user experience, but I know the developers quite well; it would cost a lot of effort & management pressure to convince them to implement proper instrumentation.
So I went for the low hanging fruits first. ASH/AWR showed this picture:
There is a time of higher activity in the database, and it's quite visible where it started and ended. Some more analysis made it obvious that the number of calls didn't change dramatically. Getting data from the application (yes, there was at least an activity log) showed that the number of requests did in fact follow a daily pattern, but it didn't look like the picture above. The pattern of demand was similar to other days; but on any other day AWR showed no such picture, only a "breathing" similar to the application's demand.
Through this conversation we implicitly generated a (still very blurry) "good" pattern to compare the "bad" one against.


These bits of information were not perfect, but at least something I could use for further investigations.
AWR showed much higher IO times than normal. I crosschecked the most active SQL_IDs from a "good" pattern against the "bad" one and could confirm they are IO-bound in "good" as well, so they might be affected by this unusual IO.
On the DB cluster TFA is installed, and so is oswatcher (again, aggregates only, but still much better than nothing). A small awk script analysed the data there (I just don't know why oswatcher doesn't have a SQL interface similar to osquery) and showed huge response times, but no queueing on the Linux hosts.
This would now require digging deeper through the storage layers (FC SAN and several big central storage boxes). Unfortunately I do not have access to them, their performance data is not externalized right now, and the SAN admin was too busy with other things, like compiling reports for senior managers 🤦.
But at least I was somehow lucky: at this company a nice schema is implemented which gives ASM disks much more information than only /dev/mapper/anything. As the name of the storage box is part of the ASM disk path, with some knowledge about the possible servers involved (e.g. only servers in this datacenter and with some other attributes could be involved at all) and a centralized repository, I got a list of all DB servers and DBs potentially involved on that layer.
With this list and another part of our repository (in this case periodic v$sysstat snapshots [aggregates of aggregates - guess the accuracy there]) a list of other DBs with high IO numbers during that time could be compiled.
Checking these databases there was a top producer of IO:

Some additional analysis showed that the load on this particular DB created sufficient IO to saturate one subsystem in the storage box. By doing so, all other systems which use the same subsystem also suffered. To crosscheck that, the storage admin (who had completed his report in the meantime) provided other DBs utilizing that subsystem - and they showed similar patterns at exactly that time.

Even with all the data collected, this accumulation of coincidences is still no proof of what was going on. But it was sufficiently likely for the management to think about remediation activities.


Here the question might come up why only the admins of this one application showed up and complained about their system. During the investigation it was visible that other DBs showed similar patterns, so other applications were likely to suffer as well.
I invested some time to get answers to this question - not related to the initial goal of the activities, only to satisfy my curiosity and improve my knowledge of the whole landscape.
It showed that these other applications also suffered, but the impact was much smaller for 2 reasons:

  1. The portion of DB IO time in their user experience was relatively small. Even these high response times didn't cause a lot of harm. 
  2. The requirements of the most suffering application were stricter. This can be explained by the time a person accepts to wait after a click: if the click is on a webpage, most readers of this blog might understand what's going on behind the scenes, and some tenths of a second up to some seconds are acceptable for an answer. But if you flip a switch to turn on the light, you expect the light to shine immediately (i.e. within less time than the human senses can grasp).


After all the analysis was done and roughly explained to the manager of the most suffering application, he invited us to a meeting.
A manager, some people with different roles in development and ops, a representative for the "culprit application which caused the high load" and I were there. He opened the meeting with a summary (condensed by me to):
We and they share the same storage, they did some heavy query and we suffered. 
Technically this is right, but the way it's presented to those not experienced in IT leads to the manager's request:
We must ensure they do not cause such IO load anymore. 

This might seem legit, but it might not lead to the manager's real goal: make his application reasonably fast in as many situations as possible. So I had to kill this branch of the discussion for a moment and explained why they are not the real problem:
As the whole IT is a shared system on various layers (different hosts share the same storage system, different virtual hosts share the same VM layer, different instances share the same DB cluster, different sessions share the same instance, ...) it is an unreasonably big effort to define resource limits, quotas and similar for all components involved - especially in a highly dynamic environment with hundreds of DBs, thousands of servers and zillions of processes.
A majority in the room showed some shock & horror and claimed they had not been aware of the high level of complexity in basic infrastructure.

With some effort we can limit the IO consumption of the particular DB which caused the issue this time. But the next time (and there will be a next time) we can only be sure that this particular DB is not involved.
As that's not sufficient, the next request was:
Let's then monitor the storage box and alarm every time a component is saturated.
A nice idea, isn't it?
But whom to alarm? -
The SAN admin of course. 
What should he do then? He can't reduce the load and can't increase the storage capacity either (at least not short term without budget, vendor involvement, ...). 
So let's inform those who are connected to this resource! 
In our example that would have been two dozen applications - most of which didn't suffer or care.
Then only the top 5.
In our case you wouldn't have been alarmed.

This also doesn't lead to anything useful.

About at that time I raised a simple question:
Who in this room is responsible for the (suffering part of the) application?
---
<silence>
(the managers face went pale for a second).
After some moments, an external (not employed) developer raised his hand and muttered something like "If no one wants to, I can take that". 
I was slightly bored by this meeting, so to speed things up my next question was:

Is every IO your application doing [during the critical activity] necessary?

The argument in short goes like this: if you are less dependent on IO timing, even bad IOs can't do much harm. I call this

make it robust against impacts out of your control

Besides the obvious effect of improving the application, a nice side effect is a less busy overall environment. This again pays back as better service for the whole company.

The meeting stopped close to that point. The newfound architect & developers promised to see if they can make their application more robust; the manager wants to raise awareness in all IT departments of what a shared environment is and how it can affect business processes within the company.
I have mixed feelings: the analysis seems valid, but it was full of educated guesses & shortcuts - this doesn't feel right, as it relies on "experience" and "luck"; there is a method which describes a more structured way, I just felt unable to follow its steps. I hope I could turn the initial reaction (finger-pointing at another team) into something more valuable for the company: use fewer resources, make your application robust.
But I doubt my effort was lasting.

2018-09-04

RBAL process needs to release FORMER disk

Oracle's Automatic Storage Management (ASM) has many nice advantages over other volume managers. One of my favorites is to add, remove or rebalance disks without any interruption for the database. Unfortunately, removing devices after the rebalance is completed is not as straightforward as expected: 
In a current environment (Grid Infrastructure: 12.2.0.1.180417, rdbms: 12.1.0.2.180116 and others), after an ALTER DISKGROUP xxx DROP DISK yyy the specific disks are in status FORMER.
Due to a specific schema of directories & symlinks it takes some effort to get to the real /dev/dm-xxx device. 
Still, several RBAL processes of different DBs (I did not find a deterministic pattern of which processes or DBs) hold a file handle on this specific device. To make it more interesting, the diskgroup (DG) or disk was never used by any of those DBs. We have a strict mapping of DBs to DGs, so it's easy for me to claim that. 

To tell the RBAL processes to release this "stale" file handle, I have to run an
  ALTER DISKGROUP zzz REBALANCE
Any DG does the trick; it is not required to be related to the DB of the RBAL process at all. 
As diskgroups are normally balanced, this is a kind of NOP. Still, it's required to wake up RBAL and make it release unused file handles. 
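Which processes still hold the dropped device can be checked before (and after) the rebalance by resolving the /proc/&lt;pid&gt;/fd symlinks; a bash sketch, assuming a Linux host and sufficient privileges to read other processes' fd directories:

```shell
#!/bin/bash
# Sketch: list PIDs (and their command lines) that hold a given device/file open.
holders_of() {   # e.g. holders_of /dev/dm-17
  local target=$1 fd pid
  for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
      pid=${fd#/proc/}; pid=${pid%%/*}
      printf '%s %s\n' "$pid" "$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null)"
    fi
  done | sort -u
}
```

Running it against the FORMER disk's /dev/dm-xxx device before and after the REBALANCE should show the RBAL PIDs disappearing from the list.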

I still have no idea why these RBAL processes open a device their DB never needs at all; 
but as I have a proper workaround, I won't investigate any further. 

May this help someone out there, or at least me when I hit this odd behavior again ;-) 

2015-11-26

handling disks for ASM - when DB, Linux and Storage admins work together

A proper management of ASM Disks can be a complicated task.

On DOAG2015 I discussed with Martin Bach about the concept in my current company, where we implemented a setting which is consistent, robust and enables Storage, Linux and DB admins to work together easily.

As we started to think about ASM when 10.1 was out, we tried to evaluate our options. asmlib was discarded quite early as it only increased complexity without additional value: we have a SAN (fibre channel) infrastructure with 2 separate fabrics, so a proper multipath configuration is needed anyhow, and ASM (or asmlib) can not handle that by itself. Also, asmlib hides storage details from DBAs/ASM admins, whereas we wanted to enable every person involved to know as many details as possible, easily.

We also saw that ASM sometimes takes a long time to scan for new disks if there are many "files" (devices) in asm_diskstring. This happens every time you access v$asm_disk - so use v$asm_disk_stat, as this view does not rescan but only shows information about devices it already has in the SGA.

asm_diskstring

We set asm_diskstring to a dedicated directory, in our case /appl/oracle/asm_disks/*. This speeds up a rescan of all "disks", and it is also a clear indicator of all the disks ASM uses. There we have symlinks to devices in /dev/mapper/.

symlinks

The symlink has this format:
/appl/oracle/asm_disks/360060e80167bd70000017bd700000007p1_p9500_b52_MONIQP01_000 -> /dev/mapper/360060e80167bd70000017bd700000007p1

Some information about the values we store there:
360060e80167bd70000017bd700000007p1 is the WWN of the disk, together with its partition (p1). The WWN is very useful in every discussion with storage admins, as it identifies the LUN from their perspective. We decided to partition the disks: our records show that Linux admins touch unformatted devices more often than devices which are already partitioned. There were also some cases in early tests where the first block of a disk was cached by the kernel. Both issues are addressed by partitioning every disk. If required, partitioning can also help to adapt alignment.
p9500 is a short name which identifies the storage box with a name we can use during discussions. It is contained within the WWN as well, so it's pure redundancy, but it makes discussions much easier.
b52 is a short name to identify the datacenter. As our fabrics span several datacenters, it is sometimes nice to get a quick overview of the topology.
MONIQP01_000 is the label used in some storage boxes. It contains the diskgroup name and a number. At the moment it is NOT the NAME of an ASM disk, but this could be introduced easily.
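The naming convention can be captured in a tiny helper; a bash sketch (the function name is an assumption, the values are the example from the post):

```shell
#!/bin/bash
# Build the symlink name from its four components: WWN+partition, storage box,
# datacenter, and the "DISKGROUP_number" label.
asm_disk_link() {   # asm_disk_link <wwn+part> <box> <dc> <label>
  printf '/appl/oracle/asm_disks/%s_%s_%s_%s' "$1" "$2" "$3" "$4"
}
# creating the symlink itself (the multipath device must exist):
#   ln -s "/dev/mapper/$1" "$(asm_disk_link "$1" "$2" "$3" "$4")"
```

Encoding the components in one fixed order keeps the names parseable, so scripts (and humans) can recover WWN, box, datacenter and diskgroup from any link name.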

As the name of a diskgroup is coded into our naming schema, it's not accepted to reuse a disk for some other diskgroup (technically it's still possible, we just agreed not to do so). Even if this seems to limit the DBAs' flexibility, there are good reasons for it: disks are sometimes created with dedicated settings/parameters for a special purpose, and reusing such disks in other DGs would cause strange and hard-to-find performance symptoms. So if disks are not needed anymore, we always "destroy" them and create new ones if needed.

udev rules

Our udev ruleset on RedHat6 is quite simple:
the file /etc/udev/rules.d/41-multipath.rules contains such lines:
ACTION=="add|change", ENV{DM_NAME}=="360060e80167bd70000017bd700000007p1", OWNER:="oracle", MODE:="0660", GROUP:="asmadmin"
We do not do any mapping of names here - the rule is only there to set permissions.

multipath

The config in /etc/multipath.conf is quite simple, containing only the parameters required for each specific storage vendor / product.


I can not say a lot about the configuration outside of the Linux server, so both SAN fabrics and the storage system are "just working".




2011-03-16

2 spfiles - some confusion

At the moment I'm in a big migration project from 9i to 11gR2.
Most of the migrations are done like
  • create new DB
  • exp/imp
  • delete old DB

Even though we have templates for all these steps, sometimes something goes wrong.

Today I hit such a something:
As we use clusters for most of our DBs, cluster_database=TRUE should be set, otherwise it generates problems. At the creation of one DB, the 2nd instance did not come up. After some research I found out cluster_database was set to FALSE. No big deal, let's change it:
alter system set cluster_database=TRUE scope=spfile;
But after a srvctl stop instance -d DB_UNIQUE_NAME -i INSTANCE_NAME

srvctl start instance -d DB_UNIQUE_NAME -i INSTANCE_NAME
a show parameter cluster_database still gave me FALSE. This confused me!
So I checked $ORACLE_HOME/dbs/initINSTANCE_NAME and there I found spfile=+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.268.745343891 +ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.269.745406153 - there were 2 spfiles in the pfile definition!
After a

create pfile='/tmp/pfile1' from spfile='+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.268.745343891';
create pfile='/tmp/pfile2' from spfile='+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.269.745406153';

and a diff /tmp/pfile1 /tmp/pfile2, I decided to keep only the 1st spfile in $ORACLE_HOME/dbs/initINSTANCE_NAME and apply all the differences manually.
Now both instances are up and fine - I just have to delete the correct spfile in ASM.
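Verifying which spfile the running instances actually use, and then removing the leftover one via asmcmd, could look like this (a sketch using the file names from above, not the exact session):

```
SQL> show parameter spfile
-- should now point to spfile.268.745343891 only

ASMCMD> rm +ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.269.745406153
```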

2010-03-12

NOTE:Unident of disk

I just wondered where the lines

*** 2010-03-12 10:25:13.862
NOTE:Unident of disk:/appl/oracle/asm_disks/c3t60014380024D39280000A000039B0000d0s0_eva
NOTE:Unident of disk:/appl/oracle/asm_disks/c3t60014380024D39280000A00003A10000d0s0_eva

came from in my +ASMS744_rbal_3119.trc trace file every minute.

As I did not find any matches for NOTE:Unident of disk in MOS, I tried to open an SR there. During the preparation, I checked the status of this particular disk:

select header_status, path
from v$asm_disk
where path like '%c3t60014380024D39280000A000039B0000d0s0%';

HEADER_STATUS PATH
------------- ------------------------------------------------------------------
CANDIDATE /appl/oracle/asm_disks/c3t60014380024D39280000A000039B0000d0s0_eva


So I added these disks to a DiskGroup and the trace file did not grow any further.
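Adding such a CANDIDATE disk to a diskgroup is a one-liner; a sketch, assuming a diskgroup named DATA_DG (the diskgroup name here is just an example):

```sql
ALTER DISKGROUP DATA_DG ADD DISK
  '/appl/oracle/asm_disks/c3t60014380024D39280000A000039B0000d0s0_eva';
```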

Sadly, I didn't find anything about Unident of disk in the documentation, MOS or Google.

2009-01-13

multiple ASM instances on one node (11gR1)

I was just searching for a way to circumvent the limit of 63 ASM DiskGroups in a storage system.
My first thought was to create a 2nd (or even multiple) ASM instance on the same node to manage separate DGs. This is quite easily possible: just create a spfile for the 2nd ASM and make sure to add *.DB_UNIQUE_NAME to avoid ORA-15150. I also recommend using different ASM_DISKSTRINGs, to avoid multiple mount attempts of the same DG on all ASM instances.
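The relevant parameters in the 2nd ASM's pfile/spfile could look like this (a sketch; the diskstring path is a made-up example):

```
# pfile sketch for a 2nd ASM instance on the same node
*.instance_type='asm'
*.db_unique_name='+ASM2'              # avoids ORA-15150
*.asm_diskstring='/dev/asm2_disks/*'  # separate discovery path per ASM instance
```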
So I could create different DGs in different ASM-instances.
But now the trouble starts: in my test RDBMS I could only see the DGs of the first ASM.
So I created a SR (7275580.994) at MetaLink to ask how to do it and whether or not it's supported in single instance or RAC.
To summarize the answers:
  • It is possible to run multiple ASMs on a node
  • One RDBMS can only be served by one ASM
  • all the GUIs are not aware of multiple ASMs
  • it's not supported in RAC
Still, these answers are worth some further testcases.
I have 2 ASMs: +ASM (default) and +ASM2 (2nd, with changed DB_UNIQUE_NAME and ASM_DISKSTRING) and one RDBMS. DG ASM1_DG created in +ASM, DG ASM2_DG created in +ASM2.

  1. Test 1
    +ASM up
    +ASM2 up
    => only ASM1_DG visible in RDBMS.
  2. Test 2
    +ASM down
    +ASM2 up
    => ASM2_DG visible
    ! create tablespace in ASM2_DG
  3. Test 3
    +ASM up
    +ASM2 up
    - restart RDBMS
    => only ASM2_DG visible in RDBMS
  4. Test 4
    +ASM up
    +ASM2 up
    => DBCA only shows ASM1_DG
  5.  Test 5
    • shutdown +ASM
    • switch ASM1_DG to +ASM2 (some fiddling with ASM_DISKSTRING and symlinks)
    • mount ASM1_DG in +ASM2
    • create TS in ASM1_DG
    • shutdown RDBMS and +ASM2
    • switch ASM1_DG back to +ASM
    • startup +ASM, +ASM2 and rdbms
    • this constellation (ASM1_DG mounted on +ASM, ASM2_DG mounted on +ASM2) led to this error message:

      ORA-01157: cannot identify/lock data file 8 - see DBWR trace file
      ORA-01110: data file 8: '+ASM1_DG/berx2/datafile/asm1_dg.256.676065339'
      ORA-17503: ksfdopn:2 Failed to open file +ASM1_DG/berx2/datafile/asm1_dg.256.67
      6065339
      ORA-15001: diskgroup "ASM1_DG" does not exist or is not mounted
      ORA-15001: diskgroup "ASM1_DG" does not exist or is not mounted
      
Test 5 led me to a guess: at startup an RDBMS can potentially access all ASMs, but the first attempt to access a DG pins the RDBMS to that ASM. Unfortunately I do not know how to prove this.
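One way to at least observe this pinning would be to query v$asm_client in each ASM instance, which lists the databases currently attached to it (a sketch):

```sql
-- run in each ASM instance (+ASM and +ASM2)
SELECT group_number, instance_name, db_name, status
FROM   v$asm_client;
```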
Within the SR Bug:6991236 was mentioned.
This bug is not visible right now, but interpreting the header I guess it's an enhancement request to allow or support multiple ASM instances with different OS users in a cluster environment. It's shown as fixed (implemented) in 11gR2 and has status 98 (Suggestion Implemented). We will see when 11gR2 is out.