2023-01-14

OEM: ASH Package Version Status - manual update

Oracle Enterprise Manager is a perpetual well of joy! 

In the Performance Hub page, it can show a yellow warning sign (⚠️). I didn't see any degradation because of this, but my inner Monk is disturbed, so I have to fix it. 

This warning sign gives a little information about an ASH package version and how it should be replaced. 
For some reason it's not always that easy to run these jobs (or they run, but the triangle doesn't disappear). However, there is a quite straightforward method to deploy the proper packages and make the triangle disappear (and my Monk happy): 
the scripts which need to be applied to the target DB are located on the OMS host in 
<OMS_HOME>/middleware/plugins/oracle.sysman.db.oms.plugin_*/sql/db/latest/instance/ 
(or whatever the plugin version is). 
The files there are 

.
└── middleware
    └── plugins
        └── oracle.sysman.db.oms.plugin_13.5.1.0.0
            └── sql
                └── db
                    └── latest
                        └── instance
                            ├── ashviewer_pkgbodys.sql
                            ├── ashviewer_pkgdefs.sql
                            ├── dbms_compare_period.sql
                            ├── eaddm_pkgbody.sql
                            ├── eaddm_pkgdef.sql
                            ├── omc_ashv_pkg_body.sql
                            ├── omc_ashv_pkg.sql
                            ├── priv_grant_omc_ash.sql
                            ├── prvs_awr_data_cp.sql
                            ├── prvs_awr_data.sql
                            ├── prvt_awr_data_cp.sql
                            ├── prvt_awr_data.sql
                            ├── prvt_compare_period.sql
                            ├── README.txt
                            └── test.sql
                            

Not all these files are required. In fact, only these three: 

omc_ashv_pkg.sql 
omc_ashv_pkg_body.sql 
priv_grant_omc_ash.sql

To apply them into DBSNMP, execute on DBMS_SQL and DBMS_LOB must first be granted to DBSNMP. The three scripts must then be executed as DBSNMP. 
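
In SQL*Plus this boils down to something like the following (just a sketch - run the grants as a DBA user, then the three scripts from the directory above):

GRANT EXECUTE ON dbms_sql TO dbsnmp;
GRANT EXECUTE ON dbms_lob TO dbsnmp;

CONNECT dbsnmp
@omc_ashv_pkg.sql
@omc_ashv_pkg_body.sql
@priv_grant_omc_ash.sql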

With these steps, OEM is happy, the warning is gone and everything is fine. (And especially for automated mass deployments, I find it harder to schedule some jobs in OEM and hope for their success than to simply run some scripts.)


Just for the curious, the (from my perspective more important) parts of the package are: 

PACKAGE omc_ash_viewer

...


  REPORT_INTERNAL_VERSION CONSTANT VARCHAR2(64) := '53';

  -- date format to be used for communications with package.
  ASH_TIME_FORMAT CONSTANT VARCHAR2(30) := 'MM/DD/YYYY HH24:MI:SS';

  -- error ratio to be acceptable for not mixing in-memory with on disk.
  -- We set it to 1 to always mix memory and disk in cases where neither 
  -- covers all the time period.
  ASH_ALLOWED_ERR_RATIO CONSTANT NUMBER := 1;

  -- -----------------------------------------------------
  -- controlling the number of buckets.
  -- constants are for settings of "LOW", "MEDIUM", "HIGH"
  -- 
  -- The setting of "MAX" (or "ALL") is: 
  --  same number of buckets as high, but no down sampling
  -- -----------------------------------------------------
  ASH_LOW_RESOLUTION CONSTANT VARCHAR2(10) := 'LOW';
  ASH_LOW_BUCKETS CONSTANT NUMBER := 120;
  ASH_LOW_ROWS_PER_BUCKET CONSTANT NUMBER := 20;
  ASH_MED_RESOLUTION CONSTANT VARCHAR2(10) := 'MEDIUM';
  ASH_MED_BUCKETS CONSTANT NUMBER := 180;
  ASH_MED_ROWS_PER_BUCKET CONSTANT NUMBER := 35;
  ASH_HIGH_RESOLUTION CONSTANT VARCHAR2(10) := 'HIGH';
  ASH_HIGH_BUCKETS CONSTANT NUMBER := 360;
  ASH_HIGH_ROWS_PER_BUCKET CONSTANT NUMBER := 50;
  ASH_MAX_RESOLUTION CONSTANT VARCHAR2(10) := 'MAX';
  
  
  -- default REAL TIME min bucket size in seconds
  ASH_DEF_MEM_BUCKET_SIZE CONSTANT NUMBER := 10;

  -- default Historical min bucket size in seconds
  ASH_DEF_DISK_BUCKET_SIZE CONSTANT NUMBER := 20;

  -- length of SQL text to fetch
  ASH_DEF_SQLTEXT_LEN CONSTANT NUMBER := 200;

  -- database version constants
  VER_12_2 CONSTANT VARCHAR2(12) := '1202000000';
  VER_12_1_2 CONSTANT VARCHAR2(12) := '1201000200';
  VER_12_1 CONSTANT VARCHAR2(12) := '1201000000';
  VER_12   CONSTANT VARCHAR2(12) := '1200000000';
  VER_11_MIN CONSTANT VARCHAR2(12) := '1102000200';
  VER_19   CONSTANT VARCHAR2(12) := '1900000000';
  VER_20   CONSTANT VARCHAR2(12) := '2000000000';

  TOP_ADD_INFO_COUNT      CONSTANT BINARY_INTEGER := 20;
  MAX_INFO_TIME_LIMIT     CONSTANT BINARY_INTEGER := 2;

  -- menu categories --
  -- when you add a new category here, make sure to visit the function
  -- generate_menu_xml and add the category there.
  RSRC_CONS_CAT           CONSTANT VARCHAR2(128) := 'resource_consumption_cat';
  SESS_ID_CAT             CONSTANT VARCHAR2(128) := 'session_identifiers_cat';
  SESS_ATTR_CAT           CONSTANT VARCHAR2(128) := 'session_attributes_cat';
  SQL_CAT                 CONSTANT VARCHAR2(128) := 'sql_cat';
  PLSQL_CAT               CONSTANT VARCHAR2(128) := 'pl_sql_cat';
  TARGET_CAT              CONSTANT VARCHAR2(128) := 'target_category';


  -- -------------------------------------------------------------------------
  --                      error number constants
  -- -------------------------------------------------------------------------
  ERR_DIMNAME_TOO_LONG CONSTANT NUMBER := -13720;
  ERR_DIMNAME_INVALID  CONSTANT NUMBER := -13721;

  -- str_to_ascii converts a string in the DB language and character set to 
  -- ASCII8 that is safe to use in XML and XMLCDATA elements. Special 
  -- characters are masked based on UTF16 standard of \xxxx using asciistr 
  -- SQL function.
  FUNCTION str_to_ascii(s IN VARCHAR) RETURN VARCHAR;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getSqlText
  --   Returns the text and command of a SQL (if found) as an XML document
  --   Data sources are gv$sql and AWR.
  --   In case of a PDB: if AWR snapshots are taken regularly in the PDB,
  --   The AWR data source is AWR_PDB_SQLTEXT, otherwise it is AWR_ROOT_SQLTEXT
  --
  --   Arguments:
  --     p_dbid: the dbid to use.
  --             NULL: fetch from the local RDBMS
  --             If it matches the local dbid or con_dbid, fetch local.
  --             Otherwise, assume imported snapshots.
  --     p_sql_ids: the sql_id values we are interested in.
  --             Option 1: comma separated list with no spaces.
  --             Option 2: XML document in the format 
  --     <sqlid><m v="1q1spprb9m55h"></m><m v="a2k1zqcbp5nxf"></m></sqlid>
  -- 
  --   Returns: XML document containing the data, only of SQL that were found.
  --   The format is:
  --    <sqlid>
  --          <m v="1q1spprb9m55h" op="SELECT">
  --             <![CDATA[WITH MONITOR_DATA AS (SELECT INST_ID, KE ... ]]></m>
  --          <m v="a2k1zqcbp5nxf" op="INSERT">
  --             <![CDATA[insert into foo values(2)]]></m>
  --    </sqlid>
  --
  --   SQL text is truncated to size ASH_DEF_SQLTEXT_LEN, same as data APIs
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getSqlText(p_dbid IN NUMBER, p_sql_ids IN VARCHAR)
  RETURN   XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- fetch_sqltext
  --   Returns the text and command of a SQL (if found) as an XML element
  -- Subject to time limit: may return NULL if no time is left.
  -- p_sqlid: the SQL ID of the SQL we are looking for.
  -- p_is_local: 'y' if the data is from the RDBMS we query
  --             any other value indicates an imported AWR snapshot
  -- p_dbid and p_con_dbid: In case of a local (p_is_local = 'y') p_dbid
  --             is the root's dbid and p_con_dbid is the PDB's dbid.
  --             p_con_dbid can be null in case of old RDBMS, or in case
  --             we are using a standard RDBMS.
  -- p_is_pdb: 'y' if it is a local query from inside a pdb.
  -- p_is_old: 'y' if the local RDBMS version is 12.1 or below. 
  -- p_time_limit: 'y' if we are bound by time limit. 
  --
  -- We may search for SQL text in various places depending on the situation.
  -- Order of search for the various cases:
  --  A. Local snapshots, Non-CDB RDBMS or Root of CDB:
  --     1. V$SQL using sql_id
  --     2. DBA_HIST_SQLTEXT using sql_id, root_dbid
  --  
  --  B. Local snapshots, inside PDB, version 12.1 (is_old)
  --     1. V$SQL using sql_id
  --     2. DBA_HIST_SQLTEXT using sql_id, root_dbid
  --  
  --  C. Local snapshots, inside PDB, versions 12.2 and above
  --     1. V$SQL using sql_id
  --     2. AWR_PDB_SQLTEXT using sql_id, con_dbid
  --     3. AWR_ROOT_SQLTEXT using sql_id, root_dbid
  --  
  --  D. Imported snapshots, standard RDBMS or Root of CDB
  --     1. DBA_HIST_SQLTEXT using sql_id, dbid
  --  
  --  E. Imported snapshots, PDB versions 12.2 and onwards
  --     1. AWR_PDB_SQLTEXT using sql_id, dbid
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION fetch_sqltext(p_sqlid IN VARCHAR, 
                         p_dbid IN NUMBER, p_con_dbid IN NUMBER,
                         p_is_local IN VARCHAR, p_is_pdb IN VARCHAR,
                         p_is_old IN VARCHAR, p_time_limit IN VARCHAR)
  RETURN   XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- fetch_obj_name
  --   Returns the fully qualified name of an object and its type as 
  --   an XML element.
  -- Subject to time limit: may return NULL if no time is left.
  -- p_obj_id: the ID of the segment.
  -- p_is_local: 'y' if the data is from the RDBMS we query
  --             any other value indicates an imported AWR snapshot
  -- p_dbid and p_con_dbid: In case of a local (p_is_local = 'y') p_dbid
  --             is the root's dbid and p_con_dbid is the PDB's dbid.
  --             p_con_dbid can be null in case of old RDBMS, or in case
  --             we are using a standard RDBMS.
  -- p_local_type: determines what RDBMS we are connected to (local)
  --               'ROOT' is the root of a CDB
  --               'PDB' is a PDB
  --               All other values mean a standard RDBMS
  -- p_is_old: 'y' if the local RDBMS version is 12.1 or below. 
  -- p_time_limit: 'y' if we are bound by time limit. 
  --
  -- We may search for object in various places depending on the situation.
  --   a. Dictionary (dba_objects or cdb_objects)
  --      dba_objects is moderately expensive to try
  --      cdb_objects is very expensive: should be tried last
  --   b. Local AWR using awr_pdb_seg_stat_obj or dba_hist_seg_stat_obj
  --      This is the cheapest data source. Should be tried first.
  --   c. Object link AWR (when inside a PDB and looking at the root).
  --      View is awr_root_seg_stat_obj (versions 12.2 and above) or 
  --      dba_hist_seg_stat_obj in version 12.1
  --      This is very expensive.
  -- 
  -- Order of search for the various cases:
  --
  -- A. Local snapshots, CDB Root:
  --    1. AWR(obj_id, root_dbid, con_dbid, 'DBA_HIST')
  --    2. Dictionary(obj_id, con_dbid, CDB_OBJECTS)
  -- 
  -- B. Local Snapshots, Non-CDB DB:
  --    1. AWR(obj_id, dbid, 'DBA_HIST')
  --    2. Dictionary(obj_id, DBA_OBJECTS)
  -- 
  -- C. Local Snapshots, PDB versions 12.1 and below
  --    1. Dictionary(obj_id, DBA_OBJECTS)
  --    2. AWR(obj_id, root_dbid, con_dbid, 'DBA_HIST')
  -- 
  -- D. Local snapshots, PDB versions 12.2 and above
  --    1. AWR(obj_id, con_dbid, 'AWR_PDB')
  --    2. Dictionary(obj_id, DBA_OBJECTS)
  --    3. AWR(obj_id, root_dbid, con_dbid, 'AWR_ROOT')
  -- 
  -- E. Imported snapshots, CDB root, standard RDBMS
  --    1. AWR(obj_id, dbid, con_dbid, 'DBA_HIST')
  -- 
  -- F. Imported Snapshots, inside PDB
  --    1. AWR(obj_id, dbid, con_dbid, 'AWR_PDB')
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION fetch_obj_name(p_obj_id IN NUMBER, p_dbid IN NUMBER,
                          p_con_dbid IN NUMBER, p_is_local IN VARCHAR,
                          p_local_type IN VARCHAR, p_is_old IN VARCHAR,
                          p_time_limit IN VARCHAR)
  RETURN   XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- fetch_procedure_name 
  --   Returns the full qualified PL/SQL procedure name (if found) 
  -- Subject to time limit: may return NULL if no time is left.
  -- This can only run on local DB - looks at dictionary
  -- ROOT: look at cdb_procedures
  -- PDB/standard DB: look at dba_procedures
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION fetch_procedure_name(p_obj_id IN NUMBER, p_subobj_id IN NUMBER,
                                p_con_dbid IN NUMBER, p_time_limit IN VARCHAR)
  RETURN VARCHAR;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- fetch_user_name 
  --   Returns the user name (if found) 
  -- Subject to time limit: may return NULL if no time is left.
  -- This can only run on local DB - looks at dictionary
  -- ROOT: look at cdb_users
  -- PDB/standard DB: look at dba_users
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION fetch_user_name(p_user_id IN NUMBER, p_con_dbid IN NUMBER,
                           p_time_limit IN VARCHAR)
  RETURN VARCHAR;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getVersion
  --   Returns the version of the package
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getVersion RETURN VARCHAR;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getData
  --   Single API for all other public functions in this package.
  --  "data_type"
  --     specifies which type of API is to be used. accepted values are:
  --     "data" for ASH data
  --     "timepicker" for the Time Picker graph
  --     "histogram" for a filtered time picker.
  --     "version" for getting just the package version
  --     "awr" for getting AWR info
  --     "cpu" for getting CPU info
  --     If an invalid value is given, ORA-20001 is raised as an error.
  --  "time_type"
  --     specifies how the time period is to be interpreted.
  --     (useful only for "data", "timepicker", "histogram") APIs
  --     "realtime" for all Real Time interfaces (from some time in the past to NOW)
  --     "incremental" for an increment over real time (bucket size must be defined)
  --     "historical" for a longer time period or a time period in the past (two time stamps)
  --     If an invalid value is given, ORA-20002 is raised as an error.
  --  "filter_list"
  --     is the filter used in the same way as the original package
  --  "args"
  --     contains the rest of the arguments in XML format.
  --     The xml format is as follows (example containing all valid arguments)
  --     If a mandatory argument is missing, ORA-20003 is raised as an error.
  --  
  --    <args>
  --       <dbid>87658765</dbid>
  --       <instance_number>1</instance_number>
  --       <time_since_sec>3600</time_since_sec>
  --       <begin_time_utc>07/23/2018 10:20:00</begin_time_utc>
  --       <end_time_utc>07/24/2018 08:30:00</end_time_utc>
  --       <bucket_size>30</bucket_size>
  --       <show_sql>n</show_sql>
  --       <verbose_xml>n</verbose_xml>
  --       <include_bg>n</include_bg>
  --       <minimize_cost>n</minimize_cost>
  --       <awr_info>n</awr_info>
  --       <resolution>medium</resolution>
  --    </args>
  --  
  --     Arguments that are not needed or that you wish to use the default values for,
  --     do not need to be specified in the XML doc.
  --     
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getData(data_type   VARCHAR2,
                   time_type   VARCHAR2,
                   filter_list VARCHAR2,
                   args        VARCHAR2
  ) RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getAWRInfo
  --   Returns information about AWR snapshots and ADDM Tasks
  --   p_dbid - specifies which dbid to look for
  --   p_inst_num - specifies the instance (if we want just one instance),
  --                use the default of NULL to get info on all instances
  --   p_begin_time_utc, p_end_time_utc - the time interval for information
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getAWRInfo(p_dbid IN NUMBER, p_begin_time_utc IN VARCHAR2,
                      p_end_time_utc IN VARCHAR2, p_inst_num IN NUMBER := NULL)
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getCPUInfo
  --   Returns information about available CPUs in XML format at a single 
  --   point in time.
  --    - dbid : specifies which db to look for, default (NULL) is DB we are
  --             connected to.
  --    - observationTime : approximate time in which to look for data.
  --             default (NULL) is the latest possible data available 
  --             (NOW if possible).
  --    - ignore_cpu_history : when 'y', the API will not search for CPU 
  --             history from AWR
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getCPUInfo(dbid IN NUMBER := NULL,
                      observationTime IN VARCHAR := NULL,
                      instance_number IN NUMBER := NULL,
                      ignore_cpu_history IN VARCHAR := 'n')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getTimePickerRealTime
  --   Returns the time picker data for Real Time usage in XML format
  --   Time period is from NOW-time_since_sec to NOW. 
  --   The default time period is the last hour. 
  -- data is for the entire database (all instances) we are currently connected
  -- to, foreground only, and in case we connect to a PDB - limited to that PDB.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  --
  --  The format of the output is (example):
  --  <report> <report_parameters> ... </report_parameters>
  --  <awr_snaps>
  --    <snap snap_id="15" snap_time="11/27/2018 11:00:54" cnt_inst="1" 
  --          task_id="34" owner="SYS" task_name="ADDM:1997586511_1_15" 
  --          fdg_count="0"/>
  --    <snap snap_id="16" snap_time="11/27/2018 12:00:08" cnt_inst="1" 
  --          task_id="35" owner="SYS" task_name="ADDM:1997586511_1_16" 
  --          fdg_count="0"/>
  --  </awr_snaps>
  --  </report>
  --  An xml element of the list of snapshots is included as an option
  --  for all time picker reports over local data.
  --  The element's meaning:
  --  1. snap_id : the id of the AWR snapshot
  --  2. snap_time: the timestamp of the end time of the snapshot (i.e.
  --                the time the snapshot was taken). In case of RAC, it
  --                is the average time across all instances for the same
  --                snapshot
  --  3. cnt_inst: number of instances participating in the snapshot.
  --               In case the API is at instance level, this will be 
  --               always '1' even in RAC
  --  4. task_id: The id of the automatically generated ADDM task 
  --              associated with the snapshot (we choose the minimal 
  --              task_id in case there is more than one)
  --  5. owner, task_name: Another way to identify a task in advisor 
  --              framework. 
  --  6. fdg_count: the number of findings in the ADDM task
  --  7. resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getTimePickerRealTime(
      time_since_sec IN NUMBER := 3600
    , show_sql       IN VARCHAR2 := 'n'
    , verbose_xml    IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , awr_info IN VARCHAR2 := 'n'
    , ignore_cpu_history IN VARCHAR := 'n'
    , resolution      IN VARCHAR2 := 'medium'
    , include_bg      IN VARCHAR2 := 'n')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- incrementTimePicker
  --   Returns the time picker data for Real Time usage in XML format
  --   This function is used to get incremental data after the initial load.
  --   Incremental use case is only for Real Time.
  --   Time period is from begin_time_utc to NOW.
  --   There is no default time period. 
  --   The time is bucketized using bucket_size (in seconds). 
  --   The bucket boundaries are: 
  --     begin_time_utc, begin_time_utc+bucket_size, 
  --     begin_time_utc+2*bucket_size etc.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION incrementTimePicker(
      begin_time_utc IN VARCHAR2 
    , bucket_size    IN NUMBER 
    , show_sql       IN VARCHAR2 := 'n'
    , verbose_xml    IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , awr_info IN VARCHAR2 := 'n'
    , resolution      IN VARCHAR2 := 'medium'
    , include_bg      IN VARCHAR2 := 'n')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getTimePickerHistorical
  --   Returns the time picker data for historical in an XML format
  --   dbid determines which RDBMS to look for, default is the one we are
  --     connected to.
  --   Time period is one of the following
  --     a) From begin_time_utc to end_time_utc if both are specified.
  --     b) From NOW-time_since_sec to NOW (default is 24 hours)
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getTimePickerHistorical(
      dbid           IN NUMBER := NULL
    , begin_time_utc IN VARCHAR2 := NULL
    , end_time_utc   IN VARCHAR2 := NULL
    , time_since_sec IN NUMBER := 86400
    , show_sql       IN VARCHAR2 := 'n'
    , verbose_xml    IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , awr_info       IN VARCHAR2 := 'n'
    , ignore_cpu_history IN VARCHAR := 'n'
    , resolution      IN VARCHAR2 := 'medium'
    , include_bg      IN VARCHAR2 := 'n')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getHistogramRealTime
  --   Returns the ASH histogram for Real Time Usage in an XML format
  --   Time period is from NOW-time_since_sec to NOW (default is one hour)
  --   The data can be filtered using the filter list.
  -- data is for the entire database (all instances) we are currently connected
  -- to, foreground only, and in case we connect to a PDB - limited to that PDB.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getHistogramRealTime(
      filter_list     IN VARCHAR2 := NULL
    , time_since_sec  IN NUMBER := 3600
    , show_sql        IN VARCHAR2 := 'n'
    , verbose_xml     IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , ignore_cpu_history IN VARCHAR := 'n'
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- incrementHistogram
  --   Returns the ASH histogram for Real Time usage in XML format
  --   This function is used to get incremental data after the initial load.
  --   Incremental use case is only for Real Time.
  --   Time period is from begin_time_utc to NOW.
  --   There is no default time period. 
  --   The time is bucketized using bucket_size (in seconds). 
  --   The bucket boundaries are: 
  --     begin_time_utc, begin_time_utc+bucket_size, 
  --     begin_time_utc+2*bucket_size etc.
  --   The data can be filtered using the filter list.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION incrementHistogram(
      filter_list    IN VARCHAR2 := NULL
    , begin_time_utc IN VARCHAR2
    , bucket_size    IN NUMBER
    , show_sql       IN VARCHAR2 := 'n'
    , verbose_xml    IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getHistogramHistorical
  --   Returns the ASH histogram for historical usage in an XML format
  --   dbid determines which RDBMS to look for, default is the one we are
  --     connected to.
  --   Time period is one of the following
  --     a) From begin_time_utc to end_time_utc if both are specified.
  --     b) From NOW-time_since_sec to NOW (default is 24 hours)
  --   The data can be filtered using the filter list.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getHistogramHistorical(
      dbid            IN NUMBER := NULL
    , filter_list     IN VARCHAR2 := NULL
    , begin_time_utc  IN VARCHAR2 := NULL
    , end_time_utc    IN VARCHAR2 := NULL
    , time_since_sec  IN NUMBER := 86400
    , show_sql        IN VARCHAR2 := 'n'
    , verbose_xml     IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , ignore_cpu_history IN VARCHAR := 'n'
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getDataRealTime
  --   Returns the ASH data for Real Time Usage in an XML format
  --   Time period is from NOW-time_since_sec to NOW (default is one hour)
  --   The data can be filtered using the filter list.
  -- data is for the entire database (all instances) we are currently connected
  -- to, foreground only, and in case we connect to a PDB - limited to that PDB.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- Minimize_cost: If set to 'y', 
  --                a. the time budget for additional information is 0.
  --                b. on disk data is disabled
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getDataRealTime(
      filter_list     IN VARCHAR2 := NULL
    , time_since_sec  IN NUMBER := 3600
    , show_sql        IN VARCHAR2 := 'n'
    , verbose_xml     IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , minimize_cost   IN VARCHAR2 := 'n'
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- incrementData
  --   Returns the ASH data for Real Time usage in XML format
  --   This function is used to get incremental data after the initial load.
  --   Incremental use case is only for Real Time.
  --   Time period is from begin_time_utc to NOW.
  --   There is no default time period. 
  --   The time is bucketized using bucket_size (in seconds). 
  --   The bucket boundaries are: 
  --     begin_time_utc, begin_time_utc+bucket_size, 
  --     begin_time_utc+2*bucket_size etc.
  --   The data can be filtered using the filter list.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- Minimize_cost: If set to 'y', 
  --                the time budget for additional information is 0.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION incrementData(
      filter_list    IN VARCHAR2 := NULL
    , begin_time_utc IN VARCHAR2
    , bucket_size    IN NUMBER
    , show_sql       IN VARCHAR2 := 'n'
    , verbose_xml    IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , minimize_cost   IN VARCHAR2 := 'n'
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;

  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  -- getDataHistorical
  --   Returns the ASH data for historical usage in an XML format
  --   dbid determines which RDBMS to look for, default is the one we are
  --     connected to.
  --   Time period is one of the following
  --     a) From begin_time_utc to end_time_utc if both are specified.
  --     b) From NOW-time_since_sec to NOW (default is 24 hours)
  --   The data can be filtered using the filter list.
  -- If instance_number is NULL it means all instances. Otherwise, fetch 
  --  only data for the specified instance number.
  -- Minimize_cost: If set to 'y', 
  --                the time budget for additional information is 0.
  -- resolution: valid values are 'low', 'medium', 'high', 'max' (or 'all')
  --        "reolution" control the number of buckets and rows per bucket.
  -- ------------------------------------------------------------------------
  -- ------------------------------------------------------------------------
  FUNCTION getDataHistorical(
      dbid            IN NUMBER := NULL
    , filter_list     IN VARCHAR2 := NULL
    , begin_time_utc  IN VARCHAR2 := NULL
    , end_time_utc    IN VARCHAR2 := NULL
    , time_since_sec  IN NUMBER := 86400
    , show_sql        IN VARCHAR2 := 'n'
    , verbose_xml     IN VARCHAR2 := 'n'
    , include_bg      IN VARCHAR2 := 'n'
    , instance_number IN NUMBER := NULL
    , minimize_cost   IN VARCHAR2 := 'n'
    , resolution      IN VARCHAR2 := 'medium')
  RETURN XMLTYPE;


As the package body isn't wrapped at all, it's easy to look up implementation details there as well.
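
A quick sanity check after deployment (a sketch, assuming the package was created in the DBSNMP schema as described above - the result should presumably match the REPORT_INTERNAL_VERSION constant):

SELECT dbsnmp.omc_ash_viewer.getVersion() FROM dual;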

2023-01-07

cman - the misleading error message

In my last post I showed how a human error and a misleading log entry led to a waste of resources and time in identifying (and fixing) the problem. 

But that's not all: Oracle Connection Manager (cman) provides additional sources of joy and time-wasting error analysis by leading in the wrong direction.

In this case, the cman (21.3) is working fine. 

The database instance has a REMOTE_LISTENER configuration pointing to the cman. 

The service is registered in the cman (show services) and there is a proper rule for the service in place (show rules).
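
For reference, those checks look like this on the cman side (just a sketch, output omitted):

CMCTL> administer cman
CMCTL:cman> show services
CMCTL:cman> show rules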

Still, when connecting with sqlplus, an unexpected error occurs: 


sqlplus -L x/x@<MY_ALIAS>

SQL*Plus: Release 19.0.0.0.0 - Production on Do Jan 6 21:40:04 2023
Version 19.14.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

ERROR:
TNS-12529: TNS:connect request rejected based on current filtering rules


But even after checking the services and rules again, there is no visible error. 
To cut things short (and to avoid moaning about lost hours of investigation), the problem was located in some unexpected configuration: 

The database instance parameter LOCAL_LISTENER was set to 
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=REGISTER_listener)))

This made total sense as long as no cman was in use: the instance registered with its dedicated listener, and the services were then available for clients to connect to. 

http://wikiality.wikia.com/wiki/File:Bridge-to-nowhere.jpg


But in my case, the RDBMS instance told the cman that it can handle the service MY_SERVICE (which is used in MY_ALIAS) by redirecting all connections to ipc:REGISTER_listener. Of course this listener is not available on the cman host, and so the gateway process fails miserably. 

Once the problem is understood, the solution is simple: changing the database instance parameter to 

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=REGISTER_listener))),instance_host:1521

did the trick - and sqlplus was happy (at least as happy as an ORA-1017 can make it).
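
For reference, the change boils down to something like this (a sketch; instance_host stands for the host from above):

ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=REGISTER_listener))),instance_host:1521' SCOPE=BOTH;
-- make the instance re-register its services immediately
ALTER SYSTEM REGISTER;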
Quite simple, but the error message 
TNS-12529: TNS:connect request rejected based on current filtering rules
didn't match the actual problem, at least not by my standards. 
I'd prefer a statement like 
gateway couldn't establish connection for service MY_SERVICE to (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=REGISTER_listener))) 
Maybe a future release can do that trick?

2023-01-06

cman - the misleading log entry

In a recent Oracle Connection Manager (cman) installation (version 21.3) I tried to configure a minimal ruleset and test the cman instance before letting database instances register their services. 

So a simple list of rules at that stage should allow administration of the cmon service (that's the cman itself) from the local machine, but reject all other attempts regardless of their source or destination. 
Later on, I planned to add all the dedicated services, their allowed source machines and so on. 


  (rule_list=
    (rule=(src=<CMAN_HOST>)(dst=127.0.0.1)(srv=cmon)(act=accept))
    (rule=(src=*)(dst=*)(srv=*)(act=deny))
  )  


But when I tried to start the cman, it threw a nasty TNS-04012: Unable to start Oracle Connection Manager instance:

CMCTL> administer cman
Current instance cman is not yet started
Connections refer to (DESCRIPTION=(address=(protocol=tcp)(host=<CMAN_HOST>)(port=1521))).
The command completed successfully.
CMCTL:cman> startup
TNS-04012: Unable to start Oracle Connection Manager instance.
CMCTL:cman>
A short glimpse into the logfile showed one matching line 

(LOG_RECORD=(TIMESTAMP=06-JAN-2023 18:40:01)(EVENT=CMAN.ORA contains no rule for local CMCTL connection)(Add (rule=(src=<CMAN_HOST>)(dst=127.0.0.1)(srv=cmon)(act=accept)) in rule_list)

 
This is suspicious, as exactly that rule already exists in the rule list. Adding this rule several more times, or with other patterns like ::1 for localhost, didn't help either. 
Only changing the last rule to act=accept let me start the cman - fine for making some progress, but not acceptable from a security perspective. 

I was so desperately fixated on the rule for accepting the cmon service that I failed to solve this problem myself - so I asked for help: 

The solution is as simple as I was blind: deny is not a valid keyword for a rule's action; the correct keywords are accept, reject and drop.

So the simple solution is to change the configuration to

  (rule_list=
    (rule=(src=<CMAN_HOST>)(dst=127.0.0.1)(srv=cmon)(act=accept))
    (rule=(src=*)(dst=*)(srv=*)(act=drop))
  )  


And I could easily start the cman.
The problem obviously existed between keyboard and chair. 
I just wish the error message had been a little bit more helpful. 
Maybe in a future release the logfile could show 
rule (rule=(src=*)(dst=*)(srv=*)(act=deny)) can not be parsed at keyword  deny 
Let's hope for the best. 
 

2022-12-29

AHF Insights

Oracle recently released a new version of its Autonomous Health Framework (AHF). At the time of writing the version is 22.3.1. 
As development there is quite active, one of the new features is AHF Insights, which comes with a nice description: 

AHF Insights provides deeper diagnostic insights into Oracle diagnostic collections collected by AHF diagnostic utilities, Oracle EXAchk, Oracle Trace File Analyzer, Exawatcher, and Cluster Health Monitor.

This sounds promising, so I was tempted to give it a try! 
My first attempt failed: I gave it a chance in the sandbox on my laptop, but right now AHF Insights only works on Engineered Systems. I didn't find this documented anywhere - but ok. The good news: I was told AHF Insights will also be available for non-Engineered environments in future versions. As this is the first release, I can follow the argument here. 

So I had to grab an Engineered System machine, install AHF, let it run for some time and then just run 

ahf analysis create --type insights
It can also be executed for a specific time in the past (with --last n{m|h}) or some range (--from DATETIME --to DATETIME) - but as I was not searching for a specific issue, I took the default of 2 hours.
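
For example (a sketch - I haven't verified the exact DATETIME syntax, so check the ahf help output):

ahf analysis create --type insights --last 4h
ahf analysis create --type insights --from "2022-12-28 08:00:00" --to "2022-12-28 12:00:00"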
The execution took some minutes, but it was still faster than an EXAchk.
The zipfile created has a filename format like <hostname>_insights_YYYY_MM_DD_HH24_MI_SS - it contains one directory web with an index.html and a lot of subfolders:
.
└── web
    ├── css
    │   ├── alta
    │   │   └── 11.1.1
    │   │       └── web
    │   │           ├── fonts
    │   │           └── images
    │   │               └── sprites
    │   ├── fonts
    │   ├── images
    │   └── redwood
    │       └── 11.1.1
    │           ├── common
    │           └── web
    │               ├── fonts
    │               └── images
    ├── dynamicHtml
    ├── icons
    ├── js
    │   ├── jsons
    │   │   ├── insights
    │   │   ├── meta
    │   │   └── topology
    │   ├── libs
    │   │   ├── oj
    │   │   │   └── v11.1.1
    │   │   │       └── resources
    │   │   │           └── root
    │   │   └── require
    │   ├── viewModels
    │   └── views
    │       ├── compliance
    │       │   └── js
    │       │       └── input
    │       ├── dbParameter
    │       │   └── js
    │       │       └── input
    │       ├── ganttPage
    │       │   └── js
    │       │       └── input
    │       ├── os
    │       │   └── js
    │       │       └── input
    │       └── rpm
    │           └── js
    │               └── input
    └── log


The welcome.html is all you need to open. In my case, the first page contains some basic information about 
  • the AHF version 
  • the time range it covers
  • System Topology
  • Insights it found



The Cluster information contains basic details about the cluster itself, its resources and the ASM disks. 
The summary information has a "Copy as text" button which generates a nice output: 

========================================================================================================================
Area                           | Dimension                      | Value
========================================================================================================================
System                         | Type                           | Exadata
System                         | Total Node Count               | 2
Cluster                        | Node Count                     | 1
Cluster                        | GI Version                     | 19.17.0.0.0
Cluster                        | Timezone                       | Europe/
Cluster                        | Cluster Name                   | 
Cluster                        | CRS Home                       | /u01/app/19.0.0.0/grid
Databases                      | Count                          | 1
Databases                      | Database Home                  | /u02/
Database Servers               | Count                          | 2
Database Servers               | Hardware Model                 | Standard PC (Q35 + ICH9, 2009)
Database Servers               | Image Version                  | 21.2.13.0.0.220602
Database Servers               | Operating System               | Linux x86_64
Database Servers               | Operating System Version       | 4.14.35-2047.511.5.5.1.el7uek.x86_64
========================================================================================================================
Unfortunately this feature isn't present on all the pages. 
The Cluster Resources and ASM disks just contain information about all OCR resources and the ASM diskgroups. 

The Databases Topology information provides some basic information about the CDBs and their PDBs, together with some very basic instance information like cpu_count, sga_target, pga_target, pga_limit and processes.

In the Insights section, the subsections with findings worth checking are highlighted in red. On my system, there are 4 such sections. I will show some of them as examples: 


The Timeline gives a graph of all relevant events; in my case there are only 2 types of events on one hostname. Below it there are filters which can be used if many events appear on the timeline. Also a list with Timestamp, Type, Event, Hostname and Description of each (filtered) event is shown. 



If you move the mouse to the top right of the chart, a list of icons becomes visible for interacting with the graph.
This works in all charts in Insights.



On the Operating System Issues page, a lot of detailed information about CPU, Memory, IO, Network, and Processes is shown, both as charts in the Metrics section and as grouped details in the Reports section.  
A nice detail in Process Aggregation is the aggregation per instance - so I can check the aggregated memory, processes, etc. for each instance. 


In the Best Practices section, a list of all the checks, their findings and recommendations is given. 
I am always careful with "Best Practices" - in this particular case the underscore parameters were given by Oracle Support in a specific service request. 
But of course, you can treat them as recommendations and either follow them or at least document the specific reason you did NOT follow them for a given purpose. 



In the last section, Recommended Software, you can see that the software is still at an early stage. 
For me, 19.17.0.0.0 is close enough to 19.17.0.0.221018.
Maybe in a future release, Oracle will agree ;) 

All in all, I can highly recommend using AHF Insights whenever you need to investigate an issue which might have a wider scope than a single SQL or instance. Even as a first release, it can provide a lot of useful information in a nice and well-structured report. 

2022-12-22

OEM 13.5 - add cman target with emcli

 Yesterday I had some fun adding a cman to Oracle Enterprise Manager. 

Using the web frontend is fine for a single target, but it doesn't scale well. Fortunately, OEM also has a very fine command line utility: emcli

emcli has a huge number of parameters; in my case I'm interested in emcli add_target. This verb has basically 4 parameters: -name, -type, -host and -properties. name and host are kind of obvious, and type I had already identified as oracle_cman, but the properties were something of a mystery. I did not find the list of properties anywhere in the documentation, and Google didn't help either. 

So I mixed find and grep together in some random ways until I found the file 

 ./middleware/plugins/oracle.sysman.db.oms.plugin_13.5.1.0.0/metadata/targetType/oracle_cman.xml
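
Roughly along these lines (a sketch of the approach, not my exact commands):

cd <OMS_HOME>
find ./middleware/plugins -name "*.xml" | xargs grep -il oracle_cman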

Besides some other information (e.g. the names of the Perl scripts which test the target) there is also a list of all properties:


  <InstanceProperties>
     <InstanceProperty NAME="Machine" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_Machine_iprop">Machine Name</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="Port" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_Port_iprop">Port Number</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="Protocol" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_Protocol_iprop">Connection Protocol</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="CManOraDir" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_CmanOraDir_iprop">cman.ora Directory</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="CManName" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_CmanName_iprop">Connection Manager Name</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="OracleHome" CREDENTIAL="FALSE" OPTIONAL="FALSE">
       <Display>
         <Label NLSID="oracle_cman_OracleHome_iprop">Oracle Home</Label>
       </Display>
     </InstanceProperty>
     <InstanceProperty NAME="Password" CREDENTIAL="TRUE" OPTIONAL="TRUE">
       <Display>
         <Label NLSID="oracle_cman_OracleHome_iprop">Connection Manager Password</Label>
       </Display>
     </InstanceProperty>


But all this effort would not have been necessary 😁  

During my tests I hit some issues, and so discovered that

emcli add_target -name=cman_xxx -type=oracle_cman -host="xxx.yyy.zzz"

threw a proper error:

Error: Required properties are missing: Machine, Port, Protocol, CManOraDir, CManName, OracleHome


This syntax worked for me:


emcli add_target -name=cman_xxx -type=oracle_cman \
  -host="xxx.yyy.zzz" \
  -properties="Machine:xxx.yyy.zzz;Port:1521;Protocol:TCP;CManOraDir:<tns_admin>;CManName:cman_xxx;OracleHome:<oracle_home>;Password:đŸ»đŸ»đŸ»"

My next goal was to test whether these values are stored correctly.
For this I stumbled upon the note 

Emcli Command To Get Target Properties Information Like ORACLE HOME (Doc ID 2329892.1)


emcli list -resource="TargetProperties" -search="TARGET_NAME ='cman_xxx'" -column="PROPERTY_NAME,PROPERTY_VALUE" -script 


PROPERTY_NAME      PROPERTY_VALUE
CManName           cman_xxx
CManOraDir         <TNS_ADMIN>
Machine            xxx.yyy.zzz
OracleHome         <ORACLE_HOME>
Port               1521
Protocol           TCP
orcl_gtp_os        Linux
orcl_gtp_platform  x86_64

Unfortunately, the Password isn't shown here and cannot be checked with this method.

2022-12-21

OEM 13.5 - add cman target

Right now I have to add some connection manager instances to Oracle Enterprise Manager (13.5.0.10). I did not find any automatic discovery, so I had to enter the required values manually. But not all of these values are quite clear (to me), and all the documentation I found was ... of little use. 

Luckily I stumbled across $ORACLE_HOME/plugins/oracle.sysman.db.agent.plugin_*/scripts/cmanresp.pl
This script seems to check if cman is available at all. To understand the meaning of the properties, the code is quite useful. Here are some relevant parts: 

my $oracleHome = $ENV{CMAN_ORACLE_HOME};
my $executable = $ENV{CMAN_ORACLE_HOME} . "/bin/cmctl";
my $name = $ENV{CMAN_NAME};
my $cmanOraDir = $ENV{CMAN_ORA_DIR};
my $machine = $ENV{CMAN_MACHINE};
my $protocol = $ENV{CMAN_PROTOCOL};
my $port = $ENV{CMAN_PORT};
my $password = $ENV{CMAN_PASSWORD};

#set default protocol
if(!defined($protocol) or  $protocol eq "")
{
  $protocol = "TCP";
}

#Set environment variable ORACLE_HOME
$ENV{ORACLE_HOME} = $oracleHome;

my $address = "(ADDRESS=(PROTOCOL=$protocol)(HOST=$machine)(PORT=$port))";

my $responseTime;

#----------------------------------------------
#Execute tnsping and see if cman is reachable.
#-----------------------------------------------
#check tnsping availability
if ( !-x "$oracleHome/bin/tnsping" )
 {
  print "em_result=|0|tnsping not found in $oracleHome/bin\n";
  exit;
 }     

Connection Protocol, Machine Name and Port Number should exactly match the values in cman's connection string - they are combined into (ADDRESS=(PROTOCOL=$protocol)(HOST=$machine)(PORT=$port)).
The default Protocol is TCP - as it's a required field, this is the value to enter there. 
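
So the agent's reachability check can be reproduced manually (a sketch with placeholder values):

tnsping "(ADDRESS=(PROTOCOL=TCP)(HOST=xxx.yyy.zzz)(PORT=1521))"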

Another question I had raised (and could not answer before) was whether and how the Password is used. In this script it's plainly visible:

#Command to get the cman status
if($password eq "")
{
  $command = "$executable show status -c $name";
}
else
{
  $command = "$executable show status -c $name -p $password";
}

If no password is given, cmctl is used without the -p parameter. 
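
The manual equivalent looks something like this (a sketch with placeholder values):

export ORACLE_HOME=<oracle_home>
$ORACLE_HOME/bin/cmctl show status -c cman_xxx
# or, if a password is configured:
$ORACLE_HOME/bin/cmctl show status -c cman_xxx -p <password>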

The properties Oracle Home (ORACLE_HOME) and cman.ora directory (TNS_ADMIN) are obvious, even for me. 


That's a short post - at least it serves as documentation-to-self.

2022-12-03

from 200 active sessions down to 2 - no silver bullets

Recently I had some fun with an Oracle Enterprise Manager Cloud Control installation (13.5.0.10). The repository is 19.15 in an MAA Gold configuration: RAC and Data Guard; the OEM side has 2 WLS nodes with a load balancer in front.  

The system was not available for some hours (nothing to blame on the architecture, and nothing I can discuss here). 

After everything was working again, the Average Active Sessions on the repository were quite high: around 200. It was clear there was a lot of work to be processed, all the events coming in from the agents to be collected, aggregated and so on. Still that load was something to investigate and improve - if possible. 
The first thing I focused on were the high cluster-related waits (all the gc ... events) together with buffer busy waits. I picked the top SQL contributing to these waits: SQL_ID 79rrazqdmaq8h
INSERT INTO EM_METRIC_KEYS (METRIC_KEY_ID, KEY_PART_1, KEY_PART_2, KEY_PART_3, KEY_PART_4, KEY_PART_5, KEY_PART_6, KEY_PART_7, METRIC_KEY_GUID, CREATION_DATE)
  VALUES (EM_METRIC_KEY_ID_SEQ.NEXTVAL, :B1 , :B2 , :B3 , :B4 , :B5 , :B6 , :B7 , :B8 , SYSDATE)
  RETURNING METRIC_KEY_ID INTO :O0
At a first glance, the sequence might be a source of such problems. But as the version is already greater than 19.10, the dynamic sequence cache should mitigate this issue (and _dynamic_sequence_cache was not set). So I investigated which objects could be affected at all (e.g. via v$active_session_history.CURRENT_OBJ# for the specific SQL_ID and wait event, or via v$session using p1 (file#) and p2 (block#) and then identifying the object in DBA_EXTENTS). In this case it's still sequence related, but not SEQ$: the majority of these waits were on the index on EM_METRIC_KEYS(METRIC_KEY_ID). The solution is still the same: increase the CACHE of the sequence to a value where each involved instance can work in its own index block after proper index block splits (in my case a CACHE size of 5000 did the trick).
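
In SQL terms, the investigation and the fix looked roughly like this (a sketch - the exact queries and the CACHE value will differ per system):

-- which object do the waits for this SQL_ID point to?
SELECT current_obj#, event, COUNT(*)
FROM   v$active_session_history
WHERE  sql_id = '79rrazqdmaq8h'
AND    (event LIKE 'gc%' OR event = 'buffer busy waits')
GROUP  BY current_obj#, event
ORDER  BY COUNT(*) DESC;

-- the actual fix: a cache big enough that each instance works in its own
-- index block (schema assumed to be SYSMAN, like the other repository objects)
ALTER SEQUENCE sysman.em_metric_key_id_seq CACHE 5000;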
Some other INSERT statements also suffered from the same problem and could be improved in the same way. 
This helped to reduce the AAS down to ~80.

As these types of INSERTs disappeared from the top statements, the next one to focus on was an UPDATE: 
SQL_ID: 7hyx2gndnw242
UPDATE EM_EVENT_BUS_QUEUES 
  SET LAST_DEQUEUE_TIME = MGMT_GLOBAL.SYSDATE_UTC 
  WHERE QNAME = :B1
Again a lot of cluster waits with some concurrency. The specific wait events looked quite familiar from the previous case: here several sessions are fighting for the same block to do an update there. As the consistency of the block needs to be communicated cluster-wide, and the block itself needs to travel back and forth, even Cache Fusion and modern Exadata hardware can only help so much. 
A quick check on the table EM_EVENT_BUS_QUEUES_E (the one behind the view EM_EVENT_BUS_QUEUES) showed there are only 89 rows in it (at that time, for this specific system). This led to the idea of giving each row its own block, so that each session gets direct access to the QUEUE name it is searching for (in the hope that the same QUEUE is not processed by different sessions at the same time). 
Doing so is simple: 
ALTER TABLE SYSMAN.EM_EVENT_BUS_QUEUES_E PCTFREE 99; 
ALTER TABLE SYSMAN.EM_EVENT_BUS_QUEUES_E MOVE ONLINE;
This wastes a little bit of space, but the contention on the blocks is eliminated. 
Again there were several statements and objects which could be treated with this method. 
AAS went down to ~35. 

The next top statement is slightly different: 
SQL_ID: 747d8puab2h4u
BEGIN 
  DBMS_APPLICATION_INFO.SET_ACTION(:1 ); 
  EM_LOADERJOB.dequeu_job(:2 ,:3 ,:4 ,:5 ,:6 ,:7 ); 
END;
The wait events are still cluster related. But PL/SQL itself rarely causes such events, and I was too lazy to dig into the dequeu_job procedure. But laziness pays off if it leads to a solution: in my case a simple Google search for 747d8puab2h4u led to a MOS document. The solution provided there did not help, but some more searching - this time within MOS - led me through a few more notes and at last to 
EM13c : Setting Up RAC Services In Repository Database (Doc ID 2693517.1)
Several tasks are explained there, but based on my specific issue, I focused on only one: the Loaderjob service.
To implement this specific suggestion, I created a dedicated service on the database with only 1 active instance (don't forget to create this service on the Data Guard standby as well, or your next switchover test will become a disaster). 
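
Creating such a service is straightforward (a sketch with placeholder names - and again: repeat it on the standby):

srvctl add service -db <db_unique_name> -service <prefix>loaderjob -preferred <instance_1>
srvctl start service -db <db_unique_name> -service <prefix>loaderjob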
Then this service was configured to the OMS:
emctl set property -name "oracle.sysman.core.pbs.gcloader.connectDescriptor" -value "\(DESCRIPTION=\(ADDRESS_LIST=\(ADDRESS=\(PROTOCOL=TCP\)\(HOST=<primary racinstance>\)\(PORT=1521\)\)\)\(CONNECT_DATA=\(SERVICE_NAME=...loaderjob\)\)\)"
 Of course, the connection string needs to contain both SCANs, in case of switchover tests ... 
This requires the OMS to restart.
After even one OMS was restarted, the AAS ... was still at ~25, but the cluster waits had mostly disappeared.

The last statement to improve was a SELECT: 
SQL_ID: dc3p6u6d0uvzn
select 1 
from "SYSMAN"."AQ$_EM_LOADERJOB_Q_TBL_L" 
where msgid = :1 
  and subscriber# = :2 
  and name = :3 
  and address# = :4
This SELECT is only ON CPU - at least no cluster waits, but still something worth improving, if possible. 
The important observation here was the number of rows in this table: at the time of investigation it was 1. At the same time, the segment used significantly more blocks than required for 1 row. So all this CPU time did just one thing: checking blocks in the buffer cache for whether they contain any rows. And these checks were done at such a high pace that they created the still significant load.
The solution for this last task was quite simple: 
alter table "SYSMAN"."AQ$_EM_LOADERJOB_Q_TBL_L" MOVE online;   
(shrink space might also have done the trick)
After that last step, the AAS was down to ~2.



These improvements in performance (or, maybe more important, the reduction in resource consumption) did take some time - and I did not show all the dead ends in between. 
There is also no silver bullet at all, just a simple focus on the top resource consumers. Of course basic knowledge about the system helps, as well as minimal skills in using a search engine (both the publicly available ones and the one on the software vendor's support page).  
In this specific case, the Tuning & Diagnostics pack was licensed. But all the information used here was also available in plain data dictionary and dynamic performance views, if required.