Showing posts with the label cpu_count. Show all posts

2022-11-20

dynamic parameter change when CPUs (dis)appear

In my last 2 blogposts I covered how Oracle instance parameters change based on cpu_count at startup, and also what the minimum requirement for SGA size is.

But there is more to discover! Modern systems can change the number of CPUs while running. This feature is quite popular in cloud environments, but it was in fact already available more than 20 years ago.

To see if this has any impact on my (21c) sandbox instance, I used a small "trick" in Linux:
as root, I can set a CPU offline (or back online) with

echo 0 > /sys/devices/system/cpu/cpu7/online

This CPU becomes invisible to processes - it also "disappears" from /proc/cpuinfo.
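The sysfs flag can also be read back before flipping it. A minimal sketch (my own helper, not from the original post) that reports a CPU's state; note that cpu0 often has no `online` file, because many kernels don't allow taking CPU 0 offline:

```shell
#!/bin/sh
# Report whether a given CPU is online according to sysfs.
# A missing "online" file (typical for cpu0) is treated as "online".
cpu_online() {
  f="/sys/devices/system/cpu/cpu$1/online"
  if [ ! -e "$f" ] || [ "$(cat "$f")" = "1" ]; then
    echo online
  else
    echo offline
  fi
}

cpu_online 0
# Taking a CPU offline (as root) is then the command from the post:
#   echo 0 > /sys/devices/system/cpu/cpu7/online
```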


At every change, the alert.log shows an entry like this (the sampling frequency was 70 seconds on my system):

2022-11-20T14:09:12.762038+00:00
Detected change in CPU count to 7
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [0 - 0]
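These entries are easy to collect from the alert.log. A small sketch (my own helper; the alert.log path depends on your diagnostic_dest, so the demo uses a hand-made excerpt):

```shell
#!/bin/sh
# Pull "Detected change in CPU count" entries out of an alert-log-style file.
cpu_changes() {
  grep 'Detected change in CPU count' "$1"
}

# Demo with a tiny excerpt; the real call would target something like
# $ORACLE_BASE/diag/rdbms/<db>/<sid>/trace/alert_<sid>.log
printf 'Detected change in CPU count to 7\nsome other line\n' > /tmp/alert.demo
cpu_changes /tmp/alert.demo
```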

In this case my test was quite small (only 1..8 CPUs), but at least in that range only a small set of parameters changed:

PARAMETER
cpu_count
cpu_min_count
job_queue_processes
parallel_max_servers
parallel_servers_target


cpu_count and cpu_min_count make sense; the remaining parameters are only related to parallel and job processes. All of them follow the simple formula cpu_count * 20.

This matches the "normal" derived values for these parameters, as shown before.
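As a quick sanity check, the factor-of-20 rule can be scripted. This is purely illustrative arithmetic based on my observed 21c defaults, not an official formula:

```shell
#!/bin/sh
# Derived parallel/job parameters, assuming the observed cpu_count * 20 rule.
derived() {
  echo "cpu_count=$1 job_queue_processes=$(( $1 * 20 )) parallel_max_servers=$(( $1 * 20 ))"
}

derived 7   # after taking one of 8 CPUs offline
derived 8   # with all CPUs online
```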


Again, this is nothing tremendously spectacular, but worth knowing for performance engineering or analysis on volatile systems: some parameters change and might lead to unexpected behavior if the change isn't taken into consideration.

2022-11-19

minimal SGA requirements - dependent on CPU_COUNT

In my previous post I described how different spfile parameters change depending on the number of CPUs available in the system (or the value of CPU_COUNT in my artificial testcase).
During these tests I hit this error (and so had to re-arrange my scripts and try again):

ORA-00821: Specified value of sga_target 16M is too small, needs to be at least 2032M

Even though the instance started fine with a smaller cpu_count, it seems the SGA needs to grow when more CPUs are used. So it's worth re-running the test to see what the minimum SGA per CPU is. (Again, these tests were made on my sandbox with version 21.8 - your values might differ.)
My measurements show SGA requirements between 260 MB for 1 CPU and 11568 MB for 255 CPUs (the raw data can be seen here).
It looks like the growth is quite linear, and the regression confirms this:

Call:
lm(formula = sga ~ cpu, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-91.887 -55.661  -3.829  48.098 116.941 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 112.25259    7.38744    15.2   <2e-16 ***
cpu          44.50121    0.05003   889.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 58.81 on 253 degrees of freedom
Multiple R-squared:  0.9997,    Adjusted R-squared:  0.9997 
F-statistic: 7.912e+05 on 1 and 253 DF,  p-value: < 2.2e-16

The core message is: the SGA requirement starts at 112.3 MB and adds about 44.5 MB for each CPU.
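Plugged into a small helper, the fit gives a rough lower bound for sga_target. The coefficients come from the regression on my 21.8 sandbox above; they will differ on other versions and platforms, and individual values can be off by the residual range (roughly +/- 100 MB):

```shell
#!/bin/sh
# Estimate the minimum SGA (in MB) for a given cpu_count, using the
# fitted model above: sga_MB ~ 112.25 + 44.5 * cpu.
sga_min_mb() {
  awk -v cpu="$1" 'BEGIN { printf "%.0f\n", 112.25259 + 44.50121 * cpu }'
}

sga_min_mb 8
sga_min_mb 255   # the measured value for 255 CPUs was 11568 MB
```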

Even though the graph looks quite straight, there is a little deviation from the fitted line. To make it visible, I calculated the deltas and got this plot.
It seems the residuals smile at me. We can also see that the calculation up to 32 CPUs is slightly different from the rest; it drops to a minimum (negative residual) at 112 and then increases again. Beside this major pattern, smaller groups are also visible, which indicates that different (internal) functions have their own increments and together produce this picture.
Please be aware of the units: the first picture uses GB, the residuals are shown in MB. Overall they are still small.
(In case you want to play with the data, my R commands are available.)
You might think this is all irrelevant, as a huge number of CPUs usually comes with a huge amount of memory which can be used for the SGA. But at least for XE on nodes with many CPUs, there is a real impact (with a simple solution).

2022-11-15

spfile parameters derived from cpu_count 11.1 up to 21

In Oracle databases some spfile parameters are derived from others if they are not set explicitly.
One of the main drivers I discovered is CPU_COUNT.
I did similar tests in the past for older versions; now it's up to 21.

The script I used this time is basically the following (of course with the proper version name).
In comparison to previous versions, sga_target is bigger again - now it's 12g (otherwise instances with higher cpu_counts wouldn't even start):
#!/usr/bin/ksh  -x

integer i=1
while ((i <= 255));
do
  print " $i ";

  echo "*.audit_file_dest='/_some_dir_/BX1/adump'
*.audit_trail='db'
*.compatible='21.0.0.0.0'
*.control_files='/u02/oradata/CDB1/control01.ctl','/u02/oradata/CDB1/control02.ctl'
*.db_block_size=8192
*.db_create_file_dest='/u02/oradata'
*.db_name='cdb1'
*.db_unique_name='cdb1'
*.diagnostic_dest='/u01/appl/oracle'
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=12g
*.enable_pluggable_database=true
*.undo_tablespace='UNDOTBS1'
*._disable_cpu_check='TRUE'
*.cpu_count=$i" > /_some_dir_/cpus/initBX1_$i.ora

  echo "shutdown abort;
startup pfile='/_some_dir_/cpus/initBX1_$i.ora' ;

insert into init_parameters
select '21.08'     ,
       $i         ,
       a.ksppinm  ,
       b.ksppstvl ,
       c.ksppstvl
from x\$ksppi a, x\$ksppcv b, x\$ksppsv c
where a.indx = b.indx and a.indx = c.indx;

commit;

select distinct(\"CPUs\") from init_parameters;

shutdown;
exit;"> cpus/21doit$i.sql

sqlplus "/ as sysdba" @cpus/21doit$i.sql >cpus/21log$i.log

sleep 5

  (( i = i + 1));
done


To get a good picture of the changes, feel free to select parameters, versions or whatever you want below.
Raw data is available at GistHub.
The js app is available here.
(It's based on the idea of PerfSheet.js by Luca Canali.)

As some of the values are not numbers, I applied this mapping:
TRUE     ->  1
FALSE    ->  0
OFF      ->  0
ADAPTIVE -> -1
INSTANCE -> -2
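The mapping is easy to apply mechanically; a small sketch (my own helper, not part of the original toolchain):

```shell
#!/bin/sh
# Map non-numeric parameter values to the numbers used in the charts;
# anything else (including already-numeric values) passes through unchanged.
map_value() {
  case "$1" in
    TRUE)      echo 1 ;;
    FALSE|OFF) echo 0 ;;
    ADAPTIVE)  echo -1 ;;
    INSTANCE)  echo -2 ;;
    *)         echo "$1" ;;
  esac
}

map_value ADAPTIVE
map_value 128
```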


2018-03-31

spfile parameters derived from cpu_count

In Oracle databases some spfile parameters are derived from others if they are not set explicitly.
One of the main drivers I discovered is CPU_COUNT.
I did such tests in the past for versions 11.1.0.7 and 12.1.0.2.
To simulate a machine with sufficient CPUs, I had to use a trick described in a previous post. Using _disable_cpu_check threw an ORA-7445:[ksucre] in 12.2 with cpu_count > 126.

The script I used this time is basically the following (of course with 180 instead of 122 for 18.0).
#!/usr/bin/ksh  -x

integer i=1
while ((i <= 255));
do
  print " $i ";

  echo "*.audit_file_dest='/_some_dir_/BX1/adump'
*.audit_trail='db'
*.compatible='12.2.0.1.0'
*.control_files='/_some_dir_/BX1/controlfile/o1_mf_djd017z2_.ctl', '/_some_dir_/fast_recovery_area/BX1/BX1/controlfile/o1_mf_djd01834_.ctl'
*.db_block_size=8192
*.db_create_file_dest='/_some_dir_/oradata'
*.db_name='BX1'
*.db_recovery_file_dest='/_some_dir_/fast_recovery_area/BX1'
*.db_recovery_file_dest_size=7G
*.db_unique_name='BX1'
*.diagnostic_dest='/appl/oracle'
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=7g
*.undo_tablespace='UNDOTBS1'
*.max_string_size='EXTENDED'
#*._disable_cpu_check='TRUE'
*.cpu_count=$i" > /_some_dir_/cpus/initBX1_$i.ora

  echo "shutdown abort;
startup pfile='/_some_dir_/cpus/initBX1_$i.ora' ;

insert into init_parameters
select '12.2'     ,
       $i         ,
       a.ksppinm  ,
       b.ksppstvl ,
       c.ksppstvl
from x\$ksppi a, x\$ksppcv b, x\$ksppsv c
where a.indx = b.indx and a.indx = c.indx;

commit;

select distinct(\"CPUs\") from init_parameters;

shutdown;
exit;"> cpus/122doit$i.sql

sqlplus "/ as sysdba" @cpus/122doit$i.sql >cpus/122log$i.log

sleep 5

  (( i = i + 1));
done


Versions: 11.1.0.7 / 12.1.0.2 / 12.2.0.1 / 18.0.0.0 (one x per version in which the parameter changes)

_cursor_db_buffers_pinned            xxxx
_db_block_buffers                    xxxx
_db_block_lru_latches                xxxx
__db_cache_size                      xxxx
_db_file_exec_read_count             xx
db_file_multiblock_read_count        xx
_db_handles                          xxx
db_writer_processes                  xxxx
dml_locks                            xxx
_enqueue_hash                        xxx
_enqueue_hash_chain_latches          xxxx
_enqueue_locks                       xxxx
_enqueue_resources                   xxx
_flashback_generation_buffer_size    xxxx
_gc_fast_index_split_wait            x
_gcs_min_cr_slaves                   xx
_kghdsidx_count                      xx
_hang_resolution_scope               xx
_ksmsq_hintmaxproc                   x
__large_pool_size                    xxx
_lms_rollbacks                       xx
log_buffer                           xxxx
_log_parallelism_max                 xxxx
_log_simultaneous_copies             xxxx
_max_log_write_parallelism           xx
_max_outstanding_log_writes          xxx
_max_pending_scn_bcasts              xxx
_messages                            xxx
_num_longop_child_latches            xxxx
_num_rlslaves                        xx
parallel_max_servers                 xxxx
_parallel_min_message_pool           xxxx
parallel_min_servers                 xxx
parallel_servers_target              xxx
pga_aggregate_limit                  xxx
_pmon_max_slaves                     xx
_pmon_slaves_arr_size                xx
processes                            xxx
_px_proactive_slave_alloc_threshold  xxx
_session_allocation_latches          xxx
sessions                             xxx
sga_max_size                         xx
shared_pool_reserved_size            xxx
__shared_pool_size                   xxxx
_shared_server_num_queues            xxxx
_small_table_threshold               xxxx
_spin_count                          xxxx
_sqlmon_max_plan                     xxx
_super_shared_conversion_threshold   xx
transactions                         xxx
_use_single_log_writer               xxx

To get a good picture of the changes, feel free to select parameters, versions or whatever you want below.
Raw data is available at GistHub.
The js app is available here.
(It's based on the idea of PerfSheet.js by Luca Canali.)

As some of the values are not numbers, I applied this mapping:
TRUE     ->  1
FALSE    ->  0
OFF      ->  0
ADAPTIVE -> -1
INSTANCE -> -2


2018-03-20

real virtual CPUs

Some software changes its behavior based on capabilities of the system it's running on.
But sometimes it's interesting to check how a piece of software would behave on a different system, which is not at hand right now.

On Linux, a lot of information about the current system can be found in /proc and /sys.
These filesystems are virtual, so they cannot be changed easily with an editor.

In my case I want to simulate a lot more CPUs.
These are visible in several locations.
The best known is probably /proc/cpuinfo. There you find a block of information for each CPU the kernel knows about. Based on the current configuration, I create a modified fake file somewhere in a different place:
#!/bin/bash
# cpus.sh

count=0
max_socket=8
max_core=32

for ((soc=0;soc<max_socket;soc++)); do
    for (( cor=0;cor<max_core;cor++)); do
echo "processor       : $count
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
stepping        : 1
microcode       : 0x3b
cpu MHz         : 2596.103
cache size      : 25600 KB
physical id     : $soc
siblings        : $max_core
core id         : $cor
cpu cores       : $max_core
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb pln pts dtherm pti tsc_adjust
bugs            : cpu_meltdown spectre_v2
bogomips        : 5193.98
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:

"
    let count=count+1
    done
done
and create the file with ./cpus.sh > cpuinfo.256

There is another location as well: /sys/devices/system/cpu.
In this directory are several directories and files with interesting information.

I copy the full directory to another place (ignoring all the errors).
First, the number of cpu[id] directories might need adjustment.
In my case a simple set of symlinks is sufficient:
for i in {2..255} ; do
  echo $i
  ln -s cpu1 cpu${i}
done  
In every cpu[id] directory there is a symlink to the node it belongs to: node0 -> ../../node/node0
So it might be required to spoof proper entries in /sys/devices/system/node as well. In my case it's not required.

The last fix required in my case is in the file cpu/online.
It contains 0-255 now (instead of 0-2).

As I mentioned above, the original files cannot be manipulated, as they are not real files.
The mount option --bind does the trick:
mount --bind <my_working_dir>/cpuinfo.256 /proc/cpuinfo
mount --bind <my_working_dir>/cpu /sys/devices/system/cpu
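A quick way to verify the fake is in place is to count the processor blocks the kernel (apparently) reports. A sketch with my own helper; since the live check depends on the bind mounts being active, the runnable demo uses a tiny hand-made file:

```shell
#!/bin/sh
# Count the "processor" blocks in a cpuinfo-style file; useful to verify
# the generated fake before (and after) bind-mounting it over /proc/cpuinfo.
count_cpus() {
  grep -c '^processor' "$1"
}

# Demo on a tiny file; the real checks would be
#   count_cpus cpuinfo.256      (before mounting)
#   count_cpus /proc/cpuinfo    (after mounting, should report 256)
printf 'processor\t: 0\nbogomips\t: 5193.98\nprocessor\t: 1\n' > /tmp/cpuinfo.demo
count_cpus /tmp/cpuinfo.demo
```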

After these nice manipulations, my sandbox Oracle instance now shows plenty of CPUs:
SQL> show parameter cpu_count                                                                                                                                                      

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     256

Update (2018-03-21):
For Oracle databases I got 2 hints on how to make them calculate with more CPUs than are really available.

One is this small stap script:
#!/usr/bin/stap
function modify_rax() %{ long ret; ret = 6; memcpy( ((char *)CONTEXT->uregs) + 80 , &ret, sizeof(ret)); %}
probe process("oracle").function("skgpnumcpu").return { modify_rax(); }

and




2016-01-15

Instance parameters derived from cpu_count - 12.1.0.2

About 6 years ago I wanted to know which instance parameters are derived from cpu_count. So I tested an 11.1.0.7 DB - in that version, 21 parameters changed based on the value of cpu_count.
Some DB versions have passed by since, so I decided it's time for another check. This time it's 12.1.0.2 without any PSUs/patches. The machine is the same class as the previous one, so it took some time.

The script is nearly the same today:
#!/usr/bin/ksh  -x

integer i=1
while ((i <= 128));
do
  print " $i ";

  echo "*.audit_file_dest='/appl/oracle/admin/BERX3_GLOBAL/adump'
*.audit_trail='db'
*.compatible='12.1.0.2.0'
*.control_files='/appl/oracle/oradata/BERX3_GLOBAL/controlfile/o1_mf_c9h3bxlw_.ctl','/appl/oracle/fast_recovery_area/BERX3_GLOBAL/controlfile/o1_mf_c9h3bxpg_.ctl'
*.db_block_size=8192
*.db_create_file_dest='/appl/oracle/oradata'
*.db_name='BERX3_GL'
*.db_recovery_file_dest='/appl/oracle/fast_recovery_area'
*.db_recovery_file_dest_size=4815m
*.db_unique_name='BERX3_GLOBAL'
*.diagnostic_dest='/appl/oracle'
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=13g
*.undo_tablespace='UNDOTBS1'
*.cpu_count=$i" > /appl/home/oracle/instance_parameters/cpus/initBERX2_$i.ora

  echo "startup pfile='/appl/home/oracle/instance_parameters/cpus/initBERX2_$i.ora' ;

insert into init_parameters
select $i         ,
       a.ksppinm  ,
       b.ksppstvl ,
       c.ksppstvl
from x\$ksppi a, x\$ksppcv b, x\$ksppsv c
where a.indx = b.indx and a.indx = c.indx;

commit;

select distinct(\"CPUs\") from init_parameters;

shutdown;

exit;"> cpus/doit$i.sql

sqlplus "/ as sysdba" @cpus/doit$i.sql >cpus/log$i.log

sleep 5

  (( i = i + 1));
done

And all the interesting values are available again.
There are now 40 parameters changing their value based on cpu_count - so a lot more things will change when you "only" add/remove CPUs or alter instance caging.

Here are the graphs for 12.1.0.2 - and for 11.1.0.7 where they already exist. Some look quite different.


[Graphs omitted. Parameters plotted - "(also 11.1.0.7)" marks those with an 11.1.0.7 graph for comparison:]

__db_cache_size (also 11.1.0.7)
__java_pool_size
__large_pool_size
__shared_pool_size (also 11.1.0.7)
_cursor_db_buffers_pinned (also 11.1.0.7)
_db_block_buffers (also 11.1.0.7)
_db_block_lru_latches (also 11.1.0.7)
_db_handles
_enqueue_hash
_enqueue_hash_chain_latches (also 11.1.0.7)
_enqueue_locks (also 11.1.0.7)
_enqueue_resources
_flashback_generation_buffer_size (also 11.1.0.7)
_kghdsidx_count (also 11.1.0.7)
_log_parallelism_max (also 11.1.0.7)
_log_simultaneous_copies (also 11.1.0.7)
_max_outstanding_log_writes
_max_pending_scn_bcasts
_messages
_num_longop_child_latches (also 11.1.0.7)
_parallel_min_message_pool (also 11.1.0.7)
_px_proactive_slave_alloc_threshold
_session_allocation_latches
_shared_server_num_queues (also 11.1.0.7)
_small_table_threshold (also 11.1.0.7)
_spin_count (also 11.1.0.7)
_sqlmon_max_plan (also 11.1.0.7)
_use_single_log_writer - is TRUE for cpu_count 1..7 and ADAPTIVE for 8..128
db_writer_processes (also 11.1.0.7)
dml_locks
log_buffer (also 11.1.0.7)
parallel_max_servers (also 11.1.0.7)
parallel_min_servers
parallel_servers_target
pga_aggregate_limit
processes
sessions
shared_pool_reserved_size
transactions