Showing posts with the label mount. Show all posts

2018-03-20

real virtual CPUs

Some software changes its behavior based on the capabilities of the system it's running on.
Sometimes it's interesting to check how software would behave on a different system that is not at hand right now.

On Linux, a lot of information about the current system can be found in /proc and /sys.
These filesystems are virtual, so they cannot be changed easily with an editor.

In my case I want to simulate a lot more CPUs.
These are visible in several locations.
The best known is probably /proc/cpuinfo. There you find a block of information for each CPU the kernel knows about. Based on the current configuration, I create a modified fake file in a different location:
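#!/bin/bash
# cpus.sh - print a fake /proc/cpuinfo with max_socket * max_core CPU blocks

count=0        # running logical CPU number
max_socket=8   # number of simulated sockets
max_core=32    # cores per simulated socket

for ((soc=0; soc<max_socket; soc++)); do
    for ((cor=0; cor<max_core; cor++)); do
echo "processor       : $count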
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
stepping        : 1
microcode       : 0x3b
cpu MHz         : 2596.103
cache size      : 25600 KB
physical id     : $soc
siblings        : $max_core
core id         : $cor
cpu cores       : $max_core
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb pln pts dtherm pti tsc_adjust
bugs            : cpu_meltdown spectre_v2
bogomips        : 5193.98
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:

"
    count=$((count + 1))
    done
done
Then I create the fake file with ./cpus.sh > cpuinfo.256
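
A quick sanity check of the generated file can be sketched like this. The helper name count_cpu_blocks is made up for illustration; it only counts the lines that start a CPU block.

```bash
#!/bin/bash
# count_cpu_blocks FILE: count the "processor : N" lines in a cpuinfo-style file.
# (hypothetical helper, not part of cpus.sh)
count_cpu_blocks() {
    grep -c '^processor[[:space:]]*:' "$1"
}

# Usage, assuming cpuinfo.256 was generated as described above:
#   count_cpu_blocks cpuinfo.256
```

With 8 sockets and 32 cores per socket, the count should be 256.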

There is another location as well: /sys/devices/system/cpu.
This directory contains several directories and files with interesting information.

I copy the full directory to another place (ignoring all the errors).
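
One way to take such a copy, as a minimal sketch: the helper name copy_tree and the target path are assumptions for illustration. Read errors from sysfs files are silenced, and symlinks are copied as symlinks instead of being followed.

```bash
#!/bin/bash
# copy_tree SRC DST: copy the contents of SRC into DST, ignoring read errors.
# (hypothetical helper; the name and arguments are illustrative)
copy_tree() {
    local src=$1 dst=$2
    mkdir -p "$dst"
    # -r recurses, -P keeps symlinks as symlinks; failures on unreadable
    # files are expected on sysfs and deliberately ignored.
    cp -rP "$src/." "$dst/" 2>/dev/null || true
}

# Usage (the working directory is a placeholder):
#   copy_tree /sys/devices/system/cpu "$HOME/fakecpu/cpu"
```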
First the number of cpu[id] directories might need adjustment.
In my case a simple set of symlinks is sufficient:
for i in {2..255} ; do
  echo $i
  ln -s cpu1 cpu${i}
done  
In every cpu[id] directory there is a symlink to the node it belongs to: node0 -> ../../node/node0
So it might be necessary to spoof proper entries in /sys/devices/system/node. In my case it's not required.

The last fix required in my case is in the file cpu/online.
It contains 0-255 now (instead of 0-2).
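
The content of the online file follows directly from the CPU count. A small sketch, where online_range is a hypothetical helper name:

```bash
#!/bin/bash
# online_range N: print the CPU range string for N CPUs, e.g. "0-255" for 256.
# (hypothetical helper for illustration)
online_range() {
    local n=$1
    if (( n < 1 )); then
        echo "need at least one CPU" >&2
        return 1
    fi
    if (( n == 1 )); then
        echo 0
    else
        echo "0-$(( n - 1 ))"
    fi
}

online_range 256   # prints 0-255
```

In the copied directory this could be used as online_range 256 > cpu/online.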

As I mentioned above, the original files cannot be manipulated, as they are not real files.
The mount option --bind does the trick:
mount --bind <my_working_dir>/cpuinfo.256 /proc/cpuinfo
mount --bind <my_working_dir>/cpu /sys/devices/system/cpu
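
To see whether the fake files are currently in place (and to remove them again), something like the following sketch can help. The helper name is_mount_target is an assumption; it looks for the path in the mount point column (field 5) of /proc/self/mountinfo.

```bash
#!/bin/bash
# is_mount_target PATH [MOUNTINFO]: succeed if PATH is listed as a mount point.
# MOUNTINFO defaults to /proc/self/mountinfo; field 5 holds the mount point.
# (hypothetical helper for illustration)
is_mount_target() {
    local target=$1 mountinfo=${2:-/proc/self/mountinfo}
    awk -v t="$target" '$5 == t { found = 1 } END { exit !found }' "$mountinfo"
}

# Usage (as root), undoing the bind mounts from above:
#   is_mount_target /proc/cpuinfo && umount /proc/cpuinfo
#   is_mount_target /sys/devices/system/cpu && umount /sys/devices/system/cpu
```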

After these nice manipulations, my sandbox Oracle instance now shows plenty of CPUs:
SQL> show parameter cpu_count

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     256

Update (2018-03-21):
For Oracle databases I got 2 hints on how to make them calculate with more CPUs than are really available.

With this small stap script:
#!/usr/bin/stap
function modify_rax() %{ long ret; ret = 6; memcpy( ((char *)CONTEXT->uregs) + 80 , &ret, sizeof(ret)); %}
probe process("oracle").function("skgpnumcpu").return { modify_rax(); }

and




2017-09-12

wrong permission on shm kills JAVA_JIT

We found a lot of trace files from several DBs on one of our DB Servers.
They look like:


*********_ora_26444.trc or *********_m000_5598.trc
*** 2017-09-08 15:11:29.181
*** SESSION ID:(632.5995) 2017-09-08 15:11:29.181
*** CLIENT ID:(SYSADMIN) 2017-09-08 15:11:29.181
*** SERVICE NAME:(****_****) 2017-09-08 15:11:29.181
*** MODULE NAME:(*::***:******.****.***.****.******) 2017-09-08 15:11:29.181
*** ACTION NAME:(/) 2017-09-08 15:11:29.181

peshmmap_Create_Memory_Map:
Map_Length = 4096
Map_Protection = 7
Flags = 1
File_Offset = 0
mmap failed with error 1
error message:Operation not permitted

*** 2017-09-08 15:11:29.181
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x33BEF41, ioc_pin_shared_executable_object()+1509] [flags: 0x0, count: 1]
DDE: Problem Key 'ORA 7445 [ioc_pin_shared_executable_object()+1509]' was flood controlled (0x6) (incident: 61088)
ORA-07445: exception encountered: core dump [ioc_pin_shared_executable_object()+1509] [SIGSEGV] [ADDR:0x0] [PC:0x33BEF41] [Address not mapped to object] []
ssexhd: crashing the process...
Shadow_Core_Dump = PARTIAL
ksdbgcra: writing core file to directory '/***/diag/rdbms/***/***/cdump'

A quick search on MOS (and an opened SR) showed as the top result Ora-7445 [Ioc_pin_shared_executable_object()] (Doc ID 1316906.1)

But the suggestions there did not solve the issue (and we could not set java_jit_enabled = false due to application requirements).

But the note was good enough to make me search further for /dev/shm, mmap and Operation not permitted. This led to Shared executable memory on StackExchange. Again not a perfect fit, but it makes enough sense to guess:
With java_jit_enabled, the m000 process does the compilation (just in time). To let the server process execute the compiled code, it's put into shared memory in a /tmp/ file and mapped into private memory [ 2019-11-15 - fixed based on a comment by Nenad Noveljic ].
This shared memory must be executable, otherwise the server process cannot use it. So the memory is mapped with PROT_EXEC.
I checked on the affected host whether anything prevents this:
> mount|grep shm    
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,seclabel)

noexec prevents shared memory from being executed, so the memory mapping fails.

It's stated in Oracle Database Preinstallation Tasks:
rw and execute permissions must be set, but noexec and nosuid must not be set.

This was changed after the DBs were installed. Probably with good intentions, but with bad effects.
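
The check for a problematic option can be scripted. A minimal sketch, where has_option is a hypothetical helper and the option string is the one from the affected host:

```bash
#!/bin/bash
# has_option OPTIONS NAME: succeed if NAME appears in the comma-separated
# mount option string OPTIONS. (hypothetical helper for illustration)
has_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Option string as shown by mount on the affected host:
opts="rw,nosuid,nodev,noexec,seclabel"
if has_option "$opts" noexec; then
    echo "/dev/shm is mounted noexec"
fi
```

On a live system the option string can be taken from findmnt -no OPTIONS /dev/shm. As root, mount -o remount,exec /dev/shm should lift noexec until the next reboot; a permanent change belongs in /etc/fstab (or wherever the host configures /dev/shm).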

With the proper changes of the mount options, the test statement
SELECT dbms_java.longname('TEST') long_name FROM  dual;
completes without any error.