Friday, 9 December 2011

Who created that process?

Figure 2-7 Connection to a Dedicated Server Process
For some reason I was really curious who creates that process. It's not about one particular process in detail, but about a well-known kind of process, at least well known to DBAs.
Which process? 
It's one of these:

oracle   13096     1  0 20:05 ?        00:00:00 oracleTTT071 (LOCAL=NO)

Yes, it's a simple server process, nothing spectacular. Nevertheless, the Concepts guide is not very specific about who creates that process, so I tried to find out in more detail.
On my Linux sandbox the first column of ps -ef shows the UID, the second the PID and the third the PPID (parent PID). Unfortunately the PPID is 1 here, and I'm quite sure this process was not created by init. So this process is somewhat orphaned: its direct parent disappeared, and Linux hands orphaned processes over to init (PID 1). Very sad!
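The PPID column can also be read straight from the kernel, which is handy when scripting such a check. A minimal Python sketch, assuming Linux and the /proc/<pid>/stat format described in proc(5); the helper name parent_pid is made up for illustration:

```python
import os

def parent_pid(pid: int) -> int:
    """Return the PPID of `pid` as the kernel reports it (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so split
    # after the last ')': the remaining fields are state, ppid, pgrp, ...
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[1])

if __name__ == "__main__":
    me = os.getpid()
    # Both values should agree: /proc and getppid() read the same kernel field.
    print(me, parent_pid(me), os.getppid())
```

Called with the PID of an orphaned process such as 13096 above, it would return the PID of whichever process adopted it, normally 1.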
I decided to follow Figure 2-7 from the Concepts guide and ran strace -f -p <PID_of_listener> against the listener to see what's going on. The -f option follows all forks, so the actions of the child processes are traced as well.
The first three lines are:
Process 2979 attached with 3 threads - interrupt to quit
[pid  2981] futex(0xae8dee4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2980] restart_syscall(<... resuming interrupted call ...> <unfinished ...>

So the listener consists of three threads (2979, 2980 and 2981), not three processes. That's good to know, and this segregation of duties is probably worth investigating, but not in this post. There are many interesting lines, but I'm searching for a process, so let's continue with:

[pid  2979] clone(Process 27028 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27028
[pid  2979] wait4(27028, Process 2979 suspended
 <unfinished ...>
[pid 27028] clone(Process 27029 attached (waiting for parent)
Process 27029 resumed (parent 27028 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27029
[pid 27028] exit_group(0)               = ?
Process 2979 resumed
Process 27028 detached
[pid  2979] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 27028
[pid 27029] close(15 <unfinished ...>
[pid  2979] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(14 <unfinished ...>
[pid 27029] close(16 <unfinished ...>
[pid  2979] <... close resumed> )       = 0
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(17)                   = 0

Here the listener ([pid 2979]) creates a new process with the first clone call. This new process has PID 27028 and only one purpose: to clone yet another process, PID 27029, and then terminate immediately with exit_group(0). Meanwhile the listener waits for 27028 with wait4() and reaps it. By this trick the listener is not shown as the parent of PID 27029: once its real parent 27028 has exited, 27029 is an orphan and gets re-parented to init, which is why ps -ef shows a PPID of 1. Directly after its creation, PID 27029 closes some file descriptors. Because the clone calls gave the new process a copy of the whole table of open file (and network) descriptors, it seems to get rid of the ones it does not need as early as possible. The next part
[pid  2979] fcntl(16, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid( <unfinished ...>
[pid  2979] fcntl(15, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 27029] <... setsid resumed> )      = 27029
[pid  2979] <... fcntl resumed> )       = 0
[pid 27029] geteuid()                   = 5831
[pid  2979] fcntl(13, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid()                    = -1 EPERM (Operation not permitted)
[pid  2979] poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}, {fd=15, events=0}], 5, -1 <unfinished ...>

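sets the close-on-exec flag (FD_CLOEXEC) on the listener's file descriptors 16, 15 and 13, so they will be closed, not inherited, if the listener ever calls execve(2) itself. The flag lives in the listener's own descriptor table and does not affect the copies PID 27029 already inherited. In between, PID 27029 calls setsid() and becomes the leader of a new session (hence the return value 27029); the second setsid() fails with EPERM because a process that is already a process group leader cannot create another session.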
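Both system calls in this excerpt can be tried from Python. The sketch below assumes Linux, and the helper names set_cloexec, has_cloexec and setsid_twice are hypothetical: the first two toggle and read the close-on-exec flag the way the listener's fcntl(fd, F_SETFD, FD_CLOEXEC) calls do, the third reproduces the setsid() / EPERM pair seen for PID 27029 in a forked child.

```python
import fcntl
import os

def set_cloexec(fd: int, on: bool = True) -> None:
    """Set or clear FD_CLOEXEC, like fcntl(fd, F_SETFD, FD_CLOEXEC) in the trace."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if on:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)

def has_cloexec(fd: int) -> bool:
    """True if `fd` will be closed by a later execve()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

def setsid_twice() -> bool:
    """Fork a child that calls setsid() twice, as PID 27029 does.

    Returns True if the first call made the child a session leader and the
    second one failed with EPERM (PermissionError in Python)."""
    pid = os.fork()
    if pid == 0:
        ok = False
        try:
            os.setsid()                     # new session, sid == child's PID
            first_ok = os.getsid(0) == os.getpid()
            try:
                os.setsid()                 # already a group leader -> EPERM
            except PermissionError:
                ok = first_ok
        finally:
            os._exit(0 if ok else 1)        # never return into the caller
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Note that Python 3.4 and later already create new descriptors with FD_CLOEXEC set (PEP 446), so os.pipe() returns non-inheritable descriptors unless os.set_inheritable() is used.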
And here it goes:
[pid 27029] execve("/appl/oracle/product/rdbms_112022_a/bin/oracle", ["oracleTTT051", "(LOCAL=NO)"], [/* 109 vars */]) = 0
From the execve(2) man page:
execve() executes the program pointed to by filename.
execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of the program loaded. The program invoked inherits the calling process’s PID, and any open file descriptors that are not set to close-on-exec. Signals pending on the calling process are cleared. Any signals set to be caught by the calling process are reset to their default behaviour. The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.
If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().
If the set-user-ID bit is set on the program file pointed to by filename, and the calling process is not being ptraced, then the effective user ID of the calling process is changed to that of the owner of the program file. Similarly, when the set-group-ID bit of the program file is set the effective group ID of the calling process is set to the group of the program file.
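Two consequences of this man page text show up in the trace: the server process keeps the PID it got from clone (27029), and the name ps displays comes from the argv[0] chosen by the caller ("oracleTTT051"), not from the binary's file name, which is where a line like oracleTTT071 (LOCAL=NO) comes from. A small Python sketch makes both visible, assuming Linux with /proc and a POSIX /bin/sh (not a busybox multi-call binary, which dispatches on argv[0]); the function name exec_keeps_pid and the name "renamed-by-execve" are made up for illustration:

```python
import os

def exec_keeps_pid(argv0: str = "renamed-by-execve"):
    """Fork, then execve /bin/sh with a custom argv[0].

    Returns (PID from fork, PID reported by the new program,
    argv[0] as seen in /proc/<pid>/cmdline)."""
    out_r, out_w = os.pipe()   # child's stdout -> parent
    in_r, in_w = os.pipe()     # parent -> child's stdin, keeps sh waiting
    pid = os.fork()
    if pid == 0:
        try:
            # dup2 marks fds 0 and 1 inheritable, so they survive execve; the
            # pipe fds themselves are close-on-exec and vanish at execve.
            os.dup2(in_r, 0)
            os.dup2(out_w, 1)
            os.execv("/bin/sh", [argv0, "-c", 'echo "$$"; read line'])
        finally:
            os._exit(127)      # only reached if execv failed
    os.close(in_r)
    os.close(out_w)
    with os.fdopen(out_r, "rb") as out:
        # This line is written by sh, so execve has happened by now.
        reported = int(out.readline().decode().strip())
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            seen_argv0 = f.read().split(b"\0")[0].decode()
        os.close(in_w)         # EOF on stdin lets `read` return and sh exit
        os.waitpid(pid, 0)
    return pid, reported, seen_argv0
```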
From that point on, you can see how the server process comes to life. Some of the details are very interesting, but outside the scope of this post. After some conversation between the listener and the server process over file descriptors 15 and 16 (I assume these are just sockets), both close these file descriptors. The listener also closes file descriptor 13, which seems to be the TCP connection to the client. From then on, the two processes seem to be independent.

Well, now I know, at least on my test system and for the simplest case: the listener creates the server process itself, through two clone calls, and the new process then uses execve to turn into the oracle binary. There are still many open questions, such as what is going on during the redirection shown in Figure 2-8.
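The clone / clone / exit_group / wait4 sequence from the trace is the classic "double fork". A minimal Python sketch of it, assuming Linux and using os.fork() in place of the raw clone calls; the function name double_fork and the two helper pipes used for synchronisation are illustrative, not taken from the listener:

```python
import os

def double_fork():
    """Reproduce the listener's clone -> clone -> exit_group -> wait4 pattern.

    Returns (intermediate_pid, grandchild_pid, grandchild_ppid), where the
    PPID is read by the grandchild after the intermediate has exited."""
    info_r, info_w = os.pipe()   # grandchild -> original parent
    go_r, go_w = os.pipe()       # original parent -> grandchild
    intermediate = os.fork()     # like the first clone() in PID 2979
    if intermediate == 0:
        try:
            if os.fork() == 0:   # like the clone() in PID 27028
                # Grandchild (like PID 27029): wait until the parent has
                # reaped the intermediate, then report its PID and PPID.
                os.close(go_w)
                os.read(go_r, 1)
                os.write(info_w, f"{os.getpid()} {os.getppid()}".encode())
        finally:
            os._exit(0)          # intermediate exits at once, like exit_group(0)
    os.waitpid(intermediate, 0)  # like wait4(27028, ...)
    os.write(go_w, b"x")
    for fd in (go_r, go_w, info_w):
        os.close(fd)
    with os.fdopen(info_r, "rb") as f:
        gc_pid, gc_ppid = map(int, f.read().split())
    return intermediate, gc_pid, gc_ppid
```

On a plain system the grandchild's PPID comes back as 1, matching the ps output at the top. On kernels since 3.4 a "child subreaper" (for example systemd --user) may adopt the orphan instead, so the exact value can differ; what is guaranteed is that it is no longer the original parent.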


Jared said…

You can determine the type of open file from /proc/PID/fd.

Here for instance are socket file descriptors for a current oracle process:

> ls -l /proc/8786/fd|grep socket
lrwx------ 1 jkstill dba 64 Dec 10 09:52 7 -> socket:[31743562]
lrwx------ 1 jkstill dba 64 Dec 10 09:52 8 -> socket:[31743595]
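The same check can be scripted. A minimal sketch, assuming Linux and a hypothetical helper open_fds, maps every descriptor of a process to its /proc link target, which is enough to tell sockets ("socket:[inode]") from pipes ("pipe:[inode]") and regular files:

```python
import os
import socket

def open_fds(pid: int) -> dict[int, str]:
    """Map each open descriptor of `pid` to its /proc/<pid>/fd link target.

    Reading another user's process usually requires the same UID or root."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # e.g. the directory fd used by listdir() itself, already closed
            pass
    return result

if __name__ == "__main__":
    a, b = socket.socketpair()
    r, w = os.pipe()
    for fd, target in sorted(open_fds(os.getpid()).items()):
        print(fd, target)
```

Both ends of one pipe point to the same inode, so matching inodes is also a way to see which descriptors belong together.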

Martin Berger said…

Thank you for the hint!
I checked the strace log again. Descriptors 14 to 17 belong to two pipes that were created directly before the clone takes place:

[pid 2979] pipe([14, 15]) = 0
[pid 2979] pipe([16, 17]) = 0

As these are closed shortly after, I could not check in /proc for those file descriptors.
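Matching these pipe() results against the close() calls earlier in the trace suggests how the pipes are split: the listener closes 14 and 17 and keeps 15 (write end of the first pipe) and 16 (read end of the second), while PID 27029 closes 15 and 16 and keeps 14 and 17. That is the usual way to build a two-way channel between parent and child from two one-way pipes. A minimal Python sketch of the pattern, assuming a single fork instead of the listener's double clone; the function name round_trip and the upper-casing "protocol" are made up for illustration:

```python
import os

def round_trip(message: bytes) -> bytes:
    """Send `message` to a child over one pipe; the child echoes it back
    upper-cased over a second pipe."""
    to_child_r, to_child_w = os.pipe()      # like pipe([14, 15])
    from_child_r, from_child_w = os.pipe()  # like pipe([16, 17])
    pid = os.fork()
    if pid == 0:
        try:
            # Child keeps the read end of pipe 1 and the write end of pipe 2.
            os.close(to_child_w)
            os.close(from_child_r)
            data = b""
            while chunk := os.read(to_child_r, 4096):  # read until EOF
                data += chunk
            os.write(from_child_w, data.upper())
        finally:
            os._exit(0)
    # Parent keeps the write end of pipe 1 and the read end of pipe 2.
    os.close(to_child_r)
    os.close(from_child_w)
    os.write(to_child_w, message)
    os.close(to_child_w)                    # EOF tells the child we are done
    reply = b""
    while chunk := os.read(from_child_r, 4096):
        reply += chunk
    os.close(from_child_r)
    os.waitpid(pid, 0)
    return reply
```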