Contains state information about processes and threads in the system.
#include <sys/procfs.h>
The /proc file system provides access to the state of each active process and thread in the system. The name of each entry in the /proc file system is a decimal number corresponding to the process ID. These entries are subdirectories and the owner of each is determined by the user ID of the process. Access to the process state is provided by additional files contained within each subdirectory. Except where otherwise specified, the term /proc file is meant to refer to a non-directory file within the hierarchy rooted at /proc. The owner of each file is determined by the user ID of the process.
The various /proc directory, file, and field names contain the term lwp (light weight process). This term refers to a kernel thread. The /proc files do not refer to user space pthreads. While the operating system does not use the term lwp to describe its threads, it is used in the /proc file system for compatibility with other UNIX operating systems.
The following standard subroutine interfaces are used to access the /proc files:
Most files describe process state and are intended to be read-only. The ctl (control) and lwpctl (thread control) files permit manipulation of process state and can only be opened for writing. The as (address space) file contains the image of the running process and can be opened for both reading and writing. A write open allows process control while a read-only open allows inspection but not process control. Thus, a process is described as open for reading or writing if any of its associated /proc files is opened for reading or writing, respectively.
In general, more than one process can open the same /proc file at the same time. Exclusive open is intended to allow process control without another process attempting to open the file at the same time. A process can obtain exclusive control of a target process if it successfully opens any /proc file in the target process for writing (the as or ctl files, or the lwpctl file of any kernel thread) while specifying the O_EXCL flag in the open subroutine. Such a call of the open subroutine fails if the target process is already open for writing (that is, if a ctl, as, or lwpctl file is open for writing). Multiple concurrent read-only instances of the open subroutine can exist; the O_EXCL flag is ignored on the open subroutine for reading. The first open for writing by a controlling process should use the O_EXCL flag. Multiple processes trying to control the same target process usually result in errors.
Data may be transferred from or to any locations in the address space of the traced process by calling the lseek subroutine to position the as file at the virtual address of interest, followed by a call to the read or write subroutine. An I/O request extending into an unmapped area is truncated at the boundary. A read or write request beginning at an unmapped virtual address fails with errno set to EFAULT.
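The lseek-then-read sequence described above might be sketched as follows. The function names as_open_readonly and as_read are illustrative only and are not part of any system interface. The sketch assumes that off_t is wide enough to hold the target address; a 32-bit program would typically need large-file support for that to be true.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/<pid>/as read-only, which allows
 * inspection of the address space but not process control. */
int as_open_readonly(long pid)
{
    char path[64];

    snprintf(path, sizeof path, "/proc/%ld/as", pid);
    return open(path, O_RDONLY);
}

/* Hypothetical helper: position the descriptor at vaddr, then read.
 * Returns the number of bytes read, which may be short if the request
 * runs into an unmapped area, or -1 with errno set (EFAULT when the
 * request begins at an unmapped address, according to the text). */
ssize_t as_read(int as_fd, uint64_t vaddr, void *buf, size_t len)
{
    assert(buf != NULL);
    if (lseek(as_fd, (off_t)vaddr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(as_fd, buf, len);
}
```

A caller should treat a short count as a normal outcome rather than an error, because a request that extends into an unmapped area is truncated at the boundary.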
Information and control operations are provided through additional files. The <sys/procfs.h> file contains definitions of data structures and message formats used with these files. Some of these definitions use sets of flags. The set types pr_sigset_t, fltset_t, and sysset_t correspond to signal, fault, and system call enumerations, respectively. These enumerations are defined in the <sys/procfs.h> file. Each of the pr_sigset_t and fltset_t types is large enough to hold flags for its own enumeration. Although they are of different sizes, they have a common structure and can be manipulated by the following macros:
prfillset(&set);              /* turn on all flags in set */
premptyset(&set);             /* turn off all flags in set */
praddset(&set, flag);         /* turn on the specified flag */
prdelset(&set, flag);         /* turn off the specified flag */
r = prismember(&set, flag);   /* != 0 if flag is turned on */
Either the prfillset or premptyset macro must be used to initialize the pr_sigset_t or fltset_t type before it is used in any other operation. The flag parameter must be a member of the enumeration that corresponds to the appropriate set.
The sysset_t set type has a different format, set of macros, and a variable length structure to accommodate the varying number of available system calls. You can determine the total number of system calls, their names, and number of each individual call by reading the sysent file. You can then allocate memory for the appropriately sized sysset_t structure, initialize its pr_size field, and then use the following macros to manipulate the system call set:
prfillsysset(&set)             /* set all syscalls in the sysset */
premptysysset(&set)            /* clear all syscalls in the sysset */
praddsysset(&set, num)         /* set specified syscall in the sysset */
prdelsysset(&set, num)         /* clear specified syscall in the sysset */
prissyssetmember(&set, num)    /* != 0 if specified syscall is set */
See the description of the sysent file for more information about system calls.
Every active process contains at least one kernel thread. Every kernel thread represents a flow of execution that is independently scheduled by the operating system. All kernel threads in a process share address space as well as many other attributes. Using the ctl and lwpctl files, you can manipulate individual kernel threads in a process or manipulate all of them at once, depending on the operation.
When a process has more than one kernel thread, a representative thread is chosen by the system for certain process status file and control operations. The representative thread is stopped only if all the process's threads are stopped. The representative thread may be stopped on an event of interest only if all threads are stopped, or it may be stopped by a PR_REQUESTED stop only if no other events of interest exist.
The representative thread remains fixed as long as all the threads are stopped on events of interest or are in PR_SUSPENDED stop and the PCRUN operand is not applied to any of them.
When applied to the process control file (ctl), every /proc control operation that affects a kernel thread uses the same algorithm to choose which kernel thread to act on. With synchronous stopping (see PCSET), this behavior enables an application to control a multiple thread process using only the process level status and control files. For more control, use the thread-specific lwpctl files.
The /proc file system can be used by both 32-bit and 64-bit control processes to get information about both 32-bit and 64-bit target processes. The /proc files provide 64-bit enabled mode invariant files to all observers. The mode of the controlling process does not affect the format of the /proc data. Data is returned in the same format to both 32-bit and 64-bit control processes. Addresses and applicable length and offset fields in the /proc files are 8 bytes long.
At the top level, the /proc directory contains entries, each of which names an existing process in the system. The names of entries in this directory are process ID (pid) numbers. These entries are directories. Except where otherwise noted, the files described below are read-only. In addition, if a process becomes a zombie (one that has terminated by calling exit but whose parent has not yet collected its status with a wait call), most of its associated /proc files disappear from the directory structure. Normally, later attempts to open such files, or to read or write files that were opened before the process terminated, fail with ENOENT. Exceptions are noted.
The /proc files contain data that presents the state of processes and threads in the system. This state is constantly changing while the system is operating. To lessen the load on system performance caused by reading /proc files, the /proc file system does not stop system activity while gathering the data for those files. A single read of a /proc file generally returns a coherent and fairly accurate representation of process or thread state. However, because the state changes as the process or thread runs, multiple reads of /proc files may return representations that show different data and therefore appear to be inconsistent with each other.
An atomic representation is a representation of the process or thread at a single and discrete point in time. If you want an atomic snapshot of process or thread state, stop the process and thread before reading the state. There is no guarantee that the data is an atomic snapshot for successive reads of /proc files for a running process. In addition, a representation is not guaranteed to be atomic for any I/O applied to the as (address space) file. The contents of any process address space might be simultaneously modified by a thread of that process or any other process in the system.
Note: Multiple structure definitions are used to describe the /proc files. A /proc file may contain additional information other than the definitions presented here. In future releases of the operating system, these structures may grow by the addition of fields at the end of the structures.
The /proc/pid directory contains (but is not limited to) the following entries:
uint32_t pr_flag;                     /* process flags from proc struct p_flag */
uint32_t pr_flag2;                    /* process flags from proc struct p_flag2 */
uint32_t pr_flags;                    /* /proc flags */
uint32_t pr_nlwp;                     /* number of threads in the process */
char pr_stat;                         /* process state from proc p_stat */
char pr_dmodel;                       /* data model for the process */
char pr__pad1[6];                     /* reserved for future use */
pr_sigset_t pr_sigpend;               /* set of process pending signals */
prptr64_t pr_brkbase;                 /* address of the process heap */
uint64_t pr_brksize;                  /* size of the process heap, in bytes */
prptr64_t pr_stkbase;                 /* address of the process stack */
uint64_t pr_stksize;                  /* size of the process stack, in bytes */
pid64_t pr_pid;                       /* process id */
pid64_t pr_ppid;                      /* parent process id */
pid64_t pr_pgid;                      /* process group id */
pid64_t pr_sid;                       /* session id */
struct pr_timestruc64_t pr_utime;     /* process user cpu time */
struct pr_timestruc64_t pr_stime;     /* process system cpu time */
struct pr_timestruc64_t pr_cutime;    /* sum of children's user times */
struct pr_timestruc64_t pr_cstime;    /* sum of children's system times */
pr_sigset_t pr_sigtrace;              /* mask of traced signals */
fltset_t pr_flttrace;                 /* mask of traced hardware faults */
uint32_t pr_sysentry_offset;          /* offset into pstatus file of sysset_t
                                       * identifying system calls traced on
                                       * entry. If 0, then no entry syscalls
                                       * are being traced. */
uint32_t pr_sysexit_offset;           /* offset into pstatus file of sysset_t
                                       * identifying system calls traced on
                                       * exit. If 0, then no exit syscalls
                                       * are being traced. */
uint64_t pr__pad[8];                  /* reserved for future use */
lwpstatus_t pr_lwp;                   /* "representative" thread status */
The members of the status file are described below:
Note: The address formed by the sum of the pr_brkbase and pr_brksize is the process break (see the brk subroutine).
Note: Each thread runs on a separate stack. The operating system grows the process stack as necessary.
uint32_t pr_flag;                     /* process flags from proc struct p_flag */
uint32_t pr_flag2;                    /* process flags from proc struct p_flag2 */
uint32_t pr_nlwp;                     /* number of threads in process */
uid_t pr_uid;                         /* real user id */
uid_t pr_euid;                        /* effective user id */
gid_t pr_gid;                         /* real group id */
gid_t pr_egid;                        /* effective group id */
uint32_t pr_argc;                     /* initial argument count */
pid64_t pr_pid;                       /* unique process id */
pid64_t pr_ppid;                      /* process id of parent */
pid64_t pr_pgid;                      /* pid of process group leader */
pid64_t pr_sid;                       /* session id */
dev64_t pr_ttydev;                    /* controlling tty device */
prptr64_t pr_addr;                    /* internal address of proc struct */
uint64_t pr_size;                     /* size of process image in KB (1024) units */
uint64_t pr_rssize;                   /* resident set size in KB (1024) units */
struct pr_timestruc64_t pr_start;     /* process start time, time since epoch */
struct pr_timestruc64_t pr_time;      /* usr+sys cpu time for this process */
prptr64_t pr_argv;                    /* address of initial argument vector in user process */
prptr64_t pr_envp;                    /* address of initial environment vector in user process */
char pr_fname[PRFNSZ];                /* last component of exec()ed pathname */
char pr_psargs[PRARGSZ];              /* initial characters of arg list */
uint64_t pr__pad[8];                  /* reserved for future use */
struct lwpsinfo pr_lwp;               /* "representative" thread info */
Note: Some of the entries in the psinfo file, such as pr_flag, pr_flag2, and pr_addr, refer to internal kernel data structures and might not retain their meanings across different versions of the operating system. They mean nothing to a program and are only useful for manual interpretation by a user aware of the implementation details.
The psinfo file is accessible after a process becomes a zombie.
The pr_lwp field describes the representative thread chosen. If the process is a zombie, the pr_nlwp and pr_lwp.pr_lwpid fields are zero and the other fields of pr_lwp are undefined.
Note: In AIX 5.1, there may be a limited number of array entries in this file. The map file may only contain entries for virtual address regions in the process that contain objects loaded into the process. This file may not contain array entries for other regions in the process such as the stack, heap, or shared memory segments.
The prmap structure contains the following members:
uint64_t pr_size;             /* size of mapping in bytes */
prptr64_t pr_vaddr;           /* virtual address base */
char pr_mapname[PRMAPSZ];     /* name in /proc/pid/object */
uint64_t pr_off;              /* offset into mapped object, if any */
uint32_t pr_mflags;           /* protection and attribute flags */
uint32_t pr_pathoff;          /* if map entry is for a loaded object,
                               * offset into the map file to a
                               * null-terminated path name followed
                               * by a null-terminated member name.
                               * If file is not an archive file, the
                               * member name is null.
                               * The offset is 0 if map entry is
                               * not for a loaded object. */
The members are described below:
A contiguous area of the address space having the same underlying mapped object may appear as multiple mappings because of varying read, write, execute, and shared attributes. The underlying mapped object does not change over the range of a single mapping. An I/O operation to a mapping marked MA_SHARED fails if applied at a virtual address not corresponding to a valid page in the underlying mapped object. Read and write operations to private mappings always succeed. Read and write operations to unmapped addresses always fail.
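The pr_pathoff convention described in the prmap listing (an offset into the map file to a null-terminated path name followed by a null-terminated member name) could be decoded roughly as shown here. This is a sketch that assumes the whole map file has already been read into memory. The function name map_entry_names is hypothetical, and a null member name is interpreted as an empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: locate the path and member strings for one map
 * entry. map/map_len describe the map file contents held in memory and
 * pathoff is the entry's pr_pathoff value.
 * Returns 0 on success; -1 if pathoff is 0 (not a loaded object), out
 * of range, or if either string is not terminated inside the buffer. */
int map_entry_names(const char *map, size_t map_len, uint32_t pathoff,
                    const char **path, const char **member)
{
    const char *p, *end, *m;
    size_t rest;

    assert(map != NULL && path != NULL && member != NULL);
    if (pathoff == 0 || pathoff >= map_len)
        return -1;

    p = map + pathoff;
    end = memchr(p, '\0', map_len - pathoff);
    if (end == NULL)
        return -1;

    m = end + 1;                          /* member name follows the path */
    rest = map_len - (size_t)(m - map);
    if (rest == 0 || memchr(m, '\0', rest) == NULL)
        return -1;

    *path = p;
    *member = m;    /* empty string when the object is not an archive member */
    return 0;
}
```

The bounds checks matter because the offsets come from a file whose size can change between reads of a running process.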
uid_t pr_euid;          /* effective user id */
uid_t pr_ruid;          /* real user id */
uid_t pr_suid;          /* saved user id (from exec) */
gid_t pr_egid;          /* effective group id */
gid_t pr_rgid;          /* real group id */
gid_t pr_sgid;          /* saved group id (from exec) */
uint32_t pr_ngroups;    /* number of supplementary groups */
gid_t pr_groups[1];     /* array of supplementary groups */
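Because pr_groups is declared with a single element but holds pr_ngroups entries, a reader has to index past the declared end of the structure. The following sketch shows one cautious way to do that on a buffer that was read from the file. The struct cred_sketch type is a local mirror of the members listed above and cred_group_at is a hypothetical helper; real code would use the definition from <sys/procfs.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Local stand-in that mirrors the member list above (assumption: the
 * real structure in <sys/procfs.h> has the same order and types). */
struct cred_sketch {
    uid_t    pr_euid;
    uid_t    pr_ruid;
    uid_t    pr_suid;
    gid_t    pr_egid;
    gid_t    pr_rgid;
    gid_t    pr_sgid;
    uint32_t pr_ngroups;
    gid_t    pr_groups[1];   /* really pr_ngroups entries */
};

/* Hypothetical helper: fetch supplementary group i from a raw buffer.
 * Returns 0 and stores the group in *out, or -1 if i is not below
 * pr_ngroups or the entry would lie outside the buffer. */
int cred_group_at(const void *buf, size_t buflen, uint32_t i, gid_t *out)
{
    const size_t base = offsetof(struct cred_sketch, pr_groups);
    uint32_t ngroups;

    assert(buf != NULL && out != NULL);
    if (buflen < base)
        return -1;
    memcpy(&ngroups, (const unsigned char *)buf +
           offsetof(struct cred_sketch, pr_ngroups), sizeof ngroups);
    if (i >= ngroups)
        return -1;
    if (base + ((size_t)i + 1) * sizeof(gid_t) > buflen)
        return -1;
    memcpy(out, (const unsigned char *)buf + base + (size_t)i * sizeof(gid_t),
           sizeof *out);
    return 0;
}
```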
The file consists of a header section followed by an array of entries, each of which corresponds to a system call provided to the process. Each array entry contains the system call number and an offset into the sysent file to that system call's null-terminated name.
The sysent file is characterized by the following attributes:
The object directory makes it possible for a controlling process to get access to the object file and any shared libraries (and consequently the symbol tables), without the process first obtaining the specific path names of those files.
The directory /proc/pid/lwp/tid contains the following entries:
uint64_t pr_lwpid;                  /* specific thread id */
uint32_t pr_flags;                  /* thread status flags */
char pr_state;                      /* thread state - from thread.h t_state */
uint16_t pr_cursig;                 /* current signal */
uint16_t pr_why;                    /* reason for stop (if stopped) */
uint16_t pr_what;                   /* more detailed reason */
uint32_t pr_policy;                 /* scheduling policy */
char pr_clname[PRCLSZ];             /* printable character representing pr_policy */
pr_sigset_t pr_lwppend;             /* set of signals pending to the thread */
pr_sigset_t pr_lwphold;             /* set of signals blocked by the thread */
pr_siginfo64_t pr_info;             /* info associated with signal or fault */
pr_stack64_t pr_altstack;           /* alternate signal stack info */
struct pr_sigaction64 pr_action;    /* signal action for current signal */
uint16_t pr_syscall;                /* system call number (if in syscall) */
uint16_t pr_nsysarg;                /* number of arguments to this syscall */
uint64_t pr_sysarg[PRSYSARGS];      /* arguments to this syscall */
prgregset_t pr_reg;                 /* general and special registers */
prfpregset_t pr_fpreg;              /* floating point registers */
pfamily_t pr_family;                /* hardware platform specific information */
The members of the lwpstatus file are described below:
If the pr_syscall member is non-zero, the pr_nsysarg member is the number of arguments to the system call and the pr_sysarg array contains the arguments. In AIX 5.1, the pr_nsysarg member is always set to 8, the maximum number of system call parameters. The pr_sysarg member always contains eight arguments.
uint64_t pr_lwpid;          /* thread id */
prptr64_t pr_addr;          /* internal address of thread */
prptr64_t pr_wchan;         /* wait addr for sleeping thread */
uint32_t pr_flag;           /* thread flags */
uchar_t pr_wtype;           /* type of thread wait */
uchar_t pr_state;           /* numeric scheduling state */
char pr_sname;              /* printable character representing pr_state */
uchar_t pr_nice;            /* nice for cpu usage */
int pr_pri;                 /* priority, high value = high priority */
uint32_t pr_policy;         /* scheduling policy */
char pr_clname[PRCLSZ];     /* printable character representing pr_policy */
cpu_t pr_onpro;             /* processor on which thread last ran */
cpu_t pr_bindpro;           /* processor to which thread is bound */
Some of the entries in the lwpsinfo file, such as pr_flag, pr_addr, pr_state, pr_wtype, and pr_wchan, refer to internal kernel data structures and should not be expected to retain their meanings across different versions of the operating system. They have no meaning to a program and are only useful for manual interpretation by a user aware of the implementation details.
Process state changes are effected through messages written to the ctl file of the process or to the lwpctl file of an individual thread. All control messages consist of an int operation code identifying the specific operation followed by additional data containing operands (if any). Multiple control messages can be combined in a single write subroutine to a control file, but no partial writes are permitted. Each control message (operation code plus operands) must be presented in its entirety to the write subroutine, not in pieces through several system calls.
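The requirement that each control message reach the write subroutine in one piece can be met by assembling the operation code and its operands in a single buffer first. The sketch below takes that approach. The function name ctl_send and the CTL_MSG_MAX limit are hypothetical, the operation code is passed in by the caller (real values come from <sys/procfs.h>), and the operand is simply appended as raw bytes on the assumption that the caller has already laid it out as the header requires.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CTL_MSG_MAX 256   /* arbitrary buffer limit chosen for this sketch */

/* Hypothetical helper: send one control message (int operation code
 * followed by optional operand bytes) with exactly one write call.
 * Returns 0 on success, -1 with errno set on failure. */
int ctl_send(int ctl_fd, int opcode, const void *operand, size_t operand_len)
{
    unsigned char msg[CTL_MSG_MAX];
    size_t total;
    ssize_t n;

    assert(operand != NULL || operand_len == 0);
    if (operand_len > sizeof msg - sizeof opcode) {
        errno = EINVAL;
        return -1;
    }
    total = sizeof opcode + operand_len;

    memcpy(msg, &opcode, sizeof opcode);
    if (operand_len > 0)
        memcpy(msg + sizeof opcode, operand, operand_len);

    n = write(ctl_fd, msg, total);        /* whole message, one call */
    if (n < 0)
        return -1;
    if ((size_t)n != total) {             /* partial writes are not permitted */
        errno = EIO;
        return -1;
    }
    return 0;
}
```

Several messages could be batched by concatenating them in the same buffer before the write, since the text allows multiple complete messages in one call.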
The following are allowed control messages:
Note: Writing a message to a control file for a process or thread that has already exited elicits the error ENOENT.
When applied to a thread control file, PCSTOP and PCWSTOP complete when the thread stops on an event of interest and immediately if the thread is already stopped.
When applied to the process control file, PCSTOP and PCWSTOP complete when every thread has stopped on an event of interest.
An event of interest is either a PR_REQUESTED stop or a stop that has been specified in the process's tracing flags (set by PCSTRACE, PCSFAULT, PCSENTRY, and PCSEXIT). PR_JOBCONTROL and PR_SUSPENDED stops are not events of interest. (A thread may stop twice because of a stop signal; first showing PR_SIGNALLED if the signal is traced and again showing PR_JOBCONTROL if the thread is set running without clearing the signal.) If PCSTOP or PCDSTOP is applied to a thread that is stopped, but not because of an event of interest, the stop directive takes effect when the thread is restarted by the competing mechanism; at that time the thread enters a PR_REQUESTED stop before executing any user-level code.
A write operation of a control message that blocks is interruptible by a signal so that, for example, an alarm subroutine can be set to avoid waiting for a process or thread that may never stop on an event of interest. If PCSTOP is interrupted, the thread stop directives remain in effect even though the write subroutine returns an error.
A kernel process (indicated by the PR_ISSYS flag) is never executed at user level and cannot be stopped. It has no user-level address space visible through the /proc file system. Applying PCSTOP, PCDSTOP, or PCWSTOP to a system process or any of its threads elicits the error EBUSY.
The allowed flags for PCRUN are described below:
Note: The PRSTEP operation requires hardware and operating system support. It is not supported on POWER-based platforms because the hardware does not support it.
When applied to a thread control file, PCRUN makes the specific thread runnable. The operation fails and returns the error EBUSY if the specific thread is not stopped on an event of interest.
When applied to the process control file, a representative thread is chosen for the operation as described for the /proc/pid/status file. The operation fails and returns the error EBUSY if the representative thread is not stopped on an event of interest. If PRSTEP or PRSTOP was requested, PCRUN makes the representative thread runnable. Otherwise, the chosen thread is marked PR_REQUESTED. If as a result all threads are in the PR_REQUESTED stop state, they all become runnable.
Once PCRUN makes a thread runnable, it is no longer stopped on an event of interest, even if it remains stopped because of a competing mechanism.
If a signal that is included in a thread's held signal set is sent to that thread, the signal is not received and does not cause a stop until it is removed from the held signal set. Either the thread itself removes it or you remove it by setting the held signal set with PCSHOLD or the PRSHOLD option of PCRUN.
The semantics of this operation differ from those of the kill subroutine, the pthread_kill subroutine, or PCKILL. With PCSSIG, the signal is delivered to the thread immediately after execution is resumed (even if the signal is being held) and an additional PR_SIGNALLED stop does not intervene even if the signal is being traced. Setting the current signal to SIGKILL ends the process immediately.
Note: Some of these may not occur on all processors; other processor-specific faults may exist in addition to those described here.
When not traced, a fault normally results in the posting of a signal to the thread that incurred the fault. If a thread stops on a fault, the signal is posted to the thread when execution is resumed unless the fault is cleared by PCCFAULT or by the PRCFAULT option of PCRUN. The pr_info field in /proc/pid/status or in /proc/pid/lwp/tid/lwpstatus identifies the signal to be sent and contains machine-specific information about the fault.
If a thread is stopped on entry to a system call (PR_SYSENTRY) or when sleeping in an interruptible system call (PR_ASLEEP is set), it may be instructed to go directly to system call exit by specifying the PRSABORT flag in a PCRUN control message. Unless exit from the system call is being traced, the thread returns to user level showing error EINTR.
The error EINVAL is returned if you specify flags other than those described above or attempt to apply these operations to a system process. The current modes are reported in the pr_flags field of the /proc/pid/status file.
/proc | Directory (list of processes) |
/proc/pid | Directory for the process pid |
/proc/pid/status | Status of process pid |
/proc/pid/ctl | Control file for process pid |
/proc/pid/psinfo | ps info for process pid |
/proc/pid/as | Address space of process pid |
/proc/pid/map | as map info for process pid |
/proc/pid/object | Directory for objects for process pid |
/proc/pid/sigact | Signal actions for process pid |
/proc/pid/sysent | System call information for process pid |
/proc/pid/lwp/tid | Directory for thread tid |
/proc/pid/lwp/tid/lwpstatus | Status of thread tid |
/proc/pid/lwp/tid/lwpctl | Control file for thread tid |
/proc/pid/lwp/tid/lwpsinfo | ps info for thread tid |
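The path names in the table above follow two patterns, one for per-process files and one for per-thread files. A small sketch of how a program might build them is shown below. The function names proc_file_path and lwp_file_path are hypothetical, and the pid and tid are assumed to fit in long long and unsigned long long respectively.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/proc/<pid>/<name>" (for example status,
 * ctl, psinfo, as, map, sigact, sysent).
 * Returns 0 on success, -1 if name contains '/' or out is too small. */
int proc_file_path(char *out, size_t outlen, long long pid, const char *name)
{
    int n;

    assert(out != NULL && name != NULL);
    if (strchr(name, '/') != NULL)
        return -1;
    n = snprintf(out, outlen, "/proc/%lld/%s", pid, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: build "/proc/<pid>/lwp/<tid>/<name>" (for
 * example lwpstatus, lwpctl, lwpsinfo). Same return convention. */
int lwp_file_path(char *out, size_t outlen, long long pid,
                  unsigned long long tid, const char *name)
{
    int n;

    assert(out != NULL && name != NULL);
    if (strchr(name, '/') != NULL)
        return -1;
    n = snprintf(out, outlen, "/proc/%lld/lwp/%llu/%s", pid, tid, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```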
Other errors can occur in addition to the errors normally associated with
file system access:
ENOENT | The traced process or thread has exited after being opened. |
EFAULT | A read or write request was begun at an invalid virtual address. |
EIO | A write subroutine was attempted at an illegal address in the traced process. |
EBUSY | This error is returned because of one of the following reasons:
|
EPERM | Someone other than the privileged user attempted to improve a process's priority by issuing PCNICE. |
ENOSYS | An attempt was made to do an unsupported operation (such as create, remove, link, or unlink) on an entry in the /proc file system. |
EINVAL | An invalid argument was supplied to a system call. The following
are some--but not all--possible conditions that can elicit this
error:
|
EINTR | A signal was received by the controlling process while waiting for the traced process or thread to stop via PCSTOP or PCWSTOP. |
EBADF | The traced process performed an exec subroutine on a setuid/setgid object file or on an object file that is not readable for the process. All further operations on the process or thread file descriptor (except the close subroutine) elicit this error. |
The effect of a control message is guaranteed to be atomic with respect to the traced process.
For security reasons, except for the privileged user, an open subroutine of a /proc file fails unless both the effective user ID and effective group ID of the caller match those of the traced process and the process's object file is readable by the caller. Files corresponding to setuid and setgid processes can be opened only by the privileged user. Even if held by the privileged user, an open process or thread file descriptor becomes invalid if the traced process performs an exec subroutine on a setuid/setgid object file or on an object file that it cannot read. Any operation performed on an invalid file descriptor, except for the close subroutine, fails with EBADF. In this case, if any tracing flags are set and the process or any thread file is open for writing, the process is directed to stop and its run-on-last-close flag is set (see PCSET).
This feature enables a controlling process (that has the necessary permission) to reopen the process file to obtain new valid file descriptors, close the invalid file descriptors, and proceed. Just closing the invalid file descriptors causes the traced process to resume execution with no tracing flags set. Any process that is not currently open for writing with tracing flags left over from a previous open subroutine and that performs an exec subroutine on a setuid/setgid or unreadable object file is not stopped. However, that process does not have all its tracing flags cleared.