SRUN(1)                    srun 0.7 (January 2006)                    SRUN(1)

NAME
    srun - run parallel jobs

SYNOPSIS
    srun [OPTIONS...] executable [args...]
    srun --batch [OPTIONS...] job_script
    srun --allocate [OPTIONS...] [job_script]
    srun --attach=jobid

DESCRIPTION
    Allocate resources and optionally initiate parallel jobs on clusters managed by SLURM.

    Parallel run options:

    -n, --ntasks=ntasks
        Specify the number of processes to run. Request that srun allocate ntasks processes. The default is one process per node, but note that the -c parameter will change this default.

    -c, --cpus-per-task=ncpus
        Request that ncpus be allocated per process. This may be useful if the job is multithreaded and requires more than one cpu per task for optimal performance. The default is one cpu per process. If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction.

    -N, --nodes=minnodes[-maxnodes]
        Request that a minimum of minnodes nodes be allocated to this job. The scheduler may decide to launch the job on more than minnodes nodes. A limit on the maximum node count may be specified with maxnodes (e.g. "--nodes=2-4"). The minimum and maximum node count may be the same to specify a specific number of nodes (e.g. "--nodes=2-2" will ask for two and ONLY two nodes). The partition's node limits supersede those of the job. If a job's node limits are completely outside of the range permitted for its associated partition, the job will be left in a PENDING state. Note that the environment variable SLURM_NNODES will be set to the count of nodes actually allocated to the job. See the ENVIRONMENT VARIABLES section for more information. If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requirements of the -n and -c options.

    -r, --relative=n
        Run a job step relative to node n of the current allocation. This option may be used to spread several job steps out among the nodes of the current job. If -r is used, the current job step will begin at node n of the allocated nodelist, where the first node is considered node 0. The -r option is not permitted along with -w or -x, and will be silently ignored when not running within a prior allocation (i.e. when SLURM_JOBID is not set). The default for n is 0. If the value of --nodes exceeds the number of nodes identified with the --relative option, a warning message will be printed and the --relative option will take precedence.

    -p, --partition=partition
        Request resources from partition "partition". Partitions are created by the slurm administrator, who also identifies one of those partitions as the default.

    -P, --dependency=jobid
        Defer initiation of this job until the specified jobid has completed execution. Many jobs can share the same dependency and these jobs may belong to different users. The value may be changed after job submission using the scontrol command.

    --nice[=adjustment]
        Run the job with an adjusted scheduling priority. With no adjustment value the scheduling priority is decreased by 100. The adjustment range is from -10000 (highest priority) to 10000 (lowest priority). Only privileged users can specify a negative adjustment. NOTE: This option is presently ignored if SchedulerType=sched/maui.

    --begin=time
        Defer initiation of this job until the specified time. It accepts times of the form HH:MM:SS to run a job at a specific time of day (seconds are optional). If that time is already past, the next day is assumed.
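        For example, the following sketch (the executable name my_app and the option values are illustrative, not taken from this manual) defers an eight-task job until 4:00 PM using the HH:MM:SS form:

            > srun --begin=16:00 -n8 my_app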
        You may also specify midnight, noon, or teatime (4pm), and you can have a time-of-day suffixed with AM or PM for running in the morning or the evening. You can also say what day the job will be run, by giving a date in the form month-name day with an optional year, or giving a date of the form MMDDYY or MM/DD/YY or DD.MM.YY. You can also give times like now + count time-units, where the time-units can be minutes, hours, days, or weeks, and you can tell SLURM to run the job today with the keyword today and to run the job tomorrow with the keyword tomorrow. The value may be changed after job submission using the scontrol command.

    -U, --account=account
        Charge resources used by this job to the specified account. The account is an arbitrary string. This may be changed after job submission using the scontrol command.

    -t, --time=minutes
        Establish a time limit to terminate the job after the specified number of minutes. If the job's time limit exceeds the partition's time limit, the job will be left in a PENDING state. The default value is the partition's time limit. When the time limit is reached, the job's processes are sent SIGTERM followed by SIGKILL. The interval between signals is specified by the SLURM configuration parameter KillWait. A time limit of 0 minutes indicates that an infinite time limit should be used.

    -D, --chdir=path
        Have the remote processes do a chdir to path before beginning execution. The default is to chdir to the current working directory of the srun process.

    -I, --immediate
        Exit if resources are not immediately available. By default, --immediate is off, and srun will block until resources become available.

    -k, --no-kill
        Do not automatically terminate a job if one of the nodes it has been allocated fails. This option is only recognized on a job allocation, not for the submission of individual job steps. The job will assume all responsibilities for fault-tolerance. The active job step (MPI job) will almost certainly suffer a fatal error, but subsequent job steps may be run if this option is specified. The default action is to terminate the job upon node failure. Note that --batch jobs will be re-queued if a node failure occurs while the job is being initiated.

    -K, --kill-on-bad-exit
        Terminate the job if any task exits with a non-zero exit code.

    -s, --share
        The job can share nodes with other running jobs. This may result in faster job initiation and higher system utilization, but lower application performance.

    -O, --overcommit
        Overcommit resources. Normally, srun will not allocate more than one process per cpu. By specifying --overcommit you are explicitly allowing more than one process per cpu. However, no more than MAX_TASKS_PER_NODE tasks are permitted to execute per node.

    -T, --threads=nthreads
        Request that srun use nthreads threads to initiate and control the parallel job. The default value is the smaller of 10 or the number of nodes allocated.

    -l, --label
        Prepend the task number to lines of stdout/stderr. Normally, stdout and stderr from remote tasks are line-buffered directly to the stdout and stderr of srun. The --label option will prepend lines of output with the remote task id.

    -u, --unbuffered
        Do not line-buffer stdout from remote tasks. This option cannot be used with --label.

    -m, --distribution=(block|cyclic|hostfile)
        Specify an alternate distribution method for remote processes.
        block
            The block method of distribution will allocate processes in order to the cpus on a node. If the number of processes exceeds the number of cpus on all of the nodes in the allocation, then all nodes will be utilized. For example, consider an allocation of three nodes each with two cpus. A four-process block distribution request will distribute those processes to the nodes with processes one and two on the first node, process three on the second node, and process four on the third node. Block distribution is the default behavior if the number of tasks exceeds the number of nodes requested.

        cyclic
            The cyclic method distributes processes in a round-robin fashion across the allocated nodes. That is, process one will be allocated to the first node, process two to the second, and so on. This is the default behavior if the number of tasks is no larger than the number of nodes requested.

        hostfile
            The hostfile method of distribution will allocate processes in order as listed in the file designated by the environment variable MP_HOSTFILE. If this variable is set, it will override any other method specified. If it is not set, the method will default to block.

    -J, --job-name=jobname
        Specify a name for the job. The specified name will appear along with the job id number when querying running jobs on the system. The default is the supplied executable program's name.

    --mpi=mpi_type
        Identify the type of MPI to be used. May result in unique initiation procedures.

        list      Lists available mpi types to choose from.
        lam       Initiates one 'lamd' process per node and establishes necessary environment variables for LAM/MPI.
        mpich-gm  For use with Myrinet.
        mvapich   For use with Infiniband.
        none      No special MPI processing. This is the default and works with many other versions of MPI.

    --jobid=id
        Initiate a job step under an already allocated job with job id id. Using this option will cause srun to behave exactly as if the SLURM_JOBID environment variable was set.

    -o, --output=mode
        Specify the mode for stdout redirection. By default in interactive mode, srun collects stdout from all tasks and line-buffers this output to the attached terminal. With --output, stdout may be redirected to a file, to one file per task, or to /dev/null. See the section IO Redirection below for the various forms of mode. If the specified file already exists, it will be overwritten. If --error is not also specified on the command line, both stdout and stderr will be directed to the file specified by --output.

    -i, --input=mode
        Specify how stdin is to be redirected. By default, srun redirects stdin from the terminal to all tasks. See IO Redirection below for more options.

    -e, --error=mode
        Specify how stderr is to be redirected. By default in interactive mode, srun redirects stderr to the same file as stdout, if one is specified. The --error option is provided to allow stdout and stderr to be redirected to different locations. See IO Redirection below for more options. If the specified file already exists, it will be overwritten.

    -b, --batch
        Submit in "batch mode." srun will make a copy of the executable file (a script) and submit the request for execution when resources are available. srun will terminate after the request has been submitted. The executable file will run on the first node allocated to the job and must contain srun commands to initiate parallel tasks.
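        A minimal sketch of such a script (the program name my_app and the node count are assumed; any of the options below may be used):

            > cat my_batch.sh
            #!/bin/sh
            # launch my_app as a parallel job step across the allocated nodes
            srun my_app

            > srun -b -N2 my_batch.sh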
        stdin will be redirected from /dev/null; stdout and stderr will be redirected to a file (the default is jobname.out or jobid.out in the current working directory; see -o for other IO options). Note that if the slurm daemons are cold-started, jobid values will be reused. Plan accordingly to avoid overwriting output and error files.

        The executable must be specified with a fully qualified pathname, or its pathname will be taken relative to the current working directory. The search path will not be used to locate the file. The executable will be interpreted by the user's default shell unless the file begins with "#!" followed by the fully qualified pathname of a valid shell. Note that batch jobs will be re-queued if a node fails while the job is being initiated.

        srun command line options can also be inserted into the script by prefacing the option with #SLURM. Multiple options can be placed on one line or spread across multiple lines, e.g.

            #SLURM -N 2 -n 2
            #SLURM --mpi=lam

        This will run the script on 2 nodes with 2 processes, using MPI type lam. All command line options can be set inside the script, with the exception of the mode (which has already been determined, since running a batch script implies batch mode). Options on the command line take precedence over options in the batch script, which in turn take precedence over existing environment variables.

    -v, --verbose
        Verbose operation. Multiple -v's will further increase the verbosity of srun. By default only errors will be displayed.

    -d, --slurmd-debug=level
        Specify a debug level for slurmd(8). level may be an integer value between 0 [quiet, only errors are displayed] and 4 [verbose operation]. The slurmd debug information is copied onto the stderr of the job. By default only errors are displayed.

    -W, --wait=seconds
        Specify how long to wait after the first task terminates before terminating all remaining tasks. A value of 0 indicates an unlimited wait (a warning will be issued after 60 seconds). The default value is set by the WaitTime parameter in the slurm configuration file (see slurm.conf(5)). This option can be useful to ensure that a job is terminated in a timely fashion in the event that one or more tasks terminate prematurely.

    -q, --quit-on-interrupt
        Quit immediately on a single SIGINT (Ctrl-C). Use of this option disables the status feature normally available when srun receives a single Ctrl-C and causes srun to instead immediately terminate the running job.

    -X, --disable-status
        Disable the display of task status when srun receives a single SIGINT (Ctrl-C). Instead, immediately forward the SIGINT to the running job. A second Ctrl-C within one second will forcibly terminate the job and srun will immediately exit. May also be set via the environment variable SLURM_DISABLE_STATUS.

    -Q, --quiet
        Quiet operation. Suppress informational messages. Errors will still be displayed.

    --mail-type=type
        Notify user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, and ALL (any state change). The user to be notified is indicated with --mail-user.

    --mail-user=user
        User to receive email notification of state changes as defined by --mail-type. The default value is the submitting user.

    --uid=user
        Attempt to submit and/or run a job as user instead of the invoking user id. The invoking user's credentials will be used to check access permissions for the target partition. User root may use this option to run jobs as a normal user in a RootOnly partition, for example.
        If run as root, srun will drop its permissions to the uid specified after node allocation is successful. user may be the user name or numerical user ID.

    --gid=group
        If srun is run as root, and the --gid option is used, submit the job with group's group access permissions. group may be the group name or the numerical group ID.

    --core=type
        Adjust the corefile format for the parallel job. If possible, srun will set up the environment for the job such that a corefile format other than full core dumps is enabled. If run with type = "list", srun will print a list of supported corefile format types to stdout and exit.

    --propagate[=rlimits]
        Allows users to specify which of the modifiable (soft) resource limits to propagate to the compute nodes and apply to their jobs. If rlimits is not specified, then all resource limits will be propagated.

    --prolog=executable
        srun will run executable just before launching the job step. The command line arguments for executable will be the command and arguments of the job step. If executable is "none", then no prolog will be run. This parameter overrides the SrunProlog parameter in slurm.conf.

    --epilog=executable
        srun will run executable just after the job step completes. The command line arguments for executable will be the command and arguments of the job step. If executable is "none", then no epilog will be run. This parameter overrides the SrunEpilog parameter in slurm.conf.

    --task-prolog=executable
        The slurmd daemon will run executable just before launching each task. This will be executed after any TaskProlog parameter in slurm.conf is executed. Besides the normal environment variables, this has SLURM_TASK_PID available to identify the process ID of the task being started. Standard output from this program of the form "export NAME=value" will be used to set environment variables for the task being spawned.

    --task-epilog=executable
        The slurmd daemon will run executable just after each task terminates. This will be executed before any TaskEpilog parameter in slurm.conf is executed. This is meant to be a very short-lived program. If it fails to terminate within a few seconds, it will be killed along with any descendant processes.

    Allocate options:

    -A, --allocate
        Allocate resources and spawn a shell. When --allocate is specified to srun, no remote tasks are started. Instead a subshell is started that has access to the allocated resources. Multiple jobs can then be run on the same cpus from within this subshell. See Allocate Mode below.

    --no-shell
        Immediately exit after allocating resources instead of spawning a shell when used with the -A, --allocate option.

    Attach to running job:

    -a, --attach=id
        This option will attach srun to a running job with job id = id. Provided that the calling user has access to that running job, stdout and stderr will be redirected to the current session (assuming that the tasks' stdout and stderr are not connected directly to files). stdin is not connected to the remote tasks, and signals are not forwarded, unless the --join parameter is also specified.

    -j, --join
        Used in conjunction with --attach to specify that stdin should also be connected to the remote tasks (assuming that the remote tasks' stdin are not directly connected to files), and that signals sent to srun will be forwarded to the remote tasks.

    Constraint options:
    The following options all put constraints on the nodes that may be considered for the job:

    --mincpus=n
        Specify a minimum number of cpus per node.

    --mem=MB
        Specify a minimum amount of real memory.

    --tmp=MB
        Specify a minimum amount of temporary disk space.

    -C, --constraint=list
        Specify a list of constraints. The constraints are features that have been assigned to the nodes by the slurm administrator. The list of constraints may include multiple features separated by commas, in which case all nodes must have all listed features (i.e. the features are ANDed together). Alternately, the features may be separated by a vertical bar, '|', in which case all nodes must have at least one of the listed features (i.e. the features are ORed together). If no nodes have the requested features, then the job will be rejected by the slurm job manager.

    --contiguous
        Demand a contiguous range of nodes. The default is "yes". Specify --contiguous=no if a contiguous range of nodes is not a constraint.

    -w, --nodelist=host1,host2,... or filename
        Request a specific list of hosts. The job will contain at least these hosts. The list may be specified as a comma-separated list of hosts, a range of hosts (host[1-5,7,...] for example), or a filename. The host list will be assumed to be a filename if it contains a "/" character.

    -x, --exclude=host1,host2,... or filename
        Request that a specific list of hosts not be included in the resources allocated to this job. The host list will be assumed to be a filename if it contains a "/" character.

    Affinity/Multi-core options: (when the task/affinity plugin is enabled)

    --cpu_bind=[{quiet,verbose},]type
        Bind tasks to CPUs.

        q[uiet]          quietly bind before task runs (default)
        v[erbose]        verbosely report binding before task runs
        no[ne]           don't bind tasks to CPUs (default)
        rank             bind by task rank
        map_cpu:<list>   bind by mapping CPU IDs to tasks as specified, where <list> is <cpuid1>,<cpuid2>,... CPU IDs are interpreted as decimal values unless they are preceded with '0x', in which case they are interpreted as hexadecimal values.
        mask_cpu:<list>  bind by setting CPU masks on tasks as specified, where <list> is <mask1>,<mask2>,... CPU masks are always interpreted as hexadecimal values but can be preceded with an optional '0x'.

        To have SLURM always report on the selected binding for all srun commands executed in a shell, you can also enable verbose mode separately from the command line with:

            setenv SLURM_CPU_BIND verbose

        SLURM_CPU_BIND will not propagate into the tasks' environment (binding by default only affects the first srun). To propagate --cpu_bind to successive srun commands, first do the following in each task:

            setenv SLURM_CPU_BIND \
            ${SLURM_CPU_BIND_VERBOSE},${SLURM_CPU_BIND_TYPE}${SLURM_CPU_BIND_LIST}

        See the ENVIRONMENT VARIABLES section for a more detailed description of the individual SLURM_CPU_BIND* variables.

    The following options support AIX systems, but may be applicable to other systems as well. Since POE is used to launch tasks, these options are not normally used.

    --network=type
        Specify the communication protocol to be used.

    The following options support Blue Gene systems, but may be applicable to other systems as well.

    -g, --geometry=XxYxZ
        Specify the geometry requirements for the job. The three numbers represent the required geometry giving dimensions in the X, Y and Z directions. For example, "--geometry=2x3x4" specifies a block of nodes having 2 x 3 x 4 = 24 nodes (actually base partitions on Blue Gene).
    --conn-type=type
        Require the partition connection type to be of a certain type. On Blue Gene the acceptable values of type are MESH, TORUS and NAV. If NAV, or if not set, SLURM will try to fit a TORUS, else a MESH. You should not normally set this option; SLURM will normally allocate a TORUS if possible for a given geometry.

    -R, --no-rotate
        Disables rotation of the job's requested geometry in order to fit an appropriate partition. By default the specified geometry can rotate in three dimensions.

    Help options:

    --help
        Show this help message.

    --usage
        Display a brief usage message.

    Other options:

    -V, --version
        Output version information and exit.

    Unless the -a (--attach) or -A (--allocate) options are specified (see Allocate Mode and Attaching to a running job below), srun will submit the job request to the slurm job controller, then initiate all processes on the remote nodes. If the request cannot be met immediately, srun will block until the resources are free to run the job. If the -I (--immediate) option is specified, srun will terminate if resources are not immediately available.

    When initiating remote processes, srun will propagate the current working directory, unless --chdir=path is specified, in which case path will become the working directory for the remote processes.

    The -n, -c, and -N options control how CPUs and nodes will be allocated to the job. When specifying only the number of processes to run with -n, a default of one CPU per process is allocated. By specifying the number of CPUs required per task (-c), more than one CPU may be allocated per process. If the number of nodes is specified with -N, srun will attempt to allocate at least the number of nodes specified.

    Combinations of the above three options may be used to change how processes are distributed across nodes and cpus. For instance, by specifying both the number of processes and the number of nodes on which to run, the number of processes per node is implied. However, if the number of CPUs per process is more important, then the number of processes (-n) and the number of CPUs per process (-c) should be specified. srun will refuse to allocate more than one process per CPU unless --overcommit (-O) is also specified.

    srun will attempt to meet the above specifications "at a minimum." That is, if 16 nodes are requested for 32 processes, and some nodes do not have 2 CPUs, the allocation of nodes will be increased in order to meet the demand for CPUs. In other words, a minimum of 16 nodes is being requested. However, if 16 nodes are requested for 15 processes, srun will consider this an error, as 15 processes cannot run across 16 nodes.
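    As an illustrative sketch of these rules (the node and CPU counts are assumed; hostname stands in for a real application):

        > srun -n32 -N16 hostname   # at least 16 nodes, and enough CPUs for 32 tasks
        > srun -n4 -c2 hostname     # 4 tasks, each allocated 2 CPUs
        > srun -n15 -N16 hostname   # an error: 15 tasks cannot run across 16 nodes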
IO Redirection
    By default, stdout and stderr will be redirected from all tasks to the stdout and stderr of srun, and stdin will be redirected from the standard input of srun to all remote tasks. This behavior may be changed with the --output, --error, and --input (-o, -e, -i) options. Valid format specifications for these options are:

    all
        stdout and stderr are redirected from all tasks to srun. stdin is broadcast to all remote tasks. (This is the default behavior.)

    none
        stdout and stderr are not received from any task. stdin is not sent to any task (stdin is closed).

    taskid
        stdout and/or stderr are redirected from only the task with relative id equal to taskid, where 0 <= taskid <= ntasks and ntasks is the total number of tasks in the current job step. stdin is redirected from the stdin of srun to this same task.

    filename
        srun will redirect stdout and/or stderr to the named file from all tasks. stdin will be redirected from the named file and broadcast to all tasks in the job. If the job is submitted in batch mode using the -b or --batch option, filename refers to a path on each of the nodes on which the job runs. Otherwise filename refers to a path on the host that runs srun. Depending on the cluster's file system layout, this may result in the output appearing in different places depending on whether the job is run in batch mode.

    format string
        srun allows a format string to be used to generate the named IO file described above. The following list of format specifiers may be used in the format string to generate a filename that will be unique to a given jobid, stepid, node, or task. In each case, the appropriate number of files are opened and associated with the corresponding tasks.

        %J    jobid.stepid of the running job (e.g. "128.0").
        %j    jobid of the running job.
        %s    stepid of the running job.
        %N    short hostname. This will create a separate IO file per node.
        %n    node identifier relative to the current job (e.g. "0" is the first node of the running job). This will create a separate IO file per node.
        %t    task identifier (rank) relative to the current job. This will create a separate IO file per task.

        A number placed between the percent character and the format specifier may be used to zero-pad the result in the IO filename. This number is ignored if the format specifier corresponds to non-numeric data (%N for example).

        Some examples of how the format string may be used for a 4-task job step with a Job ID of 128 and step id of 0 are included below:

            job%J.out      job128.0.out
            job%4j.out     job0128.out
            job%j-%2t.out  job128-00.out, job128-01.out, ...

Allocate Mode
    When the allocate option is specified (-A, --allocate), srun will not initiate any remote processes after acquiring resources. Instead, srun will spawn a subshell which has access to the acquired resources. Subsequent instances of srun from within this subshell will then run on these resources. If the name of a script is specified on the command line with --allocate, the spawned shell will run the specified script. Resources allocated in this way will only be freed when the subshell terminates.

Attaching to a running job
    Use of the -a jobid (or --attach) option allows srun to reattach to a running job, receiving stdout and stderr from the job and forwarding signals to the job, just as if the current session of srun had started the job. (stdin, however, cannot be forwarded to the job.)

    There are two ways to reattach to a running job. The default method is to attach to the current job read-only. In this case, stdout and stderr are duplicated to the attaching srun, but signals are not forwarded to the remote processes (a single Ctrl-C will detach this read-only srun from the job). If the -j (--join) option is also specified, srun "joins" the running job, is able to forward signals, connects stdin, and acts for the most part much like the srun process that initiated the job.

    Node and CPU selection options do not make sense when specifying --attach, and it is an error to use -n, -c, or -N in attach mode.
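    As a brief sketch of the two modes (a job id of 42 is assumed):

        > srun --attach=42          # watch stdout/stderr of job 42 read-only; Ctrl-C detaches
        > srun --join --attach=42   # also connect stdin and forward signals to job 42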
ENVIRONMENT VARIABLES
    Some srun options may be set via environment variables. These environment variables, along with their corresponding options, are listed below. (Note: command line options will always override these settings.)

    SLURM_CONF              The location of the SLURM configuration file.
    SLURM_ACCOUNT           -U, --account=account
    SLURM_CPU_BIND          --cpu_bind=type
    SLURM_CPUS_PER_TASK     -c, --cpus-per-task=n
    SLURM_CONN_TYPE         --conn-type=(mesh|nav|torus)
    SLURM_CORE_FORMAT       --core=format
    SLURM_DEBUG             -v, --verbose
    SLURMD_DEBUG            -d, --slurmd-debug
    SLURM_DISTRIBUTION      -m, --distribution=(block|cyclic|hostfile)
    SLURM_GEOMETRY          -g, --geometry=X,Y,Z
    SLURM_LABELIO           -l, --label
    SLURM_NNODES            -N, --nodes=(n|min-max)
    SLURM_NO_ROTATE         --no-rotate
    SLURM_NPROCS            -n, --ntasks=n
    SLURM_OVERCOMMIT        -O, --overcommit
    SLURM_PARTITION         -p, --partition=partition
    SLURM_REMOTE_CWD        -D, --chdir=dir
    SLURM_STDERRMODE        -e, --error=mode
    SLURM_STDINMODE         -i, --input=mode
    SLURM_STDOUTMODE        -o, --output=mode
    SLURM_TASK_EPILOG       --task-epilog=executable
    SLURM_TASK_PROLOG       --task-prolog=executable
    SLURM_TIMELIMIT         -t, --time=minutes
    SLURM_WAIT              -W, --wait=seconds
    SLURM_DISABLE_STATUS    -X, --disable-status

    Additionally, srun will set some environment variables in the environment of the executing tasks on the remote compute nodes. These environment variables are:

    SLURM_CPU_BIND_VERBOSE      --cpu_bind verbosity (quiet, verbose).
    SLURM_CPU_BIND_TYPE         --cpu_bind type (none, rank, map_cpu:, mask_cpu:)
    SLURM_CPU_BIND_LIST         --cpu_bind map or mask list (<list>)
    SLURM_CPUS_ON_NODE          Count of processors available to the job on this node.
    SLURM_JOBID                 Job id of the executing job.
    SLURM_LAUNCH_NODE_IPADDR    IP address of the node from which the task launch was initiated (where the srun command ran from).
    SLURM_LOCALID               Node-local task ID for the process within a job.
    SLURM_NNODES                Total number of nodes in the job's resource allocation.
    SLURM_NODEID                The relative node ID of the current node.
    SLURM_NODELIST              List of nodes allocated to the job.
    SLURM_NPROCS                Total number of processes in the current job.
    SLURM_PROCID                The MPI rank (or relative process ID) of the current process.
    SLURM_TASKS_PER_NODE        Number of tasks to be initiated on each node. Values are comma separated and in the same order as SLURM_NODELIST. If two or more consecutive nodes are to have the same task count, that count is followed by "(x#)" where "#" is the repetition count. For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the first three nodes will each execute two tasks and the fourth node will execute one task.
    MPIRUN_PARTITION            The block name (Blue Gene systems only).
    MPIRUN_NOALLOCATE           Do not allocate a block (Blue Gene systems only).
    MPIRUN_NOFREE               Do not free a block (Blue Gene systems only).

SIGNALS AND ESCAPE SEQUENCES
    Signals sent to the srun command are automatically forwarded to the tasks it is controlling, with a few exceptions. The escape sequence <control-c> will report the state of all tasks associated with the srun command. If <control-c> is entered twice within one second, then the associated SIGINT signal will be sent to all tasks. If a third <control-c> is received, the job will be forcefully terminated without waiting for remote tasks to exit.

    The escape sequence <control-z> is presently ignored. Our intent is for this to put the srun command into a mode where various special actions may be invoked.

MPI SUPPORT
    LAM/MPI version 7.0.4 or higher is well integrated with SLURM. The lamboot command will acquire a SLURM resource allocation and use the srun command to launch its lamd daemons on each allocated node. See http://www.lam-mpi.org/ for more information.
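    The following is a sketch of one possible LAM/MPI workflow using allocate mode (the task count and the program name mpi-app are assumed; consult the LAM/MPI documentation for the authoritative procedure):

        > srun -A -n4      # allocate resources and spawn a subshell
        > lamboot          # start one lamd per allocated node via srun
        > mpirun -np 4 mpi-app
        > lamhalt
        > exit             # leave the subshell and release the allocation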
    On computers with a Quadrics interconnect, srun directly supports the Quadrics version of MPI without modification. Applications built using the Quadrics MPI library will communicate directly over the Quadrics interconnect without any special srun options.

    Users may also use MPICH on any computer where that is available. The mpirun command may need to be provided with information on its command line identifying the resources to be used. The installer of the MPICH software may configure it to perform these steps automatically. At worst, you must specify two parameters:

    -np SLURM_NPROCS
        number of processors to run on

    -machinefile
        list of computers on which to execute. This list can be constructed by executing the command "srun /bin/hostname" and writing its standard output to the desired file. Execute "mpirun --help" for more options.

EXAMPLES
    This simple example demonstrates the execution of the command hostname in eight tasks. At least eight processors will be allocated to the job (the same as the task count) on however many nodes are required to satisfy the request. The output of each task will be prefixed with its task number. (The machine "dev" in the example below has a total of two CPUs per node.)

        > srun -n8 -l hostname
        0: dev0
        1: dev0
        2: dev1
        3: dev1
        4: dev2
        5: dev2
        6: dev3
        7: dev3

    This example demonstrates how one might submit a script for later execution (batch mode). The script will be initiated when resources are available and no higher priority job is pending for the same partition. The job will be allocated 4 nodes, with one task per node implied. Note that the script itself executes on only one node. For the script to utilize all allocated nodes, it must execute the srun command or an MPI program.

        > cat test.sh
        #!/bin/sh
        date
        srun -l hostname

        > srun -N4 -b test.sh
        srun: jobid 42 submitted

    The output of test.sh would be found in the default output file "slurm-42.out".

    The srun -r option is used within a job script to run two job steps on disjoint nodes in the following example. The script is run using allocate mode instead of as a batch job in this case.

        > cat test.sh
        #!/bin/sh
        echo $SLURM_NODELIST
        srun -lN2 -r2 hostname
        srun -lN2 hostname

        > srun -A -N4 test.sh
        dev[7-10]
        0: dev9
        1: dev10
        0: dev7
        1: dev8

    The following script runs two job steps in parallel within an allocated set of nodes.

        > cat test.sh
        #!/bin/bash
        srun -lN2 -n4 -r 2 sleep 60 &
        srun -lN2 -r 0 sleep 60 &
        sleep 1
        squeue
        squeue -s
        wait

        > srun -A -N4 test.sh
          JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST
          65641     batch  test.sh grondo  R  0:01     4 dev[7-10]

         STEPID PARTITION   USER  TIME NODELIST
        65641.0     batch grondo  0:01 dev[7-8]
        65641.1     batch grondo  0:01 dev[9-10]

    This example demonstrates how one executes a simple MPICH job. We use srun to build a list of machines (nodes) to be used by mpirun in its required format. A sample command line and the script to be executed follow.

        > cat test.sh
        #!/bin/sh
        MACHINEFILE="nodes.$SLURM_JOBID"

        # Generate Machinefile for mpich such that hosts are in the same
        # order as if run via srun
        #
        srun -l /bin/hostname | sort -n | awk '{print $2}' > $MACHINEFILE

        # Run using generated Machine file:
        mpirun -np $SLURM_NPROCS -machinefile $MACHINEFILE mpi-app

        rm $MACHINEFILE

        > srun -AN2 -n4 test.sh
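    The following sketch shows a script that simply reports its allocation using the variables described in the ENVIRONMENT VARIABLES section (the filename env.sh is assumed):

        > cat env.sh
        #!/bin/sh
        # report what SLURM allocated to this job
        echo "job $SLURM_JOBID: $SLURM_NNODES nodes ($SLURM_NODELIST)"
        srun -l /bin/hostname

        > srun -A -N2 env.sh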
    This simple example demonstrates the execution of different jobs on different nodes in the same srun. You can do this for any number of nodes or any number of jobs. The executables are run on the nodes selected by the SLURM_NODEID environment variable, starting at 0 and going up to the number of nodes specified on the srun command line.

        > cat test.sh
        case $SLURM_NODEID in
        0)  echo "I am running on "
            hostname ;;
        1)  hostname
            echo "is where I am running" ;;
        esac

        > srun -N2 test.sh
        dev0
        is where I am running
        I am running on
        dev1

SEE ALSO
    scancel(1), scontrol(1), squeue(1), slurm.conf(5)