Submit job script

TMG Executive Menu submits contiguous jobs to the scheduler by calling the submit job script, which defines how the jobs are distributed in the cluster and the order in which they are executed.

The format of the submit job script depends on the scheduler you use. Each job corresponds to a group of solver modules that run consecutively. Some jobs require only one process, while others run in parallel.

The submit job script requires the following four arguments:

  1. The name of the job.
  2. The job ID the scheduler must execute before the next job is sent to the queue. The value is -1 if there are no more jobs in the queue. It defines the ordering of the jobs by ensuring that the scheduler completes the first job, even when resources are available to launch the next one.

  3. The number of nodes required for the job. The submit job script retrieves this value from the solution monitor.
  4. The command to run the job.

SGE scheduler submit script example

This file called, for example script_parallel.sh, is a submit script that submits a job to the SGE scheduler from the TMG Executive Menu. Its content is:

#!/bin/bash
#
#  * $1 is the job name, visible in the queue
#  * $2 is the ID of a job that needs to run before this one. -1 if none.
#  * $3 is the number of CPUs for the job
#  * $4 is the command to run (with its arguments quote protected)
#
qsub_output=`qsub -hold_jid $2 -pe monitor $3 -N $1 /shared_nfs/scripts/run_parallel_slave.sh "$4"`
job_number=`echo $qsub_output | awk '{for(i=1;i<=NF;i++){ if(strtonum($i)>max){ max = strtonum($i)} }; print max}'`
echo $job_number

where:

  • qsub_output is the output log file name of the submitted job in the scheduler queue.
  • qsub-hold_jid defines the job dependency list of the submitted job. The submitted job is only eligible for execution when the job ID that is identified in the $2 parameter is completed.
  • -pe monitor $3 sets monitor as the name of the parallel environment and the $3 parameter identifies the number of nodes that are required for the job.
  • -N - $1 defines the job name set using the value set with the $1 parameter.
  • /shared_nfs/scripts/run_parallel_slave.sh is the location of the script that prepares the cluster for the DMP processing using the SGE scheduler. You can store this file in any directory providing the file is accessible from all nodes in the cluster.
  • "$4"` job_number=`echo $qsub_output | awk '{for(i=1;i<=NF;i++){ if(strtonum($i)>max){ max = strtonum($i)} }; print max} searches for the submitted job number in the qsub_output log file and prints the next job ID value as the job number.
  • job_number is the job number that the TMG Executive Menu retrieves from the output log file.

The run-time script, run_parallel_slave.sh, sets up the required configurations on the cluster nodes before the job ($4) is started. This script is executed by the master node of the cluster when the job is launched. It prepares the master node for the parallel processing and executes the job identified in the $1 parameter. Depending on the type of the scheduler you use, this job script may not be required. The alternative to this script is to submit the job directly to the queue.

#!/bin/bash
#
#  * $1 is the job that is to be executed
#
# Specify the shell used for the job script (scheduler preamble commands)
#$ -S /bin/bash
#$ -V
# Place output files in the current working directory
#$ -cwd
# Merge standard error output with standard output
#$ -j y
#$ -o $JOB_NAME.sgelog
cd $SGE_O_WORKDIR
export TMP=/tmp
export TEMP=$TMP
# Actually run the job now...

$1

LSF scheduler submit script example

This script called, for example script_parallel.sh submits a job to the LSF scheduler from the TMG Executive Menu.

#!/bin/bash
#  * $1 is the job name, visible in the queue
#  * $2 is the ID of a job that needs to run before this one. -1 if none.
#  * $3 is the number of CPUs for the job
#  * $4 is the command to run (with its arguments quote protected)
export LOG_FILE=/home/mydirectory/tmgrun
echo "Script_Parallel" >> $LOG_FILE
if [[ "$2" -eq  "-1" ]] ;
then        
        echo "Inside if" >> $LOG_FILE
        echo $2 >> $LOG_FILE
        echo $4 >> $LOG_FILE
        export qsub_output=$(bsub -q loc_dev_par -n 8 -G ug_dev_fdbundle -M 100 -app fdbundle -J $1 -P OTHER_IDN -R "rusage[mem=800]" $4 2>&1 )
                else
        echo "Inside-else" >> $LOG_FILE
        echo $2 >> $LOG_FILE
        echo $4 >> $LOG_FILE
        if [[ "$3" -eq  "1" ]] ;
        then
        export qsub_output=$(bsub -w "done($2)" -q loc_dev_par -n 12 -G ug_dev_fdbundle -M 100 -app fdbundle -J $1 -P OTHER_IDN -R "rusage[mem=800]" $4 2>&1 )
        else

        export qsub_output=$(bsub -w "done($2)" -q loc_dev_par -n $3 -G ug_dev_fdbundle -M 100 -app fdbundle -J $1 -P OTHER_IDN -R "rusage[mem=800]" $4 2>&1 )
fi
fi

echo "qsub_output=$qsub_output" >> $LOG_FILE

export job_number=`echo $qsub_output | awk -F '<' '{print $2}' | awk -F '>' '{print $1}'`
echo "printing job=$job_number" >> $LOG_FILE
echo $job_number

where:

  • LOG_FILE is the output file name of the script_parallel.sh.
  • echo displays the variables output into the LOG_FILE file.
  • export defines the path and exports shell variables.
  • if [[ "$2" -eq "-1" ]] checks for other jobs in the queue before running the current job.
  • qsub_output is the output log file name of the submitted job in the scheduler queue.
  • bsub submits the job to the LSF scheduler by running the specified commands.
  • -q loc_dev_par specifies the target queue. In this example, the target queue is loc_dev_par.
  • -n 8 (or $3) specifies the number of required nodes for the job.
  • -G ug_dev_fdbundle specifies your user group. In this example, the user group is ug_dev_fdbundle.
  • (Optional) -M 100 specifies a memory limit for all the processes. In this example, the memory limit is 100 megabytes.
  • -app fdbundle specifies an existing application profile. In this example, the application profile is fdbundle.
  • -J $1 defines the job name using the value set with the $1 parameter.
  • (Optional) -P OTHER_IDN assigns the job to a specified project name. In this example, the project name is OTHER_IDN.
  • (Optional) -R "rusage[mem=800]" specifies the required resource on a host and runs the job when it is available. In this example, 800 MB memory is requested to run the job.
  • -w "done($2)" checks that the previous job is complete before running the current job.
  • export job_number=`echo $qsub_output | awk -F '<' '{print $2}' | awk -F '>' '{print $1}' searches for the submitted job number in the qsub_output log file and prints the next job ID and name as the job number.
  • job_number is the job number that the TMG Executive Menu retrieves from the LOG_FILE file.