How to run the thermal solver on clusters?
This article explains two methods for running the thermal solver on clusters for thermal-flow and multiphysics analyses.
Introduction
You can run the thermal solver on a cluster in one of two ways:
- Using a script that executes the thermal solver in parallel mode.
- Using the dedicated ND argument in the TMG Executive Menu, which is specifically designed for cluster execution.
Running through a custom script
The most common method to run the thermal solver is through a custom script tailored to the specific scheduler. Typically, you submit a script that requests specific resources. When the job begins execution, the script identifies the nodes and CPUs that the job manager allocated to the job. This information is then used to generate the parallel configuration file with the correct cluster nodes. Finally, the script launches the thermal solver from the command line using the input file and the updated parallel configuration file.
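As a generic illustration of this flow (not the actual sample script), a job script running under Slurm might discover its allocated resources as follows; the configuration-file and solver launch steps are placeholders rather than actual TMG commands:

#!/bin/bash
# Inside a running Slurm job: query the nodes and core count allocated by the scheduler.
NODE_LIST=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
TOTAL_CORES=${SLURM_NTASKS}

echo "Allocated nodes: $NODE_LIST"
echo "Allocated cores: $TOTAL_CORES"

# Placeholder steps:
# 1. Write ParallelConfigurationFile.xml from $NODE_LIST and $TOTAL_CORES.
# 2. Launch the thermal solver with the input file and the generated configuration file.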
To run thermal or thermal-structural parallel simulations on a single node of a cluster with the Slurm job scheduler, use the following steps together with the slurm-script_single-node.sh sample script.
- Make sure that the following prerequisites are met:
- The following files are required and must be stored together in a run directory accessible to the nodes in the cluster:
- <simulation/model name>-<solution/analysis name>.xml for thermal simulations
- <simulation name>-<solution name>.mpdat, <simulation name>-<solution name>.dat, and <simulation/model name>-<solution/analysis name>.xml for multiphysics solutions
- Downloaded the slurm-script_single-node.sh Slurm script and adapted it for your use.
- Defined the UGII_TMG_DIR and UGII_BASE_DIR environment variables on the master node (see the example after this list).
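For example, the environment variables might be set as follows on the master node; the installation paths shown here are hypothetical and depend on where Simcenter 3D is installed on your system:

# Hypothetical installation paths; replace with your actual Simcenter 3D locations.
export UGII_BASE_DIR=/opt/Siemens/Simcenter3D
export UGII_TMG_DIR=$UGII_BASE_DIR/SIMULATION/tmg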
- Prepare the simulation files.
- For thermal simulations, ensure the <simulation/model name>-<solution/analysis name>.xml file is configured to enable parallel execution. You can modify it manually:
  <SolverParameters>
    <Thermal-Flow>
      <Property name="Scratch Directory">
        <Value>0</Value>
      </Property>
      <Property name="Scratch Directory Location">
        <Value></Value>
      </Property>
      <Property name="Run Solution in Parallel">
        <Value>1</Value>
      </Property>
      <Property name="Parallel Configuration File">
        <Value>ParallelConfigurationFile.xml</Value>
      </Property>
    </Thermal-Flow>
  </SolverParameters>
  Alternatively, you can configure this in Simcenter 3D by selecting the Run Solution in Parallel check box on the Solution Details page while editing the solution.
- For multiphysics simulations, the script adds the parallel parameter to the <simulation name>-<solution name>.mpdat file if it is not already defined, or modifies the number of cores according to the input argument (see the sketch after this list).
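The following sketch only illustrates the add-or-modify logic that the script applies to the .mpdat file; the parallel keyword and its syntax shown here are hypothetical placeholders, so adjust the pattern to match the actual parameter in your file:

# Sketch: ensure a parallel setting with the requested core count exists in the .mpdat file.
# The "parallel" keyword below is a hypothetical placeholder, not the actual .mpdat syntax.
MPDAT_FILE="sim1-sol1.mpdat"   # illustrative file name
CORES=4

if grep -q "^parallel" "$MPDAT_FILE"; then
    # Parameter already defined: update the core count in place.
    sed -i "s/^parallel.*/parallel ${CORES}/" "$MPDAT_FILE"
else
    # Parameter not defined: append it.
    echo "parallel ${CORES}" >> "$MPDAT_FILE"
fi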
- Use the provided slurm-script_single-node.sh sample script or write your own Slurm script to handle the job submission. This script accepts the following input arguments:
n
  specifies the total number of cores for the thermal solver.
N
  specifies the number of nodes.
V
  specifies the total number of cores for view factors.
m
  specifies the total number of cores for Nastran in a multiphysics solution.
s
  specifies the input file: <simulation/model name>-<solution/analysis name>.xml for thermal solutions and <simulation name>-<solution name>.mpdat for multiphysics solutions.
j
  specifies the name of the generated job script, such as submit_job.sh.
The script parses these parameters and sets up the job submission file accordingly.
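As an illustration of how such a script might parse these arguments, here is a minimal bash sketch using getopts; the default values are illustrative:

#!/bin/bash
# Illustrative defaults for the arguments described above.
TOTAL_CORES=1; NODES=1; VF_CORES=1; NASTRAN_CORES=0
INPUT_FILE=""; JOB_SCRIPT="submit_job.sh"

while getopts "n:N:V:m:s:j:" opt; do
    case "$opt" in
        n) TOTAL_CORES="$OPTARG" ;;      # cores for the thermal solver
        N) NODES="$OPTARG" ;;            # number of nodes
        V) VF_CORES="$OPTARG" ;;         # cores for view factors
        m) NASTRAN_CORES="$OPTARG" ;;    # cores for Nastran in a multiphysics solution
        s) INPUT_FILE="$OPTARG" ;;       # .xml or .mpdat input file
        j) JOB_SCRIPT="$OPTARG" ;;       # name of the generated job script
        *) echo "Unknown option" >&2; exit 1 ;;
    esac
done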
The script automatically performs the following actions, sketched after this list:
- Generates the Slurm job file submit_job.sh with the number of cores used for the thermal solver.
- Writes the code that generates ParallelConfigurationFile.xml into the Slurm job file, using the node name and the number of cores specified for view factors and the thermal solver. If the file already exists, it is overwritten.
- Modifies the .mpdat file to include the parallel option with the specified number of cores.
- Launches the thermal (.xml) or multiphysics (.mpdat) simulation.
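A condensed sketch of the generation and submission steps might look like the following; the heredoc contents are placeholders, and the actual sample script writes the real configuration-file and solver launch commands:

# Sketch: generate submit_job.sh and submit it (placeholder job body).
SIMULATION_NAME="sim1-sol1"      # illustrative name
TOTAL_CORES=2
NODES=1

cat > submit_job.sh << EOF
#!/bin/bash
#SBATCH --job-name=${SIMULATION_NAME}
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --ntasks=${TOTAL_CORES}
#SBATCH --nodes=${NODES}

# Placeholder: write ParallelConfigurationFile.xml from the allocated node names,
# then launch the thermal solver with the input file and that configuration file.
EOF

sbatch submit_job.sh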
- Execute the Slurm script with the required parameters. For example:
./slurm-script_single-node.sh -n 2 -s sim1-sol1.xml
This command runs the script with the following arguments:
s
  specifies the input sim1-sol1.xml file for the thermal solution.
n
  specifies two cores for the thermal solver.
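For a multiphysics solution, an invocation might look like the following; the file name and core counts are illustrative:

./slurm-script_single-node.sh -n 4 -m 2 -s sim1-sol1.mpdat -j submit_job.sh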
This script includes all the necessary SBATCH options for running the simulation in parallel, such as:
#SBATCH --job-name=$SIMULATION_NAME
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --ntasks=$TOTAL_CORES
#SBATCH --nodes=$NODES
You can modify the script to add more options.
- Monitor the job status using Slurm commands such as squeue to check the running jobs (see the examples after this list).
- (Optional) Resubmit the job if you want to submit the same job without regenerating all files.
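For example, the following commands check your jobs in the queue and resubmit the previously generated job file; the job ID shown is illustrative:

# List your jobs in the queue
squeue -u $USER

# Show detailed information about a specific job (illustrative job ID)
scontrol show job 123456

# Resubmit the same job without regenerating the files
sbatch submit_job.sh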
Running through the dedicated ND argument in the TMG Executive Menu
The second method allows you to run jobs through the TMG Executive Menu. In this mode, the monitor submits a series of cluster jobs that execute sequentially for each solver module. Additionally, parallel modules such as VUFAC and ANALYZ can be executed as separate parallel jobs.
The primary difference between the two methods lies in how resources are allocated and how the parallel configuration file is formed. In the first method, the custom script handles resource allocation and forms the parallel configuration file, all within a single job. In contrast, when using the ND argument, the thermal solver creates the configuration file with the node names and launches separate cluster jobs for each TMG module. Running a simulation through the custom script offers more flexibility but requires more scripting skills.
With the Simcenter 3D Thermal/Flow DMP license, you manage DMP processing using a process scheduler, such as Sun Grid Engine (SGE) or Load Sharing Facility (LSF), which organizes the queues and resources to perform the analysis in parallel.