How to run the thermal solver on clusters?
This article explains two methods for running the thermal solver on clusters for thermal-flow and multiphysics analyses.
Introduction
Running jobs on a cluster provides enhanced processing power, efficient resource management, parallel processing capabilities, scalability, remote access, and monitoring. The thermal solver uses a Distributed Memory Parallel (DMP) implementation to run jobs in parallel on multiple machines. You can manage DMP processing using a process scheduler, which organizes the queues and resources to perform the analysis in parallel. To run on clusters, you require the Simcenter 3D Thermal/Flow DMP license.
There are two methods to run the thermal solver on a cluster:
- Using a custom script that executes the thermal solver in parallel mode.
- Using the dedicated ND argument in the TMG Executive Menu, specifically designed for cluster execution.
Running through a custom script
The most common method to run the thermal solver is through a custom script tailored to the specific scheduler. Typically, when you submit a custom script requesting specific computing resources, it generates a job script with the requested resources and submits it to the job manager. Once the job manager executes the job, it launches the job script. This job script identifies the nodes and CPUs allocated for the job by the job manager and uses this information to generate the parallel configuration file with the correct number of cluster nodes. Finally, it launches the thermal solver using the input file and the generated parallel configuration file.
The TMG thermal-flow solver installation contains simple scripts for both Slurm (Simple Linux Utility for Resource Management) and PBS (Portable Batch System). You can find these scripts in the tmg/if/scripts directory and use them to launch simulations on Linux clusters. These scripts start the TMG Executive Menu with the tmgnx.com executable.
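For orientation, here is a minimal sketch of the pattern such a Slurm job script follows. The parallel configuration file name (parallel.cfg), the location of tmgnx.com under UGII_TMG_DIR, and the solver arguments are assumptions for illustration only; the shipped scripts in tmg/if/scripts remain the reference for the real file format and command line.

  #!/bin/bash
  #SBATCH --nodes=2
  #SBATCH --ntasks-per-node=1

  # Sketch only: parallel.cfg is a hypothetical file name, and the solver
  # command line below is illustrative; see the shipped scripts for the
  # exact configuration file format and arguments.
  RUN_DIR=$PWD
  CONFIG_FILE=$RUN_DIR/parallel.cfg

  # Expand the list of nodes that Slurm allocated to this job, one host per line.
  scontrol show hostnames "$SLURM_JOB_NODELIST" > "$CONFIG_FILE"

  # Launch the TMG Executive Menu with the input file and the configuration file
  # (assumes tmgnx.com is located under UGII_TMG_DIR).
  "$UGII_TMG_DIR"/tmgnx.com sim1-sol1.xml "$CONFIG_FILE"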
To run thermal or thermal-structural parallel simulations on a single node of a cluster with the Slurm job scheduler, use the following steps.
- Make sure that the following prerequisites are met:
  - The following files are required and must be stored together in a run directory accessible to all cluster nodes:
    - <simulation/model name>-<solution/analysis name>.xml for thermal simulations
    - <simulation name>-<solution name>.mpdat, <simulation name>-<solution name>.dat, and <simulation/model name>-<solution/analysis name>.xml for multiphysics solutions
    - Slurm script slurm-script_single.sh adapted for your use
  - The UGII_TMG_DIR, UGII_BASE_DIR, and SPLM_LICENSE_SERVER environment variables are defined on the master node (see the example below).
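For example, you might define the environment variables in the shell profile of the submitting user on the master node. The install paths and the license server value below are placeholders for your site, not real values:

  # Placeholders only: replace the paths and the port@host value with your site's settings.
  export UGII_TMG_DIR=/opt/Siemens/Simcenter3D/NXBIN/tmg
  export UGII_BASE_DIR=/opt/Siemens/Simcenter3D
  export SPLM_LICENSE_SERVER=28000@licenseserver.example.com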
- Prepare the simulation files:
  - For thermal simulations, ensure that the <simulation/model name>-<solution/analysis name>.xml file is configured to enable parallel execution. You configure this in Simcenter 3D by selecting the Run Solution in Parallel check box on the Solution Details page while editing the solution.
  - For multiphysics simulations, the script adds the parallel parameter to the <simulation name>-<solution name>.mpdat file if it is not already defined, or modifies the number of cores according to the input argument.
 
- Execute the Slurm script with the required arguments. For example:
  ./slurm-script.sh -n 2 -s sim1-sol1.xml
  - -s sim1-sol1.xml specifies the input sim1-sol1.xml file for the thermal solution.
  - -n 2 specifies two cores for the thermal solver.
  To get the list of all input arguments, their definitions, and default values, run the script without any arguments. The script also includes all the necessary SBATCH options for running the simulation in parallel, as shown in the example below. You can modify the script to add more options.
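As an illustration, a job script of this kind typically carries SBATCH options such as the following; the actual options and values in the shipped script may differ:

  #SBATCH --job-name=sim1-sol1       # job name shown by squeue
  #SBATCH --nodes=1                  # single-node run, as in this example
  #SBATCH --ntasks=2                 # matches the -n 2 argument
  #SBATCH --time=04:00:00            # wall-clock limit (placeholder)
  #SBATCH --output=sim1-sol1_%j.log  # %j expands to the Slurm job ID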
- (Optional) Monitor the job status using Slurm commands like squeue to check the running jobs.
- (Optional) Resubmit the job using the generated job script if you want to submit the same job without regenerating all the files. The example below shows typical commands for monitoring and resubmitting a job.
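For reference, these standard Slurm commands cover the two optional steps; the job script name in the last line is a placeholder for the script generated by your submission:

  # Check your queued and running jobs.
  squeue -u $USER

  # Show the detailed state of a specific job (replace 12345 with your job ID).
  scontrol show job 12345

  # Resubmit the previously generated job script without regenerating the input files.
  sbatch generated-job-script.sh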
Running through the dedicated ND argument in the TMG Executive Menu
The second method allows you to run jobs through the TMG Executive Menu. In this mode, the monitor submits a series of cluster jobs that execute sequentially for each solver module. Additionally, parallel modules such as VUFAC and ANALYZ can be executed as separate parallel jobs.
The primary difference between the two methods lies in how resources are allocated and how the parallel configuration file is formed. In the first method, the custom script handles resource allocation and forms the parallel configuration file, all within a single job. In contrast, when you use the ND argument, the thermal solver creates the configuration file with the node names and launches separate cluster jobs for each TMG module. Running a simulation through the custom script offers more flexibility; however, it requires more scripting skills.
