Cube Cluster is an add-on module to Cube Voyager that distributes model runs across multiple processing cores and multiple machines. Distributed processing in Cube Cluster is implemented by running multiple instances of special Voyager nodes (called ‘Cluster slave nodes’), which are managed by a master node (e.g., the task monitor). The number of Cluster slave nodes to use for a model run depends on the requirements of the model and on the number of physical cores available on the machine(s) running it. This section describes how to start cluster nodes.
Before starting one or more cluster nodes, the user should know the following:
Process ID – the name of the cluster process. All temporary communication files created by the cluster process use this process ID as a prefix.
Process List – the sequence of cluster slave node numbers to start.
Work directory – the directory in which all cluster temporary files are created; the master node and the slave nodes communicate with each other through these files. By default, this is the work directory of the model run (i.e., the directory from which the model application/script is run). The user can also specify a different location. In that case, the directory must be accessible to all the slave nodes (for example, a shared drive when the nodes are located on different machines), and the DISTRIBUTEMULTISTEP/DISTRIBUTEINTRASTEP statements in the model script must use the COMMPATH keyword to point to this directory.
Cluster slave nodes can be started in two ways.
1. Using the Cluster Node Management tool
The Cluster Node Management tool is available with the standard Cube Base installation. It can be opened from ‘File>>Tools>>Cluster Node Management’.
The ‘Work Directory’ and ‘ProcessID’ field should contain the location of the cluster work directory followed by the process ID. The ‘Process List’ field should contain the list of cluster nodes to start on this physical machine. Click ‘Start Nodes’ to start the Cluster slave nodes. In this example, 8 Cluster nodes (numbered 1 to 8) are started with process ID ‘Cubetown’ and work directory ‘C:\Cubetown\Model\’.
To set up cluster nodes across multiple machines, follow the same process on each machine, using a different range of process list numbers on each. For example, with two machines of 8 cores each, the user can start Cluster slave nodes 1-8 on one machine and 9-16 on the other. The same node number (for the same process ID) must not be used on more than one machine. The work directory must also be a shared location that all the machines can read from and write to.
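The numbering rule above (consecutive, non-overlapping node numbers per machine) can be sketched in Python. This is only an illustrative helper, not part of Cube: the function name assign_node_ranges is hypothetical, and writing a single node as a bare number rather than a range is an assumption.

```python
def assign_node_ranges(cores_per_machine):
    """Return one process list string per machine, numbered consecutively from 1.

    cores_per_machine: number of Cluster slave nodes to start on each machine.
    Hypothetical helper; it only produces text to type into the Process List field.
    """
    ranges = []
    next_node = 1
    for cores in cores_per_machine:
        if cores < 1:
            raise ValueError("each machine needs at least one node")
        last_node = next_node + cores - 1
        if last_node == next_node:
            # Assumption: a single node is written as a bare number.
            ranges.append(str(next_node))
        else:
            ranges.append(f"{next_node}-{last_node}")
        # The next machine continues where this one stopped, so no number repeats.
        next_node = last_node + 1
    return ranges


if __name__ == "__main__":
    # Two machines with 8 cores each, as in the example above.
    print(assign_node_ranges([8, 8]))
```

Each returned string would be entered as the process list on the corresponding machine, with the same process ID and shared work directory on all of them.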
2. From a Voyager PILOT program
In this method, the Cluster node management tool (cluster.exe) is called as a system command inside a Voyager PILOT program, and the work directory, process ID and process list are passed as parameters to the system command. The syntax for calling Cluster node management is shown below:
Cluster [ProcID] [ProcList] [Start/Starthide/Close/List] [Exit]
In the example below, Cube Cluster is called with ProcessID=Cubetown and ProcessList=1-8. This command starts 8 Cluster slave nodes, numbered 1 to 8, in the default work directory (i.e., the directory from which the model application/script is run):
*Cluster Cubetown 1-8 Start Exit
To start cluster nodes in a directory other than the default working directory of the model run, specify the location of the Cluster working directory (here C:\Cluster) before the process ID, as shown below:
*Cluster C:\Cluster\Cubetown 1-8 Start Exit
This method can only start nodes on one machine. To use multiple machines, the Cluster Node Management tool is the only option.
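As a rough illustration of how the syntax line above fits together, the following Python sketch assembles the PILOT system command from its parts. The function cluster_command and its parameter names are hypothetical; the sketch only reproduces the two command forms shown above and makes no claim about how cluster.exe handles other inputs, such as paths containing spaces.

```python
from pathlib import PureWindowsPath

# Actions listed in the syntax line: [Start/Starthide/Close/List]
ACTIONS = {"Start", "Starthide", "Close", "List"}


def cluster_command(process_id, process_list, action="Start",
                    work_dir=None, exit_after=True):
    """Build the '*Cluster ...' line for a Voyager PILOT script (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # With a non-default work directory, its path is placed before the process ID.
    if work_dir:
        target = str(PureWindowsPath(work_dir) / process_id)
    else:
        target = process_id
    parts = ["*Cluster", target, process_list, action]
    if exit_after:
        parts.append("Exit")
    return " ".join(parts)


if __name__ == "__main__":
    print(cluster_command("Cubetown", "1-8"))
    print(cluster_command("Cubetown", "1-8", work_dir=r"C:\Cluster"))
```

The two calls reproduce the default-directory and explicit-directory commands shown above; passing action="Close" would produce the corresponding command using the Close option from the syntax line.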
When the Cluster working directory differs from the default working directory of the model run, the distribute statements in the model scripts must specify the COMMPATH keyword pointing to the cluster working directory. An example is shown below:
DISTRIBUTEINTRASTEP ProcessID=Cubetown ProcessList=1-8 COMMPATH="C:\Cluster"
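The rule that COMMPATH is needed only when the Cluster working directory is not the default can be sketched in Python as follows. The helper distribute_statement is hypothetical and simply formats the statement in the same form as the example above; it does not validate the path or the keyword.

```python
def distribute_statement(process_id, process_list, comm_path=None,
                         keyword="DISTRIBUTEINTRASTEP"):
    """Format a distribute statement for a model script (hypothetical helper).

    comm_path: Cluster working directory, given only when it differs from
    the default working directory of the model run.
    """
    parts = [keyword, f"ProcessID={process_id}", f"ProcessList={process_list}"]
    if comm_path:
        # Point the statement at the non-default Cluster working directory.
        parts.append(f'COMMPATH="{comm_path}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(distribute_statement("Cubetown", "1-8", comm_path=r"C:\Cluster"))
```

Using the same process ID, process list and directory values here as in the command that started the nodes keeps the master and slave nodes pointed at the same communication files.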