We just completed the expansion and regular maintenance of the Lewis cluster, adding 33 nodes (1,320 next-generation cores) for a total of 6,064 cores. With these nodes comes an increased capacity to run large parallel MPI jobs over the new 100Gbps EDR InfiniBand fabric connecting them. The new nodes are available as the “hpc5” partition; they have not been added to either Lewis or General, to give everyone time to test their jobs before we add them in during the next scheduled maintenance window. You can submit jobs to two or more partitions by separating them with a comma (“-p Lewis,hpc5”), and the scheduler will pick the first one available to run your job.
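As a sketch of the comma-separated form (the task count, time limit, and program name below are placeholders, not requirements), a batch script that lets the scheduler pick between the two partitions might look like:

```shell
#!/bin/bash
#SBATCH -p Lewis,hpc5       # submit to whichever partition has resources first
#SBATCH -n 4                # placeholder task count
#SBATCH -t 01:00:00         # placeholder time limit

# Run your program (placeholder command)
srun ./my_program
```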
This was a longer maintenance window because we had deferred a number of updates due to known incompatibilities at the time. We updated the operating system to CentOS 7.5 and upgraded our storage system clients and the SLURM scheduler, all of which depend on each other. We also made a number of changes to the scheduler policies. The NSF MRI Research Cluster grant to Engineering has come to an official close, and the Lewis cluster now has broader access to these resources (thank you to Engineering and to those who wrote and participated in the grant!).
For users with large MPI jobs, it is now more important than ever to understand where to run your jobs. MPI jobs run on the “Lewis” partition may fail in the future (the next paragraph explains why). MPI users should run their jobs on the following partitions (not Lewis), listed from oldest to newest: z10ph-hpc3, r630-hpc3, hpc4, hpc4rc, hpc5. For a current list, see our MPI documentation at http://docs.rnet.missouri.edu/Software/mpi. These partitions represent non-blocking, uniform-processor collections of nodes. In addition, jobs will not run across the old systems (hpc2, hpc3, hpc4, hpc4rc) and the new nodes (hpc5) at the same time, as they are independent. This will be a factor when the new nodes are added into the Lewis partition at a later date, so please test them now. If you do not care which partition you use, you can specify all of them (“#SBATCH -p z10ph-hpc3,r630-hpc3,hpc4,hpc4rc,hpc5”); however, please note the differences in processor architecture and core count per machine (24, 28, and 40). In addition, jobs that use the new nodes will need the “openmpi/openmpi-3.1.2-mellanox” module, which has compiled-in support for the new hardware. Jobs on older nodes should use the older openmpi modules (3.1.0 or before) or the newer “openmpi/openmpi-3.1.2-qlogic” module.
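Putting the partition and module choices together, an MPI job targeting the new hpc5 nodes could be sketched like this (the task count, time limit, and executable name are placeholders):

```shell
#!/bin/bash
#SBATCH -p hpc5             # new 100Gbps EDR InfiniBand nodes only
#SBATCH -n 80               # placeholder: e.g. two 40-core nodes
#SBATCH -t 02:00:00         # placeholder time limit

# The new hardware requires the Mellanox-enabled OpenMPI build
module load openmpi/openmpi-3.1.2-mellanox

srun ./my_mpi_program       # placeholder executable
```

For the older QDR nodes, the same script would instead use one of the older partitions and an older openmpi module (3.1.0 or before) or the qlogic build.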
Specifically, we do not run a single large non-blocking “fat tree” across the whole system; instead we “oversubscribe” the links between racks. Large jobs should therefore be run on the specific partitions above, which represent “leaf” collections of nodes (non-oversubscribed, non-blocking, fat-tree connectivity). Leaf switches sit in individual racks and allow MPI jobs to run across an entire rack without blocking. Oversubscribing saves considerable cost, and for a cluster of this overall size, a full fat tree would represent little gain. Also, to reduce complexity, we have not connected the older 40Gbps QDR InfiniBand nodes to the new 100Gbps EDR switch and nodes. This means that jobs running across these two InfiniBand networks may fail outright or fall back to the slow TCP transport (there are ways to disable TCP).
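As one illustration of disabling the TCP fallback (OpenMPI 3.x MCA syntax; verify the exact flags against the documentation for your loaded module, and the program name is a placeholder):

```shell
# Exclude the TCP byte-transfer layer so a job that cannot reach all
# of its nodes over InfiniBand fails outright instead of silently
# running over slow TCP.
mpirun --mca btl ^tcp ./my_mpi_program
```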
We have added a policy to limit the number of simultaneous Matlab licenses (4) that can be requested, since it was previously possible for a single user to hold all the licenses for long periods of time. We are also looking into expanding the number of Matlab licenses.
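If the limit is enforced through SLURM's license accounting, a job that needs one Matlab license might request it as sketched below (the license name “matlab”, the module name, the partition, and the script name are all assumptions; `scontrol show licenses` lists the names actually configured on the cluster):

```shell
#!/bin/bash
#SBATCH -p Lewis            # placeholder partition
#SBATCH -L matlab:1         # assumed license name; check `scontrol show licenses`
#SBATCH -t 00:30:00         # placeholder time limit

module load matlab          # assumed module name

# Placeholder Matlab invocation running a script non-interactively
matlab -nodisplay -r "run('analysis.m'); exit"
```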
We have also added the dedicated Data Transfer Node (DTN) back into the cluster for large scheduled transfer jobs, and added access to a large tape library (over 300TB). We are working on the ability to write large (15TB-300TB) data sets to tape for Digital Preservation. Please contact us if you would like more details.
There will be a General Lewis Users Meeting on Wednesday, September 26, 2018, from 10:00 a.m. to noon in the library. Please join us for updates, questions, and discussion of the upgrade as well as the future needs and direction of our computing environment. The location may change, so please check http://docs.rnet.missouri.edu/training for updates.
— Sept. 17, 2018