Lewis Maintenance Window – September 4, 2018

The nodes are here and ready to be added to the cluster! The electricians are finishing the wiring to handle the additional power the new nodes require (roughly the equivalent of 12 dryer plugs!). In addition to adding 32 new nodes, we will be upgrading our cluster storage network cables and moving some equipment around to make room for the new nodes and our next rack, which we hope to add in 6-12 months. We expect the maintenance window to run a bit longer than usual due to the amount of work to be done. We will announce when the login node and queue are back up.

Please note that whenever we are working in the racks there is an increased risk of data loss. Please remember that we do not back up ANY user data; it is your responsibility to make backups.
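If you need a starting point, the sketch below archives a directory to another filesystem before the window. It is only an illustration: the source and destination paths are hypothetical placeholders, and you should substitute a destination that lives off the cluster (a lab share, local machine, or external drive).

```python
#!/usr/bin/env python3
"""Minimal backup sketch: archive a directory before maintenance.

All paths here are hypothetical placeholders; point SRC at the data
you care about and DEST at storage that is NOT on the cluster.
"""
import tarfile
from datetime import date
from pathlib import Path

SRC = Path.home() / "data" / "results"   # hypothetical: directory to back up
DEST = Path("/mnt/lab_share/backups")    # hypothetical: off-cluster destination

DEST.mkdir(parents=True, exist_ok=True)
archive = DEST / f"results-{date.today().isoformat()}.tar.gz"

# Create a compressed tarball, preserving the top-level directory name.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(SRC, arcname=SRC.name)

print(f"Wrote {archive} ({archive.stat().st_size} bytes)")
```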

The queue will continue to run jobs up to 6:00 a.m. on 9/4 and will hold jobs until after the maintenance window. We typically open the queue for a few hours before opening the login node and allowing new jobs, to ensure that the cluster is running as expected. Because we will be upgrading SLURM this time, there is an increased chance that the queue will be flushed of jobs, but we will do our best to hold them. We will take a snapshot of jobs held in the queue in case you need that information.
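If you would also like your own record of what you have queued, a minimal sketch along these lines can save one before the window. It uses SLURM's standard `squeue` command; the output filename and the format string are arbitrary choices, not anything specific to Lewis.

```python
#!/usr/bin/env python3
"""Save a snapshot of your own pending/running jobs before maintenance.

Relies only on SLURM's standard squeue command; the output filename
and format string are arbitrary choices.
"""
import getpass
import subprocess
from datetime import datetime

user = getpass.getuser()
out = f"queue-snapshot-{datetime.now():%Y%m%d-%H%M}.txt"

# Format codes: %i job id, %j name, %t state, %D node count, %V submit time
result = subprocess.run(
    ["squeue", "-u", user, "-o", "%i|%j|%t|%D|%V"],
    capture_output=True, text=True, check=True,
)

with open(out, "w") as fh:
    fh.write(result.stdout)

n_jobs = max(result.stdout.count("\n") - 1, 0)  # subtract the header line
print(f"Saved {n_jobs} job entries to {out}")
```

Running this (or the equivalent one-line `squeue` call) shortly before 6:00 a.m. on 9/4 leaves you with the details on hand in case the queue is flushed.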

Expanded Training! In response to your feedback, we will be expanding the scope of our regular training sessions. The entire team will be present at the weekly Wednesday training meetings to ensure that your questions, problems, and needs are addressed in a timely manner. There will be no training on 9/5.

We are excited to be adding 1,320 cores and 12 TB of RAM to the cluster, connected by a faster 100 Gbps EDR InfiniBand fabric, to better serve your needs. You can get 10 cores of Fairshare by investing in our HPC Compute Service: $2,600 for 10 cores for 5 years, paid in full. The Fairshare is credited right away. We will still offer HPC Investor nodes for those of you who require equipment expenses on grants; however, we will be purchasing them all at once during the next upgrade (6-12 months), so please let us know as soon as possible so we can plan and gauge demand. We will purchase a rack as soon as we have enough commitments!

The teaching cluster `clark.rnet.missouri.edu` should not be affected, though there is a small chance of brief interruptions while the network is reconfigured.

Aug. 21, 2018