Lewis Expansion and the HPC Investment Service

If you plan to invest in the cluster with equipment grant funds in the next 12 months, please read this carefully, as the HPC Investor service is changing. HPC Investors need to respond as soon as possible.

We are in the final phase of purchasing our next HPC expansion rack (HPC6) in anticipation of removing older nodes early in 2020. We are now at "rack scale" and invite everyone to invest directly in the cluster at this time. Direct investment means that equipment-only grant funds can be used in addition to unrestricted funds. All investment funded by equipment-only grants must be made as part of a system expansion, which allows us to plan better, reduce complexity, and take advantage of our volume discounts. We expect to continue to upgrade or expand the cluster every 6-18 months.

As always, the HPC Service is available at any time at the new FY2020 rate of $3,500 per 12 cores for 5 years. Equipment-only grant funds cannot be used for the HPC Service because it is a service, not an equipment purchase.

The HPC6 expansion will initially consist of 32 nodes, each with 48 cores, 384 GB of RAM, 1 TB of local SSD scratch, and HDR100 (100 Gbps) InfiniBand, for a total of 1,536 cores. Because of compute density, demand, and power and cooling constraints (it will run HOT!), we will populate the rack in two phases, with the first going live soon. The second phase is expected to add 32 more nodes in a fully connected "fat tree" configuration, for a total of 3,072 cores, and will allow large parallel jobs to run across all 3,072 cores! The second phase depends on investment demand and the results of a number of large pending grant proposals.

Given our experience with the last investment, we waited until we had a firm cost before asking for commitments, to prevent any unexpected last-minute changes. Unfortunately, we need an immediate commitment to take advantage of substantial discounts. The cost will be approximately $12,798 per node, depending on final configuration and rounding. Please immediately send the number of nodes, your MoCode, and your group name to rcss-support@missouri.edu. The equipment will remain in the cluster for at least 5 years. Any extra "bonus" time in the cluster after 5 years will depend on operational, efficiency, power, cooling, and space considerations at that time.
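For a rough side-by-side look at the two options, the sketch below simply applies the approximate figures quoted in this announcement ($12,798 per 48-core node versus $3,500 per 12 cores for 5 years). The constant and function names are hypothetical and for illustration only; the real per-node price may change with final configuration and rounding, and the sketch ignores any bonus time beyond 5 years.

```python
# Rough cost/core estimator using the approximate FY2020 figures in this announcement.
# Hypothetical helper for illustration only; final prices may differ.
from __future__ import annotations

NODE_COST_USD = 12_798     # approximate HPC6 per-node price (final config and rounding may change it)
CORES_PER_NODE = 48        # each HPC6 node has 48 cores
SERVICE_COST_USD = 3_500   # HPC Service: $3,500 per 12 cores for 5 years
SERVICE_CORES = 12


def node_investment(nodes: int) -> tuple[int, int]:
    """Return (total cost in USD, total cores) for directly investing in `nodes` HPC6 nodes."""
    return nodes * NODE_COST_USD, nodes * CORES_PER_NODE


def per_core_cost() -> tuple[float, float]:
    """Return (direct investment $/core, HPC Service $/core), each covering 5 years."""
    return NODE_COST_USD / CORES_PER_NODE, SERVICE_COST_USD / SERVICE_CORES


if __name__ == "__main__":
    cost, cores = node_investment(2)
    print(f"2 nodes: ${cost:,} for {cores} cores")
    node_rate, service_rate = per_core_cost()
    print(f"Direct investment: ${node_rate:.2f}/core; HPC Service: ${service_rate:.2f}/core")
```

Keep in mind that only direct investment can use equipment-only grant funds; the HPC Service must be paid from unrestricted funds.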

This will most likely be the last "air breathing" generation, as the power density is pushing the limits of air cooling (approximately 10% of the power budget now goes to fans!). The next generation will most likely need to be direct water cooled, which will require extensive planning, power and cooling upgrades, and funding to host water-cooled systems. It may also mean a longer-than-usual gap between upgrades when we move to the next CPU generation (HPC7, early to mid-2020 at the earliest). We still have some room to expand HPC6 if there is sufficient investment demand from the community.

Please note that a large amount of HPC compute capacity is nearing end of warranty and end of life. Much of this equipment was supported by the NSF MRI Engineering grant and the BioCompute Mizzou Advantage grant, which were one-time funds and will not be replaced. Without increased investment in the Lewis cluster, the overall number of cores will start falling dramatically in 2020, and users who have not invested will find it increasingly difficult to run jobs that require a lot of resources. Please contact us for help including your computational and storage needs in your next grant application. We will continue to provide grant-friendly mechanisms to invest (free of overhead/indirect costs).

If you have any questions, please do not hesitate to contact me or rcss-support@missouri.edu directly.

July 19, 2019