After many months of work and planning, the HPC storage upgrade is complete. Performance has increased dramatically, both for individual jobs and for the cluster as a whole. Initial testing indicates that individual file read/write speeds are going from tens of MB/s to thousands of MB/s (a 10x-100x speedup for most workloads). We have also built a new all-SSD storage system for /home to improve interactive performance (check out `module avail`!). With different storage technologies in different locations, it is now more important than ever to understand how to make the best use of each storage location. A number of changes were also necessitated by the design of the new file system, so please read carefully below. This and additional information can be found in our updated storage policy.
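If you would like a rough read on what a given location delivers, a minimal sketch along these lines can time sequential write and read throughput. The paths, file size, and block size below are illustrative placeholders, not an official benchmark:

```python
import os
import time

def throughput_mb_s(path, size_mb=1024, block_mb=4):
    """Time a sequential write and read of size_mb at path; return (write, read) in MB/s."""
    block = os.urandom(block_mb * 1024 * 1024)  # incompressible data, so compression doesn't skew results
    testfile = os.path.join(path, "throughput_test.bin")
    n_blocks = size_mb // block_mb

    start = time.time()
    with open(testfile, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the file system
    write_mb_s = size_mb / (time.time() - start)

    start = time.time()
    with open(testfile, "rb") as f:
        # Note: a freshly written file may still be in the page cache,
        # so the read number can be optimistic.
        while f.read(block_mb * 1024 * 1024):
            pass
    read_mb_s = size_mb / (time.time() - start)

    os.remove(testfile)
    return write_mb_s, read_mb_s

# Example: compare two storage locations (paths are illustrative)
for p in ("/home/youruser", "/scratch/youruser"):
    w, r = throughput_mb_s(p)
    print(f"{p}: write {w:.0f} MB/s, read {r:.0f} MB/s")
```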
The most important change is that quotas are no longer defined per path (/data, /scratch, /group) but by the group ID that owns each file. The other important change is that /home, /data, /group, and /scratch all use compression. Quotas are calculated on the compressed size, not the larger actual size of the file, so everyone should be able to store more data within their quota.
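Since quota accounting is now per owning group and based on compressed size, a sketch like the following can show which groups own your files and how much space that data actually occupies on disk. It assumes, as on most compressing file systems, that `st_blocks` reports allocated (post-compression) blocks; the path is a placeholder:

```python
import grp
import os

def usage_by_group(root):
    """Walk root and sum on-disk (allocated) and apparent bytes per owning group."""
    totals = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            try:
                group = grp.getgrgid(st.st_gid).gr_name
            except KeyError:
                group = str(st.st_gid)  # gid with no local name
            totals.setdefault(group, [0, 0])
            totals[group][0] += st.st_blocks * 512  # blocks actually allocated (reflects compression)
            totals[group][1] += st.st_size          # logical (uncompressed) file size
    return totals

for group, (on_disk, apparent) in usage_by_group("/data/yourproject").items():
    print(f"{group}: {on_disk / 1e9:.1f} GB on disk vs {apparent / 1e9:.1f} GB apparent")
```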
Note that some legacy users never had a quota set on one or more of their directories and will now be out of compliance with the new quotas. If your used space is greater than your quota, you will not be able to write to files or create new folders and files; please delete files or move them to a different location (e.g., /home → /data or /scratch → /group).
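If you need to move a directory tree to get back under quota, a minimal sketch (paths are placeholders) is:

```python
import shutil

# Illustrative paths only; substitute your own directories.
src = "/home/youruser/big_results"
dst = "/data/yourgroup/big_results"

# shutil.move falls back to copy-then-delete when src and dst are on
# different file systems, which is the case when moving between tiers.
shutil.move(src, dst)
```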
Thank you, everyone, for your patience during the period when we were seeing performance issues with the Isilon. A special thanks goes to the users who were taxing the system with their research and moved their workloads to our proof-of-concept environment. This allowed us to move to production more quickly and to take load off the existing system.
To give a sense of scale, the change involved over 2,000 drives in 4 racks, the movement of 200,000 GB (about 200 TB) of data, and connections to over 200 machines. The new file system contains 960 10TB drives, 35 TB of NVMe storage (faster than SSD), and 2 TB of RAM, all connected by 100Gbps Ethernet (compared to the 10Gbps connections on the old system).
If you are seeing increased performance and would like to share how much faster your workflow runs, please send feedback to rcss-support@missouri.edu. If you are instead seeing performance problems, we would also like to know so we can help you improve your workloads (there are some important knobs that users can tune).
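As one generic example of such a knob (the new file system's specific tunables are not listed here), write size often matters a great deal on networked storage: many small writes are usually far slower than fewer large ones. This hypothetical sketch compares the two:

```python
import os
import time

def timed_write(path, total_mb=256, block_kb=4):
    """Write total_mb in block_kb chunks and return throughput in MB/s."""
    block = os.urandom(block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        os.fsync(f.fileno())  # include the flush to disk in the timing
    os.remove(path)
    return total_mb / (time.time() - start)

# Small vs. large writes to the same location (path is illustrative)
for kb in (4, 4096):
    mbps = timed_write("/scratch/youruser/tune_test.bin", block_kb=kb)
    print(f"{kb:>5} KB blocks: {mbps:.0f} MB/s")
```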
— Jan. 16, 2018