RCSS Summer Update 2016
Research Computing Support Services continues to grow in terms of people, equipment, and infrastructure. First of all, I would like to welcome our latest team member, Jacob Gotberg. Jacob will be joining us in July as a Cyberinfrastructure Engineer and will split his time equally between supporting research computing on campus and the College of Engineering. Jacob joins us from Sandia National Laboratories, where he worked in high-performance computing (HPC).
I would also like to announce an upgrade and capacity increase for the Lewis cluster. We will be adding 16 new nodes to the community compute pool to meet the increased computing demands of users, and our HPC investors are adding at least eight additional investor nodes to the cluster. The decommissioning of the legacy Lewis2 cluster and associated storage will allow us to upgrade the provisioning system and integrate some of the older capacity (34 nodes/616 cores) into the cluster as well.
Now would be an excellent time to become an HPC investor, as our large purchase has allowed us to get nodes at a very competitive price. With the HPC investor program, researchers purchase an HPC node and we provide, at no charge, the space, power, and cooling. We also manage the hardware, operating system, scientific software, and security required to run the nodes. Investors get HPC group storage of 3TB at no charge as well. Please contact me directly if you are interested in becoming an HPC investor.
We have also been busy building a teaching cluster (10 nodes) for students in courses to use for their scientific computational needs. This environment will be similar to the Lewis cluster, but tailored for instructional needs.
Over the past six months we have also been working on a Bio Compute cluster specifically targeted at Bioinformatics and Genomics workloads and workflows. Please contact us if you are interested in becoming an investor as well.
Since the last update we have had a number of disruptive storage events, and we finally have a handle on the root cause. The issue was in how the different storage pools interact on the cluster (think of them as different disk drives in your workstation), and we have now mitigated the problem. We would like to thank the community both for being very understanding and for helping us debug the issues. Without your flexibility and help it would have been a much more difficult series of events.
This is a good time to remind everyone that even though most of our systems are very resilient to component failure, we do not back up users’ data. We employ the philosophy of “Research Grade,” where we try to maximize compute, storage, and researcher productivity, and this is one of the tradeoffs we make.
The decommissioning of the original Lewis infrastructure will occur on July 1 and will allow us to free up five racks of compute and storage to make room for additional cooling in the datacenter. As part of this, the firstname.lastname@example.org email address will no longer be active, and users should use email@example.com instead.
During the transition, we will be normalizing group storage and group management so we can automate many of our processes to serve researchers better and faster. We will be contacting individual groups to migrate their storage and groups to the new system. Over the next few weeks we will also be testing the new environment (hardware, software, and infrastructure) and inviting researchers to help us shake out the bugs. If all goes well, we will have an extended outage during the first week of August to make the final changes to the environment.
The Research Network (RNet) will see a much-needed upgrade in FY17. Even though we are still in the early planning phases, there are two important changes that will have an impact on users of RNet. The first is that almost all Gigabit Ethernet RNet ports will utilize Tigernet to reach RNet, and starting in FY18, all connections will be charged the standard port fee. As the Research Network is upgraded during FY17, all RNet port fees will be waived as connections are migrated. The second is that we will be rolling out a new RNet architecture to better secure the research network where it needs to be while keeping it open where it needs to be. As we upgrade RNet, we will be working with individual researchers to find the service that best fits their needs. For researchers with specialized networking needs (networking research, SDN, 10 Gigabit), we will continue to partner with them to find the best dedicated hardware solution available. Continue to watch this space for more information about the RNet upgrade.
We are working hard on many other projects, so keep watching this space for future announcements about secure compute, bio compute, science gateways, and storage upgrades. And, as always, don’t hesitate to reach out to me or one of the others on the team.
— June 15, 2016