Lewis3 Scheduler Policy
The new Lewis3 cluster is nearing production, and the final stage is to implement a scheduler policy (the scheduler is currently running without one). To provide a fair and flexible environment and to ensure that resources are used efficiently, we are implementing the Fair Share allocation policy on the cluster scheduler (SLURM). This policy encourages users to specify resource requirements (cores, RAM, and compute time) so that each job's requirements can be met and jobs can be scheduled efficiently. The default allocation will be 1 core, 1 GB of RAM, and 2 hours. Jobs of this type have historically accounted for between 50% and 80% of the jobs run on the cluster. The maximum job length will be 48 hours for non-investors (historically, 99% of jobs have run for less than 50 hours) and at least 168 hours for cluster investors. The fair-share calculation will use a half-life of 28 days. Jobs that exceed their requested resource allocation will be terminated to protect the system (from memory exhaustion, for example) and the scheduler. Investors in the cluster will receive allocation shares based on their investment, and the community resources provided by research computing will be divided evenly among all users, with a portion allocated for external collaboration. The policy will be revised periodically based on cluster usage patterns and user feedback.
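To give a feel for what a 28-day half-life means, here is a minimal Python sketch. It assumes simple exponential decay, so usage recorded 28 days ago counts half as much as usage recorded today. The function names and the sample usage records are hypothetical and purely illustrative; the scheduler's actual fair-share calculation takes more factors into account (allocation shares, for instance) and is not reproduced here.

```python
HALF_LIFE_DAYS = 28.0  # half-life stated in the policy


def decay_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight given to usage that is `age_days` old (1.0 means brand new)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def effective_usage(records, half_life_days=HALF_LIFE_DAYS):
    """Sum core-hours after decay.

    `records` is an iterable of (age_days, core_hours) pairs.
    """
    return sum(hours * decay_weight(age, half_life_days) for age, hours in records)


# Hypothetical history: 100 core-hours used today, 28 days ago, and 56 days ago.
sample = [(0, 100.0), (28, 100.0), (56, 100.0)]
print(effective_usage(sample))  # 100 + 50 + 25 = 175.0
```

Under this assumption, heavy use of the cluster lowers a user's scheduling priority for a while, but the effect fades: after 28 days that usage counts half as much, and after 56 days a quarter as much.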
The resource allocation policy will be implemented in stages and will be completed by July 1. Starting July 1, older hardware in the cluster (Dell 1850s that are around 10 years old) will be decommissioned as jobs complete and will be shut down after August 1. After this time, the remaining hardware will be either decommissioned or moved to the new Lewis3 cluster. Additional compute capacity will temporarily be made available to facilitate this transition. The legacy system (Lewis2) will remain in operation with a minimal amount of compute for at least 12 months, and we will work with users to help them move to the new system. We will provide weekly training for individuals and one-on-one sessions for groups to help with the transition. Lewis3 users are encouraged to give feedback on cluster scheduling, as the policy will be implemented before July 1. Please check back here for updates.
— Jun. 18, 2015