After this weekend's unscheduled downtime, much of the cluster is back in working order.
Two Important Updates:
1) We took advantage of the unscheduled downtime to update Slurm, which resolves some issues that were affecting many users. As a consequence, users must now specify the Slurm account to which their jobs should be charged. For example, a user belonging to the UNIX group "pi_professor" would need to specify the corresponding account in their Slurm submission scripts. Omitting the account results in the sbatch submission error "sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified".
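As a rough illustration of the account requirement described above, a submission script might look like the sketch below. It assumes the Slurm account carries the same name as the UNIX group ("pi_professor"); the job name, time limit, and command are hypothetical placeholders, and no partition or QOS settings are shown because they are not specified here.

```shell
#!/bin/bash
# Hypothetical job name, for illustration only.
#SBATCH --job-name=example_job
# Assumption: the Slurm account has the same name as the UNIX group.
#SBATCH --account=pi_professor
# Hypothetical wall-time limit of five minutes.
#SBATCH --time=00:05:00

# Placeholder workload: print the name of the node the job ran on.
srun hostname
```

The same option can also be given on the command line, for example `sbatch --account=pi_professor job.slurm`, where `job.slurm` is a placeholder file name.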
Please submit an RT ticket if you experience issues with this. (https://doit.umbc.edu/request-tracker-rt/doit-research-computing/)
The HPCF Webpage will be updated to reflect this change.
2) We have to physically investigate the 2013 GPUs before they are brought back online. These should be booting now and available in the next thirty minutes.