JASMIN and CEDA services affected by power outage - now resolved
Posted on June 22, 2020 (Last modified on October 19, 2023)
JASMIN and CEDA Archive services have been offline today due to a power outage that affected the entire RAL site (including the JASMIN machine room) at approximately 13.30 on Monday 22 June 2020. The power is now back, but not all remaining actions can be carried out remotely.
We are awaiting confirmation of when a member of the JASMIN team will be allowed to enter the machine room. With the COVID-19 restrictions in place, this will take longer than normal.
At present, we do not know how long it will take for us to be able to gain access. Therefore, unfortunately, we cannot give you an estimate of when the services will be back up and running - it is looking unlikely that it will be today.
Some limited services are back online but please consider all services AT RISK until further notice.
Some services are back online but many are still affected by yesterday’s power outage. Work is ongoing to restore these as quickly as possible. Please consider all services (CEDA Archive and JASMIN) at risk until further notice.
Access to the JASMIN machine room is still necessary to resolve some issues. As mentioned yesterday, this will take longer than in normal circumstances due to COVID-19 restrictions. Unfortunately, we are unable to give an accurate timescale for when this can happen.
/work/scratch (old scratch volume, soon to be retired as per recent notices)
/datacentre/backupcache
/datacentre/processing3
/group_workspaces/jasmin2/clipc
/group_workspaces/jasmin2/cp4cds1/vol2
/group_workspaces/jasmin2/esacci_lst
/group_workspaces/jasmin2/globolakes
/group_workspaces/jasmin2/incompass
/group_workspaces/jasmin2/jules_OLD
/group_workspaces/jasmin2/leicester
/group_workspaces/jasmin2/nceo_aerosolfire
/group_workspaces/jasmin2/nceo_uor
/group_workspaces/jasmin2/precis
Much of JASMIN is now up and returning to normal service, with the exception of the components listed below:
The “JASMIN GridFTP Server” Globus Endpoint (data-xfer1.ceda.ac.uk) is still down, but all other services in the Data Transfer Zone are working normally.
Most storage volumes are accessible; however, the following are still inaccessible pending completion of file system checks:
/work/scratch (old scratch volume, soon to be retired as per recent notices)
Please report any further problems you encounter to [email protected] but please continue to bear with us as resolving problems may still take longer than normal.
We believe all JASMIN and CEDA Archive services are now working after Monday’s power outage. If you believe anything isn’t working correctly, please let us know by contacting the appropriate helpdesk.
A few of you have asked why JASMIN doesn’t have some form of backup power, so we thought it might be useful to share the reason with you all. JASMIN has a power demand of around 200 kW from ~50 full racks of equipment, network and tape robot infrastructure. It shares its machine room environment and some of these systems with facilities of similar scale for other communities, so a power backup of suitable capacity and complexity would be prohibitively expensive for the science (rather than enterprise) budgets we’re working within. Many subsystems within JASMIN already run on UPS systems, which provide a short-term supply to enable a clean shutdown, but continued operation without mains power is not currently feasible.
We will keep monitoring the situation and will send another update soon.