The data centre function of CEDA celebrated 25 years of existence in October 2019. As we approach the end of this year, we take a look back to see what has changed along the way.
Our longest-serving employee, Sam Pepler, has been working at CEDA as Head of Curation since November 1996! “When I joined the team, they’d only just got a website! It was brand new technology at that point. We were just phasing out a large optical jukebox (a robotic data storage device that could automatically load and unload optical discs) - which I think had 60 Gigabytes of data on it. The first hard-disk-based system was then introduced with about 120GB - these data volumes are tiny when compared to our current data archive of over 10 Petabytes,” explained Sam. CEDA’s data archive holdings have approximately doubled every 1.5 years since Sam started working here.
The CEDA Archive (previously known as BADC and NEODC) now has over 50,000 users and ~13 Petabytes of data held in more than 250 million files. This requires CEDA to be at the forefront of data management best practice and to provide efficient data analysis facilities. Sam explains how this has changed: “We used to receive data in random user-invented formats, but now everyone uses standard formats like NetCDF. That’s made it much easier for us (in CEDA) and principally the user community to access and use our data. We really pushed for this change and I’m proud of how far we’ve come.” Data centres like CEDA have actively encouraged the scientific communities to make these changes, and continue to do so by facilitating best practice in data management (e.g. issuing DOIs for datasets).
With the ever-increasing volume and variety of data in the CEDA Archive, a new solution was needed for accessing, using and analysing the data. This is where JASMIN stepped in ~7.5 years ago. Its flexible storage and shared group workspaces allow projects to bring their own data and analyse it together with the data held in the CEDA Archive - meaning data analysis can occur alongside the long-term data archive. JASMIN’s community cloud enables projects to build their own bespoke computing environments to match their needs and showcase their work to their audiences. JASMIN is a globally unique analysis facility that allows researchers to do data-intensive research that was not previously possible.
Whilst many things within CEDA have changed over the last 25 years, some things have stayed the same. “We still use tape technology at the back end of our services and the largest data volumes are still from the climate and earth observation communities - these things haven’t changed,” explained Sam.
Our primary role continues to be helping the scientific community to do research - and 2019 has been one of our busiest years yet, with over 7.4 Petabytes of data archived, 517 new JASMIN users, and over 1630 users helped via our helpdesk - to name a few stats! See the infographic below for more of our key achievements and stats from this year.
Over the past 25 years, CEDA has faced many challenges from ever-increasing data volumes and a growing user community, which has meant that our technology has had to evolve drastically to keep up. We strive to improve our services in order to meet the needs of the community - and hope to continue doing so for the next 25 years too!
If you want to know more about what we’ve been up to, take a look at our previous annual reports or our archive of news items.