Transnational Access (TNA) service

New data-near processing capabilities at the European centers DKRZ, CNRS-IPSL, STFC (at CEDA), and CMCC for model data hosted in the Earth System Grid Federation will be made accessible to a broader user community via the new Transnational Access (TNA) service. These processing capabilities support multi-model server-side data analysis through direct access to large data pools including replicated data from the European as well as non-European ESGF data nodes. CMCC joins the ENES TNA initiative through the CMCC Analytics-Hub facility, providing a data science environment, based on ECAS, with:

  •  computational and storage resources,
  •  tools, libraries and services and
  •  a set of collections of climate model data in the context of climate model intercomparison experiments.

This environment is hosted as part of the CMCC data infrastructure and aims to support user groups in accessing, processing and analysing climate data collections. The hosted datasets concentrate on model data generated as part of the CMIP climate model intercomparison project. Applying to the TNA call gives you direct access to CMCC compute facilities. An evaluation committee will supervise the selection of applications for access to these virtual workspaces. More information about the application procedure is available here.

More information about the site-specific deployment and facility is given below.

Data pool

CMCC will provide access to a set of specific CMIP variable-centric collections. Data will be downloaded and kept in sync with the ESGF federated data archive using the Synda replication tool. About 50 TB of disk space has been allocated for this purpose.

The data pool is directly accessible from the cluster resources as well as from JupyterLab. The JupyterLab environment is already equipped with a set of Python libraries to support end users' data analysis. Users can request the installation of additional libraries by contacting the user support here.
Compute-intensive parallel data analysis is supported through the submission of batch compute jobs via Ophidia to the CMCC Analytics-Hub cluster.

In addition to the pre-defined set of variable-centric collections available in the CMCC data pool, further collections will be set up to address specific requests coming from the TNA applications.
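As an illustration of what a variable-centric view of CMIP data can look like, the following minimal Python sketch groups files by variable and lists the models that provide each one for a given experiment. It relies only on the standard CMIP6 file naming convention (variable, table, model, experiment, member, grid label and an optional time range, separated by underscores). The example file names, the function names and the idea of indexing by file name are assumptions made for this sketch; the actual layout of the CMCC data pool is not described here, and CMIP5 file names follow a different pattern.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Cmip6File(NamedTuple):
    """Facets encoded in a CMIP6-style file name."""
    variable_id: str
    table_id: str
    source_id: str       # model name
    experiment_id: str
    member_id: str
    grid_label: str
    time_range: Optional[str]  # absent for time-invariant fields


def parse_cmip6_filename(name: str) -> Cmip6File:
    """Split a CMIP6-style file name into its underscore-separated facets."""
    if not name.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {name!r}")
    parts = name[:-3].split("_")
    if len(parts) == 6:           # no time range in the name
        return Cmip6File(*parts, None)
    if len(parts) == 7:
        return Cmip6File(*parts)
    raise ValueError(f"unexpected number of facets in {name!r}")


def models_per_variable(names, experiment_id):
    """Map each variable to the sorted list of models providing it."""
    index = defaultdict(set)
    for name in names:
        facets = parse_cmip6_filename(name)
        if facets.experiment_id == experiment_id:
            index[facets.variable_id].add(facets.source_id)
    return {var: sorted(models) for var, models in index.items()}


# Hypothetical file names, used only to exercise the functions above.
EXAMPLE_FILES = [
    "tas_Amon_CMCC-CM2-SR5_historical_r1i1p1f1_gn_185001-201412.nc",
    "tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_185001-185412.nc",
    "pr_Amon_CMCC-CM2-SR5_historical_r1i1p1f1_gn_185001-201412.nc",
    "tas_Amon_CMCC-CM2-SR5_ssp585_r1i1p1f1_gn_201501-210012.nc",
]

if __name__ == "__main__":
    print(models_per_variable(EXAMPLE_FILES, "historical"))
```

An index of this kind makes it easy to check, before starting a multi-model analysis, which models are available for the variable of interest.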

Types of access

The CMCC Analytics-Hub offers two different types of access:

1) Direct access via user registration: upon registration, end users can implement their own data analysis use cases by exploiting the different services available (Jupyter Hub, Ophidia, CDO). Users can either upload their own input data or analyse data from the collections available in the Analytics-Hub data pool. Access can also be requested for training, evaluation and testing purposes. For any request, please contact the support here.

2) Access via Transnational Access (TNA) calls: after a selection process, the successful applicants will be granted access to the Analytics-Hub resources via a dedicated "workspace" where they will perform their data analysis. Besides resource allocation, the CMCC team will offer dedicated support to:

  • set up specific data collections from the CMIP5/CMIP6/CORDEX experiments;
  • perform pre-processing steps;
  • optimize, fine-tune and monitor the data analysis pipeline;
  • apply "FAIR principles" to the performed research (analysis, output, workflows/notebook, etc.);
  • learn more about how to access and use the tools, services, data and infrastructure via dedicated teleconferences with the support team at CMCC.

For any request about this type of access, please contact the support here.

Registration and user support

To access the CMCC Analytics-Hub, users need to register at CMCC. You can register here.

Once registered, users will have access to a set of services (e.g. Jupyter Hub, Ophidia), data collections as well as a comprehensive set of Python scientific modules for data analysis.

Registered users can contact analytics-hub-support[at]cmcc[dot]it for any information request.

Technical information

Compute resources, storage and installed software:

  • Analytics-Hub cluster with 5 compute nodes (20 cores/node, 256 GB RAM/node, 1 TB local disk each), SLURM resource manager, 10 Gbit/s network (10 GbE);
  • 50 TB shared storage for data analysis (GFS);
  • one Virtual Machine (4 cores, 8 GB memory, 12 GB disk) for client-side components;
  • one Virtual Machine (8 cores, 16 GB memory, 12 GB disk) for front-end components and services;
  • software: the cluster runs the full Ophidia framework stack. Compute nodes are also equipped with additional libraries and tools (e.g. CDO, NCO);
  • data: the shared disk space hosts data collections from CMIP5/CMIP6, which are populated and updated over time from ESGF using the Synda replication tool (recommended because it supports consistent version updates, keeping the local copy in sync with the ESGF archive content).
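
To give a feel for the size of the cluster described above, the following Python sketch adds up the per-node figures and estimates how many whole nodes a job request would occupy. The node figures come from the list above; everything else is an assumption made for illustration: the function names are hypothetical, jobs are assumed to be allocated whole nodes, and all cores and memory of a node are assumed to be available to user jobs. The actual scheduling policy is set by the SLURM configuration and is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    ram_gb: int
    local_disk_tb: int


# Per-node figures taken from the technical information above.
NODE = NodeSpec(cores=20, ram_gb=256, local_disk_tb=1)
NODE_COUNT = 5


def cluster_totals(node: NodeSpec = NODE, count: int = NODE_COUNT) -> dict:
    """Aggregate resources over all compute nodes."""
    return {
        "cores": node.cores * count,
        "ram_gb": node.ram_gb * count,
        "local_disk_tb": node.local_disk_tb * count,
    }


def nodes_needed(cores: int, ram_gb: int,
                 node: NodeSpec = NODE, count: int = NODE_COUNT) -> int:
    """Smallest number of whole nodes covering a request.

    Assumes whole-node allocation (an assumption of this sketch).
    Raises ValueError if the request exceeds the cluster.
    """
    by_cores = -(-cores // node.cores)   # ceiling division
    by_ram = -(-ram_gb // node.ram_gb)
    needed = max(by_cores, by_ram, 1)
    if needed > count:
        raise ValueError(
            f"request needs {needed} nodes, cluster has {count}"
        )
    return needed


if __name__ == "__main__":
    print(cluster_totals())
    print(nodes_needed(cores=40, ram_gb=300))
```

Under these assumptions the cluster offers 100 cores and 1280 GB of RAM in total, so a request for 40 cores and 300 GB of memory would occupy two of the five nodes.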