Make your original datasets stand out

The why and how of research data archiving and sharing

Simone Sacchi, Ph.D.
Research Data Librarian
simone.sacchi@eui.eu

Before we start

This information session has been organised in the context of the 2026 International Love Data Week. Similar events are held in the other villas; see the full list on the EUI Library events page.
At the end of the presentation, I will ask you to complete a brief survey on your data-related needs.

Agenda for today

Research data

We take an inclusive notion of research data at the core of our work:

Research data is all information intentionally collected, observed, generated or reused to validate research findings and substantiate scholarly claims

Data can take a broad variety of forms, for example: tables, texts, images, audio/video recordings, archival material and other sources or physical evidence.

Read more on the EUI Research Data Guide: What is Research Data

Research data lifecycle

We organise our work around the idea that research data, much like research itself, goes through stages as identified in a research data lifecycle.

Today we are focusing on the Archive and Share stages, i.e. what should happen towards the completion of a research project to elevate our work to its fullest!

Read more on the EUI Research Data Guide: Research Data Lifecycle

Why it matters

Why it matters [1]

Increased impact: Makes data easier to find, which can lead to new findings, increased citations, and broader recognition of your work.
Improved reliability and reproducibility: Allows other researchers to verify your results and helps ensure conclusions are built on a solid foundation.
Builds public trust: Increases transparency and allows the public to check the work behind research conclusions.

Why it matters [2]

Enhanced collaboration: Increases opportunities to work with other researchers, institutions, and industries, fostering new partnerships.
Efficiency: Reduces redundant efforts and time-consuming future collection.
Long-term preservation: Ensures that your data are not lost and stay available in the long term.

What to consider

Is my dataset original?

According to the Database Directive¹, a dataset is considered an original work if:

It is the result of an intentional, structured collection of original data

Its contents, by reason of their selection or arrangement, constitute the author’s own intellectual creation.

If either of the above is the case, you can claim intellectual paternity over a dataset, and therefore hold copyright over it.

Tip

Reusing and integrating data from existing datasets does constitute an act of creation of an original work if the structure is original (individual data points are considered “facts” and therefore do not fall under copyright).

Am I reusing third-party data?

Check the quality standards of the data source (e.g. gathering purpose, data collection method(s), data documentation, etc.)
Check the license and terms of use associated with the data source (i.e. am I legally allowed to reuse that data/information?)
Thoroughly record your data sources, which datasets you reused and how (important for provenance and transparency)
Cite your data sources (you will want to be cited as well!)
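
One possible way to keep track of reused sources is a small, machine-readable record per dataset. The sketch below is only an illustration: the field names, the informal citation layout and all example values are assumptions, not an EUI or repository requirement.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SourceRecord:
    """One reused third-party dataset (all field names are illustrative)."""
    title: str
    creator: str
    identifier: str  # DOI, handle or URL of the source
    version: str
    licence: str     # licence or terms of use that allow the reuse
    accessed: str    # date of access, YYYY-MM-DD
    how_used: str    # which part was reused and how it was transformed

    def citation(self) -> str:
        # Informal citation string; adapt it to the style your field requires.
        return (f"{self.creator}. {self.title} (version {self.version}). "
                f"{self.identifier}. Accessed {self.accessed}.")


def sources_to_json(records):
    """Serialise the records so they can be archived next to the data."""
    return json.dumps([asdict(r) for r in records], indent=2)


# Entirely made-up example values.
example = SourceRecord(
    title="Example Survey of Households",
    creator="Example Institute",
    identifier="https://example.org/dataset/123",
    version="2.1",
    licence="CC BY 4.0",
    accessed="2026-02-10",
    how_used="Variables age and region merged into our panel by household id",
)
print(example.citation())
print(sources_to_json([example]))
```

Saving the JSON output alongside the dataset keeps provenance, licence and citation information in one place.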

Tip

Apply the same principles you would apply when reusing (e.g. quoting) material from other publications

Have I collected (or reused) personal and sensitive data?

Special terms and conditions apply to access and use of personal data, including micro-socioeconomic and qualitative data.

Data Protection at the EUI is governed by President’s Decision No.10/2019, which was introduced following the adoption of the General Data Protection Regulation (GDPR).

The EUI has adopted several policies and best practices. The following pages gather the relevant resources:

Ethics and Integrity in Academic Research (EUI Academic Service)
EUI Ethics Committee
Procedure to request an Ethics Review

Document your data

Data documentation can be defined as the clear description of everything that a new “data user” or “your future self” would need to know in order to find, understand, reproduce and reuse your data independently.

Clear and accurate documentation should include:

Purpose, context and methodology of the research project
Description of the dataset (structure, folders, files, variables and versioning)
Definitions, variable names, problematic values, missing observations etc.
Methodology and how and when the data was collected or generated
Elaboration techniques (sub-setting, combining, etc.)

Tip

Good documentation helps make datasets findable, accessible, interoperable and re-usable (FAIR principles).
Codebooks, questionnaires and data dictionaries should be archived with the data.
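
As an illustration of the points above, a first draft of a codebook can be generated from the data file itself and then completed by hand. This is a minimal sketch using only the Python standard library. The function name, the column layout of the codebook and the missing-value codes ("" and "NA") are assumptions chosen for the example.

```python
import csv
import io


def build_codebook(csv_text, descriptions, missing_codes=("", "NA")):
    """Return one codebook entry (a dict) per variable in a CSV dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # Count empty cells and the agreed missing-value codes.
        n_missing = sum(
            1 for v in values if v is None or v.strip() in missing_codes
        )
        codebook.append({
            "variable": name,
            "description": descriptions.get(name, "TODO: describe this variable"),
            "n_observations": len(values),
            "n_missing": n_missing,
        })
    return codebook


# Made-up sample data for illustration only.
sample = "respondent_id,age,country\n1,34,IT\n2,NA,FR\n3,,DE\n"
codebook = build_codebook(sample, {"age": "Age in years at time of interview"})
for entry in codebook:
    print(entry)
```

Variables without a description are flagged with a TODO, which makes gaps in the documentation visible before the dataset is archived.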

Organise your data: file naming

Consider the following elements¹

Version number;
Date of creation (date format should be YYYY-MM-DD);
Name of creator;
Description of content;
Name of research team/department associated with the data;
Publication date;
Project number.

Best practice:

Create meaningful but brief names;
Avoid using spaces, dots and special characters (e.g. &, ? or !);
Use hyphens (-) or underscores (_) to separate elements in a file name;
Avoid very long file names;
Reserve the 3-letter file extension for application-specific codes of file format (e.g. .doc, .xls, .mov, .tif);
Include versioning of file names where appropriate.
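
The elements and best practices above can be combined in a small helper that builds file names consistently. This is a sketch under assumptions: the function name, the order of the elements, the two-digit version format and the project label "ERC 1234" are illustrative choices, not a prescribed convention.

```python
import re
from datetime import date


def make_file_name(description, created, version, extension, project=None):
    """Build a file name from the naming elements, avoiding spaces and special characters."""

    def slug(text):
        # Lower-case the text and replace every run of characters that is
        # not an ASCII letter or digit with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = []
    if project:
        parts.append(slug(project))
    parts.append(slug(description))
    parts.append(created.isoformat())   # YYYY-MM-DD
    parts.append(f"v{version:02d}")     # v01, v02, ...
    # Underscores separate elements; the only dot precedes the extension.
    return "_".join(parts) + "." + extension.lower().lstrip(".")


name = make_file_name(
    "Interview transcripts & notes", date(2026, 2, 10), 3, ".CSV",
    project="ERC 1234",
)
print(name)  # erc-1234_interview-transcripts-notes_2026-02-10_v03.csv
```

Note that this simple version drops accented and other non-ASCII characters, so descriptions are best written in plain ASCII.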

Organise your data: folder structure

The folder structure gives an overview of which information can be found where, enabling present as well as future stakeholders to understand what files have been produced in the project.

Folders should:

Follow a structure with folders and subfolders that correspond to the project design and workflow
Have a self-explanatory name that is only as long as is necessary
Have a unique name – avoid assigning the same name to a folder and a subfolder

Tip

The top folder should have a README.txt file describing the folder structure and what files are contained within the folders.
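
A folder structure with its README.txt can also be created by a short script, so that the same layout is reused across projects. The sketch below is one possible layout. The folder names and their descriptions are assumptions and should be adapted to your project design and workflow.

```python
from pathlib import Path

# Illustrative layout: folder name -> what it contains.
LAYOUT = {
    "01_raw-data": "Data as collected, never edited",
    "02_processed-data": "Cleaned and derived data",
    "03_scripts": "Code used to process and analyse the data",
    "04_documentation": "Codebooks, questionnaires and data dictionaries",
}


def create_project(root, layout=LAYOUT):
    """Create the folders under root and describe them in README.txt."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    lines = [f"Folder structure of {root.name}", ""]
    for folder, purpose in layout.items():
        (root / folder).mkdir(exist_ok=True)
        lines.append(f"{folder}/  {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project("my-project")` creates the four folders and a README.txt in the top folder. Running it again on an existing project leaves the folders in place and rewrites the README.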

Remove personal information

When data are anonymised, individual research participants or third persons cannot be identified based on indirect identifiers or by combining the data with information available elsewhere.

Remove personal data identifiers
Aggregate, or reduce, the precision of variables

When data are pseudonymised, unique records are replaced by consistent values either derived from the original values or independent of them so that specific data subjects are no longer identifiable.

Justify why data are not anonymised
Replace personal identifiers
Store/encrypt pseudonyms separately
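
The steps above can be sketched in a few lines of Python. This is an illustration, not a vetted anonymisation procedure: the record fields, the choice of a keyed hash (HMAC-SHA256) to derive consistent pseudonyms and the ten-year age bands are all assumptions made for the example, and free-text answers may still contain identifying details.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier, key):
    """Derive a consistent pseudonym from an identifier using a secret key."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def age_band(age, width=10):
    """Reduce the precision of an age by reporting a band, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def prepare_record(record, key):
    """Drop direct identifiers, replace the id, and coarsen the age."""
    return {
        "pseudonym": pseudonymise(record["email"], key),
        "age_band": age_band(record["age"]),
        "answer": record["answer"],
    }


# The key must be stored separately from the data (and never archived with it).
key = secrets.token_bytes(32)
raw = {"name": "Jane Doe", "email": "jane@example.org", "age": 34, "answer": "yes"}
print(prepare_record(raw, key))
```

Because the same key always yields the same pseudonym, records from the same person remain linkable. Without the key, the pseudonyms cannot be recomputed from the original identifiers.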

Amnesia
(by OpenAIRE)

TextWash
(Python code available on GitHub)

Caution

Pseudonymised data can be anonymised by destroying the encryption key. Data should be deleted at the end of the retention period if not anonymised.

Adopt open formats

When it comes to file formats, you must distinguish between working and archiving file formats.

The submission process and associated choices

Where to archive

You can opt for different options when deciding to archive your data, depending on your needs and research community:

Cadmus, the EUI Research Repository, a natural choice for all members of the EUI community.
A disciplinary repository, when your research community gathers around a specific repository. Here you can find an extensive list of Humanities and Social Sciences repositories.
A generalist repository, for all other purposes, like:
- Zenodo
- Harvard Dataverse
- Dryad
- Figshare

Tip

You can search for the repository that works best for you using re3data.org, a comprehensive registry of research data repositories that is global and covers all research disciplines.

Cadmus, EUI Research Repository

Cadmus, the EUI Research Repository, collects, preserves, and provides access to the EUI research outputs.

Cadmus includes a Research Data Collection, where all members of the EUI community can archive and share (or simply register) their original datasets.

Discover here how to initiate a submission process.

Tip

Three options: 1) archive your data; 2) archive and share your data; 3) register your data (if archived elsewhere)

Archive your data in Cadmus

Archiving your dataset in a trusted research repository ensures that your data can be found by both humans and machines by assigning a unique and persistent identifier (such as a DOI or HANDLE) and standardised, machine-readable metadata.

Criteria for archiving your dataset:

You are an active member of the EUI community
Your dataset is the result of an original data collection OR its structure carries significant intellectual effort to make it an original work
If you used data under copyright, please abide by the original terms of use and obtain clearance

Tip

If your dataset is already archived in a third-party trusted repository (e.g. Harvard Dataverse, Zenodo, etc.), you can register it in Cadmus and a link to the archived data will be added.

Wrap up

Care about your data!

Research data are increasingly first-class citizens in the scholarly communication landscape. Let them stand out!

Care, care, care about all your scholarly outputs, including your data!
Think about your future self as your first and foremost collaborator
Everything you produce has potential future value for you, for the academic community and for society, so consider opening your research data to the world.

The EUI Library is here to help

We are here to help you succeed in working with data and make the most of them.

Always refer back to the EUI Library Research Data Guide
Look for specific training within the EUI Library Research Skills Programme
Reach out to Simone Sacchi, EUI Research Data Librarian

Your feedback matters!

Link: https://forms.office.com/e/XMUhLD34Th

Any questions?

Simone Sacchi

Research Data Librarian, EUI Library

simone.sacchi@eui.eu

resdata@eui.eu

(Also Teams and BF-278)

End of the presentation

BACKUP SLIDES

The FAIR principles explained

Findable: ensuring that your data can be found by both humans and machines, by using a globally unique and persistent identifier (such as a DOI, kind of like an ORCID for data) and standardised, machine-readable metadata.
Accessible: once someone has found your data, they need to know how they can get access to them. This could include going through an authorisation and/or authentication process – i.e. it does not have to be open access to be FAIR (ethics always trump openness).
Interoperable: the use of open formats ensures that your data can be integrated with other data and that they can be utilised by many applications or workflows for analysis, storage, and processing into the future, regardless of changes in software.
Reusable: ensuring that your data (and their related metadata) are openly licensed and well-described, indicating unambiguously how they may be reused without a need to contact the author(s) first.
