Make your original datasets stand out

The why and how of research data archiving and sharing

Simone Sacchi, Ph.D.
Research Data Librarian
simone.sacchi@eui.eu

Before we start


  • This information session has been organised in the context of the 2026 International Love Data Week. Similar events are held in the other villas; see the full list on the EUI Library events page.
  • At the end of the presentation I will ask you to complete a brief survey on your data-related needs.

Agenda for today


Data archiving and data sharing

Research data

We take an inclusive notion of research data at the core of our work:

Research data is all information intentionally collected, observed, generated or reused to validate research findings and substantiate scholarly claims

Data can take a broad variety of forms, for example: tables, texts, images, audio/video recordings, archival material and other sources or physical evidence.

Read more on the EUI Research Data Guide: What is Research Data

Research data lifecycle

We organise our work around the idea that research data, much like research itself, goes through stages as identified in a research data lifecycle.


Today we are focusing on the Archive and Share stages, i.e. what should happen towards the completion of a research project to elevate our work to its fullest!

[Figure: the research data lifecycle (Research Data Management), with the stages Plan, Collect, Organise, Process, Analyse, Archive and Share.]



Read more on the EUI Research Data Guide: Research Data Lifecycle

Data archiving and data sharing

Research data archiving: the organisation, storage and long-term preservation of research data after a project ends, ensuring their integrity, security and accessibility for future use.

Research data sharing: the practice of making data, observations and analytical code accessible to other researchers and the public after a study is completed.

Read more on the EUI Research Data Guide: Register, archive and share data

Archiving and sharing principles


As open as possible, and as closed as necessary


FAIR guiding principles for scientific data management and stewardship (Wilkinson et al., 2016), developed to improve the Findability, Accessibility, Interoperability and Reusability of research data.


Why it matters

Why it matters [1]

  • Increased impact: Makes data easier to find, which can lead to new findings, increased citations, and broader recognition of your work.
  • Improved reliability and reproducibility: Allows other researchers to verify your results and helps ensure conclusions are built on a solid foundation.
  • Builds public trust: Increases transparency and allows the public to check the work behind research conclusions. 

Why it matters [2]

  • Enhanced collaboration: Increases opportunities to work with other researchers, institutions, and industries, fostering new partnerships.
  • Efficiency: Reduces redundant efforts and time-consuming future collection.
  • Long-term preservation: Makes sure that your data are not lost and stay available in the long term.

What to consider

Is my dataset original?

According to the Database Directive, a dataset is considered an original work if:

  • It is the result of an intentional structured collection of original data

OR

  • Its contents, by reason of their selection or arrangement, constitute the author’s own intellectual creation.

If any of the above is the case, you can claim intellectual paternity over a dataset, and therefore have copyright over it.

Tip

Reusing and integrating data from existing datasets does constitute an act of creation of an original work, if the structure is original (individual data points are considered “facts” and therefore do not fall under copyright).

Am I reusing third-party data?

  • Check the quality standards of the data source (e.g. gathering purpose, data collection method(s), data documentation, etc.)
  • Check the license and terms of use associated with the data source (i.e. am I legally allowed to reuse that data/information?)
  • Thoroughly record your data sources, which datasets you reused and how (important for provenance and transparency)
  • Cite your data sources (you will want to be cited as well!)

Tip

Apply the same principles you would apply when reusing (e.g. quoting) material from other publications.

Have I collected (or reused) personal and sensitive data?

Special terms and conditions apply to access and use of personal data, including micro-socioeconomic and qualitative data.

Data Protection at the EUI is governed by President’s Decision No.10/2019, which was introduced following the adoption of the General Data Protection Regulation (GDPR).

The EUI adopts several policies and best practices; the following pages gather the relevant resources:

How to prepare your data for archiving and sharing

Document your data

Data documentation can be defined as the clear description of everything that a new “data user” or “your future self” would need to know in order to find, understand, reproduce and reuse your data independently.

Clear and accurate documentation should include:

  • Purpose, context and methodology of the research project
  • Description of the dataset (structure, folders, files, variables and versioning)
  • Definitions, variable names, problematic values, missing observations etc.
  • Methodology and how and when the data was collected or generated
  • Elaboration techniques (sub-setting, combining, etc.)

Tip

  • Good documentation helps make datasets findable, accessible, interoperable and re-usable (FAIR principles).
  • Codebooks, questionnaires and data dictionaries should be archived with the data.
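
As an illustration of the points above, the following is a minimal sketch of how a first draft of a data dictionary could be generated from a tabular file. The function name `build_data_dictionary`, the sample columns and the choice of "NA" as a missing-value marker are all assumptions made for this example; the descriptions of each variable still have to be written by hand.

```python
import csv
import io

def build_data_dictionary(csv_text, missing_markers=("", "NA")):
    """Summarise each variable: counts of present/missing values and one example."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    summary = {
        name: {"variable": name, "description": "", "n_present": 0,
               "n_missing": 0, "example": ""}
        for name in names
    }
    for row in reader:
        for name in names:
            value = (row.get(name) or "").strip()
            if value in missing_markers:
                summary[name]["n_missing"] += 1
            else:
                summary[name]["n_present"] += 1
                if not summary[name]["example"]:
                    summary[name]["example"] = value
    return list(summary.values())

# Hypothetical three-row dataset, for illustration only.
sample = "respondent_id,age,country\n1,34,IT\n2,NA,FR\n3,51,\n"
data_dictionary = build_data_dictionary(sample)
for entry in data_dictionary:
    print(entry)
```

The empty "description" field is deliberate: the script can count and sample values, but only the researcher can explain what a variable means.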

Organise your data: file naming

Consider the following elements:

  • Version number;
  • Date of creation (date format should be YYYY-MM-DD);
  • Name of creator;
  • Description of content;
  • Name of research team/department associated with the data;
  • Publication date;
  • Project number.

Best practice:

  • Create meaningful but brief names;
  • Avoid using spaces, dots and special characters (& or ? or !);
  • Use hyphens (-) or underscores (_) to separate elements in a file name;
  • Avoid very long file names;
  • Reserve the 3-letter file extension for application-specific codes of file format (e.g. .doc, .xls, .mov, .tif);
  • Include versioning of file names where appropriate.
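
The naming practices above could be applied consistently with a small helper such as the sketch below. The function names, the order of the elements and the example values (project number "P123", creator "S. Rossi") are assumptions chosen for illustration, not an EUI convention.

```python
import re
from datetime import date

def clean(part):
    """Replace spaces, dots and special characters with hyphens.

    Note: this simple rule also replaces accented letters, so adapt it if needed.
    """
    return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-")

def build_file_name(project, description, creator, created, version, extension):
    """Join the elements with underscores, using a YYYY-MM-DD date and a version tag."""
    parts = [
        clean(project),
        clean(description),
        clean(creator),
        created.isoformat(),   # date objects format as YYYY-MM-DD
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()

# Hypothetical example values.
example_name = build_file_name(
    "P123", "interview transcripts", "S. Rossi", date(2026, 2, 10), 2, ".CSV"
)
print(example_name)  # P123_interview-transcripts_S-Rossi_2026-02-10_v02.csv
```

Hyphens are used inside an element and underscores between elements, so the name stays readable and can be split back into its parts.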

Organise your data: folder structure

The folder structure gives an overview of which information can be found where, enabling present as well as future stakeholders to understand what files have been produced in the project.

Folders should:

  • Follow a structure with folders and subfolders that correspond to the project design and workflow
  • Have a self-explanatory name that is only as long as is necessary
  • Have a unique name – avoid assigning the same name to a folder and a subfolder

Tip

The top folder should have a README.txt file describing the folder structure and what files are contained within the folders.
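
A minimal sketch of how such a structure and its README.txt could be created in one step is shown below. The folder names and their descriptions are hypothetical and should be adapted to your own project design and workflow; the example writes into a temporary directory so it can be run safely.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: folder name -> what it contains.
LAYOUT = {
    "01_raw-data": "Original data as collected; never edited.",
    "02_processed-data": "Cleaned and derived datasets.",
    "03_code": "Scripts used to process and analyse the data.",
    "04_documentation": "Codebooks, questionnaires and data dictionaries.",
}

def create_project(root, layout=LAYOUT):
    """Create the folders and a top-level README.txt describing them."""
    root = Path(root)
    lines = [f"Project folder: {root.name}", "", "Folder structure:"]
    for folder, purpose in layout.items():
        (root / folder).mkdir(parents=True, exist_ok=True)
        lines.append(f"  {folder}/  {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme

with tempfile.TemporaryDirectory() as tmp:
    readme_path = create_project(Path(tmp) / "my-project")
    readme_text = readme_path.read_text(encoding="utf-8")
    created_folders = sorted(
        p.name for p in readme_path.parent.iterdir() if p.is_dir()
    )
    print(readme_text)
```

Numeric prefixes keep the folders sorted in workflow order, which is one possible way to make the structure self-explanatory.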

Remove personal information

When data are anonymised, individual research participants or third persons cannot be identified based on indirect identifiers or by combining the data with information available elsewhere.

  • Remove personal data identifiers
  • Aggregate, or reduce, the precision of variables
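
The two steps above can be sketched as follows, assuming a record with the hypothetical columns "name", "email", "age" and "country". This only illustrates the mechanics: dropping direct identifiers and banding one variable does not by itself guarantee that a dataset is anonymous, since combinations of the remaining variables may still identify a person.

```python
# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}

def age_band(age, width=10):
    """Reduce precision by replacing an exact age with a band, e.g. 37 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def reduce_record(record):
    """Drop direct identifiers and coarsen the age variable."""
    reduced = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    reduced["age"] = age_band(reduced["age"])
    return reduced

# Invented record, for illustration only.
record = {"name": "Jane Doe", "email": "jane@example.org", "age": 37, "country": "IT"}
reduced = reduce_record(record)
print(reduced)  # {'age': '30-39', 'country': 'IT'}
```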

When data are pseudonymised, unique records are replaced by consistent values either derived from the original values or independent of them so that specific data subjects are no longer identifiable.

  • Justify why data are not anonymised
  • Replace personal identifiers
  • Store/encrypt pseudonyms separately
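
One possible way to obtain consistent values derived from the originals is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; this technique is chosen here for illustration and is not prescribed by the guidance above. The function names and the sample identifiers are assumptions, and the secret key must be stored separately from the pseudonymised data.

```python
import hashlib
import hmac
import secrets

def make_key():
    """Generate a random secret key; keep it apart from the dataset."""
    return secrets.token_bytes(32)

def pseudonymise(identifier, key, length=12):
    """Derive a consistent pseudonym from an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

key = make_key()
# Invented identifiers, for illustration only.
p1 = pseudonymise("jane@example.org", key)
p2 = pseudonymise("jane@example.org", key)
p3 = pseudonymise("john@example.org", key)
print(p1 == p2, p1 != p3)
```

The same identifier always maps to the same pseudonym under a given key, so records can still be linked across files, while a different key yields unrelated pseudonyms.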

Amnesia (by OpenAIRE)

TextWash (Python code available on GitHub)

Caution

Pseudonymised data can be anonymised by destroying the encryption key. Data should be deleted at the end of the retention period if not anonymised.

Adopt open formats

When it comes to file formats, you must distinguish between working and archiving file formats.

The submission process and associated choices

Where to archive

You have several options for where to archive your data, depending on your needs and research community:

Tip

You can search for the repository that works best for you using re3data.org, a comprehensive global registry of research data repositories that covers all research disciplines.

Cadmus, EUI Research Repository

Cadmus, the EUI Research Repository, collects, preserves, and provides access to the EUI research outputs.

Cadmus includes a Research Data Collection, where all members of the EUI community can archive and share (or simply register) their original datasets.

Discover here how to initiate a submission process.

Tip

Three options: 1) archive your data; 2) archive and share your data; 3) register your data (if archived elsewhere).

Archive your data in Cadmus

Archiving your dataset in a trusted research repository ensures that your data can be found by both humans and machines by assigning a unique and persistent identifier (such as a DOI or HANDLE) and standardised, machine-readable metadata.
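
To give an idea of what "standardised, machine-readable metadata" can look like, here is a minimal sketch of a descriptive record serialised as JSON. The field names, the list of required fields, the placeholder identifier and the sample values are all assumptions made for illustration; they do not reproduce the metadata schema actually used by Cadmus or any other repository.

```python
import json

# Illustrative record; every value below is a placeholder.
record = {
    "identifier": {"type": "DOI", "value": "10.0000/placeholder"},
    "title": "Example survey dataset",
    "creators": [{"name": "Rossi, Sara"}],
    "publication_year": 2026,
    "description": "Hypothetical dataset used to illustrate a metadata record.",
    "license": "CC-BY-4.0",
    "keywords": ["survey", "example"],
}

# Hypothetical minimum set of fields to check before submission.
REQUIRED = ("identifier", "title", "creators", "publication_year", "license")

def missing_fields(rec):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED if not rec.get(field)]

metadata_json = json.dumps(record, indent=2, ensure_ascii=False)
print(missing_fields(record))
print(metadata_json)
```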

Criteria for archiving your dataset:

  • You are an active member of the EUI community
  • Your dataset is the result of an original data collection OR its structure carries significant intellectual effort to make it an original work
  • If you used data under copyright, please abide by the original terms of use and obtain clearance

Tip

If your dataset is already archived in a third-party trusted repository (e.g. Harvard Dataverse, Zenodo, etc.), you can register it in Cadmus and a link to the archived data will be added.

Share your data

Once your dataset is archived in Cadmus, and provided it qualifies (e.g. it does not contain personal or sensitive data under the GDPR), you can make it open and enable its full reuse by the broadest academic community and society.

In Cadmus you can choose:

  • Open: your dataset is released under a Creative Commons Attribution 4.0 International (CC BY) license or a CC0 1.0 Universal (CC0) public domain dedication
  • Embargoed: you pick a future date on which your dataset opens; the licenses above still apply
  • Closed: your dataset stays closed by default, but you can specify the conditions under which it can be shared (e.g. )

Tip

If your dataset contains sensitive and/or personal data, consider creating an anonymised version.

Wrap up

Care about your data!

Research data are increasingly first-class citizens in the scholarly communication landscape: let them stand out!

  • Care, care, care about all your scholarly outputs, including your data!

  • Think about your future self as your first and foremost collaborator

  • Everything you produce has potential future value for you, for the academic community and for society, so consider opening your research data to the world.

The EUI Library is here to help

We are here to help you succeed in working with data and make the most of them.

Your feedback matters!

Link: https://forms.office.com/e/XMUhLD34Th

Any questions?

Simone Sacchi

Research Data Librarian, EUI Library

simone.sacchi@eui.eu

resdata@eui.eu

(Also Teams and BF-278)

End of the presentation

BACKUP SLIDES

The FAIR principles explained

  • Findable: ensuring that your data can be found by both humans and machines, by using a globally unique and persistent identifier (such as a DOI, kind of like an ORCID for data) and standardised, machine-readable metadata.
  • Accessible: once someone has found your data, they need to know how they can get access to them. This could include going through an authorisation and/or authentication process – i.e. it does not have to be open access to be FAIR (ethics always trump openness).
  • Interoperable: the use of open formats ensures that your data can be integrated with other data and that they can be utilised by many applications or workflows for analysis, storage, and processing into the future, regardless of changes in software.
  • Reusable: ensuring that your data (and their related metadata) are openly licensed and well-described, indicating unambiguously how they may be reused without a need to contact the author(s) first.