

Quality documentation and metadata enhance the discoverability of your data and enable others to accurately interpret, validate, reuse and cite it.

Documentation is a comprehensive term that refers to any supporting material or information that provides context, explanation, or guidance regarding your research data. It includes a variety of components, such as codebooks, README files and, importantly, metadata.

Metadata, or ‘data about data’, uses structured, standardized information to document data.

Supporting documentation e.g., codebooks, README files, code and scripts:

Data-level documentation is critical for interpreting, validating and re-using your data.

Please follow these guidelines if you are archiving or publishing your data via Research Data 番茄社区.

  • At a minimum, ensure you include variable codes, labels, descriptions and units with your data (embedded) and/or in a separate file.
  • Create and maintain codebooks, data dictionaries and README.txt files as required during your project and ensure you archive them with your data at completion.
  • Code and scripts used to derive or analyse your data should also be retained and published with your data as appropriate.
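If you keep a data dictionary as a simple CSV file, a short script can write it and flag any data column you have not yet documented. The sketch below is one way to do this, assuming a four-column layout (code, label, description, units); the variable names and descriptions are hypothetical examples only.

```python
import csv
import sys

# Hypothetical variables, for illustration only: replace with your own.
DATA_DICTIONARY = [
    {"code": "resp_id", "label": "Respondent ID",
     "description": "Unique identifier assigned to each participant", "units": ""},
    {"code": "age", "label": "Age",
     "description": "Age at time of survey", "units": "years"},
    {"code": "sbp", "label": "Systolic blood pressure",
     "description": "Mean of two seated readings", "units": "mmHg"},
]

FIELDS = ["code", "label", "description", "units"]


def write_data_dictionary(rows, stream):
    """Write a header row, then one row per variable."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def undocumented_columns(data_header, rows):
    """Return data columns that have no entry in the data dictionary."""
    documented = {row["code"] for row in rows}
    return [col for col in data_header if col not in documented]


if __name__ == "__main__":
    write_data_dictionary(DATA_DICTIONARY, sys.stdout)
    # "bmi" exists in the data but not in the dictionary, so it is flagged.
    print(undocumented_columns(["resp_id", "age", "sbp", "bmi"], DATA_DICTIONARY))
```

To save the dictionary next to your data, pass a file opened with `open("data_dictionary.csv", "w", newline="")` instead of `sys.stdout`; the file name is only a suggestion. Running the undocumented-column check before you archive is a cheap way to keep the dictionary in step with the data.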

You can read more about study-level vs. data-level documentation and how documentation is stored below.

Metadata standards, schemas, classifications, vocabularies and ontologies:

Metadata in the form of standards, schemas, classification codes, vocabularies or ontologies may be relevant for your research project, particularly if you are depositing your data in a discipline-specific repository. See our Repository Lists webpage for more information.

While these terms can be confusing, they all provide a structured framework for organizing and describing data. This ensures consistency, interoperability, and enhanced discoverability across domains and systems.

You can read more about machine vs. human-readable metadata and how metadata is stored below.

Here are some examples:

The metadata standard is an extension of Dublin Core (used for general resource description) specifically designed for biodiversity data. It is used by the Global Biodiversity Information Facility (GBIF) to aggregate and disseminate biodiversity data from various sources, promoting collaboration and advancing global insights into biodiversity. Darwin Core and Dublin Core are standards that include metadata schemas.
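To show what a standard like Darwin Core looks like in practice, the sketch below builds a single occurrence record using real Darwin Core term names as CSV column headers, including `license`, a Dublin Core term that Darwin Core reuses. The record itself (identifier, date, location) is invented, and the completeness and coordinate checks are our own simple illustration, not GBIF's validation rules.

```python
import csv
import sys

# Darwin Core term names, used as column headers. "license" is a Dublin Core
# term (dcterms:license) that Darwin Core reuses at the record level.
TERMS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "geodeticDatum", "countryCode",
    "license",
]

# Hypothetical record: the identifier and observation are invented.
record = {
    "occurrenceID": "urn:example:occurrence:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Phascolarctos cinereus",
    "eventDate": "2023-09-14",
    "decimalLatitude": -27.47,
    "decimalLongitude": 153.03,
    "geodeticDatum": "WGS84",
    "countryCode": "AU",
    "license": "http://creativecommons.org/licenses/by/4.0/",
}


def check_record(rec):
    """Return simple problems: missing terms or out-of-range coordinates.

    These checks are illustrative only; they are not GBIF's validation rules.
    """
    problems = [f"missing {term}" for term in TERMS if rec.get(term) in (None, "")]
    lat = rec.get("decimalLatitude")
    lon = rec.get("decimalLongitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("decimalLatitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("decimalLongitude out of range")
    return problems


if __name__ == "__main__":
    print(check_record(record))
    writer = csv.DictWriter(sys.stdout, fieldnames=TERMS)
    writer.writeheader()
    writer.writerow(record)
```

Because every column name is a published term, anyone (or any aggregator) reading the file knows exactly what `eventDate` or `decimalLatitude` means without a separate explanation.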

The ANZLIC metadata guidelines are widely used to document and describe spatial data in Australia and New Zealand. This profile of AS/NZS ISO 19115:2011 Geographic Information – Metadata has been retired (since 2015) in favour of the officially endorsed metadata standard AS/NZS ISO 19115.1:2015 Metadata (including the 2018 Amendment No.1). See the for more information.

The is widely used to ensure consistency and comparability when dealing with occupational data. This is an example of a classification system that uses coding.

Controlled vocabularies make it easier for researchers to find or analyse data or to aggregate it with other data. There are thousands of vocabularies available in the research domain. can help you locate, access and reuse vocabularies for your research project.
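One practical benefit of a controlled vocabulary is that inconsistent free-text entries can be mapped to a single preferred term before data are combined. The sketch below assumes a small, hypothetical vocabulary in which each preferred label lists its alternative labels, loosely mirroring the SKOS `prefLabel` and `altLabel` idea; values that match nothing are reported rather than guessed.

```python
# Hypothetical controlled vocabulary: each preferred label lists alternative
# labels that should be recorded as that preferred term.
VOCABULARY = {
    "freshwater": ["fresh water", "fresh-water"],
    "marine": ["sea water", "seawater", "saltwater"],
    "brackish": ["estuarine"],
}


def build_lookup(vocab):
    """Map every preferred and alternative label (case-folded) to its preferred label."""
    lookup = {}
    for preferred, alternatives in vocab.items():
        for label in [preferred, *alternatives]:
            lookup[label.casefold()] = preferred
    return lookup


def normalise(values, vocab):
    """Return (mapped values, unmatched values); unmatched entries map to None."""
    lookup = build_lookup(vocab)
    mapped, unmatched = [], []
    for value in values:
        key = " ".join(value.split()).casefold()  # collapse stray whitespace
        preferred = lookup.get(key)
        mapped.append(preferred)
        if preferred is None:
            unmatched.append(value)
    return mapped, unmatched


if __name__ == "__main__":
    # Prints (['marine', 'freshwater', None], ['lake'])
    print(normalise(["Seawater", " fresh  water ", "lake"], VOCABULARY))
```

Reporting unmatched values, instead of silently dropping them, leaves the decision about new or ambiguous terms with you.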

is used in bioinformatics and genomics. This ontology provides a structured vocabulary and standardized annotations to enable systematic and comprehensive analysis of gene functions across different species and biological contexts.

Study-level and data-level documentation:

Project or study-level metadata is often included in Research Data Management Plans (RDMPs) and provides a high-level overview and context for the data, e.g. the research project’s aims, subject descriptions (keywords, FoR codes etc.), personnel, data collection and analysis methods, information about intellectual property (IP) rights, access and plans for handling sensitive data.

In Research Data 番茄社区, some of this study-level metadata will auto-fill the Data Records and Data Publications that you create from your RDMPs. While this information is important for context, these metadata records must describe the data itself, not just the project or a publication.

You also need to provide supporting documentation at the data level. This ensures your data is not misinterpreted and is critical for validating, reproducing and reusing your data. Some examples (from the UK Data Archive) include:

  • variable codes, labels, descriptions and units
  • reasons for missing values
  • weighting and grossing variables created
  • code and scripts used to derive data after collection (simple derivations such as grouping by age levels can be explained in variable and value labels)
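Two items in this list, reasons for missing values and simple derivations, are easy to make explicit in a script. The sketch below assumes a hypothetical scheme of negative missing-value codes, each with a documented reason, and a hypothetical age-band variable whose meaning is carried by its value labels. Your project's codes and bands will differ; what matters is that they are written down.

```python
from collections import Counter

# Hypothetical missing-value codes: document whatever scheme your project uses.
MISSING_CODES = {
    -97: "Not applicable",
    -98: "Refused",
    -99: "Not recorded (equipment fault)",
}

# Value labels for a derived variable (hypothetical age bands).
AGE_GROUP_LABELS = {1: "18-34", 2: "35-54", 3: "55 and over"}


def split_missing(values, codes):
    """Separate valid values from missing-value codes and count each reason."""
    valid = []
    reasons = Counter()
    for value in values:
        if value in codes:
            reasons[codes[value]] += 1
        else:
            valid.append(value)
    return valid, dict(reasons)


def derive_age_group(age):
    """Derive the age_group code from age in years; under-18s return None."""
    if age < 18:
        return None
    if age <= 34:
        return 1
    if age <= 54:
        return 2
    return 3


if __name__ == "__main__":
    ages = [72, -98, 29, -99, -98, 41]
    valid, reasons = split_missing(ages, MISSING_CODES)
    print(reasons)
    print([AGE_GROUP_LABELS[derive_age_group(a)] for a in valid])
```

Archiving a script like this with your data documents both why values are missing and exactly how the derived variable was produced.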

Storing metadata and documentation:

Research Data 番茄社区 includes many of the metadata fields you will need to comprehensively document your data.

Documentation can be stored with the data (embedded) and/or in separate files, e.g. codebooks, README files and scripts provided as supporting documentation.

Embedded documentation can be as simple as a key in a Microsoft Excel spreadsheet, or more complex, e.g. annotations made with software packages such as R or with Python libraries that include facilities for data annotation. If possible, export these annotations as plain text and include them with your supporting documentation.
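As a simple illustration of exporting embedded documentation, suppose a data file carries its key as comment lines at the top. The sketch below assumes a hypothetical convention in which those lines start with `#`; it separates the key into plain text (for example, for a README.txt) and parses the remaining lines as CSV data. The file contents are invented.

```python
import csv

# Hypothetical data file whose key is embedded as "#" comment lines at the top.
RAW = """# resp_id: unique respondent identifier
# age: age at time of survey, in years
# sbp: systolic blood pressure, in mmHg
resp_id,age,sbp
1,34,118
2,57,131
"""


def split_embedded_key(text, marker="#"):
    """Separate comment lines (the embedded key) from the CSV data rows."""
    key_lines, data_lines = [], []
    for line in text.splitlines():
        if line.startswith(marker):
            key_lines.append(line[len(marker):].strip())
        else:
            data_lines.append(line)
    return "\n".join(key_lines) + "\n", list(csv.reader(data_lines))


if __name__ == "__main__":
    key_text, rows = split_embedded_key(RAW)
    # In practice you might write key_text to a plain-text file such as README.txt.
    print(key_text)
    print(rows[0])
```

Plain text stays readable without the original software, which is why exporting embedded keys is worthwhile even when the data format supports annotations.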

Machine and human-readable metadata:

Including some machine-readable metadata elements improves automation and makes it easier for tools and systems to index, search and analyse datasets efficiently. This is particularly important in large-scale data applications and systems.

In Research Data 番茄社区, we use machine-readable Digital Object Identifiers (DOIs) to identify your Data Publications and link them with your other outputs, and we also use machine-readable licences. DOIs and licences allow other researchers to discover your work, attribute it properly and understand the terms under which it can be used, both via Research Data 番茄社区 and via other services that harvest the metadata.
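To see what machine-readable metadata with a DOI and licence can look like, the sketch below builds a record using schema.org's `Dataset` type in JSON-LD, one widely used format that harvesters such as Google Dataset Search read. This is an illustration only: it does not describe how Research Data 番茄社区 stores or exposes metadata, and the DOI, title and creator are placeholders.

```python
import json


def doi_url(doi):
    """Return the https://doi.org/ resolver URL for a DOI, dropping common prefixes."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def dataset_jsonld(title, description, doi, licence_url, creators, keywords):
    """Build a schema.org Dataset record as a JSON-LD string."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "identifier": doi_url(doi),
        "license": licence_url,
        "creator": [{"@type": "Person", "name": name} for name in creators],
        "keywords": keywords,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Placeholder values only: substitute your own DOI and details.
    print(dataset_jsonld(
        title="Example soil moisture observations",
        description="Hourly soil moisture readings from three field sites.",
        doi="doi:10.1234/example-dataset",
        licence_url="https://creativecommons.org/licenses/by/4.0/",
        creators=["A. Researcher"],
        keywords=["soil moisture", "field observations"],
    ))
```

Expressing the DOI as a resolver URL and the licence as a stable URL means software can follow both links without interpretation, which is what makes them machine-readable rather than merely present.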

Human-readable metadata and descriptions provide context and insights into the background, methodology and nuances of a dataset, and improve its interpretability.

Combining machine and human-readable metadata enhances metadata quality, the integrity and utility of your datasets, and supports FAIR data.
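A quick way to check that a record combines both kinds of metadata is to list which machine-readable and human-readable fields are still empty. The sketch below assumes a hypothetical grouping of field names (adapt it to your repository's schema) and applies a loose DOI pattern to a bare DOI such as `10.1234/example`; it is a rough self-check, not a substitute for the repository's own validation.

```python
import re

# Hypothetical grouping of fields; adapt to the schema your repository uses.
MACHINE_FIELDS = ("identifier", "license", "datePublished")
HUMAN_FIELDS = ("name", "description", "methods")

# Loose DOI pattern: "10.", a registrant code of 4-9 digits, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def metadata_gaps(record):
    """List empty machine-readable and human-readable fields, plus a malformed DOI."""
    gaps = {
        "machine": [f for f in MACHINE_FIELDS if not record.get(f)],
        "human": [f for f in HUMAN_FIELDS if not record.get(f)],
    }
    doi = record.get("identifier")
    if doi and not DOI_PATTERN.match(doi):
        gaps["machine"].append("identifier is not a DOI")
    return gaps


if __name__ == "__main__":
    example = {
        "identifier": "10.1234/example-dataset",  # placeholder
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "datePublished": "2024-05-01",
        "name": "Example soil moisture observations",
        "description": "Hourly readings from three field sites.",
        "methods": "",
    }
    # Prints {'machine': [], 'human': ['methods']}
    print(metadata_gaps(example))
```

In this example the machine-readable fields are complete, but the record still lacks a methods description, the kind of human-readable context that helps others interpret the data.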