RDA FAIRtracks schema interoperability

30 November 2024

Oslo | RDA-EOSC, in collaboration with the FAIRification of Genomic Annotations Working Group

Omnipy and whyqd (/wɪkɪd/) are independently developed Python libraries offering general functionality for auditable and executable metadata mappings. In this project, we will integrate Omnipy and whyqd to develop executable mappings that transform existing metadata from biodiversity projects, such as ERGA, to conform to the FGA-WG metadata model, kickstarting the process of FAIRifying genome annotation GFF3 files.


The FAIRification of Genomic Annotations Working Group (WG) will focus on the challenges of harmonising metadata and software solutions to improve the discovery and reuse of publicly available “genomic annotation” data. Gavin Chait of Whythawk and Sveinung Gundersen of the WG wanted to begin a process of integrating their two data wrangling software projects, whyqd and Omnipy respectively.

We developed a research proposal for submission to the annual BioHackathon Europe and were selected as participants for the November 2024 event.

Our solution

BioHackathon Europe is an annual event that brings together life scientists from around the world. It is organised by ELIXIR Europe, and offers an intense week of hacking, with over 160 participants working on diverse and exciting projects. The goal is to create code that addresses challenges in bioinformatics research.

Our team consisted of genomics researchers from around the world, with some members on site in Barcelona and others working remotely from the UK and Australia. During the week-long event, we worked collaboratively to develop methods that would form part of a tutorial, along with additional policy and technical outputs for the WG.

Our aims for the week were to:

  • Assess research workflows and systems to decide on appropriate strategies for mapping from complex source data to a defined hierarchical destination schema.
  • Develop techniques for defining minimal metadata to support genome annotations as FAIR objects.
  • Derive a convenience schema from the hierarchical FAIRtracks model and use this as a model for adapting to other formally defined schemas.
  • Develop interoperable executable mappings from a bioinformatics case study to the convenience schema.
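To make the idea of a convenience schema and an executable mapping more concrete, here is a minimal sketch in plain Python. It does not use the whyqd or Omnipy APIs, and it is not the method developed at the hackathon. Every field name, the `flatten` and `apply_mapping` helpers, and the example values are hypothetical, chosen only to illustrate flattening a hierarchical record into dotted field names and then renaming source columns to match them.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def apply_mapping(source_row: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename source fields to destination fields; unmapped fields are dropped."""
    return {dest: source_row[src] for src, dest in mapping.items() if src in source_row}


# Hypothetical hierarchical record (illustrative only, not the real FAIRtracks model).
hierarchical_example = {
    "sample": {"species_name": "Salmo trutta"},
    "track": {
        "assembly_id": "EXAMPLE_ASSEMBLY",
        "file_url": "https://example.org/annotation.gff3",
    },
}

# The "convenience schema" here is simply the sorted list of flattened field names.
convenience_fields = sorted(flatten(hierarchical_example))

# Hypothetical mapping: source column -> flattened destination field.
MAPPING = {
    "species": "sample.species_name",
    "assembly_accession": "track.assembly_id",
    "gff_url": "track.file_url",
}

source_row = {
    "species": "Salmo trutta",
    "assembly_accession": "EXAMPLE_ASSEMBLY",
    "gff_url": "https://example.org/annotation.gff3",
    "notes": "not part of the destination schema",
}

mapped = apply_mapping(source_row, MAPPING)
```

Keeping the mapping as plain data (a dictionary of renames) rather than ad hoc code is what makes it auditable in this sketch: the same mapping can be inspected, versioned, and re-run against new source rows.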

Outcomes

We successfully developed methods for creating convenience schemas, and produced recommendations for refinements to the FAIRtracks model. A tutorial and general guidelines were delivered and are now formally part of both whyqd’s documentation and the WG’s deliverables.
