FAIR track metadata - FAIRification of Genomic Tracks started in 2018 as an ELIXIR Implementation Study focused on “FAIRifying” the metadata of genomic annotation track files contained in track hubs. To achieve this, we developed a common data model and technical solutions compatible with the existing TrackHub exchange format for genome browser tracks, and implemented demonstrators to show the feasibility of this proposal across systems and programming languages.
FAIRtracks is a JSON Schema defining a draft standard for minimal genomic track metadata. FAIRtracks is supported by the Track Hub Registry, the TrackFind metadata search and curation engine, and the downstream track analysis tools GSuite HyperBrowser and EPICO.
GitHub repository of the FAIRtracks draft standard: https://github.com/fairtracks/fairtracks_standard
The FAIRtracks validator is a JSON Schema validator that can also check additional constraints specific to the FAIRtracks draft standard. These extra constraints are declared using extensions to the JSON Schema vocabulary.
The FAIRtracks validator is hosted online as a REST service: http://fairtracks.bsc.es/api/
The FAIRtracks validator can also be installed locally from this GitHub repository: https://github.com/fairtracks/fairtracks_validator
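To illustrate the idea of layering domain-specific constraints on top of standard schema checks, here is a minimal toy sketch in plain Python. It is not the FAIRtracks validator and not a full JSON Schema implementation: the keyword name `x-allowedPrefixes`, the field names `track_id` and `term_id`, and the `validate` helper are all hypothetical, chosen only for illustration.

```python
"""Toy sketch: standard-style schema checks plus one custom extension keyword."""

# ASSUMPTION: hypothetical extension keyword; the real FAIRtracks extensions differ.
EXTENSION_KEYWORD = "x-allowedPrefixes"

# ASSUMPTION: hypothetical schema with made-up field names.
SCHEMA = {
    "required": ["track_id", "term_id"],
    "properties": {
        "track_id": {"type": "string"},
        "term_id": {
            "type": "string",
            EXTENSION_KEYWORD: ["http://purl.obolibrary.org/obo/"],
        },
    },
}


def validate(document, schema):
    """Return a list of error messages for a flat JSON object.

    Supports only 'required', per-property 'type' (string/number),
    and the hypothetical extension keyword defined above.
    """
    type_map = {"string": str, "number": (int, float)}
    errors = []

    # Standard check: all required properties must be present.
    for name in schema.get("required", []):
        if name not in document:
            errors.append(f"missing required property: {name}")

    for name, rules in schema.get("properties", {}).items():
        if name not in document:
            continue
        value = document[name]

        # Standard check: the value must have the declared type.
        expected = type_map.get(rules.get("type"))
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{name}: expected {rules['type']}")
            continue

        # Extension check: string values must start with an allowed prefix.
        prefixes = rules.get(EXTENSION_KEYWORD)
        if prefixes and isinstance(value, str):
            if not any(value.startswith(prefix) for prefix in prefixes):
                errors.append(f"{name}: value must start with one of {prefixes}")

    return errors
```

Calling `validate(document, SCHEMA)` returns an empty list when a document passes both the standard checks and the extension check, so a caller can treat any non-empty result as a validation failure.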
Screencasts showing how to use the FAIRtracks validator:
TrackFind is a track search engine and metadata FAIRification service. TrackFind supports crawling of the Track Hub Registry and other data portals to fetch track metadata. Crawled metadata can be accessed by hierarchical browsing or by search queries, through both a web-based user interface and a REST API. TrackFind supports advanced SQL-based search queries that can be built in the user interface, and the search results can be browsed and exported in JSON or GSuite format. The REST API allows downstream tools and scripts to integrate TrackFind search, as demonstrated by the GSuite HyperBrowser and EPICO.
TrackFind is available with a web-based user interface here: https://trackfind.elixir.no/
TrackFind is also available as a REST API, as documented here: https://app.swaggerhub.com/apis-docs/FAIRtracks/TrackFind/1.0.0
GitHub repository for TrackFind: https://github.com/elixir-no-nels/trackfind
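As a rough sketch of how a downstream script might prepare a TrackFind search request, the Python snippet below only builds a request URL and does not contact the service. The path template, the `query` and `limit` parameter names, and the example query string are assumptions made for illustration; the actual endpoints and parameters should be taken from the REST API documentation linked above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://trackfind.elixir.no"

# ASSUMPTION: illustrative path template, not confirmed against the real API.
SEARCH_PATH = "/api/v1/repositories/{repository}/hubs/{hub}/search"


def build_search_url(repository, hub, query, limit=10, base_url=BASE_URL):
    """Build a search URL for a (hypothetical) TrackFind search endpoint.

    Path segments are percent-encoded so that spaces or slashes in the
    repository or hub name cannot alter the path structure.
    """
    path = SEARCH_PATH.format(
        repository=quote(repository, safe=""),
        hub=quote(hub, safe=""),
    )
    # ASSUMPTION: 'query' and 'limit' are hypothetical parameter names.
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}{path}?{params}"


if __name__ == "__main__":
    # Hypothetical repository, hub, and query values.
    print(build_search_url("Example Repo", "hub/1", "a = 'b'", limit=5))
```

Keeping URL construction in a small pure function makes it easy to test without network access and to adjust once the real endpoint layout has been checked.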
The Track Hub Registry service, maintained by EMBL-EBI, allows independent researchers to distribute their track hubs. Each track hub is a set of text files containing links to data files, a display configuration for each file, and some metadata that browsers use to dynamically create selection menus. We extended the Track Hub Registry with support for distributing FAIRtracks-formatted metadata alongside the existing Track Hub metadata content, with the future goal of better integrating the two. In addition, the REST endpoints were improved to better support metadata queries by outside services, e.g., TrackFind.
URL to the Track Hub Registry: https://www.trackhubregistry.org/
GitHub repository for the Track Hub Registry: https://github.com/Ensembl/trackhub-registry
The GSuite HyperBrowser is a general-purpose web-based platform for rigorous statistical analysis of track data, built upon the Galaxy framework. The HyperBrowser already supports a track search mechanism (a limited prototype) and uses the GSuite format to move collections of track data (typically resulting from a track search operation) through both basic and advanced data manipulation and analysis steps. A TrackFind client has been implemented to replace the existing prototype and has been shown to work with BLUEPRINT data.
A test version of GSuite HyperBrowser containing the TrackFind client tool is available from the following URL: https://hyperbrowser.uio.no/trackfind_test/
GitHub repository for GSuite HyperBrowser: https://github.com/hyperbrowser/genomic-hyperbrowser
EPICO is an open-access reference set of tools, libraries and APIs for developing comparative epigenomic data portals, and also includes a data and metadata validator and a database loader. EPICO components work with a customizable, rich data model in which ontology term checks can be introduced for specific fields, as a generalization of enumerated values. EPICO has been used to implement the BLUEPRINT Data Analysis Portal.
The BLUEPRINT Data Analysis Portal provides a virtual desktop for the comparative analysis of epigenetic features, recorded features (genes, transcripts, etc.) and pathways in the context of the differentiation of hematopoietic lineages. The EPICO system has been modified to upload the FAIR metadata from TrackFind.
GitHub repositories with changes on the EPICO system: