| access_method |
Access method used to access the File object (orig: DrsObject). |
| access_methods |
The list of access methods that can be used to fetch the data file. |
| access_url |
AccessURL object providing URL and associated HTTP headers to access the File object (orig: DrsObject). |
| accessions |
Database accession numbers for the genome assembly, if available. Should precisely identify the genome assembly and be omitted if changes have been made to the assembly after retrieval, such as removing the alternate sequences. |
| aliases |
Human-readable aliases of the genome assembly. Can be imprecise, as precision is enforced by the other fields. |
| analyses |
Information about computational processing and analyses that have been carried out to generate the files. |
| analysis_description |
Human-readable description of the analysis. |
| analysis_external_id |
External, globally unique identifier for the analysis. |
| analysis_id |
Internal identifier for the analysis (unique within the metadata deposit). |
| analysis_input_sources |
External or internal references to sources for the input data analyzed. Internal references should lead to FileCollection, File, Experiment, or Analysis objects. |
| analysis_label |
A human-readable description of the analysis, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| analysis_main_tool |
Main software tool used for the analysis. |
| analysis_main_tool_version |
Version of the main software tool used for the analysis. |
| analysis_protocol |
Document describing the analysis protocol that was followed. |
| analysis_study_ref |
Internal reference to the study within which the analysis has been carried out. |
| analysis_type |
The type of analysis carried out. |
| analysis_workflow |
External reference to the analysis workflow, with availability in at least one machine-operable form (e.g. CWL, Nextflow, ...). |
| antibody_target |
The target of the antibody used in the experiment. |
| assay_type |
Sequencing technique intended for this library. |
| assessment_details_url |
URL to a report containing the detailed output from the quality assessment. |
| assessment_method |
Quality assessment method that has been carried out (e.g. BUSCO, OMArk, or peak calling statistics). |
| assessment_values |
Main values produced by the quality assessment. |
| biological_processes |
Biological processes illuminated by the experiment. |
| biological_replicate_labels |
Labels denoting the biological replicates within which the relation is defined, if any. |
| biospecimen_classification |
Main type of structural unit to be used for classification of the biospecimen/sample. |
| bundle_deposit |
Information about the public deposit of the bundle. |
| bundle_description |
Human-readable description of the bundle. |
| bundle_input_sources |
References to other input sources from which this entire bundle was derived, possibly including DOIs of other bundles used as sources. |
| bundle_label |
A human-readable description of the bundle, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| bundle_metadata |
Top-level metadata about the bundle of genomic annotation files. |
| bundle_ontology_versions |
Map from the version-agnostic URL to a versioned URL (e.g. "versionIRI" in OWL) of each ontology used in the current metadata deposit (corresponding to "deposit_versioned_id"). |
| cell_line |
Cultured cell line used in the biospecimen/sample. |
| cell_type |
Cell type of isolated normal cells in the biospecimen/sample. |
| checksum |
The hex-string encoded checksum for the data. |
| checksum_type |
The digest method used to create the checksum. The value (e.g. sha-256) SHOULD be listed as Hash Name String in the https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg [IANA Named Information Hash Algorithm Registry]. Other values MAY be used, as long as implementors are aware of the issues discussed in https://tools.ietf.org/html/rfc6920#section-9.4 [RFC6920]. GA4GH may provide more explicit guidance for use of non-IANA-registered algorithms in the future. Until then, if implementors do choose such an algorithm (e.g. because it's implemented by their storage provider), they SHOULD use an existing standard type value such as md5, etag, crc32c, trunc512, or sha1. |
| checksums |
A list of checksums of the data file. At least one checksum must be provided. For blobs, the checksum is computed over the bytes in the blob. |
| contact_id |
Globally unique identifier for a person (e.g. ORCID ID) or organisation (e.g. BioProject accession). |
| created_time |
Timestamp of content creation in RFC3339 format. (This is the creation time of the underlying content, not of the JSON object.) |
| data_content |
Classification describing the file's purpose or contents. |
| database_accessions |
Accession numbers for database records used as input source. Used in connection with "inputsource_external_ref". |
| date_of_retrieval |
Date of retrieval from the input source, typically used to timestamp downloading data from a database or URL. |
| deposit_first_created |
The date and time of the creation of the first deposited version of the metadata document. |
| deposit_id |
A globally unique and persistent identifier for the public deposit of the metadata document. A DOI or other persistent identifier is recommended. |
| deposit_last_changed |
The date and time of the last deposited change of the current metadata document (corresponding to "deposit_versioned_id"). |
| deposit_versioned_id |
A globally unique, persistent and versioned identifier for the public deposit of the metadata document. A versioned DOI to a deposited document is recommended. |
| deposit_versioned_ref |
Reference to the versioned ID of the deposit containing this file collection. |
| design_description |
The high-level experiment design, including layout and protocol. |
| donor_age |
Age of the donor/organism at the time of sampling. |
| donor_clinical_information |
Clinical information of the donor/organism at the time of sampling. |
| donor_development_stage |
Development stage of the donor at the time of sampling. |
| donor_external_id |
External, globally unique identifier for the donor/organism. |
| donor_id |
Internal identifier for the donor/organism (unique within the metadata deposit). |
| donor_organism_ref |
Internal reference to the donor/organism from which the biospecimen/sample was taken. |
| donors |
Information about the donors or complete organisms from which the samples were taken. |
| drs_uri |
A drs:// hostname-based URI, as defined in the DRS documentation, that tells clients how to access this object. The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. For example, if you arrive at this DRS JSON by resolving a compact identifier-based DRS URI, the drs_uri presents you with a hostname and properly encoded DRS ID for use in subsequent access endpoint calls. |
| edge_weight_type |
The type of values associated with the edges. |
| edges_are_directed |
Whether the edges linking sequence features are directed (at least one edge between sequence features is defined with a direction). |
| edges_denote_parents |
Whether the edges linking sequence features denote a parent-child relationship (all edges between sequence features denote parent-child relationships such as genes to exons, i.e. where the child is fully covered by the parent). |
| edges_have_weights |
Whether the edges linking sequence features are weighted (at least one edge between sequence features has an associated weight). |
| elements_circular |
Whether the sequence features have circular coordinates (at least one feature crosses a sequence border). |
| elements_overlapping |
Whether the sequence features are overlapping (at least one base pair is simultaneously covered by two sequence features). |
| email |
E-mail address of the person or organisation. |
| experiment_external_id |
External, globally unique identifier for the experiment. |
| experiment_id |
Internal identifier for the experiment (unique within the metadata deposit). |
| experiment_label |
A human-readable description of the experiment, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| experiment_samples |
External or internal references to samples used in the experiment. Internal references should refer to Sample objects. |
| experiment_study_ref |
Internal reference to the study within which the experiment has been carried out. |
| experiments |
Information about sequencing experiments that have been carried out to generate the files. |
| file_collections |
Information about collections of files contained in this dataset, each collection defined according to some selection criteria. |
| file_description |
A human-readable description of the data file. |
| file_external_id |
External, globally unique identifier for the data file. |
| file_id |
Internal identifier for the data file (unique within the metadata deposit). |
| file_input_sources |
External or internal references to data sources for the file, typically a data collection or a process that has generated the file. Internal references should lead to FileCollection, File, Experiment, or Analysis objects. |
| file_label |
A human-readable description of the data file, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| file_name |
A string that can be used to name a data file. This string is made up of uppercase and lowercase letters, decimal digits, hyphen, period, and underscore [A-Za-z0-9._-]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282 [portable filenames]. |
| file_size |
The file size in bytes. |
| file_type |
The file format of the data file. |
| file_version |
A string representing a version. (Some systems may use a checksum, an RFC3339 timestamp, or an incrementing version number.) |
| filecollection_contact |
Contact point to the creator and/or maintainer of the file collection. |
| filecollection_description |
Human-readable description of the file collection. |
| filecollection_external_id |
External, globally unique identifier for the file collection (in most cases, this will not exist). |
| filecollection_id |
Internal identifier for the file collection (unique within the metadata deposit). |
| filecollection_input_sources |
References to other input sources from which this file collection was derived. |
| filecollection_label |
A human-readable description of the file collection, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| filecollection_refs |
Internal references to the FileCollection objects (within the deposit) that contain the data file, if any. |
| files |
Information about particular genome annotation (and other relevant) files. |
| genome_assembly |
Information about the genome assembly used to generate the genomic annotation file, consequently defining the genomic coordinate system for the annotation. |
| genomic_annotation_digest |
Content-derived digest for distributed identification of genomic annotation files. (This field is currently a placeholder, as an algorithm for generating such a digest has yet to be specified.) |
| has_edges |
Whether the sequence features are linked across positions (at least one edge between features exists). |
| has_gaps |
Whether there are gaps between the sequence features (there exists at least one gap between two features on the same sequence). |
| has_lengths |
Whether the sequence features have lengths (at least one feature spans more than 1 base pair). |
| has_names |
Whether the sequence features are named (at least one feature has a name). |
| has_strands |
Whether the sequence features are stranded (at least one feature has strand information). |
| has_values |
Whether the sequence features have associated values (at least one feature has an associated value). |
| headers |
An optional list of headers to include in the HTTP request to "url". These headers can be used to provide auth tokens required to fetch the object bytes. |
| id |
External, globally unique identifier for the ontology term (in CURIE form). |
| inputsource_external_ref |
Reference to an external entity as the input source, using a globally unique identifier or a URL. External references will in most cases refer to a database, data record, data file, website or other data source. One of "inputsource_external_ref" or "inputsource_ref" must be specified. |
| inputsource_ref |
Reference to an internal object as the input source using a local identifier. Entities that can be used as an internal input source include FileCollection, Sample, Experiment, Analysis or File, as restricted by the description of the field where the input source is used. One of "inputsource_external_ref" or "inputsource_ref" must be specified. |
| instrument |
Technology platform used to perform nucleic acid sequencing, including name and/or number associated with a specific sequencing instrument model. It is recommended to be as specific as possible for this property (e.g. if the model/revision are available, providing that instead of just the instrument maker). |
| key |
Key/name of the assessment value. |
| label |
Human-readable label associated with the term ID in the current version of the ontology (as listed in the "ontology_versions" field of the Deposit object). |
| lengths_constant |
Whether the sequence lengths are constant (all sequence features have the same length, excluding features at the very end of a sequence). |
| library_layout |
Whether the library was built as paired-end or single-end. |
| mime_type |
A string providing the MIME type of the data file. |
| molecule_type |
Specifies the type of source material that is being sequenced. |
| name |
Name of the person or organisation. |
| namespace |
The CURIE namespace (prefix) of an ontology (e.g. "GO" for Gene Ontology). |
| ontology_url |
The version-agnostic URL of the ontology (e.g. the IRI of the ontology in OWL). |
| organism_tissue |
Part of organism (typically tissue or organ) from which the biospecimen/sample was taken or the cell line was derived. |
| other_biospecimen |
Other biospecimen-related terms that can be used to further classify the biospecimen/sample. |
| phenotype |
Main phenotype (e.g. disease) connected to the biospecimen/sample. |
| project_external_ref |
Reference to a project within which the study was carried out (preferably a BioProject CURIE). |
| project_name |
Name of the project within which the study was carried out. |
| publications |
List of (relevant) publications containing the results of the study (in the form of DOI CURIEs). |
| qualified_relation |
A description of the relationship with the input source. |
| quality_assessments |
An array of QualityAssessment objects containing the main quality scores from assessment techniques applied to the data file. |
| region |
Name of the region in the cloud service provider that the object belongs to. |
| run_provenance |
Document detailing the provenance of the experiment or analysis run which produced the file as one of its outputs. The provenance info should include software versions, parameter settings, etc. |
| sample_collection_date |
Date of sample collection. |
| sample_collection_location |
Geographical location where the sample was collected. |
| sample_description |
Human-readable description of the biospecimen/sample and the sampling process. |
| sample_external_id |
External, globally unique identifier for the biospecimen/sample. |
| sample_id |
Internal identifier for the biospecimen/sample (unique within the metadata deposit). |
| sample_label |
A human-readable description of the sample, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. |
| samples |
Information about the biospecimens/samples used as raw material for lab experiments. |
| sampling_protocol |
Protocol detailing the collection and treatment of the biospecimen/sample. |
| seqcol_digest |
Top-level sequence collection digest according to the GA4GH refget Sequence Collections standard (v1.0). This is a globally unique identifier for the genome assembly, algorithmically derivable from the genome assembly content. It is used to uniquely identify the exact genome assembly used and to allow detailed comparisons across genome assembly variants (e.g. variants of the GRCh38 assembly). |
| seqcol_ordered_coord_system |
Content-derived digest that uniquely identifies the ordered coordinate system of the genome assembly. (Coordinate systems with the same sequence names and lengths, but where the sequences are ordered differently, will have different ordered digests.) Usage: the ordered coordinate system digest can be used to uniquely generate a chromSizes file, which is useful in a number of analysis tools. Definition: the ordered coordinate system digest is defined as the level 1 digest of the name_length_pairs attribute of the sequence collection generated from the genome assembly. |
| seqcol_unordered_coord_system |
Content-derived digest that uniquely identifies the order-invariant coordinate system of the genome assembly. This digest will be shared across all coordinate systems with the same sequence names and lengths, regardless of the order of the sequences. Usage: the order-invariant coordinate system digest can be used to uniquely describe the coordinate system of a particular genome browser instance and the annotation files that are compatible with it. Definition: the order-invariant coordinate system digest is defined as the level 1 digest of the sorted_name_length_pairs attribute of the sequence collection generated from the genome assembly. |
| sequence_features |
List of sequence features described by the genomic annotation file. |
| sequencing_protocol |
Set of rules that guides how the sequencing protocol was followed. Referencing change-tracking services such as protocols.io or GitHub is encouraged instead of entering free text in this field. |
| sex |
Biological sex of the donor/organism. |
| species_taxon |
Taxonomic classification of the species of the donor/organism. |
| studies |
The scientific studies, i.e. units of research, within which experiments and/or analyses have been carried out. |
| study_abstract |
Abstract of the study. |
| study_contact |
Contact point for the study. |
| study_external_id |
External, globally unique identifier for the study (preferably a BioStudies CURIE). |
| study_id |
Internal identifier for the study (unique within the metadata deposit). Namespace: "study". |
| study_title |
Title of the study. |
| technical_replicate_labels |
Labels denoting the technical replicates within which the relation is defined, if any. |
| track_geometry |
Geometric properties of the sequence features in the genomic annotation file when considered as a one-dimensional genome browser track (also relevant for non-visual analyses). |
| updated_time |
Timestamp of content update in RFC3339 format, identical to created_time in systems that do not support updates. (This is the update time of the underlying content, not of the JSON object.) |
| url |
A fully resolvable URL that can be used to fetch the actual object bytes. |
| value |
Value corresponding to the assessment key. |
| value_type |
The type of values associated with the sequence features, if any. |
| version |
Version information for the retrieval from the input source. |
| versioned_ontology_url |
The versioned URL of the ontology (e.g. the "versionIRI" in OWL). |
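The access-related fields in the table above (drs_uri, access_methods, access_url, url, headers) follow the GA4GH Data Repository Service (DRS) model. The sketch below is a minimal illustration, not a reference client. It assumes a hostname-based URI of the form drs://<hostname>/<id>, the DRS v1 object endpoint path /ga4gh/drs/v1/objects/{object_id}, and headers given as "Name: value" strings. The helper names drs_uri_to_object_url and build_fetch_request are hypothetical, and compact identifier-based DRS URIs are not handled.

```python
from urllib.parse import quote, urlparse
from urllib.request import Request


def drs_uri_to_object_url(drs_uri: str) -> str:
    """Map a hostname-based drs://<hostname>/<id> URI to its DRS v1 object endpoint.

    Hypothetical helper; compact identifier-based DRS URIs are rejected.
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    object_id = parsed.path.lstrip("/")
    # Percent-encode the DRS ID so it forms a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def build_fetch_request(access_method: dict) -> Request:
    """Build (but do not send) an HTTP request from an access method's access_url.

    Assumes access_url holds "url" plus an optional "headers" list of
    "Name: value" strings, e.g. auth tokens needed to fetch the bytes.
    """
    access_url = access_method["access_url"]
    request = Request(access_url["url"])
    for header in access_url.get("headers", []):
        # Split only on the first colon; header values may contain colons.
        name, _, value = header.partition(":")
        request.add_header(name.strip(), value.strip())
    return request
```

The request is only constructed here; passing it to urllib.request.urlopen would perform the actual download.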
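The checksum, checksum_type and checksums fields describe hex-encoded digests computed over the bytes of the data file. The sketch below produces such entries in Python. It assumes each entry is a dict with "checksum" and "checksum_type" keys, mirroring the field names above; the exact object shape is an assumption. It maps a small, hypothetical subset of IANA hash names to hashlib algorithms, and "md5" is included only because it is one of the fallback values the checksum_type description names. The function names are hypothetical.

```python
import hashlib
import io

# Hypothetical subset: IANA Named Information hash names -> hashlib algorithm names.
# "md5" is not IANA-registered but is one of the fallback values named above.
HASHLIB_NAMES = {"sha-256": "sha256", "sha-512": "sha512", "md5": "md5"}


def checksum_of_stream(stream: io.BufferedIOBase, checksum_type: str = "sha-256",
                       chunk_size: int = 1 << 20) -> dict:
    """Hash a binary stream chunk by chunk and return one checksum entry."""
    hasher = hashlib.new(HASHLIB_NAMES[checksum_type])
    # Read fixed-size chunks until EOF so large files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    # hexdigest() gives the hex-string encoding expected by the checksum field.
    return {"checksum": hasher.hexdigest(), "checksum_type": checksum_type}


def checksums_for_file(path: str, types=("sha-256",)) -> list:
    """Build a checksums list for a file; at least one checksum is required."""
    if not types:
        raise ValueError("at least one checksum must be provided")
    entries = []
    for checksum_type in types:
        with open(path, "rb") as handle:
            entries.append(checksum_of_stream(handle, checksum_type))
    return entries
```

Reading in chunks keeps memory use flat regardless of file size. Each extra checksum type re-reads the file, which is simpler than but slower than hashing with several algorithms in one pass.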
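The file_name field restricts names to the POSIX portable filename character set. When written as a regular expression character class, the hyphen must come first or last. A class written as [A-Za-z0-9.-_] treats ".-_" as a range from "." to "_", which also admits characters such as "/" and ":". The sketch below validates names in Python; the helper name is hypothetical.

```python
import re

# POSIX portable filename characters: letters, digits, period, underscore, hyphen.
# The hyphen is placed last so it is a literal character, not a range operator.
PORTABLE_FILENAME = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_filename(name: str) -> bool:
    """Return True if every character of a non-empty name is in the portable set."""
    return PORTABLE_FILENAME.fullmatch(name) is not None
```

For example, "sample_1.bed.gz" passes, while "a/b.bed" and "a:b" are rejected.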
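The created_time and updated_time fields hold RFC3339 timestamps, and updated_time equals created_time in systems that do not support updates. The sketch below fills both fields in Python. It assumes timestamps are normalised to UTC with a "Z" suffix and second precision, which RFC3339 permits but does not require. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    if moment.tzinfo is None:
        # RFC3339 requires an explicit offset, so naive datetimes are rejected.
        raise ValueError("RFC3339 timestamps need an explicit UTC offset")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="seconds").replace("+00:00", "Z")


def content_times(created: datetime, updated: Optional[datetime] = None) -> dict:
    """Fill created_time/updated_time; without updates both are identical."""
    created_str = to_rfc3339(created)
    updated_str = to_rfc3339(updated) if updated is not None else created_str
    return {"created_time": created_str, "updated_time": updated_str}
```

Both values describe the underlying content, so they should come from the content's own history rather than from the moment the JSON object is serialised.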
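Several track_geometry flags (has_lengths, elements_overlapping, has_gaps, lengths_constant, has_names, has_strands, has_values) can be derived directly from the feature list. The Python sketch below does this under stated assumptions. It uses a hypothetical Feature record with 0-based, half-open coordinates, where adjacent features such as [0, 10) and [10, 20) neither overlap nor leave a gap. "At the very end of a sequence" is read as a feature whose end equals the sequence length from an optional seq_lengths map. Edge-related flags and elements_circular are not covered.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record; coordinates are 0-based, half-open."""
    seq: str
    start: int
    end: int
    name: Optional[str] = None
    strand: Optional[str] = None
    value: Optional[float] = None


def track_geometry_flags(features: Iterable[Feature],
                         seq_lengths: Optional[Dict[str, int]] = None) -> Dict[str, bool]:
    """Compute a subset of the track_geometry flags from a list of features."""
    feats = sorted(features, key=lambda f: (f.seq, f.start, f.end))
    seq_lengths = seq_lengths or {}
    overlapping = gaps = False
    for _, group in groupby(feats, key=lambda f: f.seq):
        covered_to = None  # rightmost end seen so far on this sequence
        for feat in group:
            if covered_to is not None:
                if feat.start < covered_to:
                    overlapping = True  # at least one base pair covered twice
                elif feat.start > covered_to:
                    gaps = True  # uncovered bases between two features
            covered_to = feat.end if covered_to is None else max(covered_to, feat.end)
    # Features ending exactly at the end of their sequence are excluded here.
    interior_lengths = {f.end - f.start for f in feats if seq_lengths.get(f.seq) != f.end}
    return {
        "has_lengths": any(f.end - f.start > 1 for f in feats),
        "elements_overlapping": overlapping,
        "has_gaps": gaps,
        "lengths_constant": len(interior_lengths) <= 1,
        "has_names": any(f.name is not None for f in feats),
        "has_strands": any(f.strand in ("+", "-") for f in feats),
        "has_values": any(f.value is not None for f in feats),
    }
```

Sorting by start and tracking the rightmost end per sequence lets one pass detect both overlaps and gaps, even when a long feature contains shorter ones.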
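The seqcol_ordered_coord_system and seqcol_unordered_coord_system fields are level 1 digests of name_length_pairs and sorted_name_length_pairs. The Python sketch below shows the general shape of both computations and should be checked against the Sequence Collections specification before its values are relied on. It uses the GA4GH sha512t24u digest (SHA-512 truncated to 24 bytes, base64url-encoded). It assumes the ordered digest is sha512t24u over the canonical JSON of the whole array of {"length", "name"} objects. It also assumes the unordered digest canonicalises and digests each pair, sorts those digests, and digests the resulting array. The json.dumps call only approximates RFC 8785 canonical JSON for integer lengths and plain string names. All function names are hypothetical.

```python
import base64
import hashlib
import json
from typing import List


def sha512t24u(data: bytes) -> str:
    """SHA-512, truncated to 24 bytes, base64url-encoded (32 characters)."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj) -> bytes:
    # Approximation of RFC 8785: sorted keys, no whitespace, UTF-8 output.
    return json.dumps(obj, separators=(",", ":"), sort_keys=True, ensure_ascii=False).encode("utf-8")


def name_length_pairs(names: List[str], lengths: List[int]) -> List[dict]:
    if len(names) != len(lengths):
        raise ValueError("names and lengths must have the same number of entries")
    return [{"length": length, "name": name} for name, length in zip(names, lengths)]


def ordered_coord_system_digest(names: List[str], lengths: List[int]) -> str:
    """Order-sensitive: digest of the whole name_length_pairs array."""
    return sha512t24u(canonical_json(name_length_pairs(names, lengths)))


def unordered_coord_system_digest(names: List[str], lengths: List[int]) -> str:
    """Order-invariant: digest each pair, sort the digests, digest the sorted array."""
    pair_digests = sorted(sha512t24u(canonical_json(pair)) for pair in name_length_pairs(names, lengths))
    return sha512t24u(canonical_json(pair_digests))


def chrom_sizes(names: List[str], lengths: List[int]) -> str:
    """Render the ordered coordinate system as a tab-separated chromSizes file."""
    return "".join(f"{pair['name']}\t{pair['length']}\n" for pair in name_length_pairs(names, lengths))
```

Swapping the order of two sequences changes the ordered digest but leaves the unordered digest unchanged. This is the distinction the two fields are meant to capture.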
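Each input source needs either "inputsource_external_ref" or "inputsource_ref". An internal reference must also point to an object type allowed by the field that uses it; for analysis_input_sources and file_input_sources, that means FileCollection, File, Experiment, or Analysis. The Python sketch below checks both rules. It reads "one of ... must be specified" as "at least one", and it assumes a hypothetical object_types index that maps local identifiers to object type names.

```python
from typing import Dict, FrozenSet, List

# Allowed internal targets for analysis_input_sources and file_input_sources.
DEFAULT_ALLOWED_TYPES = frozenset({"FileCollection", "File", "Experiment", "Analysis"})


def validate_input_source(source: dict, object_types: Dict[str, str],
                          allowed_types: FrozenSet[str] = DEFAULT_ALLOWED_TYPES) -> List[str]:
    """Return a list of problems with one input source (empty if valid).

    object_types is a hypothetical index from local identifiers to type names.
    """
    errors = []
    external = source.get("inputsource_external_ref")
    internal = source.get("inputsource_ref")
    if not external and not internal:
        errors.append('one of "inputsource_external_ref" or "inputsource_ref" must be specified')
    if internal:
        obj_type = object_types.get(internal)
        if obj_type is None:
            errors.append(f"internal reference {internal!r} does not resolve")
        elif obj_type not in allowed_types:
            errors.append(f"internal reference {internal!r} points to a {obj_type}, "
                          f"expected one of {sorted(allowed_types)}")
    return errors
```

For experiment_samples, the same function could be called with allowed_types=frozenset({"Sample"}).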
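An ontology term's id is a CURIE whose prefix matches an ontology namespace. Its label is meant to be read against the ontology version recorded for the deposit, where the version-agnostic ontology_url maps to a versioned URL. The Python sketch below resolves a term to that versioned URL. It assumes ontologies are dicts with "namespace" and "ontology_url" keys, and that the version map is a plain dict as described for bundle_ontology_versions. Both shapes and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE such as "GO:0008150" into its prefix (namespace) and local ID."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def versioned_ontology_for_term(term_id: str, ontologies: List[dict],
                                ontology_versions: Dict[str, str]) -> Optional[str]:
    """Find the versioned ontology URL that a term's label should be read against."""
    prefix, _ = split_curie(term_id)
    for ontology in ontologies:
        if ontology["namespace"] == prefix:
            # Map the version-agnostic URL to the versioned URL used in the deposit.
            return ontology_versions.get(ontology["ontology_url"])
    return None  # no ontology in the deposit declares this namespace
```

Returning None rather than raising leaves it to the caller to decide whether an undeclared namespace is an error.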