Class: GenomicAnnotationFile¶
Information about a genomic annotation / track file. GenomicAnnotationFile is a specification of the File entity and inherits all the fields defined in File, in addition to the fields that are specific to GenomicAnnotationFile, as detailed here.
URI: https://w3id.org/fga-wg/schema/bundle/GenomicAnnotationFile
classDiagram
class GenomicAnnotationFile
click GenomicAnnotationFile href "../GenomicAnnotationFile/"
File <|-- GenomicAnnotationFile
click File href "../File/"
GenomicAnnotationFile : access_methods
GenomicAnnotationFile --> "1..*" AccessMethod : access_methods
click AccessMethod href "../AccessMethod/"
GenomicAnnotationFile : checksums
GenomicAnnotationFile --> "1..*" Checksum : checksums
click Checksum href "../Checksum/"
GenomicAnnotationFile : created_time
GenomicAnnotationFile : data_content
GenomicAnnotationFile --> "1" OutputType : data_content
click OutputType href "../OutputType/"
GenomicAnnotationFile : drs_uri
GenomicAnnotationFile : file_description
GenomicAnnotationFile : file_external_id
GenomicAnnotationFile : file_id
GenomicAnnotationFile : file_input_sources
GenomicAnnotationFile --> "1..*" InputSource : file_input_sources
click InputSource href "../InputSource/"
GenomicAnnotationFile : file_label
GenomicAnnotationFile : file_name
GenomicAnnotationFile : file_size
GenomicAnnotationFile : file_type
GenomicAnnotationFile --> "1" Term : file_type
click Term href "../Term/"
GenomicAnnotationFile : file_version
GenomicAnnotationFile : filecollection_refs
GenomicAnnotationFile : genome_assembly
GenomicAnnotationFile --> "1" GenomeAssembly : genome_assembly
click GenomeAssembly href "../GenomeAssembly/"
GenomicAnnotationFile : genomic_annotation_digest
GenomicAnnotationFile : mime_type
GenomicAnnotationFile : quality_assessments
GenomicAnnotationFile --> "*" QualityAssessment : quality_assessments
click QualityAssessment href "../QualityAssessment/"
GenomicAnnotationFile : run_provenance
GenomicAnnotationFile : sequence_features
GenomicAnnotationFile --> "1..*" Term : sequence_features
click Term href "../Term/"
GenomicAnnotationFile : track_geometry
GenomicAnnotationFile --> "1" TrackGeometry : track_geometry
click TrackGeometry href "../TrackGeometry/"
GenomicAnnotationFile : updated_time
Example¶
Example JSON
{
"access_methods": [
{
"access_method": "https",
"access_url": {
"url": "https://epigenomesportal.ca/tracks/ENCODE/hg38/87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed"
}
},
{
"access_method": "https",
"access_url": {
"url": "https://www.encodeproject.org/files/ENCFF323LCS/@@download/ENCFF323LCS.bigBed"
}
},
{
"access_method": "s3",
"access_url": {
"url": "s3://encode-public/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed"
}
},
{
"access_method": "https",
"access_url": {
"url": "https://encode-public.s3.amazonaws.com/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed"
}
},
{
"access_method": "https",
"access_url": {
"url": "https://datasetencode.blob.core.windows.net/dataset/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D"
}
}
],
"checksums": [
{
"checksum": "535bc9628a1c5e5215226f9996e4eaca",
"checksum_type": "md5"
}
],
"created_time": "2016-11-13T17:42:04.385801+00:00",
"data_content": "replicated peaks",
"drs_uri": "drs://drs.example.org/ENCFF323LCS",
"file_description": "H3K9me3 ChIP-seq replicated peaks on human (hg38) AG04450 (Fibroblast derived cell line).",
"file_external_id": "encode:ENCFF323LCS",
"file_id": "file:ENCFF323LCS",
"file_input_sources": [
{
"biological_replicate_labels": [
"1",
"2"
],
"inputsource_ref": "analysis:ENCAN718KHT",
"qualified_relation": "prov:wasGeneratedBy",
"technical_replicate_labels": [
"1_1",
"2_1"
]
}
],
"file_label": "H3K9me3 ChIP-seq replicated peaks, GRCh38, AG04450",
"file_name": "87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed",
"file_size": 5359719,
"file_type": {
"id": "edam:format_3004",
"label": "bigBed"
},
"file_version": "efd4e74e-7875-4d13-9630-0085bc834f18",
"filecollection_refs": [
"collection:ihec_encode"
],
"genome_assembly": "ga4gh:SC.EiFob05aCWgVU_B_Ae0cypnQut3cxUP1",
"mime_type": "application/octet-stream",
"quality_assessments": [
{
"assessment_details_url": "https://www.encodeproject.org/histone-chipseq-quality-metrics/70ae08dc-3edc-437f-a0a5-378c72e6269b/",
"assessment_method": "histone-chipseq-quality-metrics",
"assessment_values": {
"frip": 0.2931669095906483,
"nreads": 21018235,
"nreads_in_peaks": 6161851
}
}
],
"run_provenance": "encode:ENCAN718KHT",
"sequence_features": [
{
"id": "SO:0001707",
"label": "H3K9Me3"
}
],
"track_geometry": {
"elements_circular": false,
"elements_overlapping": false,
"has_edges": false,
"has_gaps": true,
"has_lengths": true,
"has_names": true,
"has_strands": false,
"has_values": true,
"lengths_constant": false,
"value_type": "multiple"
},
"updated_time": "2016-11-13T17:42:04.385801+00:00"
}
Inheritance¶
- File
- GenomicAnnotationFile
Slots¶
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| genomic_annotation_digest | 0..1 Curie |
Content-derived digest for distributed identification of genomic annotation files. (This field is currently a placeholder, as an algorithm for generating such a digest is yet to be specified.). | direct |
| genome_assembly | 1 GenomeAssembly |
Information about the genome assembly used to generate the genomic annotation file, consequently defining the genomic coordinate system for the annotation. | direct |
| track_geometry | 1 TrackGeometry |
Geometric properties of the sequence features in the genomic annotation file if considered as an one-dimensional genome browser track (also relevant for non-visual analyses). | direct |
| sequence_features | 1..* Term |
List of sequence features described by the genomic annotation file. | direct |
| file_external_id | 0..1 Curie |
External, globally unique identifier for the data file. | File |
| file_id | 1 Curie |
Internal identifier for the data file (unique within the metadata deposit). | File |
| file_name | 0..1 String |
A string that can be used to name a data file. This string is made up of uppercase and lowercase letters, decimal digits, hypen, period, and underscore [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282 [portable filenames]. | File |
| file_label | 1 String |
A human-readable description of the data file, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. | File |
| file_description | 0..1 String |
A human readable description of the data file. | File |
| filecollection_refs | 1..* Curie |
Internal references to the FileCollection objects (within the deposit) that contains the data file, if any. | File |
| file_input_sources | 1..* InputSource |
External or internal references to data sources for the file, typically a data collection or a process that has generated the file. Internal references should lead to FileCollection, File, Experiment, or Analysis objects. | File |
| drs_uri | 0..1 Uri |
A drs:// hostname-based URI, as defined in the DRS documentation, that tells clients how to access this object. The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. For example, if you arrive at this DRS JSON by resolving a compact identifier-based DRS URI, the self_uri presents you with a hostname and properly encoded DRS ID for use in subsequent access endpoint calls. | File |
| access_methods | 1..* AccessMethod |
The list of access methods that can be used to fetch the data file. | File |
| run_provenance | 0..1 Uriorcurie |
Document detailing the provenance of the experiment or analysis run which produced the file as one of its outputs. The provenance info should include software versions, parameter settings, etc. | File |
| quality_assessments | * QualityAssessment |
An array of QualityAssessment objects containing the main quality scores from assessment techniques applied to the data file. | File |
| file_type | 1 Term |
The file format of the data file. | File |
| mime_type | 0..1 String |
A string providing the mime-type of the data file. | File |
| data_content | 1 OutputType |
Classification describing the file's purpose or contents. | File |
| file_size | 1 Integer |
The file size in bytes. | File |
| created_time | 1 Datetime |
Timestamp of content creation in RFC3339. (This is the creation time of the underlying content, not of the JSON object.). | File |
| updated_time | 0..1 Datetime |
Timestamp of content update in RFC3339, identical to created_time in systems that do not support updates. (This is the update time of the underlying content, not of the JSON object.). | File |
| file_version | 0..1 String |
A string representing a version. (Some systems may use checksum, a RFC3339 timestamp, or an incrementing version number.). | File |
| checksums | 1..* Checksum |
A list of checksums of the data file. At least one checksum must be provided. For blobs, the checksum is computed over the bytes in the blob. | File |
Identifier and Mapping Information¶
Schema Source¶
- from schema: https://w3id.org/fga-wg/schema/bundle
Mappings¶
| Mapping Type | Mapped Value |
|---|---|
| self | https://w3id.org/fga-wg/schema/bundle/GenomicAnnotationFile |
| native | https://w3id.org/fga-wg/schema/bundle/GenomicAnnotationFile |
LinkML Source¶
Direct¶
name: GenomicAnnotationFile
description: Information about a genomic annotation / track file. GenomicAnnotationFile
is a specification of the File entity and inherits all the fields defined in File,
in addition to the fields that are specific to GenomicAnnotationFile, as detailed
here.
from_schema: https://w3id.org/fga-wg/schema/bundle
is_a: File
slots:
- genomic_annotation_digest
- genome_assembly
- track_geometry
- sequence_features
Induced¶
name: GenomicAnnotationFile
description: Information about a genomic annotation / track file. GenomicAnnotationFile
is a specification of the File entity and inherits all the fields defined in File,
in addition to the fields that are specific to GenomicAnnotationFile, as detailed
here.
from_schema: https://w3id.org/fga-wg/schema/bundle
is_a: File
attributes:
genomic_annotation_digest:
name: genomic_annotation_digest
description: Content-derived digest for distributed identification of genomic
annotation files. (This field is currently a placeholder, as an algorithm for
generating such a digest is yet to be specified.).
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- GenomicAnnotationFile
range: curie
genome_assembly:
name: genome_assembly
description: Information about the genome assembly used to generate the genomic
annotation file, consequently defining the genomic coordinate system for the
annotation.
examples:
- value: ga4gh:SC.EiFob05aCWgVU_B_Ae0cypnQut3cxUP1
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- GenomicAnnotationFile
range: GenomeAssembly
required: true
track_geometry:
name: track_geometry
description: Geometric properties of the sequence features in the genomic annotation
file if considered as an one-dimensional genome browser track (also relevant
for non-visual analyses).
examples:
- object:
elements_circular: false
elements_overlapping: false
has_edges: false
has_gaps: true
has_lengths: true
has_names: true
has_strands: false
has_values: true
lengths_constant: false
value_type: multiple
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- GenomicAnnotationFile
range: TrackGeometry
required: true
sequence_features:
name: sequence_features
description: List of sequence features described by the genomic annotation file.
examples:
- object:
id: SO:0001707
label: H3K9Me3
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- GenomicAnnotationFile
range: Term
required: true
multivalued: true
file_external_id:
name: file_external_id
description: External, globally unique identifier for the data file.
examples:
- value: encode:ENCFF323LCS
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: curie
file_id:
name: file_id
description: 'Internal identifier for the data file (unique within the metadata
deposit). '
examples:
- value: file:ENCFF323LCS
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
identifier: true
owner: GenomicAnnotationFile
domain_of:
- File
range: curie
required: true
file_name:
name: file_name
description: A string that can be used to name a data file. This string is made
up of uppercase and lowercase letters, decimal digits, hypen, period, and underscore
[A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282
[portable filenames].
examples:
- value: 87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: string
file_label:
name: file_label
description: A human-readable description of the data file, short enough to be
used for listings within software user interfaces, tables, illustration legends,
etc.
examples:
- value: H3K9me3 ChIP-seq replicated peaks, GRCh38, AG04450
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: string
required: true
pattern: ^.{1,60}$
file_description:
name: file_description
description: A human readable description of the data file.
examples:
- value: H3K9me3 ChIP-seq replicated peaks on human (hg38) AG04450 (Fibroblast
derived cell line).
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: string
filecollection_refs:
name: filecollection_refs
description: Internal references to the FileCollection objects (within the deposit)
that contains the data file, if any.
examples:
- value: collection:ihec_encode
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: curie
required: true
multivalued: true
file_input_sources:
name: file_input_sources
description: External or internal references to data sources for the file, typically
a data collection or a process that has generated the file. Internal references
should lead to FileCollection, File, Experiment, or Analysis objects.
examples:
- object:
inputsource_ref: analysis:ENCAN718KHT
qualified_relation: prov:wasGeneratedBy
biological_replicate_labels:
- '1'
- '2'
technical_replicate_labels:
- '1_1'
- '2_1'
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: InputSource
required: true
multivalued: true
drs_uri:
name: drs_uri
description: A drs:// hostname-based URI, as defined in the DRS documentation,
that tells clients how to access this object. The intent of this field is to
make DRS objects self-contained, and therefore easier for clients to store and
pass around. For example, if you arrive at this DRS JSON by resolving a compact
identifier-based DRS URI, the self_uri presents you with a hostname and properly
encoded DRS ID for use in subsequent access endpoint calls.
examples:
- value: drs://drs.example.org/ENCFF323LCS
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: uri
access_methods:
name: access_methods
description: 'The list of access methods that can be used to fetch the data file. '
examples:
- object:
access_method: https
access_url:
url: https://epigenomesportal.ca/tracks/ENCODE/hg38/87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed
- object:
access_method: https
access_url:
url: https://www.encodeproject.org/files/ENCFF323LCS/@@download/ENCFF323LCS.bigBed
- object:
access_method: s3
access_url:
url: s3://encode-public/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed
- object:
access_method: https
access_url:
url: https://encode-public.s3.amazonaws.com/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed
- object:
access_method: https
access_url:
url: https://datasetencode.blob.core.windows.net/dataset/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: AccessMethod
required: true
multivalued: true
run_provenance:
name: run_provenance
description: Document detailing the provenance of the experiment or analysis run
which produced the file as one of its outputs. The provenance info should include
software versions, parameter settings, etc.
examples:
- value: encode:ENCAN718KHT
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: uriorcurie
quality_assessments:
name: quality_assessments
description: An array of QualityAssessment objects containing the main quality
scores from assessment techniques applied to the data file.
examples:
- object:
assessment_method: histone-chipseq-quality-metrics
assessment_values:
nreads: 21018235
nreads_in_peaks: 6161851
frip: 0.2931669095906483
assessment_details_url: https://www.encodeproject.org/histone-chipseq-quality-metrics/70ae08dc-3edc-437f-a0a5-378c72e6269b/
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: QualityAssessment
multivalued: true
file_type:
name: file_type
description: The file format of the data file.
examples:
- object:
id: edam:format_3004
label: bigBed
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: Term
required: true
mime_type:
name: mime_type
description: A string providing the mime-type of the data file.
examples:
- value: application/octet-stream
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: string
data_content:
name: data_content
description: Classification describing the file's purpose or contents.
examples:
- value: replicated peaks
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: OutputType
required: true
file_size:
name: file_size
description: The file size in bytes.
examples:
- value: '5359719'
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: integer
required: true
created_time:
name: created_time
description: Timestamp of content creation in RFC3339. (This is the creation time
of the underlying content, not of the JSON object.).
examples:
- value: '2016-11-13T17:42:04.385801+00:00'
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: datetime
required: true
updated_time:
name: updated_time
description: Timestamp of content update in RFC3339, identical to created_time
in systems that do not support updates. (This is the update time of the underlying
content, not of the JSON object.).
examples:
- value: '2016-11-13T17:42:04.385801+00:00'
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: datetime
file_version:
name: file_version
description: A string representing a version. (Some systems may use checksum,
a RFC3339 timestamp, or an incrementing version number.).
examples:
- value: efd4e74e-7875-4d13-9630-0085bc834f18
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: string
checksums:
name: checksums
description: A list of checksums of the data file. At least one checksum must
be provided. For blobs, the checksum is computed over the bytes in the blob.
examples:
- object:
checksum: 535bc9628a1c5e5215226f9996e4eaca
checksum_type: md5
from_schema: https://w3id.org/fga-wg/schema/bundle
rank: 1000
owner: GenomicAnnotationFile
domain_of:
- File
range: Checksum
required: true
multivalued: true