Skip to content

Class: File¶

General information about a particular data file. Most fields (marked with an asterix*) are copied from the GA4GH DRS DrsObject model (https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/DrsObjectModel), which is the top-level object returned from a DRS server in response to a successful lookup call (i.e. https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/Objects).

URI: https://w3id.org/fga-wg/schema/bundle/File

 classDiagram
    class File
    click File href "../File/"
      File <|-- GenomicAnnotationFile
        click GenomicAnnotationFile href "../GenomicAnnotationFile/"

      File : access_methods





        File --> "1..*" AccessMethod : access_methods
        click AccessMethod href "../AccessMethod/"



      File : checksums





        File --> "1..*" Checksum : checksums
        click Checksum href "../Checksum/"



      File : created_time

      File : data_content





        File --> "1" OutputType : data_content
        click OutputType href "../OutputType/"



      File : drs_uri

      File : file_description

      File : file_external_id

      File : file_id

      File : file_input_sources





        File --> "1..*" InputSource : file_input_sources
        click InputSource href "../InputSource/"



      File : file_label

      File : file_name

      File : file_size

      File : file_type





        File --> "1" Term : file_type
        click Term href "../Term/"



      File : file_version

      File : filecollection_refs

      File : mime_type

      File : quality_assessments





        File --> "*" QualityAssessment : quality_assessments
        click QualityAssessment href "../QualityAssessment/"



      File : run_provenance

      File : updated_time

Example¶

Example JSON
{
  "access_methods": [
    {
      "access_method": "https",
      "access_url": {
        "url": "https://epigenomesportal.ca/tracks/ENCODE/hg38/87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed"
      }
    },
    {
      "access_method": "https",
      "access_url": {
        "url": "https://www.encodeproject.org/files/ENCFF323LCS/@@download/ENCFF323LCS.bigBed"
      }
    },
    {
      "access_method": "s3",
      "access_url": {
        "url": "s3://encode-public/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed"
      }
    },
    {
      "access_method": "https",
      "access_url": {
        "url": "https://encode-public.s3.amazonaws.com/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed"
      }
    },
    {
      "access_method": "https",
      "access_url": {
        "url": "https://datasetencode.blob.core.windows.net/dataset/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D"
      }
    }
  ],
  "checksums": [
    {
      "checksum": "535bc9628a1c5e5215226f9996e4eaca",
      "checksum_type": "md5"
    }
  ],
  "created_time": "2016-11-13T17:42:04.385801+00:00",
  "data_content": "replicated peaks",
  "drs_uri": "drs://drs.example.org/ENCFF323LCS",
  "file_description": "H3K9me3 ChIP-seq replicated peaks on human (hg38) AG04450 (Fibroblast derived cell line).",
  "file_external_id": "encode:ENCFF323LCS",
  "file_id": "file:ENCFF323LCS",
  "file_input_sources": [
    {
      "biological_replicate_labels": [
        "1",
        "2"
      ],
      "inputsource_ref": "analysis:ENCAN718KHT",
      "qualified_relation": "prov:wasGeneratedBy",
      "technical_replicate_labels": [
        "1_1",
        "2_1"
      ]
    }
  ],
  "file_label": "H3K9me3 ChIP-seq replicated peaks, GRCh38, AG04450",
  "file_name": "87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed",
  "file_size": 5359719,
  "file_type": {
    "id": "edam:format_3004",
    "label": "bigBed"
  },
  "file_version": "efd4e74e-7875-4d13-9630-0085bc834f18",
  "filecollection_refs": [
    "collection:ihec_encode"
  ],
  "mime_type": "application/octet-stream",
  "quality_assessments": [
    {
      "assessment_details_url": "https://www.encodeproject.org/histone-chipseq-quality-metrics/70ae08dc-3edc-437f-a0a5-378c72e6269b/",
      "assessment_method": "histone-chipseq-quality-metrics",
      "assessment_values": {
        "frip": 0.2931669095906483,
        "nreads": 21018235,
        "nreads_in_peaks": 6161851
      }
    }
  ],
  "run_provenance": "encode:ENCAN718KHT",
  "updated_time": "2016-11-13T17:42:04.385801+00:00"
}

Inheritance¶

Slots¶

Name Cardinality and Range Description Inheritance
file_external_id 0..1
Curie
External, globally unique identifier for the data file. direct
file_id 1
Curie
Internal identifier for the data file (unique within the metadata deposit). direct
file_name 0..1
String
A string that can be used to name a data file. This string is made up of uppercase and lowercase letters, decimal digits, hypen, period, and underscore [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282 [portable filenames]. direct
file_label 1
String
A human-readable description of the data file, short enough to be used for listings within software user interfaces, tables, illustration legends, etc. direct
file_description 0..1
String
A human readable description of the data file. direct
filecollection_refs 1..*
Curie
Internal references to the FileCollection objects (within the deposit) that contains the data file, if any. direct
file_input_sources 1..*
InputSource
External or internal references to data sources for the file, typically a data collection or a process that has generated the file. Internal references should lead to FileCollection, File, Experiment, or Analysis objects. direct
drs_uri 0..1
Uri
A drs:// hostname-based URI, as defined in the DRS documentation, that tells clients how to access this object. The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. For example, if you arrive at this DRS JSON by resolving a compact identifier-based DRS URI, the self_uri presents you with a hostname and properly encoded DRS ID for use in subsequent access endpoint calls. direct
access_methods 1..*
AccessMethod
The list of access methods that can be used to fetch the data file. direct
run_provenance 0..1
Uriorcurie
Document detailing the provenance of the experiment or analysis run which produced the file as one of its outputs. The provenance info should include software versions, parameter settings, etc. direct
quality_assessments *
QualityAssessment
An array of QualityAssessment objects containing the main quality scores from assessment techniques applied to the data file. direct
file_type 1
Term
The file format of the data file. direct
mime_type 0..1
String
A string providing the mime-type of the data file. direct
data_content 1
OutputType
Classification describing the file's purpose or contents. direct
file_size 1
Integer
The file size in bytes. direct
created_time 1
Datetime
Timestamp of content creation in RFC3339. (This is the creation time of the underlying content, not of the JSON object.). direct
updated_time 0..1
Datetime
Timestamp of content update in RFC3339, identical to created_time in systems that do not support updates. (This is the update time of the underlying content, not of the JSON object.). direct
file_version 0..1
String
A string representing a version. (Some systems may use checksum, a RFC3339 timestamp, or an incrementing version number.). direct
checksums 1..*
Checksum
A list of checksums of the data file. At least one checksum must be provided. For blobs, the checksum is computed over the bytes in the blob. direct

Usages¶

used by used in type used
Bundle files range File

Identifier and Mapping Information¶

Schema Source¶

  • from schema: https://w3id.org/fga-wg/schema/bundle

Mappings¶

Mapping Type Mapped Value
self https://w3id.org/fga-wg/schema/bundle/File
native https://w3id.org/fga-wg/schema/bundle/File

LinkML Source¶

Direct¶

name: File
description: General information about a particular data file. Most fields (marked
  with an asterix*) are copied from the GA4GH DRS DrsObject model (https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/DrsObjectModel),
  which is the top-level object returned from a DRS server in response to a successful
  lookup call (i.e. https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/Objects).
from_schema: https://w3id.org/fga-wg/schema/bundle
slots:
- file_external_id
- file_id
- file_name
- file_label
- file_description
- filecollection_refs
- file_input_sources
- drs_uri
- access_methods
- run_provenance
- quality_assessments
- file_type
- mime_type
- data_content
- file_size
- created_time
- updated_time
- file_version
- checksums

Induced¶

name: File
description: General information about a particular data file. Most fields (marked
  with an asterix*) are copied from the GA4GH DRS DrsObject model (https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/DrsObjectModel),
  which is the top-level object returned from a DRS server in response to a successful
  lookup call (i.e. https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.4.0/docs/#tag/Objects).
from_schema: https://w3id.org/fga-wg/schema/bundle
attributes:
  file_external_id:
    name: file_external_id
    description: External, globally unique identifier for the data file.
    examples:
    - value: encode:ENCFF323LCS
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: curie
  file_id:
    name: file_id
    description: 'Internal identifier for the data file (unique within the metadata
      deposit). '
    examples:
    - value: file:ENCFF323LCS
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    identifier: true
    owner: File
    domain_of:
    - File
    range: curie
    required: true
  file_name:
    name: file_name
    description: A string that can be used to name a data file. This string is made
      up of uppercase and lowercase letters, decimal digits, hypen, period, and underscore
      [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282
      [portable filenames].
    examples:
    - value: 87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: string
  file_label:
    name: file_label
    description: A human-readable description of the data file, short enough to be
      used for listings within software user interfaces, tables, illustration legends,
      etc.
    examples:
    - value: H3K9me3 ChIP-seq replicated peaks, GRCh38, AG04450
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: string
    required: true
    pattern: ^.{1,60}$
  file_description:
    name: file_description
    description: A human readable description of the data file.
    examples:
    - value: H3K9me3 ChIP-seq replicated peaks on human (hg38) AG04450 (Fibroblast
        derived cell line).
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: string
  filecollection_refs:
    name: filecollection_refs
    description: Internal references to the FileCollection objects (within the deposit)
      that contains the data file, if any.
    examples:
    - value: collection:ihec_encode
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: curie
    required: true
    multivalued: true
  file_input_sources:
    name: file_input_sources
    description: External or internal references to data sources for the file, typically
      a data collection or a process that has generated the file. Internal references
      should lead to FileCollection, File, Experiment, or Analysis objects.
    examples:
    - object:
        inputsource_ref: analysis:ENCAN718KHT
        qualified_relation: prov:wasGeneratedBy
        biological_replicate_labels:
        - '1'
        - '2'
        technical_replicate_labels:
        - '1_1'
        - '2_1'
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: InputSource
    required: true
    multivalued: true
  drs_uri:
    name: drs_uri
    description: A drs:// hostname-based URI, as defined in the DRS documentation,
      that tells clients how to access this object. The intent of this field is to
      make DRS objects self-contained, and therefore easier for clients to store and
      pass around. For example, if you arrive at this DRS JSON by resolving a compact
      identifier-based DRS URI, the self_uri presents you with a hostname and properly
      encoded DRS ID for use in subsequent access endpoint calls.
    examples:
    - value: drs://drs.example.org/ENCFF323LCS
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: uri
  access_methods:
    name: access_methods
    description: 'The list of access methods that can be used to fetch the data file. '
    examples:
    - object:
        access_method: https
        access_url:
          url: https://epigenomesportal.ca/tracks/ENCODE/hg38/87234.ENCODE.ENCBS004ENC.H3K9me3.peak_calls.bigBed
    - object:
        access_method: https
        access_url:
          url: https://www.encodeproject.org/files/ENCFF323LCS/@@download/ENCFF323LCS.bigBed
    - object:
        access_method: s3
        access_url:
          url: s3://encode-public/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed
    - object:
        access_method: https
        access_url:
          url: https://encode-public.s3.amazonaws.com/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed
    - object:
        access_method: https
        access_url:
          url: https://datasetencode.blob.core.windows.net/dataset/2016/11/13/efd4e74e-7875-4d13-9630-0085bc834f18/ENCFF323LCS.bigBed?sv=2019-10-10&si=prod&sr=c&sig=9qSQZo4ggrCNpybBExU8SypuUZV33igI11xw0P7rB3c%3D
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: AccessMethod
    required: true
    multivalued: true
  run_provenance:
    name: run_provenance
    description: Document detailing the provenance of the experiment or analysis run
      which produced the file as one of its outputs. The provenance info should include
      software versions, parameter settings, etc.
    examples:
    - value: encode:ENCAN718KHT
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: uriorcurie
  quality_assessments:
    name: quality_assessments
    description: An array of QualityAssessment objects containing the main quality
      scores from assessment techniques applied to the data file.
    examples:
    - object:
        assessment_method: histone-chipseq-quality-metrics
        assessment_values:
          nreads: 21018235
          nreads_in_peaks: 6161851
          frip: 0.2931669095906483
        assessment_details_url: https://www.encodeproject.org/histone-chipseq-quality-metrics/70ae08dc-3edc-437f-a0a5-378c72e6269b/
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: QualityAssessment
    multivalued: true
  file_type:
    name: file_type
    description: The file format of the data file.
    examples:
    - object:
        id: edam:format_3004
        label: bigBed
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: Term
    required: true
  mime_type:
    name: mime_type
    description: A string providing the mime-type of the data file.
    examples:
    - value: application/octet-stream
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: string
  data_content:
    name: data_content
    description: Classification describing the file's purpose or contents.
    examples:
    - value: replicated peaks
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: OutputType
    required: true
  file_size:
    name: file_size
    description: The file size in bytes.
    examples:
    - value: '5359719'
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: integer
    required: true
  created_time:
    name: created_time
    description: Timestamp of content creation in RFC3339. (This is the creation time
      of the underlying content, not of the JSON object.).
    examples:
    - value: '2016-11-13T17:42:04.385801+00:00'
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: datetime
    required: true
  updated_time:
    name: updated_time
    description: Timestamp of content update in RFC3339, identical to created_time
      in systems that do not support updates. (This is the update time of the underlying
      content, not of the JSON object.).
    examples:
    - value: '2016-11-13T17:42:04.385801+00:00'
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: datetime
  file_version:
    name: file_version
    description: A string representing a version. (Some systems may use checksum,
      a RFC3339 timestamp, or an incrementing version number.).
    examples:
    - value: efd4e74e-7875-4d13-9630-0085bc834f18
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: string
  checksums:
    name: checksums
    description: A list of checksums of the data file. At least one checksum must
      be provided. For blobs, the checksum is computed over the bytes in the blob.
    examples:
    - object:
        checksum: 535bc9628a1c5e5215226f9996e4eaca
        checksum_type: md5
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: File
    domain_of:
    - File
    range: Checksum
    required: true
    multivalued: true