Class: GenomeAssembly

Information about the exact genome assembly used to generate the annotation file, defining the genomic coordinate system for the sequence features.

URI: https://w3id.org/fga-wg/schema/bundle/GenomeAssembly

 classDiagram
    class GenomeAssembly
    click GenomeAssembly href "../GenomeAssembly/"
      GenomeAssembly : accessions

      GenomeAssembly : aliases

      GenomeAssembly : seqcol_digest

      GenomeAssembly : seqcol_ordered_coord_system

      GenomeAssembly : seqcol_unordered_coord_system

Example

Example JSON
{
  "accessions": [
    "encode:ENCSR425FOI"
  ],
  "aliases": [
    "GRCh38_no_alt_analysis_set_GCA_000001405.15",
    "GRCh38",
    "hg38"
  ],
  "seqcol_digest": "ga4gh:SC.EiFob05aCWgVU_B_Ae0cypnQut3cxUP1",
  "seqcol_ordered_coord_system": "ga4gh:SC.name_length_pairs.Yyz0Expaluj09xdDYg2Y6VOApvjg05Hf",
  "seqcol_unordered_coord_system": "ga4gh:SC.sorted_name_length_pairs._dMQ5dPUNVx4OGQnDAPmGMkVRWWcYV99"
}
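As a rough illustration of the constraints listed under Slots, the following Python sketch checks a record like the one above for required slots, cardinality, and a loosely CURIE-shaped digest value. The function name `check_genome_assembly`, the `SLOT_RULES` table, and the `CURIE_LIKE` pattern are assumptions made for this example; they are not part of the schema or of any LinkML tooling, and a real validator would work from the schema itself.

```python
import re

# Slot rules transcribed by hand from the Slots table: (required, multivalued).
SLOT_RULES = {
    "seqcol_digest": (True, False),
    "seqcol_ordered_coord_system": (True, False),
    "seqcol_unordered_coord_system": (True, False),
    "accessions": (False, True),
    "aliases": (True, True),
}

DIGEST_SLOTS = (
    "seqcol_digest",
    "seqcol_ordered_coord_system",
    "seqcol_unordered_coord_system",
)

# Loose "prefix:reference" shape. This is an assumption for illustration,
# not the pattern LinkML uses for its curie type.
CURIE_LIKE = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")


def check_genome_assembly(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for slot in record:
        if slot not in SLOT_RULES:
            problems.append(f"unknown slot: {slot}")
    for slot, (required, multivalued) in SLOT_RULES.items():
        if slot not in record:
            if required:
                problems.append(f"missing required slot: {slot}")
            continue
        value = record[slot]
        if multivalued:
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                problems.append(f"{slot} must be a list of strings")
            elif required and not value:
                problems.append(f"{slot} needs at least one value")
        elif not isinstance(value, str):
            problems.append(f"{slot} must be a single string")
    for slot in DIGEST_SLOTS:
        value = record.get(slot)
        if isinstance(value, str) and not CURIE_LIKE.match(value):
            problems.append(f"{slot} does not look like a CURIE: {value!r}")
    return problems
```

Calling `check_genome_assembly` on the example record should return an empty list, while dropping `aliases` should report a missing required slot.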

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| seqcol_digest | 1, Curie | Top-level sequence collection digest according to the GA4GH refget Sequence Collections standard (v1.0). This is a globally unique identifier for the genome assembly, algorithmically derivable from the genome assembly content. It is used to uniquely identify the exact genome assembly and to allow detailed comparisons across genome assembly variants (for example, variants of the GRCh38 assembly). | direct |
| seqcol_ordered_coord_system | 1, Curie | Content-derived digest that uniquely identifies the ordered coordinate system of the genome assembly. Coordinate systems with the same sequence names and lengths, but with the sequences in a different order, have different ordered digests. The ordered coordinate system digest can be used to uniquely generate a chromSizes file, which is useful in a number of analysis tools. It is defined as the level 1 digest of the name_length_pairs attribute of the sequence collection generated from the genome assembly. | direct |
| seqcol_unordered_coord_system | 1, Curie | Content-derived digest that uniquely identifies the order-invariant coordinate system of the genome assembly. This digest is shared across all coordinate systems with the same sequence names and lengths, regardless of the order of the sequences. The order-invariant coordinate system digest can be used to uniquely describe the coordinate system of a particular genome browser instance and the annotation files that are compatible with it. It is defined as the level 1 digest of the sorted_name_length_pairs attribute of the sequence collection generated from the genome assembly. | direct |
| accessions | *, String | Database accession numbers for the genome assembly, if available. Accessions should precisely identify the genome assembly and should be omitted if changes have been made to the assembly after retrieval, such as removing the alternate sequences. | direct |
| aliases | 1..*, Curie | Human-readable aliases of the genome assembly. Aliases can be imprecise, because precision is enforced by the other fields. | direct |
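To make the difference between the ordered and order-invariant coordinate system digests concrete, here is a minimal Python sketch. It assumes the GA4GH `sha512t24u` digest (SHA-512, truncated to 24 bytes, base64url-encoded) and approximates RFC 8785 canonical JSON with `json.dumps`, which is only adequate for ASCII sequence names and integer lengths. The helper names are hypothetical, the handling of `sorted_name_length_pairs` (digest each pair, sort the digests, digest the sorted list) reflects one reading of the Sequence Collections specification, and the output omits the `ga4gh:SC.…` prefixes shown in the example values. Treat it as an illustration and consult the specification or a reference implementation before relying on the values.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """GA4GH-style digest: SHA-512, truncated to 24 bytes, base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical_json(obj) -> bytes:
    # Approximation of RFC 8785 (JCS): sorted keys, no whitespace.
    # Only adequate for ASCII names and integer lengths.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def name_length_pairs(names, lengths):
    """Pair up sequence names and lengths, preserving the given order."""
    return [{"length": length, "name": name} for name, length in zip(names, lengths)]


def ordered_coord_system_digest(names, lengths) -> str:
    # The pairs are digested as one array, so sequence order affects the result.
    return sha512t24u(canonical_json(name_length_pairs(names, lengths)))


def unordered_coord_system_digest(names, lengths) -> str:
    # Each pair is digested on its own, the digests are sorted, and the sorted
    # list is digested, so sequence order does not affect the result.
    pair_digests = sorted(
        sha512t24u(canonical_json(pair)) for pair in name_length_pairs(names, lengths)
    )
    return sha512t24u(canonical_json(pair_digests))
```

With these helpers, swapping the order of two sequences changes the result of `ordered_coord_system_digest` but leaves `unordered_coord_system_digest` unchanged, which is the property the two slots are meant to capture.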

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| GenomicAnnotationFile | genome_assembly | range | GenomeAssembly |
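Because each GenomicAnnotationFile carries a GenomeAssembly record, the three digests can be compared from most to least specific to judge how closely two files (or a file and a genome browser instance) agree. The sketch below shows one way this might look in Python; the function name `compare_assemblies` and the returned labels are assumptions made for this example, not terms defined by the schema.

```python
def compare_assemblies(a: dict, b: dict) -> str:
    """Compare two GenomeAssembly records, most specific digest first.

    The returned labels are hypothetical and chosen for this example only.
    """
    if a["seqcol_digest"] == b["seqcol_digest"]:
        # Same top-level sequence collection digest.
        return "identical assembly"
    if a["seqcol_ordered_coord_system"] == b["seqcol_ordered_coord_system"]:
        # Same names and lengths in the same order; other content differs.
        return "same ordered coordinate system"
    if a["seqcol_unordered_coord_system"] == b["seqcol_unordered_coord_system"]:
        # Same names and lengths, but the sequences are ordered differently.
        return "same coordinate system, different order"
    return "different coordinate systems"
```

Following the slot descriptions, a match on `seqcol_unordered_coord_system` alone would suggest that annotation coordinates from the two sources refer to the same sequence names and lengths, even when the assemblies are not byte-for-byte identical.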

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/fga-wg/schema/bundle

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | https://w3id.org/fga-wg/schema/bundle/GenomeAssembly |
| native | https://w3id.org/fga-wg/schema/bundle/GenomeAssembly |

LinkML Source

Direct

name: GenomeAssembly
description: Information about the exact genome assembly used to generate the annotation
  file, defining the genomic coordinate system for the sequence features.
from_schema: https://w3id.org/fga-wg/schema/bundle
slots:
- seqcol_digest
- seqcol_ordered_coord_system
- seqcol_unordered_coord_system
- accessions
- aliases

Induced

name: GenomeAssembly
description: Information about the exact genome assembly used to generate the annotation
  file, defining the genomic coordinate system for the sequence features.
from_schema: https://w3id.org/fga-wg/schema/bundle
attributes:
  seqcol_digest:
    name: seqcol_digest
    description: Top-level sequence collection digest according to the GA4GH refget
      Sequence Collections standard (v1.0). This is a globally unique identifier for
      the genome assembly, algorithmically derivable from the genome assembly content.
      It is used to uniquely identify the exact genome assembly and to allow detailed
      comparisons across genome assembly variants (for example, variants of the GRCh38
      assembly).
    examples:
    - value: ga4gh:SC.EiFob05aCWgVU_B_Ae0cypnQut3cxUP1
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    identifier: true
    owner: GenomeAssembly
    domain_of:
    - GenomeAssembly
    range: curie
    required: true
  seqcol_ordered_coord_system:
    name: seqcol_ordered_coord_system
    description: Content-derived digest that uniquely identifies the ordered coordinate
      system of the genome assembly. Coordinate systems with the same sequence names
      and lengths, but with the sequences in a different order, have different ordered
      digests. The ordered coordinate system digest can be used to uniquely generate
      a chromSizes file, which is useful in a number of analysis tools. It is defined
      as the level 1 digest of the name_length_pairs attribute of the sequence collection
      generated from the genome assembly.
    examples:
    - value: ga4gh:SC.name_length_pairs.Yyz0Expaluj09xdDYg2Y6VOApvjg05Hf
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: GenomeAssembly
    domain_of:
    - GenomeAssembly
    range: curie
    required: true
  seqcol_unordered_coord_system:
    name: seqcol_unordered_coord_system
    description: Content-derived digest that uniquely identifies the order-invariant
      coordinate system of the genome assembly. This digest is shared across all
      coordinate systems with the same sequence names and lengths, regardless of
      the order of the sequences. The order-invariant coordinate system digest can
      be used to uniquely describe the coordinate system of a particular genome browser
      instance and the annotation files that are compatible with it. It is defined
      as the level 1 digest of the sorted_name_length_pairs attribute of the sequence
      collection generated from the genome assembly.
    examples:
    - value: ga4gh:SC.sorted_name_length_pairs._dMQ5dPUNVx4OGQnDAPmGMkVRWWcYV99
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: GenomeAssembly
    domain_of:
    - GenomeAssembly
    range: curie
    required: true
  accessions:
    name: accessions
    description: Database accession numbers for the genome assembly, if available.
      Should precisely identify the genome assembly and be omitted if changes have
      been made to the assembly after retrieval, such as removing the alternate sequences.
    examples:
    - value: encode:ENCSR425FOI
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: GenomeAssembly
    domain_of:
    - GenomeAssembly
    range: string
    multivalued: true
  aliases:
    name: aliases
    description: Human-readable aliases of the genome assembly. Can be imprecise,
      as preciseness is enforced in the other fields.
    examples:
    - value: GRCh38_no_alt_analysis_set_GCA_000001405.15
    - value: GRCh38
    - value: hg38
    from_schema: https://w3id.org/fga-wg/schema/bundle
    rank: 1000
    owner: GenomeAssembly
    domain_of:
    - GenomeAssembly
    range: curie
    required: true
    multivalued: true