Allele
The Allele class represents contiguous changes on a reference sequence. It covers the most commonly described forms of variation, including all “small” variants such as SNVs and indels that are also representable in other contemporary genomic variant formats, such as SPDI, HGVS, and VCF.
Definition and Information Model
Computational Definition
The state of a molecule at a Location.
Information Model
Some Allele attributes are inherited from Variation.
| Field | Type | Limits | Description |
|---|---|---|---|
| id | string | 0..1 | The ‘logical’ identifier of the entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system. The identified entity may have a different ‘id’ in a different system, or may refer to an ‘id’ for the shared concept in another system (e.g. a CURIE). |
| label | string | 0..1 | A primary label for the entity. |
| description | string | 0..1 | A free-text description of the entity. |
| extensions | Extension | 0..m | |
| type | string | 0..1 | MUST be “Allele” |
| digest | string | 0..1 | A sha512t24u digest created using the VRS Computed Identifier algorithm. |
| expressions | Expression | 0..m | |
| location | IRI \| Location | 1..1 | The location of the Allele |
| state | SequenceExpression | 1..1 | An expression of the sequence state |
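As a rough illustration of the field limits in the table, the sketch below builds an Allele as a plain Python dictionary and checks it against those limits. The helper `check_allele` is hypothetical and is not part of any VRS library. The nested `SequenceLocation`, `sequenceReference`, and `refgetAccession` names are assumptions about the v2 location model, and the accession and coordinates are placeholders rather than real data.

```python
# Illustrative Allele as a plain dict. Top-level field names come from the
# table above; the nested location fields are assumptions about VRS v2,
# and "SQ.EXAMPLE" is a placeholder, not a real refget accession.
example_allele = {
    "type": "Allele",
    "location": {
        "type": "SequenceLocation",
        "sequenceReference": {
            "type": "SequenceReference",
            "refgetAccession": "SQ.EXAMPLE",
        },
        "start": 44908821,
        "end": 44908822,
    },
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}


def check_allele(obj):
    """Hypothetical helper: list violations of the limits in the table."""
    problems = []
    # location and state are the only 1..1 (required) fields.
    for field in ("location", "state"):
        if field not in obj:
            problems.append(f"missing required field: {field}")
    # type is optional (0..1), but if present it MUST be "Allele".
    if "type" in obj and obj["type"] != "Allele":
        problems.append('type MUST be "Allele"')
    # 0..m fields are lists when present.
    for field in ("extensions", "expressions"):
        if field in obj and not isinstance(obj[field], list):
            problems.append(f"{field} must be a list")
    return problems
```

This check covers only the cardinalities shown in the table. It is not a substitute for validating against the published VRS schema.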
Implementation Guidance
Sequence Location Coordinates
The location property of the allele will almost always have start and end coordinates that are specified using integers (not Range). There are some situations, such as the detection of deleted sequence by microarray, where it may be appropriate to represent the variant as an Allele; however, other classes for representing such findings should also be considered (e.g. CopyNumberCount).
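The distinction between integer and Range coordinates can be sketched as a simple check. The function name `uses_integer_coordinates` is hypothetical, and the sketch assumes that a Range is serialized as a two-item list of bounds, which is how we understand the v2 model. Treat that as an assumption.

```python
def uses_integer_coordinates(location):
    """Hypothetical check: True when start and end are both plain integers."""

    def is_plain_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    return is_plain_int(location.get("start")) and is_plain_int(location.get("end"))


# Precise coordinates: the usual case for an Allele.
precise = {"type": "SequenceLocation", "start": 100, "end": 105}

# Imprecise start, written as a two-item list (assumed Range serialization).
imprecise = {"type": "SequenceLocation", "start": [90, 100], "end": 105}
```

A tool might use a check like this to flag Alleles with imprecise locations for review, since another class such as CopyNumberCount may be a better fit for those findings.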
Normalization
The Allele class also includes conventions for variant normalization (see Allele Normalization) that allow for compact and uniform representation of variants.
New in v2
In VRS v1.x, normalization included methods for full justification of variants, as derived from the NCBI VOCA algorithm. In v2, this has been extended to include reference length encoding (see ReferenceLengthExpression), to accommodate compressed representation of variants that occur in large repetitive regions.
For alleles in small repeating regions, it may be convenient to also use the ReferenceLengthExpression.sequence attribute to represent the sequence state explicitly alongside the reference encoding.
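To make the relationship between the reference encoding and the explicit sequence concrete, here is a minimal sketch. It assumes that a ReferenceLengthExpression carries `length` and `repeatSubunitLength` fields in addition to `sequence`. The helper `expand_reference_length` and the reference bases are hypothetical. The expansion rule shown (cycle the leading repeat subunit of the reference span out to `length`) is our illustrative reading, not the normative normalization algorithm.

```python
# Hypothetical reference bases covered by the Allele's location:
# three copies of a CAG repeat.
reference_span = "CAGCAGCAG"

# State after deleting one repeat unit. Field names other than "sequence"
# are assumptions about the v2 ReferenceLengthExpression class.
state = {
    "type": "ReferenceLengthExpression",
    "length": 6,
    "repeatSubunitLength": 3,
    "sequence": "CAGCAG",  # optional explicit form of the same state
}


def expand_reference_length(ref_span, rle):
    """Illustrative expansion: repeat the leading subunit up to rle['length']."""
    unit = ref_span[: rle["repeatSubunitLength"]]
    if not unit:
        raise ValueError("repeat subunit is empty")
    # Ceiling division gives enough copies to cover the target length.
    copies = -(-rle["length"] // len(unit))
    return (unit * copies)[: rle["length"]]
```

For a short repeat like this one, carrying `sequence` costs little and saves consumers from fetching reference bases. For a large repetitive region, omitting it keeps the representation compact.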
Expressions
New in v2
The v2 variation classes now support expressions. This is a convenient mechanism for annotating Alleles with string representations that follow the conventions of other variant standards (e.g. HGVS, SPDI) and resources (e.g. ClinVar, gnomAD).
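A short sketch of how expressions might be attached to and read from an Allele follows. It assumes each Expression has `syntax` and `value` fields, with syntax labels such as `hgvs.g` and `spdi`. The helper `expression_values` is hypothetical, and the two strings are illustrative annotations of a single substitution written in each syntax.

```python
# Illustrative expressions for one substitution. The "syntax" and "value"
# field names and the syntax labels are assumptions about the v2
# Expression class.
allele_expressions = [
    {"syntax": "hgvs.g", "value": "NC_000019.10:g.44908822C>T"},
    {"syntax": "spdi", "value": "NC_000019.10:44908821:C:T"},
]


def expression_values(expressions, syntax):
    """Hypothetical helper: return every value recorded for one syntax."""
    return [e["value"] for e in expressions if e.get("syntax") == syntax]
```

Note that the HGVS position is 1-based while the SPDI position is 0-based, so the same base appears as 44908822 in one string and 44908821 in the other.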