Allele

The Allele class represents contiguous changes on a reference sequence. It covers the most commonly described forms of variation, including all “small” variants, such as SNVs and indels, that are also representable in other contemporary genomic variant formats such as SPDI, HGVS, and VCF.

Definition and Information Model

Computational Definition

The state of a molecule at a Location.

Information Model

Some Allele attributes are inherited from Variation.

| Field | Type | Limits | Description |
|---|---|---|---|
| id | string | 0..1 | The ‘logical’ identifier of the entity in the system of record, e.g. a UUID. This ‘id’ is unique within a given system. The identified entity may have a different ‘id’ in a different system, or may refer to an ‘id’ for the shared concept in another system (e.g. a CURIE). |
| label | string | 0..1 | A primary label for the entity. |
| description | string | 0..1 | A free-text description of the entity. |
| extensions | Extension | 0..m | |
| type | string | 0..1 | MUST be “Allele” |
| digest | string | 0..1 | A sha512t24u digest created using the VRS Computed Identifier algorithm. |
| expressions | Expression | 0..m | |
| location | IRI \| Location | 1..1 | The location of the Allele |
| state | SequenceExpression | 1..1 | An expression of the sequence state |
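
The fields listed above can be sketched as a small Python data class. This is only an illustration of the shape and cardinalities, not an official binding: the class layout, the placeholder location IRI, and the `LiteralSequenceExpression` state object are assumptions. The `sha512t24u` helper shows the truncated digest function only (first 24 bytes of SHA-512, base64url-encoded); it does not perform the serialization steps of the VRS Computed Identifier algorithm.

```python
import base64
import hashlib
from dataclasses import dataclass, field
from typing import Optional, Union


def sha512t24u(blob: bytes) -> str:
    """Base64url-encode the first 24 bytes of the SHA-512 digest of blob."""
    truncated = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


@dataclass
class Allele:
    # Required fields (limits 1..1)
    location: Union[str, dict]  # an IRI string or an inline Location object
    state: dict                 # a SequenceExpression object
    # Optional fields (limits 0..1 or 0..m)
    id: Optional[str] = None
    label: Optional[str] = None
    description: Optional[str] = None
    extensions: list = field(default_factory=list)
    expressions: list = field(default_factory=list)
    digest: Optional[str] = None
    type: str = "Allele"

    def __post_init__(self) -> None:
        # The information model requires the literal type string "Allele".
        if self.type != "Allele":
            raise ValueError('type MUST be "Allele"')


# Hypothetical instance; the IRI and the state value are placeholders.
example = Allele(
    location="example:location-iri",
    state={"type": "LiteralSequenceExpression", "sequence": "T"},
    label="illustrative allele",
)
```

Keeping `location` and `state` as the only positional fields mirrors their 1..1 limits; every other field defaults to absent or empty.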

Implementation Guidance

Sequence Location Coordinates

The location property of the Allele will almost always have start and end coordinates that are specified using integers (not Range). There are some situations, such as the detection of deleted sequence by microarray, where the boundaries are known only imprecisely and it may still be appropriate to represent the variant as an Allele; however, other classes for representing such findings should also be considered (e.g. CopyNumberCount).
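
To make the distinction concrete, here is a minimal sketch of the two coordinate styles as plain Python dictionaries. The field names (`sequenceReference`, `start`, `end`), the two-element list used to stand in for a Range, and all identifiers and coordinate values are assumptions made for illustration; `has_precise_coordinates` is a hypothetical helper, not part of any library.

```python
# Hypothetical locations; identifiers and coordinates are illustrative only.
precise_location = {
    "type": "SequenceLocation",
    "sequenceReference": "example:reference-sequence",  # placeholder IRI
    "start": 100,
    "end": 101,
}

# A Range is assumed here to be a [lower, upper] pair of integers.
imprecise_location = {
    "type": "SequenceLocation",
    "sequenceReference": "example:reference-sequence",  # placeholder IRI
    "start": [90, 100],
    "end": [200, 215],
}


def has_precise_coordinates(location: dict) -> bool:
    """Return True when both start and end are plain integers (not Range)."""
    return all(
        isinstance(location[key], int) and not isinstance(location[key], bool)
        for key in ("start", "end")
    )
```

A consumer could use such a check to decide whether an Allele with imprecise boundaries would be better modelled by another class.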

Normalization

The Allele class also includes conventions for variant normalization (see Allele Normalization) that allow for compact and uniform representation of variants.

New in v2

In VRS v1.x, normalization included methods for full justification of variants, as derived from the NCBI VOCA algorithm. In v2, this has been extended to include reference length encoding (see ReferenceLengthExpression), to accommodate compressed representation of variants that occur in large repetitive regions.

For alleles in small repeating regions, it may be convenient to also use the ReferenceLengthExpression.sequence attribute to represent the sequence state explicitly alongside the reference encoding.
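
The sketch below shows one way a reference length expression might be expanded into an explicit sequence for a small repeat. Only the `sequence` attribute is named in this section; the `length` and `repeatSubunitLength` field names, the interpretation of the state as the repeat subunit cycled to the stated length, and the helper `expand_reference_length` are assumptions for illustration.

```python
def expand_reference_length(
    reference_span: str, length: int, repeat_subunit_length: int
) -> str:
    """Cycle the leading repeat subunit of reference_span out to length residues."""
    if not 0 < repeat_subunit_length <= len(reference_span):
        raise ValueError("repeat subunit must fit inside the reference span")
    subunit = reference_span[:repeat_subunit_length]
    repeats = length // repeat_subunit_length + 1  # enough copies to slice from
    return (subunit * repeats)[:length]


# Reference sequence at the (fully justified) location: three CAG units.
reference_span = "CAGCAGCAG"

# Hypothetical state describing a contraction to two units.
state = {
    "type": "ReferenceLengthExpression",
    "length": 6,
    "repeatSubunitLength": 3,
}

# Optionally carry the explicit sequence alongside the reference encoding.
state["sequence"] = expand_reference_length(
    reference_span, state["length"], state["repeatSubunitLength"]
)
```

For a short repeat like this one the explicit `sequence` costs little and saves consumers a reference lookup; for large repetitive regions it would normally be omitted.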

Expressions

New in v2

The v2 variation classes now support expressions. This is a convenient mechanism for annotating Alleles with strings that follow the conventions of other variant standards (e.g. HGVS, SPDI) and resources (e.g. ClinVar, gnomAD).
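
A minimal sketch of how such annotations might sit on an Allele follows. The `syntax` and `value` keys, the syntax labels `hgvs.g` and `spdi`, and the helper `expressions_for` are assumptions; the variant strings are illustrative and have not been checked against the reference sequence.

```python
from typing import List

# Hypothetical Allele with two string expressions attached.
allele = {
    "type": "Allele",
    "location": "example:location-iri",  # placeholder IRI
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
    "expressions": [
        {"syntax": "hgvs.g", "value": "NC_000007.14:g.55181320A>T"},
        {"syntax": "spdi", "value": "NC_000007.14:55181319:A:T"},
    ],
}


def expressions_for(allele: dict, syntax: str) -> List[str]:
    """Return every expression value on the allele that uses the given syntax."""
    return [
        expression["value"]
        for expression in allele.get("expressions", [])
        if expression.get("syntax") == syntax
    ]


hgvs_strings = expressions_for(allele, "hgvs.g")
```

Because `expressions` has limits 0..m, the lookup returns a list and tolerates Alleles that carry no expressions at all.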