Experimental STR naming development tool
This tool converts sequences to allele names using simple guidelines, requiring only a reference sequence of each marker and some parameters. The aim is to establish an unbiased method to automatically generate short, informative and human-readable descriptions of STR alleles that works the same for any locus in use now or in the future.
In this experimental tool, some parameters can still be adjusted to improve the resulting STR allele names. The goal is to arrive at a single set of parameters that works well on all currently used STR loci, in the hope that the same set will also produce favourable allele names for STR loci added in the future.
Five parameters control the discovery of repeat units, stretches and structures. Lengths are measured in nucleotides.
- The minimum number of consecutive repeats
- The maximum length of repeat units
- The minimum length of repeat stretches
- The minimum length of repeat structures
- The maximum length of repeat stretch interruptions (one larger gap is permitted)
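To illustrate how the first three parameters interact, the following is a minimal sketch of a tandem repeat search. It is not the tool's actual algorithm: the function name, the argument names and the default values are all hypothetical, and the handling of repeat structures and interruptions is left out entirely.

```python
from typing import List, Tuple


def find_repeat_stretches(
    seq: str,
    max_unit_len: int = 6,     # hypothetical default: maximum length of repeat units
    min_repeats: int = 3,      # hypothetical default: minimum number of consecutive repeats
    min_stretch_len: int = 8,  # hypothetical default: minimum length of repeat stretches
) -> List[Tuple[int, int, str]]:
    """Return (start, end, unit) for each maximal run of a tandemly repeated unit."""
    found = set()
    for unit_len in range(1, max_unit_len + 1):
        for start in range(len(seq) - unit_len + 1):
            unit = seq[start:start + unit_len]
            # Skip units that are themselves repeats of a shorter unit ("AA", "ATAT").
            if any(
                unit_len % k == 0 and unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            # Only start a run where the same unit does not already precede it.
            if start >= unit_len and seq[start - unit_len:start] == unit:
                continue
            # Extend the run one unit at a time.
            end = start + unit_len
            while seq[end:end + unit_len] == unit:
                end += unit_len
            repeats = (end - start) // unit_len
            if repeats >= min_repeats and end - start >= min_stretch_len:
                found.add((start, end, unit))
    return sorted(found)


example = "CC" + "AGAT" * 4 + "TT"
stretches = find_repeat_stretches(example, min_repeats=4)
```

Note that with a lower minimum number of repeats, rotations of the same unit (such as GATA or ATAG in the example) are reported as separate, overlapping stretches. Choosing between such alternatives is the kind of decision the scoring parameters are meant to settle.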
Three options are available to enhance the agility of STRNaming (0=no, 1=only for units in the reference sequence, 2=always).
The following tuning parameters are available to decide the optimal allele name for the STR sequences.
- The score awarded for every base that is covered by a repeat
- The score awarded for every distinct repeat unit used
- The score awarded for every repeat of a unit
- Additional score awarded for every repeat of a unit that was also repeated in the reference
- The score awarded for every interruption between repeat stretches
- Additional score awarded for every interruption that is exactly one repeat unit long
- The score awarded for every base in an interruption between repeat stretches
- The score awarded for every base inserted or deleted in the prefix, suffix or large interruption of the STR
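The scoring parameters above can be read as weights in a sum over the features of a candidate allele description. The sketch below shows one way such a sum could be computed. Everything in it is an assumption made for illustration: the class and field names, the weight values, and the rule that an interruption counts as "one repeat unit long" when its length equals the unit length of the stretch before it.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Weights:
    # Hypothetical values; the tool lets the user tune each of these.
    per_base_covered: float = 1.0
    per_distinct_unit: float = -10.0
    per_repeat: float = 2.0
    per_reference_repeat: float = 1.0
    per_interruption: float = -5.0
    per_unit_length_interruption: float = 3.0
    per_interruption_base: float = -1.0
    per_indel_base: float = -4.0


@dataclass
class Candidate:
    stretches: List[Tuple[str, int]]               # (repeat unit, number of repeats)
    gaps: List[str] = field(default_factory=list)  # gaps[i] follows stretches[i]
    indel_bases: int = 0                           # indels in prefix, suffix or large gap


def score(c: Candidate, w: Weights, reference_units: Set[str]) -> float:
    total = w.per_distinct_unit * len({unit for unit, _ in c.stretches})
    for unit, repeats in c.stretches:
        total += w.per_base_covered * len(unit) * repeats
        total += w.per_repeat * repeats
        if unit in reference_units:
            total += w.per_reference_repeat * repeats
    for i, gap in enumerate(c.gaps):
        total += w.per_interruption + w.per_interruption_base * len(gap)
        # Assumption: compare the gap with the unit of the preceding stretch.
        if len(gap) == len(c.stretches[i][0]):
            total += w.per_unit_length_interruption
    return total + w.per_indel_base * c.indel_bases


candidate = Candidate(stretches=[("AGAT", 5), ("AGAC", 2)], gaps=["TGAT"])
candidate_score = score(candidate, Weights(), reference_units={"AGAT"})
```

With several candidate descriptions of the same sequence, the preferred name would then be the one with the highest total, for example `max(candidates, key=lambda c: score(c, weights, reference_units))`.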
The first step is to extract the core features of an STR locus from its reference sequence. These features are:
- The sequences (but not the positions) of individual tandem repeat units found
- The sequence before the 5' start of the repeat structure (the 'prefix')
- The sequence after the 3' end of the repeat structure (the 'suffix')
- The sequence of the longest interruption, if it is longer than the maximum permitted
- The dominant repeat unit length, and a correction factor to calculate the CE allele number
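As a rough illustration of the last feature, the sketch below stores the extracted features in a small record and derives a CE allele number from a sequence length. The field names and the formula are assumptions, not taken from the tool: here the correction factor is simply the number of nucleotides that do not count towards the allele number, derived from the reference sequence length and its CE allele number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocusFeatures:
    # Hypothetical field names for the features listed above.
    units: List[str]          # sequences of the tandem repeat units found
    prefix: str               # sequence before the 5' start of the repeat structure
    suffix: str               # sequence after the 3' end of the repeat structure
    large_gap: Optional[str]  # longest interruption, if longer than permitted
    unit_length: int          # dominant repeat unit length
    length_adjust: int        # correction factor for the CE allele number


def length_adjust_from_reference(ref_len: int, ref_ce: str, unit_length: int) -> int:
    """Derive the correction factor from the reference length and CE number (e.g. '9.3')."""
    whole, _, partial = ref_ce.partition(".")
    return ref_len - int(whole) * unit_length - int(partial or 0)


def ce_allele_number(seq_len: int, features: LocusFeatures) -> str:
    """Assumed formula: (length - correction) split into whole units and a remainder."""
    whole, rest = divmod(seq_len - features.length_adjust, features.unit_length)
    return f"{whole}.{rest}" if rest else str(whole)


adjust = length_adjust_from_reference(ref_len=100, ref_ce="9.3", unit_length=4)
features = LocusFeatures(["AATG"], "", "", None, unit_length=4, length_adjust=adjust)
```

Under these assumptions, a sequence one nucleotide longer than the reference would be reported as allele 10, and the reference itself as 9.3.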
Please enter a single reference sequence per marker below. The CE length of the reference allele may be omitted. Hover the mouse pointer over an analysed sequence to see precisely what data was extracted from it.
Generally, input sequences should be in uppercase letters. Upstream and downstream sequence (outside of the region that is analysed by your PCR kit) may be provided in lowercase letters. These parts of the sequence will be taken into account when analysing the reference sequence, but should not be visible in the sequences entered in the 'Allele naming' section below.
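The case convention described above can be checked and split mechanically. The following sketch assumes a reference laid out as optional lowercase upstream sequence, an uppercase analysed region, and optional lowercase downstream sequence. The function name is hypothetical and the tool itself may be more permissive.

```python
import re
from typing import Tuple


def split_reference(ref: str) -> Tuple[str, str, str]:
    """Split a reference into (upstream, analysed region, downstream) by letter case."""
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", ref)
    if match is None:
        raise ValueError("expected lowercase flanks around one uppercase region")
    return match.group(1), match.group(2), match.group(3)


upstream, analysed, downstream = split_reference("acgtAGATAGATAGATtt")
```

Only the uppercase part (`analysed`) corresponds to what should be entered in the 'Allele naming' section.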
Legend: MarkerName CElength Prefix Repeat1 Gap Repeat2 Repeat1 Repeat3 Repeat4 Repeat5 Suffix
Allele naming sequence input
Please enter the sequences to process below. Any lowercase portion of the reference sequence entered above should NOT be included here. Each line should contain a marker name and a sequence separated by a space, comma, or semicolon. If you need to reverse-complement some sequences, try this.
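For preparing input outside the tool, the line format and the reverse-complement step can be sketched as follows. This is an illustrative helper, not part of the tool: the function names are hypothetical, and only the four standard nucleotides are complemented.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_allele_lines(text: str) -> List[Tuple[str, str]]:
    """Parse lines of 'marker<sep>sequence', where <sep> is a space, comma or semicolon."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = re.split(r"[\s,;]+", line, maxsplit=1)
        if len(parts) != 2 or not parts[1]:
            raise ValueError(f"cannot parse line: {line!r}")
        entries.append((parts[0], parts[1]))
    return entries


entries = parse_allele_lines("D3S1358 AGATAGAT\nvWA,TCTATCTA\n\nTH01;AATG")
```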
Allele name output
The resulting allele names are displayed below. Hover the mouse pointer over a name to see lower-scoring alternatives.
Additionally, the Sequence Identifier (SID) nomenclature is available. This nomenclature was developed independently by Young et al. and is used with their permission. See: A nomenclature for sequence-based forensic DNA analysis. Forensic Sci Int Genet, 42, 14–20.