The results Window - SNP site centered analysis

This is an example analysis using the 26 000 ESTs published by Torres et al. 2008. The ESTs were mapped to the D. melanogaster genome and SNPs were identified without using quality sequences.

 

general information shows the overall results, e.g. the total number of SNP-sites. Important: this text box also contains the SNP benchmarks.
strand Either the original or the reverse complement of an EST may be aligned with a reference gene. Choose whether to include SNPs derived from ESTs that are aligned with the reference sequence unchanged or as reverse complement.

allele type Which alleles should be displayed: 'A', 'T', 'C', 'G' or '-'
min. freq. counts The minimum number of times an allele must occur at a SNP-site.
min. alleles The minimum number of alleles a SNP-site must have. SNP-sites having fewer than this number are not shown.
min. tags The minimum number of ESTs that must map to a SNP-site. This is the sum of valid and invalid ('N') SNPs at the SNP-site.
max. alleles The maximum number of alleles at a valid SNP-site. SNP-sites having more than this number are not shown.
min. freq. percent The minimum frequency of an allele in percent of the coverage (see min. tags).
most frequent Display only the most frequent alleles. If set to 2, for example, only the two most frequent alleles will be shown. This feature interacts with max. alleles and min. alleles: if it is set to 2, for example, no SNP-site will ever be excluded when max. alleles is set to 3.
min. dist. minimum distance of the SNP from the nearest alignment end
max. low qual. maximum number of low alignment quality tokens in the neighborhood of a SNP ('-','N'). When the 454-adapted mode was used, this feature can be used to restrict the number of homopolymers in the immediate neighborhood of a SNP. For example when this value is set to 2 only one homopolymer of length 3 or two homopolymers of length 2 are allowed in the neighborhood (details here)
min. qual. site minimum sequence quality at the SNP
min. qual. neigh. minimum average sequence quality in the neighborhood of a SNP
detail level Choose the detail level of the analysis. When set to basic only the most important parameters will be shown. When set to detailed each parameter will be shown.
sorting How the SNP-sites in the analysis should be sorted: by (i) gene ID and (ii) position within the gene; by (i) gene ID and (ii) frequency (tags mapping to the SNP-site); or primarily by frequency, etc.
show all Show all SNP-sites or only the first n. What counts as first is determined by the active sorting.
show first n Show only the first n SNP-sites for each gene ID. This feature can, for example, be used to export only the most frequently covered SNP-site for each reference sequence (gene). To achieve this, select sorting by 'Gene ID and frequency', uncheck the 'show all' box and type 1 into the 'show first n' textbox. Finally, press the export button, and you will have only the most informative SNP-site for each gene.

clipboard copy the results of the analysis into the clipboard
export open the export subset dialog window
update analysis update the analysis using the user-specified parameters
results the results of the analysis
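
The 'show first n' recipe described above can be pictured with a minimal Python sketch. It is not PanGEA code: the record layout (gene ID, position, coverage), the function name `first_n_per_gene`, and the tie-breaking by input order are assumptions made for illustration. The values are borrowed from the example rows in the section 'The results in detail'.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (gene ID, position, coverage in tags).
sites = [
    ("FBgn0003979", "444-0", 10),
    ("FBgn0000579", "4431-0", 16),
    ("FBgn0003279", "1911-0", 19),
    ("FBgn0003979", "445-0", 10),
]


def first_n_per_gene(sites, n):
    """Sort by gene ID, then by descending coverage, and keep n sites per gene."""
    # sorted() is stable, so sites with equal coverage keep their input order
    # (an assumption; the source does not say how ties are resolved).
    ordered = sorted(sites, key=lambda s: (s[0], -s[2]))
    kept = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        kept.extend(list(group)[:n])
    return kept


top = first_n_per_gene(sites, 1)
for gene_id, position, coverage in top:
    print(gene_id, position, coverage)
```

With n set to 1, each gene ID contributes exactly one SNP-site, which mirrors typing 1 into the 'show first n' textbox after sorting by gene ID and frequency.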

 

The results in detail

As mentioned before, PanGEA offers three different detail levels for the analysis of SNPs. In the following, only the features of the most detailed analysis are explained, since the less detailed results are merely a subset of these.

FBgn0000579  4431-0  16  100.00  16  2  16  2  0.22  100.00  2  0  G  G  14  A  2
FBgn0003279  1911-0  19  100.00  19  2  18  2  0.20  100.00  2  0  G  G  16  A  2
FBgn0003979  444-0   10  100.00  10  2  10  2  0.32  100.00  0  2  G  G  8   C  2
FBgn0003979  445-0   10  100.00  10  2  10  2  0.32  100.00  0  2  C  C  8   G  2
a            b       c   d       e   f  g   h  i     j       k  l  m  n  o   p  q

 

a reference sequence ID (gene ID)
b position of the SNP-site within the reference sequence; the first part is the position, the second is the indel shift (details here)
c coverage of the SNP-site (ESTs mapping to the SNP-site); this trait is calculated as the sum of all valid and invalid SNPs at the SNP site
d percent of valid SNPs at the SNP-site
e number of valid SNPs at the SNP-site
f number of alleles at the SNP-site
g subset number of valid SNPs at the SNP-site; the user-specified restrictions that delimit the active subset are applied
h subset number of alleles at the SNP-site; the user-specified restrictions that delimit the active subset are applied
i PIC at the SNP-site; for the active subset
j percent of SNPs derived from ESTs which are aligned to the reference sequence without reverse complementing the EST. In short: percent of sense SNPs; for active subset
k transitions, with respect to the reference sequence; for active subset
l transversions, with respect to the reference sequence; for active subset
m nucleotide character of the reference sequence at the SNP-site
n first allele at the SNP-site; PanGEA always attempts to show the allele identical to the reference sequence character first; for active subset
o frequency of the first allele in counts; for active subset
p second allele at the SNP-site; for active subset
q frequency of the second allele in counts; for active subset
... etc for each allele at the SNP-site
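
Columns i, k and l of the four example rows can be cross-checked from the allele counts with a short Python sketch. Two points are assumptions rather than documented PanGEA behaviour: that column i equals 1 minus the sum of squared allele frequencies (inferred only from the fact that this reproduces 0.22, 0.20 and 0.32 in the example rows), and that k and l count every non-reference allele occurrence as one transition or one transversion. The function names are hypothetical.

```python
BASES = set("ACGT")
# Purine <-> purine and pyrimidine <-> pyrimidine exchanges are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def diversity(counts):
    """Return 1 - sum(p_i^2) for a list of allele counts (assumed formula for column i)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def ts_tv(ref, allele_counts):
    """Count transitions and transversions relative to the reference base."""
    ts = tv = 0
    for allele, count in allele_counts.items():
        if allele == ref or allele not in BASES:
            continue  # reference allele or gap ('-'): neither class
        if frozenset((ref, allele)) in TRANSITIONS:
            ts += count
        else:
            tv += count
    return ts, tv


# First example row: reference G, alleles G (14) and A (2).
print(round(diversity([14, 2]), 2), ts_tv("G", {"G": 14, "A": 2}))
# Third example row: reference G, alleles G (8) and C (2).
print(round(diversity([8, 2]), 2), ts_tv("G", {"G": 8, "C": 2}))
```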

The whole analysis may be copied into the clipboard with the button 'CB'
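
Once the analysis has been copied, the rows can be processed outside PanGEA. The Python sketch below reads one row of the most detailed level into named fields following the legend a to q. It assumes that the copied text is whitespace-separated exactly as displayed above and that the indel shift in column b is a non-negative integer after the hyphen; neither is confirmed by the source. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpSite:
    gene_id: str          # a
    position: int         # b, first part
    indel_shift: int      # b, second part
    coverage: int         # c
    percent_valid: float  # d
    valid: int            # e
    n_alleles: int        # f
    subset_valid: int     # g
    subset_alleles: int   # h
    pic: float            # i
    percent_sense: float  # j
    transitions: int      # k
    transversions: int    # l
    ref_base: str         # m
    alleles: list         # n, o, p, q, ... as (allele, count) pairs


def parse_row(line):
    """Parse one whitespace-separated result row (assumed layout)."""
    f = line.split()
    position, shift = f[1].rsplit("-", 1)
    # Columns from n onwards come in (allele, count) pairs.
    pairs = [(f[i], int(f[i + 1])) for i in range(13, len(f) - 1, 2)]
    return SnpSite(
        f[0], int(position), int(shift), int(f[2]), float(f[3]), int(f[4]),
        int(f[5]), int(f[6]), int(f[7]), float(f[8]), float(f[9]),
        int(f[10]), int(f[11]), f[12], pairs,
    )


site = parse_row(
    "FBgn0000579 4431-0 16 100.00 16 2 16 2 0.22 100.00 2 0 G G 14 A 2"
)
print(site.gene_id, site.position, site.alleles)
```

Keeping the allele columns as a list of pairs lets the same parser handle SNP-sites with more than two alleles.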

 

Torres TT, Metta M, Ottenwälder B, Schlötterer C.
Gene expression profiling by massively parallel sequencing.
Genome Res. 2008 Jan;18(1):172-7. Epub 2007 Nov 21.