PanGEA Online Manual

In PanGEA, SNPs are identified directly from pairwise alignments created with PanGEA-BlastN. This has two advantages: no previously assembled multiple alignments are required, and computation is fast. For example, 22 713 pairwise alignments (total length 2.5 Mbp) may be analysed in 3 seconds. These pairwise alignments may map to different genes or to different positions within a gene.

The user may specify whether indels should be considered valid SNP alleles.
PanGEA furthermore allows sequence quality files to be used for the ESTs. These files are scanned to calculate (i) the sequence quality at the SNP site and (ii) the quality in the neighborhood of the SNP.
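
The two quality values can be illustrated with a minimal sketch. It assumes the quality file has already been parsed into a list of per-base scores for one EST, and that the neighborhood quality is summarized as the mean score of the flanking bases; the manual does not state how PanGEA summarizes it, so the function name, the `flank` parameter and the use of the mean are all assumptions made for illustration.

```python
def snp_site_quality(quals, snp_pos, flank=5):
    """Return (quality at the SNP site, mean quality of the flanking bases).

    Hypothetical helper, not PanGEA code. `quals` is a list of per-base
    quality scores, `snp_pos` is a 0-based index into that list, and
    `flank` is the number of bases considered on each side (an assumption).
    """
    site = quals[snp_pos]
    lo = max(0, snp_pos - flank)
    # Flanking scores on both sides, excluding the SNP site itself.
    neighbours = quals[lo:snp_pos] + quals[snp_pos + 1:snp_pos + 1 + flank]
    mean = sum(neighbours) / len(neighbours) if neighbours else None
    return site, mean


print(snp_site_quality([30, 30, 10, 40, 20, 30], snp_pos=3, flank=2))
```

Near the ends of a read the window is simply truncated, so fewer bases contribute to the mean.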

PanGEA also provides an ad hoc solution for assessing the alignment quality of a SNP: it counts low alignment quality tokens such as '-' or 'N' in the neighborhood of the SNP. The extent of this neighborhood must again be specified by the user. Apart from indels, the 454 platform frequently generates sequencing errors near homopolymers. PanGEA therefore provides a variant of this assessment for 454 data, which additionally scores homopolymers in the immediate neighborhood (3 bp in both directions) of a SNP as low alignment quality tokens. The homopolymers are weighted by their total length (weight = length - 1). To account for 'carry forward events', homopolymers consisting of the same nucleotide as the SNP allele receive a threefold increased negative weight (weight = 3*length - 3; for details see here). We found this feature very useful; it may help to distinguish false SNPs (sequencing errors) from true SNPs.
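
The token count and the homopolymer weights described above can be sketched as follows. This is an illustration under stated assumptions, not PanGEA's implementation: both function names are hypothetical, positions are 0-based, a homopolymer is taken to be a maximal run of at least two identical bases, and a run is scored when any part of it lies within the window, using its total length even if it extends beyond the window.

```python
from itertools import groupby


def count_low_quality_tokens(aln, snp_col, flank, tokens="-N"):
    """Count '-' and 'N' characters within `flank` columns of the SNP column."""
    lo = max(0, snp_col - flank)
    window = aln[lo:snp_col] + aln[snp_col + 1:snp_col + 1 + flank]
    return sum(ch in tokens for ch in window)


def homopolymer_penalty(seq, snp_pos, allele, flank=3):
    """Sum the homopolymer weights near a SNP (hypothetical helper).

    weight = length - 1, tripled when the run consists of the same
    nucleotide as the SNP allele (the 'carry forward' case).
    """
    lo, hi = snp_pos - flank, snp_pos + flank
    penalty = 0
    start = 0
    for base, run in groupby(seq):
        length = len(list(run))
        end = start + length - 1  # inclusive end of this run
        # Assumption: runs of '-' or 'N' are not treated as homopolymers.
        if length >= 2 and base not in "-N" and start <= hi and end >= lo:
            weight = length - 1
            if base == allele:
                weight *= 3
            penalty += weight
        start = end + 1
    return penalty


seq = "ACGTTTTACGGA"
print(homopolymer_penalty(seq, snp_pos=7, allele="A"))  # TTTT and GG, normal weights
print(homopolymer_penalty(seq, snp_pos=7, allele="T"))  # TTTT matches the allele
print(count_low_quality_tokens("AC-GTNNA", snp_col=3, flank=2))
```

In this example the run TTTT (length 4) contributes 3 when the allele is A and 9 when the allele is T, while GG (length 2) contributes 1 in both cases.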

SNP identification from pairwise alignments proceeds in several distinct steps. First, all pairwise alignments are parsed in succession and the sites of mismatching bases are recorded (SNP-sites). Subsequently, the pairwise alignments are parsed once again, this time recording, for each SNP-site, the nucleotide character found in the query sequence (the SNP).
Finally, all SNPs meeting the minimum requirements are reported to the output file.
This initially generated set of SNPs may be refined further using PanGEA's 'Manage SNPs' option.
The number of pairwise alignments which can be analyzed at once is not limited and the alignments may map to any genes at any position.
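
The steps above can be sketched as a two-pass scan. This is a simplified illustration, not PanGEA's code: the alignment tuple layout, the function names and the `min_allele_count` threshold are assumptions (the manual does not list the actual minimum requirements here), and columns where the reference carries a gap are skipped for brevity.

```python
from collections import Counter, defaultdict


def iter_columns(ref_start, ref_aln, query_aln):
    """Yield (reference position, reference base, query base) per column."""
    pos = ref_start
    for r, q in zip(ref_aln, query_aln):
        if r == "-":  # insertion relative to the reference: skipped in this sketch
            continue
        yield pos, r, q
        pos += 1


def is_valid_allele(q, consider_indels):
    """'N' is never an allele; '-' only counts if indels are enabled."""
    return q != "N" and (q != "-" or consider_indels)


def find_snps(alignments, consider_indels=False, min_allele_count=2):
    """alignments: iterable of (ref_id, ref_start, ref_aln, query_aln)."""
    alignments = list(alignments)  # the alignments are parsed twice

    # Pass 1: record every site where query and reference disagree.
    sites = set()
    for ref_id, ref_start, ref_aln, query_aln in alignments:
        for pos, r, q in iter_columns(ref_start, ref_aln, query_aln):
            if q != r and is_valid_allele(q, consider_indels):
                sites.add((ref_id, pos))

    # Pass 2: record the query character of every alignment at each site.
    counts = defaultdict(Counter)
    for ref_id, ref_start, ref_aln, query_aln in alignments:
        for pos, r, q in iter_columns(ref_start, ref_aln, query_aln):
            if (ref_id, pos) in sites and is_valid_allele(q, consider_indels):
                counts[(ref_id, pos)][q] += 1

    # Report sites with at least two alleles that reach the assumed minimum.
    snps = {}
    for site, alleles in counts.items():
        kept = {a: n for a, n in alleles.items() if n >= min_allele_count}
        if len(kept) >= 2:
            snps[site] = kept
    return snps


example = [
    ("geneA", 10, "ACGT", "ACGT"),
    ("geneA", 10, "ACGT", "ATGT"),
    ("geneA", 10, "ACGT", "ATGT"),
    ("geneA", 10, "ACGT", "ACGT"),
    ("geneA", 12, "GT", "G-"),
]
print(find_snps(example))
```

Because sites are keyed by reference identifier and position, alignments mapping to different genes or to different positions within a gene can be mixed freely in one run, and the second pass also counts alignments that match the reference at a SNP site.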

The SNP identification module is also available as the stand-alone console application PanGEA-SNP.