PanGEA Online Manual

'de novo' - SNP identification

Choose the radio button 'Identify SNPs' and press the 'Proceed' button.

Read the introduction and press 'Proceed' again.

Choose the radio button 'de novo' and press 'Proceed'.

Choose whether or not indels should be considered valid SNPs. If you use 454 sequences (or sequencing-by-synthesis data in general), we recommend ignoring indels.

Choose the input and output files. The input file must be a valid PanGEA pairwise alignment file. Multiple files may be specified at once. Although it is possible to use the ambiguous PanGEA-BlastN results for SNP-identification, we do not recommend it, as the results will not be meaningful. Use the unambiguous PanGEA-BlastN search results instead.

The output file is a '*.snp' file. Click here for specification.

Specify the settings for assessing the SNP quality. You can specify what should be considered as the neighborhood of the SNP for counting the low alignment quality tokens.

If you want to use quality files, click the checkbox 'Use quality sequences' and choose the files containing the quality sequences. Note that the quality sequences have to be in agreement with the EST sequences, both in frame and length. If you want, for example, to trim EST sequences or to remove adaptors, use the 'Trim sequences' option of PanGEA. This option removes the specified sequences from the ESTs and from the corresponding quality files. To calculate the average sequence quality in the SNP neighborhood, you also have to specify what should be regarded as the neighborhood of the SNP.
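The length requirement and the neighborhood average can be illustrated with a minimal sketch. It is not PanGEA's implementation: the function name, the symmetric window defined by `flank`, and the choice to include the SNP position itself in the average are all assumptions made for illustration.

```python
def mean_neighborhood_quality(sequence, qualities, snp_pos, flank):
    """Average quality score in a window of `flank` bases on each side of the SNP.

    Hypothetical helper: the window definition is an assumption, not PanGEA's.
    """
    # The quality sequence must match the EST sequence in length.
    if len(sequence) != len(qualities):
        raise ValueError("quality sequence and EST sequence differ in length")
    start = max(0, snp_pos - flank)
    window = qualities[start:snp_pos + flank + 1]
    return sum(window) / len(window)


est = "ACGTACGT"
quals = [10, 20, 30, 40, 50, 60, 70, 80]
print(mean_neighborhood_quality(est, quals, snp_pos=3, flank=2))
```

If an EST is trimmed without trimming its quality sequence in the same way, the length check above fails, which is why trimming should be applied to both together.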

You must choose the method used to assess the SNP quality from the pairwise alignment. The normal mode only counts the number of low alignment quality tokens ('-', 'N', 'Y', etc.) in the specified neighborhood of the SNP. The 454-mode additionally considers homopolymers as low alignment quality tokens. Homopolymers are weighted by their length, and homopolymers having the same nucleotide character as the SNP are weighted threefold. For details see here.
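The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not PanGEA's actual scoring code: the function names, the rule that every character other than A, C, G, T counts as a low alignment quality token, the minimum run length `min_run` that qualifies as a homopolymer, and the way the homopolymer weight is added to the token count are all hypothetical.

```python
from itertools import groupby

UNAMBIGUOUS = set("ACGT")  # assumption: every other character is a low-quality token


def normal_score(aligned, snp_pos, flank):
    """Count low alignment quality tokens within `flank` columns of the SNP."""
    start = max(0, snp_pos - flank)
    # Columns left and right of the SNP; the SNP column itself is skipped (assumption).
    window = aligned[start:snp_pos] + aligned[snp_pos + 1:snp_pos + 1 + flank]
    return sum(1 for ch in window.upper() if ch not in UNAMBIGUOUS)


def score_454(aligned, snp_pos, flank, min_run=3):
    """Normal score plus a length-weighted penalty for homopolymer runs."""
    start = max(0, snp_pos - flank)
    window = aligned[start:snp_pos + flank + 1].upper()
    snp_base = aligned[snp_pos].upper()
    score = normal_score(aligned, snp_pos, flank)
    for base, run in groupby(window):
        length = len(list(run))
        if base in UNAMBIGUOUS and length >= min_run:
            # Runs of the SNP's own nucleotide are weighted threefold.
            score += length * (3 if base == snp_base else 1)
    return score


read = "ACGTTTTAC-GNA"
print(normal_score(read, snp_pos=6, flank=5))
print(score_454(read, snp_pos=6, flank=5))
```

In this example the SNP sits inside a run of four T's, so the 454-style score is much higher than the plain token count, reflecting the known difficulty of calling homopolymer lengths from 454 reads.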

Choose the general settings of your SNP identification. We recommend requiring each SNP allele to occur at least twice; otherwise a single sequencing error is already enough to produce a SNP site. In general, do not use restrictive settings at this step (e.g. a minimum allele frequency of 30%); you can refine the set of initially identified SNPs further with the option 'Manage SNPs'. This is more convenient and offers several additional options.
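The reasoning behind the minimum allele count can be shown with a small sketch. It assumes a site is represented as the string of characters observed at one alignment column, which is a simplification for illustration; the function name and parameters are hypothetical and do not correspond to PanGEA settings.

```python
from collections import Counter


def is_candidate_snp(column, min_allele_count=2, ignore_indels=True):
    """Return True if at least two alleles each occur `min_allele_count` times."""
    allowed = set("ACGT") if ignore_indels else set("ACGT-")
    counts = Counter(ch for ch in column.upper() if ch in allowed)
    supported = [allele for allele, n in counts.items() if n >= min_allele_count]
    return len(supported) >= 2


print(is_candidate_snp("AAAAG"))  # the G is seen only once
print(is_candidate_snp("AAAGG"))  # both alleles are seen at least twice
```

With `min_allele_count=1`, the single G in the first column would already be reported as a SNP allele, even though it may be nothing more than one sequencing error.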

This is the SNP-search report, which appears when the SNP search has completed successfully. Read the report carefully and copy it into a text file if needed; the report is not automatically written to a text file. If you encounter an out-of-memory exception during SNP identification, decrease the number of pairwise alignments that are analysed at once (see here).

Finally, press 'New task'.