SNP identification from SNP-sites

Choose the radio button 'Identify SNPs' and press the button 'Proceed'

Read the introduction and press 'Proceed' again

Choose the radio button 'SNP identification from SNP sites' and press 'Proceed'.

Choose whether or not indels should be considered valid SNPs. If you use 454 sequences (or, more generally, sequencing-by-synthesis data), we recommend ignoring indels.

Choose the input and output files. The input file must be a valid PanGEA pairwise alignment file. Multiple files may be specified at once. Although it is possible to use the ambiguous PanGEA-BlastN results for SNP identification, we do not recommend it, as the results will not be meaningful. Use the unambiguous PanGEA-BlastN search results instead.

The output file is a '*.snp' file. Click here for the specification.

Specify the settings for assessing the SNP quality. You can specify what should be considered as the neighborhood of the SNP for counting the low alignment quality tokens.

If you want to use quality files, click the checkbox 'Use quality sequences' and choose the files containing the quality sequences. Note that the quality sequences have to agree with the EST sequences in both frame and length. If you want, for example, to trim EST sequences or to remove adaptors, use the 'Trim sequences' option of PanGEA. This option removes the specified sequences from the ESTs and from the corresponding quality files. To calculate the average sequence quality in the SNP neighborhood, you also have to specify what should be regarded as the neighborhood of the SNP.
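The requirement that quality sequences match the ESTs, and the idea of an average quality in the SNP neighborhood, can be illustrated with a minimal Python sketch. This is not PanGEA's implementation: the function names, the `flank` parameter, and the choice to exclude the SNP position itself from the average are all assumptions made for illustration.

```python
def check_agreement(est_seq, qualities):
    """Raise if the quality values do not cover the EST base for base."""
    if len(est_seq) != len(qualities):
        raise ValueError(
            f"EST has {len(est_seq)} bases but {len(qualities)} quality values"
        )


def trim_both(est_seq, qualities, n_left, n_right):
    """Trim the same number of positions from the EST and its qualities,
    so that both stay in agreement (hypothetical helper)."""
    end = len(est_seq) - n_right
    return est_seq[n_left:end], qualities[n_left:end]


def mean_neighborhood_quality(qualities, snp_pos, flank):
    """Average quality of up to `flank` positions on each side of the SNP.

    Assumption: the SNP position itself is not part of its neighborhood.
    """
    start = max(0, snp_pos - flank)
    window = qualities[start:snp_pos] + qualities[snp_pos + 1:snp_pos + 1 + flank]
    if not window:
        raise ValueError("empty neighborhood")
    return sum(window) / len(window)
```

For example, with the quality values `[10, 20, 30, 40, 50]`, a SNP at index 2 and a flank of 1, the neighborhood consists of the values 20 and 40.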

You must choose the method used to assess the SNP quality from the pairwise alignment. The normal mode only counts the number of low alignment quality tokens ('-', 'N', 'Y', etc.) in the specified neighborhood of the SNP. The 454-mode additionally considers homopolymers as low alignment quality tokens. Homopolymers are weighted by their length, and homopolymers having the same nucleotide character as the SNP are weighted threefold. For details see here.
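The difference between the two modes can be sketched in Python as follows. This is only an illustration of the description above, not PanGEA's actual scoring: the set of low-quality tokens, the minimum homopolymer length `min_run`, the way the length weight and the threefold weight are combined, and all function names are assumptions.

```python
import re

# Assumption: gaps, 'N' and the IUPAC ambiguity codes count as low-quality
# tokens. The description above only names '-', 'N', 'Y', "etc.".
LOW_QUALITY_TOKENS = set("-NRYKMSWBDHV")


def neighborhood(aligned_seq, snp_pos, flank):
    """Return the characters left and right of the SNP column (SNP excluded)."""
    start = max(0, snp_pos - flank)
    left = aligned_seq[start:snp_pos].upper()
    right = aligned_seq[snp_pos + 1:snp_pos + 1 + flank].upper()
    return left, right


def normal_mode_score(aligned_seq, snp_pos, flank):
    """Count low alignment quality tokens in the SNP neighborhood."""
    left, right = neighborhood(aligned_seq, snp_pos, flank)
    return sum(ch in LOW_QUALITY_TOKENS for ch in left + right)


def mode_454_score(aligned_seq, snp_pos, flank, min_run=3):
    """Normal-mode count plus a penalty for homopolymers in the neighborhood.

    Assumption: a run of at least `min_run` identical bases is a homopolymer,
    it adds its length to the score, and three times its length if it consists
    of the same nucleotide as the SNP.
    """
    score = normal_mode_score(aligned_seq, snp_pos, flank)
    snp_char = aligned_seq[snp_pos].upper()
    for side in neighborhood(aligned_seq, snp_pos, flank):
        for match in re.finditer(r"([ACGT])\1*", side):
            run = match.group(0)
            if len(run) >= min_run:
                weight = 3 if run[0] == snp_char else 1
                score += weight * len(run)
    return score
```

With the aligned sequence `ACGTTTTA-GCAAAC`, a SNP at index 7 and a flank of 5, the normal mode sees a single gap, while the 454-mode sketch additionally penalizes the `TTTT` run to the left of the SNP.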

Choose the files containing a list of SNP sites (for the file specification, see here). These SNP sites will be used to extract the character state for each EST (query sequence) from the pairwise alignments. Only ESTs mapping to a SNP site are considered. This method is not restrictive and does not require any minimum allele frequencies. It may even happen that only one allele occurs at a SNP site. You may refine the results subsequently using the option 'Manage SNPs'.
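Extracting the character state at a known SNP site from a pairwise alignment can be sketched in Python as below. The sketch assumes that each alignment is available as two gapped strings of equal length (reference and query) plus the reference coordinate of the first aligned base; this representation and the function names are assumptions and do not reflect the PanGEA file format.

```python
from collections import Counter


def state_at_site(aligned_ref, aligned_query, ref_start, site):
    """Return the query character aligned to reference position `site`.

    Returns None if the alignment does not cover the site. A '-' result
    means the query has a deletion at the site.
    """
    ref_pos = ref_start
    for ref_ch, qry_ch in zip(aligned_ref, aligned_query):
        if ref_ch == "-":
            continue  # insertion in the query: no reference position consumed
        if ref_pos == site:
            return qry_ch
        ref_pos += 1
    return None


def collect_states(alignments, site, ignore_indels=True):
    """Count the character states of all ESTs that map to the SNP site.

    `alignments` is an iterable of (aligned_ref, aligned_query, ref_start).
    No minimum allele frequency is applied, so a single allele is possible.
    """
    counts = Counter()
    for aligned_ref, aligned_query, ref_start in alignments:
        state = state_at_site(aligned_ref, aligned_query, ref_start, site)
        if state is None or (ignore_indels and state == "-"):
            continue
        counts[state.upper()] += 1
    return counts
```

Because no frequency threshold is applied, the returned counter may contain only one allele for a site, which matches the non-restrictive behavior described above.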

This is the SNP-search report, which appears when the SNP search from SNP sites has finished successfully. Read the report carefully and copy it into a text file if needed. The report is not automatically written to a text file.

Press 'New task'