PanGEA Main

PanGEA-BlastN can be used to map ESTs to genes or whole genomes. The user must specify at least the file containing the ESTs, the file containing the database sequences (genes or chromosomes) and the dynamic programming algorithm. The output may be used for SNP identification with PanGEA-SNP.

At most 65 536 database sequences (genes or chromosomes) can be analysed concurrently. As the number of genes in any known organism is <40 000, this limitation will hopefully not be a problem.
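The limit above can be checked before starting a run. The following is a minimal sketch, not part of PanGEA: it assumes that every database sequence is one FASTA record, i.e. one header line starting with ">". The function names and the constant name are hypothetical.

```python
from typing import Iterable

# Limit on concurrently analysed database sequences, as stated in the text.
MAX_DATABASE_SEQUENCES = 65536


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def fits_database_limit(lines: Iterable[str],
                        limit: int = MAX_DATABASE_SEQUENCES) -> bool:
    """Return True if the number of records does not exceed the limit."""
    return count_fasta_records(lines) <= limit
```

Both functions accept any iterable of lines, so an open file handle of the database file (for example reference_genes.fasta) can be passed directly.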

The number of ESTs which can be mapped is not limited as PanGEA-BlastN operates in batch mode.

Minimum command in Windows:
PanGEA-BlastN -i ests.fasta -d reference_genes.fasta -normalsw

Minimum command in Linux (Mono):
mono PanGEA-BlastN.exe -i ests.fasta -d reference_genes.fasta -normalsw

For simplicity, only the Windows commands are shown below; the Linux commands have to be adjusted as shown above.

If no output file is specified the default output file "result.aln" is used.

Multiple input or database sequences may be specified:
PanGEA-BlastN -i input1.fasta -i input2.fasta -d genes1.fasta -d genes2.fasta -normalsw -o output.aln

For sequencing-by-synthesis ESTs (e.g. 454 sequences) the homopolymer Smith-Waterman algorithm should be used:
PanGEA-BlastN -i ests.fasta -d reference_genes.fasta -homosw
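The command forms shown above (Windows or Mono, one or several input and database files, normal or homopolymer algorithm, optional output file) can also be assembled by a small script. This is only a sketch: the helper name build_command and its arguments are hypothetical, and it uses only the options shown in the examples above (-i, -d, -normalsw, -homosw, -o).

```python
from typing import List, Sequence


def build_command(inputs: Sequence[str],
                  databases: Sequence[str],
                  algorithm: str = "normalsw",
                  output: str = "",
                  use_mono: bool = False) -> List[str]:
    """Assemble a PanGEA-BlastN argument list from the documented options."""
    if algorithm not in ("normalsw", "homosw"):
        raise ValueError("algorithm must be 'normalsw' or 'homosw'")
    if not inputs or not databases:
        raise ValueError("at least one input and one database file are required")

    # On Linux the executable is started through Mono.
    command = ["mono", "PanGEA-BlastN.exe"] if use_mono else ["PanGEA-BlastN"]
    for path in inputs:
        command += ["-i", path]       # one -i per input file
    for path in databases:
        command += ["-d", path]       # one -d per database file
    command.append("-" + algorithm)   # -normalsw or -homosw
    if output:
        command += ["-o", output]     # omitted: the default output file is used
    return command
```

The returned list can be joined with spaces for display or handed to subprocess.run to start the program, for example build_command(["ests.fasta"], ["reference_genes.fasta"], algorithm="homosw", use_mono=True).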

If you require help with the PanGEA-BlastN parameters, run PanGEA-BlastN without arguments:
PanGEA-BlastN

Description of all parameters:

	Obligatory parameters:
-i	input file(s); obligatory parameter
-d	database file(s); obligatory parameter
-normalsw / -homosw	use the normal or the homopolymer Smith-Waterman algorithm; obligatory parameter

	Optional parameters:
-o	output file; optional parameter; default: result.txt
-pi	gap opening penalty; optional parameter; default: 11
-pe	gap extension penalty; optional parameter; default: 2
-pt	homopolymer transgression penalty; optional parameter; default: 3
-pmm	mismatch penalty; optional parameter; default: 5
-hit	score for a hit; optional parameter; default: 3
-wl	word length; optional parameter; default: 11
-minD	minimum diagonal length; optional parameter; default: 3
-comp	low complexity cutoff; optional parameter; default: 10
-ua	unambiguity score difference; optional parameter; default: 10
-intron yes / no	use intron mode; optional parameter; default: yes
-usemil yes / no	use a maximum intron length; optional parameter; default: no
-mil	maximum intron length; optional parameter; default: 5000

All input files have to be fasta or multiple fasta files. All nucleotide sequences have to be in uppercase characters.

Example:

>example EST1
AAATTTCCCGCGATATATATATCG

>example EST2
AATCCTGTTATATTCCCGGTTTCG
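Because all nucleotide sequences have to be in uppercase characters, input files containing lowercase or mixed-case sequences need to be converted first. The following is a minimal sketch of such a conversion and is not part of PanGEA; the function name uppercase_fasta is hypothetical. Header lines (starting with ">") are left unchanged.

```python
from typing import Iterable, Iterator


def uppercase_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines converted to uppercase."""
    for line in lines:
        if line.startswith(">"):
            yield line          # header line: keep as it is
        else:
            yield line.upper()  # sequence line: convert to uppercase
```

To convert a file, open the source file for reading and a new file for writing, then pass the output of uppercase_fasta(source) to the writelines method of the destination file.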
