PanGEA-BlastN can be used to map ESTs to genes or whole genomes. At a minimum, the user must specify the file containing the ESTs, the file containing the database sequences (genes or chromosomes) and the dynamic programming algorithm. The output can be used for SNP identification with PanGEA-SNP.
The number of database sequences (genes or chromosomes) that can be analysed concurrently is limited to 65,536. As the number of genes in most organisms is well below this (typically under 40,000), this limit will hopefully not be a problem.
The number of ESTs which can be mapped is not limited as PanGEA-BlastN operates in batch mode.
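To check a database against this limit before a run, the sequences can be counted in advance. The short Python sketch below counts FASTA header lines (lines beginning with ">") across all database files. It assumes that the limit applies to the total over all -d files combined, which this manual does not state explicitly; the script name count_db_sequences.py is hypothetical.

```python
import sys

# Limit on concurrently analysed database sequences, as stated in this manual.
# Assumption: the limit applies to all -d files taken together.
MAX_DB_SEQUENCES = 65536


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def check_database(paths):
    """Return (total record count, whether the total is within the limit)."""
    total = 0
    for path in paths:
        with open(path) as handle:
            total += count_fasta_records(handle)
    return total, total <= MAX_DB_SEQUENCES


if __name__ == "__main__":
    # Usage: python count_db_sequences.py genes1.fasta genes2.fasta
    total, ok = check_database(sys.argv[1:])
    print(f"{total} database sequences; within limit: {ok}")
```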
Minimum command in Windows:
PanGEA-BlastN -i ests.fasta -d reference_genes.fasta -normalsw
Minimum command in Linux (Mono):
mono PanGEA-BlastN.exe -i ests.fasta -d reference_genes.fasta -normalsw
For simplicity, only the Windows commands are shown below; on Linux, prefix each command with "mono" and call "PanGEA-BlastN.exe" as shown above.
If no output file is specified, the default output file "result.aln" is used.
Multiple input or database files may be specified:
PanGEA-BlastN -i input1.fasta -i input2.fasta -d genes1.fasta -d genes2.fasta -normalsw -o output.aln
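When PanGEA-BlastN is driven from a script, for example to switch between the Windows and the Linux (Mono) invocation, the command line can be assembled programmatically. The following Python sketch uses only the options documented in this manual; the function name build_command is hypothetical.

```python
import shlex


def build_command(inputs, databases, algorithm="normalsw", output=None, linux=False):
    """Assemble a PanGEA-BlastN command line from the documented options."""
    if algorithm not in ("normalsw", "homosw"):
        raise ValueError("algorithm must be 'normalsw' or 'homosw'")
    if not inputs or not databases:
        raise ValueError("at least one -i and one -d file are required")
    # On Linux the .exe is started through Mono; on Windows it is called directly.
    cmd = ["mono", "PanGEA-BlastN.exe"] if linux else ["PanGEA-BlastN"]
    for path in inputs:
        cmd += ["-i", path]
    for path in databases:
        cmd += ["-d", path]
    cmd.append("-" + algorithm)
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = build_command(["input1.fasta", "input2.fasta"],
                    ["genes1.fasta", "genes2.fasta"], output="output.aln")
print(shlex.join(cmd))
# prints: PanGEA-BlastN -i input1.fasta -i input2.fasta -d genes1.fasta -d genes2.fasta -normalsw -o output.aln
```

Keeping the command as a list avoids quoting problems with file names; the list can be passed directly to subprocess.run(cmd, check=True) to start the program.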
For sequencing-by-synthesis ESTs (e.g. 454 sequences), the homopolymer Smith-Waterman algorithm should be used:
PanGEA-BlastN -i ests.fasta -d reference_genes.fasta -homosw
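For orientation, the sketch below computes a textbook local alignment score (Smith-Waterman with affine gaps, following Gotoh) using the default values of the optional parameters -hit 3, -pmm 5, -pi 11 and -pe 2. It is not PanGEA-BlastN's implementation: the gap model (a gap of length k costs pi + (k - 1) * pe) is an assumption, and the homopolymer variant selected with -homosw, together with its transgression penalty -pt, is not modelled.

```python
def smith_waterman_affine(a, b, hit=3, mismatch=5, gap_open=11, gap_extend=2):
    """Best local alignment score of a and b (Smith-Waterman, Gotoh affine gaps).

    Assumed gap model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    m = len(b)
    # prev_h[j]: best score of a local alignment ending at (i - 1, j).
    # prev_f[j]: best score at (i - 1, j) ending with a gap in b (vertical move).
    prev_h = [0] * (m + 1)
    prev_f = [neg_inf] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (m + 1)
        f = [neg_inf] * (m + 1)
        e = neg_inf  # best score ending with a gap in a (horizontal move)
        for j in range(1, m + 1):
            # Gap in a: open it from H or extend the running gap.
            e = max(h[j - 1] - gap_open, e - gap_extend)
            # Gap in b: open it from H or extend the gap from the row above.
            f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            # Diagonal move: score a hit or a mismatch.
            diag = prev_h[j - 1] + (hit if a[i - 1] == b[j - 1] else -mismatch)
            # Local alignment: never drop below zero.
            h[j] = max(0, diag, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best
```

For example, smith_waterman_affine("AAAAA", "AAGAA") returns 7: four hits at 3 each minus one mismatch at 5, which beats any alignment that pays the gap opening penalty of 11.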
For help on the PanGEA-BlastN parameters, run PanGEA-BlastN without arguments:
PanGEA-BlastN
Description of all parameters:
Obligatory parameters:
  -i                    input file(s)
  -d                    database file(s)
  -normalsw / -homosw   use the normal or the homopolymer Smith-Waterman algorithm

Optional parameters:
  -o                    output file; default: result.txt
  -pi                   gap opening penalty; default: 11
  -pe                   gap extension penalty; default: 2
  -pt                   homopolymer transgression penalty; default: 3
  -pmm                  mismatch penalty; default: 5
  -hit                  score for a hit; default: 3
  -wl                   word length; default: 11
  -minD                 minimum diagonal length; default: 3
  -comp                 low complexity cutoff; default: 10
  -ua                   unambiguity score difference; default: 10
  -intron yes / no      use intron mode; default: yes
  -usemil yes / no      use a maximum intron length; default: no
  -mil                  maximum intron length; default: 5000
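A word length parameter such as -wl is typical of BLAST-style heuristics, in which exact word matches ("seeds") between an EST and a database sequence are located before the more expensive dynamic programming step. The Python sketch below illustrates only that general idea; how PanGEA-BlastN uses its seeds, and how -minD and -comp act on them, is not described in this manual and is not modelled. The function names word_hits and hits_by_diagonal are hypothetical.

```python
from collections import defaultdict


def word_hits(query, subject, word_length=11):
    """Return (query_pos, subject_pos) pairs where a word of word_length matches exactly."""
    # Index every word of the subject by its start position.
    index = defaultdict(list)
    for s in range(len(subject) - word_length + 1):
        index[subject[s:s + word_length]].append(s)
    # Look up every word of the query in the index.
    hits = []
    for q in range(len(query) - word_length + 1):
        for s in index.get(query[q:q + word_length], []):
            hits.append((q, s))
    return hits


def hits_by_diagonal(hits):
    """Group hits by diagonal (subject_pos - query_pos); one diagonal is one gap-free alignment."""
    groups = defaultdict(list)
    for q, s in hits:
        groups[s - q].append((q, s))
    return dict(groups)


hits = word_hits("ACGTACGTACGTA", "TTACGTACGTACGTATT")
print(hits)                    # [(0, 2), (1, 3), (2, 4)]
print(hits_by_diagonal(hits))  # {2: [(0, 2), (1, 3), (2, 4)]}
```

A longer word length yields fewer, more specific seeds and therefore fewer candidate regions to align; a shorter one is more sensitive but slower.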
All input files must be FASTA or multi-FASTA files, and all nucleotide sequences must be in uppercase characters, e.g.:
>example EST1
AAATTTCCCGCGATATATATATCG
>example EST2
AATCCTGTTATATTCCCGGTTTCG
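Sequences in lowercase or mixed case can be converted beforehand. A minimal Python sketch, assuming single-line or multi-line FASTA input; the script name uppercase_fasta.py is hypothetical. Lowercase letters are sometimes used to soft-mask repeats, and upper-casing discards that information.

```python
import sys


def uppercase_fasta(lines):
    """Yield FASTA lines with sequence lines upper-cased; header lines are left untouched."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.upper()


# Only run as a script when exactly one input file is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python uppercase_fasta.py ests.fasta > ests_upper.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(uppercase_fasta(handle))
```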