Help


BLAST Search

Overview

BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a very fast search algorithm that is used to separately search protein or DNA sequence databases. BLAST is best used for sequence similarity searching, rather than for motif searching.
A fairly complete on-line guide to BLAST searching can be found at the NCBI BLAST Help Manual.
BLAST searches offered by Wild Silkbase allow users to compare any query sequence to Antheraea assama, Samia cynthia ricini and Antheraea mylitta EST sequence datasets.

Program

Wild Silkbase offers these three BLAST programs to accommodate different types of searches:
  1. BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset.
  2. TBLASTX compares the six-frame translations of a DNA sequence to the six-frame translations of a nucleotide sequence dataset.
  3. TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands).
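The "six-frame translation" used by TBLASTX and TBLASTN can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and uses hypothetical function names; it is not the code BLAST itself runs.

```python
# Standard genetic code with codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCC").items():
        print(frame, protein)
```

Stop codons appear as "*" in the translated strings.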

Query sequence

Sequences can be submitted for a BLAST search in two ways: typed or pasted into the text box, or uploaded from a text file on your computer. All sequences must be in FASTA format, i.e., each sequence begins with the ">" character in the first position, followed by descriptive text (the "definition line"). One or more lines containing the sequence then follow. These lines may be of varying length and should contain only sequence characters that are valid to BLAST. For example:
>seqid  description
AAGGGAACAAAGCTGGAGCTCACGCGGTGGCGGCGCTCTAGAACTATGATCCCCGGGCTGCAGGAATTCG
GCACGAGGCGACAGTTCTTTTGTATGTACTCTATTGCTGATAGGAACAGGAATAATACTTACTAGTCTGA
TATTGATATGATTATTGATTTTGATTTCATCCTAATTTGGTGAAGTATGGAACCTAATCCTGTTATTATC
GCCTCAACGGCCAGACAA
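Before submitting, a query can be checked against the FASTA rules described above. The following is a minimal Python sketch; the function names are hypothetical, and the set of accepted characters (IUPAC nucleotide codes) is an assumption, since the exact alphabet the server accepts is not stated here.

```python
# IUPAC nucleotide codes; an assumption about which characters are accepted.
VALID_NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before a '>' definition line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_characters(sequence, alphabet=VALID_NUCLEOTIDES):
    """Return the sorted characters in the sequence that are outside the alphabet."""
    return sorted(set(sequence) - alphabet)


if __name__ == "__main__":
    query = ">seqid  description\nAAGGGAACAAAGCTGG\nGCACGAGGCGACAG\n"
    for definition, sequence in parse_fasta(query):
        print(definition, len(sequence), invalid_characters(sequence))
```

Sequence lines of varying length are simply concatenated, which matches the format description above.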

Databases

Wild Silkbase offers a selection of sequence databases that can be searched, depending on the user's requirements.
  • "All ESTs"
  • "Antheraea assama (All)"
  • "Antheraea assama Embryo (96 Hrs)"
  • "Antheraea assama Brain"
  • "Antheraea assama Testis"
  • "Antheraea assama Ovary"
  • "Antheraea assama Midgut"
  • "Antheraea assama Fatbody"
  • "Antheraea assama Middle Silkgland"
  • "Antheraea assama Posterior Silkgland"
  • "Antheraea assama Epidermis"
  • "Antheraea assama Compound Eye"
  • "Samia cynthia ricini (All)"
  • "Samia cynthia ricini Embryo (96 Hrs)"
  • "Samia cynthia ricini Fatbody I+II"
  • "Samia cynthia ricini Fatbody I"
  • "Samia cynthia ricini Fatbody II"
  • "Antheraea mylitta (All)"
  • "Antheraea mylitta Fatbody"

Options

  • The E-value sets the stringency of a BLAST search. A lower E-value increases the stringency (to be used if short and/or very A/T-rich sequences are submitted); a higher E-value decreases it. The default is 0.1; alignments with an E-value above the selected threshold are not displayed.
  • The number of Alignments to show determines how many alignments are displayed.
  • The number of Descriptions to show determines how many one-line descriptions are displayed.
  • The default Word Size is 11 nucleotides for DNA and 3 amino acids for Proteins.
  • The Matrix option selects the substitution matrix used to score protein alignments. A BLOSUM matrix assigns a score to each aligned pair of residues, based on the frequency with which that substitution is known to occur among consensus blocks within related proteins. BLOSUM 62, the default, is a general-purpose matrix and is among the best of the available matrices for detecting weak similarities. Other supported options are BLOSUM 45, BLOSUM 80, PAM 30, and PAM 70. Adjusting the matrix may be appropriate when searching for very distant relatives of the query.
  • Filtering is ON by default and masks low-complexity regions of the query sequence. In a protein search, low-complexity regions appear as X's in the alignment, while in a nucleotide search they appear as n's. The score and E-value of a match may be affected slightly by filtering, since it effectively shortens the query length. The DUST (nucleotide) and SEG (protein) algorithms are used.
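The effect of the E-value and Descriptions options can be sketched in Python as a simple filter over a hit list. The `Hit` record, the function name and the example hits are hypothetical; only the 0.1 default comes from the text above, and the default of 50 descriptions is an assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str   # hypothetical identifier of the matched sequence
    bit_score: float
    e_value: float


def select_hits(hits, e_value_cutoff=0.1, max_descriptions=50):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    kept = [hit for hit in hits if hit.e_value <= e_value_cutoff]
    kept.sort(key=lambda hit: hit.e_value)
    return kept[:max_descriptions]


if __name__ == "__main__":
    hits = [
        Hit("est_a", 42.1, 0.5),
        Hit("est_b", 310.0, 1e-80),
        Hit("est_c", 55.3, 0.02),
    ]
    for hit in select_hits(hits):
        print(hit.subject_id, hit.e_value)
```

Lowering `e_value_cutoff` makes the search more stringent, so fewer hits survive the filter.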

Results

BLAST search results are returned directly to the user's web browser in HTML format. The sequence IDs on the BLAST result page are further linked to information such as organism name, tissue type, sequence length, Unigene ID and sequence.
A link to a Clustal W alignment file of the sequences matched in the databases is also provided on the result page.



Search Options

Wild Silkbase provides three search options:

Keyword Search

The keyword search option allows users to search the database by keyword. The user can choose among three search fields: GO Terms, EST Clone ID and Unigene ID. The wildcard character "%" can be used to broaden the search results.
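The "%" wildcard behaves like the multi-character wildcard of the SQL LIKE operator. The Python sketch below shows that behaviour with an in-memory SQLite table. The table, column names and rows are hypothetical, and whether Wild Silkbase uses SQL LIKE internally is an assumption.

```python
import sqlite3


def keyword_search(conn, pattern):
    """Return (unigene_id, go_term) rows whose GO term matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT unigene_id, go_term FROM annotations "
        "WHERE go_term LIKE ? ORDER BY unigene_id",
        (pattern,),  # parameter binding keeps user input out of the SQL text
    )
    return rows.fetchall()


def build_demo_database():
    """Create a small in-memory table with made-up annotation rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE annotations (unigene_id TEXT, go_term TEXT)")
    conn.executemany(
        "INSERT INTO annotations VALUES (?, ?)",
        [
            ("UG0001", "protein binding"),
            ("UG0002", "ATP binding"),
            ("UG0003", "kinase activity"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_database()
    print(keyword_search(conn, "%binding"))   # terms ending in "binding"
    print(keyword_search(conn, "kinase%"))    # terms starting with "kinase"
```

Placing "%" on both sides of a keyword, as in "%bind%", matches the keyword anywhere in the term.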

Homolog Finder

The Homolog Finder lets users find homologs of a query sequence in six whole insect genomes (Aedes aegypti, Anopheles gambiae, Apis mellifera, Bombyx mori, Drosophila melanogaster and Tribolium castaneum). The result page shows the single sequence matched in the selected database.

SSR Finder

SSR Finder provides tabulated data on the SSRs (simple sequence repeats) of a selected organism and repeat type. Tandem Repeat Finder is used to extract SSRs with specific parameter settings (match = 2, mismatch = 3, indel = 5, match probability = 0.8, indel probability = 0.1, minimum score = 25, maximum period = 10).
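To illustrate what an SSR is, the Python sketch below finds perfect tandem repeats with a regular expression. This is a deliberately simpler, swapped-in technique and not the Tandem Repeat Finder algorithm: it allows no mismatches or indels and does not use the scoring parameters listed above. The function names and the `min_repeats` threshold are assumptions; only the maximum period of 10 is taken from the text.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit (e.g. 'AA')."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(sequence, max_period=10, min_repeats=4):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats."""
    sequence = sequence.upper()
    found = []
    for period in range(1, max_period + 1):
        # A unit of `period` bases followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_repeats - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if is_primitive(unit):
                repeat_count = len(match.group(0)) // period
                found.append((match.start(), unit, repeat_count))
    return sorted(found)


if __name__ == "__main__":
    for start, unit, count in find_ssrs("TTGACACACACACGG"):
        print(start, unit, count)
```

Start positions are 0-based; the `is_primitive` check stops a run such as AAAAAAAA from also being reported as a repeat of "AA".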


GO Viewer

GO Viewer allows users to view the EST unigene sequences according to Gene Ontology (GO) terms. GO terms are assigned to sequences by BLAST similarity search against the GO seqdblite database, which contains GO terms, gene products and the sequences associated with these gene products. seqdblite is the same as seqdb, except that all IEA associations have been removed. The IEA associations provide relatively little value compared to the curated associations, and they slow querying down considerably.


cSNP

cSNPs are detected with the SEAN software (D. Huntley, A. Baldo, S. Johri, and M. Sergot (February 15, 2006) SEAN: SNP prediction and display program utilizing EST sequence clusters. Bioinformatics, 22(4): 495 - 496). At present, only cSNPs of Antheraea mylitta are available; the data for Antheraea assama and Samia cynthia ricini will be added soon.


Download

The Download page is password protected to prevent unauthorised use of the data. Users are requested to email wildsilkbase@cdfd.org.in for a login ID and password. All sequence and annotation data are available for download. The sequence files are zipped FASTA files, while the annotation data files are in CSV format and can be opened directly in Excel.

Copyright © 2008 All Rights Reserved, CDFD, Hyderabad, India