SilkSatDb: The First Comprehensive Database on Insect Microsatellites

Silkworm Biology



WGS= Whole Genome Shotgun.
EST= Expressed Sequence Tag.
BAC= Bacterial Artificial Chromosome.
Total-Genomic refers to the 22.43 Mb Genomic and Zchr-BAC sequences used for analysis.
Genomic refers to 21.76 Mb WGS sequences from chromosomes other than Z.
Zchr-BAC refers to the 0.67 Mb of BAC sequences derived from the Z chromosome.
EST refers to 6.3 Mb of sequence from ~9300 non-redundant ESTs.
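Because the three sequence sets differ greatly in size, repeat content is compared per Mb of sequence scanned (the page reports about 3 kb of repeats per Mb). The Python sketch below illustrates that normalization using the set sizes listed above. The function names and the 65,280 bp repeat count in the example are hypothetical and are not SilkSatDb data.

```python
# Sequence sets as described on this page, sizes in megabases (Mb).
DATASETS_MB = {
    "Genomic": 21.76,   # WGS sequences from chromosomes other than Z
    "Zchr-BAC": 0.67,   # Z chromosome derived BAC sequences
    "EST": 6.3,         # ~9300 non-redundant ESTs
}

# Total-Genomic is the combination of the Genomic and Zchr-BAC sets.
TOTAL_GENOMIC_MB = DATASETS_MB["Genomic"] + DATASETS_MB["Zchr-BAC"]


def kb_per_mb(repeat_bp: int, dataset: str) -> float:
    """Repeat density: kilobases of microsatellite sequence per megabase scanned."""
    return (repeat_bp / 1_000) / DATASETS_MB[dataset]


def percent_of_sequence(repeat_bp: int, dataset: str) -> float:
    """Share of the scanned sequence occupied by repeats, as a percentage."""
    return 100.0 * repeat_bp / (DATASETS_MB[dataset] * 1_000_000)


if __name__ == "__main__":
    print(f"Total-Genomic: {TOTAL_GENOMIC_MB:.2f} Mb")  # Total-Genomic: 22.43 Mb
    # Hypothetical repeat count chosen to give 3 kb/Mb over the Genomic set.
    hypothetical_bp = 65_280
    print(f"{kb_per_mb(hypothetical_bp, 'Genomic'):.2f} kb/Mb")        # 3.00 kb/Mb
    print(f"{percent_of_sequence(hypothetical_bp, 'Genomic'):.3f} %")  # 0.300 %
```

A density of 3 kb per Mb is the same as 0.3% of the sequence, which is why the per-Mb figure and the 0.31% genome fraction in the findings agree.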

  1. Microsatellites are widely distributed in the B. mori genome, with about 3 kb of repeats per Mb in both Genomic and EST sequences.

  2. Microsatellite repeats make up 0.31% of the silkworm genome: mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats represent 0.110, 0.116, 0.053, 0.018, 0.006 and 0.003% of the genome, respectively.

  3. Among trinucleotide repeats, TAA was the most abundant motif in the genomic sequences, accounting for almost 50% of all trinucleotide repeats, followed by GTA and TGA. Apart from these three, all remaining trinucleotide repeat types were over-represented in ESTs.

  4. Of the 530 Mb silkworm genome, an estimated 1.63 Mb consists of microsatellite repeats, equivalent to 0.31% of the genome.

  5. The total numbers of mono-, di-, tri-, tetra-, penta- and hexanucleotide repeat units in the genome are 5.9 million (0.59 Mb), 4.4 million (0.62 Mb), 1.5 million (0.28 Mb), 0.02 million (0.10 Mb), 0.006 million (0.03 Mb) and 0.002 million (0.02 Mb), respectively.

  6. Among trinucleotide repeat tracts, TAA repeats were significantly over-represented in Zchr-BAC compared to Genomic and EST sequences. GCA, CGA and CGG repeats were significantly more frequent than in Genomic sequences, whereas GAA and GGA repeats were completely absent. Tetra-, penta- and hexanucleotide repeat motifs were very scarce in the Zchr-BACs.

  7. A/T stretches are far more common than C/G stretches. A/T tracts longer than 20 repeat units are more common in Genomic and EST sequences.

  8. Among dinucleotides, CG repeats are the least abundant. In ESTs, CA and GA repeats are as abundant as TA repeats, and CG repeats are relatively more frequent than in Genomic and Zchr-BAC sequences.

  9. Trinucleotide repeats show a drastic reduction in number as repeat length increases.

  10. Tetra-, penta- and hexanucleotide repeats comprise a large number of repeat types, so we classified them by AT percentage, from 0 to 100%. For example, ATTT is a 100% AT-rich tetranucleotide repeat, CTTT is 75% AT-rich and ATCG is 50% AT-rich. Tetranucleotide repeats were most frequent in the 75% AT-rich category (a single C/G), followed by the 100% AT-rich category (no C/G). The 80% (single C/G) and 100% AT-rich (no C/G) repeat types together make up more than 60% of all pentanucleotide repeats, and the 83.3% (single C/G) and 100% AT-rich (no C/G) repeat types make up more than 50% of all hexanucleotide repeats.

  11. We observed a general trend in which repeat number decreased as repeat length increased. This was most pronounced for trinucleotide repeats, owing to their abundance in both the genome and ESTs.

  12. All repeat types have a large number of tracts shorter than 15 bp; tracts longer than 15 bp are far fewer.

  13. The average length of mononucleotide repeats was greater than that of the other repeat classes.
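The findings above rest on two steps: locating perfect tandem repeats of 1 to 6 bp motifs, and (for tetra- to hexanucleotides) grouping motifs by AT percentage. The Python sketch below shows one simple way to do both. It is not SilkSatDb's actual search procedure: the minimum unit counts in `MIN_UNITS`, the left-to-right "shortest motif wins" scan and all function names are assumptions made for illustration.

```python
from collections import Counter

# ASSUMPTION: minimum repeat units needed to call a tract, per motif length.
# The page does not state SilkSatDb's thresholds; these are illustrative only.
MIN_UNITS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if motif is not a repeat of a shorter unit ('ATAT' is not primitive)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_microsatellites(seq: str, min_units=MIN_UNITS):
    """Return (start, motif, units) for perfect tandem repeats of 1-6 bp motifs.

    Scans left to right; at each position the shortest motif reaching its
    minimum unit count is reported, and scanning resumes after that tract.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            # Skip truncated motifs, non-ACGT bases (e.g. N) and non-primitive units.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            units = 1
            while seq[i + units * k:i + (units + 1) * k] == motif:
                units += 1
            if units >= min_units[k]:
                hit = (i, motif, units)
                break
        if hit:
            hits.append(hit)
            i += len(hit[1]) * hit[2]  # jump past the whole tract
        else:
            i += 1
    return hits


def at_percent(motif: str) -> float:
    """AT content of a motif: ATTT -> 100, CTTT -> 75, ATCG -> 50."""
    return 100.0 * sum(base in "AT" for base in motif) / len(motif)


def summarize(hits):
    """Repeat bp per motif length, plus tetra- to hexanucleotide counts by AT%."""
    bp_by_length = Counter()
    at_classes = Counter()
    for _, motif, units in hits:
        k = len(motif)
        bp_by_length[k] += k * units
        if k >= 4:
            at_classes[(k, round(at_percent(motif), 1))] += 1
    return bp_by_length, at_classes


if __name__ == "__main__":
    # Hypothetical toy sequence, not SilkSatDb data.
    toy = "GC" + "A" * 12 + "GC" + "AT" * 7 + "GG" + "CTTT" * 4 + "GC"
    hits = find_microsatellites(toy)
    print(hits)  # [(2, 'A', 12), (16, 'AT', 7), (32, 'CTTT', 4)]
    print(summarize(hits))
```

Rejecting non-primitive motifs keeps an (AT)n tract from being reported a second time as (ATAT)n, and dividing the per-class bp totals by the Mb scanned gives the density figures used in the findings.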


Flanking sequences

To examine the base composition around microsatellite repeats, we calculated the GC percentage of the 100 bases upstream and the 100 bases downstream flanking each class of repeat motif.
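The flank calculation can be sketched in Python as follows. The page states only that 100 bases on each side were used; the function names, the half-open (start, end) coordinates, ignoring ambiguous bases such as N, and truncating flanks at sequence ends are assumptions.

```python
def gc_percent(seq: str) -> float:
    """GC percentage over A/C/G/T bases only; ambiguous bases such as N are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")  # no informative bases in this window
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def flank_gc(seq: str, start: int, end: int, width: int = 100):
    """GC% of up to `width` bases upstream and downstream of the repeat seq[start:end].

    ASSUMPTION: flanks are simply truncated when the repeat lies near a sequence end.
    """
    upstream = seq[max(0, start - width):start]
    downstream = seq[end:end + width]
    return gc_percent(upstream), gc_percent(downstream)


if __name__ == "__main__":
    # Hypothetical sequence: G-only upstream flank, (AT)5 repeat, T-only downstream flank.
    seq = "G" * 100 + "AT" * 5 + "T" * 100
    print(flank_gc(seq, 100, 110))  # (100.0, 0.0)
```

Comparing the two returned values for each repeat, then averaging by repeat class, gives the upstream versus downstream comparison described in the first finding below.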

  1. No significant difference in AT/GC composition was observed between upstream and downstream flanking sequences, except for 100% AT-rich pentanucleotide repeats, where the downstream flanking sequences showed 5% lower GC content than the upstream sequences in both Genomic and EST sequences.

  2. We observed a positive correlation between the GC content of the repeat and that of its flanking sequences in all repeat types: as the GC content of the repeat increased, the GC content of the flanking sequences also increased, from a minimum of 30% to a maximum of 60%.
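The relationship between repeat GC content and flank GC content can be checked by averaging flank GC within each repeat GC class and computing a correlation coefficient. The page does not say which statistic was used; Pearson's r is an assumption here, and the function names and the (repeat GC%, flank GC%) pairs in the example are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def mean_flank_gc_by_repeat_gc(records):
    """Average flank GC% per repeat GC% class; records are (repeat_gc, flank_gc) pairs."""
    groups = defaultdict(list)
    for repeat_gc, flank_value in records:
        groups[repeat_gc].append(flank_value)
    return {gc: sum(vals) / len(vals) for gc, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical (repeat GC%, flank GC%) pairs, not SilkSatDb data.
    records = [(0, 30.0), (0, 34.0), (25, 41.0), (50, 52.0), (50, 56.0)]
    print(mean_flank_gc_by_repeat_gc(records))  # {0: 32.0, 25: 41.0, 50: 54.0}
    xs, ys = zip(*records)
    print(round(pearson_r(xs, ys), 3))
```

A rising sequence of class means together with a positive r is the pattern the finding describes; a formal significance test would be needed to go further.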

Copyright © 2004 All Rights Reserved, CDFD, Hyderabad, India