WGS = Whole Genome Shotgun.
EST = Expressed Sequence Tag.
BAC = Bacterial Artificial Chromosome.
Total-Genomic refers to the 22.43 Mb Genomic and Zchr-BAC sequences used for analysis.
Genomic refers to 21.76 Mb WGS sequences from chromosomes other than Z.
Zchr-BAC refers to 0.67 Mb of Z chromosome-derived BAC sequences.
EST refers to 6.3 Mb of sequence from ~9300 non-redundant ESTs.
widely distributed in the B. mori genome, with about 3 kb of repeats
per Mb of Genomic and EST sequences.
Of the 0.31% of the silkworm genome made up of microsatellite repeats, mono-,
di-, tri-, tetra-, penta- and hexanucleotide repeats represent 0.110, 0.116,
0.053, 0.018, 0.006 and 0.003% of the genome, respectively.
Among trinucleotides, TAA repeats were the most abundant in the genomic sequences, comprising almost 50%
of trinucleotide repeats, followed by GTA and TGA. Except for these three trinucleotide repeat types, all the remaining repeats were overrepresented in ESTs.
Of the 530 Mb silkworm genome,
microsatellite repeats account for 1.63 Mb, equivalent to 0.31% of the genome.
The total numbers
of mono-, di-, tri-, tetra-, penta- and hexanucleotide repeat units in
the genome are 5.9 (0.59 Mb), 4.4 (0.62 Mb), 1.5 (0.28 Mb), 0.02 (0.10
Mb), 0.006 (0.03 Mb) and 0.002 (0.02 Mb) million, respectively.
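The percentages quoted above follow from the sizes in Mb. As a rough check, the short Python sketch below converts the per-class sizes reported in the text into a share of the 530 Mb genome. The variable and function names are illustrative only and are not part of the original analysis.

```python
# Sizes (in Mb) are taken from the text; all names here are illustrative.
GENOME_SIZE_MB = 530.0
CLASS_SIZE_MB = {
    "mono": 0.59,
    "di": 0.62,
    "tri": 0.28,
    "tetra": 0.10,
    "penta": 0.03,
    "hexa": 0.02,
}


def percent_of_genome(size_mb, genome_mb=GENOME_SIZE_MB):
    """Return size_mb expressed as a percentage of the genome size."""
    return 100.0 * size_mb / genome_mb


# Share of the genome occupied by each repeat class, and by all classes together.
per_class_pct = {name: percent_of_genome(mb) for name, mb in CLASS_SIZE_MB.items()}
total_pct = percent_of_genome(sum(CLASS_SIZE_MB.values()))

for name, pct in per_class_pct.items():
    print(f"{name:>5}: {pct:.3f}% of the genome")
print(f"total: {total_pct:.2f}% of the genome")
```

The rounded per-class sizes sum to about 1.64 Mb, which agrees with the reported 1.63 Mb total to within rounding of the inputs.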
TAA repeats were significantly overrepresented in Zchr-BAC
compared to Genomic and EST sequences. GCA, CGA and CGG were significantly
more frequent than in Genomic sequences, whereas GAA and GGA were completely
absent. Tetra-, penta- and hexanucleotide repeat motifs were very scarce in the Zchr-BACs.
A/T stretches are far more common than C/G stretches.
A/T tracts of more than 20 repeat units are more common in Genomic and EST sequences.
CG repeats are the least abundant. Among dinucleotides in ESTs, CA and GA are as abundant
as TA repeats, and CG repeats are relatively more frequent compared to Genomic and Zchr-BAC
repeats show a drastic reduction in number when the length of repeats
Tetra-, penta- and hexanucleotide repeats comprise
a large number of repeat types, so we classified them by AT percentage,
ranging from 0 to 100. For example, ATTT is a 100% AT rich repeat,
CTTT is a 75% AT rich repeat and ATCG is a 50% AT rich repeat.
repeats were observed under 75% AT rich
category followed by 100% AT (no C/G)
and 100% AT rich
repeat types constitute more than 60% of the total number of pentanucleotide
repeats. The 83.3% (single C/G)
and 100% AT rich
repeat types made up more than 50% of the total number of hexanucleotide repeats.
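The AT-percentage classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `at_percent` is hypothetical, and the penta- and hexanucleotide motifs in the usage loop are made-up examples of the "single C/G" case.

```python
def at_percent(motif):
    """Return the percentage of A/T bases in a repeat motif."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    at_count = sum(base in "AT" for base in motif)
    return 100.0 * at_count / len(motif)


# ATTT, CTTT and ATCG are the examples from the text;
# AATTC and AAATTC are hypothetical motifs with a single C/G.
for motif in ("ATTT", "CTTT", "ATCG", "AATTC", "AAATTC"):
    print(motif, round(at_percent(motif), 1))
```

A single C/G in a hexanucleotide motif gives 5/6, or 83.3% AT, which is where that category label comes from.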
a common phenomenon where the repeat length
and repeat number were inversely proportional. This was more pronounced in trinucleotide repeats, especially because of their abundance in both the genome and ESTs.
All repeat types
have a large number of repeats shorter than 15 bp. Repeat tracts longer than 15 bp
are very few in number.
The average length of mononucleotide repeats was greater than that of the other repeat classes.
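To make the length comparison concrete, the sketch below finds perfect tandem repeats of a given unit length with a regular expression and tallies them on either side of a 15 bp cutoff. It is an assumption-laden illustration rather than the search tool used in the study: the function names, the default minimum of three repeat units, the toy sequence and the choice to count tracts of exactly 15 bp as "long" are all hypothetical.

```python
import re
from collections import Counter


def find_repeats(seq, unit_len, min_units=3):
    """Return (start, motif, n_units) for perfect tandem repeats of one unit length.

    min_units must be at least 1; the default of 3 is an arbitrary assumption.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip motifs that are a shorter unit repeated (e.g. "AAA" is mono-, not tri-).
        is_compound = any(
            unit_len % d == 0 and motif == motif[:d] * (unit_len // d)
            for d in range(1, unit_len)
        )
        if not is_compound:
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


def tally_by_length(hits, cutoff_bp=15):
    """Count repeat tracts shorter than cutoff_bp versus at or above it."""
    counts = Counter()
    for _start, motif, n_units in hits:
        tract_bp = len(motif) * n_units
        counts["short" if tract_bp < cutoff_bp else "long"] += 1
    return counts


# Toy sequence for illustration only; it is not taken from the silkworm data.
seq = "GGCTAATAATAATAACCGGACACACAGT"
hits = find_repeats(seq, 3) + find_repeats(seq, 2)
print(hits)
print(tally_by_length(hits))
```

Running the search once per unit length keeps the pattern simple; a production tool would also need to handle imperfect repeats and overlapping motifs, which this sketch ignores.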
To test the base composition
surrounding the microsatellite repeats, we calculated the GC percentage of
the 100 bases upstream and 100 bases downstream
flanking different classes of repeat motifs.
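The flanking-sequence calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, repeat coordinates are taken to be 0-based and half-open (`seq[start:end]`), and flanks that run off the end of the sequence are simply truncated, which the text does not specify.

```python
def gc_percent(seq):
    """Return the GC percentage of a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def flanking_gc(seq, start, end, flank=100):
    """Return (upstream GC%, downstream GC%) around the repeat at seq[start:end].

    Up to `flank` bases are taken on each side; shorter flanks near the
    sequence ends are used as-is (an assumption, not stated in the text).
    """
    upstream = seq[max(0, start - flank):start]
    downstream = seq[end:end + flank]
    return gc_percent(upstream), gc_percent(downstream)


# Toy example with 4-base flanks around an (AT)4 repeat at positions 4..12.
print(flanking_gc("GCGCATATATATAATT", 4, 12, flank=4))
```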
in AT/GC composition between upstream and downstream flanking sequences
was not observed except in the case of 100% AT rich pentanucleotide repeats, where
the downstream flanking sequence showed 5% lower GC content than the upstream
sequence in both Genomic and EST sequences.
a positive correlation between the GC content of the repeat and the GC content of the flanking sequences in all
repeat types. As the GC content of the repeat
increased, the GC content of the flanking sequence also increased,
from a minimum of 30% to a maximum of 60%.
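One common way to quantify such a relationship is the Pearson correlation coefficient. The text does not say which statistic was used, so the sketch below is only an assumption about how the comparison could be made; the function name and the input values are placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only (not measurements from the paper):
# GC% of the repeat motif versus GC% of its flanking sequence.
repeat_gc = [0.0, 25.0, 50.0, 75.0, 100.0]
flank_gc = [10.0, 20.0, 30.0, 40.0, 50.0]
print(pearson_r(repeat_gc, flank_gc))
```

A value close to +1 would indicate that flanking GC content rises with repeat GC content, as described above.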