HANSA
Lab of Computational Biology
Centre for DNA Fingerprinting & Diagnostics

The working of the tool can be described as a two-step process:
(a) Extraction of position-specific, structure-based and amino-acid features from the query protein (input processing), and
(b) Prediction of deleterious mutations: Disease or Neutral.

Input processing: The input to the tool is the amino acid sequence of the query protein onto which the mutations to be predicted are mapped. From the amino acid sequence, 10 nsSNP neutral-disease discriminatory features (nsSNPND) are calculated. They are:



Position-specific features

Position-specific probability score of wild-type amino acid residues (pab_WT) 

Position-specific probability score of mutant-type amino acid residues (pab_MT)

Difference of position-specific probability scores of wild-type and mutant-type amino acid residues (pab_WT - pab_MT)

Gribskov’s score of wild-type amino acid residues (Gab_WT)

Gribskov’s score of mutant-type amino acid residues (Gab_MT)

Difference of Gribskov’s scores of wild-type and mutant-type amino acid residues (Gab_WT - Gab_MT)

Sequence-based structure features

Solvent accessibility status of amino acid residues: buried (1) or exposed (0) $

Secondary structure prediction status: alpha-helix (1), extended strand (2) or rest (0) #

Amino acid residue-based features

Difference between wild-type and mutant-type residues in the free energy of transfer from the inside to the surface of the protein @

BLOSUM62 substitution score for the wild-type to mutant-type amino acid substitution


$ Solvent accessibility calculated with ACCpro 4.0 (Cheng et al., 2005)
# Secondary structure predicted with SSpro v4.5 (Cheng et al., 2005)
@ Free energy of transfer from the inside to the outside of a globular protein (Janin, 1979)
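The ten features listed above can be pictured as one numeric vector per mutation. The following is a minimal sketch, not the tool's actual code: the function name `build_feature_vector`, its parameter names, the use of "H"/"E" letters for helix and strand, and the wild-type minus mutant-type direction of the free energy difference are all assumptions made for illustration. The scores themselves (position-specific probabilities, Gribskov's scores, transfer free energies, BLOSUM62 score) are taken as inputs rather than computed here.

```python
from typing import List

# Assumed encoding of predicted secondary structure letters:
# alpha-helix -> 1, extended strand -> 2, anything else -> 0.
SS_CODES = {"H": 1, "E": 2}


def build_feature_vector(
    pab_wt: float,        # position-specific probability score, wild type
    pab_mt: float,        # position-specific probability score, mutant type
    gab_wt: float,        # Gribskov's score, wild type
    gab_mt: float,        # Gribskov's score, mutant type
    buried: bool,         # solvent accessibility: True = buried, False = exposed
    ss_state: str,        # predicted secondary structure letter (assumed "H"/"E"/other)
    dg_wt: float,         # transfer free energy of the wild-type residue
    dg_mt: float,         # transfer free energy of the mutant-type residue
    blosum62_score: int,  # BLOSUM62 score for the wild-type -> mutant-type pair
) -> List[float]:
    """Assemble the 10 features in the order they are listed in the text."""
    return [
        pab_wt,
        pab_mt,
        pab_wt - pab_mt,
        gab_wt,
        gab_mt,
        gab_wt - gab_mt,
        1.0 if buried else 0.0,
        float(SS_CODES.get(ss_state, 0)),
        dg_wt - dg_mt,  # direction (WT minus MT) is an assumption
        float(blosum62_score),
    ]


# Illustrative call with made-up input values (not real scores).
example = build_feature_vector(0.5, 0.25, 3.0, 1.0, True, "E", 0.5, 0.25, -2)
```

The input values in the example call are invented only to show the shape of the vector; real values would come from the profile, ACCpro, SSpro, the Janin scale and the BLOSUM62 matrix.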

Prediction of deleterious mutations:
Each mutation mapped onto the query protein is classified as “Disease” or “Neutral” by our SVM-based method. The Support Vector Machine (SVM) is a supervised machine-learning method first developed by Vapnik (1995). All SVM computations are carried out using LIBSVM (Chang and Lin, 2001) with the RBF kernel; the values of the cost parameter C and the kernel parameter g were optimized by us. Before the actual predictions are carried out, the SVM is trained on the HumVar dataset, with 5-fold cross-validation used during training.
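To make the two ingredients named above concrete, here is a minimal pure-Python sketch of the RBF kernel, K(x, y) = exp(-g * ||x - y||^2), and of a 5-fold split of sample indices. It does not call LIBSVM and is not the tool's training code; the function names `rbf_kernel` and `k_fold_indices` are hypothetical, and the interleaved fold assignment is an assumption chosen for simplicity (a real run would normally shuffle the data first).

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], y: Sequence[float], g: float) -> float:
    """RBF kernel value exp(-g * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k folds; return (train, test) index lists per fold.

    Fold i holds indices i, i+k, i+2k, ... (interleaved, no shuffling).
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits


# Each sample is used for testing exactly once across the 5 folds.
splits = k_fold_indices(10, k=5)
```

In a parameter search, each candidate pair (C, g) would be scored by training on the `train` indices and evaluating on the `test` indices of every fold, and the pair with the best average score would be kept.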

References for HANSA:
Vishal Acharya and H.A. Nagarajaram (2012) Human Mutation 33: 332-337
Vishal Acharya and H.A. Nagarajaram (2013) Human Mutation 34: 407

Copyright © 2011 CDFD, All Rights Reserved.
Laboratory of Computational Biology
Centre for DNA Fingerprinting and Diagnostics,
Nampally, Hyderabad, INDIA.