Biochemical and Biophysical Research Communications 391 (2010) 1670–1674
Protein location prediction using atomic composition and global features
of the amino acid sequence
Betsy Sheena Cherian *, Achuthsankar S. Nair
Centre for Bioinformatics, University of Kerala, Kariyavattom Campus, Thiruvananthapuram, Kerala, India
Article info

Article history:
Received 14 December 2009
Available online 28 December 2009

Keywords:
Subcellular localization
Amino acid composition
Atomic composition
Physiochemical properties
Sequence similarity

Abstract

The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and the prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100, 82.47 and 88.81 for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
© 2009 Elsevier Inc. All rights reserved.
* Corresponding author.
E-mail address: betsy.skb@gmail[email protected] (B.S. Cherian).
0006-291X/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.bbrc.2009.12.118

Introduction

The cell is the basic unit of life and proteins are the workhorses in the cell. For a protein to perform its function, it should be located in its targeted cellular location. Information about a protein's location in the cell gives insight into the function of the protein and is useful in screening candidates for drug discovery and vaccine design, annotating gene products and selecting relevant proteins for further studies. Computational subcellular localization prediction methods deal with predicting the location of the protein from its amino acid sequence. The success of computational subcellular localization prediction relies on two important components. The first is the extraction of biological features which are relevant in the subcellular localization and the second is the computational technique employed for making the prediction [1]. The biological features used for prediction include detection of protein sorting signals, amino acid composition, physiochemical properties, and homology search [2–8].

Most of the proteins, which are synthesized in the ribosomes, are translocated to their destination by an inherent signal, known as the protein sorting signal, usually present in the N-terminal of the amino acid sequence. Sorting signals of proteins for various locations like mitochondria, chloroplast, nucleus, peroxisome etc. have been identified [9–22]. Many computational subcellular prediction methods use the presence of sorting signals for prediction [2,6,23–25]. The amino acid composition for localization prediction deals with the entire protein sequence and has its own advantages. Pseudo-amino acid composition proposed by Chou [26–31] incorporated parameters that reflect the sequence order effect with the amino acid composition [32–36]. Several new methods consider the amino acid sequence as three parts, N-terminal, middle region and C-terminal, to enhance the extraction of various biological features [1,33]. The physiochemical properties of the amino acids are of great relevance in subcellular localization prediction [1,4,37,38]. The most widely considered physiochemical parameters are hydrophobicity, accessibility, flexibility, distribution ratio etc. Sequence similarity is another potential biological feature for inferring subcellular location information. The Needleman–Wunsch and Smith–Waterman algorithms for sequence alignment have been used for subcellular localization prediction [39,40].

In this work, priority is given to the global features of the amino acid sequence rather than features of a single part like the N-terminal. The biological features for prediction include atomic composition of the full sequence, multiple physiochemical properties for the full sequence, amino acid composition of the full sequence, 3 part amino acid composition of the full sequence, and sequence similarity for the entire sequence.

Materials and methods

For this study, we used the dataset compiled by Chou [8,27,41]. The training set contains amino acid sequences of 145 chloroplast proteins, 571 cytoplasmic proteins, 34 cytoskeleton proteins, 49 endoplasmic reticulum proteins, 224 extracellular proteins, 25 Golgi apparatus proteins, 37 lysosome proteins, 84 mitochondria proteins, 272 nucleus proteins, 27 peroxisome proteins, 699 plasma membrane proteins, and 24 vacuole proteins, summing up to 2191 protein sequences altogether. The independent dataset has a total of 2494 proteins with 112 chloroplast proteins, 761 cytoplasmic proteins, 19 cytoskeleton proteins, 106 endoplasmic reticulum proteins, 95 extracellular proteins, 4 Golgi apparatus proteins, 31 lysosome proteins, 163 mitochondria proteins, 418 nucleus proteins, 23 peroxisome proteins, and 762 plasma membrane proteins. None of the proteins in the independent dataset occurs in the training dataset.

The Support Vector Machine (SVM) was proposed by Vapnik [42] as a very effective method for general purpose supervised pattern recognition. Many of the subcellular localization prediction tools make use of SVMs [1,3–5,8,43]. SVMs perform well in practical applications and are well founded theoretically. SVMs are popular because of their high performance, adaptability and their ability to deal with data in a high dimensional feature space. The SVM can classify nonlinear data using a kernel transformation. The data is translated into a high dimensional feature space, and then the optimal separating hyperplane is determined. Since this work deals with proteins of 11 locations, this is a multi-class problem. There are two approaches for SVMs to handle a multi-class problem, "one-against-one" and "one-against-all". We employed the "one-against-all" approach for making predictions. We used LIBSVM 2.9 [44] for making the SVM modules. The Radial Basis Function (RBF) kernel is used for all modules. The kernel parameter $\gamma$ and the regularization parameter $C$ were optimized with the training set.

The input features used for prediction include atomic composition, multiple physiochemical properties, amino acid composition, 3 part amino acid composition and sequence similarity.

Atomic composition is the number of constituent atoms in an amino acid sequence. As the side chain atoms of the amino acids decide the property of the amino acid, and amino acid composition itself is a powerful parameter for localization prediction, we hypothesize that the atomic composition will serve as a feature for the localization prediction. Amino acids are made up of carbon, hydrogen, nitrogen, oxygen and sulphur atoms. Atomic composition gives the total number of each type of atom in an amino acid sequence. To reveal the distribution of the atomic composition in greater detail, we logically divided each of the N-terminal, middle region and C-terminal into 3 subregions. Thus the SVM module for atomic composition has a feature vector of size 45 for each protein (9 subregions, 5 atom types each). The length of each subregion is equal and is calculated based on the length of the amino acid sequence.

Let $P = x_1, x_2, x_3, x_4, \ldots, x_N$ be a protein sequence, where $x_i \in A$, $i = 1, 2, 3, \ldots, N$, and $A = \{a_1, a_2, a_3, \ldots, a_{20}\}$ is the set of 20 amino acids. Let $L$ be the length of $P$. Then each segment is of length $L/9$. Let $T = \{C, H, N, O, S\}$ be the set of atoms in the amino acids. The atomic composition of sequence segment $P_i$ is calculated as $ATC(P_i) = \{t_1, t_2, t_3, t_4, t_5\}$, where each $t$ is the count of one type of atom of $T$ in the sequence segment $P_i$. The kernel parameters used are $\gamma = 1$, $C = 2$.

In the physiochemical SVM module, multiple physiochemical parameters from the AAIndex database [45,46] are used for feature extraction. This is based on the observation that it is not a single physiochemical property, but a group of them, that describes the localization address signal. Each amino acid in the sequence is replaced with the corresponding physiochemical value to get a global representation of the physiochemical values. A full list of these physiochemical parameters is given in Supplementary data. This SVM module has a feature vector of length 96 as we consider 96 physiochemical parameters. Nearly half of these parameters are secondary structure related, like the average relative probability of beta-sheet and the normalized frequency of alpha-helix. Other widely considered parameters like hydrophobicity, charge etc. are also included in the list. Let $P = x_1, x_2, x_3, x_4, \ldots, x_N$ be a protein sequence, where $x_i \in A$, $i = 1, 2, 3, \ldots, N$, $N$ is the length of the protein sequence and $A = \{a_1, a_2, a_3, \ldots, a_{20}\}$ is the set of 20 amino acids. Let $H$ be the amino acid index of a physiochemical parameter $C$, $H = \{h_1, h_2, \ldots, h_{20}\}$, where $h_j$ is the amino acid index value of the amino acid $a_j$.

$$C_i = \log\left(\sum_{i=1}^{N} h_i\right) \qquad (1)$$

where $C_i \in \{c_1, c_2, c_3, \ldots, c_{96}\}$ and the sum runs over the $N$ residues of the sequence, each replaced by its index value. The kernel parameters used were $\gamma = 2$, $C = 64$.

The fraction of each amino acid in the protein is used for prediction in the amino acid composition SVM module. Let $P = x_1, x_2, x_3, x_4, \ldots, x_N$ be a protein sequence, where $x_i \in A$, $i = 1, 2, 3, \ldots, N$, and $A = \{a_1, a_2, a_3, \ldots, a_{20}\}$ is the set of 20 amino acids. We calculated the amino acid composition $AAC$ as the fraction of each amino acid $a_i$ in the sequence $P$:

$$AAC(a_i) = \frac{\text{total number of amino acid } a_i}{N} \qquad (2)$$

where $N$ is the total number of amino acids in the sequence. The feature vector for this SVM module had a length of 20 for each protein and the kernel parameters used were $\gamma = 4$, $C = 512$.

The three part amino acid composition SVM module considers the sequence as three parts, N-terminal, middle part, and C-terminal. This brings out the compositional difference in each part of the sequence and reveals the distribution of the amino acid composition in greater detail. The protein sequence $P$ is divided into three segments $P_N$, $P_M$, $P_C$, where $P_N$ is the N-terminal segment, $P_M$ is the middle segment and $P_C$ is the C-terminal segment. The length of each segment was equal and was calculated based on the length of the amino acid sequence. Let $L$ be the length of $P$. Then each segment is of length $L/3$. The amino acid composition of each segment was calculated separately, as the fraction of each amino acid in the segment as in Eq. (2). The input feature vector for this SVM module had 60 elements for each protein (3 segments, 20 amino acids each) and the kernel parameters were $\gamma = 4$, $C = 512$.

In the sequence similarity module, the whole query sequence was aligned against the sequences in the training set using the Smith–Waterman algorithm [47] to make the prediction. The Smith–Waterman algorithm is a dynamic programming method to find the optimal alignment between the sequences. This algorithm finds the local alignment between the protein sequences, bringing out common patterns and domains within the sequences. Since the address signals for each location share common characteristics, this algorithm is competent to detect the signals present within. The scoring matrix BLOSUM50 was used for alignment. The location of the sequence with the highest similarity to the query protein was predicted as the location of the query protein.

Four SVM modules were designed for atomic composition, multiple physiochemical properties, amino acid composition and three part amino acid composition. These SVM modules were named ATC-SVM, Phys-SVM, AAC-SVM and 3-AAC-SVM, respectively. A voting system has been employed to make the final prediction from these individual SVM modules. If a conflict occurs, for instance, the three modules predict different locations, the sequence similarity module is used for making the prediction.
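The composition features and the voting step described in the methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the atom counts are those of the free amino acids (the paper does not say whether free amino acids or peptide residues were counted), segment boundaries are placed by integer division (the paper only says the segments are of equal length), and the vote is an unweighted plurality that falls back to the alignment module on a tie (the paper's voting weights are not given).

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (C, H, N, O, S) atom counts of the free amino acids (assumption: the paper
# does not state whether free amino acids or peptide residues were counted).
ATOMS = {
    "A": (3, 7, 1, 2, 0),  "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),  "C": (3, 7, 1, 2, 1),  "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0),  "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),  "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}


def split_equal(seq, parts):
    """Split seq into `parts` contiguous segments of near-equal length."""
    n = len(seq)
    return [seq[i * n // parts:(i + 1) * n // parts] for i in range(parts)]


def aac(seq):
    """Eq. (2): fraction of each of the 20 amino acids in seq (20 values)."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def three_part_aac(seq):
    """AAC of the N-terminal, middle and C-terminal thirds (60 values)."""
    return [value for part in split_equal(seq, 3) for value in aac(part)]


def atomic_composition(seq):
    """C, H, N, O, S counts in each of 9 subregions (45 values).

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    features = []
    for part in split_equal(seq, 9):
        totals = [0, 0, 0, 0, 0]
        for residue in part:
            for k, count in enumerate(ATOMS[residue]):
                totals[k] += count
        features.extend(totals)
    return features


def vote(svm_predictions, alignment_prediction):
    """Plurality vote over the SVM modules; on a tie (or no votes) use the
    sequence alignment prediction. Unweighted voting is an assumption."""
    ranked = Counter(svm_predictions).most_common()
    if not ranked:
        return alignment_prediction
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return alignment_prediction
```

For a query sequence, `atomic_composition`, `aac` and `three_part_aac` would produce the 45-, 20- and 60-element vectors fed to ATC-SVM, AAC-SVM and 3-AAC-SVM, and `vote` would combine the module outputs with the alignment result.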
The layout of the system is depicted in Fig. 1.

Fig. 1. Schematic diagram of the subcellular prediction system. The query sequence is passed to ATC-SVM, Phys-SVM, AAC-SVM, 3-AAC-SVM and sequence alignment; their outputs go to the voting system, which gives the prediction.

The self-consistency test, jackknife test and independent data test are performed to evaluate the system. The self-consistency test measures the self-consistency of the developed method. The same dataset from which the rules of classification are derived is used for making predictions. This gives high accuracy, because the same dataset is used for training and testing. If the self-consistency of a method is poor, it is not a good classification method. In the jackknife test, each protein in the training set is singled out in turn and predicted using the rules derived from the rest of the training set. The jackknife test is considered more objective and rigorous than other tests. In the independent test, the training dataset is used for training the SVM to derive the support vectors and the testing dataset is used for measuring the performance. The prediction accuracy for each protein subcellular location is calculated as

$$\mathrm{Accuracy}(L) = (C_L / T_L) \times 100 \qquad (3)$$

where $C_L$ is the number of true predictions and $T_L$ is the total number of proteins for location $L$. The total prediction accuracy is calculated as

$$\mathrm{Accuracy}(Sys) = \frac{1}{N} \times (\text{total correct prediction}) \qquad (4)$$

where $N$ is the total number of proteins.

Results and discussion

Global features perform better than N-terminal features alone

We tested each module with both N-terminal sequence information and full sequence information. Our results showed that the full length sequence information performs better than N-terminal sequence information. The result of the 96 physiochemical values applied to both the N-terminal and the full length sequence is listed in Table 1. The result of sequence similarity applied to both the N-terminal and the entire sequence is listed in Table 2. These results show that using the full length sequence gives better overall accuracy than using the N-terminal alone. This may be because the address signals are dispersed within the entire sequence rather than confined to the N-terminal.

Table 1
Comparison of prediction accuracies of the physiochemical module applied both at the N-terminal and to the entire sequence.

Location           N-terminal accuracy (%)   Full sequence accuracy (%)
Chloroplast        55.36                     73.21
Cytoplasm          83.05                     90.67
Cytoskeleton       52.63                     94.74
ER                 56.60                     66.98
Extracellular      69.47                     77.89
Golgi apparatus    50.00                     25.00
Lysosome           32.26                     64.52
Mitochondria       41.10                     30.67
Nucleus            70.81                     79.43
Peroxisome         30.43                     30.43
Plasma membrane    86.22                     95.14
Total              74.94                     83.00

Smith–Waterman sequence alignment performs better

We used Smith–Waterman sequence alignment on the whole sequence for subcellular localization prediction. Our experiments showed that Smith–Waterman sequence alignment for the whole sequence performs better than Needleman–Wunsch alignment [48] for the N-terminal alone, Needleman–Wunsch alignment for the whole sequence and Smith–Waterman for the N-terminal. The comparisons are given in Table 2. The higher accuracy exhibited by the Smith–Waterman algorithm may be due to its capability of finding the local alignment within the protein sequences, bringing out common patterns and domains within the sequences. The address signals for each location share common features and this algorithm is competent to detect the signals present within.

Table 2
Comparison of Needleman–Wunsch (NW) and Smith–Waterman (SW) sequence alignment applied both at the N-terminal and to the entire sequence. Accuracies are in %.

Location           NW N-terminal   SW N-terminal   NW full   SW full
Chloroplast        86.61           91.96           97.32     97.32
Cytoplasm          84.23           85.81           85.15     83.44
Cytoskeleton       78.95           100.00          100.00    100.00
ER                 90.57           99.06           99.06     100.00
Extracellular      94.74           95.79           94.74     97.89
Golgi apparatus    75.00           75.00           75.00     75.00
Lysosome           90.32           100.00          100.00    100.00
Mitochondria       71.17           91.41           96.32     100.00
Nucleus            65.55           77.03           68.42     86.60
Peroxisome         73.91           95.65           91.30     91.30
Plasma membrane    94.88           97.11           99.34     99.74
Total              84.20           89.74           89.25     92.30

Performance

We performed the self-consistency test, jackknife test and independent data test on the data.
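The accuracy measures of Eqs. (3) and (4) and the jackknife procedure can be illustrated with a short Python sketch. The function names and the `train_and_predict` callback are hypothetical; the system accuracy is multiplied by 100 here on the assumption that the totals are reported as percentages, although Eq. (4) as printed has no such factor. As plain arithmetic, 109 correct predictions out of 112 proteins give 97.32%.

```python
from collections import Counter


def location_accuracy(true_labels, predicted_labels):
    """Eq. (3): per-location accuracy in percent, (C_L / T_L) * 100."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {loc: 100.0 * correct[loc] / totals[loc] for loc in totals}


def system_accuracy(true_labels, predicted_labels):
    """Eq. (4), scaled by 100 (assumption) so that it is a percentage."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


def jackknife_predictions(samples, labels, train_and_predict):
    """Leave-one-out: predict each sample from a model built on all others.

    train_and_predict(train_x, train_y, query) is a hypothetical callback
    that trains on the given data and returns a label for the query.
    """
    predictions = []
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        predictions.append(train_and_predict(rest_x, rest_y, samples[i]))
    return predictions
```

The self-consistency test corresponds to calling the classifier on the same data it was trained on, and the independent test to scoring predictions for a separate dataset with the same two accuracy functions.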
The result of each module and of the entire system is given in Table 3. The newly introduced feature, atomic composition, alone reaches a prediction accuracy of 77.39 for the independent test. Also, the self-consistency accuracy of the atomic composition module is higher than that of the physiochemical module. Considering the full sequence as three parts and calculating the amino acid composition gives better accuracy than considering the whole sequence together. This may be because the former exposes the distribution of amino acid composition in finer detail. A hybrid module based on Phys-SVM, 3-AAC-SVM, AAC-SVM, and ATC-SVM is developed to demonstrate the strength of the method when the sequence alignment is excluded. The prediction accuracies of this module are also reported. In the hybrid approach the 3-AAC-SVM is given weight for voting. We have conducted 5-fold cross validation for the individual SVM modules. The cross validation accuracies are 71.56 for Phys-SVM, 80.00 for 3-AAC-SVM, 76.54 for AAC-SVM, and 70.19 for ATC-SVM. Comparison of our method with other methods is given in Table 4.

Table 3
Accuracy (%) of each module for the self-consistency test, jackknife test and independent test.

Method               Self-consistency   Jackknife   Independent
Phys-SVM             95.44              72.34       82.72
3-AAC-SVM            100.00             80.92       84.72
AAC-SVM              100.00             77.50       81.07
SW                   100.00             78.27       92.26
ATC-SVM              97.63              71.79       77.39
Hybrid               100.00             81.29       84.60
Prediction system    100.00             82.47       88.81

Table 4
Comparison with other methods. Accuracies are in %; "-" marks a value that is not reported.

Method                                                                     Self-consistency   Jackknife   Independent
Pseudo-amino acid composition, covariant-discriminant method [27]         85.8               73.0        80.9
Functional domain composition [49]                                         87.3               66.7        81.7
Stochastic signal processing approach [35]                                 81.5               67.7        73.9
Cellular automata images [36]                                              86.4               72.6        74.8
Complexity measure factor [50]                                             -                  73.6        79.8
Hydrophobic patterns and average power-spectral density [51]               86.0               72.8        79.9
Lyapunov index, bessel function, and chebyshev filter [32]                 82.3               69.9        -
Multi-scale energy [52]                                                    -                  80.3        87.0
Atomic composition (this paper)                                            97.6               71.8        77.4
Hybrid (this paper)                                                        100.0              81.3        84.6
Prediction system (this paper)                                             100.0              82.5        88.8

Conclusions

We have introduced a new parameter, atomic composition, for subcellular localization prediction and integrated it with other parameters like amino acid composition, physiochemical parameters and sequence similarity. Our results demonstrated that the global information of the sequence contributed more to the prediction accuracy. This was found to be true for the physiochemical properties and sequence alignment modules. Another observation is that considering the full sequence as a group of three parts, N-terminal, middle region and C-terminal, brings out the underlying property distribution in greater detail and enhances the prediction accuracy. For the sequence alignment module, the Smith–Waterman algorithm for the whole sequence performs better than the Needleman–Wunsch algorithm for the whole sequence. Our work demonstrates that atomic composition can be effectively used along with other global features of the sequence to enhance the accuracy of subcellular localization prediction.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.bbrc.2009.12.118.

References

[1] E. Tantoso, K.B. Li, AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices, Amino Acids 13 (2008) 345–353.
[2] H. Bannai, Y. Tamada, O. Maruyama, K. Nakai, S. Miyano, Extensive feature detection of n-terminal protein sorting signals, Bioinformatics 18 (2002) 298–305.
[3] M. Bhasin, A. Garg, G.P.S. Raghava, Pslpred: prediction of subcellular localization of bacterial proteins, Bioinformatics 21 (2005) 2522–2524.
[4] M. Bhasin, G.P.S. Raghava, ESLpred: svm-based method for subcellular localization of eukaryotic proteins using dipeptide composition and psi-blast, Nucleic Acids Res. 32 (2004) W414–W419.
[5] T. Blum, S. Briesemeister, O. Kohlbacher, MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction, BMC Bioinf. 10 (2009), doi:10.1186/1471-2105-10-274.
[6] J.L. Gardy, M.R. Laird, F. Chen, S. Rey, C.J. Walsh, M. Ester, F.S.L. Brinkman, Psortb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis, Bioinformatics 21 (2005) 617–623.
[7] A. Garg, M. Bhasin, G.P.S. Raghava, Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search, J. Biol. Chem. 280 (2005) 14427–14432.
[8] C.S. Yu, C.J. Lin, J.K. Hwang, Predicting subcellular localization of proteins for gram-negative bacteria by support vector machines based on n-peptide compositions, Protein Sci. 13 (2004) 1402–1406.
[9] B.D. Bruce, The paradox of plastid transit peptides: conservation of function despite divergence in primary structure, Biochim. Biophys. Acta 1541 (2001) 2–21.
[10] D. Christophe, C.C. Hobertus, B. Pichon, Nuclear targeting of proteins: how many different signals?, Cell Signal 12 (2000) 337–341.
[11] M. Cokol, R. Nair, B. Rost, Finding nuclear localization signals, EMBO Rep. 1 (2000) 411–415.
[12] R. Dono, D. James, R. Zeller, A GR-motif functions in nuclear accumulation of the large fgf-2 isoforms and interferes with mitogenic signalling, Oncogene 16 (1998) 2151–2158.
[13] O. Emanuelsson, Predicting protein subcellular localisation from amino acid sequence information, Brief. Bioinform. 3 (2002) 361–376.
[14] S.J. Gould, G.A. Keller, N. Hosken, J. Wilkinson, S. Subramani, A conserved tripeptide sorts proteins to peroxisomes, J. Cell Biol. 108 (1989) 1657–1664.
[15] D. Kalderon, B.L. Roberts, W.D. Richardson, A.E. Smith, A short amino acid sequence able to specify nuclear location, Cell 39 (1984) 499–509.
[16] W. Neupert, Protein import into mitochondria, Annu. Rev. Biochem. 66 (1997) 863–917.
[17] N. Pfanner, A. Geissler, Versatility of the mitochondrial protein import machinery, Nat. Rev. Mol. Cell Biol. 2 (2001) 339–349.
[18] V.W. Pollard, W.M. Michael, S. Nakielny, M.C. Siomi, F. Wang, G. Dreyfuss, A novel receptor-mediated nuclear protein import pathway, Cell 86 (1996) 985–994.
[19] T.A. Rapoport, Transport of proteins across the endoplasmic reticulum membrane, Science 258 (1992) 931–936.
[20] J. Robbins, S.M. Dilworth, R.A. Laskey, C. Dingwall, Two interdependent basic domains in nucleoplasmin nuclear targeting sequence: identification of a class of bipartite nuclear targeting sequence, Cell 64 (1991) 615–623.
[21] G. von Heijne, Patterns of amino acids near signal-sequence cleavage sites, Eur. J. Biochem. 133 (1983) 17–21.
[22] G. von Heijne, J. Steppuhn, R.G. Herrmann, Versatility of the mitochondrial protein import machinery, Eur. J. Biochem. 180 (2001) 535–545.
[23] O. Emanuelsson, H. Nielsen, S. Brunak, G. von Heijne, Predicting subcellular localization of proteins based on their n-terminal amino acid sequence, J. Mol. Biol. 300 (2000) 1005–1016.
[24] K. Nakai, M. Kanehisa, Expert system for predicting protein localization sites in gram-negative bacteria, Proteins 11 (1991) 95–110.
[25] K. Nakai, M. Kanehisa, A knowledge base for predicting protein localization sites in eukaryotic cells, Genomics 14 (1992) 897–911.
[26] K.C. Chou, Prediction of protein subcellular locations by incorporating quasi-sequence-order effect, Biochem. Biophys. Res. Commun. 278 (2000) 477–483.
[27] K.C. Chou, Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins: Struct. Funct. Genet. 43 (2001) 246–255.
[28] K.C. Chou, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics 21 (2005) 10–19.
[29] K.C. Chou, Y.D. Cai, Prediction of membrane protein types by incorporating amphipathic effects, J. Chem. Inf. 45 (2005) 407–413.
[30] H.B. Shen, K.C. Chou, Ensemble classifier for protein fold pattern recognition, Bioinformatics 22 (2006) 1717–1722.
[31] H.B. Shen, K.C. Chou, PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition, Anal. Biochem. 373 (2008) 386–388.
[32] Y. Gao, S. Shao, X. Xiao, Y. Ding, Y. Huang, Z. Huang, K.C. Chou, Using pseudo amino acid composition to predict protein subcellular location: approached with lyapunov index, bessel function, and chebyshev filter, Amino Acids 28 (2005) 373–376.
[33] S. Matsuda, J.P. Vert, H. Saigo, N. Ueda, H. Toh, T. Akutsu, A novel representation of protein sequences for prediction of subcellular location using support vector machines, Protein Sci. 14 (2005) 2804–2813.
[41] K.C. Chou, D.W. Elrod, Protein subcellular location prediction, Protein Eng. 12 (1999) 107–118.
[42] V.N. Vapnik, The Nature of Statistical Learning Theory, Wiley-Interscience, New York, 1998.
[43] R. Nair, B. Rost, Mimicking cellular sorting improves prediction of subcellular localization, J. Mol. Biol. 348 (2005) 85–100.
[44] C.C. Chang, C. Lin, LIBSVM: a library for support vector machines, 2001. www.csie.ntu.edu.tw/~cjlin/libsvm.
[45] S. Kawashima, H. Ogata, M. Kanehisa, AAindex: amino acid index database, Nucleic Acids Res. 27 (1