N. Terrapon, O. Gascuel, E. Maréchal, and L. Bréhélin, Detection of new protein domains using co-occurrence: application to Plasmodium falciparum, Bioinformatics, vol.25, issue.23, pp.3077-3083, 2009.
DOI : 10.1093/bioinformatics/btp560
URL : https://hal.archives-ouvertes.fr/lirmm-00431171

J. Richardson, The Anatomy and Taxonomy of Protein Structure, Advances in Protein Chemistry, vol.34, pp.167-339, 1981.
DOI : 10.1016/S0065-3233(08)60520-3

A. Murzin, S. Brenner, T. Hubbard, and C. Chothia, SCOP: A structural classification of proteins database for the investigation of sequences and structures, Journal of Molecular Biology, vol.247, issue.4, pp.536-540, 1995.
DOI : 10.1016/S0022-2836(05)80134-2

S. Hunter et al., InterPro: the integrative protein signature database, Nucleic Acids Research, vol.37, issue.Database, pp.211-215, 2009.
DOI : 10.1093/nar/gkn785
URL : https://hal.archives-ouvertes.fr/hal-01214141

R. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger et al., The Pfam protein families database, Nucleic Acids Research, vol.38, issue.Database, pp.211-222, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01294685

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: Probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998.
DOI : 10.1017/CBO9780511790492

E. Pizzi and C. Frontali, Low-Complexity Regions in Plasmodium falciparum Proteins, Genome Research, vol.11, issue.2, pp.218-229, 2001.
DOI : 10.1101/gr.GR-1522R

O. Bastien, S. Lespinats, S. Roy, K. Métayer, B. Fertil et al., Analysis of the compositional biases in Plasmodium falciparum genome and proteome using Arabidopsis thaliana as a reference, Gene, vol.336, issue.2, pp.163-173, 2004.
DOI : 10.1016/j.gene.2004.04.029

L. Coin, A. Bateman, and R. Durbin, Enhanced protein domain discovery using taxonomy, BMC Bioinformatics, vol.5, issue.1, p.56, 2004.
DOI : 10.1186/1471-2105-5-56

I. Alam, S. Hubbard, S. Oliver, and M. Rattray, A kingdom-specific protein domain HMM library for improved annotation of fungal genomes, BMC Genomics, vol.8, issue.1, p.97, 2007.
DOI : 10.1186/1471-2164-8-97

D. Jones, W. Taylor, and J. Thornton, The rapid generation of mutation data matrices from protein sequences, Computer Applications in the Biosciences, vol.8, issue.3, pp.275-282, 1992.
DOI : 10.1093/bioinformatics/8.3.275

S. Whelan and N. Goldman, A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum-Likelihood Approach, Molecular Biology and Evolution, vol.18, issue.5, pp.691-699, 2001.
DOI : 10.1093/oxfordjournals.molbev.a003851

S. Le and O. Gascuel, An Improved General Amino Acid Replacement Matrix, Molecular Biology and Evolution, vol.25, issue.7, pp.1307-1320, 2008.
DOI : 10.1093/molbev/msn067
URL : https://hal.archives-ouvertes.fr/lirmm-00324106

S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory, vol.28, issue.2, pp.129-137, 1982.
DOI : 10.1109/TIT.1982.1056489

G. Apic, J. Gough, and S. Teichmann, Domain combinations in archaeal, eubacterial and eukaryotic proteomes, Journal of Molecular Biology, vol.310, issue.2, pp.311-325, 2001.
DOI : 10.1006/jmbi.2001.4776

B. Efron and G. Gong, A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, The American Statistician, vol.37, issue.1, pp.36-48, 1983.

J. Wootton and S. Federhen, Statistics of local complexity in amino acid sequences and sequence databases, Computers & Chemistry, vol.17, issue.2, pp.149-163, 1993.
DOI : 10.1016/0097-8485(93)85006-X

A. Ghouila, N. Terrapon, O. Gascuel, F. Guerfali, D. Laouini et al., EuPathDomains: The divergent domain database for eukaryotic pathogens, Infection, Genetics and Evolution, vol.11, issue.4, pp.698-707, 2011.
DOI : 10.1016/j.meegid.2010.09.008
URL : https://hal.archives-ouvertes.fr/lirmm-00540932

K. Forslund and E. Sonnhammer, Predicting protein function from domain content, Bioinformatics, vol.24, issue.15, pp.1681-1687, 2008.
DOI : 10.1093/bioinformatics/btn312

N. Ponts, E. Harris, J. Prudhomme, I. Wick, C. Eckhardt-Ludka et al., Nucleosome landscape and control of transcription in the human malaria parasite, Genome Research, vol.20, issue.2, pp.228-238, 2010.
DOI : 10.1101/gr.101063.109

G. McConkey, J. Pinney, D. Westhead, K. Plueckhahn, T. Fitzpatrick et al., Annotating the Plasmodium genome and the enigma of the shikimate pathway, Trends in Parasitology, vol.20, issue.2, pp.60-65, 2004.
DOI : 10.1016/j.pt.2003.11.001

B. Cantarel, P. Coutinho, C. Rancurel, T. Bernard, V. Lombard et al., The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics, Nucleic Acids Research, vol.37, issue.Database, pp.233-238, 2009.
DOI : 10.1093/nar/gkn663

S. Sato, The apicomplexan plastid and its evolution, Cellular and Molecular Life Sciences, vol.68, pp.1285-1296, 2011.

A. Kumar and L. Cowen, Augmented training of hidden Markov models to recognize remote homologs via simulated evolution, Bioinformatics, vol.25, issue.13, pp.1602-1608, 2009.
DOI : 10.1093/bioinformatics/btp265

H. Mamitsuka, A Learning Method of Hidden Markov Models for Sequence Discrimination, Journal of Computational Biology, vol.3, issue.3, pp.361-373, 1996.
DOI : 10.1089/cmb.1996.3.361

D. Brown, N. Krishnamurthy, J. Dale, W. Christopher, and K. Sjölander, Subfamily HMMs in functional genomics, Biocomputing 2005, pp.322-333, 2005.
DOI : 10.1142/9789812702456_0031

P. Srivastava, D. Desai, S. Nandi, and A. Lynn, HMM-ModE: Improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences, BMC Bioinformatics, vol.8, issue.1, p.104, 2007.
DOI : 10.1186/1471-2105-8-104