Issues on Sampling Negative Examples for Predicting Prokaryotic Promoters

Eduardo Gusmão 1, Marcilio De Souto 2, *
* Corresponding author
1 CBRG - Computational Biology Research Group
Abstract: Supervised learning methods have been used successfully to build classifiers that identify promoter regions. Such a classifier is typically built from a dataset containing examples of promoter (positive) and non-promoter (negative) regions, so the careful selection of the data used to construct and test a promoter-finding algorithm is very important. Experimentally known promoter regions can safely be taken as positive training instances. Negative instances, however, are not straightforward to obtain, because definite knowledge that a given region is not a promoter is generally unavailable. To complicate matters further, there is no unequivocal definition of what a negative instance is in the case of promoters. Consequently, the definition of non-promoter region adopted when building the data could significantly affect the performance of the classifier and/or yield a biased estimate of that performance. We present an empirical study of the effect of this problem on promoter prediction in E. coli. To the best of our knowledge, no such study has so far been carried out in the context of prokaryotic promoter prediction.
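The abstract does not say which definitions of a non-promoter region were compared. As a purely illustrative sketch of why the choice matters, the Python snippet below builds negatives in two ways that are sometimes used in promoter prediction: shuffling known promoters, and drawing background windows that do not overlap any known promoter. The function names, the toy random sequence, the promoter positions, and the window length are all hypothetical assumptions, not the authors' data or protocol.

```python
import random


def shuffled_negatives(promoters, rng):
    """Assumed definition A: shuffle each promoter, preserving its base composition."""
    negatives = []
    for seq in promoters:
        bases = list(seq)
        rng.shuffle(bases)
        negatives.append("".join(bases))
    return negatives


def background_negatives(genome, promoter_starts, length, n, rng, max_tries=10000):
    """Assumed definition B: windows of the genome that overlap no known promoter window."""
    blocked = [(s, s + length) for s in promoter_starts]
    negatives = []
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        start = rng.randrange(0, len(genome) - length + 1)
        end = start + length
        # Keep the window only if it is disjoint from every promoter interval.
        if all(end <= b or start >= e for b, e in blocked):
            negatives.append(genome[start:end])
    return negatives


rng = random.Random(0)
# Toy random sequence standing in for a genome; this is not E. coli data.
genome = "".join(rng.choice("ACGT") for _ in range(2000))
promoter_starts = [100, 700, 1500]  # hypothetical promoter positions
length = 80                         # hypothetical window length
promoters = [genome[s:s + length] for s in promoter_starts]

neg_a = shuffled_negatives(promoters, rng)
neg_b = background_negatives(genome, promoter_starts, length, 3, rng)
```

The two negative sets differ systematically: `neg_a` shares the base composition of the positives, whereas `neg_b` reflects the background sequence, so a classifier trained and evaluated on one set need not perform the same on the other.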
Document type:
Conference paper
IEEE. IEEE IJCNN 2014, Jul 2014, Beijing, China. pp.494-501, 2014, 〈10.1109/IJCNN.2014.6889557〉

https://hal.archives-ouvertes.fr/hal-00955531
Contributor: Marcilio De Souto
Submitted on: Tuesday, March 4, 2014 - 16:11:02
Last modified on: Thursday, January 17, 2019 - 15:10:02

Citation

Eduardo Gusmão, Marcilio De Souto. Issues on Sampling Negative Examples for Predicting Prokaryotic Promoters. IEEE. IEEE IJCNN 2014, Jul 2014, Beijing, China. pp.494-501, 2014, 〈10.1109/IJCNN.2014.6889557〉. 〈hal-00955531〉
