A top-down linguistic approach to the analysis of genomic sequences: The metabotropic glutamate receptors 1 and 5 in human and in mouse as a case study
Abstract
This paper presents a top-down strategy for detecting features in genomic sequences. The core of the strategy is to apply dictionary-based compression algorithms and then analyze the content of the automatically generated dictionary. We classify the over-represented segments it contains and, in the case study, correlate them with experimentally identified or theoretically predicted biological features. A broad-spectrum analysis shows that the only feature co-located with the extracted segments is the torsional flexibility of DNA, whereas non-B DNA configurations are anti-localized and the other features are mostly independent of the extracted sequences. The analysis thus reveals complex relationships between the linguistic structures investigated with our approach and some known biological features.
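As a rough illustration of the dictionary idea, the sketch below parses a sequence in the style of LZ78 and keeps the dictionary phrases that recur in the sequence. The abstract does not say which compression algorithm, length cutoff, or frequency criterion the authors use, so the choice of LZ78 and the names `lz78_dictionary`, `count_occurrences`, `over_represented`, `min_length`, and `min_count` are all assumptions made for this example, not the paper's method.

```python
def lz78_dictionary(sequence):
    """Parse `sequence` LZ78-style; return the phrases in order of creation.

    Assumption: LZ78 stands in for the unspecified dictionary-based compressor.
    """
    phrases = {}   # phrase -> index; dicts keep insertion order
    current = ""
    for symbol in sequence:
        candidate = current + symbol
        if candidate in phrases:
            # Known phrase: keep extending it with the next symbol.
            current = candidate
        else:
            # New phrase: record it and start again from the empty string.
            phrases[candidate] = len(phrases) + 1
            current = ""
    # A trailing, already-known prefix left in `current` adds no new phrase.
    return list(phrases)


def count_occurrences(sequence, segment):
    """Count occurrences of `segment` in `sequence`, overlaps included."""
    count = 0
    start = sequence.find(segment)
    while start != -1:
        count += 1
        start = sequence.find(segment, start + 1)
    return count


def over_represented(sequence, min_length=3, min_count=2):
    """Return {phrase: count} for dictionary phrases that recur.

    `min_length` and `min_count` are hypothetical thresholds for this sketch.
    """
    result = {}
    for phrase in lz78_dictionary(sequence):
        if len(phrase) >= min_length:
            n = count_occurrences(sequence, phrase)
            if n >= min_count:
                result[phrase] = n
    return result


if __name__ == "__main__":
    toy = "ACGTACGTACGTTTACGTACGA"  # made-up sequence, not real genomic data
    for segment, n in over_represented(toy).items():
        print(segment, n)
```

A raw count is only a stand-in for over-representation: a real analysis would compare observed frequencies against a background model before correlating segments with biological annotations.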
Origin: Files produced by the author(s)