Prediction of structural alphabet protein blocks using data mining - HAL open archive
Journal article, Biochimie, Year: 2022

Prediction of structural alphabet protein blocks using data mining

Abstract

The 3D structure of a protein determines its biological function. The 3D structure of the protein backbone can be approximated using prototypes of local protein conformations; sets of such prototypes are called structural alphabets (SAs). Among the approaches to predicting 3D structure from an amino acid sequence, one is based on predicting the SA prototypes for that sequence. Protein Blocks (PBs) are the best-known SA. They comprise 16 prototypes, each spanning five consecutive amino acids, which were identified as optimal with respect to both their ability to approximate the local structure correctly and the accuracy with which they can be predicted from an amino acid sequence. We developed models for predicting PBs from sequence information using different data mining approaches and machine learning algorithms. Besides the amino acid sequences, the outputs of the following tools were used to train the models: the Spider3 predictor of protein structural properties, several predictors of intrinsically disordered protein regions, and a tool for finding repeats in amino acid sequences. The highest accuracy achieved by the constructed models is 80%, a significant improvement over the previous best available prediction, whose accuracy was 61%. Analysis of the models built with different algorithms showed that the importance of the input attributes differs from one algorithm to another. Information about whether amino acids belong to intrinsically disordered regions and repeats improves the precision of prediction for some PBs with the CART classification algorithm, but not with the C5.0 classification algorithm. Improved prediction approaches can have interesting applications in protein structure modelling or computational protein design.
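As a rough illustration of the prediction task described in the abstract, the sketch below frames it as per-residue classification: each position in a sequence is represented by a window of neighbouring residues, and a classifier would assign one of the 16 PB labels to it. The abstract does not specify the feature encoding, so everything here is an assumption made for illustration: the function names `sliding_windows` and `accuracy`, the window width of five, the padding symbol `X`, and the use of the letters a to p as the 16 class labels.

```python
# Illustrative sketch only; not the authors' pipeline.
# Assumption: the 16 Protein Block classes are labelled with the letters a..p.
PB_LABELS = "abcdefghijklmnop"


def sliding_windows(sequence, half_width=2, pad="X"):
    """Return one window of 2*half_width + 1 residues centred on each position.

    Positions near the ends are padded with `pad` (a hypothetical
    placeholder symbol) so every residue gets a window of equal length.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def accuracy(predicted, observed):
    """Fraction of positions where the predicted PB equals the observed PB."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Toy sequence; the windows would be the classifier's per-residue input.
    for window in sliding_windows("MKTAY"):
        print(window)
    # Toy PB strings, made up purely to exercise the metric.
    print(accuracy("aabb", "aabc"))
```

In a fuller setup, each window would be extended with the additional attributes mentioned in the abstract (Spider3 outputs, disorder predictions, repeat annotations) before being passed to a decision-tree learner such as CART or C5.0.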
No file deposited

Dates and versions

hal-03652564, version 1 (26-04-2022)

Cite

Mirjana Maljković, Nenad Mitić, Alexandre de Brevern. Prediction of structural alphabet protein blocks using data mining. Biochimie, 2022, 197, pp.74-85. ⟨10.1016/j.biochi.2022.01.019⟩. ⟨hal-03652564⟩