Using reliable and surprising item sets for the characterization of protein-protein interfaces
Abstract
Numerous research efforts have aimed to characterize and predict protein-protein interfaces. This paper introduces a method that relies only on known protein-protein interfaces (positive instances only). It combines frequent item set mining techniques with statistical tests to ensure that the selected features are interesting. Starting from a database of known interfaces described by geometrical elements, the method produces the elements, and combinations thereof, that are characteristic of the interfaces. Unlike techniques that operate as "black boxes", this approach yields easily interpretable results, and it ensures a satisfactory proportion of reliable item sets. The results obtained on a set of 459 protein-protein interfaces from the DOCKGROUND database show that the findings are consistent with current knowledge about protein-protein interfaces.
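The combination of frequent item set mining and statistical testing described above can be illustrated with a minimal Python sketch. It assumes that an item set counts as "surprising" when its support is significantly higher than expected if its items occurred independently, measured with a one-sided binomial test. The abstract does not say which test the authors use, so this criterion, the function names (`frequent_itemsets`, `binomial_upper_tail`, `surprising_itemsets`) and the toy descriptors are all hypothetical. Mining uses a plain Apriori-style level-wise search.

```python
from itertools import combinations
from math import comb, prod


def frequent_itemsets(transactions, min_count):
    """Apriori-style level-wise search.

    Returns {itemset: count} for every itemset contained in at least
    `min_count` transactions.
    """
    counts = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count each candidate and keep only the frequent ones.
        level = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_count}
        counts.update(level)
        k += 1
        # Build size-k candidates by joining frequent (k-1)-itemsets, keeping
        # only those whose (k-1)-subsets are all frequent (anti-monotonicity).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return counts


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def surprising_itemsets(transactions, min_count, alpha=0.05):
    """Keep frequent itemsets (size >= 2) whose support is significantly
    higher than expected if their items occurred independently
    (assumed criterion: one-sided binomial test, no multiple-testing
    correction)."""
    n = len(transactions)
    counts = frequent_itemsets(transactions, min_count)
    # Relative frequency of each frequent single item.
    item_freq = {next(iter(s)): c / n for s, c in counts.items() if len(s) == 1}
    result = {}
    for itemset, observed in counts.items():
        if len(itemset) < 2:
            continue
        expected_p = prod(item_freq[i] for i in itemset)
        p_value = binomial_upper_tail(observed, n, expected_p)
        if p_value < alpha:
            result[itemset] = p_value
    return result


# Hypothetical toy data: each transaction is the set of descriptors of one
# interface; "a" and "b" always co-occur, while "c" and "d" co-occur about
# as often as chance predicts.
transactions = (
    [frozenset("ab")] * 4
    + [frozenset("cd")] * 4
    + [frozenset("c")] * 4
    + [frozenset("d")] * 4
    + [frozenset("e")] * 4
)

for itemset, p in surprising_itemsets(transactions, min_count=2).items():
    print(sorted(itemset), f"p = {p:.4f}")
```

With this toy data only {a, b} should be reported: its 4 co-occurrences far exceed the 0.8 expected under independence (20 × 0.2 × 0.2), whereas {c, d} appears 4 times against an expected 3.2. A real analysis over many candidate item sets would also need a multiple-testing correction, which this sketch omits.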