Hide and Mine in Strings: Hardness and Algorithms

Abstract: We initiate a study on the fundamental relation between data sanitization (i.e., the process of hiding confidential information in a given dataset) and frequent pattern mining, in the context of sequential (string) data. Current methods for string sanitization hide confidential patterns but, in doing so, introduce a number of spurious patterns that may harm the utility of frequent pattern mining. The main computational problem is to minimize this harm. Our contribution here is twofold. First, we present several hardness results for different variants of this problem, essentially showing that these variants cannot be solved or even be approximated in polynomial time. Second, we propose integer linear programming formulations for these variants and algorithms to solve them, which work in polynomial time under certain realistic assumptions on the problem parameters.
