Taking Into Account User Feedback Into Biased Quality Measures: The Case of Geolocated Event Detection In Social Media

Contents
6.1 Introduction
6.2 A Unified Framework For Data-Driven And User-Driven Geolocated Event Discovery
Integration Of User Feedback Into Quality Measure
Algorithms For Computing [...] Events
6.4.1 Event Detection With Coverage Guarantee
Experiments
6.5.4 User-driven discovery of geo-located events
Conclusion

Introduction

Social microblogging (Twitter, Weibo, Instagram) [...]. As such, they are an incredibly rich means of knowing the pulse of the world, or of a specific neighborhood, in real time. Analyzing the abundant user-generated content can provide high-value information. Social media data have been analysed for several purposes, e.g. to understand the concerns of a population [...]

Mining Subjectively Interesting Attributed Subgraphs

Contents
7.1 Introduction
Cohesive Subgraphs with Exceptional Attributes (CSEA)
SIAS-Miner Algorithm
Experiments
Conclusion

[...] graph embeddings [Cai et al., 2017], which map the nodes of a graph into a low-dimensional space while preserving the local and global graph structure as well as possible; community detection [Fortunato, 2010], the discovery of groups of vertices that somehow 'belong together'; or subgraph mining, the identification of informative subgraphs. Besides the relational structure, so-called attributed graphs may carry information in the form of attribute-value pairs on vertices and/or edges.

[...] where $\mathrm{DL}_A(S)$ (resp. $\mathrm{DL}_V(U)$) is the description length of $S$ (resp. $U$).

Encoding an attribute over $|A|$ possibilities costs $\log(|A|)$ bits. We do this encoding $(|S| + 1)$ times: once for each attribute in $S$, plus once for the length of $S$. The second term is the length of the encoding of each restriction on an attribute $a$ [...], and the encoding of the other bound of the interval is logarithmic in $M_a = |\{\hat{a}(v) \mid v \in V\}|$, the number of distinct values of $a$ on the graph.

As mentioned above, we describe the vertex set $U$ in the pattern as (the intersection of) a set of neighborhoods $N_d(v)$, $v \in V$, with a set of exceptions: vertices that are in the intersection but not part of $U$. The length of such a description is the sum of the description lengths of the neighborhoods and the exceptions. More formally, let us define the set of all neighborhoods $\mathcal{N} = \{N_d(v) \mid v \in V \wedge d \in [0, D]\}$ (with $D$ the maximum range $d$ considered), and let $\mathcal{N}(U) = \{N_d(v)$ [...] subset $X \subseteq \mathcal{N}(U)$, along with the set of exceptions $\mathrm{exc}(X, U) \equiv \bigcap_{N_d(v) \in X} N_d(v) \setminus U$ [...]

The first term accounts for the description of the number of neighborhoods ($\log(|\mathcal{N}|)$) and for describing which neighborhoods are involved ($|X| \log(|\mathcal{N}|)$). The second term accounts for the description of the number of exceptions ($\log(|\bigcap_{x \in X} x|)$) and for describing the exceptions themselves.

Clearly, there is generally no unique way to describe the set $U$. The best one is thus the one that minimizes $f$.
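Putting the two terms together gives one plausible closed form for $f$ and for the resulting description length of $U$. This is a reconstruction from the term-by-term explanation, not a formula that survives in the text: the per-exception cost $\log(|\bigcap_{x \in X} x|)$ is an assumption (each exception is taken to be identified among the vertices of the intersection).

```latex
f(X, U) = (|X| + 1)\,\log(|\mathcal{N}|)
        + (|\mathrm{exc}(X, U)| + 1)\,\log\Bigl(\Bigl|\bigcap_{x \in X} x\Bigr|\Bigr),
\qquad
\mathrm{DL}_V(U) = \min_{X \subseteq \mathcal{N}(U)} f(X, U).
```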

SIAS-Miner Algorithm

SIAS-Miner mines interesting patterns using an enumerate-and-rank approach. First, it enumerates all CSEA patterns $(U, S)$ that are closed simultaneously with respect to $U$, $S$, and the neighborhood description. Second, it ranks patterns according to their SI values. The calculation of $\mathrm{IC}(U, S)$ and $\mathrm{DL}_A(S)$ is simple and direct. However, computing $\mathrm{DL}_V(U)$ is not trivial, since there are several ways to describe $U$ and we are looking for the one minimizing $f(X, U)$.
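To make the minimization of $f(X, U)$ concrete, here is a minimal brute-force sketch in Python. It is not the search procedure used by SIAS-Miner; it simply tries small subsets of neighborhoods. Several points are assumptions made for illustration: the candidate neighborhoods are taken to be those that contain $U$, each exception is charged the logarithm of the intersection size, logarithms are base 2, $|\mathcal{N}|$ is counted as the number of $(v, d)$ pairs, and all function names (`neighborhoods`, `f_cost`, `dl_v`) and the `max_size` cap are hypothetical.

```python
import math
from itertools import combinations


def neighborhoods(adj, max_d):
    """Map (v, d) to the set of vertices within d hops of v, for d = 0..max_d."""
    result = {}
    for v in adj:
        reached = {v}
        frontier = {v}
        result[(v, 0)] = frozenset(reached)
        for d in range(1, max_d + 1):
            # Breadth-first expansion by one hop.
            frontier = {w for u in frontier for w in adj[u]} - reached
            reached |= frontier
            result[(v, d)] = frozenset(reached)
    return result


def f_cost(X, U, n_neighborhoods):
    """Assumed cost of describing U as the intersection of X minus exceptions."""
    inter = frozenset.intersection(*X)
    exceptions = inter - U
    return ((len(X) + 1) * math.log2(n_neighborhoods)
            + (len(exceptions) + 1) * math.log2(len(inter)))


def dl_v(U, adj, max_d, max_size=3):
    """Brute-force minimum of f_cost over subsets X of covering neighborhoods.

    Only subsets of at most max_size neighborhoods are tried (illustrative cap).
    Returns (best cost, best X); (inf, None) if no neighborhood contains U.
    """
    U = frozenset(U)
    if not U:
        raise ValueError("U must be non-empty")
    hoods = neighborhoods(adj, max_d)
    n = len(hoods)  # assumption: |N| counts (v, d) pairs
    # Assumption: candidates are the distinct neighborhoods that contain U.
    covering = list({s for s in hoods.values() if U <= s})
    best_cost, best_X = math.inf, None
    for k in range(1, min(max_size, len(covering)) + 1):
        for X in combinations(covering, k):
            cost = f_cost(X, U, n)
            if cost < best_cost:
                best_cost, best_X = cost, list(X)
    return best_cost, best_X


# Toy path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cost, X = dl_v({0, 1}, path, max_d=1)
print(cost, X)
```

The exhaustive loop is exponential in the number of covering neighborhoods, which is why the text notes that computing $\mathrm{DL}_V(U)$ is not trivial; the sketch is only meant to show what is being minimized.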

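The set of exceptions in $X \cup Y \cup \{e\}$ is equal to $\mathrm{exc}(X \cup Y \cup \{e\}, U) = \mathrm{exc}(X \cup \{e\}, U) \cap \mathrm{exc}$ [...]

[...] $\mathrm{exc}(X \cup \{e\}, U) \subseteq \mathrm{exc}(X \cup \{e'\}, U)$ [...], then $\mathrm{exc}(X \cup Y \cup \{e\}, U) \subseteq \mathrm{exc}(X \cup Y \cup \{e'\}, U)$.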

Notice that even if an element $e'$ has been removed due to the lower bound of $e''$, the procedure is still correct since $e$ is lower bounded by $e''$ by the transitivity of inclusion.

$\mathrm{Cand} \leftarrow \{e_i \in \mathrm{Cand} \mid$ [...] $e_j \in \mathrm{Cand} \setminus \{e_i\}$ [...]

R. Albert and A.-L. Barabási, Topology of complex networks: Local events and universality, Phys. Rev, vol.85, pp.5234-5237, 2000.

[. Anand, The role of domain knowledge in data mining, Proceedings of the fourth international conference on Information and knowledge management, pp.37-43, 1995.

S. Ashbrook and T. Starner, Using gps to learn significant locations and predict movement across multiple users, Pers. Ub. Comput, vol.7, issue.5, pp.275-286, 2003.

S. Asur and B. A. Huberman, Predicting the future with social media, WI-IAT, pp.492-499, 2010.

M. Atzmueller and F. Puppe, Sd-map-A fast algorithm for exhaustive subgroup discovery, PKDD 2006, pp.6-17, 2006.

. Atzmueller, Description-oriented community detection using exhaustive subgroup discovery. Information Science, IEEE/ACM ASONAM, vol.329, pp.757-764, 2016.

[. Bastide, Mining frequent patterns with counting inference, SIGKDD Explorations, vol.2, pp.66-75, 2000.
URL : https://hal.archives-ouvertes.fr/hal-00467750

S. D. Bay and M. J. Pazzani, Detecting group differences: Mining contrast sets, Data Mining and Knowledge Discovery, vol.5, issue.3, pp.213-246, 2001.

[. Bayardo, Track me! a web based location tracking and analysis system, FIMI '04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, vol.126, pp.117-122, 2004.

. Belfodil, Flash points: Discovering exceptional pairwise behaviors in vote or rating data, Machine Learning and Knowledge Discovery in Databases-European Conference, ECML PKDD 2017, pp.442-458, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01587041

A. A. Bendimerad, R. Cazabet, M. Plantevit, and C. Robardet, Contextual subgraph discovery with mobility models, The Sixth International Conference on Complex Networks and Their Applications, pp.477-489, 2016.

A. Bendimerad, M. Plantevit, and C. Robardet, Mining exceptional closed patterns in attributed graphs, KAIS, pp.1-25, 2017.

A. A. Bendimerad, M. Plantevit, and C. Robardet, Mining exceptional closed patterns in attributed graphs, Knowl. Inf. Syst, vol.56, issue.1, pp.1-25, 2018.

[. Berlingerio, Björn Bringmann, and Aristides Gionis, European Conf. on Machine Learning and Princ. and Pract. of Knowl. Disc. in Databases (ECML/PKDD), pp.115-130, 2009.

M. Bhuiyan and M. A. Hasan, Interactive knowledge discovery from hidden data through sampling of frequent patterns, Statistical Analysis and Data Mining, vol.9, issue.4, pp.205-229, 2016.

[. Bhuiyan, Interactive pattern mining on hidden data: a sampling-based solution, Proceedings of the 21st ACM international conference on Information and knowledge management, pp.95-104, 2012.

T. De Bie, An information theoretic framework for data mining, KDD, pp.564-572, 2011.

T. De Bie, Maximum entropy models and subjective interestingness, Data Mining and Knowledge Discovery, vol.23, issue.3, pp.407-446, 2011.

C. Biemann, Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems, Proceedings of the first workshop on graph based methods for natural language processing, pp.73-80, 2006.

. Blockeel, An inductive database system based on virtual mining views, Data Min. Knowl. Discov, vol.24, issue.1, pp.247-287, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00599315

V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment, issue.10, p.10008, 2008.

[. Boley, Listing closed sets of strongly accessible set systems with applications to data mining, Theor. Comput. Sci, vol.411, issue.3, pp.691-700, 2010.

[. Boley, Direct local pattern sampling by efficient two-step random procedures, ACM SIGKDD 2011, pp.582-590, 2011.

F. Bonchi and C. Lucchese, Extending the state-of-the-art of constraint-based pattern discovery, Fundamentals in information theory and coding, vol.60, pp.25-31, 2005.

[. Borgwardt, Pattern mining in frequent dynamic subgraphs, ICDM, pp.818-822, 2006.

[. Boulicaut, Free-sets: A condensed representation of boolean data for the approximation of frequency queries, Data Min. Knowl. Discov, vol.7, issue.1, pp.5-22, 2003.
URL : https://hal.archives-ouvertes.fr/hal-01503814

, Local pattern detection in attributed graphs, pp.168-183, 2016.

S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine. Comp. net. and ISDN systems, vol.30, pp.107-117, 1998.

N. Bringmann, What is frequent in a single graph? In PAKDD, pp.858-863, 2008.

[. Bringmann, Luc De Raedt, and Siegfried Nijssen. Don't be afraid of simpler patterns, Knowledge Discovery in Databases: PKDD, p.10, 2006.

, European Conference on Principles and Practice of Knowledge Discovery in Databases, pp.55-66, 2006.

K. Budhathoki and J. Vreeken, Causal inference on event sequences, Proceedings of the 2018 SIAM International Conference on Data Mining, SDM 2018, vol.56, pp.285-307, 2018.

[. Buzmakov, Fast generation of best interval patterns for nonmonotonic constraints, Machine Learning and Knowledge Discovery in Databases-European Conference, ECML PKDD 2015, pp.157-172, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01186718

[. Cai, Ali Cakmak and Gultekin Özsoyoglu. Taxonomy-superimposed graph mining, EDBT, pp.217-228, 2008.

T. Calders and B. Goethals, Mining all non-derivable frequent itemsets, Principles of Data Mining and Knowledge Discovery, 6th European Conference, pp.74-85, 2002.

[. Calders, A survey on condensed representations for frequent sets, Constraint-Based Mining and Inductive Databases, European Workshop on Inductive Databases and Constraint Based Mining, pp.64-80, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01613469

T. Calders, B. Goethals, and S. Jaroszewicz, Mining rank-correlated sets of numerical attributes, KDD, pp.96-105, 2006.

[. Carchiolo, Enhancing space-aware community detection using degree constrained spatial null model, Workshop CompleNet, pp.47-55, 2015.

[. Cellier, Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts, J. Biomedical Semantics, vol.6, p.27, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01192959

[. Cerf, Closed patterns meet n-ary relations, TKDD, vol.3, issue.1, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01499247

[. Cerf, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, KDD '14, vol.26, pp.1166-1175, 2013.

[. Chen, Compressing neural networks with the hashing trick, Proceedings of the 32nd International Conference on Machine Learning, pp.2285-2294, 2009.

T. M. Cover and J. A. Thomas, Entropy, relative entropy and mutual information. Elements of information theory, vol.2, pp.1-55, 1991.

. De-sá, Discovering a taste for the unusual: exceptional models for preference mining, Machine Learning, 2018.

. Demirbas, imap: Indirect measurement of air pollution with cellphones, PerCom Workshops, pp.1-6, 2009.

J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, vol.7, pp.1-30, 2006.

, Cohesive co-evolution patterns in dynamic attributed graphs, Discovery Science, pp.110-124, 2012.

[. Desmier, Granularity of co-evolution patterns in dynamic attributed graphs, Advances in Intelligent Data Analysis XIII-13th International Symposium, pp.84-95, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01301086

G. Dong and J. Li, Efficient mining of emerging patterns, ACM SIGKDD, pp.43-52, 1999.

. Dong, Multiscale event detection in social media, vol.29, pp.1374-1405, 2015.

. Dong, Improving interpretability of deep neural networks with semantic information, 2017.

L. Downar and W. Duivesteijn, Exceptionally monotone models-the rank correlation model class for exceptional model mining, Knowl. Inf. Syst, vol.51, issue.2, pp.369-394, 2017.

[. Duivesteijn, Exceptional model mining-supervised descriptive local pattern mining with complex target concepts, ICDM 2010, vol.30, pp.47-98, 2010.

. Duivesteijn, Wouter Duivesteijn. A short survey of exceptional model mining: Exploring unusual interactions between multiple targets, 2014 International Workshop on Multi-Target Prediction, 2014.

[. Dzyuba, Interactive learning of pattern rankings, International Journal on Artificial Intelligence Tools, vol.23, issue.06, p.1460026, 2014.

D. Strash-;-david-eppstein and ;. Strash, Where is the soho of rome? measures and algorithms for finding similar neighborhoods in cities, Géraud Le Falher, Aristides Gionis, and Michael Mathioudakis, vol.108, pp.35-41, 1977.

E. Galbrun and P. Miettinen, Redescription Mining. Springer Briefs in Computer Science, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01726072

F. Giannotti and D. Pedreschi, Mobility, data mining and privacy, 2008.

A. Gionis, H. Mannila, T. Mielikäinen, and P. Tsaparas, Assessing data mining results via swap randomization, TKDD, vol.1, issue.3, p.14, 2007.

Z. Goethals, Grosskreutz and Rüping, 2009] Henrik Grosskreutz and Stefan Rüping. On subgroup discovery in numerical domains, Advances in frequent itemset mining implementations: report on fimi'03. SIGKDD Explorations, vol.6, pp.210-226, 2004.

[. Grosskreutz, A relevance criterion for sequential patterns, ECMLPKDD, pp.369-384, 2013.

[. Günnemann, Tias Guns, Anton Dries, Siegfried Nijssen, Guido Tack, and Luc De Raedt. Miningzinc: A declarative framework for constraint-based mining, ICDM, vol.244, pp.6-29, 2010.

R. Hamon, Analysis of temporal networks using signal processing methods : Application to the bike-sharing system in Lyon. Theses, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01216173

J. Han and Y. Fu, Mining multiple-level association rules in large databases, IEEE Trans. Knowl. Data Eng, vol.11, issue.5, pp.798-804, 1999.

A. Harrington and V. Cahill, Route profiling: putting context to work, SAC, pp.1567-1573, 2004.

[. He, Analyzing feature trajectories for event detection, SIGIR, pp.207-214, 2007.

[. Holat, Sequence classification based on delta-free sequential patterns, 2014 IEEE International Conference on Data Mining, ICDM, pp.170-179, 2014.

. Ifrim, Event detection in twitter using aggressive filtering and hierarchical tweet clustering, snow@WWW, pp.33-40, 2014.

T. Imielinski and H. Mannila, Akihiro Inokuchi. Mining generalized substructures from a set of labeled graphs, ICDM, vol.39, pp.415-418, 1996.

P. Jiang and J. Pei, Mining frequent cross-graph quasi-cliques, vol.2, pp.1-42, 2009.

[. Kang, What effects topological changes in dynamic graphs?-elucidating relationships between vertex attributes and the graph structure, Machine Learning, vol.5, pp.1171-1211, 2011.

[. Khan, Towards proximity pattern mining in large graphs, SIGMOD, pp.867-878, 2010.

[. Khiari, Constraint programming for mining n-ary patterns, Principles and Practice of Constraint Programming-CP 2010-16th International Conference, pp.552-567, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01016652

J. Kleinberg, Bursty and hierarchical structure in streams, KDD, pp.91-101, 2002.

W. Klosgen, Explora: A multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining, 1996.

S. O. Kuznetsov, Learning of simple conceptual graphs from positive and negative examples, Principles of Data Mining and Knowledge Discovery, Third European Conference, PKDD '99, pp.174-185, 1999.

[. Lavrac, Subgroup discovery with CN2-SD. JMLR, vol.5, pp.153-188, 2004.

[. Leman, Exceptional model mining, ECMLPKDD 2008, pp.1-16, 2008.

[. Lemmerich, Andreas Hotho, and Markus Strohmaier. Mining subgroups with exceptional transition behavior, KDD, pp.965-974, 2016.

J. Leskovec and R. Sosič, SNAP: A general purpose network analysis and graph mining library in C++, 2014.

[. Li, Tedas: A twitter-based event detection and analysis system, ICDE'12, pp.1273-1276, 2010.

[. Lijffijt, P-n-rminer: a generic framework for mining interesting structured relational patterns, I. J. Data Science and Analytics, vol.1, issue.1, pp.61-76, 2016.

Z. C. Lipton, The mythos of model interpretability, 2016.

G. Liu and L. Wong, Effective pruning techniques for mining quasi-cliques, ECML/PKDD, pp.33-49, 2008.

[. Liu, Integrating classification and association rule mining, KDD, pp.80-86, 1998.

. Low-kam, Mining statistically significant sequential patterns, 2013 IEEE 13th International Conference on Data Mining, pp.488-497, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922255

[. Luo, Finding time period-based most frequent path in big trajectory data, SIGMOD, pp.260-272, 2004.

[. Masseglia, The PSP approach for mining sequential patterns, Principles of Data Mining and Knowledge Discovery, Second European Symposium, PKDD '98, vol.88, p.22812, 1998.

. Michaelis, Solving Large Scale Learning Tasks. Challenges and Algorithms-Essays Dedicated to Katharina Morik on the Occasion of Her 60th Birthday, Discovery Science, vol.9580, pp.1-15, 2009.

, Mining cohesive patterns from graphs with feature vectors, SIAM SDM, pp.593-604, 2009.

, Finding collections of k-clique percolated components in attributed graphs, PAKDD, 2012.

. Mougel, Benjamin Négrevergne and Tias Guns. Constraint-based sequence mining using constraint programming, Integration of AI and OR Techniques in Constraint Programming12th International Conference, vol.39, pp.288-305, 2014.

. Négrevergne, Exploratory mining and pruning optimizations of constrained association rules, Proceedings ACM SIGMOD International Conference on Management of Data, vol.28, pp.13-24, 1998.

. Nguyen, Multidimensional association rules in boolean tensors, Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, pp.570-581, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01354377

. Nguyen, Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks, Intell. Data Anal, vol.17, issue.1, pp.49-69, 2013.

K. Nijssen, J. N. Nijssen, . Kok-;-petra-kralj, N. Novak, G. I. Lavra? et al., Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining, Systems, Man and Cybernetics, vol.5, pp.377-403, 2004.

[. Page, The pagerank citation ranking: Bringing order to the web, WWW, pp.161-172, 1998.

. Papadopoulos, , vol.1150, 2014.

J. Pearl and . Causality, On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Journal of Science, vol.50, issue.302, pp.157-175, 1900.

[. Pei, Pushing convertible constraints in frequent itemset mining, Data Min. Knowl. Discov, vol.8, issue.3, pp.227-252, 2004.

. Pérez-melián, Zipf's and benford's laws in twitter hashtags, EACL, pp.84-93, 2017.

[. Petitjean, Condensed representation of sequential patterns according to frequency-based measures, Adv. in Intelligent Data Analysis, vol.30, pp.155-166, 2009.

[. Plantevit, Mining multidimensional and multilevel sequential patterns, vol.4, 2010.
URL : https://hal.archives-ouvertes.fr/hal-01381826

[. Prado, Mining graph topological patterns, IEEE Trans. Knowl. Data Eng, vol.25, issue.9, pp.2090-2104, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01351727

L. De Raedt and A. Zimmermann, Constraint-based pattern set mining, Proceedings of the Seventh SIAM International Conference on Data Mining, pp.237-248, 2007.

P. Raïssi and M. Plantevit, Céline Robardet. Constraint-Based Pattern Mining in Dynamic Graphs, Data Warehousing and Knowledge Discovery, 10th International Conference, pp.950-955, 2008.

S. Rueping, Ranking interesting subgroups, Proceedings of the 26th Annual International Conference on Machine Learning, pp.913-920, 2009.

[. Sarzynska, Null models for community detection in spatially embedded, temporal networks, J. Complex Networks, vol.4, issue.3, pp.363-406, 2016.

[. Silva, Mining attribute-structure correlated patterns in large attributed graphs, vol.5, pp.466-477, 2012.

[. Simini, A universal model for mobility and migration patterns, 2011.

C. Soulet and B. Crémilleux, Mining constraint-based patterns using automatic relaxation, Intell. Data Anal, vol.13, issue.1, pp.109-133, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01012079

. Soulet, Mining dominant patterns in the sky, 11th IEEE International Conference on Data Mining, ICDM 2011, pp.655-664, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00623566

, Mining sequential patterns: Generalizations and performance improvements, EDBT, pp.3-17, 1996.

. Terada, Statistical significance of combinatorial regulations, Proceedings of the National Academy of Sciences, vol.110, issue.32, pp.12996-13001, 2013.

[. Ugarte, Skypattern mining: From pattern condensed representations to dynamic constraint satisfaction problems, Artif. Intell, vol.244, pp.402-414, 2007.
URL : https://hal.archives-ouvertes.fr/hal-02048224

T. Uno, An efficient algorithm for solving pseudo clique enumeration problem, Algorithmica, vol.56, issue.1, pp.3-16, 2010.

M. van Leeuwen, Interactive data exploration using pattern mining, Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, vol.21, pp.169-182, 2010.

J. Vreeken, M. van Leeuwen, and A. Siebes, Krimp: mining itemsets that compress, Data Min. Knowl. Discov, vol.23, issue.1, pp.169-214, 2011.

[. Wang, Frequent closed sequence mining without candidate maintenance, IEEE Trans. Knowl. Data Eng, vol.19, issue.8, pp.1042-1056, 2007.

[. Wang, Measurement error in network data: A re-classification, KDD, vol.34, pp.396-409, 2011.

[. Wang, Redundancy-aware maximal cliques, ACM SIGKDD 2013, pp.122-130, 2013.

. I. Petitjean-;-geoffrey, F. Webb, . Petitjean-;-stefan, and . Wrobel, A multiple test correction for streams and cascades of statistical hypothesis tests, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.78-87, 1997.

[. Xiao, Discovering topically and temporally coherent events in interaction networks, ECMLPKDD, 2016.

X. Yan and J. Han, gSpan: Graph-Based Substructure Pattern Mining, Int. Conf. on Data Mining (ICDM), pp.721-724, 2002.

[. Yan, Clospan: Mining closed sequential patterns in large databases, SDM, pp.166-177, 2003.

M. J. Zaki and C.-J. Hsiao, CHARM: An efficient algorithm for closed itemset mining, SDM. SIAM, 2002.

M. Zaki, Scalable algorithms for association mining, IEEE Trans. Knowl. Data Eng, vol.12, issue.3, pp.372-390, 2000.

M. Zaki, SPADE: an efficient algorithm for mining frequent sequences, Machine Learning, vol.42, pp.31-60, 2001.

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, Computer Vision-ECCV 2014-13th European Conference, pp.818-833, 2014.

[. Zhang, Geoburst: Real-time local event detection in geo-tagged tweet streams, ACM SIGIR, pp.513-522, 2016.

[. Zheng, Mining interesting locations and travel sequences from gps trajectories, WWW, pp.791-800, 2009.

G. K. Zipf, The P1 P2/D hypothesis: The case of railway express, The Journal of Psychology, vol.22, issue.1, pp.3-8, 1946.