Taking Into Account User Feedback Into Biased Quality Measures: The Case of Geolocated Event Detection In Social Media
Contents
6.1 Introduction
6.2 A Unified Framework For Data-Driven And User-Driven Geolocated Event Discovery
Integration Of User Feedback Into Quality Measure
6.4.1 Event Detection With Coverage Guarantee
Algorithms For Computing
6.5.4 User-driven discovery of geo-located events
Social microblogging platforms are an incredibly rich means of taking the pulse of the world, or of a specific neighborhood, in real time. Analyzing the abundant user-generated content can provide high-value information. Social media data have been analyzed for several purposes, e.g. to understand the concerns of a population.
Mining Subjectively Interesting Attributed Subgraphs
Contents
7.1 Introduction
Cohesive Subgraphs with Exceptional Attributes (CSEA)
graph embeddings [Cai et al., 2017] (which map the nodes of a graph into a low-dimensional space while preserving the local and global graph structure as well as possible), community detection [Fortunato, 2010] (the discovery of groups of vertices that somehow 'belong together'), or subgraph mining (the identification of informative subgraphs). Besides the relational structure, so-called attributed graphs may carry information in the form of attribute-value pairs on vertices and/or edges.
where $DL_A(S)$ (resp. $DL_V(U)$) is the description length of $S$ (resp. $U$).

Encoding an attribute over $|A|$ possibilities costs $\log(|A|)$ bits. We do this encoding $(|S| + 1)$ times: once for each attribute in $S$, plus once for the length of $S$. The second term is the length of the encoding of each restriction of an attribute $a$ to an interval, in which the encoding of the other bound of the interval is in the logarithm of the number of distinct values of $a$ on the graph, $M_a = |\{\hat{a}(v) \mid v \in V\}|$.

As mentioned above, we describe the vertex set $U$ in the pattern as (the intersection of) a set of neighborhoods $N_d(v)$, $v \in V$, with a set of exceptions: vertices that are in the intersection but not part of $U$. The length of such a description is the sum of the description lengths of the neighborhoods and the exceptions. More formally, let us define the set of all neighborhoods $\mathcal{N} = \{N_d(v) \mid v \in V \wedge d \in \{0, \ldots, D\}\}$ (with $D$ the maximum range $d$ considered), and let $\mathcal{N}(U) = \{N_d(v) \in \mathcal{N} \mid U \subseteq N_d(v)\}$ be the neighborhoods that contain $U$. The vertex set $U$ is then described by a subset $X \subseteq \mathcal{N}(U)$, along with the set of exceptions $\mathrm{exc}(X, U) = \bigcap_{N_d(v) \in X} N_d(v) \setminus U$. The description length $f(X, U)$ has two terms. The first term accounts for the description of the number of neighborhoods ($\log(|\mathcal{N}|)$) and for describing which neighborhoods are involved ($|X| \log(|\mathcal{N}|)$). The second term accounts for the description of the number of exceptions ($\log(|\bigcap_{x \in X} x|)$) and for describing the exceptions themselves.

There is generally no unique way to describe the set $U$. The best description is thus the one that minimizes $f$.
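The two terms described above can be assembled into a single expression. The following is only a sketch: the surviving text does not spell out the cost of encoding each individual exception, so it assumes that every exception costs as much as the exception count, namely the logarithm of the size of the intersection. Logarithms are taken in base 2 because the costs are stated in bits.

```latex
f(X, U) = (|X| + 1)\,\log_2\bigl(|\mathcal{N}|\bigr)
        + \bigl(|\mathrm{exc}(X, U)| + 1\bigr)\,\log_2\Bigl(\Bigl|\bigcap_{x \in X} x\Bigr|\Bigr)

% Worked instance (hypothetical numbers): |\mathcal{N}| = 3, |X| = 2,
% the intersection of the two neighborhoods is exactly U with |U| = 2,
% so exc(X, U) is empty:
f(X, U) = (2 + 1)\,\log_2 3 + (0 + 1)\,\log_2 2 \approx 4.75 + 1 = 5.75 \text{ bits}
```

Under this reading, adding a neighborhood to $X$ always raises the first term, and it pays off only when it shrinks the intersection enough to remove exceptions.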
SIAS-Miner mines interesting patterns using an enumerate-and-rank approach. First, it enumerates all CSEA patterns $(U, S)$ that are closed simultaneously with respect to $U$, $S$, and the neighborhood description. Second, it ranks patterns according to their SI values. The calculation of $IC(U, S)$ and $DL_A(S)$ is simple and direct. However, computing $DL_V(U)$ is not trivial, since there are several ways to describe $U$ and we are looking for the one minimizing $f(X, U)$.
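To make the minimization concrete, here is a minimal Python sketch that searches exhaustively for the cheapest description of a vertex set. It is not the SIAS-Miner procedure: it is a brute-force baseline, exponential in the number of candidate neighborhoods, meant only to illustrate what is being minimized. The function names `description_cost` and `best_description` are hypothetical, and the per-exception cost (the base-2 logarithm of the intersection size) is an assumption rather than something stated in the text above.

```python
from itertools import combinations
from math import log2


def description_cost(X, U, n_neighborhoods):
    """Cost f(X, U) in bits for a non-empty tuple X of neighborhoods that all contain U."""
    inter = frozenset.intersection(*X)  # vertices covered by every neighborhood in X
    exc = inter - U                     # exc(X, U): covered vertices that are not in U
    # One log term for the number of neighborhoods, plus one per neighborhood in X.
    neighborhoods_bits = (len(X) + 1) * log2(n_neighborhoods)
    # ASSUMPTION: each exception costs log2(|intersection|) bits, like the exception count.
    exceptions_bits = (len(exc) + 1) * log2(len(inter))
    return neighborhoods_bits + exceptions_bits


def best_description(U, neighborhoods):
    """Return (cost, X) minimizing f over non-empty subsets of the neighborhoods containing U.

    Returns None when no neighborhood contains U.
    """
    U = frozenset(U)
    if not U:
        raise ValueError("U must be non-empty")
    all_n = [frozenset(n) for n in neighborhoods]
    candidates = [n for n in all_n if U <= n]  # N(U): neighborhoods that contain U
    best = None
    for k in range(1, len(candidates) + 1):
        for X in combinations(candidates, k):
            cost = description_cost(X, U, len(all_n))
            if best is None or cost < best[0]:
                best = (cost, X)
    return best


# Toy data (hypothetical): three neighborhoods, target vertex set {2, 3}.
neighborhoods = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 2, 3, 6, 7}]
cost, X = best_description({2, 3}, neighborhoods)
```

On this toy input the cheapest description uses the second and third neighborhoods, whose intersection is exactly {2, 3}, at a cost of about 5.75 bits. Using all three neighborhoods leaves no exceptions either but pays for one more neighborhood, which is why a smaller X wins.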
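The set of exceptions in $X \cup Y \cup \{e\}$ is equal to $\mathrm{exc}(X \cup Y \cup \{e\}, U) = \mathrm{exc}(X \cup \{e\}, U) \cap \mathrm{exc}(\ldots)$
then $\mathrm{exc}(X \cup Y \cup \{e\}, U) \subseteq \mathrm{exc}(X \cup Y \cup \{e'\}, U)$.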
Notice that even if an element $e$ has been removed due to the lower bound of $e'$, the procedure is still correct, since $e$ is lower bounded by $e'$ by the transitivity of inclusion.
Topology of complex networks: Local events and universality, Phys. Rev, vol.85, pp.5234-5237, 2000. ,
The role of domain knowledge in data mining, Proceedings of the fourth international conference on Information and knowledge management, pp.37-43, 1995. ,
Using gps to learn significant locations and predict movement across multiple users, Pers. Ub. Comput, vol.7, issue.5, pp.275-286, 2003. ,
Predicting the future with social media, WI-IAT, pp.492-499, 2010. ,
Sd-map-A fast algorithm for exhaustive subgroup discovery, PKDD 2006, pp.6-17, 2006. ,
Description-oriented community detection using exhaustive subgroup discovery. Information Science, IEEE/ACM ASONAM, vol.329, pp.757-764, 2016. ,
Mining frequent patterns with counting inference, SIGKDD Explorations, vol.2, pp.66-75, 2000. ,
URL : https://hal.archives-ouvertes.fr/hal-00467750
Detecting group differences: Mining contrast sets, Data mining and knowledge discovery, vol.5, issue.3, pp.213-246, 2001. ,
Track me! a web based location tracking and analysis system, FIMI '04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, vol.126, pp.117-122, 2004. ,
Flash points: Discovering exceptional pairwise behaviors in vote or rating data, Machine Learning and Knowledge Discovery in Databases-European Conference, ECML PKDD 2017, pp.442-458, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01587041
Ahmed Anes Bendimerad, Rémy Cazabet, Marc Plantevit, and Céline Robardet. Contextual subgraph discovery with mobility models, The Sixth International Conference on Complex Networks and Their Applications, pp.477-489, 2016. ,
Anes Bendimerad, Marc Plantevit, and Céline Robardet. Mining exceptional closed patterns in attributed graphs, KAIS, pp.1-25, 2017. ,
Ahmed Anes Bendimerad, Marc Plantevit, and Céline Robardet. Mining exceptional closed patterns in attributed graphs, Knowl. Inf. Syst, vol.56, issue.1, pp.1-25, 2018. ,
Björn Bringmann, and Aristides Gionis, European Conf. on Machine Learning and Princ. and Pract. of Knowl. Disc. in Databases (ECML/PKDD), pp.115-130, 2009. ,
Interactive knowledge discovery from hidden data through sampling of frequent patterns, Statistical Analysis and Data Mining, vol.9, issue.4, pp.205-229, 2016. ,
Interactive pattern mining on hidden data: a sampling-based solution, Proceedings of the 21st ACM international conference on Information and knowledge management, pp.95-104, 2012. ,
, Tijl De Bie. An information theoretic framework for data mining, KDD, pp.564-572, 2011.
, Tijl De Bie. Maximum entropy models and subjective interestingness, Data Mining and Knowledge Discovery, vol.23, issue.3, pp.407-446, 2011.
Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems, Proceedings of the first workshop on graph based methods for natural language processing, pp.73-80, 2006. ,
An inductive database system based on virtual mining views, Data Min. Knowl. Discov, vol.24, issue.1, pp.247-287, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00599315
, Fast unfolding of communities in large networks, Journal of statistical mechanics: theory and experiment, issue.10, p.10008, 2008.
Listing closed sets of strongly accessible set systems with applications to data mining, Theor. Comput. Sci, vol.411, issue.3, pp.691-700, 2010. ,
Direct local pattern sampling by efficient two-step random procedures, ACM SIGKDD 2011, pp.582-590, 2011. ,
Extending the state-of-the-art of constraint-based pattern discovery, Fundamentals in information theory and coding, vol.60, pp.25-31, 2005. ,
Pattern mining in frequent dynamic subgraphs, ICDM, pp.818-822, 2006. ,
Free-sets: A condensed representation of boolean data for the approximation of frequency queries, Data Min. Knowl. Discov, vol.7, issue.1, pp.5-22, 2003. ,
URL : https://hal.archives-ouvertes.fr/hal-01503814
, Local pattern detection in attributed graphs, pp.168-183, 2016.
The anatomy of a large-scale hypertextual web search engine. Comp. net. and ISDN systems, vol.30, pp.107-117, 1998. ,
What is frequent in a single graph? In PAKDD, pp.858-863, 2008. ,
Luc De Raedt, and Siegfried Nijssen. Don't be afraid of simpler patterns, Knowledge Discovery in Databases: PKDD, p.10, 2006. ,
, European Conference on Principles and Practice of Knowledge Discovery in Databases, pp.55-66, 2006.
Causal inference on event sequences, Proceedings of the 2018 SIAM International Conference on Data Mining, SDM 2018, vol.56, pp.285-307, 2018. ,
Fast generation of best interval patterns for nonmonotonic constraints, Machine Learning and Knowledge Discovery in Databases-European Conference, ECML PKDD 2015, pp.157-172, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01186718
Ali Cakmak and Gultekin Özsoyoglu. Taxonomy-superimposed graph mining, EDBT, pp.217-228, 2008. ,
Mining all non-derivable frequent itemsets, Principles of Data Mining and Knowledge Discovery, 6th European Conference, pp.74-85, 2002. ,
A survey on condensed representations for frequent sets, Constraint-Based Mining and Inductive Databases, European Workshop on Inductive Databases and Constraint Based Mining, pp.64-80, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-01613469
Toon Calders, Bart Goethals, and Szymon Jaroszewicz. Mining rank-correlated sets of numerical attributes, KDD, pp.96-105, 2006. ,
Enhancing space-aware community detection using degree constrained spatial null model, Workshop CompleNet, pp.47-55, 2015. ,
Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts, J. Biomedical Semantics, vol.6, p.27, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01192959
Closed patterns meet n-ary relations, TKDD, vol.3, issue.1, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-01499247
Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, KDD '14, vol.26, pp.1166-1175, 2013. ,
Compressing neural networks with the hashing trick, Proceedings of the 32nd International Conference on Machine Learning, pp.2285-2294, 2009. ,
Entropy, relative entropy and mutual information. Elements of information theory, vol.2, pp.1-55, 1991. ,
Discovering a taste for the unusual: exceptional models for preference mining, Machine Learning, 2018. ,
imap: Indirect measurement of air pollution with cellphones, PerCom Workshops, pp.1-6, 2009. ,
, Janez Demsar. Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, vol.7, pp.1-30, 2006.
, Cohesive co-evolution patterns in dynamic attributed graphs, Discovery Science, pp.110-124, 2012.
Granularity of co-evolution patterns in dynamic attributed graphs, Advances in Intelligent Data Analysis XIII-13th International Symposium, pp.84-95, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01301086
Efficient mining of emerging patterns, ACM SIGKDD, pp.43-52, 1999. ,
Multiscale event detection in social media, vol.29, pp.1374-1405, 2015. ,
Improving interpretability of deep neural networks with semantic information, 2017. ,
Exceptionally monotone models-the rank correlation model class for exceptional model mining, Knowl. Inf. Syst, vol.51, issue.2, pp.369-394, 2017. ,
Exceptional model mining-supervised descriptive local pattern mining with complex target concepts, ICDM 2010, vol.30, pp.47-98, 2010. ,
Wouter Duivesteijn. A short survey of exceptional model mining: Exploring unusual interactions between multiple targets, 2014 International Workshop on Multi-Target Prediction, 2014. ,
Interactive learning of pattern rankings, International Journal on Artificial Intelligence Tools, vol.23, issue.06, p.1460026, 2014. ,
Where is the soho of rome? measures and algorithms for finding similar neighborhoods in cities, Géraud Le Falher, Aristides Gionis, and Michael Mathioudakis, vol.108, pp.35-41, 1977. ,
Redescription Mining. Springer Briefs in Computer Science, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01726072
Fosca Giannotti and Dino Pedreschi. Mobility, data mining and privacy, 2008. ,
Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, and Panayiotis Tsaparas, TKDD, vol.1, issue.3, p.14, 2007. ,
Henrik Grosskreutz and Stefan Rüping. On subgroup discovery in numerical domains, Advances in frequent itemset mining implementations: report on fimi'03. SIGKDD Explorations, vol.6, pp.210-226, 2004. ,
A relevance criterion for sequential patterns, ECMLPKDD, pp.369-384, 2013. ,
Tias Guns, Anton Dries, Siegfried Nijssen, Guido Tack, and Luc De Raedt. Miningzinc: A declarative framework for constraint-based mining, ICDM, vol.244, pp.6-29, 2010. ,
Analysis of temporal networks using signal processing methods : Application to the bike-sharing system in Lyon. Theses, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01216173
Mining multiple-level association rules in large databases, IEEE Trans. Knowl. Data Eng, vol.11, issue.5, pp.798-804, 1999. ,
Route profiling: putting context to work, SAC, pp.1567-1573, 2004. ,
Analyzing feature trajectories for event detection, SIGIR, pp.207-214, 2007. ,
Sequence classification based on delta-free sequential patterns, 2014 IEEE International Conference on Data Mining, ICDM, pp.170-179, 2014. ,
Event detection in twitter using aggressive filtering and hierarchical tweet clustering, snow@WWW, pp.33-40, 2014. ,
Akihiro Inokuchi. Mining generalized substructures from a set of labeled graphs, ICDM, vol.39, pp.415-418, 1996. ,
Mining frequent cross-graph quasi-cliques, vol.2, pp.1-42, 2009. ,
What effects topological changes in dynamic graphs?-elucidating relationships between vertex attributes and the graph structure, Machine Learning, vol.5, pp.1171-1211, 2011. ,
Towards proximity pattern mining in large graphs, SIGMOD, pp.867-878, 2010. ,
Constraint programming for mining n-ary patterns, Principles and Practice of Constraint Programming-CP 2010-16th International Conference, pp.552-567, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01016652
Bursty and hierarchical structure in streams, KDD, pp.91-101, 2002. ,
Explora: A multipattern and multistrategy discovery assistant. Advances in knowledge discovery and data mining, 1996. ,
Learning of simple conceptual graphs from positive and negative examples, Principles of Data Mining and Knowledge Discovery, Third European Conference, PKDD '99, pp.174-185, 1999. ,
, Subgroup discovery with CN2-SD. JMLR, vol.5, pp.153-188, 2004.
Exceptional model mining, ECMLPKDD 2008, pp.1-16, 2008. ,
Andreas Hotho, and Markus Strohmaier. Mining subgroups with exceptional transition behavior, KDD, pp.965-974, 2016. ,
SNAP: A general purpose network analysis and graph mining library in C++, 2014. ,
Tedas: A twitter-based event detection and analysis system, ICDE'12, pp.1273-1276, 2010. ,
P-n-rminer: a generic framework for mining interesting structured relational patterns, I. J. Data Science and Analytics, vol.1, issue.1, pp.61-76, 2016. ,
, Zachary Chase Lipton. The mythos of model interpretability, 2016.
Effective pruning techniques for mining quasicliques, ECML/PKDD, pp.33-49, 2008. ,
Integrating classification and association rule mining, KDD, pp.80-86, 1998. ,
Mining statistically significant sequential patterns, 2013 IEEE 13th International Conference on Data Mining, pp.488-497, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00922255
Finding time period-based most frequent path in big trajectory data, SIGMOD, pp.260-272, 2004. ,
The PSP approach for mining sequential patterns, Principles of Data Mining and Knowledge Discovery, Second European Symposium, PKDD '98, vol.88, p.22812, 1998. ,
Solving Large Scale Learning Tasks. Challenges and Algorithms-Essays Dedicated to Katharina Morik on the Occasion of Her 60th Birthday, Discovery Science, vol.9580, pp.1-15, 2009. ,
, Mining cohesive patterns from graphs with feature vectors, SIAM SDM, pp.593-604, 2009.
, Finding collections of k-clique percolated components in attributed graphs, PAKDD, 2012.
Benjamin Négrevergne and Tias Guns. Constraint-based sequence mining using constraint programming, Integration of AI and OR Techniques in Constraint Programming12th International Conference, vol.39, pp.288-305, 2014. ,
Exploratory mining and pruning optimizations of constrained association rules, Proceedings ACM SIGMOD International Conference on Management of Data, vol.28, pp.13-24, 1998. ,
Multidimensional association rules in boolean tensors, Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, pp.570-581, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-01354377
Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks, Intell. Data Anal, vol.17, issue.1, pp.49-69, 2013. ,
Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining, Systems, Man and Cybernetics, vol.5, pp.377-403, 2004. ,
The pagerank citation ranking: Bringing order to the web, WWW, pp.161-172, 1998. ,
On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Journal of Science, vol.50, issue.302, pp.157-175, 1900. ,
Pushing convertible constraints in frequent itemset mining, Data Min. Knowl. Discov, vol.8, issue.3, pp.227-252, 2004. ,
Zipf's and benford's laws in twitter hashtags, EACL, pp.84-93, 2017. ,
Condensed representation of sequential patterns according to frequency-based measures, Adv. in Intelligent Data Analysis, vol.30, pp.155-166, 2009. ,
Mining multidimensional and multilevel sequential patterns, vol.4, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-01381826
Mining graph topological patterns, IEEE Trans. Knowl. Data Eng, vol.25, issue.9, pp.2090-2104, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01351727
Constraint-based pattern set mining, Proceedings of the Seventh SIAM International Conference on Data Mining, pp.237-248, 2007. ,
Céline Robardet. Constraint-Based Pattern Mining in Dynamic Graphs, Data Warehousing and Knowledge Discovery, 10th International Conference, pp.950-955, 2008. ,
Ranking interesting subgroups, Proceedings of the 26th Annual International Conference on Machine Learning, pp.913-920, 2009. ,
Null models for community detection in spatially embedded, temporal networks, J. Complex Networks, vol.4, issue.3, pp.363-406, 2016. ,
Mining attribute-structure correlated patterns in large attributed graphs, vol.5, pp.466-477, 2012. ,
A universal model for mobility and migration patterns, 2011. ,
Mining constraint-based patterns using automatic relaxation, Intell. Data Anal, vol.13, issue.1, pp.109-133, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-01012079
Mining dominant patterns in the sky, 11th IEEE International Conference on Data Mining, ICDM 2011, pp.655-664, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00623566
, Mining sequential patterns: Generalizations and performance improvements, EDBT, pp.3-17, 1996.
Statistical significance of combinatorial regulations, Proceedings of the National Academy of Sciences, vol.110, issue.32, pp.12996-13001, 2013. ,
Skypattern mining: From pattern condensed representations to dynamic constraint satisfaction problems, Artif. Intell, vol.244, pp.402-414, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-02048224
, Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem, Algorithmica, vol.56, issue.1, pp.3-16, 2010.
Matthijs van Leeuwen. Interactive data exploration using pattern mining, Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, vol.21, pp.169-182, 2010. ,
Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes, Data Min. Knowl. Discov, vol.23, issue.1, pp.169-214, 2011. ,
Frequent closed sequence mining without candidate maintenance, IEEE Trans. Knowl. Data Eng, vol.19, issue.8, pp.1042-1056, 2007. ,
Measurement error in network data: A re-classification, KDD, vol.34, pp.396-409, 2011. ,
Redundancy-aware maximal cliques, ACM SIGKDD 2013, pp.122-130, 2013. ,
A multiple test correction for streams and cascades of statistical hypothesis tests, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.78-87, 1997. ,
Discovering topically and temporally coherent events in interaction networks, ECMLPKDD, 2016. ,
gSpan: Graph-Based Substructure Pattern Mining, Int. Conf. on Data Mining (ICDM), pp.721-724, 2002. ,
Clospan: Mining closed sequential patterns in large databases, SDM, pp.166-177, 2003. ,
Charm: An efficient algorithm for closed itemset mining, SDM. SIAM, 2002. ,
Scalable algorithms for association mining, IEEE Trans. Knowl. Data Eng, vol.12, issue.3, pp.372-390, 2000. ,
SPADE: an efficient algorithm for mining frequent sequences, Machine Learning, vol.42, pp.31-60, 2001. ,
Visualizing and understanding convolutional networks, Computer Vision-ECCV 2014-13th European Conference, pp.818-833, 2014. ,
Geoburst: Real-time local event detection in geo-tagged tweet streams, ACM SIGIR, pp.513-522, 2016. ,
Mining interesting locations and travel sequences from gps trajectories, WWW, pp.791-800, 2009. ,
The p 1 p 2/d hypothesis: The case of railway express, The Journal of Psychology, vol.22, issue.1, pp.3-8, 1946. ,