. Al-salemi, . Bassam, M. J. Aziz, N. Ab, and S. Azman, Boosting algorithms with topic modeling for multi-label text categorization: A comparative empirical study, Journal of Information Science, vol.11, issue.2, pp.732-746, 2015.
DOI : 10.1007/978-3-540-24752-4_14

. Ad00, S. Acid, . De-campos, and M. Luis, Learning Right Sized Belief Networks by Means of a Hybrid Methodology, M. Lecture Notes in Computer Science, pp.309-315, 1910.

. Ad01, S. Acid, . De-campos, and M. Luis, A hybrid methodology for learning belief networks: BENEDICT, In: International Journal of Approximate Reasoning, vol.273, pp.235-262, 2001.

H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, vol.19, issue.6, pp.716-723, 1974.
DOI : 10.1109/TAC.1974.1100705

. Andersson, A. Steen, . Madigan, . David, . Perlman et al., Alternative Markov Properties for Chain Graphs, Scandinavian Journal of Statistics, vol.28, issue.1, pp.40-48, 1996.
DOI : 10.1111/1467-9469.00224
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.5147

A. Antonucci, . Corani, . Giorgio, D. Mauá, . Deratani et al., An Ensemble of Bayesian Networks for Multilabel Classification, IJCAI. Ed. by Rossi, Francesca. IJCAI/AAAI, p.2013

D. D. Ayers, A Bayesian Method Reexamined, pp.23-27, 1994.
DOI : 10.1016/B978-1-55860-332-5.50008-0
URL : http://arxiv.org/abs/1302.6781

K. P. Burnham, D. Anderson, and . Robert, Model selection and multimodel inference: a practical information-theoretic approach, 2002.
DOI : 10.1007/b97636

J. Besag and . English, Spatial Interaction and the Statistical Analysis of Lattice Systems, Journal of the Royal Statistical Society. Series BMethodological), vol.36, issue.2, pp.192-236, 1974.

L. Breiman and J. H. Friedman, Predicting Multivariate Responses in Multiple Linear Regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.59, issue.1, pp.3-54, 1997.
DOI : 10.1111/1467-9868.00054

J. K. Bradley and C. Guestrin, Learning Tree Conditional Random Fields, pp.127-134, 2010.

. Bishop and M. Christopher, Pattern recognition and machine learning, 2006.

F. R. Bach, . Jordan, and I. Michael, Thin Junction Trees In: NIPS, pp.569-576, 2001.

. Bielza, . Concha, G. Li, and P. Larrañaga, Multi-dimensional classification with Bayesian networks, International Journal of Approximate Reasoning, vol.52, issue.6, pp.705-727, 2011.
DOI : 10.1016/j.ijar.2011.01.007
URL : http://doi.org/10.1016/j.ijar.2011.01.007

. Borchani, . Hanen, . Varando, . Gherardo, C. Bielza et al., A survey on multi-output regression, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.33, issue.5, pp.216-233, 2015.
DOI : 10.18637/jss.v033.i01

L. Breiman, Bagging predictors, Machine Learning, vol.10, issue.2, pp.123-140, 1996.
DOI : 10.2307/1403680

. Blockeel, . Hendrik, L. Raedt, and R. De, Top-Down Induction of Clustering Trees, pp.55-63, 1998.

. Buntine and L. Wray, Theory Refinement on Bayesian Networks, pp.52-60, 1991.
DOI : 10.1016/B978-1-55860-203-8.50010-3
URL : http://arxiv.org/abs/1303.5709

J. Cussens and M. Bartlett, Advances in Bayesian Network Learning using Integer Programming, 2013.

L. Chen, . Huang, and Z. Jianhua, Sparse reduced-rank regression with covariance estimation, Statistics and Computing, vol.69, issue.1, pp.461-470, 2016.
DOI : 10.1111/j.1467-9868.2007.00591.x

G. F. Cooper and E. Herskovits, A Bayesian Method for Constructing Bayesian Belief Networks from Databases, pp.86-94, 1991.
DOI : 10.1016/B978-1-55860-203-8.50015-2
URL : http://arxiv.org/abs/1303.5714

J. Cheng, . Greiner, . Russell, . Kelly, . Jonathan et al., Learning Bayesian networks from data: An information-theory based approach, Artificial Intelligence, vol.137, issue.1-2, pp.43-90, 2002.
DOI : 10.1016/S0004-3702(02)00191-1
URL : http://doi.org/10.1016/s0004-3702(02)00191-1

. Chierichetti, . Flavio, . Kumar, . Ravi, . Pandey et al., Finding the Jaccard Median, In: SODA. Ed. by Charikar, Moses. SIAM, pp.293-311, 2010.
DOI : 10.1137/1.9781611973075.25

D. Chickering and . Maxwell, Optimal Structure Identification With Greedy Search, In: Journal of Machine Learning Research, vol.3, issue.82, pp.507-554, 2002.

D. Chickering and . Maxwell, Learning Bayesian Networks is NP-Complete, pp.121-130, 1995.
DOI : 10.1007/978-1-4612-2404-4_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1322

D. Chickering, . Maxwell, . Heckerman, . David, and C. Meek, Large- Sample Learning of Bayesian Networks is NP-Hard, In: Journal of Machine Learning Research, vol.5, issue.84, pp.1287-1330, 2004.
DOI : 10.1007/978-1-4612-2404-4_12
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1322

C. K. Chow and C. N. Liu, Approximating discrete probability distributions with dependence trees, IEEE Transactions on Information Theory, vol.14, issue.3, pp.462-467, 1968.
DOI : 10.1109/TIT.1968.1054142
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.9772

D. Chickering, . Maxwell, and C. Meek, Finding Optimal Bayesian Networks In: UAI, pp.94-102, 2002.

D. Chickering, . Maxwell, . Meek, . Christopher, and D. Heckerman, Large- Sample Learning of Bayesian Networks is NP-Hard, pp.124-133, 2003.

E. Cherman, . Alvares, J. Metz, M. Monard, and . Carolina, Incorporating label dependency into the binary relevance framework for multi-label classification, In: Expert Systems With Applications, vol.392, issue.119, pp.1647-1655, 2012.

G. Corani, A. Antonucci, D. Mauá, . Deratani, and S. Gabaglio, Trading off Speed and Accuracy in Multilabel Classification, Linda C. van der and Feelders, pp.145-159, 2014.
DOI : 10.1007/978-3-319-11433-0_10

R. G. Cowell, A. P. Dawid, . Lauritzen, L. Steffen, and D. J. Spiegelhalter, Probabilistic networks and expert systems, 1999.

F. Cozman and . Gagliardi, Independence for full conditional probabilities: Structure, factorization, non-uniqueness, and Bayesian networks, International Journal of Approximate Reasoning, vol.54, issue.9, pp.1261-1278, 2013.
DOI : 10.1016/j.ijar.2013.08.001

. Cruz-ramírez, . Nicandro, . Acosta-mesa, . Héctor-gabriel, . Barrientos-martínez et al., How Good Are the Bayesian Information Criterion and the Minimum Description Length Principle for Model Selection? A Bayesian Network Analysis, Lecture Notes in Computer Science, vol.4293, pp.494-504, 2006.
DOI : 10.1007/11925231_46

J. Cussens, Bayesian network learning with cutting planes In: UAI, pp.153-160, 2011.

D. R. Cox and N. Wermuth, Linear Dependencies Represented by Chain Graphs, Statistical Science, vol.8, issue.3, pp.204-218, 1993.
DOI : 10.1214/ss/1177010887

D. R. Cox and N. Wermuth, Multivariate Dependencies: Models, Analysis and Interpretation, 1996.

A. Dawid and . Philip, Beware of the DAG! " In: NIPS Causality: Objectives and Assessment, JMLR Proceedings. JMLR.org, pp.59-86, 2010.

. Daw79, A. Dawid, and . Philip, Conditional Independence in Statistical Theory, Journal of the Royal Statistical Society, Series B, vol.41, pp.1-31, 1979.

. Daw80, A. Dawid, and . Philip, Conditional independence for statistical operations, The Annals of Statistics, vol.83, pp.598-617, 1980.

. Dembczynski, . Krzysztof, . Cheng, . Weiwei, and E. Hüllermeier, Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains, pp.279-286, 2010.

. Dash, . Denver, . Druzdzel, and J. Marek, A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data, pp.142-149, 1999.

+. De, L. M. De-campos, J. M. Fernández-luna, . Gámez, A. José et al., Ant colony optimization for learning Bayesian networks, In: International Journal of Approximate Reasoning, vol.313, pp.291-311, 2002.

. De-campos and M. Luis, A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests, In: Journal of Machine Learning Research, vol.7, pp.2149-2187, 2006.

. De-campos and M. Luis, Characterizations of Decomposable Dependency Models (Research Note), In: Journal of Artificial Intelligence Research, vol.5, pp.289-300, 1996.

. Dembczynski, . Krzysztof, . Waegeman, . Willem, . Cheng et al., Regret Analysis for Performance Metrics in Multi-Label Classification: The Case of Hamming and Subset Zero-One Loss, In: ECML/PKDD Lecture Notes in Computer Science, vol.6321, issue.1, pp.280-295, 2010.
DOI : 10.1007/978-3-642-15880-3_24

. Dembczynski, . Krzysztof, . Waegeman, . Willem, . Cheng et al., An Exact Algorithm for F-Measure Maximization In: NIPS, pp.1404-1412, 2011.

. Dembczynski, . Krzysztof, . Waegeman, . Willem, . Cheng et al., On label dependence and loss minimization in multi-label classification, Machine Learning, vol.18, issue.12, pp.5-45, 2012.
DOI : 10.1145/1835804.1835930

. Dembczynski, . Krzysztof, . Jachnik, . Arkadiusz, . Kotlowski et al., Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization, JMLR Proceedings. JMLR.org, pp.1130-1138, 2013.

Y. Deng, . Li, . Dong, . Xie, . Xudong et al., Partially occluded face completion and recognition, pp.4145-4148, 2009.

L. M. De-campos, J. M. Fernández-luna, J. Puerta, and . Miguel, An iterated local search algorithm for learning Bayesian networks with restarts based on conditional independence tests, International Journal of Intelligent Systems, vol.2143, issue.9, pp.221-235, 2003.
DOI : 10.1109/34.537345

. Dic45, L. Dice, and . Raymond, Measures of the Amount of Ecologic Association Between Species, In: Ecology, vol.263, pp.297-302, 1945.

C. De-campos, . Polpo, and Q. Ji, Efficient Structure Learning of Bayesian Networks using Constraints, In: Journal of Machine Learning Research, vol.12, pp.663-689, 2011.

M. Drton, . Richardson, and S. Thomas, Binary models for marginal independence, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.20, issue.2, pp.287-309, 2008.
DOI : 10.2307/2529341
URL : http://arxiv.org/abs/0707.3794

M. Drt09-]-drton, Discrete chain graph models, Bernoulli, vol.15, issue.3, pp.736-753, 2009.
DOI : 10.3150/08-BEJ172

. Dembczynski, . Krzysztof, . Waegeman, . Willem, and . Hüllermeier, An Analysis of Chaining in Multi-Label Classification In: ECAI, Frontiers in Artificial Intelligence and Applications, vol.242, issue.124, pp.294-299, 2012.

. Fan, . Xiannian, . Malone, M. Brandon, and C. Yuan, Finding Optimal Bayesian Network Structures with Constraints Learned from Data, pp.200-209, 2014.

. Friedman, . Nir, . Nachman, . Iftach, and D. Pe-'er, Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm, pp.206-215, 1999.

. Fry90 and M. Frydenberg, The chain graph Markov property, In: Scandinavian Journal of Statistics, vol.174, pp.333-353, 1990.

. Friedman, . Nir, and Z. Yakhini, On the Sample Complexity of Learning Bayesian Networks, pp.274-282, 1996.

R. Gens and P. M. Domingos, Learning the Structure of Sum-Product Networks, JMLR Proceedings. JMLR.org, pp.873-880, 2013.

I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, In: Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

. Gharroudi, . Ouadie, . Elghazel, . Haytham, and A. Aussem, A Comparison of Multi-Label Feature Selection Methods Using the Random Forest Paradigm, Lecture Notes in Computer Science, vol.8436, pp.95-106, 2014.
DOI : 10.1007/978-3-319-06483-3_9
URL : https://hal.archives-ouvertes.fr/hal-01301070

D. Geiger, The Non-axiomatizability of Dependencies in Directed Acyclic Graphs, In: Report, 1987.

Y. Guo and S. Gu, Multi-Label Classification Using Conditional Dependency Networks, In: IJCAI. Ed. by Walsh, Toby. IJCAI/AAAI, vol.124, pp.1300-1305, 2011.

J. Ghosh, . Kumar, and . Mukerjee, Characterization of priors under which Bayesian and frequentist Bartlett corrections are equivalent in the multiparameter case, Journal of Multivariate Analysis, vol.38, issue.2, pp.385-393, 1991.
DOI : 10.1016/0047-259X(91)90052-4

D. Geiger, . Meek, . Christopher, and B. Sturmfels, Factorization of Discrete Probability Distributions, pp.162-169, 2002.

I. J. Goodfellow, . Pouget-abadie, . Jean, . Mirza, and . Mehdi, Generative Adversarial Nets, In: NIPS. Ed. by Ghahramani, pp.2672-2680

D. Geiger and J. Pearl, On the logic of causal models In: UAI, pp.3-14, 1988.

D. Geiger and J. Pearl, Logical and algorithmic properties of independence and their application to Bayesian networks, Annals of Mathematics and Artificial Intelligence, vol.70, issue.no. 3, pp.165-178, 1990.
DOI : 10.1007/BF01531004

P. Grünwald, Minimum Description Length Principle, p.78, 2007.

S. Godbole and S. Sarawagi, Discriminative Methods for Multi-labeled Classification, Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.22-30, 2004.
DOI : 10.1007/978-3-540-24775-3_5

M. A. Gómez-villegas and L. Sanz, Reconciling Bayesian and frequentist evidence in the point null testing problem, Test, vol.12, issue.1, pp.207-216, 1998.
DOI : 10.1093/biomet/44.1-2.187

. Geiger, . Dan, . Verma, . Thomas, J. Pearl et al., d-Separation: From Theorems to Algorithms, In: UAI. Ed. by Henrion, pp.139-148, 1989.
DOI : 10.1016/B978-0-444-88738-2.50018-X
URL : http://arxiv.org/abs/1304.1505

. Geiger, . Dan, . Verma, . Thomas, and J. Pearl, Identifying independence in bayesian networks, Networks, vol.9, issue.5, pp.507-534, 1990.
DOI : 10.1090/psapm/034/846853

. Hal+09, . Hall, A. Mark, . Frank, . Eibe et al., The WEKA data mining software: an update, In: SIGKDD Explorations, vol.111, pp.10-18, 2009.

C. Han, . Chen, . Jian, . Wu, . Qingyao et al., Sparse Markov chain-based semi-supervised multi-instance multi-label method for protein function prediction, Journal of Bioinformatics and Computational Biology, vol.3, issue.1, 2015.
DOI : 10.1016/j.jmgm.2011.08.001

J. M. Hammersley, . Clifford, and E. Peter, Markov random fields on finite graphs and lattices, In: Unpublished manuscript, 1971.

D. Heckerman, D. Chickering, . Maxwell, . Meek, . Christopher et al., Dependency Networks for Inference, Collaborative Filtering, and Data Visualization, In: Journal of Machine Learning Research, vol.1, pp.49-75, 2000.

. Heckerman, . David, D. Geiger, C. , and D. Maxwell, Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, In: Machine Learning, vol.203, issue.74, pp.197-243, 1995.

R. A. Howard and J. E. Matheson, Influence Diagrams, Decision Analysis, vol.2, issue.3, 1981.
DOI : 10.1287/deca.1050.0020

. Ho95, T. Ho, and . Kam, Random decision forests, In: ICDAR. IEEE Computer Society, pp.278-282, 1995.

E. Ising and . German, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift f??r Physik, vol.6, issue.4, pp.253-258, 1925.
DOI : 10.1007/BF02980577

T. S. Jaakkola, . Sontag, . David, . Globerson, . Amir et al., Learning Bayesian Network Structure using LP Relaxations, JMLR Proceedings. JMLR.org, pp.358-365, 2010.

M. Jansche, A Maximum Expected Utility Framework for Binary Sequence Labeling In: ACL, Antal van den, and Zaenen, 2007.

. Joachims, . Thorsten, . Finley, Y. Thomas, and C. John, Cutting-plane training of structural SVMs, Machine Learning, vol.6, issue.2, pp.27-59, 2009.
DOI : 10.1007/s10994-009-5108-8
URL : https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5108-8.pdf

. Kauermann and . Göran, On a Dualization of Graphical Gaussian Models, In: Scandinavian Journal of Statistics, vol.231, pp.105-116, 1996.

D. Koller and N. Friedman, Probabilistic Graphical Models -Principles and Techniques, pp.1-1231, 2009.

. Kocev, . Dragi, . Vens, . Celine, J. Struyf et al., Ensembles of Multi-Objective Decision Trees, In: ECML. Ed. by Kok, Joost N. et al. Lecture Notes in Computer Science, vol.4701, pp.624-631, 2007.
DOI : 10.1007/978-3-540-74958-5_61
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.73.2063

. Bibliography, . Kojima, . Kaname, . Perrier, . Eric et al., Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure, In: Journal of Machine Learning Research, vol.11, issue.100, pp.285-310, 2010.

A. Kolmogorov and . Nikolaïevitch, On tables of random numbers, Theoretical Computer Science, vol.207, issue.2, pp.369-376, 1961.
DOI : 10.1016/S0304-3975(98)00075-9

J. T. Koster, Marginalizing and conditioning in graphical models, pp.817-840, 2002.

. Koivisto, . Mikko, and K. Sood, Exact Bayesian Structure Discovery in Bayesian Networks, In: Journal of Machine Learning Research, vol.5, pp.549-573, 2004.

D. Koller and M. Sahami, Toward Optimal Feature Selection, pp.284-292, 1996.

C. Kang and J. Tian, Local Markov Property for Models Satisfying Composition Axiom, pp.284-291, 2005.

S. Kullback, Information theory and statistics, 1968.

A. Kumar, . Vembu, . Shankar, A. Menon, . Krishna et al., Learning and Inference in Probabilistic Classifier Chains with Beam Search, Nello. Lecture Notes in Computer Science, vol.7523, issue.1, pp.665-680, 2012.
DOI : 10.1007/978-3-642-33460-3_48
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.470.1379

P. Larrañaga, . Poza, . Mikel, . Yurramendi, . Yosu et al., Structure learning of Bayesian networks by genetic algorithms: a performance analysis of control parameters, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.18, issue.9, pp.912-926, 1996.
DOI : 10.1109/34.537345

. Lau+90-]-lauritzen, L. Steffen, A. Dawid, . Philip, B. N. Larsen et al., Independence properties of directed markov fields, Networks, vol.5, issue.5, pp.491-505, 1990.
DOI : 10.1016/B978-0-444-70058-2.50031-0

. Lau96-]-lauritzen and L. Steffen, Graphical Models, 1996.

. Lecun, . Yann, . Bengio, . Yoshua, and G. Hinton, Deep learning, Nature, vol.9, issue.7553, pp.436-444, 2015.
DOI : 10.1007/s10994-013-5335-x

S. Liu, C. Monica, and . Jiun-hung, A multi-label classification based approach for sentiment classification, Expert Systems with Applications, vol.42, issue.3, pp.1083-1093, 2015.
DOI : 10.1016/j.eswa.2014.08.036

K. Li, . Liu, . Yi, . Wang, and . Quanxin, A Spacecraft Electrical Characteristics Multi-Label Classification Method Based on Off-Line FCM Clustering and On-Line WPSVM, PLOS ONE, vol.15, issue.1, pp.1-16, 2015.
DOI : 10.1371/journal.pone.0140395.s001
URL : http://doi.org/10.1371/journal.pone.0140395

C. Li, . Wang, . Bingyu, . Pavlu, . Virgil et al., Conditional Bernoulli Mixtures for Multi-label Classification, Conference Proceedings. JMLR.org, pp.2482-2491, 2016.

S. Li, Causal models have no complete axiomatic characterization, 2008.

J. Lee, . Kim, and . Dae-won, Feature selection for multi-label classification using multivariate mutual information, Pattern Recognition Letters, vol.34, issue.3, pp.349-357, 2013.
DOI : 10.1016/j.patrec.2012.10.005

J. Lee, . Kim, and . Dae-won, Fast multi-label feature selection based on information-theoretic feature ranking, Pattern Recognition, vol.48, issue.9, pp.2761-2771, 2015.
DOI : 10.1016/j.patcog.2015.04.009

. Lotter, . William, . Kreiman, . Gabriel, . Cox et al., Unsupervised Learning of Visual Structure using Predictive Generative Networks, p.6380, 1511.

J. D. Lafferty, . Mccallum, . Andrew, and F. C. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, pp.282-289, 2001.

. Liu, . Zhifa, . Malone, M. Brandon, and C. Yuan, Empirical evaluation of scoring functions for Bayesian network model selection, BMC Bioinformatics, vol.13, issue.Suppl 15, pp.15-29, 2012.
DOI : 10.1007/s10994-006-6889-7

. Levitz, . Michael, M. D. Perlman, and D. Madigan, Correction: Separation and completeness properties for AMP chain graph Markov models, The Annals of Statistics, vol.31, issue.1, pp.1751-1784, 2001.
DOI : 10.1214/aos/1046294468
URL : http://academiccommons.columbia.edu/download/fedora_content/download/ac:173888/CONTENT/euclid.aos.1046294468__1_.pdf

H. W. Lin and M. Tegmark, Why Does Deep and Cheap Learning Work So Well?, Journal of Statistical Physics, vol.13, pp.8225-2016, 1608.
DOI : 10.1007/BF02165411

O. Luaces, . Díez, . Jorge, . Barranquero, . José et al., Binary relevance efficacy for multilabel classification, Progress in Artificial Intelligence, vol.40, issue.7, pp.303-313, 2012.
DOI : 10.1016/j.patcog.2006.12.019

A. Liaw and M. Wiener, Classification and Regression by randomForest, pp.18-22, 2002.

S. L. Lauritzen and N. Wermuth, Graphical Models for Associations between Variables, some of which are Qualitative and some Quantitative, The Annals of Statistics, vol.17, issue.1, pp.31-57, 1989.
DOI : 10.1214/aos/1176347003

S. Morais, . Rodrigues-de, and A. Aussem, A novel Markov boundary based feature subset selection algorithm, Neurocomputing, vol.73, issue.4-6, pp.4-6, 2010.
DOI : 10.1016/j.neucom.2009.05.018
URL : https://hal.archives-ouvertes.fr/hal-00383776

S. Morais, . Rodrigues-de, and A. Aussem, An Efficient and Scalable Algorithm for Local Bayesian Network Structure Discovery, In: ECML/PKDD, issue.3
DOI : 10.1007/978-3-642-15939-8_11

C. Meek, Graphical models: selecting causal and statistical models, 1997.

B. Malone, . Järvisalo, . Matti, and P. Myllymäki, Impact of Learning Strategies on the Quality of Bayesian Networks: An Empirical Evaluation, UAI. Ed. by Meila, pp.562-571, 2015.

. English, Gibbs and Markov random systems with constraints, Journal of Statistical Physics, vol.101, pp.11-33, 1974.

. Maron, . Oded, A. Ratan, and . Lakshmi, Multiple-Instance Learning for Natural Scene Classification, pp.341-349, 1998.

D. Margaritis, S. Thrun, S. A. Solla, T. K. Leen, and K. Müller, Bayesian Network Induction via Local Neighborhoods In: NIPS, pp.505-511, 1999.

A. W. Moore, . Wong, and . Weng-keen, Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning, pp.552-559, 2003.

J. Nielsen, . Dalgaard, . Kocka, . Tomás, . Peña et al., On Local Optima in Learning Bayesian Networks, pp.435-442, 2003.

S. Ott, . Imoto, . Seiya, S. Miyano, R. B. Altman et al., FINDING OPTIMAL MODELS FOR SMALL GENE NETWORKS, Biocomputing 2004, pp.557-567, 2004.
DOI : 10.1142/9789812704856_0052

. Peña, M. José, J. Björkegren, and J. Tegnér, Scalable, Efficient and Correct Learning of Markov Boundaries Under the Faithfulness Assumption, In: ECSQARU. Ed. by Godo, Lluis. Lecture Notes in Computer Science, vol.3571, issue.87, pp.136-147, 2005.
DOI : 10.1007/11518655_13

J. Petterson, . Caetano, S. Tibério, J. D. Lafferty, C. K. Williams et al., Reverse Multi-Label Learning In: NIPS, pp.1912-1920, 2010.

. Poon, . Hoifung, and P. M. Domingos, Sum-product networks: A new deep architecture, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp.337-346, 2011.
DOI : 10.1109/ICCVW.2011.6130310

. Pea09 and J. Pearl, Causality: Models, Reasoning and Inference, 2009.

. Pea12 and J. Pearl, The Do-Calculus Revisited In: UAI, pp.3-11, 2012.

. Pea85 and J. Pearl, Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning, Proceedings of the Cognitive Science Society (CSS-7), 1985.

J. Pearl, Probabilistic reasoning in intelligent systems -networks of plausible inference Morgan Kaufmann series in representation and reasoning, pp.1-552, 1989.

. Pea95 and J. Pearl, Causal diagrams for empirical research, In: Biometrika, vol.824, pp.669-710, 1995.

. Peh+15, . Peharz, . Robert, . Tschiatschek, . Sebastian et al., On Theoretical Properties of Sum-Product Networks, Conference Proceedings. JMLR.org, p.2015

. Peh15 and R. Peharz, Foundations of Sum-Product Networks for Probabilistic Modeling, p.2015

. Peñ+06, . Peña, M. José, R. Nilsson, J. Björkegren et al., Identifying the Relevant Nodes Without Learning the Model, 2006.

. Peñ+07, . Peña, M. José, R. Nilsson, J. Björkegren et al., Towards scalable and data efficient learning of Markov boundaries, In: International Journal of Approximate Reasoning, vol.452, issue.101 144, pp.211-232, 2007.

. Peñ08, . Peña, and M. Jose, Learning Gaussian Graphical Models of Gene Networks with False Discovery Rate Control In: EvoBIO, Jason H. Lecture Notes in Computer Science, vol.4973, pp.165-176, 2008.

. Peñ14, . Peña, and M. Jose, Marginal AMP chain graphs, In: International Journal of Approximate Reasoning, vol.555, issue.16, pp.1185-1206, 2014.

. Peñ15, . Peña, and M. Jose, Factorization, Inference and Parameter Learning in Discrete AMP Chain Graphs, In: ECSQARU. Ed. by Destercke, Sébastien and Denoeux, Thierry. Lecture Notes in Computer Science, vol.9161, pp.335-345, 2015.

. Perrier, . Eric, . Imoto, . Seiya, . Miyano et al., Finding Optimal Bayesian Network Given a Super-Structure, In: Journal of Machine Learning Research, vol.9, issue.100, pp.2251-2286, 2008.

. Pearl, . Judea, and A. Paz, Graphoids: Graph-Based Logic for Reasoning about Relevance Relations or When would x tell you more about y if you already know z?, In: ECAI, vol.24, pp.357-363, 1986.

. Papagiannopoulou, . Christina, . Tsoumakas, . Grigorios, and I. Tsamardinos, Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pp.915-924, 2015.
DOI : 10.1145/1835804.1835930

. Pearl, . Judea, and T. Verma, The Logic of Representing Dependencies by Directed Graphs, pp.374-379, 1987.

[. Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, p.2016

J. Read, . Pfahringer, . Bernhard, G. Holmes, and E. Frank, Classifier Chains for Multi-label Classification, Bibliography Lecture Notes in Computer Science, vol.5782, issue.120, pp.254-269, 2009.
DOI : 10.1007/978-3-642-04174-7_17
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.148.1174

. Read, . Jesse, . Pfahringer, . Bernhard, G. Holmes et al., Classifier chains for multi-label classification, In: Machine Learning, vol.853, pp.333-359, 2011.

. Read, . Jesse, . Reutemann, . Peter, . Pfahringer et al., MEKA: A Multi-label/Multi-target Extension to WEKA, Journal of Machine Learning Research, vol.1721, pp.1-5, 2016.

. Rahman, . Tahrima, . Gogate, and . Vibhav, Merging Strategies for Sum-Product Networks: From Trees to Graphs, Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI 2016, pp.617-626, 2016.

T. Richardson, Markov Properties for Acyclic Directed Mixed Graphs, Scandinavian Journal of Statistics, vol.5, issue.1, pp.145-157, 2003.
DOI : 10.1111/1467-9469.00157

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

J. Rissanen, Stochastic Complexity in Statistical Inquiry, In: World Scientific , Series in Computer Science, vol.15, 1989.
DOI : 10.1142/0822

A. Radford, . Metz, . Luke, and S. Chintala, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 1511.

. Read, . Jesse, . Martino, . Luca, and D. Luengo, Efficient monte carlo methods for multi-dimensional learning with classifier chains, Pattern Recognition, vol.47, issue.3, pp.1535-1546, 2014.
DOI : 10.1016/j.patcog.2013.10.006
URL : http://arxiv.org/pdf/1211.2190

R. W. Robinson, Counting Labeled Acyclic Digraphs In: New Directions in Graph Theory, 1973.
DOI : 10.1007/bfb0069178

T. Richardson and P. Spirtes, Ancestral graph Markov models, The Annals of Statistics, vol.30, issue.4, pp.962-1030, 2002.
DOI : 10.1214/aos/1031689015
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.164.9139

M. Scutari, . Adriana, and . Brogini, Bayesian Network Structure Learning with Permutation Tests, Communications in Statistics - Theory and Methods, vol.35, issue.3, pp.16-17, 2012.
DOI : 10.1007/s10994-006-6889-7
URL : http://arxiv.org/pdf/1101.5184

. Sad11 and K. Sadeghi, Markov Equivalences for Subclasses of Loopless Mixed Graphs, 2011.

K. Sadeghi, Graphical representation of independence structures, p.2012

K. Sadeghi, Stable mixed graphs, Bernoulli, vol.19, issue.5B, pp.2330-2358, 2013.
DOI : 10.3150/12-BEJ454

K. Sadeghi, Marginalization and conditioning for LWF chain graphs, The Annals of Statistics, vol.44, issue.4, pp.1792-1816, 2016.
DOI : 10.1214/16-AOS1451SUPP

A. P. Streich and J. M. Buhmann, Asymptotic analysis of estimators on multi-label data, Machine Learning, vol.18, issue.10, pp.373-409, 2015.
DOI : 10.1145/1076034.1076082

. Studený, . Milan, . Bouckaert, and R. Remco, On chain graph models for description of conditional independence structures, The Annals of Statistics, pp.1434-1495, 1998.
DOI : 10.1214/aos/1024691250

. Scu10 and M. Scutari, Learning Bayesian Networks with the bnlearn R Package, Journal of Statistical Software, vol.353, issue.156, pp.1-22, 2010.

. Spirtes, . Peter, and C. Glymour, An Algorithm for Fast Recovery of Sparse Causal Graphs, Social Science Computer Review, vol.9, issue.1, pp.62-72, 1991.
DOI : 10.1177/089443939100900106

. Spirtes, . Peter, . Glymour, . Clark, and R. Schienes, From probability to causality, Philosophical Studies, vol.66, issue.No. 4, 1990.
DOI : 10.1007/BF00356088

. Spirtes, . Peter, . Glymour, . Clark, and R. Schienes, Causation, prediction, and search, pp.86-91, 1993.
DOI : 10.1007/978-1-4612-2748-9

. Sim51, . Simpson, and H. Edward, The Interpretation of Interaction in Contingency Tables, Journal of the Royal Statistical Society. Series BMethodological), vol.132, pp.238-241, 1951.

T. P. Speed and H. T. Kiiveri, Gaussian Markov Distributions over Finite Graphs, The Annals of Statistics, vol.14, issue.1, pp.138-150, 1986.
DOI : 10.1214/aos/1176349846

K. Sadeghi and S. Lauritzen, Markov properties for mixed graphs, Bernoulli, vol.20, issue.2, pp.676-696, 2014.
DOI : 10.3150/12-BEJ502
URL : http://doi.org/10.3150/12-bej502

K. Sadeghi, . Lauritzen, and L. Steffen, Unifying Markov properties in graphical models, Unpublished manuscript (2015) (cit. on pp. 41, pp.62-64

A. R. Statnikov, J. Lemeire, A. Constantin, and F. , Algorithms for discovery of multiple Markov boundaries, In: Journal of Machine Learning Research, vol.141, issue.147, pp.499-566, 2013.

. Singh, P. Ajit, and A. W. Moore, Finding optimal Bayesian networks by dynamic programming. Tech. rep. 2nd revision, 2005.

T. Silander and P. Myllymäki, A Simple Approach for Finding the Globally Optimal Bayesian Network Structure, 2006.

C. A. Sutton and A. Mccallum, An Introduction to Conditional Random Fields, Foundations and Trends?? in Machine Learning, vol.4, issue.4, pp.267-373, 2012.
DOI : 10.1561/2200000013
URL : http://arxiv.org/abs/1011.4088

D. Sonntag and J. M. Peña, Chain graph interpretations and their relations revisited, International Journal of Approximate Reasoning, vol.58, issue.63, pp.39-56, 2015.
DOI : 10.1016/j.ijar.2014.12.001
URL : http://liu.diva-portal.org/smash/get/diva2:800745/FULLTEXT01

N. Spolaôr, E. Cherman, . Alvares, M. Monard, L. Carolina et al., A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach, Electronic Notes in Theoretical Computer Science, vol.292, pp.135-151, 2013.
DOI : 10.1016/j.entcs.2013.02.010

. Smith, . Noah, and R. Tromble, Sampling Uniformly from the Unit Simplex, pp.1-6, 2004.

. Stu05 and M. Studeny, Probabilistic Conditional Independence Structures, pp.65-149, 2005.

. Stu89 and M. Studeny, Multiinformation and the problem of characterization of conditional independence relations, Problems of Control and Information Theory, pp.3-16, 1989.

. Stu92 and M. Studeny, Conditional independence relations have no finite complete characterization In: Transactions of the 11th Prague Conference Information Theory, Statistical Decision Functions and Random Processes, J.A, pp.377-396, 1992.

. Stu97 and M. Studený, A recovery algorithm for chain graphs, In: International Journal of Approximate Reasoning, vol.17, pp.2-3, 1997.

. Stu98 and M. Studený, Bayesian Networks from the Point of View of Chain Graphs In: UAI, pp.496-503, 1998.

. Suc+14, L. Sucar, . Enrique, . Bielza, . Concha et al., Multi-label classification with Bayesian network-based chain classifiers, In: Pattern Recognition Letters, vol.41, pp.14-22, 2014.

S. Sullivant, Gaussian conditional independence relations have no finite complete characterization, Journal of Pure and Applied Algebra, vol.213, issue.8, 2009.

M. Singh and M. Valtorta, An Algorithm for the Construction of Bayesian Network Structures from Data, pp.259-265, 1993.
DOI : 10.1016/B978-1-4832-1451-1.50036-6

I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov, Algorithms for Large Scale Markov Blanket Discovery, pp.376-381, 2003.

I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov, Time and sample efficient discovery of Markov blankets and direct causal relations, In: KDD. Ed. by Getoor, vol.87, pp.673-678, 2003.

I. Tsamardinos and G. Borboudakis, Permutation Testing Improves Bayesian Network Learning, Lecture Notes in Computer Science, vol.6323, pp.322-337, 2010.
DOI : 10.1007/978-3-642-15939-8_21

I. Tsamardinos, L. E. Brown, and C. F. Aliferis, The max-min hill-climbing Bayesian network structure learning algorithm, In: Machine Learning, vol.65, issue.1, pp.31-78, 2006.
DOI : 10.1007/s10994-006-6889-7
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.2806

G. Tsoumakas and I. Katakis, Multi-Label Classification: An Overview, In: IJDWM, vol.3, issue.3, pp.1-13, 2007.
DOI : 10.4018/978-1-59904-951-9.ch006

G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and Efficient Multilabel Classification in Domains with Large Number of Labels, Proceedings of the ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD'08), 2008.

G. Tsoumakas, I. Katakis, and I. Vlahavas, Mining Multilabel Data, In: Data Mining and Knowledge Discovery Handbook, pp.667-685, 2010.

G. Tsoumakas, I. Katakis, and I. Vlahavas, Random k-Labelsets for Multilabel Classification, IEEE Transactions on Knowledge and Data Engineering, vol.23, issue.7, pp.1079-1089, 2011.
DOI : 10.1109/TKDE.2010.164

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, In: Journal of Machine Learning Research, vol.6, pp.1453-1484, 2005.

G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas, Mulan: A Java Library for Multi-Label Learning, Journal of Machine Learning Research, vol.12, pp.2411-2414, 2011.

G. Tsoumakas and I. Vlahavas, Random k-Labelsets: An Ensemble Method for Multilabel Classification, Lecture Notes in Computer Science, vol.4701, pp.406-417, 2007.
DOI : 10.1007/978-3-540-74958-5_38
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.5044

E. Villanueva and C. D. Maciel, Optimized Algorithm for Learning Bayesian Network Super-structures, In: ICPRAM (1), pp.217-222, 2012.

T. Verma and J. Pearl, Causal networks: semantics and expressiveness, In: UAI, pp.69-78, 1988.

T. Verma and J. Pearl, Equivalence and synthesis of causal models, In: UAI, pp.255-270, 1990.

W. N. Venables and B. D. Ripley, Modern Applied Statistics with S. Fourth edition, 2002.
DOI : 10.1007/978-0-387-21706-2

P. R. de Waal, Marginals of DAG-Isomorphic Independence Models, In: ECSQARU. Ed. by Sossai, Claudio and Chemello, Gaetano. Lecture Notes in Computer Science, vol.18, issue.1, pp.192-203, 2009.
DOI : 10.1016/B978-0-444-88650-7.50006-8

W. Waegeman, K. Dembczyński, A. Jachnik et al., On the Bayes-optimality of F-measure maximizers, In: Journal of Machine Learning Research, vol.15, issue.1, pp.3333-3388, 2014.

S. Wang, J. Wang et al., Enhancing multilabel classification by modeling dependencies among labels, In: Pattern Recognition, vol.47, issue.10, pp.3405-3413, 2014.
DOI : 10.1016/j.patcog.2014.04.009

L. Wang, T. H. Zhou, Y. K. Lee et al., An efficient refinement algorithm for multi-label image annotation with correlation model, Telecommunication Systems, vol.1, issue.1, pp.285-301, 2015.
DOI : 10.1007/978-3-540-78849-2_59

Z. Wang and A. C. Bovik, Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures, Signal Processing Magazine, pp.98-117, 2009.

N. Wermuth, D. R. Cox, and J. Pearl, Explanations for multivariate structures derived from univariate recursive regressions, In: Ber. Stoch. Verw. Geb, 1994.

J. Wu, S.-J. Huang, and Z.-H. Zhou, Genome-Wide Protein Function Prediction through Multi-Instance Multi-Label Learning, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.11, issue.5, pp.891-902, 2014.
DOI : 10.1109/TCBB.2014.2323058

N. Wermuth, G. M. Marchetti, and D. R. Cox, Triangular systems for symmetric binary variables, Electronic Journal of Statistics, vol.3, pp.932-955, 2009.
DOI : 10.1214/09-EJS439
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.508.9821

S. Wright, The Relative Importance of Heredity and Environment in Determining the Piebald Pattern of Guinea-Pigs, Proceedings of the National Academy of Sciences, vol.6, issue.6, pp.320-332, 1920.
DOI : 10.1073/pnas.6.6.320

S. Wright, The Method of Path Coefficients, The Annals of Mathematical Statistics, vol.5, issue.3, pp.161-215, 1934.
DOI : 10.1214/aoms/1177732676

S. K. M. Wong, D. Wu, and T. Lin, A Structural Characterization of DAG-Isomorphic Dependency Models, Lecture Notes in Computer Science, vol.2338, pp.195-209, 2002.
DOI : 10.1007/3-540-47922-8_17

N. Ye, K. M. A. Chai, W. S. Lee, and H. L. Chieu, Optimizing F-measure: A Tale of Two Approaches, In: ICML. icml.cc / Omnipress, 2012.

C. Yuan and B. Malone, Learning Optimal Bayesian Networks: A Shortest Path Perspective, In: Journal of Artificial Intelligence Research, vol.48, pp.23-65, 2013.

C. Yuan, B. Malone, and X. Wu, Learning Optimal Bayesian Networks Using A* Search, In: IJCAI. Ed. by Walsh, Toby. IJCAI/AAAI, pp.2186-2191, 2011.

J. H. Zaragoza, L. E. Sucar, E. F. Morales et al., Bayesian Chain Classifiers for Multidimensional Classification, In: IJCAI. Ed. by Walsh, Toby. IJCAI/AAAI, pp.2192-2197, 2011.

W. Zhang, F. Liu, L. Luo et al., Predicting drug side effects by multi-label learning and ensemble learning, BMC Bioinformatics, vol.10, issue.5, p.365, 2015.
DOI : 10.1371/journal.pone.0128194
URL : http://doi.org/10.1186/s12859-015-0774-y

K. Zhao, H. Zhang, Z. Ma et al., Multi-label learning with prior knowledge for facial expression analysis, Neurocomputing, vol.157, pp.280-289, 2015.
DOI : 10.1016/j.neucom.2015.01.005

H. Zhao, T. Adel, G. Gordon, and B. Amos, Collapsed Variational Inference for Sum-Product Networks, Conference Proceedings. JMLR.org, pp.1310-1318, 2016.

H. Zhao, M. Melibari, and P. Poupart, On the Relationship between Sum-Product Networks and Bayesian Networks, Ed. by Bach, F. R. et al. JMLR Proceedings. JMLR.org, pp.116-124, 2015.

D. Zufferey, T. Hofer, J. Hennebert et al., Performance comparison of multi-label learning algorithms on clinical data for chronic diseases, Computers in Biology and Medicine, vol.65, pp.34-43, 2015.
DOI : 10.1016/j.compbiomed.2015.07.017

M. Zhang and Z.-H. Zhou, ML-KNN: A lazy learning approach to multi-label learning, Pattern Recognition, vol.40, issue.7, pp.2038-2048, 2007.
DOI : 10.1016/j.patcog.2006.12.019
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.538.9597

M. Zhang and K. Zhang, Multi-label learning by exploiting label dependency, Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '10, pp.999-1008, 2010.
DOI : 10.1145/1835804.1835930

M. Zhang and Z.-H. Zhou, A Review on Multi-Label Learning Algorithms, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.8, pp.1819-1837, 2014.
DOI : 10.1109/TKDE.2013.39

M. Gasse, A. Aussem, and H. Elghazel, A hybrid algorithm for Bayesian network structure learning with application to multi-label learning, Expert Systems with Applications, vol.41, issue.15, pp.6755-6772, 2014.
DOI : 10.1016/j.eswa.2014.04.032
URL : https://hal.archives-ouvertes.fr/hal-01153255

M. Gasse and A. Aussem, F-Measure Maximization in Multi-Label Classification with Conditionally Independent Label Subsets, Lecture Notes in Computer Science, vol.15, issue.1, pp.619-631, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01425528

M. Gasse and A. Aussem, Identifying the irreducible disjoint factors of a multivariate probability distribution, In: PGM. Ed. by Antonucci, Alessandro, Corani, Giorgio et al. Conference Proceedings. JMLR.org, pp.183-194, 2016.

M. Gasse, A. Aussem, and H. Elghazel, On the Optimality of Multi-Label Classification under Subset Zero-One Loss for Distributions Satisfying the Composition Property, Ed. by Bach, F. R. et al. JMLR Proceedings. JMLR.org, pp.2531-2539, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01234346

A. Aussem, P. Caillet, and Z. Klemm, Analysis of risk factors of hip fracture with causal Bayesian networks, In: IWBBIO. Ed. by Ortuño. Copicentro Editorial, pp.1074-1085, 2014.

R. Le Goff, D. Garcia et al., Optimal Sensor Locations for Polymer Injection Molding Process, In: ESAFORM. Key Engineering Materials, vol.611, pp.1724-1733, 2014.

M. Gasse, A. Aussem, and H. Elghazel, An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning, Lecture Notes in Computer Science, vol.7523, pp.58-73, 2012.
DOI : 10.1007/978-3-642-33460-3_9
URL : https://hal.archives-ouvertes.fr/hal-01122771