| Type of document: |
 |
Conference communications |
 |
| Domain: |
 |
Computer Science/Learning
|
 |
| Title: |
 |
On Probability Distributions for Trees: Representations, Inference and Learning |
 |
| Author(s): |
 |
François Denis (1), Amaury Habrard (1), Rémi Gilleron (2, 3), Marc Tommasi (2, 3, 4), Édouard Gilbert (3) |
 |
| Research team(s): |
 |
|
 |
| Abstract: |
 |
We study probability distributions over free algebras of trees. Probability distributions can be seen as particular (formal power) tree series [Berstel et al. 82, Esik et al. 03], i.e. mappings from trees to a semiring K. A widely studied class of tree series is the class of rational (or recognizable) tree series, which can be defined either algebraically or by means of multiplicity tree automata. We argue that the algebraic representation is a convenient way to model probability distributions over a free algebra of trees, for two reasons. First, as in the string case, the algebraic representation makes it possible to design learning algorithms for the whole class of probability distributions defined by rational tree series. Note that learning algorithms for rational tree series correspond to learning algorithms for weighted tree automata in which both the structure and the weights are learned. Second, the algebraic representation extends easily to unranked trees (such as XML trees, where a symbol may have an unbounded number of children). Both properties matter for applications: nondeterministic automata are required for the inference problem to be relevant (recall that Hidden Markov Models are equivalent to nondeterministic string automata), and current applications in Web Information Extraction, Web Services and document processing work with unranked trees. |
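As a rough illustration of the multiplicity (weighted) tree automaton view mentioned in the abstract, the following Python sketch evaluates a tree series from a linear representation: each symbol is mapped to a multilinear map on an n-dimensional space, and a final linear form turns the resulting vector into a weight. The encoding used here (tuple-encoded trees, the `mu` and `lam` names, and the one-state example with weights 0.75 and 0.25) is an assumption made for illustration and is not taken from the paper.

```python
from itertools import product

# A tree is a tuple: (symbol, child_1, ..., child_k).
# Hypothetical encoding of a linear representation over the reals:
#   mu[symbol] maps a tuple of state indices (one per child) to a
#   weight vector of length n; lam is the final weight vector.

def vector(tree, mu, n):
    """Bottom-up evaluation: return the n-dimensional vector of `tree`."""
    symbol, *children = tree
    child_vecs = [vector(child, mu, n) for child in children]
    out = [0.0] * n
    # Sum over all assignments of states to the children (multilinearity).
    for states in product(range(n), repeat=len(children)):
        target = mu[symbol].get(states)
        if target is None:
            continue  # missing entries are treated as zero weight
        coeff = 1.0
        for vec, state in zip(child_vecs, states):
            coeff *= vec[state]
        for i in range(n):
            out[i] += coeff * target[i]
    return out

def weight(tree, mu, lam):
    """Weight assigned to `tree` by the representation (mu, lam)."""
    return sum(l * v for l, v in zip(lam, vector(tree, mu, len(lam))))

# Illustrative one-state example: constant `a` has weight 0.75 and the
# binary symbol `f` has weight 0.25 (a simple branching process).
mu = {
    "a": {(): [0.75]},
    "f": {(0, 0): [0.25]},
}
lam = [1.0]

leaf = ("a",)
small_tree = ("f", ("a",), ("a",))
print(weight(leaf, mu, lam))        # 0.75
print(weight(small_tree, mu, lam))  # 0.25 * 0.75 * 0.75 = 0.140625
```

With more than one state, the sum over child state assignments plays the role of nondeterminism: several runs can contribute to the weight of the same tree.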
 |
| ACM Classification: |
 |
| F.: Theory of Computation/F.4: MATHEMATICAL LOGIC AND FORMAL LANGUAGES/F.4.3: Formal Languages/F.4.3.1: Classes defined by grammars or automata (e.g., context-free languages, regular sets, recursive sets) |
| I.: Computing Methodologies/I.5: PATTERN RECOGNITION/I.5.1: Models/I.5.1.5: Structural |
|
 |
| Full text language: |
 |
English |
 |
|
| Publication date: |
 |
2007 |
 |
| Audience: |
 |
international |
 |
| Conference title: |
 |
NIPS Workshop on Representations and Inference on Probability Distributions |
 |
| Conference city: |
 |
Whistler |
 |
| Country: |
 |
Canada |
 |
| Conference date: |
 |
2007-12-08 |
 |
|
| Keywords: |
 |
Tree automata – tree series – probability distributions – weighted tree automata – machine learning |
 |
| ANR Project: |
 |
| Project Id |
ANR-05-MMSA-0016 |
| Year |
2005 |
| Project acronym
marmota |
| Project title |
Apprentissage automatique, modèles probabilistes et langages d'arbres |
| Program title
Masse de données : Modélisation, Simulation, Applications |
| Program acronym
MMSA |
|
 |