Web multiform data structuring for warehousing - HAL open archive
Book chapter. Year: 2003

Web multiform data structuring for warehousing

Abstract

In a data warehousing process, the data preparation phase is crucial. Mastering this phase enables multidimensional analysis or the use of data mining algorithms, and yields substantial gains in time and performance when running such analyses. Furthermore, a data warehouse may require external data. The web is a prevalent data source in this context, but the data published on this medium are very heterogeneous. In this chapter, we propose a modeling process for integrating these diverse, heterogeneous data into a unified format. Moreover, the schema definition itself provides first-rate metadata in our data warehousing context. At the conceptual level, a complex object is represented in UML as a superclass of any useful data source (databases, plain or tagged texts, images, sounds, video clips, etc.). Our logical model is an XML schema that can be described with a DTD or the XML Schema language. Finally, we have designed a Java prototype that transforms our multiform input data into XML documents representing our physical model. The XML documents we obtain are then mapped into a relational database. We view this database as an ODS (Operational Data Storage), whose data will have to be remodeled in a multidimensional way to allow their storage in a warehouse and, later, their analysis.
File not deposited
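
The pipeline described in the abstract (wrap a source as an XML document, then map that document into a relational store) can be illustrated with a minimal sketch. The authors' prototype is written in Java and is not reproduced here; this Python version is only an illustration under stated assumptions. The element names (`complex_object`, `text`, `nb_char`, `content`), the table `text_object`, and the sample URL are all hypothetical and are not taken from the chapter.

```python
import sqlite3
import xml.etree.ElementTree as ET


def wrap_text_as_xml(name: str, source: str, content: str) -> ET.Element:
    """Wrap a plain-text source in a hypothetical 'complex_object' element."""
    obj = ET.Element("complex_object", {"name": name, "source": source})
    text = ET.SubElement(obj, "text")
    # Store simple descriptive metadata next to the content itself.
    ET.SubElement(text, "nb_char").text = str(len(content))
    ET.SubElement(text, "content").text = content
    return obj


def load_into_ods(conn: sqlite3.Connection, obj: ET.Element) -> None:
    """Map one XML document onto a single relational table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS text_object "
        "(name TEXT, source TEXT, nb_char INTEGER, content TEXT)"
    )
    conn.execute(
        "INSERT INTO text_object VALUES (?, ?, ?, ?)",
        (
            obj.get("name"),
            obj.get("source"),
            int(obj.findtext("text/nb_char")),
            obj.findtext("text/content"),
        ),
    )
    conn.commit()


# Step 1: turn a multiform input (here, plain text) into an XML document.
doc = wrap_text_as_xml("sample", "http://example.org/page.txt", "Hello, warehouse")
xml_string = ET.tostring(doc, encoding="unicode")

# Step 2: parse the XML document and load it into an in-memory relational store.
conn = sqlite3.connect(":memory:")
load_into_ods(conn, ET.fromstring(xml_string))
rows = conn.execute("SELECT name, nb_char FROM text_object").fetchall()
```

A single flat table is used only to keep the sketch short; other source types (images, sounds, video clips) would presumably need their own elements and tables.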

Dates and versions

hal-00701433, version 1 (25-05-2012)

Identifiers

  • HAL Id: hal-00701433, version 1

Cite

Jérôme Darmont, Omar Boussaid, Fadila Bentayeb, Sabine Loudcher, Yamina Zellouf. Web multiform data structuring for warehousing. C. Djeraba. Multimedia Mining: A Highway to Intelligent Multimedia Documents, Kluwer, pp.179-194, 2003. ⟨hal-00701433⟩
