An Automatic Schema-Instance Approach for Merging Multidimensional Data Warehouses

Using data warehouses to analyse multidimensional data is a significant task in company decision-making. The data warehouse merging process is composed of two steps: matching multidimensional components and then merging them. Current approaches do not take all the particularities of multidimensional data warehouses into account, e.g., they merge only schemata but not instances, or they exploit neither hierarchies nor fact tables. Thus, in this paper, we propose an automatic merging approach for star schema-modeled data warehouses that works at both the schema and instance levels. We also provide algorithms for merging hierarchies, dimensions and facts. Finally, we implement our merging algorithms and validate them with both synthetic and benchmark datasets.


INTRODUCTION
Data warehouses (DWs) are widely used in companies and organizations as an important Business Intelligence (BI) tool to help build decision support systems [9]. Data in DWs are usually modeled in a multidimensional way, which allows users to consult and analyze the aggregated data through multiple analysis axes with On-Line Analysis Processing (OLAP) [14]. In a company, various independent DWs containing some common elements and data may be built for different geographical regions or functional departments. There may also exist common elements and data between the DWs of different companies. The ability to accurately merge diverse DWs into one integrated DW is therefore considered a major issue [8]. DW merging constitutes a promising solution to provide more opportunities for analysing consistent data coming from different sources.
IDEAS 2021, July 14-16, 2021.
A DW organizes data according to analysis subjects (facts) associated with analysis axes (dimensions). Each fact is composed of indicators (measures). Finally, each dimension may contain one or several analysis viewpoints (hierarchies). Hierarchies allow users to aggregate the attributes of a dimension at different levels to facilitate analysis. Hierarchies are identified by attributes called parameters.
Merging two DWs is a complex task that implies solving several problems. The first issue is identifying the common basic components (attributes, measures) and defining semantic relationships between these components. The second issue is merging schemata that bear common components. Merging two multidimensional DWs is difficult because two dimensions can (1) be completely identical in terms of schema, but not necessarily in terms of instances; (2) have common hierarchies or have sub-parts of hierarchies in common without necessarily sharing common instances. Likewise, two schemata can deal with the same fact or different facts, and even if they deal with the same facts, they may or may not have measures in common, without necessarily sharing common data.
Moreover, a merged DW should respect the constraints of the input multidimensional elements, especially the hierarchical relationships between attributes. When we merge two dimensions having matched attributes of two DWs, the final DW should preserve all the partial orders of the input hierarchies (i.e., the binary aggregation relationships between parameters) of the two dimensions. It is also necessary to integrate all the instances of the input DWs, which may cause the generation of empty values in the merged DW. Thus, the merging process should also include a proper analysis of empty values.
In sum, the DW merging process concerns matching and merging tasks. The matching task consists in generating correspondences between similar schema elements (dimension attributes and fact measures) [4] to link two DWs. The merging task is more complex and must be carried out at two levels: the schema level and the instance level. Schema merging is the process of integrating several schemata into a common, unified schema [12]. Thus, DW schema merging aims at generating a merged unified multidimensional schema. The instance level merging deals with the integration and management of the instances. In the remainder of this paper, the term "matching" designates schema matching without considering instances, while the term "merging" refers to the complete merging of schemata and corresponding instances.
To address these issues, we define an automatic approach to merge two DWs modeled as star schemata (i.e., schemata containing only one fact table), which (1) generates an integrated DW conforming to the multidimensional structures of the input DWs, (2) integrates the input DW instances into the integrated DW and copes with empty values generated during the merging process.
The remainder of this paper is organized as follows. In Section 2, we review the related work about matching and merging DWs. In Section 3, we specify an automatic approach to merge different DWs and provide DW merging algorithms at the schema and instance levels. In Section 4, we experimentally validate our approach. Finally, in Section 5, we conclude this paper and discuss future research.

RELATED WORK
DW merging actually concerns the matching and the merging of multidimensional elements. We classify the existing approaches into four levels: matching multidimensional components, matching multidimensional schemata, merging multidimensional schemata and merging DWs.
A multidimensional component matching approach for matching aggregation levels is based on the fact that the cardinality ratio of two aggregation levels from the same hierarchy is nearly always the same, no matter the dimension they belong to [3]. Thus, by creating and manipulating the cardinality matrix for different dimensions, it is possible to discover the matched attributes.
The matching of multidimensional schemata aims at discovering the matches between all the multidimensional components of two multidimensional schemata. One process automatically matches two multidimensional schemata by evaluating the semantic similarity of multidimensional component names [2]. Attribute and measure data types are also compared in this way. A bipartite graph selection metric helps determine the mapping choice, and rules are defined to preserve the partial orders of the hierarchies at mapping time. Another approach matches a set of star schemata generated from both business requirements and data sources [5]. Semantic similarity helps find the matched facts and dimension names. Yet, the DW designer must intervene to manually identify some elements.
A two-phase approach for automatic multidimensional schema merging is achieved by transforming the multidimensional schema into a UML class diagram [7]. Then, class names are compared and the number of common attributes relative to the minimal number of attributes of the two classes is computed to decide whether two classes can be merged.
DW merging must operate at both schema and instance levels. Two DW merging approaches are the intersection and union of the matched dimensions. Instance merging is realized by a d-chase procedure [15]. The second merging strategy exploits similar dimensions based on the equivalent levels in schema merging [11]. It also uses the d-chase algorithm for instance merging. However, the two approaches above do not consider the fact table. Another DW merging approach is based on the lexical similarity of schema string names and instances, and by considering schema data types and constraints [8]. Having the mapping correspondences, the merging algorithm takes the preservation requirements of the multidimensional elements into account, and is formulated to build the final consolidated DW. However, merging details are not precise enough and hierarchies are not considered.
To summarize, none of the existing merging methods satisfies our DW merging requirements. Some multidimensional components are ignored in these approaches, and the merging details of each specific multidimensional component are not explicit enough, which motivates us to propose a complete DW merging approach.

PRELIMINARIES
We introduce in this section the basic concepts of multidimensional DW design [13]. The multidimensional DW can be modelled by a star or a constellation schema. In the star schema, there is a single fact connected with different dimensions, while the constellation schema consists of more than one fact which share one or several common dimensions. A dimension models an analysis axis and is composed of attributes (dimension properties).
A dimension, denoted $D$, is defined as $(N^D, A^D, H^D, I^D)$, where $N^D$ is the dimension name; $A^D = \{a^D_1, \dots, a^D_u\} \cup \{id^D\}$ is a set of attributes, where $id^D$ represents the dimension identifier, which is also the parameter of the lowest level and is called the root parameter; $H^D = \{H^D_1, \dots, H^D_v\}$ is a set of hierarchies; and $I^D = \{i^D_1, \dots, i^D_p\}$ is a set of dimension instances. The value of an instance $i^D_p$ for an attribute $a^D_u$ is written $i^D_p.a^D_u$.
Dimension attributes (also called parameters) are organised according to one or more hierarchies. Hierarchies represent a particular vision (perspective) and each parameter represents one data granularity according to which measures could be analysed.
A hierarchy of a dimension $D$, denoted $H \in H^D$, is defined as $(N^H, Param^H)$, where $N^H$ is a hierarchy name and $Param^H = \langle id^D, p^H_2, \dots, p^H_v \rangle$ is an ordered set of dimension attributes, called parameters, which represent useful graduations along the dimension, with $p^H_1 = id^D$ and $\forall k \in [1 \dots v], p^H_k \in A^D$. The roll-up relationship between two parameters is denoted $p^H_1 \preceq_H p^H_2$ when $p^H_1$ rolls up to $p^H_2$ in $H$. For $Param^H$, we have $id^D \preceq_H p^H_2$, $p^H_2 \preceq_H p^H_3$, ..., $p^H_{v-1} \preceq_H p^H_v$. The matching of multidimensional schemata is based on the matching of parameters; the matching relationship between a parameter $p^{H_1}_i$ of a hierarchy $H_1$ and a parameter $p^{H_2}_j$ of a hierarchy $H_2$ is denoted $p^{H_1}_i \simeq p^{H_2}_j$.
A sub-hierarchy is a continuous sub-part of a hierarchy, which we call the parent hierarchy of the sub-hierarchy. This concept is used in our algorithms, but it has no analytical meaning by itself. A sub-hierarchy has the same elements as a hierarchy, but its lowest level is not considered as "id". All parameters of a sub-hierarchy are contained in its parent hierarchy and have the same partial orders as in the parent hierarchy. "Continuous" means that, in the parameter set of the parent hierarchy, there is no parameter between the lowest and highest level parameters of the sub-hierarchy that belongs to the parent hierarchy but not to the sub-hierarchy.
A sub-hierarchy $SH$ of a hierarchy $H \in H^D$ is defined as $(N^{SH}, Param^{SH})$, where $N^{SH}$ is a sub-hierarchy name and $Param^{SH} = \langle p^{SH}_1, \dots, p^{SH}_v \rangle$ is an ordered set of parameters, $\forall k \in [1 \dots v], p^{SH}_k \in Param^H$. According to the relationship between a sub-hierarchy and its parent hierarchy, we have: $\forall p_1, p_2 \in Param^{SH}, \; p_1 \preceq_{SH} p_2 \Rightarrow p_1 \preceq_H p_2$.
A fact reflects information that has to be analysed according to dimensions and is modelled through one or several indicators called measures. A fact, denoted $F$, is defined as $(N^F, M^F, I^F, IStar^F)$, where $N^F$ is the fact name, $M^F = \{m^F_1, \dots, m^F_w\}$ is a set of measures, $I^F = \{i^F_1, \dots, i^F_q\}$ is a set of fact instances, and $IStar^F$ is a function that associates fact instances to their linked dimension instances.
We complete these definitions with a function $extend(H_1, H_2)$ that extends the parameters of the first (sub-)hierarchy $H_1$ with those of the other one ($H_2$).
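As a reading aid, the concepts of this section can be mirrored by a small data model. The following Python sketch is only an illustration under our own assumptions: the class and field names (`Hierarchy`, `Dimension`, `Fact`, `rolls_up_to`, `sub_hierarchy`) and the sample attribute names are hypothetical and do not come from the paper, and roll-up is treated here as a strict, transitive order on levels.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Hierarchy:
    name: str
    params: List[str]  # ordered from the root parameter up to the highest level

    def rolls_up_to(self, low: str, high: str) -> bool:
        """True if `low` sits at a strictly lower level than `high`."""
        return (low in self.params and high in self.params
                and self.params.index(low) < self.params.index(high))

    def sub_hierarchy(self, low: str, high: str) -> "Hierarchy":
        """Continuous slice of the parent hierarchy from `low` to `high`."""
        i, j = self.params.index(low), self.params.index(high)
        return Hierarchy(f"{self.name}[{low}..{high}]", self.params[i:j + 1])


@dataclass
class Dimension:
    name: str
    hierarchies: List[Hierarchy]
    instances: List[Dict[str, Any]] = field(default_factory=list)

    @property
    def root(self) -> str:
        # Every hierarchy of a dimension starts at the dimension identifier.
        return self.hierarchies[0].params[0]

    @property
    def attributes(self) -> List[str]:
        # Union of the parameters of all hierarchies, in first-seen order.
        seen: Dict[str, None] = {}
        for h in self.hierarchies:
            seen.update(dict.fromkeys(h.params))
        return list(seen)


@dataclass
class Fact:
    name: str
    measures: List[str]
    instances: List[Dict[str, Any]] = field(default_factory=list)


# Hypothetical example data
geo = Hierarchy("H_geo", ["customer_id", "city", "region", "country"])
segment = Hierarchy("H_seg", ["customer_id", "segment"])
customer = Dimension("Customer", [geo, segment])
```

Weak attributes are deliberately left out of this sketch, in line with the scope of the definitions above.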

AN AUTOMATIC APPROACH FOR DW MERGING
As illustrated in Figure 1, merging two DWs implies matching steps and steps dedicated to the merging of dimensions and facts. The matching of parameters and measures is based on syntactic and semantic similarities [10][6] of the attribute or measure names. Since matching is intensively studied in the literature, we focus in this paper only on the merging steps of our process (green rectangle in Figure 1). Regarding the merging, we first define an algorithm for merging hierarchies, which decomposes two hierarchies into sub-hierarchy pairs and merges them to get the final hierarchy set. Then, we define a dimension merging algorithm that works at both the instance and schema levels and completes some empty values. Finally, we define a star merging algorithm, based on the dimension merging algorithm, which merges the dimensions and the facts at the schema and instance levels and corrects the hierarchies after the merging.

Hierarchy merging
In this section, we define the schema merging process of two hierarchies coming from two different dimensions. The first challenge is to preserve the partial orders of the parameters. The second one is to decide the partial orders of parameters coming from different original hierarchies. These challenges are addressed by the hierarchy merging algorithm (Algorithm 1), which proceeds in 4 steps: record of the matched parameters, generation of the sub-hierarchy pairs, merging of the sub-hierarchy pairs and generation of the final hierarchy set.
4.1.1 Record of the matched parameters. The first step of the algorithm consists in matching the parameters of the two hierarchies and recording the matched parameter pairs (lines 1-9). If there is no matched parameter between the two hierarchies, the merging process stops (lines 11-12).
4.1.2 Generation of the sub-hierarchy pairs. Then the algorithm generates pairs of sub-hierarchies of the two original hierarchies whose lowest and highest level parameters are adjacent in the list of matched parameter pairs created in the previous step (lines 18-22). To make sure that the last parameters of the two hierarchies are included in the sub-hierarchies, we also add the pair of the last parameters to the list of matched parameter pairs (lines 14-17).
For instance, for the first sub-hierarchy pair of $H_1$ and $H_2$, both sub-hierarchies start at the first matched parameter and end at the next matched parameter. In the second sub-hierarchy pair, the sub-hierarchy of $H_1$ and the sub-hierarchy of $H_2$ both run from this second matched parameter to the following one. If the last parameters of the two original hierarchies do not match, as for $H_1$ and $H_3$ in (b), the pair of their last parameters is added to the list of matched parameter pairs, so that the last sub-hierarchy of $H_1$ and the last sub-hierarchy of $H_3$ each end at the last parameter of their own hierarchy.
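The first two steps (recording matched parameters, then cutting both hierarchies at adjacent matched pairs) can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the function names, the default matching by name equality, the assumption that matches appear in the same order in both hierarchies, and the sample parameter names are all ours.

```python
def matched_pairs(p1, p2, match=lambda a, b: a == b):
    """Index pairs (i, j) of matched parameters, scanned in hierarchy order."""
    pairs, start = [], 0
    for i, a in enumerate(p1):
        for j in range(start, len(p2)):
            if match(a, p2[j]):
                pairs.append((i, j))
                start = j + 1  # assumption: matches never cross each other
                break
    return pairs


def sub_hierarchy_pairs(p1, p2, match=lambda a, b: a == b):
    """Cut both parameter lists between adjacent matched pairs."""
    pairs = matched_pairs(p1, p2, match)
    if not pairs:
        return []  # no matched parameter: the hierarchies cannot be merged
    last = (len(p1) - 1, len(p2) - 1)
    if pairs[-1] != last:
        pairs.append(last)  # make sure the last parameters are covered
    return [(p1[i1:i2 + 1], p2[j1:j2 + 1])
            for (i1, j1), (i2, j2) in zip(pairs, pairs[1:])]


# Hypothetical hierarchies, ordered from the root to the highest level
h1 = ["code", "city", "department", "region", "country"]
h2 = ["code", "department", "country"]
h3 = ["id", "department", "continent"]

pairs_12 = sub_hierarchy_pairs(h1, h2)
pairs_13 = sub_hierarchy_pairs(h1, h3)
```

For `h1` and `h2`, two pairs are produced, one per interval between adjacent matches. For `h1` and `h3`, whose last parameters do not match, a single pair is produced that runs from the only matched parameter to the last parameter of each hierarchy.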

4.1.3 Merging of the sub-hierarchies. We then merge each sub-hierarchy pair to get a set of merged sub-hierarchies. The matched parameters are merged into one parameter, so only the unmatched parameters require specific treatment. There are two cases for the unmatched parameters.
If one of the sub-hierarchies has no unmatched parameter, we obtain a sub-hierarchy set containing one sub-hierarchy whose parameter set is the same as that of the other sub-hierarchy (lines 23-26). For example, in Figure 4, one sub-hierarchy of the pair does not have any unmatched parameter, so the obtained sub-hierarchy set contains one sub-hierarchy whose parameter set is the same as that of the other sub-hierarchy.
The second case is that both sub-hierarchies have unmatched parameters (lines 27-30). We then check whether these unmatched parameters can be merged into one or several hierarchies and discover their partial orders. Our solution is based on the functional dependencies (FDs) between these parameters. To be able to detect the FDs between the parameters of the two sub-hierarchies, there must be an intersection between the instances of the two sub-hierarchies, which means that they should share values on the root parameter of the sub-hierarchies. We keep only the FDs that have a single parameter on each side and that cannot be inferred by transitivity. These FDs, represented as ordered sets, are then processed by Algorithm 2 to get the parameter sets of the merged sub-hierarchies. If it is not possible to discover the FDs, the two sub-hierarchies cannot be merged (lines 31-32).
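To make the FD step concrete, the sketch below detects single-parameter FDs from a set of instances and drops those that can be inferred by transitivity. It is an assumption-based illustration: the helper names `holds_fd` and `single_fds`, the row format (one dict per instance, `None` for empty values) and the sample data are hypothetical, and mutually dependent parameters (a → b and b → a) are not handled.

```python
def holds_fd(rows, a, b):
    """True if every non-empty value of `a` maps to exactly one value of `b`."""
    seen = {}
    for row in rows:
        if row.get(a) is None or row.get(b) is None:
            continue  # empty values give no evidence either way
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def single_fds(rows, params):
    """FDs a -> b between single parameters, minus transitively implied ones."""
    fds = {(a, b) for a in params for b in params
           if a != b and holds_fd(rows, a, b)}
    return {(a, c) for (a, c) in fds
            if not any((a, b) in fds and (b, c) in fds
                       for b in params if b not in (a, c))}


# Hypothetical instances shared by the two sub-hierarchies
rows = [
    {"city": "A", "department": "D1", "region": "R1"},
    {"city": "B", "department": "D1", "region": "R1"},
    {"city": "C", "department": "D2", "region": "R1"},
    {"city": "D", "department": "D3", "region": "R2"},
]
fds = single_fds(rows, ["city", "department", "region"])
```

Here `city → region` also holds in the data, but it is removed because it follows from `city → department` and `department → region`.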
Algorithm 2 recursively constructs the parameter sets from the FDs, which are given as ordered sets. In each recursion, for each of these sets, we search for the other sets whose non-last (or non-first) elements have the same values and order as its non-first (or non-last) elements, and then merge them (lines 6-21). The recursion terminates when no two sets can be merged any more (lines 22-31).
In the running example, the result of the first recursion is passed to the second recursion, which yields two merged ordered sets of four parameters each. Since two ordered sets of three parameters are not merged, they are also added to the result, which then contains four ordered sets. In the final recursion, it is no longer possible to merge any two ordered sets, so these four ordered sets are the parameter sets of the resulting hierarchy set.
4.1.4 Generation of the final hierarchy set. Following [1], we also add the two original hierarchies into the final hierarchy set. A parameter that appears in different hierarchies can then be divided into different parameters in the different hierarchies of the hierarchy set, so that each hierarchy is complete. Thus, for the multidimensional schema that we get, we provide an analysis form as shown in Figure 4. In the analysis form, one parameter can be marked with different numbers if it belongs to different hierarchies.
For the generation of the final hierarchy set, we distinguish 2 cases: either the 2 hierarchies have matched root parameters, which means that their dimensions represent the same analysis axis, or they do not. The two cases lead to different outputs (one or two sets of merged hierarchies).
If the root parameters of the two original hierarchies match, we simply add the two original hierarchies to the merged hierarchy set obtained in the previous step to get one final merged hierarchy set (lines 37-39).
If the root parameters of the two original hierarchies do not match, we get two merged hierarchy sets instead of one. For each original hierarchy, we take the sub-hierarchy containing all the parameters that are not included in any of the sub-hierarchies created before, and extend it with the merged hierarchy set. The final merged hierarchy set of this original hierarchy consists of the result of this extension plus the original hierarchy itself (lines 41-49).
Example 4.5. In Figure 5, there are matched parameters between $H_1$ and $H_3$. We can then get one sub-hierarchy pair, in which each of the 2 sub-hierarchies contains three parameters. By merging this sub-hierarchy pair, we get the merged hierarchy whose parameter set contains four parameters. For $H_1$, the remaining part (a single parameter) is associated to it to get the merged hierarchy $H^1_{13}$. We then get the merged hierarchy set of $H_1$, containing $H_1$ and $H^1_{13}$. We do the same for $H_3$ and get the merged hierarchy set containing $H_3$ and $H^2_{13}$.
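The recursive construction attributed to Algorithm 2 in Section 4.1.3 (chaining ordered sets whose overlapping parts agree) can be sketched as below. This is a simplified reading of the description, not the published algorithm: the function name is hypothetical, inputs are assumed to be FDs given as 2-tuples, and the guard against repeating a parameter is our own addition to guarantee termination on cyclic FDs.

```python
def merge_ordered_sets(sets):
    """Recursively chain ordered sets s, t where s[1:] == t[:-1]."""
    sets = list(dict.fromkeys(tuple(s) for s in sets))  # dedupe, keep order
    merged, used = [], set()
    for s in sets:
        for t in sets:
            # Overlap test; `t[-1] not in s` avoids looping on cyclic FDs.
            if s != t and s[1:] == t[:-1] and t[-1] not in s:
                merged.append(s + (t[-1],))
                used.update({s, t})
    if not merged:
        return sets  # no two sets can be merged any more
    leftovers = [s for s in sets if s not in used]
    return merge_ordered_sets(merged + leftovers)


# Hypothetical FDs: a single chain, then two alternative paths
chain = merge_ordered_sets([("city", "department"),
                            ("department", "region"),
                            ("region", "country")])
branches = merge_ordered_sets([("a", "b"), ("b", "d"),
                               ("a", "c"), ("c", "d")])
```

A single chain of FDs collapses into one ordered parameter set, whereas two alternative paths between the same end points stay separate and give two parameter sets, which corresponds to several merged sub-hierarchies.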

Dimension merging
This section concerns the merging of two dimensions having matched attributes, which is realized by Algorithm 3. We consider both the schema and instance levels for the merging of dimensions. The schema merging is based on the merging of hierarchies. Concerning the instances, we have 2 tasks: merging the instances and completing the empty values.
When the root parameters of the two dimensions match, the hierarchy set of the merged dimension is the union of the hierarchy sets generated by merging every 2 hierarchies of the original dimensions (lines 3-7). We also get a hierarchy set containing only the merged hierarchies but no original hierarchies, which is used for the complement of the empty values (line 8). The attribute set of the merged dimension is the union of the attribute sets of the original dimensions (line 8).
Example 4.6. Given 2 original dimensions $D_1$ and $D_2$ in Figure 8 and their instances in Figure 6, we can get the merged dimension schema $D'$ in Figure 8. In $D'$, $H_1$ and $H_2$ are the original hierarchies of $D_1$, $H_3$ and $H_4$ are those of $D_2$, $H_{13}$ is a merged hierarchy of $H_1$ and $H_3$, and $H_{24}$ is a merged hierarchy of $H_2$ and $H_4$. We can thus get $H^{D'} = \{H_1, H_2, H_3, H_4, H_{13}, H_{24}\}$.
When the root parameters of the two dimensions do not match, we get a merged dimension for each original dimension, which is realized by lines 13-25. For each original dimension, the hierarchy set of its corresponding merged dimension is the union of all hierarchy sets generated by merging every 2 hierarchies of the original dimensions (lines 13-18), and the attribute set is the union of the attributes of each hierarchy in the merged dimension (lines 19-24). Similarly to the first case, we get a hierarchy set containing only the merged hierarchies for each original dimension (lines 26-27).
Example 4.7. Given 2 original dimensions $D_1$ and $D_2$ in Figure 5 and their instances in Figure 7, after the execution of Algorithm 3, we can get the merged dimension schemata $D'_1$ and $D'_2$ in Figure 5. In $D'_1$, $H_1$ and $H_2$ are the original hierarchies of $D_1$, and $H^1_{13}$ is the merged hierarchy of $H_1$ and $H_3$. In $D'_2$, $H_3$ is the original hierarchy of $D_2$, and $H^2_{13}$ is the merged hierarchy of $H_1$ and $H_3$. So for $D'_1$, we have $H^{D'_1} = \{H_1, H_2, H^1_{13}\}$, and for $D'_2$, we have $H^{D'_2} = \{H_3, H^2_{13}\}$.

Instance merging and complement.
When the root parameters of the two dimensions match, the instances of the merged dimension are obtained by the union of the two original dimension instances: we insert the data of the two original dimension tables into the merged dimension table and merge the lines that have the same root parameter value (line 9). The complement of the empty values is realized by Algorithm 4, whose first input $D'_1$ is the merged dimension table having empty values to be completed, whose second input $D'_2$ is the merged dimension table that provides the completed values, and whose third input is the hierarchy set of $D'_1$ containing only merged hierarchies but no original hierarchies. In the case discussed here, $D'$ is passed as both $D'_1$ and $D'_2$, since we get one merged dimension including all data of the two original dimensions (line 11).
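The instance union described above, for dimensions with matched root parameters, can be illustrated with plain dictionaries. This is a minimal sketch under our own assumptions: rows are dicts keyed by attribute name, `None` stands for an empty value, matched attributes are assumed to carry the same name in both tables, and on conflicting non-empty values the later row wins (the paper does not specify a conflict policy). All names and sample values are hypothetical.

```python
def merge_dimension_instances(rows1, rows2, root, attributes):
    """Union of two dimension tables, merging rows that share a root value."""
    merged = {}
    for row in list(rows1) + list(rows2):
        # One output row per root value, pre-filled with empty values.
        target = merged.setdefault(row[root], dict.fromkeys(attributes))
        for attr, value in row.items():
            if value is not None:
                target[attr] = value
    return list(merged.values())


# Hypothetical dimension tables sharing the root parameter "code"
d1 = [{"code": 1, "city": "Toulouse", "region": "Occitanie"},
      {"code": 2, "city": "Lyon", "region": "Auvergne-Rhone-Alpes"}]
d2 = [{"code": 1, "city": "Toulouse", "department": "Haute-Garonne"},
      {"code": 3, "city": "Lyon", "department": "Rhone"}]

merged_rows = merge_dimension_instances(
    d1, d2, root="code", attributes=["code", "city", "department", "region"])
```

Rows 2 and 3 keep empty values (`department` and `region` respectively); these are the values that the complement step then tries to fill.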
For an empty value, we search for another instance that has the same value as the instance of this empty value on one of the parameters rolling up to the parameter of the empty value, and whose value for the parameter of the empty value is not empty. We can then fill the empty value with this non-empty value. Completing empty values may also change the hierarchies to which an instance belongs. However, after completing the empty values of an instance, some completed parameters may not be included in the hierarchies of the instance, in which case the complement of such values does not make sense. A possible change of hierarchy goes from hierarchies containing fewer parameters to those containing more parameters. We know that the merged hierarchies contain more parameters than their corresponding original hierarchies. Hence, before completing an instance, we first look at the merged hierarchies to decide which parameter values can be completed.
In Algorithm 4, which aims to complete the empty values, for each hierarchy in the merged hierarchy set we check whether (a) there exist instances in the merged dimension table that contain empty values on the parameters of this hierarchy (line 3) and (b) the value of the second lowest parameter of these instances is not empty (line 3). Condition (a) is basic, because we need empty values to complete. Condition (b) comes from the fact that we complete the empty values from the other lines of the merged dimension table: since the id is unique, we can only rely on the non-id parameters, so if the second lowest parameter is empty, it can never be completed and the hierarchy can never be completed either. For each of the instances satisfying these conditions, we search for the parameters having empty values (line 5). To make sure that each of them can be completed, we also search for the parameters that roll up to the lowest of them, to which we refer to complete the empty values (line 6). We can then complete the empty values as discussed in the previous paragraph (lines 7-11).
Example 4.9. After the merging, we get the empty values of $D'$, which are shown in red in Figure 6. The merged hierarchies are $H_{13}$ and $H_{24}$, as illustrated in Figure 5. For $H_{13}$, the instances with codes 3 and 5 have empty values on the second lowest parameter, so they do not satisfy condition (b). As we can see, for instance 3, although the value of one parameter could be retrieved through a parameter value shared with instance 1, the value of another parameter cannot be completed, and thus we give up this complement. For instance 9, the empty value is completed from instance 7, which has the same value on a lower-level parameter and whose value for the parameter to complete is not empty. When it is the turn of $H_{24}$, the empty values of instances 8 and 9 are completed in the same way.
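The complement of empty values can be sketched as follows. This is a simplified reading of the description of Algorithm 4, not its published listing: the function name, the row format (dicts with `None` for empty values), the all-or-nothing policy per hierarchy and the sample data are assumptions made for illustration.

```python
def complete_empty_values(target_rows, source_rows, merged_hierarchies):
    """Fill empty values of `target_rows` using donor rows of `source_rows`.

    Each hierarchy is a list of parameters ordered from the root (id) upwards
    and is assumed to contain at least two parameters.
    """
    for params in merged_hierarchies:
        for row in target_rows:
            empty = [p for p in params if row.get(p) is None]
            # Condition (a): something to complete.
            # Condition (b): the second lowest parameter must not be empty.
            if not empty or row.get(params[1]) is None:
                continue
            # Non-id parameters that roll up to the lowest empty parameter.
            refs = params[1:params.index(empty[0])]
            donor = next(
                (d for d in source_rows for r in refs
                 if d.get(r) == row[r]
                 and all(d.get(p) is not None for p in empty)),
                None)
            if donor is not None:  # complete all empty values or none
                for p in empty:
                    row[p] = donor[p]
    return target_rows


# Hypothetical merged dimension table and merged hierarchy
table = [
    {"code": 1, "city": "Toulouse", "department": "Haute-Garonne", "region": "Occitanie"},
    {"code": 2, "city": "Toulouse", "department": None, "region": None},
    {"code": 3, "city": None, "department": "Haute-Garonne", "region": None},
]
h_merged = ["code", "city", "department", "region"]
complete_empty_values(table, table, [h_merged])
```

As in the matched-root case of the text, the same table is passed as both the table to complete and the table providing values. Row 2 is completed through its city, while row 3 is left untouched because its second lowest parameter is empty.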
When the root parameters of the two dimensions do not match, the instance merging and complement are done by lines 26-27. In each merged dimension table, the values of the attributes coming from the other dimension table are empty, so there is only instance complement but no merging. We also call Algorithm 4 to complete the instances of each of the merged dimension tables.

Star merging
In this section, we discuss the merging of two stars. Given two stars, we can get a star schema or a constellation schema, because the fact tables of the two schemata may or may not be merged into one. The star merging relies on the dimension merging and the fact merging. Two stars can be merged only if there are dimensions having matched root parameters between them. For the dimensions of the two stars, we have two cases: (1) the two stars have the same number of dimensions and, for each dimension of one schema, there is a dimension having a matched root parameter in the other schema; (2) there exists at least one dimension in one of the two stars that has no dimension with a matched root parameter in the other.
The dimension merging of two stars is common to the two cases and is done by Algorithm 5. We first merge every two dimensions of the two stars that have unmatched root parameters, because the merging of such dimensions can complete the original dimensions with complementary attributes (lines 1-7). Then the dimensions having matched root parameters are merged to generate the merged dimensions of the merged multidimensional schema (lines 8-15). After the merging and complement of the instances of the dimension tables, there may be some merged hierarchies to which none of the instances belong. In this case, if there will be no more update of the data, such hierarchies should be deleted. There may also be original hierarchies in the merged dimensions for which no instance belongs to them without also belonging to a merged hierarchy containing all the parameters of this original hierarchy. The instances belonging to this kind of hierarchy also belong to other hierarchies that contain more parameters, so these original hierarchies become useless and should also be deleted (lines 18-19).
For example, consider the instances in Figure 6. In the merged dimension table $D'$, we find that all the instances belonging to $H_4$ also belong to $H_{24}$, which is a merged hierarchy containing all the parameters of $H_4$, so $H_4$ should be deleted.
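The pruning rule for hierarchies can be illustrated with a short sketch. The notion of "belonging" is not formally defined in the text, so the code assumes that an instance belongs to a hierarchy when all the parameters of that hierarchy have non-empty values for it; the function names and sample data are likewise hypothetical.

```python
def belongs(row, params):
    """Assumption: a row belongs to a hierarchy if all its parameters are set."""
    return all(row.get(p) is not None for p in params)


def prune_hierarchies(rows, originals, merged):
    """Drop merged hierarchies without instances and redundant original ones."""
    kept_merged = [m for m in merged if any(belongs(r, m) for r in rows)]
    kept_originals = []
    for h in originals:
        supersets = [m for m in kept_merged if set(h) <= set(m)]
        # Keep h only if some row belongs to h but to none of its supersets.
        if any(belongs(r, h) and not any(belongs(r, m) for m in supersets)
               for r in rows):
            kept_originals.append(h)
    return kept_originals, kept_merged


# Hypothetical merged dimension table
rows = [
    {"code": 1, "city": "A", "department": "D1", "region": "R1"},
    {"code": 2, "city": None, "department": "D2", "region": None},
]
h2 = ["code", "department"]
h4 = ["code", "city", "region"]
h24 = ["code", "city", "department", "region"]

kept_originals, kept_merged = prune_hierarchies(rows, [h2, h4], [h24])
```

In this toy table, `h4` is removed because its only instance also belongs to `h24`, whereas `h2` is kept because the second row belongs to it without belonging to `h24`.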
We then discuss the merging of the other elements in the two cases, which is processed by Algorithm 6. Its output is a merged multidimensional schema, which may be a star schema or a constellation schema.
For the first case, we merge the two fact tables into one fact table and get a star schema. The measure set of the merged star schema is the union of the 2 original measure sets (line 3). The fact instances are the union of the fact instances of the two input star schemata (line 4). The function associating fact instances to their linked dimension instances in the merged schema is also the union of the functions of the original schemata (line 4).
For example, in Figure 8, the dimension merging is discussed above, so we mainly focus on the merging of fact table instances here. The dimensions of the first star have, respectively, matched root parameters in the dimensions of the second star. The two stars also have the same number of dimensions. Therefore, we get a merged star schema, whose fact table is obtained by merging the measures of the two original fact tables. At the instance level, Figure 9 shows the instances of the fact tables. The framed parts are the instances of the two original fact tables having common linked dimension instances, so they are merged in the merged fact table. The other instances are also integrated in the merged fact table, with empty values for the measures they lack. These empty values do not have a big impact on the analysis, so they are not treated specifically.
For the second case, since there are unmatched dimensions, the merged schema should be a constellation schema. The facts of the original schemata remain unchanged at both the schema and instance levels and compose the final constellation.
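The two cases of star merging can be summarized in a short sketch: one helper decides whether the result is a star or a constellation, and another merges two fact tables for the star case. This is an illustrative simplification, not the paper's Algorithm 6: root parameters are assumed to match when they have the same name, fact rows are dicts keyed by dimension identifiers and measure names, and all identifiers are hypothetical.

```python
def merged_schema_kind(roots1, roots2):
    """'star', 'constellation', or None when no root parameter matches."""
    common = set(roots1) & set(roots2)
    if not common:
        return None  # the two stars cannot be merged
    if len(roots1) == len(roots2) and len(common) == len(roots1):
        return "star"
    return "constellation"


def merge_facts(fact1, fact2, keys, measures):
    """Union of fact instances; rows linked to the same dimension
    instances are merged, missing measures stay empty (None)."""
    merged = {}
    for row in list(fact1) + list(fact2):
        k = tuple(row[key] for key in keys)
        target = merged.setdefault(
            k, {**dict(zip(keys, k)), **dict.fromkeys(measures)})
        for m in measures:
            if row.get(m) is not None:
                target[m] = row[m]
    return list(merged.values())


# Hypothetical fact tables of two stars sharing both dimensions
f1 = [{"customer_id": 1, "date_id": "d1", "quantity": 5}]
f2 = [{"customer_id": 1, "date_id": "d1", "price": 9.5},
      {"customer_id": 2, "date_id": "d1", "price": 3.0}]

kind = merged_schema_kind(["customer_id", "date_id"], ["customer_id", "date_id"])
facts = merge_facts(f1, f2, keys=["customer_id", "date_id"],
                    measures=["quantity", "price"])
```

In the constellation case, no fact merging takes place, so `merge_facts` would simply not be called and both fact tables would be kept as they are.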

EXPERIMENTAL ASSESSMENTS
To validate the effectiveness of our approach, we applied our algorithms to benchmark data. Unfortunately, we did not find a suitable benchmark for our problem, so we adapted the datasets of the TPC-H benchmark to generate different DWs. Originally, the TPC-H benchmark serves for benchmarking decision support systems by examining the execution of queries on large volumes of data. Because of space limits, we put the test results on GitHub: https://github.com/Implementation111/Multidimensional-DW-merging

Technical environment and Datasets
The algorithms were implemented in Python 3.7 and executed on an Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz with 16 GB of RAM. The data are stored in R-OLAP format in the Oracle 11g DBMS. The TPC-H benchmark provides a pre-defined relational schema with 8 tables and a generator of massive data.
First, we generated 100 MB of data files, with respectively 600572, 15000, 25, 150000, 20000, 80000, 5 and 1000 tuples in the tables Lineitem, Customer, Nation, Orders, Part, Partsupp, Region and Supplier. Second, to have deeper hierarchies, we included the data of some tables into others. Third, we transformed these files to generate two use cases by creating 2 DWs for each case. To make sure that there are both common and different instances in the different DWs, for each dimension, instead of selecting all the corresponding data, we randomly selected 3/4 of them. For the fact table, we selected the measures related to these dimension data. Since the methods in the related work do not treat exactly the same components or have the same objective as ours, we do not have a comparable baseline in our experiments.
The objective of the first experiment is to merge two star schemata having the same 4 dimensions with the matched lowest level of granularity for each dimension.
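The random selection of 3/4 of each dimension can be reproduced along the following lines. This is only a sketch of the sampling idea: the function name, the use of fixed seeds and the stand-in list of identifiers are our assumptions, not details reported in the paper.

```python
import random


def sample_three_quarters(rows, seed):
    """Randomly keep 3/4 of the rows, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(rows, k=(3 * len(rows)) // 4)


# Hypothetical stand-in for the identifiers of one dimension table
dimension_ids = list(range(1000))

dw1_ids = sample_three_quarters(dimension_ids, seed=1)
dw2_ids = sample_three_quarters(dimension_ids, seed=2)
common_ids = set(dw1_ids) & set(dw2_ids)
```

Because each sample keeps 3/4 of the same table, the two DWs necessarily share at least half of the original identifiers, which guarantees both common and (almost surely) different instances.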

Star schema generation
After executing our algorithms, we obtain one star schema, as shown in Figure 11, which is consistent with the expectations. The parameters of the hierarchies satisfy the functional dependency relationships. The run time is 30.70s. The 3 dimensions of the original DWs that share the same schema are merged. Between the two different dimensions of $DW_1$ and $DW_2$, there is a matched attribute, so they are also merged, such that the dimension of one DW provides the dimension of the other DW with an additional attribute. The corresponding dimension in the merged DW then also has this attribute. We can also observe that one original hierarchy of the second DW, which should normally appear in the merged schema, is deleted. By looking up the table, we find that there is no tuple that belongs to this hierarchy but not to the longer merged hierarchy containing all its parameters, which is why it is removed.
At the instance level, the result is available on GitHub. Table 1 shows the number of tuples of the original DWs ($N_1$, $N_2$), of the merged DW ($N'$) and the number of common tuples ($N_\cap$). They meet the relationship $N' = N_1 + N_2 - N_\cap$. For the attributes having completed empty values, Table 2 shows the number of values of these attributes in the original DWs ($N_1$, $N_2$) and in the merged DW ($N'$), from which we get the number of completed values $N_+$. They meet the relationship $N' = N_1 + N_2 + N_+$.
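The tuple-count relationship used to check the merged dimension tables is the inclusion-exclusion identity, which can be verified on identifier sets. The snippet below is a minimal illustration with made-up identifiers; the function name is hypothetical.

```python
def merged_tuple_count(ids1, ids2):
    """N' = N1 + N2 - N_common for two tables identified by their root values."""
    n1, n2 = len(set(ids1)), len(set(ids2))
    n_common = len(set(ids1) & set(ids2))
    return n1 + n2 - n_common


# Hypothetical root parameter values of one dimension in the two DWs
ids_dw1 = [1, 2, 3, 4, 5, 6]
ids_dw2 = [4, 5, 6, 7, 8]

n_merged = merged_tuple_count(ids_dw1, ids_dw2)
```

The result equals the size of the union of the two identifier sets, which is what a merged dimension table without duplicated root values must contain.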

Constellation schema generation
The objective of this experiment is to merge two star schemata having 2 dimensions in common, with the same lowest level of granularity for each of them, as well as 2 different dimensions (one in each DW).

Figure 12: Constellation schema generation
At the schema level, the second test generates a constellation schema, as shown in Figure 12. The run time is 32.13s. As expected, the 2 common dimensions of the original DWs are merged, while the other dimensions and the fact tables are not merged. One dimension gains a new attribute through the merging between the two different dimensions of $DW_1$ and $DW_2$. We can see that one hierarchy that should be in the merged schema is deleted, because there is no tuple that belongs to this hierarchy but not to a longer hierarchy containing all its parameters. A hierarchy of another dimension is removed for the same reason. At the instance level, the data of the experiment can be found on GitHub. With $N_1$, $N_2$ and $N'$ the numbers of tuples of the original and merged DWs and $N_\cap$ the number of common tuples, they also meet $N' = N_1 + N_2 - N_\cap$. The empty values of the completed attribute in the dimension tables meet $N' = N_1 + N_2 + N_+$, where $N_+$ is the number of completed values.
The results of the tests conform to our expectations. We can thus conclude that our algorithms work well for the different cases discussed, at both the schema and instance levels.

CONCLUSION AND FUTURE WORK
In this paper, we define an automatic approach to merge two different star schema-modeled DWs, by merging multidimensional schema elements including hierarchies, dimensions and facts at the schema and instance levels. We define the corresponding algorithms, which consider different cases. Our algorithms are implemented and illustrated by various examples.
Since we only discuss the merging of DWs modeled as star schemata in this paper, which is only one (albeit common) possible DW design, we plan to extend our approach by adding the merging of DWs modelled as constellation schemata in the future. There may also be so-called weak attributes in DW components. Thus, we will consider them in future work. Our goal is to provide a complete approach that is integrated in our previous work concerning the automatic integration of tabular data in DWs.