Centralized architecture is the most common pattern of large data warehouses. Consists of all described on the previous lecture part (ODS layer is optional).
Federated architecture is used in cases where for various reasons can not arise Central, a large data warehouse, gathering all the data. Then the storage place are operational data stores, and wholesale, there is only virtually - as a common conceptual and logical schema (that is, as niezmaterializowana perspective). Inquiries directed to the warehouse to be automatically translated into the language of the ODS and counted in a distributed manner. The disadvantage of this solution is less performance, but it may be sufficient if the wholesalers Scene (materialized due to efficiency) required to support all analytical functions.
Layered architecture is a variant of one of the previous two architectures, in which we build multiple layers of thematic warehouse containing increasingly higher levels of aggregation. Data from successive layers are calculated from the previous ones. For performance reasons, all layers are materialized. The advantage of a layered architecture, good design is to optimize the size of data warehouses modes (the ability to quickly calculate the less detailed statements, while leaving the possibility of reaching for a more detailed, but also aggregated, the data in the previous layer), as well as shorten the time update (because of the greater dispersion calculations and avoiding redundancy).
The logical structure of individual elements of warehouse
The central data warehouse is designed as a simple database. The starting element is the data that we store - their logical schema (split into arrays, etc.) may reflect the underlying data, if it is uniform. If not, the data must be the integration of conceptual and logical (see next lecture), so that they can be found in the central data warehouse. In addition, we often need to enrich the data with additional logic elements associated with osługą data warehouse. Often this is an additional column that indicates the date of introduction of the record to the data warehouse. Additional structures regarding time and time data changes tend to be complex - see the issue of retrospection described in the next chapter.
Wholesale themes are designed for OLAP tasks. In such systems, ROLAP (relational OLAP) is dominated by logic diagrams called a model of a star or snowflake. In the case of purely implementation of multidimensional (MDDB), these models may not be publicly available, but still it is good to think of them as a source of data for multi-dimensional cubes.
Recall the basic terms of multidimensional data:
- Fact - a single underlying event analyzes (eg, sales). The facts are described by the dimensions and measures.
- Dimension - a feature that describes a fact that allows it to associate with other business model concepts (eg customer, date, place, product). The dimensions are described attributes.
- Attribute - a feature dimension, which stores additional information about the fact (for example, may have a date dimension attributes: month, quarter, year, size client can have attributes: name, region zmieszkania).
- The measure - a numerical value assigned to a given fact (such as sales, number of units).
Star model is as follows:
The central fact table contains records indicate the individual facts to be examined and stocktakings. Around this central array are (usually much smaller) arrays describing each dimension. Attributes are the dimensions of the column array dimensions, the array of facts essentially contains only the facts measure (ie, subject stocktakings size) and the keys to the array dimensions.
In principle, an array of facts is the largest (millions of records) and fast incremental, and tables of dimensions are subject to slight changes. These are usually growth data in the event of having to change the description of the item given dimension applies the principles of flashbacks - see the next section. Note that the dimension of time is expressed as an array. An array containing the following days in the calendar may seem of little use, but often facilitates the construction of warehouses and ensure the transparency of the scheme. It also allows you to broadcast your own attributes different days (eg day off from work in the institution).
Normalization of a star schema by modeling attributes using the following tables leads to a model of a snowflake. Snowflake model reproduces the hierarchy of dimensions, the situation in which one dimension has many different levels of detail. For example, the dimension 'product' can have an attribute "manufacturer", modeled in a separate table, and in parallel - the attribute "product group", also modeled in an array.
Warehouse Design - Model Spot
Multidimensional data modeling (star, snowflake) is performed in order to make OLAP analysis. In designing this part of the data warehouse, you might want to consider the business reasons for creating thematic warehouse. Sample (assumed) effects and the means to achieve them are:
- Increased sales, increased market share, reducing the proportion of departing customers.
- Measure: analysis of customer behavior, the location of weak points, tracking trends.
- Tool: Analysis of collected in the warehouse (integrated and cleaned) data warehouse design theme. This latter task is also on reducing the amount of data in the warehouse theme (due to the speed of calculations and technical and financial constraints).
Designing a warehouse theme, first make a conceptual modeling analyst needs to select the relevant part of the central data warehouse and design process of creating and updating thematic warehouse. Conceptual model reflects the relationships between concepts related to the business, such as:
- Who is a "customer" and what we know about him?
- As is described in a different branch offices?
- What the term "sell the fact" would like to understand the analyst?
- What model of multidimensional data can help us in creating business reports?
Example of a conceptual modeling technique to modeling wholesale thematic point. Information on concepts, which are accessible to the analyst, are presented in a diagram, where:
- The facts are represented in points
- Dimensions are represented by name
- Similarly, we represent successive levels of the hierarchy
The model can include multiple (constellation) facts, benefiting in part from the common hierarchy of dimensions
The model also save the next point about include:
- Attribute names
- Data types
- Referential integrity
- FLASHBACK (variation of attribute values over time)
- Refresh rate
- Origin of the (source, transformations)
- Business metadata (description in natural language or formal)
"True" - saves any changes in value along with the exact time they were done,
"False" - new record replaces the old,
"Sustainable" - we do not anticipate changes in value.
In theory, all data should be a real flashback. This, however, considerable technical difficulties: the simple question becomes complex, because they must include information from auxiliary tables recording the history of changes. On the other hand, the false flashback may distort the results of our analysis. For example, if we analyze the turnover of customers depending on their address, it will move the client to assign all of a sudden his transactions to the new place of residence (which is true).
No comments:
Post a Comment