Data Warehouse

In Data Warehouse we can store data for analysis and security purpose. Data warehouse contain historical data that is coming from running systems data is also coming from different sources. In data warehouse we have lot of data and consolidate these data that is coming from different data sources analysis this data then get final figure of actual analysis.

In data ware house data is coming from different data sources and different system that are running in the organization like sale’s system, pay roll’s system, inventory’s system etc.

Three functions layers of Data Warehouse:

  • Staging:

It is use for to store data for use by developer analysis and support.

  • Integration:

This layer is use for to integrate data and to have a level of abstraction from users

  • Access:

In this layer we getting the data from the user

Historical prospect of Data Warehouse:

In 1980 two scientist Barry Devlin and Paul Murphy developed an application name “Business Data Warehouse”. Main aim to make the system is that to get the data from any source in raw form then collect these data into one main system then use for decision support systems.

It was typical for multiple decision support environments to operate independently. The operational systems were frequently reexamined as new decision support requirements. Other developers also contribute in data warehouse project and find some core features in this system after 60s.

 Contribution in Data Warehouse:

  • In 1960: General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
  • In 1970: ACNielsen and provide dimensional data marts for retail sales. Bill Inmon begins to define and discuss the term: Data Warehouse
  • In 1975: Sperry Univac Introduce MAPPER (Maintain, Prepare, and Produce Executive Reports) is a database management and reporting system.
  • In 1983:  Tera data introduces a database management system specifically designed for decision support.
  • In 1988: Barry Devlin and Paul Murphy publish the article An architecture for a business and information systems in IBM Systems.
  • In 1990: Daniel Linstedt begins work on Developing the Data Vault model and methodology for data warehouses
  • In 1991: Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.
  • In 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
  • In 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.
  • In 2000: Daniel Linstedt releases the Data Vault, enabling real time auditable Data Warehouses warehouse.

 Data Warehouse Models:

There are many data warehouse models some are listing below.

  • Online Transaction Processing:

Online Transaction Processing, which is a data warehouse model, is built for speed and ease of use.

  • Online Analytical Processing:

Online Analytical processing is more difficult to use and adds an extra step of analysis within the data. Usually it requires more steps which slows the process down and requires much more data in order to analyze certain queries.

This model have some characteristics.

  1.  Subject Oriented
  2. Time Variant
  3. Non Volatile
  4. Integrated

Subject oriented means that data is linked together and is organized by relationships. Time variant means that any data that is changed in the data warehouse can be tracked.

Usually all changes to data are stamped with a time-date and with a before and after value, so that you can show the changes throughout a period of time. Non volatile means that the data is never deleted or erased. This is a great way to protect your most crucial data. Because this data is retained, you can continue to use it in a later analysis. Finally, the data is integrated, which means that a data warehouse uses data that is organizational wide instead of from just one department.

Data Warehouse Types:

With different characteristics we have divide data warehouse in to for types that are given below.

Offline Operational Data Warehouse:

Offline Operational Data Warehouses are data warehouses where data is usually copied and pasted from real time data networks into an offline system where it can be used. It is usually the simplest and less technical type of data warehouse.

Offline Data Warehouse:

Offline Data Warehouses are data warehouses that are updated frequently, daily, weekly or monthly and that data is then stored in an integrated structure, where others can access it and perform reporting.

Real Time Warehouse:

Real Time Data Warehouses are data warehouses where it is updated each moment with the influx of new data. For instance, a Real Time Data Warehouse might incorporate data from a Point of Sales system and is updated with each sale that is made.

Integrated Data Warehouse:

Integrated Data Warehouses are data warehouses that can be used for other systems to access them for operational systems. Some Integrated Data Warehouses are used by other data warehouses, allowing them to access them to process reports, as well as look up current data.

 Data Warehouse Advantages & Disadvantages:

Advantages:

  1. Main purpose of data warehouse implementation is that everyone or end user can easily access all kind of data for different purposes or reports.
  2. By using data ware house you can get data and analysis you business future with the help of this data.
  3. Consistency is also very big feature of data ware house; you can store and get common data of different warehouses simultaneously.

Disadvantages:

  1. The main disadvantage of data warehouse is that in data warehouse all types of data is available for analysis purpose that is some time danger for organization rules and regulation because data is openly available for all type of users.
  2. Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues
  3. bullet  Long initial implementation time and associated high cost
  4. bullet  Adding new data sources takes time and associated high cost
  5. bullet  Limited flexibility of use and types of users – requires multiple separate data marts for multiple uses and types of users
  6. bullet  Typically, data is static and dated
  7. bullet  Typically, no data drill-down capabilities
  8. bullet  Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries