Data Warehouses
Data Warehouse
A data warehouse is the main storage server hosting an organizations
historical data, its corporate memory. It is distinct from operational
or transactional systems supporting the business functions.
A data warehouse can be normalized or denormalized. It can be a
relational database, multidimensional database, flat file, hierarchical
database, object database, etc.
Uses of Data Warehouse
The datawarehouse is used to contain the raw data for management's
decision support systems [business intelligence tools].
Not all data warehouses are used for decision support. Many data
warehouses are used for post-decision monitoring of the effects
of decisions [for "operational" issues].
Main Benefit
The main benefit of a data warehouse is that complex queries and
analysis, such as data mining, can be made on organisational information
without slowing down the operational or transactional systems.
DW Architecture
A typical, traditional Data Warehouse architecture includes three
main components:
- Server
- Storage [SAN or Storage Area Network]
- Database software

These DW solutions were typically built from systems developed
for on-line transaction processing [OLTP]. They were not designed
to support large and complex Business Intelligence (BI) analysis.
Inherent constraints result in limited performance and high costs
of acquisition and ownership.
For this reason, more innovative DW solutions evolved to DW integrated
with Analytics, such as:
Performance Constraints
General-purpose servers - same computers used
in data centers [web servers,
email servers or application servers] designed for OLTP applications
requiring efficient, RAM-based operations on individual data elements.
General-purpose storage - arrays require time-consuming,
careful synchronization of loaders and data striping mechanisms
to ensure that data is distributed to all BI users.
General-purpose database - full power of general-purpose
database management systems
[DBMS] such as DB2 or Oracle is lost when embedded within general-purpose
hardware and used for data warehousing. The software is not designed
to extract optimal performance out of even the most advanced servers
and storage.
Defintion of Data Warehouse
A data warehouse can be defined in the following terms:
- Subject-oriented - The data in the database
is organized so that all the data elements relating to the same
real-world event or object are linked together.
- Time-variant - The changes to the data in
the database are tracked and recorded so that reports can be produced
showing changes over time.
- Non-volatile - Data in the database is never
over-written or deleted - once committed, the data is static,
read-only, but retained for future reporting; and
- Integrated - The database contains data from
most or all of an organization's operational applications, and
that this data is made consistent.
While operational systems are optimized for simplicity and speed
of modification [Online Transactional Processing, or OLTP] through
heavy use of database normalization and an entity-relationship model,
the data warehouse is optimized for reporting and analysis [online
analytical processing, or OLAP].
Data in data warehouses are often heavily denormalised, summarised
or stored in a dimension-based model. This is not always required
to achieve acceptable query response times.
Types of Data Warehouses
With technology improvements supporting lower cost for more performance
and to support user requirements for faster data load cycle times
and more features, data warehouses have evolved through different
stages:
Off line Operational Databases - Data warehouses
developed by simply copying the database of an operational system
to an off-line server where the processing load of reporting does
not impact on the operational system's performance.
Off line Data Warehouse - Data warehouses are
updated on a regular time cycle (usually daily, weekly or monthly)
from the operational systems and the data is stored in an integrated
reporting-oriented data structure.
Real Time Data Warehouse - Data warehouses are
updated on a transaction or event basis, every time an operational
system performs a transaction [e.g. an order or a delivery or a
booking etc.]
Integrated Analytics Data Warehouse - Data warehouses
are used to generate activity or transactions that are passed back
into the operational systems for use in the daily activity of the
organization. These are now fully integrated to the various types
of business intelligence tools, such as MS Excel, Scorecards, Dashboards,
Analytic Services.
Advantages Of Data Warehouses
There are many advantages to using a data warehouse, including:
- Enhances end-user access to a wide variety of data.
- Decision support system users can obtain specified trend reports,
e.g. the item with the most sales in a particular area/country
within the last two years.
- A data warehouse can be a significant enabler of commercial
business applications, most notably customer relationship management
(CRM).
Data Warehouse Concerns
There are some general concerns regarding the use of data warehouses:
- Extracting, transforming and loading data consumes a lot of
time and computational resources.
- Data warehousing project scope must be actively managed to
deliver a release of defined content and value.
- Compatibility problems with systems already in place.
- Security could develop into a serious issue, especially if
the data warehouse is web accessible.
- Data Storage design controversy warrants careful consideration
and perhaps prototyping of the data warehouse solution for each
project's environments.
Next: Teradata
Active Data Warehouse
Back To Top
For
The World's Leading Guide To BI Strategy, Program & Technology
Database Index | Relational
Model | Object Model | Other
Models | DBMS | Contextual
Types | Microsoft SQL | Oracle
OODBMS | Data Warehouse | Teradata
ADW | Data Mart | MS
vs Teradata | Development Platforms
| ODBC | JDBC | SMP
| MPP | SN
| Glossary
|