Microsoft SQL Server 2008
Microsoft SQL Server 2008 aims to provide a comprehensive, scalable
data warehouse platform with enhancements for data warehousing that
make it faster to build a data warehouse.
SQL Server 2008 integrates all relevant data into a comprehensive
warehouse platform, empowering businesses to manage their growing
data volumes with an enterprise-ready, relational database.
With this capability, organisations are better equipped to deliver
actionable insights through the integration of the Microsoft Business
Intelligence (BI) platform.
The key elements of this 2008 version over previous 2005 include:
- Manageablity
- Reliability - Server Core, Networking, Clustering
- Web - Internet Information Services, Sharepoint Services, Media
Services
- Security - Read Only Domain Controller
- Virtualization - Terminal Services, Gateway Apps
SQL Server 2008 & Data Warehouse
MS Server 2008 provides major performance improvements in Data
warehouse scenarios
Improvement in performance and scalability stem from improvements:
- Integration Services
- Analysis Services
- Reporting Services
- Administration
MS SQL Server 2008 has been tested on large-scale, customer driven
configuration:
Database Engine: to 100 billion fact table rows
on a 32 core machine
Analysis Services: to 25 billion fact table rows
on 16 core hardware
Improvements can be considered in three pillars:
- Extract, Transform and Load [ETL] – Merge, Minimal Logging,
Insert Over DML, Change Data Capture, Integration Services Threading
Model, Lookup Enhancements and Compression
- Query and Analysis
- Administration and Manageability
ETL Improvements
Merge
Syntax SQL standard used to combine multiple data insert operations
into a single statement. Useful in DW to make ETL more efficient.
MS has also added support for merge statement, to comply with SQL
standard. If the source matches the target, it performs an update,
if there is no match, it deletes the source.
For instance take the daily trades and merge them into the stock
tables. If we reach as tock of Zero, the record is deleted. If the
stock is not Zero, the value is updated. If no match, then INSERT
the values [Stock and Delta]
Logging
Improves INSERT statements by logging only what is absolutely necessary
for rollback:
- Normally individual rows are logged
- Page allocations are sufficient to UNDO insertions
- The recovery model must be simple or bulk logged
Expect 3-5 times performance boost over fully logged INSERT compared
to current SQL server fully logged INSERT. This helps maintain user
SLAs inside service windows.
Insert / DML
The INSERT statement has the option of being able to consume DML
results. This is useful in tracking history of slow changing dimensions,
and dumping DML data streams to a secondary table for post-processing.
Change Data Capture
This mechanism provides for easy tracking of changes on a table.
The changes are capatured fro the log asynchronously, and information
is shown as to what has changed on source table. Easily consumed
from Integration services.
Capture process consumes the changes, inserts into change table,
which changes them further. Stored Procedures are used to qwitch
on Change Data Capture Table
New Look Up
In previous versions, the initialisation process of a look up process
took longer than the ETL maintenance process. The main improvement
is that you are able to persist the tables on a local disk so look
up does not have to be run on the database. This provides an unrestricted
cache size and persistent cache
Compression On Data base Engine
The main goal of compression is to shrink data warehouse fact tables,
and thus improve query performance. Can be on a table or index basis.
There is a trade off on storage or CPU usage.
- In SQL 2005 – decimal format was compressed
- In SQL 2008 - extends this to other formats
Prefix compression - A prefix list is stored in
the page for common prefixes. Individual values are replaced by:
- Token for the prefix
- Suffix for the value
Dictionary Compression – a common value
dictionary is stored in the page, allowing common values to be replaced
by tokens.
Going over the data in the rows to ID the common values, these
are represented in the green area, then point to tokens in the dictionary.
Gives real savings of up to 7x, depending upon data. Different degrees
of redundancy in data.
Analysis Services Enhancements
The design enhancements may be the best performance and scalability
improvement. By integrating best practice knowledge into the designers,
using:
- Presentation of attribute relationships
- Improved wizard output
- Simplified creation of composite keys
- Analysis Management Objects [AMO] warnings
- Improved support functions
Query and Analysis
Partitioned Table Parallelism
The major change here is in how threads are allocated to table
query. For example, running weekly queries:
- On Monday only running on one week – previous week. All
threads used for a single partition.
- On Tuesday, have to run across two weeks, two partitions. Using
two threads, one for each partition.
In 2008 an arbitrary number of threads [CPUs] against the first
partition, then move automatically into the second partition, so
you always get the full degree of performance on your data.
On a mid range machine, where most of the data is in one partition
and only a small volume in the second partition.
2008 just changes the way we deal with multiple partitions, with
improvements most noticeable when most of the data is in one partition.
Tests on a 4-core machine [x64] and 40-GB fact table, partitioned
by week provided staggering improvements from SQL Server 2005 performance.
| |
Table 1 |
Table 2 |
| SQL Server 2005 |
1276 ms |
15,780 ms |
| SQL Server 2008 |
368 ms |
992 ms |
| Improvement Factor |
3.5x |
15x |
Star Joint Queries
Decision support query that joins fact tables. If using compression,
expect to see scans speed up. Using Hop query processing –
selects 10-75% of fact table.
Bitmap filters in SQL 2005, now 2008 can have multiple to do semi
join productions. Objective to move all bit map filters for all
dimensions and locate in fact table. Then process the first bit
map filter that has the most data first, to reduce the volume of
data to be queried.
Query response times improved by 17% - 23%. A query in 2005 took
one hour, 2008 out of box impoved down to 35 seconds.
Reporting Enhancements
With a scalable reporting engine and word rendering, you no longer
need to rely on IIS, you can now render reports into Word, making
report layout more flexible.
Administration
Back up Compression
Keeping disk-based back ups online is expensive, and backups take
longe, when maintenance windows are reducing.
With SQL 2008 you can add a COMPRESSION clause to BACKUP statement.
This means that less storage is required to keep backups on line,
backups will run significantly faster [less IO is done] and restore
automatically detects compression and adjusts accordingly
In SQL Server 2005 - back up times grow exponentially
beyond 20G
In SQL Server 2008 - with the replaced back up
system, times only grow linearly.
Back To Top
For
The World's Leading Guide To BI Strategy, Program & Technology
DW Solutions | Teradata
ADW | Netezza NPS | DATAllegro
| HP NeoView | Teradata
12.0 | Green Data Centers
|