The Business Intelligence Guide
   BI Strategy | BI Program | BI Projects | BI Data | BI Infrastructure | BI User Tools | BI Vendors | Resource Guides | Articles | BI Blog

Get a FREE Sample of the
The Total BI Guide
THE LOGICAL ORGANIZATION

and receive the
TLO NEWSLETTER
Just enter your details below

Email:
Name:
 
HOME
 
Business Intelligence
BI Strategy
BI Program Guide
BI Tools
- Dashboards
- Scorecards
- Operational BI
- Analytics
BI Software Solutions
Data Management
Databases
DataWarehouses
Decision Support
Industry Solutions
Case Studies
BI Surveys & Awards
RESOURCES
ARTICLES
BI EVENTS
BI NEWS
Sitemap

About the Author

Google+ Gail La Grouw

Data To Query Processing


Traditional database platforms operate by:

  • Reading data off disk
  • Bringing it across an I/O interconnection
  • Loading it into memory for processing.

Data normally flows smoothly for routine OLTP applications [processing invoices or looking up customer records] that focus on individual data elements.

Moving data across multiple backplanes and I/O channels performs very poorly when:

  • The amount of data to be queried is very large
  • The query involves complex joins requiring multi-phase processing
  • Data is changing rapidly


Unlike OLTP, data warehousing is concerned with moving large quantities of data through the system’s analysis and processing engine as efficiently as possible, with a minimum of internal thrashing.

Where OLTP systems might be optimized to reduce data latency, data warehouse systems are typically much more concerned with data throughput. As a result, bringing OLTP-optimized system designs to build a data warehouse results in greatly reduced performance.

A complex join or other complex query may require a number of processing steps. Delivering multiple enormous tables (billions of rows) off disk, across the network and into memory for processing by the DBMS – all to perform one step is just not possible in this scenario.

All large-scale legacy data warehouse systems share these fundamental limitations to performance becoming worse with more users, data and queries.

To improve performance, DBMS systems typically employ complex schemas, indices,
aggregates and advanced partitions in an attempt to limit the amount of data required for movement and analysis within the architecture. This still results in mostly unsatisfactory performance.

 

SMP and MPP

Enterprise data warehouses use multiprocessing architectures, either:


SMP

SMP systems have several processors, each with its own memory cache, forming a pool of computing resource. Threads of code are automatically distributed, using load balancing, by the operating system for execution.

Resources such as memory and the I/O system are equally accessible to each of the processors.

SMP architecture is limited in its ability to move large amounts of data as required in data warehousing and business intelligence applications.

More on Symmetrical Multiprocessing


MPP

MPP systems consist of very large numbers of loosely coupled processors, each with its own memory, backplane and storage, and operating system. The nothing-shared 'nodes' allow near linear scalability, and software is parallelizable.

The key advantages of MPP are the high performance and high availability. If one node fails, another can take over.

Hybrid SMP / MPP

Pure MPP systems are very high cost, due to additional memory and I/O components, as
well as the administrative challenges in setting up and managing many semi-independent systems.

Typical MPP systems are implemented virtually in clusters of SMPs, often with some sharing of I/O resources. This preserves some of the performance and scalability advantages of MPP while reducing costs and administration time.

 

Limits of SMP Processing

Large SMP systems with additional processors and shared memory to deliver much higher computing power, and large-scale SMP systems are frequently deployed for
data warehousing. Dozens of processors are sharing memory and storage.

When data volumes rise, systems based on SMP architectures tend to outgrow their memory, backplane and I/O resources. As processors take turns accessing massive amounts of data in memory, the memory bus becomes a bottleneck that results in poor performance. Whilst the number of processors and memory can be increased, SMP processing is contrained due to:

  • Memory bus bandwidth is limited and becomes saturated.
  • I/O bus bandwidth is also limited, and can become congested as the amount of data sent from the storage area network to process a query increases.

SMP systems suffer from less-than-linear scalability and a progressive decline in performance as the system grows.

 

MMP on Clustered SMP

An alternative architecture consists of small SMP clusters operating in parallel, sharing a storage area network and management structure.

Each CPU within an SMP node shares RAM and access to the storage network over a shared I/O bus.

Vendors offering database solutions using this approach include:

  • Teradata
  • IBM DB2 Integrated Cluster Environment [ICE]

The constraints of this approach include a bottleneck on resources sharing that limits performance and scalability. SMP nodes contend for access to storage over a common I/O
bus.

This architecture is intended for traditional database applications – not for pushing enormous amounts of data through a shared pipeline. Even with separate I/O paths contraints on data flowing from storage to an SMP cluster for processing are inherent.

 

“Shared Nothing” MPP

In a “shared-nothing” architecture, processor-disk pairs operating in parallel divide the workload to execute queries over large sets of data.

The best vendor solution following this approach is provided by Teradata.

Each processor communicates with its associated disk drive to get raw data and perform calculations.

One processor is assigned to collect the intermediate results and assemble the query response for delivery back to the requesting application.

Scalability

As there is no contention for resources between MPP nodes, this architecture is scalable to terascale database sizes.

Interconnect Constraint

BI queries require significant movement of data from disks to processors. Whilst the processor-disk pairs operate independently, they typically share a proprietary common interconnect. Response times suffer when this becomes clogged with traffic.

Overhead

A 64K block of data may be moved to provide only 1K to respond to the SQL statement, the balance is overhead – unrelated data, project columns, join columns and other extraneous material that is wrapped around the relevant data and has to be filtered out by the processor.

When MPP architectures are used for large-scale query processing, the internal backplanes, busses and I/O connections between processor and storage cannot handle the amount of traffic.

Data transfer speeds decline with growing data volumes, limiting scalability. Such diminishing returns limit the advantage of abundant storage capacity.

Vendor Example - Teradata Datawarehouses

Performance limitations are further compounded in systems based on legacy components. Teradata uses its own generic servers, with storage provided by third parties. To gain maximum performance out of this older system architecture, the solution relies heavily on system tuning through complex choices of primary and secondary indexes and
table denormalization as well as space allocation.

However, the relationship between complex indexing and query speed is difficult to optimize, and highly dependent on the data and query.

As a result, indices are often mis-configured, increasing query response time instead of decreasing it. For this reason, Teradata DW systems take a lot of effort to index and tune.

 

Separate Data, Shared Storage

Processors operating in parallel share the same storage media, which is partitioned
so that processors don’t contend for the same data.

Systems sharing common storage are less costly than a shared-nothing architecture, where each processor has its own dedicated storage device.

However, performance suffers from the classical disadvantage of MPP architectures: system resources are overwhelmed as query data is transferred from disk to the processors. In addition, the common storage leads to scalability issues as data volume grows.

Vendor Example - IBM DB2

IBM DB2 database management system is a hardware-independent solution, consisting of a core database supported by a variety of server and storage options.

The cost of this independence is a range of supported products creating a complex mix of choices when designing a solution. For example:

Storage options - SANs or direct attached storage, requiring customers to understand
the I/O dependency of their warehouse on completely different types of storage architectures.

Elaborate tools - to tune, optimze and manage the database and its network of storage and processors for acceptable query performance.

Heavy administrative workload can be expected, with complexity growing as SMP cluster nodes are added.

Indexing

To compensate for performance constraints, indexing can be used to limit the amount of data examined in a query. DB2 provides a number of indexing strategies for warehouse applications, the combination of partitioning and indexing across distributed nodes makes configuration and loading considerably more complex.

The range of tuning and management challenges of this configuration can significantly increase the TCO and destroy the cost-benefit equation.

 

Shared Data and Storage

Multiple processors operating in parallel access shared data residing on a common
storage system.

Lock Manager

A lock manager is used to prevent simultaneous access to the same data by multiple
query processes. Access to shared data is coordinated via messaging between processes and the lock manager.

Vendor Example - Oracle 9i or 10g Real Application Cluster [RAC] relational database management system.

Shared data architecture supposes that partitioning strategies do not adversely affect query performance. However, like the two previous designs, this approach requires the transfer of massive amounts of data from storage to processors. Shared data architecture contention issues further limit performance and
scalability.

The locking and caching mechanisms used by Oracle to prevent processor contention impose a ceiling on the data scalability of its RAC. Oracle recommends users deploy partitions and indices to help improve query performance, thus eliminating the intended simplicity of shared data architecture. Indexing schemes and their interactions with partitioning greatly increase set-up and maintenance complexity.

As with DB2, the Oracle 9i and 10g RAC solutions are hardware and operating system-independent, capable of running on a variety of servers. This can add weeks of installation assembly, testing, debugging and fine-tuning of system parameters.

 

Blade Servers

Blade servers provide high-density computing, with enormous computing power in a compact frame. They are a tightly integrated, consolidated infrastructure, with a common management framework providing control as a single virtual system.

Each blade has its own collection of processors, memory and I/O capability. Dozens of blades are installed in a single chassis sharing storage, network and power resources. Typically, each blade functions as an SMP cluster of processors and shared RAM, within a matrix of blades operating in parallel, resembling a tightly consolidated version of the MPP on clustered SMP.

A number of vendors offer data warehouse solutions based on blade technology.

I/O Processing Bottleneck

Each blade communicates over the system midplane to a storage area network – a route shared with all the other blades in the rack. Blades also suffer bottlenecks, where massive amounts of data have to be delivered from storage to processors over a common I/O pathway.

Deploying legacy software may exacerbate the bottleneck issues.

Whilst 1G blade servers provide a better performing computing platform than legacy SMP and clustered-SMP/MPP, they still suffer from the same inefficiencies and complexity of traditional data warehouse solutions.

Vendor Example - “Google Search Appliance,”

A custom-designed blade server developed to enable ultra-high speed content searches for the enterprise.

 

Data Warehouse Appliance

To overcome the challenges inherent in above models of data warehousing required to fresh approach. One such solution, developed by Netezza [NPS] eliminated the barriers to performance of traditional systems.

The NPS system is a data warehouse appliance specifically built to enable real-time business intelligence and analytics on terabytes of data.

The NPS system combines server, storage and database in a single scalable platform based on open standards and commodity components.

The architecture provides significant performance advantage – ten to fifty times faster than other data warehousing systems.

By leveraging commodity components it delivers a huge cost advantage, at around half the cost of competitive systems. This approach also eliminates the high operating costs of general-purpose systems adapted for data warehousing. Rapid implementation takes hours, not weeks - there is no need for intensive database tuning, administration and system management.

Data Flow - Bringing the Query to the Data

The critical difference in the NPS appliance is based upon two guiding principles:

Asymmetric Massively Parallel Processing [AMPP] - provides both performance and scalability by using both SMP and MPP as best suited to meet the specific needs of BI applications.

Intelligent Query Streaming Technology - moving processing intelligence to a record stream adjacent to storage produces much better performance and scalability than the traditional approach of moving sets of records to a processor.

The combined approach results in very high real-time performance and scalability at a fraction of the cost of other vendor systems.

AMPP

Netezza’s AMPP architecture is a two-tiered system:

Tier One - A high-performance Linux SMP Host

  1. Compiles queries received from BI applications
  2. Generates query execution plans
  3. Divides a query into a sequence of sub-tasks, or snippets, that can be executed in parallel
  4. Distributes the snippets to the second tier for execution
  5. Returns the final results to the requesting application

Tier Two - High number of Snippet Processing Units [SPUs]

Operating in parallel, each SPU is an intelligent query processing and storage node that consists of:

  • A powerful commodity processor
  • Dedicated memory
  • Disk drive
  • Field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level

The massively parallel, shared-nothing SPU blades provide the performance advantage of MPP. The highly autonomous SPUs perform:

  • Nearly all query processing - each SPU operating on its portion of the database.
  • Parallel processing operations such as record operations, parsing, filtering, projecting, interlocking and logging
  • Operations on sets of intermediate results - such as sorts, joins and aggregates [can also be done on the host, depending on processing cost]
  • Their own scheduling, storage management, transaction management, concurrency control and replication - reduces coordination requirements on the host and time-consuming maintenance tasks

This significantly reduces the amount of data required to be moved within the system, overcoming the major constraint of all previous models - I/O Interconnection.

The real strength of the Netezza solution is throughput performance optimization.

Intelligent Query Streaming Technology

Intelligent Query Streaming technology greatly reduces the data traffic among SPU nodes, and between SPU nodes and the SMP host.

Data flow is streamlined by placing silicon processors right next to the storage
device. Rather than moving data into memory or across the network for processing, the technology intelligently filters records as they stream off the disk, delivering only the relevant information for each query.

By performing this first level processing right at the disk, Netezza is at least ten times faster than conventional systems, with disk access speed providing the only limiting factor.

Intelligent Query Streaming is performed on each SPU by a Field-Programmable Gate Array (FPGA) chip
that functions as the disk controller, and is also capable of basic processing as data is read off the
disk. The system is able to run critical database query functions such as parsing, filtering and
projecting at full disk reading speed, while maintaining full ACID (Atomicity, Consistency, Isolation, and
Durability) transactional operations of the database. Data flows from disk to memory in a single
laminar stream, rather than as a series of disjointed steps that require materializing partial results.
With the Netezza approach, the pathways used by traditional architectures to deliver data to the host
are streamlined and shortened. Because the SQL is “understood” by the disk drive in a Netezza
system, there is far less reliance on CPUs, data modeling or bandwidth for performance:
• The storage interconnect, a bottleneck on traditional systems, is eliminated by direct attached
storage – data streams off the SPU disk and straight into the FPGA for initial query filtering.
• Intermediate query tasks are performed in parallel on the SPUs, where streaming processing sharply
reduces CPU workload.
• The gigabit Ethernet network connecting SPUs to the host and each other is used only for
transmitting intermediate results, rather than massive amounts of raw data. Network traffic is
reduced by approximately two orders of magnitude.
• The I/O bus and memory bus on the host computer are used only for assembling final results,
eliminating previous congestion.

Key Differences and the Netezza Advantage

The Netezza Performance Server appliance offers several fundamental advantages over traditional data warehouse architectures:

Data Flow

The AMPP architecture applies elements of SMP and MPP with most processing handled by the massively parallel snippet processing units, as early in the data flow as possible. This approach of “bringing the query to the data” eliminates extraneous traffic and resulting delays.

Traditional systems: SMP and MPP architectures developed for general-purpose systems
(including blade servers) are based on moving data from storage to the processors (“bringing the
data to the query”). When performing BI queries of massive databases, the flood of data creates
bottlenecks that result in slow (and often unacceptable) response times.

Storage Connection

NPS - Intelligent Query Streaming technology filters out unnecessary information as data streams off the disk, greatly reducing the processing burden downstream. There
are no storage interconnects in the traditional sense – the disk controller handling the initial processing is hard-wired to the disk drive. System performance is limited only by disk speed (the NPS system runs at “physics speed”)

Traditional systems - Storage system interconnections simply function as a conduit to deliver data from storage to its associated processor. System performance is limited by the capacity of the I/O bus.

Degree of Integration

NPS - Server, storage and DBMS are integrated in a compact, efficient unit designed
specifically for data warehousing. The system installs in hours, not weeks, and deploys quickly with no need for indexing, tuning, physical modeling or other time-consuming tasks. There is one vendor to manage, and none of the unnecessary components, awkward cabling or conflicting parameters that traditionally cause problems with patchwork solutions.

Traditional systems - Patchwork solutions based on general-purpose products mean a myriad of headaches, including multiple vendors to manage, lengthy and difficult implementations, complex tuning, lower reliability, higher power requirements and extra floor space.

Performance

By “bringing the query to the data,” the NPS appliance delivers at least an order of magnitude performance improvement for BI applications analyzing terabytes of data. Traditional delays are eliminated – analyses that previously took hours now take just seconds. Even “queries from hell” to uncover deeply buried patterns are handled with ease.

Low Acquisition and Operating Costs

As a purpose-built appliance, the purchase price of the NPS appliance is significantly lower than competing systems. Cost savings are even more attractive over the long term. While the care and feeding of traditional systems often requires several highly paid DBAs or system administrators, an NPS system supporting tens of terabytes is usually managed by a part-time administrator. Instead of partitioning table spaces, designing indices and performing all the other optimization tasks previously required, DBAs can devote their time to developing business-critical analyses that help their companies succeed.

Linear Scalability

While I/O bottlenecks are commonplace as general-purpose systems scale to accommodate complex
queries, additional arrays of snippet processing units can be added to the NPS system without
impacting performance. This is because query processing using the NPS architecture involves a
minute fraction of the data traffic associated with traditional systems, and because storage and
processing are tightly coupled into a single unit. The autonomy of the SPUs creates further conditions
for a highly scalable system, allowing SPUs to be added without worrying about coordination with
other units. As a result, growing data volumes can be planned for and accommodated without the
sudden, unexpected need for costly purchases.

 

NPS Query Processing Examples

Query     Business Benefit
Telco - 120 days of CDR records 6 Hours < 30 mins $m's - improved billing and
network utilization
Health - two-billion row patient database 5 Hours ~1 min Able to identify the most effective cost/benefit treatments.
Online Retail - 5.4 billion rows 50 Hours 21 mins Analysis of web site visits to adjust promotions
Grocery Retail - complex market basket analysis report > 3 Days < 4 Hours Understand customer purchasing behavior -
improved operational efficiency and > ARPU

 

Back To Top

For The World's Leading Guide To BI Strategy, Program & Technology


Data Tools Index | MS SQL Server 2008 | Informatica | Teradata MDM | Teradata & Informatica | Vendor Models | DW Platform Requirements | Sybase PowerDesigner

Bookmark and Share
 


BI STRATEGY GUIDE

Proven Pathways to Success with Business Intelligence

IMPROVING BI ROI
BI Strategy Guide

Now Also Available in
EBOOK VERSION

Find out more