Data To Query Processing
Traditional database platforms operate by:
- Reading data off disk
- Bringing it across an I/O interconnection
- Loading it into memory for processing.
Data normally flows smoothly for routine OLTP applications [processing
invoices or looking up customer records] that focus on individual
data elements.
Moving data across multiple backplanes and I/O channels performs
very poorly when:
- The amount of data to be queried is very large
- The query involves complex joins requiring multi-phase processing
- Data is changing rapidly
Unlike OLTP, data warehousing is concerned with moving large quantities
of data through the system’s analysis and processing engine
as efficiently as possible, with a minimum of internal thrashing.
Where OLTP systems might be optimized to reduce data latency, data
warehouse systems are typically much more concerned with data throughput.
As a result, bringing OLTP-optimized system designs to build a data
warehouse results in greatly reduced performance.
A complex join or other complex query may require a number of processing
steps. Delivering multiple enormous tables (billions of rows) off
disk, across the network and into memory for processing by the DBMS
– all to perform one step is just not possible in this scenario.
All large-scale legacy data warehouse systems share these fundamental
limitations to performance becoming worse with more users, data
and queries.
To improve performance, DBMS systems typically employ complex schemas,
indices,
aggregates and advanced partitions in an attempt to limit the amount
of data required for movement and analysis within the architecture.
This still results in mostly unsatisfactory performance.
SMP and MPP
Enterprise data warehouses use multiprocessing architectures,
either:
SMP
SMP systems have several processors, each with its own memory cache,
forming a pool of computing resource. Threads of code are automatically
distributed, using load balancing, by the operating system for execution.
Resources such as memory and the I/O system are equally accessible
to each of the processors.
SMP architecture is limited in its ability to move large amounts
of data as required in data warehousing and business intelligence
applications.
More on Symmetrical
Multiprocessing
MPP
MPP systems consist of very large numbers of loosely coupled processors,
each with its own memory, backplane and storage, and operating system.
The nothing-shared
'nodes' allow near linear scalability, and software is parallelizable.
The key advantages of MPP are the high performance and high availability.
If one node fails, another can take over.
Hybrid SMP / MPP
Pure MPP systems are very high cost, due to additional memory
and I/O components, as
well as the administrative challenges in setting up and managing
many semi-independent systems.
Typical MPP systems are implemented virtually in clusters of SMPs,
often with some sharing of I/O resources. This preserves some of
the performance and scalability advantages of MPP while reducing
costs and administration time.
Limits of SMP Processing
Large SMP systems with additional processors and shared memory
to deliver much higher computing power, and large-scale SMP systems
are frequently deployed for
data warehousing. Dozens of processors are sharing memory and storage.
When data volumes rise, systems based on SMP architectures tend
to outgrow their memory, backplane and I/O resources. As processors
take turns accessing massive amounts of data in memory, the memory
bus becomes a bottleneck that results in poor performance. Whilst
the number of processors and memory can be increased, SMP processing
is contrained due to:
- Memory bus bandwidth is limited and becomes saturated.
- I/O bus bandwidth is also limited, and can become congested
as the amount of data sent from the storage area network to process
a query increases.
SMP systems suffer from less-than-linear scalability and a progressive
decline in performance as the system grows.
MMP on Clustered SMP
An alternative architecture consists of small SMP clusters operating
in parallel, sharing a storage area network and management structure.
Each CPU within an SMP node shares RAM and access to the storage
network over a shared I/O bus.
Vendors offering database solutions using this approach include:
- Teradata
- IBM DB2 Integrated Cluster Environment [ICE]
The constraints of this approach include a bottleneck on resources
sharing that limits performance and scalability. SMP nodes contend
for access to storage over a common I/O
bus.
This architecture is intended for traditional database applications
– not for pushing enormous amounts of data through a shared
pipeline. Even with separate I/O paths contraints on data flowing
from storage to an SMP cluster for processing are inherent.
“Shared Nothing” MPP
In a “shared-nothing” architecture, processor-disk
pairs operating in parallel divide the workload to execute queries
over large sets of data.
The best vendor solution following this approach is provided by
Teradata.
Each processor communicates with its associated disk drive to
get raw data and perform calculations.
One processor is assigned to collect the intermediate results and
assemble the query response for delivery back to the requesting
application.
Scalability
As there is no contention for resources between MPP nodes, this
architecture is scalable to terascale database sizes.
Interconnect Constraint
BI queries require significant movement of data from disks to processors.
Whilst the processor-disk pairs operate independently, they typically
share a proprietary common interconnect. Response times suffer when
this becomes clogged with traffic.
Overhead
A 64K block of data may be moved to provide only 1K to respond
to the SQL statement, the balance is overhead – unrelated
data, project columns, join columns and other extraneous material
that is wrapped around the relevant data and has to be filtered
out by the processor.
When MPP architectures are used for large-scale query processing,
the internal backplanes, busses and I/O connections between processor
and storage cannot handle the amount of traffic.
Data transfer speeds decline with growing data volumes, limiting
scalability. Such diminishing returns limit the advantage of abundant
storage capacity.
Vendor Example - Teradata Datawarehouses
Performance limitations are further compounded in systems based
on legacy components. Teradata uses its own generic servers, with
storage provided by third parties. To gain maximum performance out
of this older system architecture, the solution relies heavily on
system tuning through complex choices of primary and secondary indexes
and
table denormalization as well as space allocation.
However, the relationship between complex indexing and query speed
is difficult to optimize, and highly dependent on the data and query.
As a result, indices are often mis-configured, increasing query
response time instead of decreasing it. For this reason, Teradata
DW systems take a lot of effort to index and tune.
Separate Data, Shared Storage
Processors operating in parallel share the same storage media,
which is partitioned
so that processors don’t contend for the same data.
Systems sharing common storage are less costly than a shared-nothing
architecture, where each processor has its own dedicated storage
device.
However, performance suffers from the classical disadvantage of
MPP architectures: system resources are overwhelmed as query data
is transferred from disk to the processors. In addition, the common
storage leads to scalability issues as data volume grows.
Vendor Example - IBM DB2
IBM DB2 database management system is a hardware-independent solution,
consisting of a core database supported by a variety of server and
storage options.
The cost of this independence is a range of supported products
creating a complex mix of choices when designing a solution. For
example:
Storage options - SANs or direct attached storage, requiring customers
to understand
the I/O dependency of their warehouse on completely different types
of storage architectures.
Elaborate tools - to tune, optimze and manage the database and
its network of storage and processors for acceptable query performance.
Heavy administrative workload can be expected, with complexity
growing as SMP cluster nodes are added.
Indexing
To compensate for performance constraints, indexing can be used
to limit the amount of data examined in a query. DB2 provides a
number of indexing strategies for warehouse applications, the combination
of partitioning and indexing across distributed nodes makes configuration
and loading considerably more complex.
The range of tuning and management challenges of this configuration
can significantly increase the TCO and destroy the cost-benefit
equation.
Shared Data and Storage
Multiple processors operating in parallel access shared data residing
on a common
storage system.
Lock Manager
A lock manager is used to prevent simultaneous access to the same
data by multiple
query processes. Access to shared data is coordinated via messaging
between processes and the lock manager.
Vendor Example - Oracle 9i or 10g Real Application Cluster [RAC]
relational database management system.
Shared data architecture supposes that partitioning strategies
do not adversely affect query performance. However, like the two
previous designs, this approach requires the transfer of massive
amounts of data from storage to processors. Shared data architecture
contention issues further limit performance and
scalability.
The locking and caching mechanisms used by Oracle to prevent processor
contention impose a ceiling on the data scalability of its RAC.
Oracle recommends users deploy partitions and indices to help improve
query performance, thus eliminating the intended simplicity of shared
data architecture. Indexing schemes and their interactions with
partitioning greatly increase set-up and maintenance complexity.
As with DB2, the Oracle 9i and 10g RAC solutions are hardware
and operating system-independent, capable of running on a variety
of servers. This can add weeks of installation assembly, testing,
debugging and fine-tuning of system parameters.
Blade Servers
Blade servers provide high-density computing, with enormous computing
power in a compact frame. They are a tightly integrated, consolidated
infrastructure, with a common management framework providing control
as a single virtual system.
Each blade has its own collection of processors, memory and I/O
capability. Dozens of blades are installed in a single chassis sharing
storage, network and power resources. Typically, each blade functions
as an SMP cluster of processors and shared RAM, within a matrix
of blades operating in parallel, resembling a tightly consolidated
version of the MPP on clustered SMP.
A number of vendors offer data warehouse solutions based on blade
technology.
I/O Processing Bottleneck
Each blade communicates over the system midplane to a storage area
network – a route shared with all the other blades in the
rack. Blades also suffer bottlenecks, where massive amounts of data
have to be delivered from storage to processors over a common I/O
pathway.
Deploying legacy software may exacerbate the bottleneck issues.
Whilst 1G blade servers provide a better performing computing
platform than legacy SMP and clustered-SMP/MPP, they still suffer
from the same inefficiencies and complexity of traditional data
warehouse solutions.
Vendor Example - “Google Search Appliance,”
A custom-designed blade server developed to enable ultra-high speed
content searches for the enterprise.
Data Warehouse Appliance
To overcome the challenges inherent in above models of data warehousing
required to fresh approach. One such solution, developed by Netezza
[NPS] eliminated the barriers to performance of traditional systems.
The NPS system is a data warehouse appliance specifically built
to enable real-time business intelligence and analytics on terabytes
of data.
The NPS system combines server, storage and database in a single
scalable platform based on open standards and commodity components.
The architecture provides significant performance advantage –
ten to fifty times faster than other data warehousing systems.
By leveraging commodity components it delivers a huge cost advantage,
at around half the cost of competitive systems. This approach also
eliminates the high operating costs of general-purpose systems adapted
for data warehousing. Rapid implementation takes hours, not weeks
- there is no need for intensive database tuning, administration
and system management.
Data Flow - Bringing the Query to the Data
The critical difference in the NPS appliance is based upon two
guiding principles:
Asymmetric Massively Parallel Processing [AMPP]
- provides both performance and scalability by using both SMP and
MPP as best suited to meet the specific needs of BI applications.
Intelligent Query Streaming Technology - moving
processing intelligence to a record stream adjacent to storage produces
much better performance and scalability than the traditional approach
of moving sets of records to a processor.
The combined approach results in very high real-time performance
and scalability at a fraction of the cost of other vendor systems.
AMPP
Netezza’s AMPP architecture is a two-tiered system:
Tier One - A high-performance Linux SMP Host
- Compiles queries received from BI applications
- Generates query execution plans
- Divides a query into a sequence of sub-tasks, or snippets, that
can be executed in parallel
- Distributes the snippets to the second tier for execution
- Returns the final results to the requesting application
Tier Two - High number of Snippet Processing Units [SPUs]
Operating in parallel, each SPU is an intelligent query processing
and storage node that consists of:
- A powerful commodity processor
- Dedicated memory
- Disk drive
- Field-programmable disk controller with hard-wired logic to
manage data flows and process queries at the disk level
The massively parallel, shared-nothing SPU blades provide the performance
advantage of MPP. The highly autonomous SPUs perform:
- Nearly all query processing - each SPU operating on its portion
of the database.
- Parallel processing operations such as record operations, parsing,
filtering, projecting, interlocking and logging
- Operations on sets of intermediate results - such as sorts,
joins and aggregates [can also be done on the host, depending
on processing cost]
- Their own scheduling, storage management, transaction management,
concurrency control and replication - reduces coordination requirements
on the host and time-consuming maintenance tasks
This significantly reduces the amount of data required to be moved
within the system, overcoming the major constraint of all previous
models - I/O Interconnection.
The real strength of the Netezza solution is throughput performance
optimization.
Intelligent Query Streaming Technology
Intelligent Query Streaming technology greatly reduces the data
traffic among SPU nodes, and between SPU nodes and the SMP host.
Data flow is streamlined by placing silicon processors right next
to the storage
device. Rather than moving data into memory or across the network
for processing, the technology intelligently filters records as
they stream off the disk, delivering only the relevant information
for each query.
By performing this first level processing right at the disk, Netezza
is at least ten times faster than conventional systems, with disk
access speed providing the only limiting factor.
Intelligent Query Streaming is performed on each SPU by a Field-Programmable
Gate Array (FPGA) chip
that functions as the disk controller, and is also capable of basic
processing as data is read off the
disk. The system is able to run critical database query functions
such as parsing, filtering and
projecting at full disk reading speed, while maintaining full ACID
(Atomicity, Consistency, Isolation, and
Durability) transactional operations of the database. Data flows
from disk to memory in a single
laminar stream, rather than as a series of disjointed steps that
require materializing partial results.
With the Netezza approach, the pathways used by traditional architectures
to deliver data to the host
are streamlined and shortened. Because the SQL is “understood”
by the disk drive in a Netezza
system, there is far less reliance on CPUs, data modeling or bandwidth
for performance:
• The storage interconnect, a bottleneck on traditional systems,
is eliminated by direct attached
storage – data streams off the SPU disk and straight into
the FPGA for initial query filtering.
• Intermediate query tasks are performed in parallel on the
SPUs, where streaming processing sharply
reduces CPU workload.
• The gigabit Ethernet network connecting SPUs to the host
and each other is used only for
transmitting intermediate results, rather than massive amounts of
raw data. Network traffic is
reduced by approximately two orders of magnitude.
• The I/O bus and memory bus on the host computer are used
only for assembling final results,
eliminating previous congestion.
Key Differences and the Netezza Advantage
The Netezza Performance Server appliance offers several fundamental
advantages over traditional data warehouse architectures:
Data Flow
The AMPP architecture applies elements of SMP and MPP with most
processing handled by the massively parallel snippet processing
units, as early in the data flow as possible. This approach of “bringing
the query to the data” eliminates extraneous traffic and resulting
delays.
Traditional systems: SMP and MPP architectures developed for general-purpose
systems
(including blade servers) are based on moving data from storage
to the processors (“bringing the
data to the query”). When performing BI queries of massive
databases, the flood of data creates
bottlenecks that result in slow (and often unacceptable) response
times.
Storage Connection
NPS - Intelligent Query Streaming technology filters out unnecessary
information as data streams off the disk, greatly reducing the processing
burden downstream. There
are no storage interconnects in the traditional sense – the
disk controller handling the initial processing is hard-wired to
the disk drive. System performance is limited only by disk speed
(the NPS system runs at “physics speed”)
Traditional systems - Storage system interconnections simply function
as a conduit to deliver data from storage to its associated processor.
System performance is limited by the capacity of the I/O bus.
Degree of Integration
NPS - Server, storage and DBMS are integrated in a compact, efficient
unit designed
specifically for data warehousing. The system installs in hours,
not weeks, and deploys quickly with no need for indexing, tuning,
physical modeling or other time-consuming tasks. There is one vendor
to manage, and none of the unnecessary components, awkward cabling
or conflicting parameters that traditionally cause problems with
patchwork solutions.
Traditional systems - Patchwork solutions based on general-purpose
products mean a myriad of headaches, including multiple vendors
to manage, lengthy and difficult implementations, complex tuning,
lower reliability, higher power requirements and extra floor space.
Performance
By “bringing the query to the data,” the NPS appliance
delivers at least an order of magnitude performance improvement
for BI applications analyzing terabytes of data. Traditional delays
are eliminated – analyses that previously took hours now take
just seconds. Even “queries from hell” to uncover deeply
buried patterns are handled with ease.
Low Acquisition and Operating Costs
As a purpose-built appliance, the purchase price of the NPS appliance
is significantly lower than competing systems. Cost savings are
even more attractive over the long term. While the care and feeding
of traditional systems often requires several highly paid DBAs or
system administrators, an NPS system supporting tens of terabytes
is usually managed by a part-time administrator. Instead of partitioning
table spaces, designing indices and performing all the other optimization
tasks previously required, DBAs can devote their time to developing
business-critical analyses that help their companies succeed.
Linear Scalability
While I/O bottlenecks are commonplace as general-purpose systems
scale to accommodate complex
queries, additional arrays of snippet processing units can be added
to the NPS system without
impacting performance. This is because query processing using the
NPS architecture involves a
minute fraction of the data traffic associated with traditional
systems, and because storage and
processing are tightly coupled into a single unit. The autonomy
of the SPUs creates further conditions
for a highly scalable system, allowing SPUs to be added without
worrying about coordination with
other units. As a result, growing data volumes can be planned for
and accommodated without the
sudden, unexpected need for costly purchases.
NPS Query Processing Examples
| Query |
|
|
Business Benefit |
| Telco - 120 days of CDR records |
6 Hours |
< 30 mins |
$m's - improved billing and
network utilization |
| Health - two-billion row patient database |
5 Hours |
~1 min |
Able to identify the most effective cost/benefit
treatments. |
| Online Retail - 5.4 billion rows |
50 Hours |
21 mins |
Analysis of web site visits to adjust
promotions |
| Grocery Retail - complex market basket
analysis report |
> 3 Days |
< 4 Hours |
Understand customer purchasing behavior
-
improved operational efficiency and > ARPU |
Source: Netezza Performance Server® Appliance:
An Architectural Comparison [2005]
Back To Top
For
The World's Leading Guide To BI Strategy, Program & Technology
Data Tools Index | MS
SQL Server 2008 | Informatica | Teradata
MDM | Teradata & Informatica
| Vendor Models | DW
Platform Requirements | Sybase PowerDesigner
|