Real-Time Analysis of Telco Data
Shrinking profit margins on network services are putting pressure
on communication service providers (CSPs) to reduce or eliminate
any major threat to those narrow margins. Threats arise in
several parts of the organisation:
- Revenue leakage - costing the industry $100
- Interconnect - inaccurate or missed inter-carrier
- Fraud - a $12 billion annual industry problem
- Churn - a multi-billion dollar problem worsened
by wireless number portability
- Network - inefficient usage and least cost
The longer it takes to see any of these events, the greater the damage
they do, so carriers need real-time insight into performance.
That requires continuous analysis of terabytes of call detail record
(CDR) data using high-end data warehouses and powerful Business
Intelligence (BI) solutions.
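To make the cost of delayed visibility concrete, here is a minimal sketch, assuming a hypothetical CDR layout and an illustrative fraud rule (more than `limit` international calls from one caller inside a sliding time window). The class and field names are invented for illustration. Because each record is evaluated as it arrives, the suspicious pattern is flagged on the CDR that completes it, rather than after a nightly batch run.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class CDR:
    # Hypothetical, minimal CDR layout used only for this illustration.
    caller: str
    timestamp: int  # seconds since an arbitrary epoch
    duration_s: int
    international: bool


class InternationalCallMonitor:
    """Illustrative rule: flag a caller as soon as they place more than
    `limit` international calls within a sliding window of `window_s` seconds.
    Assumes CDRs arrive in timestamp order."""

    def __init__(self, limit: int, window_s: int) -> None:
        self.limit = limit
        self.window_s = window_s
        # caller -> timestamps of that caller's recent international calls
        self._recent = defaultdict(deque)

    def observe(self, cdr: CDR) -> bool:
        """Evaluate one CDR as it arrives; return True if it trips the rule."""
        if not cdr.international:
            return False
        times = self._recent[cdr.caller]
        times.append(cdr.timestamp)
        # Discard calls that have fallen out of the sliding window.
        while times and cdr.timestamp - times[0] >= self.window_s:
            times.popleft()
        return len(times) > self.limit


monitor = InternationalCallMonitor(limit=3, window_s=600)
stream = [CDR("A", t, 60, True) for t in (0, 100, 200, 300)]
alerts = [monitor.observe(cdr) for cdr in stream]
print(alerts)  # [False, False, False, True]
```

The alert fires on the fourth call, at the moment it is recorded; a later call at t=1000 would not fire, because the earlier calls have by then left the 600-second window.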
Most carriers struggle with a patchwork of general-purpose data
warehouse solutions to store and analyze the mountains of data they generate.
Large networks and their associated switches, billing systems and
service departments can generate hundreds of millions of individual
CDRs daily. These terabytes of dynamic customer data will continue
to grow exponentially as carriers add new services and as IP-based
This ever-expanding volume of data is straining the performance
capabilities of traditional relational databases, servers and storage
systems that provide the foundation for BI.
Analyzing large volumes of records at the CDR level within reasonable
time frames, or at reasonable cost, with traditional methods
requires sampling and summarising millions of CDRs to reduce the
data load. Even then, analyzing the aggregated data sets takes many
hours of processing on today’s platforms.
This processing limitation significantly reduces the effectiveness
And in turn, that affects the ability of the business to improve
Very few data warehouse solutions can meet this processing challenge.
Of those which do, the following represent best in class:
Both of these data warehouses are optimized for handling real-time,
terascale analysis of databases at the CDR level.
Telecommunications BI Challenges
Many critical telecommunications functions rely on fast, complex
analysis of CDR data, including:
- CRM - analyzing behavioral data to optimally
target services and reduce churn
- Billing - ensuring complete and accurate billing
- Revenue Assurance - modeling call behavior
- Network Performance - optimizing network operations
using operations management programs.
The performance of each of these functions improves in direct
proportion to how readily CDR-level data can be accessed.
With traditional systems, the cost and time needed to acquire and manage
large volumes of data made the process unviable. On legacy
servers and RDBMS software, a single complex BI query
against billions of CDRs takes hours or days.
This has prevented adoption of CDR-level analysis and rules out
real-time, proactive responses by carriers. In response to this challenge,
most carriers either:
- Summarize or filter the data for analysis
- Create a massive, complex and often custom CDR warehouse to
analyze CDR information.
Neither of these options provides complete information for decision-making.
Call detail records are generated from:
- Telecom networks and associated switches
- Billing systems
- Service departments
Mainstream BI selects and summarizes data from these disparate
data sources to create data marts for analysis. Summarizing the
data means CDR-level detail is lost.
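The loss of detail can be shown with a few records. This is a minimal sketch, assuming a hypothetical CDR tuple layout and a simple daily rollup per caller of the kind a data mart might hold; it is not any particular product's summarization.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CDR rows: (caller, callee, call start, duration in seconds).
cdrs = [
    ("A", "X", datetime(2024, 1, 1, 9, 0), 120),
    ("A", "Y", datetime(2024, 1, 1, 23, 50), 30),
    ("B", "X", datetime(2024, 1, 1, 12, 0), 600),
]


def summarize_daily(records):
    """Roll CDRs up to one row per (caller, day), as a typical data mart might."""
    mart = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for caller, _callee, start, duration_s in records:
        row = mart[(caller, start.date())]
        row["calls"] += 1
        row["seconds"] += duration_s
    return dict(mart)


mart = summarize_daily(cdrs)
# The mart row for A reads {'calls': 2, 'seconds': 150}: callees and call
# times are gone, so "who did A call after 23:00?" can no longer be answered.
late_calls = [(caller, callee) for caller, callee, start, _ in cdrs if start.hour >= 23]
print(late_calls)  # [('A', 'Y')]
```

The same question that the CDR-level list answers in one line has no answer at all in the summarized rows.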
A telco may produce 100 to 500 million CDRs per day.
Carriers must use this data to:
- Identify service adoption and usage
- Monitor billing activity
- Drive sales and marketing initiatives
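A quick back-of-envelope calculation shows what that daily volume means for ingest and storage. The 100 to 500 million CDRs per day range comes from the text above; the 200-byte record size is purely an assumed, illustrative figure, and the results cover raw records only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cdr_load(cdrs_per_day: int, bytes_per_cdr: int) -> tuple:
    """Return (average CDRs per second, GB per day, TB per year) of raw CDR data."""
    per_second = cdrs_per_day / SECONDS_PER_DAY
    gb_per_day = cdrs_per_day * bytes_per_cdr / 1e9
    tb_per_year = gb_per_day * 365 / 1e3
    return per_second, gb_per_day, tb_per_year


# 200 bytes per CDR is an assumed, illustrative record size.
for volume in (100_000_000, 500_000_000):
    rate, daily, yearly = cdr_load(volume, bytes_per_cdr=200)
    print(f"{volume:,} CDRs/day: ~{rate:,.0f}/s, {daily:.0f} GB/day, {yearly:.1f} TB/year")
```

Under these assumptions the low end is about 1,157 CDRs per second and 20 GB per day, and the high end about 5,787 CDRs per second, 100 GB per day and 36.5 TB per year, before indexes, mirroring or summary tables are added. These are averages; peak-hour rates will be higher.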
The trade-off is either to accept the cost and technical challenge
of storing and accessing all of this data, or to adopt a sampling
and summarizing approach and lose visibility of calling patterns
and the relationships within the data.
For instance, if call volume were to decrease while service levels
increased, analysing cause and effect would not be possible with only a
subset of the data. It would be impossible to analyze which event occurred
Further, aggregating data is inflexible from an analytical perspective,
as fixed data sampling formats are hard to change. If a particular data subset
is skewing the main data set, it cannot be eliminated without programming
changes to the sampling criteria.
This is time-consuming and costly, thereby limiting the value of
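The difference between summarized and CDR-level data is easy to see when one subset skews the figures. This is a minimal sketch, assuming hypothetical switch IDs and a made-up scenario in which switch "SW9" emitted short test calls; the summary here is simply a total of calls and seconds.

```python
from statistics import mean

# Hypothetical CDR-level rows: (switch_id, duration in seconds).
# Assume switch "SW9" emitted short test calls that skew the averages.
cdrs = [("SW1", 180), ("SW1", 240), ("SW2", 300), ("SW9", 5), ("SW9", 5), ("SW9", 5)]

# A pre-aggregated summary keeps only totals; the switch dimension is gone,
# so the SW9 calls cannot be backed out without rebuilding the summary.
summary = {"calls": len(cdrs), "seconds": sum(d for _, d in cdrs)}
avg_from_summary = summary["seconds"] / summary["calls"]

# With CDR-level data, excluding the skewing subset is just another predicate.
avg_excluding_sw9 = mean(d for switch, d in cdrs if switch != "SW9")
```

Here the summary reports an average of 122.5 seconds per call, while the detail data with SW9 excluded gives 240 seconds. Correcting the summary would mean changing and re-running the aggregation job; correcting the detail query means changing one condition.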
One approach to this data challenge that avoids 'sample and store'
is the use of consolidated warehouses to store all the CDR data.
This approach provides key benefits such as:
- Storing data in a single database, rather than in several data
marts, improves analysis and is easier to maintain.
- Allows analysis of trends based on complete historical detailed
- A consolidated warehouse approach provides a high degree of
flexibility, supporting changes to analytical methods as the business
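The flexibility point can be sketched with any SQL engine. Below, Python's built-in SQLite stands in for a consolidated CDR warehouse; the table name, columns and rows are hypothetical. The idea being illustrated is that a new analytical question is just a new query over the same detail rows, with nothing re-extracted or re-summarized.

```python
import sqlite3

# SQLite stands in for a consolidated CDR warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cdr (caller TEXT, callee TEXT, start_ts TEXT, "
    "duration_s INTEGER, cell_id TEXT)"
)
conn.executemany(
    "INSERT INTO cdr VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "X", "2024-01-01 09:00", 120, "C1"),
        ("A", "Y", "2024-01-01 23:50", 30, "C2"),
        ("B", "X", "2024-01-02 12:00", 600, "C1"),
    ],
)

# Original question: total minutes per caller.
per_caller = conn.execute(
    "SELECT caller, SUM(duration_s) / 60.0 FROM cdr GROUP BY caller ORDER BY caller"
).fetchall()

# A new question later on is just a new query over the same detail rows.
per_cell_day = conn.execute(
    "SELECT cell_id, substr(start_ts, 1, 10) AS day, COUNT(*) "
    "FROM cdr GROUP BY cell_id, day ORDER BY cell_id, day"
).fetchall()
print(per_caller)    # [('A', 2.5), ('B', 10.0)]
print(per_cell_day)  # [('C1', '2024-01-01', 1), ('C1', '2024-01-02', 1), ('C2', '2024-01-01', 1)]
```

Had the data been stored only as per-caller minutes, the second question (calls per cell per day) could not be answered without going back to the source systems.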
Building a terascale consolidated data warehouse requires:
- Construction of a large network of high-end servers
- Integration of terascale storage systems
- Development of RDBMS software that can analyze millions or billions
These are costly projects that require large development and maintenance
teams. As data volume grows, along with increasing complexity of
historical, pattern-analysis queries and growth in the user community,
many warehouses are unable to keep up with business demand.
Data Warehouse Appliances
A data warehouse appliance such as Netezza solves the terascale
data warehouse challenge without compromising on performance, supporting
complex BI queries against billions of CDRs in minutes or seconds
rather than hours or days.
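Part of the gap between hours and minutes is plain disk arithmetic: a query that must read every CDR can go no faster than the combined read rate of the disks holding them. The sketch below is a lower bound under assumed figures (10 billion CDRs, 200 bytes each, 100 MB/s sequential read per disk), none of which come from the text; it ignores CPU, joins and contention.

```python
def full_scan_seconds(n_cdrs, bytes_per_cdr, mb_per_s_per_disk, n_disks):
    """Lower bound on one sequential pass over the raw CDRs.

    Ignores CPU, joins, sorting and contention; assumes the data is spread
    evenly so every disk streams its share in parallel.
    """
    total_bytes = n_cdrs * bytes_per_cdr
    aggregate_bytes_per_s = mb_per_s_per_disk * 1e6 * n_disks
    return total_bytes / aggregate_bytes_per_s


# Assumed figures: 10 billion CDRs (about 20 days at 500 million/day),
# 200 bytes per CDR, 100 MB/s sequential read per disk.
single = full_scan_seconds(10_000_000_000, 200, 100, n_disks=1)
parallel = full_scan_seconds(10_000_000_000, 200, 100, n_disks=100)
print(f"1 disk: {single / 3600:.1f} h")       # 1 disk: 5.6 h
print(f"100 disks: {parallel / 60:.1f} min")  # 100 disks: 3.3 min
```

Spreading the same 2 TB across 100 disks that are all read at once cuts the minimum scan time by the same factor of 100, which is the effect a parallel architecture is built to exploit.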
A fully-integrated data warehouse appliance consists of:
- A host computer
- Arrays of hot-swappable mirrored storage
- Custom chips and network switches that act as a powerful unit
to manage data flows and process queries at the disk level.
Using a Massively Parallel Processing (MPP) architecture, systems such as
ADW or Netezza NPS are specifically designed for high performance and scalability.
Their performance capability dramatically reduces the latency of
complex BI analysis, and data can be loaded continuously even while
queries are running, so the data is always current.
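The general MPP pattern can be sketched in a few functions. This is a generic illustration, not how ADW or NPS are actually implemented: it assumes rows are hash-distributed on caller across segments, each segment filters and partially aggregates its own rows next to the data, and the host merges only the small partial results. Python threads merely stand in for independent segment processors, and all function names are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_segments):
    """Hash-distribute CDRs on caller, so each segment owns a slice of the data."""
    segments = [[] for _ in range(n_segments)]
    for caller, duration_s in records:
        segments[zlib.crc32(caller.encode()) % n_segments].append((caller, duration_s))
    return segments


def scan_segment(segment, min_duration_s):
    """Work done next to the data: filter, then partially aggregate."""
    partial = Counter()
    for caller, duration_s in segment:
        if duration_s >= min_duration_s:
            partial[caller] += duration_s
    return partial


def mpp_query(records, n_segments=4, min_duration_s=60):
    """Seconds of calls lasting at least `min_duration_s`, summed per caller."""
    segments = partition(records, n_segments)
    # Threads stand in for independent segment processors; real MPP systems
    # run these scans on separate hardware, truly in parallel.
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(scan_segment, segments, [min_duration_s] * n_segments))
    # The host merges only the small partial results, never the raw rows.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


records = [("A", 120), ("B", 30), ("A", 300), ("C", 90), ("B", 600)]
print(mpp_query(records))
```

The answer is the same whatever the number of segments; what changes is how much data each processor has to read, which is why adding segments shortens scan times.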