The Business Intelligence Guide

Putting the Context Around Text Mining



Only 20 percent of corporate data is structured. This means that search and mining tools limited to structured data fail to access up to 80 percent of information. Structured data does not hold as much valuable information as unstructured data does, so there is a strong drive towards improving the ability to access unstructured data.

The cloud nature of the Internet offers limited opportunity for collating information. This is constraining the value of data within a number of industries:

For instance, in the pharmacy industry, the most important factor is 'time to market'. Most of the information needed to develop drugs and bring them to market is available, but it is difficult to find out what compounds are available, what competitors are developing and what similar drugs are being used for similar illnesses.

 

Text Mining

Text mining helps to focus on something specific that may require further analysis. It helps identify documents that contain more information on the subject. Ratings can be used to identify the importance of different documents to different subject areas.
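
As a rough illustration of rating documents against subject areas, the sketch below scores each document by the share of its words that belong to a subject's term list. The subject names, the term lists and the function names are assumptions made for this example only; a real package would use a far richer model.

```python
import re
from collections import Counter

# Hypothetical subject areas and their indicative terms (illustrative only).
SUBJECT_TERMS = {
    "pricing": {"price", "discount", "cost"},
    "service": {"support", "delay", "complaint"},
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rate_document(text):
    """Return, per subject area, the share of tokens that are subject terms."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty documents
    return {
        subject: sum(counts[t] for t in terms) / total
        for subject, terms in SUBJECT_TERMS.items()
    }

def rank_documents(documents, subject):
    """Order document ids from most to least relevant for one subject area."""
    ratings = {doc_id: rate_document(text)[subject]
               for doc_id, text in documents.items()}
    return sorted(ratings, key=ratings.get, reverse=True)
```

Calling `rank_documents(documents, "service")` on a dictionary of document ids and texts returns the ids ordered by their rating for that subject, which is one simple way to decide which documents deserve further analysis.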

Data mining and analysis grew out of artificial intelligence research dating back to the 1940s. These academic products are now used in data mining and text analytics packages to bind applications together. We need to understand the nature of the unstructured data. When you have a known input and a known target, you just need to identify the association. This is quite easy. But interpreting the many meanings a single term can have complicates matters considerably.

Without comprehensive and consistent meta data wrapped around all content objects, it is very difficult to form confident relationships between unstructured objects.

Bring unstructured data to the structured world through data mining, guided by meta data.

Using Text Mining In Presidential Elections

Text mining can be applied to the blogosphere. For example, during presidential elections, 300-400 blogs can be monitored to identify the semantics of public discussion around candidate policies and speeches. This can be used to measure the 'tone' of a document and to classify it as positive or negative towards the brand of the candidate.

Candidates can then be tracked within this environment.
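
A minimal sketch of how 'tone' might be measured and tracked per candidate is shown below. It uses a simple word-list approach, which is only one of many possible techniques; the word lists, the candidate names and the function names are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical tone lexicons; a real system would use a far larger, curated list.
POSITIVE = {"strong", "honest", "inspiring", "good", "support"}
NEGATIVE = {"weak", "dishonest", "disappointing", "bad", "oppose"}

def tone_score(text):
    """Return a score in [-1, 1]: below zero is negative tone, above is positive."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no tone-bearing words found
    return (pos - neg) / (pos + neg)

def track_candidate(posts, candidate):
    """Average the tone of all posts that mention the candidate by name."""
    scores = [tone_score(p) for p in posts if candidate.lower() in p.lower()]
    return sum(scores) / len(scores) if scores else None
```

Running `track_candidate` over each day's blog posts would give a simple time series of tone per candidate. A word list like this cannot handle sarcasm or negation, which is part of why interpreting unstructured text is harder than it first appears.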

Greater understanding of text allows users to do more with it. Extracted information can be imported into semi-structured applications and used by search engines, relational databases, semantic analytic applications etc.

Modern search engines are developing into highly sophisticated engines, simulating many of the capabilities of text mining tools.

The way data is juxtaposed in relation to other proximal data allows engines to determine with acceptable certainty the 'context' of the subject data.
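
The idea of inferring context from proximal data can be sketched as follows. The example looks at the words within a few tokens of an ambiguous term and compares them with cue words for each possible meaning. The cue-word lists, the window size and the function names are assumptions for illustration and do not describe how any particular search engine works.

```python
import re
from collections import Counter

def context_terms(text, target, window=3):
    """Count the words appearing within `window` tokens of each `target` occurrence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    neighbours = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo, hi = max(0, i - window), i + window + 1
            neighbours.update(t for t in tokens[lo:hi] if t != target)
    return neighbours

# Hypothetical cue words for two senses of one ambiguous term.
SENSES = {
    "finance": {"loan", "deposit", "account", "interest"},
    "river": {"water", "fishing", "shore", "flood"},
}

def guess_sense(text, target, window=3):
    """Pick the sense whose cue words occur most often near the target, if any."""
    near = context_terms(text, target, window)
    scores = {sense: sum(near[c] for c in cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

When no cue word appears near the term, `guess_sense` returns `None` rather than guessing, reflecting the point that context can only be determined with 'acceptable certainty' when the surrounding data supports it.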

 

Approaches to Text Mining

There are different approaches to text mining:

  • Identifying keywords or 'tokens' in data - this misses surrounding information that may add critical context to the extracted piece of information.
  • Natural language processing - more complex, understands phrases and relationships between phrases.
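
The difference between the two approaches can be shown with a small sketch. The first function matches keywords only; the second adds a single phrase-level rule that ignores a keyword directly preceded by a negating word. This rule is a crude stand-in for natural language processing, not a real NLP pipeline, and the negator list and function names are assumptions for illustration.

```python
import re

NEGATORS = {"not", "no", "never", "without"}  # assumed, deliberately small list

def keyword_hits(text, keywords):
    """Token approach: report which keywords occur, ignoring surrounding words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {k for k in keywords if k in tokens}

def affirmed_hits(text, keywords):
    """Phrase-aware approach: drop keywords directly preceded by a negator."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = set()
    for i, token in enumerate(tokens):
        if token in keywords and (i == 0 or tokens[i - 1] not in NEGATORS):
            hits.add(token)
    return hits
```

For the sentence "The drug is not effective but is safe", the keyword approach reports both "effective" and "safe", while the phrase-aware rule reports only "safe". The keyword match has lost exactly the information that changes the meaning.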

On the connection between structured data warehouses and repositories of unstructured data: all the text mining takes place offline, and the results are then coded into applications in a disconnected process. A lot of the unstructured analytics are qualitative rather than quantitative.

 

Semantic Search and Rich Media

Text is still the key driver of search and navigation across the Internet. It is the most efficient and effective method of communicating 'meaning'. However, with the prevalence of rich media on the Web, 'Semantic' search is becoming more important as a way of locating and navigating through the vast amount of audio and video.

The Semantic Web adds structure to text by creating an additional layer of meaning around a content object. Computers are still not that competent at discerning context. Even advanced business intelligence tools require user text content to define data relationships and add meaning to analytics information.

However, when text is combined with structure it adds that much needed element to wrap meta data around content - articles, photos, podcasts, videos.

Meta data is 'information about information'. In text form, it provides the 'currency' of the Semantic Web. In time, search engines will rely on rich meta data for content discovery, presentation, contextualization, and ad targeting.

A simplistic example of meta data used online is Google's PageRank. It provides more information about the information on the web page by interpreting which pages link to each other, and using this data to calculate the authority and popularity of an individual page.
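
To make the idea concrete, the sketch below computes a simplified, textbook-style PageRank by power iteration over a small link graph. It is not Google's production algorithm. The damping factor of 0.85, the iteration count and the page names are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With a graph such as `{"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, page "c" ends up with the highest score because three pages link to it, and "a" outranks "b" because it is linked from the highly ranked "c". The link structure acts as meta data about each page's authority.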

Behavioral targeting and collaborative filtering are other good examples of use of meta data online. The success of behavioral targeting and collaborative filtering depends on two elements:

  • deep knowledge of the user
  • deep knowledge of the content

Unfortunately, knowledge of each one depends upon the other. High-quality content meta data provides more robust behavioral profiles, whether for ad targeting, recommendations, rankings, etc.
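
As an illustration of collaborative filtering, the sketch below recommends items to a user based on the ratings of other users with similar tastes. It is a minimal user-based variant using cosine similarity. The data layout (a dictionary of per-user item ratings) and the function names are assumptions made for this example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=3):
    """Suggest unseen items, weighting other users' ratings by their similarity."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, value in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * value
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The quality of the result depends on both elements listed above: the ratings dictionary is the knowledge of the user, while the item identifiers are only useful if the content behind them is consistently described.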

The application of semantic search extends beyond the boundaries of the Web, with corporate enterprise knowledge extending into multimedia content. As business intelligence tools seek to include both structured and unstructured data in their analytic queries, the need for ALL content to have a complete and consistent set of meta data attached is critical to being able to unlock the 'intelligence' contained within the content.

As our business and personal worlds both move towards multimedia at an increasing rate, the Semantic Web is escalating in importance. Because key content objects lack the native capability to present themselves as text, the ability to attach scalable descriptive titles and tags is the first step towards ensuring complete intelligence and media visibility and access in the future.

What gets tagged - gets found!
