The Business Intelligence Guide

About the Author: Gail La Grouw


Putting the Context Around Text Mining

Only 20 percent of corporate data is structured. This means that search and mining tools that work only on structured data fail to access up to 80 percent of information. Structured data rarely carries as much information value as unstructured data, so there is a strong drive towards improving the ability to access unstructured data.

The cloud nature of the Internet offers limited opportunity for collating information. This constrains the value of data within a number of industries.

For instance, in the pharmaceutical industry, the most important factor is 'time to market'. Most of the information needed to develop drugs and bring them to market is available, but it is difficult to find out what compounds are available, what competitors are developing and what similar drugs are being used for similar illnesses.


Text Mining

Text mining helps to focus on something specific that may require further analysis. It helps identify documents that contain more information on the subject. Ratings can be used to identify the importance of different documents to different subject areas.
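As a rough illustration of rating documents against subject areas, the sketch below scores a document by the share of its words that belong to each subject's vocabulary. The subject names, the term lists and the function name `rate_document` are all hypothetical, chosen only for this example; a real tool would use far richer vocabularies and weighting.

```python
import re
from collections import Counter

# Hypothetical subject vocabularies (assumption, for illustration only).
SUBJECT_TERMS = {
    "pharma": {"drug", "compound", "trial", "dosage"},
    "finance": {"revenue", "margin", "forecast", "budget"},
}

def rate_document(text):
    """Return, per subject, the share of the document's words that are subject terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return {
        subject: sum(counts[term] for term in terms) / total
        for subject, terms in SUBJECT_TERMS.items()
    }

doc = "The trial tested a new compound. The drug dosage was adjusted after the trial."
ratings = rate_document(doc)
best_subject = max(ratings, key=ratings.get)
print(ratings, best_subject)
```

The highest rating points to the subject area the document is most relevant to, which is one simple way to decide which documents deserve further analysis.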

Data mining and analysis grew out of artificial intelligence research, whose roots go back to the 1940s. These academic techniques are now used in data mining and text analytics packages to bind applications together. We need to understand the nature of the unstructured data. When you have a known input and a known target, you just need to identify the association. This is quite easy. But interpreting the many meanings a single term can have complicates matters considerably.

Without comprehensive and consistent meta data wrapped around all content objects, it is very difficult to form confident relationships between unstructured objects.

Bring unstructured data to the structured world through data mining, guided by meta data.

Using Text Mining In Presidential Elections

Text mining can also be applied to the blogosphere. For example, during presidential elections, 300-400 blogs can be monitored to identify public sentiment around candidate policies and speeches. This can be used to measure the 'tone' of a document and to classify it as positive or negative towards the brand of the candidate.

Candidates can then be tracked within this environment.
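A minimal sketch of how tone might be measured and tracked per candidate is shown below. It assumes a simple word-list approach: the positive and negative word lists, the sample posts and the function names `tone` and `track` are invented for illustration, and real tone analysis is considerably more sophisticated.

```python
import re

# Hypothetical word lists (assumption, for illustration only).
POSITIVE = {"strong", "trust", "hope", "support", "win"}
NEGATIVE = {"weak", "scandal", "fear", "oppose", "fail"}

def tone(text):
    """Score in [-1, 1]: +1 if all tone words are positive, -1 if all negative, 0 if none."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def track(posts):
    """Average the tone per candidate over (candidate, text) pairs."""
    scores = {}
    for candidate, text in posts:
        scores.setdefault(candidate, []).append(tone(text))
    return {c: sum(s) / len(s) for c, s in scores.items()}

posts = [
    ("Candidate A", "A strong speech that gave voters hope."),
    ("Candidate A", "Critics fear the plan will fail, but supporters trust it."),
    ("Candidate B", "Another scandal makes the campaign look weak."),
]
averages = track(posts)
print(averages)
```

Running the same calculation over successive days of blog posts would give a trend line per candidate, which is the kind of tracking described above.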

Greater understanding of text allows users to do more with it. Extracted information can be imported into semi-structured applications and used by search engines, relational databases, semantic analytic applications etc.

Modern search engines are developing into highly sophisticated engines, simulating many of the capabilities of text mining tools.

The way data is juxtaposed in relation to other proximal data allows engines to determine with acceptable certainty the 'context' of the subject data.


Approaches to Text Mining

There are different approaches to text mining:

  • Identifying keywords or 'tokens' in data - this can miss surrounding information that adds critical context to the extracted piece of information.
  • Natural language processing - more complex, understands phrases and relationships between phrases.
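The difference between the two approaches can be sketched as follows. The first function only checks whether a keyword occurs; the second looks at the words just before it to detect negation. This negation check is a crude stand-in for natural language processing, not a real parser, and the negator list, window size and function names are assumptions made for this example.

```python
import re

NEGATORS = {"not", "never", "no"}  # hypothetical, deliberately tiny list

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_hit(text, keyword):
    """Token approach: report only whether the keyword occurs."""
    return keyword in tokenize(text)

def keyword_in_context(text, keyword, window=2):
    """Return 'absent', 'affirmed' or 'negated' using the words just before the keyword."""
    tokens = tokenize(text)
    for i, token in enumerate(tokens):
        if token == keyword:
            before = tokens[max(0, i - window):i]
            return "negated" if NEGATORS & set(before) else "affirmed"
    return "absent"

a = "The compound was approved for trials."
b = "The compound was not approved for trials."
print(keyword_hit(a, "approved"), keyword_hit(b, "approved"))
print(keyword_in_context(a, "approved"), keyword_in_context(b, "approved"))
```

Both sentences match the keyword, so the token approach treats them as equivalent, while even this small amount of context separates them.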

The connection between structured data warehouses and repositories of unstructured data is usually indirect - all the text mining takes place offline, and the results are then coded into applications in a disconnected process. A lot of the unstructured analytics is qualitative rather than quantitative.


Semantic Search and Rich Media

Text is still the key driver of search and navigation across the Internet. It is the most efficient and effective method of communicating 'meaning'. However, with the prevalence of rich media on the Web, 'Semantic' search is becoming more important as a way of locating and navigating through the vast amount of audio and video.

The Semantic Web adds structure to text by creating an additional layer of meaning around a content object. Computers are still not that competent at discerning context. Even advanced business intelligence tools require user-supplied text to define data relationships and add meaning to analytics information.

However, when text is combined with structure it adds that much needed element to wrap meta data around content - articles, photos, podcasts, videos.

Meta data is 'information about information'. In text form, it provides the 'currency' of the Semantic Web. In time, search engines will rely on rich meta data for content discovery, presentation, contextualization, and ad targeting.

A simplistic example of meta data used online is Google's PageRank. It provides more information about the information on the web page by interpreting which pages link to each other, and using this data to calculate the authority and popularity of an individual page.
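The idea of deriving authority from link structure can be sketched with the simplified textbook form of PageRank, computed by repeated iteration. This is not Google's production algorithm; the damping factor of 0.85, the iteration count and the three-page link graph are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: both other pages link back to "home".
links = {"home": ["about", "blog"], "about": ["home"], "blog": ["home"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

The page that receives the most links ends up with the highest score, which is the 'information about information' that the link structure contributes.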

Behavioral targeting and collaborative filtering are other good examples of the use of meta data online. The success of behavioral targeting and collaborative filtering depends on two elements:

  • deep knowledge of the user
  • deep knowledge of the content

Unfortunately, knowledge of each one depends upon the other. High-quality content meta data provides more robust behavioral profiles, whether for ad targeting, recommendations, rankings, etc.
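The dependency between knowledge of the user and knowledge of the content can be sketched as below. A behavioral profile is built from the tags of the items a user has viewed, and unseen items are then ranked against that profile. This is a simple content-based profile rather than full collaborative filtering, and the catalogue, tags and function names are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue: content id -> meta data tags (assumption, for illustration only).
CATALOGUE = {
    "video-1": {"election", "debate", "policy"},
    "article-2": {"election", "polling"},
    "podcast-3": {"cooking", "recipes"},
    "article-4": {"policy", "economy"},
}

def build_profile(viewed):
    """Count how often each tag appears across the items a user viewed."""
    profile = Counter()
    for item in viewed:
        profile.update(CATALOGUE[item])
    return profile

def recommend(viewed):
    """Rank unseen items by the summed profile weight of their tags."""
    profile = build_profile(viewed)
    scores = {
        item: sum(profile[tag] for tag in tags)
        for item, tags in CATALOGUE.items()
        if item not in viewed
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))

suggestions = recommend(["video-1", "article-2"])
print(suggestions)
```

If the catalogue items carried no tags, the profile would be empty and every unseen item would score zero, which shows why the quality of the content meta data limits the quality of the behavioral profile.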

The application of semantic search extends beyond the boundaries of the Web, with corporate enterprise knowledge extending into multimedia content. As business intelligence tools seek to include both structured and unstructured data in their analytic queries, the need for ALL content to have a complete and consistent set of meta data attached is critical to being able to unlock the 'intelligence' contained within the content.

As our business and personal worlds both move towards multimedia at an increasing rate, the Semantic Web is escalating in importance. Because key content objects lack the native capability to present themselves as text, the ability to attach scalable descriptive titles and tags is the first step towards ensuring complete intelligence and media visibility and access in the future.

What gets tagged - gets found!
