Putting the Context Around Text Mining
Only 20 percent of corporate data is structured. This means that
search and mining tools limited to structured data fail to access up
to 80 percent of information. Structured data does not carry the value
of information that unstructured data contains - therefore there is a
strong drive towards improving the ability to access this type of data.
The cloud nature of the Internet offers limited opportunity for
collating information. This is constraining the value of data within
a number of industries:
For instance, in the pharmaceutical industry, the most important factor
is 'time to market'. Most of the information needed to develop drugs and
bring them to market is available, but it is difficult to find out what
compounds are available, what competitors are developing and what similar
drugs are being used for similar illnesses.
Text Mining
Text mining helps to focus on something specific that may require
further analysis. It helps identify documents that contain more information
on the subject. Ratings can be used to identify the importance of
different documents to different subject areas.
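As a rough illustration of rating documents against subject areas, the sketch below scores a document by the share of its words that fall in each subject's vocabulary. The subject names, the vocabularies and the scoring rule are all assumptions made for this example; a real package would use curated or learned vocabularies and a more robust weighting.

```python
import re
from collections import Counter

# Hypothetical subject vocabularies, invented for this illustration.
SUBJECTS = {
    "oncology": {"tumor", "chemotherapy", "cancer"},
    "cardiology": {"heart", "artery", "stent"},
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rate_document(text, subjects=SUBJECTS):
    """Return, per subject, the fraction of tokens found in its vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in subjects}
    counts = Counter(tokens)
    return {
        name: sum(counts[term] for term in vocab) / len(tokens)
        for name, vocab in subjects.items()
    }

doc = "The cancer trial combined chemotherapy with a new tumor marker."
ratings = rate_document(doc)
best_subject = max(ratings, key=ratings.get)
```

Here the document rates higher for the assumed 'oncology' vocabulary than for 'cardiology', so it would be surfaced first to someone researching that subject.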
Data mining and analysis emerged from artificial intelligence research
in the 1940s. These academic products are now used in data mining and
text analytic packages to bind applications together. We need to
understand the nature of the unstructured data. When you have a known
input and a known target, you just need to identify the association.
This is quite easy. But interpreting the many meanings a single term
can have complicates matters considerably.
Without comprehensive and consistent meta data wrapped around all
content objects, it is very difficult to form confident relationships
between unstructured objects.
The aim is to bring unstructured data into the structured world through
data mining, guided by meta data.
Using Text Mining In Presidential Elections
Text mining can be applied to the blogosphere. For example, during
presidential elections, 300-400 blogs can be monitored to identify
what the public is saying about candidate policies and speeches. This
can be used to measure the 'tone' of a document and to classify it as
positive or negative towards the brand of the candidate.
Candidates can then be tracked within this environment.
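One simple way to approximate 'tone' is a word-list approach: count positive and negative words in each post and average the result per candidate. The sketch below assumes a tiny hand-made lexicon and invented candidate names and posts; it is not how any particular monitoring product works, and it ignores negation and sarcasm.

```python
import re

# Hypothetical tone lexicon, invented for this illustration.
POSITIVE = {"strong", "hopeful", "inspiring", "honest"}
NEGATIVE = {"weak", "evasive", "dishonest", "disappointing"}

def tone_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def track_candidates(posts):
    """Average the tone score per candidate over (candidate, text) pairs."""
    scores = {}
    for candidate, text in posts:
        scores.setdefault(candidate, []).append(tone_score(text))
    return {c: sum(s) / len(s) for c, s in scores.items()}

# Made-up blog snippets standing in for monitored posts.
posts = [
    ("Candidate A", "An inspiring and honest speech."),
    ("Candidate A", "Strong on policy but evasive on tax."),
    ("Candidate B", "A weak, disappointing answer."),
]
tracking = track_candidates(posts)
```

A score near 1 suggests mostly positive wording, a score near -1 mostly negative. Re-running the tracking on each day's posts would give a crude trend line per candidate.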
Greater understanding of text allows users to do more with it.
Extracted information can be imported into semi-structured applications
and used by search engines, relational databases, semantic analytic
applications etc.
Modern search engines are developing into highly sophisticated
engines, simulating many of the capabilities of text mining tools.
The way data is juxtaposed in relation to other proximal data allows
engines to determine with acceptable certainty the 'context' of
the subject data.
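The idea of using proximal data to infer context can be sketched by collecting the words that appear within a few positions of a target term. The function name, the window size and the sample sentences are assumptions for this example; search engines use far richer signals.

```python
import re
from collections import Counter

def context_terms(text, target, window=2):
    """Count words within `window` tokens of each occurrence of `target`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    neighbours = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            # Add every token in the window except the target occurrence itself.
            neighbours.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return neighbours

# The same term in two different contexts (made-up sentences).
animal = context_terms("The jaguar prowled through the rainforest at night", "jaguar")
car = context_terms("The new jaguar engine delivers more power", "jaguar")
```

The neighbouring words differ between the two sentences ('prowled' in one, 'engine' in the other), which is the kind of evidence that lets an engine guess which meaning of an ambiguous term is intended.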
Approaches to Text Mining
There are different approaches to text mining:
- Identifying keywords or 'tokens' in data - this misses key information
that may append critical data to the extracted piece of information.
- Natural language processing - more complex, understands phrases
and relationships between phrases.
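The weakness of pure keyword matching can be shown with a small sketch. The first function only reports that a keyword occurs; the second also checks whether the preceding word negates it. The negation check is a deliberately crude stand-in for natural language processing, and the word list and sample sentence are assumptions for this example.

```python
import re

# Hypothetical list of negating words, invented for this illustration.
NEGATORS = {"not", "no", "never"}

def keyword_hits(text, keywords):
    """Return every token that matches one of the keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in keywords]

def keyword_hits_with_negation(text, keywords):
    """Return (keyword, negated) pairs; negated is True if the previous token is a negator."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        (t, i > 0 and tokens[i - 1] in NEGATORS)
        for i, t in enumerate(tokens)
        if t in keywords
    ]

sentence = "The compound was not approved for trials."
plain = keyword_hits(sentence, {"approved"})
qualified = keyword_hits_with_negation(sentence, {"approved"})
```

The keyword approach finds 'approved' and stops there, while the second version flags the hit as negated. Proper language processing goes much further, but the example shows the kind of qualifying information that token matching drops.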
In the connection between structured data warehouses and repositories
of unstructured data, all the text mining takes place offline and is
then coded into applications in a disconnected process. A lot of
the unstructured analytics are qualitative rather than quantitative.
Semantic Search and Rich Media
Text is still the key driver of search and navigation across the
Internet. It is the most efficient and effective method of communicating
'meaning'. However, with the prevalence of rich media on the Web,
'Semantic' search is becoming more important as a way of locating
and navigating through the vast amount of audio and video.
Semantic Web adds structure to text by creating an additional layer
of meaning around a content object. Computers are still not that
competent at discerning context. Even advanced business intelligence
tools require user text content to define data relationships and
add meaning to analytics information.
However, when text is combined with structure it adds that much
needed element to wrap meta data around content - articles, photos,
podcasts, videos.
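Wrapping meta data around content objects can be pictured as a small record per object plus an index from tags to objects. The class, field names and file paths below are hypothetical and chosen only for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ContentObject:
    """A piece of content of any media type with descriptive meta data."""
    uri: str
    media_type: str
    title: str
    tags: set = field(default_factory=set)

def build_tag_index(objects):
    """Map each lower-cased tag to the URIs of the objects that carry it."""
    index = defaultdict(list)
    for obj in objects:
        for tag in obj.tags:
            index[tag.lower()].append(obj.uri)
    return index

# Made-up content objects; the photo has no tags.
items = [
    ContentObject("media/speech.mp4", "video", "Campaign speech", {"election", "speech"}),
    ContentObject("media/rally.jpg", "photo", "Rally"),
    ContentObject("media/brief.mp3", "podcast", "Policy briefing", {"election", "policy"}),
]
index = build_tag_index(items)
findable = {uri for uris in index.values() for uri in uris}
```

Looking up 'election' returns the video and the podcast even though neither contains any text of its own. The untagged photo never appears in the index, so a tag-based search cannot find it.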
Meta data is 'information about information'. In text form, it
provides the 'currency' of the Semantic Web. In time, search engines
will rely on rich meta data for content discovery, presentation,
contextualization, and ad targeting.
A simplistic example of meta data used online is Google's PageRank.
It provides more information about the information on the web page
by interpreting which pages link to each other, and using this data
to calculate the authority and popularity of an individual page.
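The link-based calculation can be sketched with the simplified, textbook form of PageRank, computed by repeated iteration. This is not Google's production algorithm; the damping factor of 0.85, the iteration count and the three-page link graph are assumptions for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank on, split evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Made-up link graph: 'a' and 'b' both link to 'c', and 'c' links back to 'a'.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this graph page 'c' ends up with the highest score because two pages link to it, 'a' follows because it is linked from the high-ranking 'c', and 'b' comes last because nothing links to it.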
Behavioral targeting and collaborative filtering are other good
examples of use of meta data online. The success of behavioral targeting
and collaborative filtering depends on two elements:
- deep knowledge of the user
- deep knowledge of the content
Unfortunately, knowledge of each one depends upon the other. High-quality
content meta data provides more robust behavioral profiles, whether
for ad targeting, recommendations, rankings, etc.
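The dependency between user knowledge and content knowledge can be illustrated with a small tag-based recommender: the user profile is built entirely from the meta data of what the user has viewed. Note that this is content-based matching rather than collaborative filtering proper, and the catalogue, tags and item names are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue: content id -> meta data tags.
CATALOGUE = {
    "video-1": {"election", "debate", "policy"},
    "article-2": {"election", "polling"},
    "podcast-3": {"cooking", "recipes"},
    "article-4": {"policy", "healthcare"},
}

def build_profile(viewed, catalogue=CATALOGUE):
    """Derive a behavioral profile by counting the tags of viewed items."""
    profile = Counter()
    for item in viewed:
        profile.update(catalogue[item])
    return profile

def recommend(viewed, catalogue=CATALOGUE):
    """Rank unseen items by how strongly their tags match the profile."""
    profile = build_profile(viewed, catalogue)
    scores = {
        item: sum(profile[tag] for tag in tags)
        for item, tags in catalogue.items()
        if item not in viewed
    }
    # Keep only items with some overlap; break ties by item id.
    matches = [item for item, score in scores.items() if score > 0]
    return sorted(matches, key=lambda item: (-scores[item], item))

suggestions = recommend(["video-1"])
```

If 'video-1' carried no tags, the profile would be empty and nothing could be recommended, which is the sense in which knowledge of the user depends on knowledge of the content.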
The application of semantic search extends beyond the boundaries
of the Web, with corporate enterprise knowledge extending into multimedia
content. As business intelligence tools seek to include both structured
and unstructured data in their analytic queries, the need for ALL
content to have a complete and consistent set of meta data attached
is critical to being able to unlock the 'intelligence' contained
within the content.
As our business and personal worlds both move towards multimedia
at an increasing rate, the Semantic Web is escalating in importance.
Because key content objects lack the native capability to present
themselves as text, the ability to attach scalable descriptive
titles and tags is the first step towards ensuring complete
intelligence, media visibility and access in the future.
What gets tagged - gets found!