Data Cleansing
Data must be cleansed before it is integrated. Data cleansing is the process of detecting and correcting [or deleting] corrupt or inaccurate records in a record set.
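A minimal sketch of detect-then-correct-or-delete follows. It assumes a hypothetical record layout with an `age` field and a made-up plausibility rule [an integer from 0 to 120]. Records that can be repaired are corrected, and records that cannot are deleted from the set:

```python
from typing import Optional

# Hypothetical record layout, assumed for illustration: {"id": int, "age": ...}
Record = dict


def correct(record: Record) -> Optional[Record]:
    """Return a corrected copy of the record, or None if it cannot be repaired."""
    age = record.get("age")
    # Detect an inaccurate but repairable value: an age stored as digit text, e.g. "42 ".
    if isinstance(age, str) and age.strip().isdigit():
        age = int(age.strip())
    # Detect a corrupt value: anything that is still not a plausible integer age.
    # bool is excluded explicitly because True/False are ints in Python.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 120:
        return None
    return {**record, "age": age}


def cleanse(records: list[Record]) -> list[Record]:
    """Keep corrected records and delete the ones that cannot be corrected."""
    cleaned = []
    for record in records:
        fixed = correct(record)
        if fixed is not None:
            cleaned.append(fixed)
    return cleaned
```

Whether an uncorrectable record should be deleted or set aside for manual review is a policy choice. This sketch simply drops it.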
Data inconsistencies may arise when:
- Different data dictionary definitions of similar entities are used in different data stores.
- Users make errors when entering data.
- Data is corrupted during transmission or storage.
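To illustrate the first cause, suppose two data stores define the same "customer status" entity differently. The sketch below maps both onto one shared definition. The store names and codes are hypothetical, chosen only for illustration:

```python
# Hypothetical example: two source stores encode the same "customer status"
# entity with different data dictionary definitions.
STORE_A_STATUS = {"A": "active", "I": "inactive"}  # single-letter codes
STORE_B_STATUS = {1: "active", 0: "inactive"}      # integer flags

SOURCE_MAPPINGS = {"store_a": STORE_A_STATUS, "store_b": STORE_B_STATUS}


def to_common_status(source: str, raw_value) -> str:
    """Translate a source-specific status code into the shared definition."""
    mapping = SOURCE_MAPPINGS[source]
    if raw_value not in mapping:
        # An undefined code is reported rather than guessed at.
        raise ValueError(f"{source}: undefined status code {raw_value!r}")
    return mapping[raw_value]
```

Raising an error on an unmapped code, rather than guessing, keeps values that match neither definition visible for review.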
Data Cleansing Process
The data cleansing process involves:
- Removing typos
- Validating** data definitions against those of the destination data warehouse. The validation may be strict [rejecting any item that does not have a valid value] or fuzzy [correcting records that partially match existing, known records].
- Correcting values against a known list of entities.
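Strict and fuzzy validation against a known list of entities can be sketched with Python's standard `difflib` module. The city list and the 0.8 similarity cutoff are assumptions for illustration, not values from any particular warehouse:

```python
import difflib
from typing import Optional

# Hypothetical reference list of known entities in the destination warehouse.
KNOWN_CITIES = ["London", "Manchester", "Birmingham", "Edinburgh"]


def validate_strict(value: str) -> Optional[str]:
    """Strict validation: accept only exact matches and reject everything else."""
    return value if value in KNOWN_CITIES else None


def validate_fuzzy(value: str, cutoff: float = 0.8) -> Optional[str]:
    """Fuzzy validation: correct a near-miss to the closest known entity.

    Returns None when no known entity scores at or above the cutoff.
    """
    if value in KNOWN_CITIES:
        return value
    matches = difflib.get_close_matches(value, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here "Londn" fails strict validation but is corrected to "London" by the fuzzy check, while "Paris" is rejected by both because nothing in the list is similar enough. Raising the cutoff makes fuzzy matching more conservative.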
Data cleansing is synonymous with the less frequently used term data scrubbing.
**Data cleansing differs from data validation in that validation is typically performed at entry time, rejecting data from the system as it is entered, whereas cleansing is performed on batches of data that are already stored.
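The contrast between entry-time validation and batch cleansing can be sketched with one hypothetical email rule applied in both places. The rule itself [no whitespace, a non-empty part before "@", and a dot after it, with trimming and lower-casing during cleanup] is a simplistic assumption for illustration:

```python
class EntryRejected(ValueError):
    """Raised when a record is refused at the point of entry."""


def is_valid_email(value: str) -> bool:
    # Deliberately simplistic check, assumed for illustration only.
    local, sep, domain = value.partition("@")
    return (sep == "@" and bool(local) and "." in domain
            and not any(ch.isspace() for ch in value))


def normalise_email(raw: str) -> str:
    # Hypothetical repair rule: trim surrounding whitespace and lower-case.
    return raw.strip().lower()


def validate_at_entry(email: str) -> str:
    """Validation: runs once per value as it is entered; bad data never gets in."""
    if not is_valid_email(email):
        raise EntryRejected(f"rejected at entry: {email!r}")
    return email


def cleanse_batch(emails: list[str]) -> tuple[list[str], list[str]]:
    """Cleansing: runs later over a stored batch, repairing what it can."""
    cleaned, unrepairable = [], []
    for raw in emails:
        candidate = normalise_email(raw)
        if is_valid_email(candidate):
            cleaned.append(candidate)
        else:
            unrepairable.append(candidate)
    return cleaned, unrepairable
```

At entry, " Ann@Example.com " is rejected outright. In a later batch, the same value is trimmed and lower-cased into a valid address, and only values that cannot be repaired are set aside.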