CaaTS       


CaaTS - overview


Data Cleansing as a Transient Service

Many customers find that their data captures address information, names, product classifications, and similar fields poorly. CaaTS offers solutions for cleansing noisy data and improving data quality.

Noisy Data



Real-world data is noisy. Data error rates vary widely, from approximately 0.5% to 30%, with 1% to 5% being very common. Noisy data results in inaccurate reporting, poor customer service, and bad decision making. Small errors can cause big problems: a wrong address in the database can delay a shipment or send a product to the wrong place; a billing discrepancy results in the wrong amount being billed to a customer. These errors lead to poor customer satisfaction, increased churn, and eventually loss of revenue. In fact, statistics reveal that poor data costs businesses billions of dollars.

Noisy data results from poor data entry methods, a lack of information standards, spelling variations, hyphenation, and abbreviations:
Apt # I-344 | Sarojini Nagar | N Delhi | 23
344 I Street | Sarojani Ngr | New Delhi | 110023
344 Block I | Nr. S. N. Market | New Delhi | 110023

Data errors:
Apt # I-344 | Sarojini Nagar | N Delhi | 11002
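The truncated PIN code in this record (11002 rather than the six-digit 110023) is the kind of error a simple format check can flag. The following is a minimal sketch, assuming Indian PIN codes are exactly six digits with a non-zero first digit; the function name `is_valid_pin` is hypothetical and is not part of CaaTS.

```python
import re

# Assumption: an Indian PIN code is six digits and does not start with 0.
PIN_PATTERN = re.compile(r"[1-9][0-9]{5}")


def is_valid_pin(value: str) -> bool:
    """Return True if `value` has the shape of a six-digit PIN code."""
    return PIN_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["110023", "11002", "23"]:
        print(candidate, is_valid_pin(candidate))
```

A format check like this only detects that a value is malformed. Recovering the intended value needs other evidence, such as the locality and city fields in the same record.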

Information buried in free-form fields:
WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH
WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES
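Both records mention the same part, written once as 5J868A and once as 5J868-A. As a rough illustration of pulling a structured value out of a free-form field, the sketch below extracts a canonical part number. The part-number shape (digit, letter, three digits, optional hyphen, letter) is an assumption inferred only from these two samples, and `extract_part_number` is a hypothetical helper, not a CaaTS function.

```python
import re
from typing import Optional

# Hypothetical shape inferred from the two sample records only:
# digit, letter, three digits, optional hyphen, letter (5J868A or 5J868-A).
PART_NUMBER = re.compile(r"\b(\d[A-Z]\d{3})-?([A-Z])\b")


def extract_part_number(description: str) -> Optional[str]:
    """Return the first part number found, with any hyphen removed."""
    match = PART_NUMBER.search(description.upper())
    if match is None:
        return None
    # Join the two captured groups so both spellings map to one form.
    return match.group(1) + match.group(2)


if __name__ == "__main__":
    print(extract_part_number("WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH"))
    print(extract_part_number("WING ASSEMBY, USE 5J868-A HEX BOLT - DRILL FOUR HOLES"))
```

Once both descriptions map to the same part number, the records can be compared field by field instead of as raw strings.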

What CaaTS Delivers



High accuracy data cleansing:
* Leverages structure inherent in the data, e.g., addresses have an underlying taxonomy structure
* Ripple-down rules framework for easy rule management and high accuracy
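To illustrate the general ripple-down rules idea, here is a minimal sketch of a rule tree in which each rule can carry an exception (tried when the rule fires) and an alternative (tried when it does not). The last rule that fires along the path gives the conclusion. This is not the CaaTS framework itself; the class, the token labels, and the sample rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    condition: Callable[[str], bool]
    conclusion: str
    except_rule: Optional["Rule"] = None  # refinement, tried when condition holds
    else_rule: Optional["Rule"] = None    # alternative, tried when condition fails


def classify(rule: Optional[Rule], token: str) -> Optional[str]:
    """Walk the rule tree and return the conclusion of the last rule that fired."""
    conclusion = None
    while rule is not None:
        if rule.condition(token):
            conclusion = rule.conclusion
            rule = rule.except_rule
        else:
            rule = rule.else_rule
    return conclusion


# Hypothetical rules for labelling address tokens.
six_digits = Rule(lambda t: len(t) == 6, "PIN")
locality = Rule(lambda t: t.upper() in {"NAGAR", "NGR"}, "LOCALITY_SUFFIX")
digits = Rule(str.isdigit, "HOUSE_NUMBER", except_rule=six_digits, else_rule=locality)
root = Rule(lambda t: True, "UNKNOWN", except_rule=digits)

if __name__ == "__main__":
    for token in ["344", "110023", "Ngr", "Delhi"]:
        print(token, classify(root, token))
```

Correcting a wrong conclusion means attaching a new exception to the rule that produced it, so existing rules stay untouched. That locality is what makes rule sets of this kind easy to maintain.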

Data investigation methods:
* Methods to find synonyms and variants of data fields
* Methods to estimate data quality without manual tagging
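One simple way to surface spelling variants of a field value is approximate string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. It stands in for whatever methods CaaTS actually uses, which the text does not specify; the `find_variants` helper and the 0.8 similarity cutoff are assumptions.

```python
import difflib
from typing import List


def find_variants(value: str, known_values: List[str], cutoff: float = 0.8) -> List[str]:
    """Return the known values whose similarity to `value` is at least `cutoff`."""
    # Compare case-insensitively, but return the original spellings.
    by_lowercase = {known.lower(): known for known in known_values}
    matches = difflib.get_close_matches(
        value.lower(), list(by_lowercase), n=5, cutoff=cutoff
    )
    return [by_lowercase[m] for m in matches]


if __name__ == "__main__":
    reference = ["Sarojini Nagar", "Lajpat Nagar", "New Delhi"]
    print(find_variants("Sarojani Ngr", reference))
    print(find_variants("N Delhi", reference))
```

Character-level similarity catches misspellings and dropped letters. Abbreviations with little character overlap, such as "S. N." for "Sarojini Nagar", need a separate synonym table.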

Data cleansing as a transient service on the cloud, to manage scalability and isolation of multiple instances:
* Cleansing software instances that can be stored and brought up as required
* Different data access methods to access/update client data
* Optimal resource allocation

The Team: Snigdha Chaturvedi, Tanveer A Faruquie, Hima P Karanam, Mukesh K Mohania, Mrinmaya Sachan, L Venkata Subramaniam

CaaTS Poster
CaaTS chart at the IBM Investor briefing