Case Study

Data cleansing

A leading energy company was consolidating customer data from several independent systems as part of modernizing its CRM application. The project's success depended on the quality and uniqueness of that data, so the company sought a reliable partner to help cleanse and unify the customer dataset.

Challenge

As one of the oldest energy companies, the client holds a customer database that spans generations. Over that long history, the data has accumulated duplicate records, missing or outdated entries, misspellings, non-normalized records, and outright incorrect values.

Solution

We used a range of text analytics techniques to validate, complete, normalize, standardize, and cleanse the customer dataset. Examples of the fields processed and the actions taken include:

  1. Customer Name Fields: The team identified the customer type (person or organization) and normalized person names by separating them into Name, Surname, and Father's Name.

  2. Address Fields: The address fields were mapped to a reference dataset containing comprehensive, normalized, and geopositioned Greek addresses.

  3. Telephone Fields: The telephone fields were cleared of noise characters (e.g., extra spaces), completed with area codes, corrected for old number formats, and categorized into mobile and fixed phone numbers.

  4. Person IDs: The team cleared the Person ID fields of noise, identified the type of ID (e.g., Passport ID, Person ID, Army ID), and extracted and corrected the values based on contextual information.

  5. Email Fields: The email values were extracted from the surrounding text, incorrect domains were corrected (e.g., XXX@gmail.con → XXX@gmail.com), and missing parts were filled in (e.g., XXX@gmail. → XXX@gmail.com).
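
As a rough illustration of the telephone and email steps above, the following Python sketch shows one way such rules could be expressed. It is not the production implementation: the function names, the `KNOWN_DOMAINS` list, and the classification rule (10-digit Greek numbers, with `69` treated as mobile and `2` as fixed) are assumptions made for this example, and correction of old number formats is left out.

```python
import re

# Hypothetical list of known provider domains; a real list would be far longer.
KNOWN_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}

# Matches an email-like token, tolerating a truncated or misspelled domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]*")


def clean_email(raw):
    """Extract an email-like token from free text and repair its domain."""
    match = EMAIL_RE.search(raw)
    if not match:
        return None
    local, _, domain = match.group(0).partition("@")
    domain = domain.lower().rstrip(".")
    if domain in KNOWN_DOMAINS:
        return f"{local}@{domain}"
    # Compare on the provider label only, e.g. "gmail" from "gmail.con".
    label = domain.split(".")[0]
    for known in KNOWN_DOMAINS:
        if known.split(".")[0] == label:
            return f"{local}@{known}"
    # Unknown provider: keep it if it at least looks like a domain.
    return f"{local}@{domain}" if "." in domain else None


def clean_phone(raw):
    """Strip noise characters and classify a number as mobile or fixed.

    Assumes Greek numbering: 10 digits, optionally prefixed by +30 or 0030.
    """
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("0030"):
        digits = digits[4:]
    elif digits.startswith("30") and len(digits) == 12:
        digits = digits[2:]
    if len(digits) == 10 and digits.startswith("69"):
        return digits, "mobile"
    if len(digits) == 10 and digits.startswith("2"):
        return digits, "fixed"
    return digits, "unknown"


print(clean_email("XXX@gmail.con"))
print(clean_email("XXX@gmail."))
print(clean_phone("+30 694 123 4567"))
```

Matching on the provider label alone is deliberately simple. A fuller approach would likely use fuzzy matching against a curated domain list and flag ambiguous cases for review rather than correcting them silently.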

Results

The cleansing process was designed to run periodically or on demand. To support this, we implemented a pipeline of 14 tasks that is triggered through a web service.

These tasks encompass loading and unloading data in databases, transferring data between databases, initiating the cleansing processes, logging progress and final status in cloud database tables, and triggering the unification process within IBM InfoSphere Information Server.
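
The general shape of such a pipeline, with ordered tasks and per-task status logging, can be sketched as follows. This is a minimal illustration under stated assumptions, not the delivered system: the task names, the `task_log` table, and the use of an in-memory SQLite database in place of the cloud database tables are all hypothetical, and the web service trigger is omitted.

```python
import sqlite3


def run_pipeline(tasks, conn):
    """Run tasks in order, log each status, and stop at the first failure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_log (step INTEGER, name TEXT, status TEXT)"
    )
    final = "SUCCESS"
    for step, (name, task) in enumerate(tasks, start=1):
        try:
            task()
            status = "OK"
        except Exception as exc:
            status = f"FAILED: {exc}"
            final = "FAILED"
        # Record progress after every task so a partial run can be inspected.
        conn.execute("INSERT INTO task_log VALUES (?, ?, ?)", (step, name, status))
        conn.commit()
        if final == "FAILED":
            break
    return final


# Hypothetical placeholder tasks standing in for the real pipeline steps.
def load_data():
    pass


def cleanse():
    pass


def trigger_unification():
    pass


conn = sqlite3.connect(":memory:")  # stand-in for the cloud database
result = run_pipeline(
    [("load", load_data), ("cleanse", cleanse), ("unify", trigger_unification)],
    conn,
)
print(result, conn.execute("SELECT * FROM task_log ORDER BY step").fetchall())
```

Logging after each task, rather than only at the end, means a run that fails partway still leaves a record of which steps completed.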


