Thursday, 29 December 2011

Designing the Mining Datamart

The success of a data mining project depends strongly on the breadth and quality of the available data. That is why data preparation is typically the most time-consuming phase of the project.

Data mining applications should not be considered one-off projects but rather continuous processes, integrated into the organization’s marketing strategy. Data mining has to be ‘operationalized’. Derived results should be made available to marketers to guide them in their everyday marketing activities. They should also be loaded into the organization’s front-line systems to enable ‘personalized’ customer handling. This approach requires setting up well-organized data mining procedures, designed to serve specific business goals, instead of occasional attempts that only cover sporadic needs.

To achieve this and become a ‘predictive enterprise’, an organization should focus on the data to be mined. Since the goal is to turn data into actionable knowledge, a vital step in this ‘mining quest’ is to build the appropriate data infrastructure. Ad hoc data extractions and queries that answer only one particular business problem may soon end up as a huge mess of unstructured information. The proposed approach is to design and build a central mining datamart that will serve as the main data repository for the majority of the data mining applications. All relevant information should be taken into account in the datamart design. Useful information from all available data sources, including internal sources such as transactional, billing and operational systems, and external sources such as market surveys and third-party lists, should be collected and consolidated in the datamart framework. After all, this is the main idea of the datamart: to combine all important blocks of information in a central repository that gives the organization a complete view of each customer.
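As a rough illustration of this consolidation idea, the sketch below merges rows from three sources into one record per customer. The source names, field names and values are all made up for the example, and a real datamart would be built in a database rather than in memory.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, all keyed by customer_id.
billing = [
    {"customer_id": 1, "amount_billed": 120.0},
    {"customer_id": 2, "amount_billed": 45.5},
]
transactions = [
    {"customer_id": 1, "n_transactions": 14},
    {"customer_id": 3, "n_transactions": 2},
]
survey = [
    {"customer_id": 2, "satisfaction": 4},
]


def consolidate(*sources):
    """Merge per-source rows into a single record per customer.

    Attributes that a source does not provide are left out of the record,
    so gaps stay visible instead of being filled with invented values.
    """
    view = defaultdict(dict)
    for rows in sources:
        for row in rows:
            record = view[row["customer_id"]]
            for key, value in row.items():
                if key != "customer_id":
                    record[key] = value
    return dict(view)


customer_view = consolidate(billing, transactions, survey)
```

Here `customer_view[1]` combines the billing and transaction attributes of customer 1, while customer 3 only carries what the transactional source knows about them.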
The mining datamart should:
- Integrate data from all relevant sources.
- Provide a complete view of the customer by including all attributes that characterize each customer and his or her relationship with the organization.
- Contain pre-processed information, summarized at the minimum level of interest, for instance at the product account or customer level. To facilitate data preparation for mining purposes, preliminary aggregations and calculations should be built into the datamart’s loading and updating process.
- Be updated on a regular and frequent basis so that it contains the current view of the customer.
- Cover a sufficient time period (enough days or months, depending on the specific situation) so that the relevant data can reveal stable and non-volatile behavioural patterns.
- Contain current and historical data so that the view of the customer can be examined at different moments in time. This is necessary because in many data mining projects analysts have to examine historical data and analyze customers before the occurrence of a specific event, for instance before purchasing an additional product or before churning to the competition.
- Cover the requirements of the majority of the upcoming mining tasks, without the need for additional implementations and interventions from IT.

The datamart cannot possibly cover every need that may arise in the future. After all, there is always the possibility of extracting additional data from the original data sources or of preparing the original data in a different way. The datamart’s purpose is to provide fast access to commonly used data and to support the most important and most common mining tasks.

There is a fine line between incorporating too much and too little information. Although there is no rule of thumb suitable for all situations, it is useful to keep in mind that raw transactional and operational data may provide depth of information, but they also slow down performance and complicate data preparation. At the other extreme, high-level aggregations may diminish the predictive power hidden in detailed data. In conclusion, the datamart should be designed to be as simple as possible, with the crucial mining operations in mind. Falling into the trap of designing the ‘mother of all datamarts’ will most probably lead to a complicated solution, no simpler than the raw transactional data it was supposed to replace.
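The pre-processing and history requirements in the list above can be sketched with a small in-memory SQLite database. This is a minimal example under stated assumptions, not a recommended schema: the `transactions` and `customer_month` tables, their columns, the sample rows and the `history_before` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, tx_date TEXT, amount REAL);
INSERT INTO transactions VALUES
    (1, '2011-10-03', 20.0),
    (1, '2011-10-17', 30.0),
    (1, '2011-11-05', 10.0),
    (2, '2011-11-09', 99.0);

-- One row per customer per month: the summarized level kept in the datamart.
CREATE TABLE customer_month AS
SELECT customer_id,
       substr(tx_date, 1, 7) AS month,
       COUNT(*)              AS n_tx,
       SUM(amount)           AS total_amount
FROM transactions
GROUP BY customer_id, substr(tx_date, 1, 7);
""")


def history_before(customer_id, event_month):
    """Return a customer's monthly rows strictly before the event month.

    Months are 'YYYY-MM' strings, so plain string comparison orders them
    chronologically.
    """
    return conn.execute(
        "SELECT month, n_tx, total_amount FROM customer_month "
        "WHERE customer_id = ? AND month < ? ORDER BY month",
        (customer_id, event_month),
    ).fetchall()
```

Calling `history_before(1, '2011-11')` returns only the October summary for customer 1, which is the kind of "view before the event" an analyst would use when modelling a purchase or churn that happened in November. Keeping monthly rows rather than raw transactions is one possible compromise between detail and simplicity; the right level depends on the mining tasks at hand.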
