Data-Centric AI to Improve Churn Prediction with Synthetic Data

2023 3rd International Conference on Computer, Control and Robotics (ICCCR)

Churn prediction is a critical operation for many businesses because acquiring new clients often costs more than retaining existing ones. Detecting customer churn early, and acting on it with marketing interventions driven by artificial intelligence (AI) systems, is therefore imperative. Algorithms and data are the two integral components of any AI system, so research on AI systems can be classified into two groups: model-centric AI and data-centric AI. Model-centric AI aims to improve the performance of specific models, whereas data-centric AI aims to improve the quality of the data for downstream machine learning tasks. During model development, the model-centric approach keeps the data fixed and iterates on the model hyper-parameters, architecture, and other configurations; the data-centric approach improves existing data or integrates new data, then trains and evaluates the machine learning algorithms. To the best of our knowledge, no previous research has made a comparative study of churn prediction from a data-centric AI perspective. To fill this gap, this study compares the most widely used data synthesis algorithms under different data strategies on the problem of churn prediction. Its main focus is to investigate whether churn prediction can be improved by substituting, balancing, and augmenting real data with synthetic data, and its main goal is to analyse and benchmark the best data-centric resampling methods for churn prediction. We expect that our study will shed some light on future projects, both in churn prediction and in other domains.
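
To make the three data strategies concrete, the following is a minimal sketch of how substituting, balancing, and augmenting might look on a tabular churn dataset. It is an illustration under stated assumptions, not the pipeline used in the study: the abstract does not name the synthesis algorithms, so the hypothetical `synthesize` function below is a simple SMOTE-like interpolation stand-in, and the helper names (`substitute`, `balance`, `augment`, `factor`) and the toy data are likewise invented for this example.

```python
import numpy as np


def synthesize(X, n, rng):
    """Hypothetical stand-in synthesizer (assumption, not from the paper):
    each synthetic row is a random point on the segment between two
    randomly chosen real rows, similar in spirit to SMOTE."""
    i = rng.integers(0, len(X), size=n)
    j = rng.integers(0, len(X), size=n)
    lam = rng.random((n, 1))
    return X[i] + lam * (X[j] - X[i])


def class_counts(y):
    """Map each class label to its number of rows."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


def synthesize_per_class(X, y, need, rng):
    """Generate need[label] synthetic rows for each class label."""
    Xs, ys = [], []
    for label, n in need.items():
        if n > 0:
            Xs.append(synthesize(X[y == label], n, rng))
            ys.append(np.full(n, label))
    if not Xs:  # nothing requested, e.g. data already balanced
        return np.empty((0, X.shape[1])), np.empty(0, dtype=y.dtype)
    return np.vstack(Xs), np.concatenate(ys)


def substitute(X, y, rng):
    """Substituting: train on synthetic rows only, same class counts."""
    return synthesize_per_class(X, y, class_counts(y), rng)


def balance(X, y, rng):
    """Balancing: add synthetic rows until every class matches the largest."""
    counts = class_counts(y)
    target = max(counts.values())
    need = {c: target - k for c, k in counts.items()}
    Xs, ys = synthesize_per_class(X, y, need, rng)
    return np.vstack([X, Xs]), np.concatenate([y, ys])


def augment(X, y, rng, factor=1.0):
    """Augmenting: append factor * (class size) synthetic rows per class,
    which keeps the original class proportions."""
    need = {c: int(round(factor * k)) for c, k in class_counts(y).items()}
    Xs, ys = synthesize_per_class(X, y, need, rng)
    return np.vstack([X, Xs]), np.concatenate([y, ys])


# Toy data (random, for illustration only): 90 retained, 10 churned customers.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

X_sub, y_sub = substitute(X, y, rng)  # 100 synthetic rows, 90/10 split
X_bal, y_bal = balance(X, y, rng)     # 180 rows, 90/90 split
X_aug, y_aug = augment(X, y, rng)     # 200 rows, 180/20 split
```

In a comparison of this kind, each resulting training set would be passed to the same classifier and scored on a held-out set of real data only, so that any difference in performance can be attributed to the data strategy rather than to the model.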