We use the Matplotlib and Seaborn libraries in Python to visualize the data and to find trends in customer behavior. We want to plot the number of customers in each country alongside the number of customers who churned in that same country. To do that, we draw a count plot. The plot above is a Seaborn count plot, which counts the observations in each category and can nest those counts by a second variable.
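As a rough sketch of the count plot described here, the snippet below draws one bar group per country and splits each group by the Exited value. The rows are made-up toy data used only for illustration, and the title and axis labels are my own choices rather than anything from the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy rows for illustration only; they are NOT taken from the bank dataset.
df = pd.DataFrame({
    "Geography": ["France", "France", "France", "Germany", "Germany",
                  "Germany", "Spain", "Spain", "Spain"],
    "Exited":    [0, 0, 1, 1, 1, 0, 0, 1, 0],
})

# x picks the category to count; hue nests the counts by a second variable,
# so each country gets one bar for Exited == 0 and one for Exited == 1.
ax = sns.countplot(data=df, x="Geography", hue="Exited")
ax.set_title("Customers per country, split by churn")
ax.set_xlabel("Country")
ax.set_ylabel("Number of customers")
plt.tight_layout()
```

In a Jupyter notebook the figure is rendered at the end of the cell; in a plain script you would typically call plt.show() afterwards.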
Every plot tells us its story if we plot it properly.
You may be wondering what pd, plt, and sns are. They are alias names for the libraries.

Descriptive statistics help us get insights from the data that are not obvious. head() is a function of the pandas library. It shows the top five observations of the data. The file we read here is stored in a data frame. A data frame is a pandas data type; in simple language, it is a tabular representation of data. Various columns of the dataset, such as CustomerId, CreditScore, and Geography, are shown above.

A quick look at the dataset tells us that the columns RowNumber, CustomerId, and Surname will not make any difference to the customer’s decision to leave the bank. CreditScore, Age, Tenure, Balance, and EstimatedSalary are numerical variables, while Geography, Gender, NumOfProducts, HasCrCard, and IsActiveMember are categorical variables. The Exited column contains only the values 1 and 0: 1 means the customer has churned, that is, left the bank, and 0 means the customer is still associated with the bank. So we treat Exited as the target feature, because it tells us whether a customer left or not. We have to do our analysis with this important feature in mind. I have written a customized function that returns the customer churn percentage for each country. Using it, you can see that Germany has the highest churn percentage, followed by Spain and then France.

Anything in the world can be understood better if we visualize or picture it better.
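The customized churn-percentage function itself is not shown in the text, so what follows is only a minimal sketch of how such a function could look. The name churn_percentage_by is hypothetical, and the rows are made-up toy data, so the printed percentages say nothing about the real bank dataset.

```python
import pandas as pd

# Toy rows for illustration only; they are NOT taken from the bank dataset.
df = pd.DataFrame({
    "RowNumber":  [1, 2, 3, 4, 5, 6],
    "CustomerId": [101, 102, 103, 104, 105, 106],
    "Surname":    ["A", "B", "C", "D", "E", "F"],
    "Geography":  ["France", "France", "Germany", "Germany", "Spain", "Spain"],
    "Exited":     [0, 1, 1, 1, 0, 0],
})

# head() returns the first five rows by default
print(df.head())

# Drop the identifier columns that should not influence the decision to leave
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])


def churn_percentage_by(data, column, target="Exited"):
    """Hypothetical helper: percentage of churned customers per group."""
    # The mean of a 0/1 column is the fraction of ones in each group
    return (data.groupby(column)[target].mean() * 100).round(2)


churn_by_country = churn_percentage_by(df, "Geography")
print(churn_by_country)
```

Because Exited is coded as 0 and 1, taking the group mean gives the churn rate directly, which avoids counting churned and retained customers separately.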
Open the terminal and type in the following commands. We will be using the Jupyter Notebook IDE for EDA. In your terminal, type in the following command to install Jupyter Notebook. After installation, to use the libraries, import them by typing the following in your Jupyter notebook.
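As a sketch of the setup described above, the installation and imports might look like this. The pip commands in the comments are an assumption about the package manager in use; the aliases pd, plt, and sns are the ones used in this post.

```python
# Assumed install commands, run in a terminal (not inside Python):
#   pip install pandas matplotlib seaborn
#   pip install notebook

import pandas as pd               # data frames and descriptive statistics
import matplotlib.pyplot as plt   # base plotting library
import seaborn as sns             # statistical plots built on Matplotlib

# Quick check that each alias points at the expected library
print(pd.__name__, plt.__name__, sns.__name__)
```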
“Torture the data, and it will confess” - Ronald Coase

You’ve probably heard “Data is the new oil”. Yes, in the 19th century the industrial revolution happened because of oil. A similar industrial revolution is happening in the 21st century because of data, and data analysis is a key aspect of this revolution. In data analysis, there is a term: exploratory data analysis (EDA). We often make assumptions about a business and figure out a few conjectures. I have used the word ‘conjecture’ and not ‘fact’ intentionally, because your view will remain a conjecture unless it has a firm base. EDA is a methodology where we visualize the data using different charts and graphs, and they provide an affirmation of our hypothesis. The standard definition of EDA is: the process of visualizing and analyzing the data to extract insights and understand the dataset in a better way. In this blog, I am going to show you the process of EDA through analysis using Python libraries like pandas, Seaborn, and Matplotlib.

The Bank Customer Churn dataset is available here.

Before we proceed to the solution, we will understand the problem statement and its goal. The bank has noticed an increase in the number of customers leaving the bank. To tackle this alarming situation, the bank decided to collect data for the past 6 months. 10,000 customers were selected randomly from three countries: France, Germany, and Spain. The bank wanted to get insights about customer churn so that it can upgrade or adopt new policies.

Before starting, we need to install the libraries we are going to work with.