Hello, I work as a Data Scientist at Damco Solution Pvt Ltd. Before joining the PGP-AIML program, I had worked as a Data Scientist for over 2 years. Currently, I am working for a fintech company that wants to automate its loan lending process and reduce loan defaults, because its current lending system is only 60% accurate.
The main objective was to reduce the probability of loan default using machine learning, by studying each customer's financial behavior, mainly through their bank statements and application data. The client specifically wanted a non-traditional approach to this problem. Here, financial behavior means the pattern of how an applicant manages their money over time. The hypothesis is that evaluating income vs. expenses over time, regardless of what those expenses were spent on, yields an objective pattern that shows how the applicant manages their finances.
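As a rough illustration of this hypothesis, income and expenses can be summarized per applicant and per month without looking at spending categories. The following is a minimal sketch, not the project's actual code: the column names (applicant_id, date, amount), the sign convention (positive = credit, negative = debit), the sample rows, and the three derived features are all assumptions made for the example.

```python
import pandas as pd


def monthly_cash_flow(transactions: pd.DataFrame) -> pd.DataFrame:
    """Sum income and expenses per applicant and calendar month."""
    df = transactions.copy()
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")
    # Assumed convention: positive amounts are credits, negative are debits.
    df["income"] = df["amount"].clip(lower=0)
    df["expense"] = -df["amount"].clip(upper=0)
    monthly = df.groupby(["applicant_id", "month"], as_index=False)[
        ["income", "expense"]
    ].sum()
    monthly["net"] = monthly["income"] - monthly["expense"]
    return monthly


def behaviour_features(monthly: pd.DataFrame) -> pd.DataFrame:
    """Turn monthly cash flow into per-applicant behaviour features."""
    flagged = monthly.assign(overspent=monthly["net"] < 0)
    grouped = flagged.groupby("applicant_id")
    return pd.DataFrame(
        {
            # Average monthly surplus (or deficit, if negative).
            "avg_net": grouped["net"].mean(),
            # Share of total income that was spent over the whole period.
            "expense_to_income": grouped["expense"].sum() / grouped["income"].sum(),
            # Number of months in which expenses exceeded income.
            "months_overspent": grouped["overspent"].sum(),
        }
    )


# Hypothetical bank-statement rows for two applicants.
transactions = pd.DataFrame(
    {
        "applicant_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "date": [
            "2023-01-05", "2023-01-20", "2023-02-05", "2023-02-18",
            "2023-01-03", "2023-01-25", "2023-02-03", "2023-02-27",
        ],
        "amount": [3000, -2000, 3000, -3500, 2000, -1000, 2000, -1500],
    }
)

features = behaviour_features(monthly_cash_flow(transactions))
print(features)
```

In this toy data, applicant 1 overspends in one of the two months while applicant 2 never does, which is the kind of category-independent signal the hypothesis relies on.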
The environment was a Linux server with a GPU, and we used Visual Studio Code for development. The main development language was Python 3.10, while data extraction was done with SQL queries. We used Python libraries such as pandas, NumPy, Matplotlib, Seaborn, and scikit-learn for data analysis, EDA, feature engineering, data pre-processing, and model training.
After extracting the data, during EDA we plotted different graphs for univariate, bivariate, and multivariate analysis to understand the features and the relationships between them, and we used hypothesis testing and descriptive statistics to examine the distribution and sampling of the data. In pre-processing, we filled missing values and removed some outliers. In feature engineering, we checked the correlation between features and dropped some of them to reduce multicollinearity. We also derived some new features, scaled the data using StandardScaler, and created the target column with values 0 (default) and 1 (non-default). We then trained several classification ML algorithms on this data.
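The pre-processing and training steps described above can be sketched with scikit-learn roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic, the 0.9 correlation threshold, median imputation, and LogisticRegression are choices made for the example (the original work tried several classifiers), and outlier removal is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def correlated_columns(X: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Return columns whose |correlation| with an earlier column exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Synthetic stand-in for the engineered features; target 0 = default, 1 = non-default.
X_arr, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
X["f0_copy"] = X["f0"] * 2.0   # deliberately collinear feature
X.iloc[::50, 1] = np.nan       # a few missing values to impute

# Drop highly correlated features to reduce multicollinearity.
to_drop = correlated_columns(X, threshold=0.9)
X = X.drop(columns=to_drop)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Imputation and scaling are fitted on the training split only, via the pipeline.
model = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"dropped: {to_drop}, test accuracy: {accuracy:.2f}")
```

Wrapping the imputer and scaler in a Pipeline keeps their statistics from leaking from the test split into training, which matters when the reported accuracy is the main success metric.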
With the help of this model, we increased accuracy to more than 90%, and after deployment it helped generate 50% more profit than the manual process. It also automated the process, which reduced the time and manpower required and gave us greater confidence in the outcome.