I’m Deepak, a mechanical engineering graduate. I currently work at an automobile dealership, where I am responsible for stock management, billing and preparing reports. This job drew my interest toward data: managing it and organizing it in the best possible way. I started using MS Excel and began looking for other tools that would let me do my work more efficiently.
Before joining PGPDSBA, I was working and preparing for exams, but due to the pandemic I left that job and started learning data science on my own. I had always been inclined toward data science because of my interest in computers and in learning new technologies. During an earlier internship, I worked on a problem statement that involved predicting the percentage of marks a student is expected to score based on the number of hours they studied.
This is a simple linear regression task because it involves just two variables: the number of study hours (the input) and the percentage score (the output). The aim was to predict a student’s score if he/she studies for 9.25 hrs/day. From a learning point of view, this task was very important, as it is the simplest case of a regression problem.
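With only one input variable, the best-fit line can be computed in closed form: the slope is the covariance of hours and scores divided by the variance of hours, and the intercept makes the line pass through the two means. The sketch below is a minimal illustration under that assumption, not the code used in the internship; the function names and the toy numbers are hypothetical.

```python
# Minimal sketch: ordinary least squares with a single input variable.
# Function names and the toy data are hypothetical, for illustration only.

def fit_simple_linear_regression(hours, scores):
    """Return (slope, intercept) minimizing squared error for scores ~ slope * hours + intercept."""
    n = len(hours)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(hours) / n
    mean_y = sum(scores) / n
    # Slope = covariance(hours, scores) / variance(hours), both taken around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("hours must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, hours):
    """Predicted percentage score for a given number of study hours."""
    return slope * hours + intercept


if __name__ == "__main__":
    # Made-up (hours, percentage) pairs, not the internship dataset
    hours = [1.5, 3.0, 4.5, 6.0, 7.5]
    scores = [20.0, 32.0, 45.0, 58.0, 71.0]
    slope, intercept = fit_simple_linear_regression(hours, scores)
    print(f"score = {slope:.2f} * hours + {intercept:.2f}")
    print(f"predicted score for 9.25 hrs/day: {predict(slope, intercept, 9.25):.1f}")
```

The closed form is used here only because it makes the two-variable case transparent; a library estimator would fit the same line.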
I started by importing the data and carrying out the required exploratory data analysis (EDA) and descriptive analysis. I then visualized the data by plotting the distribution of scores. Next, I split the data into training and test sets, trained the algorithm and made predictions. The key insight was that the percentage score increased as the number of study hours increased, which indicates a positive linear relationship.
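The split-train-predict workflow, together with a quick check of how strongly hours and scores move together, might look roughly like the pure-Python sketch below. The 80/20 split ratio, the random seed, the use of Pearson’s r and mean absolute error, and all helper names are assumptions for illustration; the original work may have used a library such as scikit-learn instead.

```python
# Minimal sketch: split, train on the training set, predict on the test set.
# Split ratio, seed, metrics and helper names are hypothetical assumptions.
import random


def train_test_split(xs, ys, test_fraction=0.2, seed=0):
    """Shuffle paired data and return (train_x, train_y, test_x, test_y)."""
    pairs = list(zip(xs, ys))
    random.Random(seed).shuffle(pairs)  # keeps each (hours, score) pair together
    n_test = max(1, int(len(pairs) * test_fraction))
    test, train = pairs[:n_test], pairs[n_test:]
    return ([p[0] for p in train], [p[1] for p in train],
            [p[0] for p in test], [p[1] for p in test])


def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ slope * xs + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Correlation coefficient; a value near +1 means a strong positive linear relationship."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_absolute_error(actual, predicted):
    """Average absolute gap between true and predicted scores on held-out data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up illustrative data, not the internship dataset
    hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    scores = [12, 21, 29, 42, 48, 61, 68, 79, 90, 97]
    print("Pearson r:", round(pearson_r(hours, scores), 3))
    tr_x, tr_y, te_x, te_y = train_test_split(hours, scores, test_fraction=0.2, seed=42)
    slope, intercept = fit_line(tr_x, tr_y)
    preds = [slope * x + intercept for x in te_x]
    print("test MAE:", round(mean_absolute_error(te_y, preds), 2))
```

Holding out a test set means the error is measured on rows the line never saw during fitting, which gives a fairer idea of how the model would do on a new student.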
The linear regression model, built with a supervised machine-learning technique, helped us predict the percentage of marks a student is expected to score based on the number of hours they studied. Over the course of this activity, I sharpened my skills and gained a firm grip on EDA, supervised machine learning, regression models and data visualization techniques.