I have more than 23 years of experience in IT. My core skills are cloud adoption, migration, application modernization, data science, and machine learning, and I hold multiple certifications in these areas as well.
My role is to develop market-disruptive products that use Artificial Intelligence and Machine Learning; I am the offering manager responsible for this. We had the mammoth task of going through the description of every incident and event ticket to understand what the problem was and how to remediate it in the future. We received approximately 4 lakh (400,000) rows of data per day, and every ticket had to be grouped into one of several buckets. While we had some Excel macros that would run through each row with prebuilt logic, these weren't always helpful and were quite time-consuming as well.
Under many contracts, we are expected to reduce the MTTR (mean time to resolution) and proactively identify and solve problems. Specific KPIs are associated with these obligations, and we pay penalties if we fail to meet them.
The approach was to use NLP, K-means clustering, and classification algorithms to arrive at the required results. We built everything in RapidMiner first and later converted the workflow to open source using Apache PredictionIO and PySpark.
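To make that concrete, here is a minimal PySpark sketch of this kind of pipeline: tokenize the ticket descriptions, build TF-IDF features, and cluster them with K-means. This is an illustration only, not the project's actual code; the file path, column names, cluster count, and other parameters are all assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover, HashingTF, IDF
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("ticket-clustering").getOrCreate()

# Hypothetical input: one row per ticket, with a free-text 'description' column.
tickets = spark.read.csv("tickets.csv", header=True)

pipeline = Pipeline(stages=[
    # Split descriptions into lowercase word tokens.
    RegexTokenizer(inputCol="description", outputCol="tokens", pattern="\\W+"),
    # Drop common stop words that carry no signal.
    StopWordsRemover(inputCol="tokens", outputCol="filtered"),
    # Hash tokens into a fixed-size term-frequency vector, then weight by IDF.
    HashingTF(inputCol="filtered", outputCol="tf", numFeatures=4096),
    IDF(inputCol="tf", outputCol="features"),
    # Group similar tickets into buckets (k=20 is an arbitrary example value).
    KMeans(featuresCol="features", predictionCol="bucket", k=20, seed=42),
])

model = pipeline.fit(tickets)
clustered = model.transform(tickets)  # each ticket now carries a 'bucket' id
clustered.groupBy("bucket").count().show()
```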
We used the Naïve Bayes classification algorithm, which initially classified 85% of the labels correctly and improved as we continued training it. The model was able to read each row's description and predict its label.
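A sketch of that classification step, again in PySpark, might look like the following. It assumes a DataFrame named `labeled` in which each ticket already carries its TF-IDF features and a numeric label for its human-named bucket (the labeling step is described next); the split ratio and column names are illustrative, not the project's actual setup.

```python
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Hold out a test set to measure how well the labels are predicted.
train, test = labeled.randomSplit([0.8, 0.2], seed=42)

# Multinomial Naïve Bayes works well with non-negative TF-IDF count features.
nb = NaiveBayes(featuresCol="features", labelCol="label", modelType="multinomial")
nb_model = nb.fit(train)

predictions = nb_model.transform(test)
accuracy = MulticlassClassificationEvaluator(
    labelCol="label", predictionCol="prediction", metricName="accuracy"
).evaluate(predictions)
print(f"Test accuracy: {accuracy:.2%}")  # the project described above saw ~85%
```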
Using NLP, we pre-processed the text data and fed it into the K-means clustering algorithm, which created several buckets. We took the time to label these buckets, and the labeled data was then fed into the classification model. A web page was created and hosted internally; it connects to the data source and automatically runs the model every midnight, as sketched below.
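The nightly run could be as simple as a small batch script scheduled for midnight. Everything here is hypothetical: the script name, the data and model paths, and the cron schedule stand in for whatever the internal setup actually used.

```python
# score_tickets.py -- hypothetical nightly batch job, e.g. scheduled with cron:
#   0 0 * * * spark-submit score_tickets.py
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("nightly-ticket-scoring").getOrCreate()

# Load the latest day's tickets and the previously trained pipeline
# (tokenizer -> TF-IDF -> classifier); both paths are illustrative.
new_tickets = spark.read.csv("/data/tickets/latest.csv", header=True)
model = PipelineModel.load("/models/ticket_classifier")

# Classify every ticket description and persist the results for the dashboards.
scored = model.transform(new_tickets)
scored.select("description", "prediction").write.mode("overwrite") \
      .parquet("/data/tickets/scored")
```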
We saved a huge number of man-hours: a task that used to take 4–5 hours per day was drastically reduced to almost none. The output data is also used to create visuals and dashboards in Grafana, which gives the support teams more visibility and lets them take proactive measures.
This exercise really helped me work through everything from the basics of data ingestion to the final result. Among the important lessons I learned were how to identify which model is right for a problem, and how to decide where to draw the decision threshold given the high and low scores the models produce. Both the course content and the mentoring sessions from Great Learning helped me throughout this project.