I’m Harshavardhan J N. I graduated with a BE in Computer Science in 2002 and hold a PG Diploma in IT from Symbiosis. I have over 19 years’ experience in the IT service field and have worked in IT domains such as Incident Management, Change Management, and Major Incident Management. I’m married to Maheshwari, and I’m blessed with two daughters, Kruthika and Unnathi. Currently, I’m working as a Service Delivery Manager for the Monitoring Operations and Engineering Team.
At my current workplace, as part of the service delivery domain, I was facing two main problems:
- Unplanned downtime due to SSL Certificate expiry
- Huge volume of incident tickets
To elaborate, each application used by my organization has a set of SSL certificates associated with it. These certificates are valid for one year and have to be renewed annually. Without a proper alert mechanism, some of the SSL certificates were not renewed on time, leading to unplanned downtime. With thousands of servers and their associated SSL certificates, it was very important to have a mechanism to track and maintain them.
Incident Management (IM) ticket volume was another challenge for me. In the flood of tickets we received, most were either false alerts or repetitive alerts. We were spending a lot of FTE (1 FTE = 1 resource working full-time) on this non-value-added work. Reducing the ticket count is a major goal in the service industry.
As a Service Delivery Manager, it was my core responsibility to ensure zero unplanned downtime, so I had to address the SSL certificate expiry issue as a priority. It was equally important to reduce the IM ticket volume, so that employees could work more on proactive cases than on reactive ones, thereby increasing the value of their work.
These concerns were also significantly affecting our organization, as every unplanned downtime cost the company a huge penalty and reduced customer satisfaction. Our aim is to maintain 99.5% uptime and a high customer satisfaction rate.
Thus, I used Tableau to build a dashboard for SSL certificate expiry. The dashboard gives an overview of the SSL certificates that will expire in the next 60 days. We have deployed a Power Automate solution, which fetches the data from the Tableau dashboard and sends an alert to the respective team when an SSL certificate is nearing expiration. The team then reviews the list of SSL certificates that are about to expire and raises an RFC (Request for Change) to renew them.
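The 60-day filter behind the dashboard can be illustrated with a short Python sketch. This is not the Tableau or Power Automate implementation itself. The certificate inventory, the hostnames and the `expiring_within` helper are hypothetical, chosen only to show the selection logic.

```python
from datetime import date, timedelta

# Hypothetical inventory of (hostname, certificate expiry date) pairs.
CERTIFICATES = [
    ("app1.example.com", date(2024, 3, 10)),
    ("app2.example.com", date(2024, 8, 1)),
    ("app3.example.com", date(2024, 1, 20)),
]


def expiring_within(certs, today, days=60):
    """Return certificates expiring between today and today + days, soonest first."""
    cutoff = today + timedelta(days=days)
    due = [(host, expiry) for host, expiry in certs if today <= expiry <= cutoff]
    return sorted(due, key=lambda item: item[1])


if __name__ == "__main__":
    today = date(2024, 1, 15)
    for host, expiry in expiring_within(CERTIFICATES, today):
        remaining = (expiry - today).days
        print(f"{host} expires in {remaining} days; raise an RFC to renew it")
```

Sorting by expiry date puts the most urgent renewals at the top of the list. Certificates that have already expired are deliberately left out here; a real alert would probably report those separately.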
I also used Tableau to understand the IM volume trend. Based on the trend analysis, corrective actions were taken.
Coming to the IM ticket volume, the top offenders were divided into groups such as recurring issues, duplicates, timeouts, memory overutilization and missing housekeeping jobs. The root cause for each of these groups was analyzed and a solution was implemented. For example, memory overutilization tickets were analyzed for usage and additional memory was added. For timeout issues, the timing was changed to reduce the failures.
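The grouping of top offenders can be sketched in Python as a simple keyword-based bucketing. This is only an illustration under stated assumptions: the ticket summaries, the keyword rules and the function names are hypothetical, and the actual analysis was done in Tableau.

```python
from collections import Counter

# Hypothetical (bucket, keyword) rules; real tickets would need rules
# tuned to the wording used by the monitoring tools.
RULES = [
    ("timeout", "timed out"),
    ("timeout", "timeout"),
    ("memory overutilization", "memory"),
    ("missing housekeeping job", "disk full"),
]

# Hypothetical ticket summaries.
TICKETS = [
    "Job X timed out after 30s",
    "Memory usage above 95% on srv01",
    "Job X timed out after 30s",
    "Disk full on srv02: old logs not purged",
    "Connection timeout to db01",
]


def categorize(summary):
    """Return the first bucket whose keyword appears in the summary."""
    text = summary.lower()
    for bucket, keyword in RULES:
        if keyword in text:
            return bucket
    return "other"


def bucket_counts(summaries):
    """Count tickets per bucket; exact repeats of a summary count as duplicates."""
    counts = Counter()
    seen = set()
    for summary in summaries:
        key = summary.strip().lower()
        if key in seen:
            counts["duplicate"] += 1
        else:
            seen.add(key)
            counts[categorize(summary)] += 1
    return counts


if __name__ == "__main__":
    for bucket, count in bucket_counts(TICKETS).most_common():
        print(f"{bucket}: {count}")
```

Counting per bucket shows which group produces the most tickets, which is where root cause analysis is likely to pay off first.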
Insights in quantifiable terms:
SSL Certificate: The dashboard was straight to the point. It indicated the SSL certificates that would expire in the next 60 days.
IM Volume: The Tableau worksheets and storyboard indicated various reasons for the spike in tickets, including recurring issues, duplicates, timeouts, memory overutilization and missing housekeeping jobs, to name a few. Categorizing tickets into these buckets helped us narrow down the issues, and hence we could see a downtrend in the ticket volume.
Impact of my recommendations on the organization in quantifiable terms:
After implementing the Tableau-Power Automate solution, there was no unplanned downtime reported as a result of SSL Certificate expiry.
The ticket count has come down from 12k per month to 5k per month, a reduction of about 58%, thereby saving a lot of resource time. Resources now have a lot of bandwidth, which will be used for development activities.
Lastly, this exercise has given me an opportunity to explore Data Science at work, and I also got some exposure to automation through working on Power Automate. I now have some confidence in Data Science, and I will be exploring other topics in Data Science in the future.