Contributed by: Abhishek Gupta
My name is Abhishek Gupta. I work as a Senior Developer and Subject Matter Expert with one of the leading consultancy services firms. I have 7 years of experience as an Informatica (ETL) Developer and am pursuing a part-time PhD in Computer Science and Engineering, with data warehousing as my main research area; two months ago I presented one of my research papers at an IEEE International Conference held at the University of California, Irvine. At work, I was looking for tasks that could be automated, and after discussion with my team we identified the following manual, highly time-consuming approaches to specific tasks. We found 3 pain areas and worked on them:
- DBAs have to check database schema consumption and then ask the developers to delete tables, followed by multiple rounds of follow-ups; all of this happens manually.
- Manual generation of prototype queries from the STMs (source-to-target mapping requirements) sent by the client.
- Manual validation of report data before sending it to the Data Analysts.
To solve these problems, I used Python, Netezza DB, Unix, and the Informatica ETL tool. I performed statistical data analysis for all 3 points mentioned above and derived the algorithms below, which automate the solutions to the problems listed:
- Gathering statistics for the database schema: a mail is triggered when schema consumption comes within 10% of the threshold value, intimating the DBA that the schema is about to reach the threshold. When it reaches the threshold, a mail is triggered to the developers asking them to select the tables they own for dropping; the developers just reply to the mail, the reply is read automatically, and the listed tables are dropped from the schema. If consumption falls below the threshold, no further action is taken; otherwise the steps repeat until it does (a minimal sketch of this monitoring loop appears after this list).
- Converting cells of the STMs (the STMs are in CSV form) into procedural SQL (such as CASE statements) that can be plugged into the prototype queries, so that prototype queries can be generated much faster (see the second sketch after this list).
- Validating the reports generated by the ETL objects and predicting the accuracy of the data before sending it to the Data Analysts for final validation (see the third sketch after this list).
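To make the first solution concrete, here is a minimal sketch of the monitoring step. It assumes a hypothetical get_schema_used_pct helper (the exact Netezza catalog query depends on the appliance version), hypothetical addresses and threshold values, and Python's standard smtplib for mail; none of these names come from the original write-up:

```python
import smtplib
from email.message import EmailMessage

# --- Hypothetical configuration (placeholders, not from the article) ---
THRESHOLD_PCT = 80            # schema-size threshold agreed with the DBA team
WARN_MARGIN_PCT = 10          # warn when usage comes within 10% of the threshold
SMTP_HOST = "smtp.example.com"
DBA_ADDR = "dba-team@example.com"
DEV_ADDR = "etl-dev-team@example.com"

def get_schema_used_pct(schema: str) -> float:
    """Return schema consumption as a percentage of its quota.

    Placeholder: the real implementation would run a query against the
    Netezza catalog (e.g. via nzsql or an ODBC cursor); the exact system
    view varies by appliance version.
    """
    raise NotImplementedError

def send_mail(to_addr: str, subject: str, body: str) -> None:
    """Send a plain-text notification mail via the configured SMTP host."""
    msg = EmailMessage()
    msg["From"] = "schema-monitor@example.com"
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def check_schema(schema: str) -> None:
    """One pass of the monitoring loop: warn the DBA early, then ask developers."""
    used = get_schema_used_pct(schema)
    if used >= THRESHOLD_PCT:
        # Developers reply with the tables they own; a separate job parses
        # the replies and issues the corresponding DROP TABLE statements.
        send_mail(DEV_ADDR,
                  f"[ACTION] {schema} at {used:.1f}% - reply with tables to drop",
                  "Reply to this mail with one table name per line.")
    elif used >= THRESHOLD_PCT - WARN_MARGIN_PCT:
        # Early warning: consumption is within 10% of the threshold.
        send_mail(DBA_ADDR,
                  f"[WARN] {schema} at {used:.1f}% of quota",
                  f"Schema {schema} is approaching the {THRESHOLD_PCT}% threshold.")
```

A scheduled job (for example, a Unix cron entry) would call check_schema periodically, and a companion job would read the developers' reply mails and perform the drops.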
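For the second solution, the sketch below shows the CSV-to-CASE idea, assuming a hypothetical STM layout in which each row maps one source value to one target value for a given column (the column names SOURCE_COLUMN, SOURCE_VALUE, and TARGET_VALUE are placeholders, not from the original article):

```python
import csv

def stm_rules_to_case(stm_csv_path: str,
                      rule_col: str = "SOURCE_COLUMN",
                      in_col: str = "SOURCE_VALUE",
                      out_col: str = "TARGET_VALUE") -> str:
    """Turn STM mapping rows into a SQL CASE expression.

    Assumes each CSV row maps one source value to one target value for
    the column named in rule_col; all column names are hypothetical.
    """
    with open(stm_csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return ""
    column = rows[0][rule_col]
    branches = [
        f"  WHEN {column} = '{r[in_col]}' THEN '{r[out_col]}'"
        for r in rows
    ]
    return "CASE\n" + "\n".join(branches) + f"\nEND AS {column}_MAPPED"
```

The returned CASE expression can then be pasted directly into a prototype SELECT against the source table, which is what makes the prototype queries faster to produce.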
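The original write-up does not detail the prediction logic for the third solution; one common reconciliation approach, sketched here purely under that assumption, is to profile the generated report against a baseline extract and flag drift in row counts or numeric column sums:

```python
import csv

def report_profile(path: str, numeric_cols: list[str]) -> dict:
    """Profile a report CSV: row count plus a sum per numeric column."""
    count, sums = 0, {c: 0.0 for c in numeric_cols}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            count += 1
            for c in numeric_cols:
                sums[c] += float(row[c] or 0)
    return {"rows": count, "sums": sums}

def validate(report_path: str, baseline_path: str,
             numeric_cols: list[str], tolerance: float = 0.01) -> bool:
    """Flag the report as suspect if counts or sums drift beyond tolerance."""
    rep = report_profile(report_path, numeric_cols)
    base = report_profile(baseline_path, numeric_cols)
    if rep["rows"] != base["rows"]:
        return False
    for c in numeric_cols:
        b = base["sums"][c]
        if b and abs(rep["sums"][c] - b) / abs(b) > tolerance:
            return False
    return True
```

Only reports that pass this check would be forwarded to the Data Analysts, so their manual validation effort is spent on the exceptions rather than on every report.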
IMPACT: These solutions have been implemented and are saving a significant amount of time in terms of man-hours; the saving is not one-time but recurring in nature. Apart from that, I received an off-cycle salary increment (outside the normal increment cycle) of 70k+ for outstanding performance (the work was completed within 1 month).
Want to learn such concepts? Join Great Learning’s PGP Data Science and Business Analytics Course and upskill today!