Curriculum
The Data Science and Machine Learning: Making Data-Driven Decisions Program has a curriculum crafted by MIT faculty to give you the skills and knowledge to apply data science techniques and make data-driven decisions.
This data science course is designed for working professionals who want to grow their careers in data science, with solid conceptual foundations and a deep understanding of how to solve problems using the most relevant algorithms and techniques across statistics, machine learning, deep learning, network analytics, recommendation systems, and more.
- Curriculum designed by MIT faculty in Data Science and Machine Learning
- Become a Data Science decision maker by learning Deep Learning, Machine Learning, Recommendation Systems, and more.
- Taught in Python
Weeks 1-2: Foundations of Data Science
In the first two weeks, we will cover the foundational concepts for data science that form the building blocks of the course and will help you sail through the rest of the journey with ease.
Python for Data Science
1 Case Study
Python is widely used and has become a lingua franca for data scientists and machine learning specialists. To strengthen your Python foundations, this module focuses on NumPy, Pandas, and Data Visualization.
- NumPy
A Python package for scientific computing that enables one to work with multi-dimensional arrays and matrices.
- Pandas
An open-source and powerful library in Python that is used to analyse and manipulate data.
- Data Visualization
The graphical representation of data, which is very effective for generating insights, using libraries such as matplotlib and seaborn.
Statistics for Data Science
1 Case Study
This week will help you understand the role of statistics in helping organizations make effective decisions, learn its most widely used tools, and learn to solve business problems using analysis, data interpretation, and experiments. It will cover the following topics:
- Descriptive Statistics
It gives you the basic measures of a statistical summary of the data.
- Inferential Statistics
This will explore the areas of distributions and parameter estimation, ultimately allowing you to make inferences from the data.
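To make the distinction between the two topics concrete, here is a minimal sketch using only Python's standard library. The sample values are made up for illustration, and the 95% interval assumes a normal approximation, which is rough for such a small sample.

```python
import math
import statistics

# Hypothetical sample: daily order counts for a small shop (made-up numbers).
orders = [12, 15, 11, 18, 14, 16, 13, 17]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(orders)
median = statistics.median(orders)
stdev = statistics.stdev(orders)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean with a rough 95% interval.
# Assumes a normal approximation (z = 1.96); a t-interval would be wider here.
std_error = stdev / math.sqrt(len(orders))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"approx. 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first three values describe the data you have; the interval is an inference about the population the data came from.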
Week 3: Making Sense of Unstructured Data
In this week, you will learn how to apply different ML techniques to discover patterns and insights in unstructured data.
Introduction
Here, you will learn about one of the important aspects of ML - Unsupervised Learning.
- What is unsupervised learning, and why is it challenging?
- Examples of unsupervised learning
Clustering
2 Case Studies
Clustering is an unsupervised learning technique for grouping similar data points. Here, you will learn about a widely used clustering technique, K-means clustering.
- What is clustering?
- When to use clustering?
- K-means preliminaries
- The K-means algorithm
- How to evaluate clustering?
- Beyond K-means: What really makes a cluster?
- Beyond K-means: Other notions of distance
- Beyond K-means: Data and pre-processing
- Beyond K-means: Big data and nonparametric Bayes
- Beyond clustering
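The K-means algorithm listed above alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The following is a minimal sketch in plain Python, assuming 2-D points, squared Euclidean distance, and random initialization from the data; the function name `kmeans` and the sample points are illustrative, not part of the course material.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm for K-means on a list of (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On data that is less cleanly separated, the result depends on the initial centroids, which is one reason the "Beyond K-means" topics matter.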
Spectral Clustering, Components, and Embeddings
2 Case Studies
Spectral clustering is one of the most popular techniques for clustering graphs and networks. Here, you will learn about spectral clustering, modularity clustering, and the PCA algorithm.
- What if we do not have features to describe the data or not all are meaningful?
- Finding the principal components in data and applications
- The magic of eigenvectors I
- Clustering in graphs and networks
- Features from graphs: The magic of eigenvectors II
- Spectral clustering
- Modularity Clustering
- Embeddings: New features and their meaning
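As a small illustration of why eigenvectors appear throughout this module, the sketch below finds the first principal component of 2-D data by power iteration on the sample covariance matrix. It is a simplified example in plain Python with made-up points; the function name `first_principal_component` is hypothetical, and the sketch assumes the data has nonzero variance.

```python
import math

def first_principal_component(points, iterations=100):
    """Leading eigenvector of the 2x2 sample covariance matrix, by power iteration."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the symmetric covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)

    v = (1.0, 0.0)  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        wx = sxx * v[0] + sxy * v[1]
        wy = sxy * v[0] + syy * v[1]
        norm = math.hypot(wx, wy)
        v = (wx / norm, wy / norm)

    # Rayleigh quotient: the variance captured along direction v.
    eigenvalue = v[0] * (sxx * v[0] + sxy * v[1]) + v[1] * (sxy * v[0] + syy * v[1])
    return v, eigenvalue

# Hypothetical data lying close to the line y = x.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
direction, variance = first_principal_component(data)
print(direction, variance)
```

The returned direction is the single axis that captures the most variance in the data, which is the idea behind finding principal components.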
Week 4: Regression and Prediction
In this week, you will explore the classical and modern regression methods for prediction and inferential purposes.
Classical Linear and Nonlinear Regression and Extensions
2 Case Studies
Here, you will learn about linear and nonlinear regression together with their extensions. These include the important case of logistic regression for binary classification, and causal inference, where the goal is to understand the effects of actively manipulating a variable as opposed to passively measuring it.
- Linear regression with one and several variables
- Linear regression for prediction
- Linear regression for causal inference
- Logistic and other types of nonlinear regression
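For linear regression with one variable, the least-squares fit has a closed form. Here is a minimal sketch in plain Python; the function name `fit_line` and the wage figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: years of experience vs. hourly wage (made-up values).
years = [1, 2, 3, 4, 5]
wage = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(years, wage)
predicted = intercept + slope * 6  # prediction for an unseen input
print(intercept, slope, predicted)
```

Using the fitted line to predict a new value is the prediction use case; reading the slope as the effect of an extra year of experience is a causal claim that needs the additional assumptions covered under causal inference.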
Modern Regression with High-Dimensional Data
1 Case Studies
2 Case Studies
Here, you will learn about modern regression with high-dimensional data, or finding a needle in a haystack: for large datasets, it becomes necessary to sort out which variables are relevant for prediction and which are not. Recent years have witnessed the development of statistical techniques, such as Lasso or Random Forests, that are computationally well suited to large datasets and that automatically select relevant variables.
- Making good predictions with high-dimensional data
- Avoiding overfitting by validation and cross-validation
- Regularization by Lasso, Ridge, and their modifications
- Regression Trees, Random Forest, Boosted Trees
The Use of Modern Regression for Causal Inference
2 Case Studies
This part will cover regression and causal inference to explain why “correlation does not imply causation” and how we can overcome this intrinsic limitation of regression by resorting to randomized controlled trials or controlling for confounding.
- Randomized Controlled Trials
- Observational Studies with Confounding
Week 5: Classification and Hypothesis Testing
In this week, you will learn about the basics of anomaly detection, classification, and fundamentals of hypothesis testing, which is the formalization of scientific inquiry. This delicate statistical setup obeys a certain set of rules that will be explained and put in context with classification.
Hypothesis Testing and Classification
1 Case Study
- What are anomalies? What is fraud? What is spam?
- Binary Classification: False Positive/Negative, Precision / Recall, F1-Score
- Logistic and Probit regression: Statistical binary classification
- Hypothesis testing: Ratio Test and Neyman-Pearson
- p-values: Confidence
- Support vector machine: Non-statistical classifier
- Perceptron: Simple classifier with elegant interpretation
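The binary classification metrics in this list follow directly from counts of true positives, false positives, and false negatives. Below is a minimal sketch in plain Python; the function name `binary_metrics` and the labels (1 = fraud, 0 = legitimate) are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1-score for labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: of everything flagged positive, how much really was positive?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything really positive, how much did we flag?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical ground truth and classifier output.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In anomaly or fraud detection the positive class is rare, so precision and recall are usually more informative than plain accuracy.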
Week 6: Deep Learning
Deep learning has emerged as a driving force in the ongoing technological revolution. Loosely inspired by how the human brain processes data, it learns useful features directly from raw data with little manual feature engineering. Neural networks are at the heart of this technology. This week will take you beyond traditional ML and into the realm of Neural Networks and Deep Learning. You’ll learn how Deep Learning can be successfully applied to areas such as computer vision, speech recognition, and natural language processing.
Deep Learning
1 Case Study
- What is image classification? Introduce ImageNet and show examples
- Classification using a single linear threshold (perceptron)
- Hierarchical representations
- Fitting parameters using back-propagation
- Non-convex functions
- How interpretable are its features?
- Manipulating deep nets (ostrich example)
- Transfer learning
- Other applications I: Speech recognition
- Other applications II: Natural language processing
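The "single linear threshold (perceptron)" topic above is the simplest building block of a neural network. The sketch below trains one with the classic perceptron update rule on the logical AND function, a made-up toy task that is linearly separable. The function names are illustrative, and deep networks are instead fitted with back-propagation.

```python
def predict(weights, bias, x):
    """Linear threshold unit: output 1 if w . x + b > 0, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            update = lr * (y - predict(weights, bias, x))
            if update:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:
            break  # every sample is classified correctly
    return weights, bias

# Toy task: logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print(weights, bias, [predict(weights, bias, x) for x in samples])
```

A single perceptron can only draw one straight decision boundary, which motivates stacking units into the hierarchical representations covered in this module.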
Week 8: Recommendation Systems
As organizations increasingly lean towards data-driven approaches, an understanding of recommendation systems can help not only data science experts but also professionals in other areas, such as marketing, who are also expected to be data literate today. Learn why recommendation systems are now everywhere, and gain insight into what is required to build a good recommendation system by covering statistical modeling and algorithms.
Recommendations and Ranking
1 Case Study
Recommendation system algorithms, simply put, suggest relevant items to users. This explains their growing use across a range of industries and their central role in revenue generation.
- What does a recommendation system do?
- So what is the recommendation prediction problem? And what data do we have?
- Using population averages
- Using population comparisons and ranking
Collaborative Filtering
1 Case Study
Collaborative filtering is an aspect of recommendation systems with which we interact quite frequently. Upon collecting data on preferences of multiple users, collaborative filtering makes predictions for the choice of a particular user.
- Personalization using collaborative filtering using similar users
- Personalization using collaborative filtering using similar items
- Personalization using collaborative filtering using similar users and items
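To illustrate collaborative filtering using similar users, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. It is a simplified example in plain Python: the users, items, and ratings are made up, the function names are hypothetical, and cosine similarity over co-rated items is only one of several possible similarity choices.

```python
import math

# Hypothetical ratings on a 1-5 scale; a missing key means "not rated".
ratings = {
    "ana":   {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben":   {"Movie A": 4, "Movie B": 5, "Movie C": 2, "Movie D": 5},
    "chloe": {"Movie A": 1, "Movie B": 2, "Movie C": 5, "Movie D": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += sim
    return numerator / denominator if denominator else None

prediction = predict_rating("ana", "Movie D")
print(prediction)
```

Because ana's tastes resemble ben's more than chloe's, the prediction is pulled towards ben's rating. Item-based filtering applies the same idea with the roles of users and items swapped.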
Personalized Recommendations
1 Case Study
As the name suggests, personalized recommendations select the recommendations that are personally relevant for a user, based on their browsing trends and similar signals.
- Personalization using comparisons, rankings, and users-items
- Hidden Markov Model / Neural Nets, Bipartite graph, and graphical model
- Using side-information
- 20 questions and active learning
- Building a system: Algorithmic and system challenges
Week 9: Networking and Graphical Models
In this week, you will get a systematic overview of methods for analyzing large networks, determining important structures in such networks, and inferring missing data in networks. An emphasis is placed on graphical models, both as a powerful way to model network processes and to facilitate efficient statistical computation.
Introduction
Here, you will get to know what networks are and how we can represent networks, with their practical use-cases around us.
- Introduction to networks
- Examples of networks
- Representation of networks
Networks
1 Case Study
Here, you will learn about common descriptive measures of a network, such as centrality, closeness, and betweenness, and standard stochastic models for networks, such as Erdos-Renyi, preferential attachment, infection models, and notions of influence.
- Centrality measures: degree, eigenvector, and PageRank
- Closeness and betweenness centrality
- Degree distribution, clustering, and small world
- Network models: Erdos-Renyi, configuration model, preferential attachment
- Stochastic models on networks for spread of viruses or ideas
- Influence maximization
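Two of the centrality measures listed above can be sketched in a few lines of plain Python. The example below computes in-degree and a power-iteration PageRank on a tiny made-up directed graph. It assumes every node appears as a key in the adjacency dictionary, uses the commonly cited damping factor of 0.85, and the function name `pagerank` is illustrative.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict mapping node -> list of out-neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbors in graph.items():
            if neighbors:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical directed network (for example, pages linking to pages).
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

# Degree centrality (in-degree): how many nodes point at each node.
in_degree = {node: sum(node in nbrs for nbrs in graph.values()) for node in graph}
ranks = pagerank(graph)
print(in_degree)
print(ranks)
```

In-degree only counts incoming links, while PageRank also weighs how important the linking nodes are, so the two measures can rank nodes differently on larger networks.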
Graphical Models
1 Case Study
Here, you will get to know how to use graphical models to estimate and display a network of interactions.
- Undirected graphical models
- Ising and Gaussian models
- Learning graphical models from data
- Directed graphical models
- V-structures, “explaining away,” and learning directed graphical models
- Inference in graphical models: Marginals and message passing
- Hidden Markov Model (HMM)
- Kalman filter
Week 10: Predictive Analytics
In this week, you will learn about some practical examples of temporal data sources and how we can begin to understand them. Then, you will dive into several strategies for feature extraction, including Deep Feature Synthesis with primitives and stacking. Finally, you will look toward models for the real world and how to ensure they successfully predict future data.
Predictive Modeling for Temporal Data
1 Case Study
Here, you will learn about the structure of temporal data and how to clearly define training inputs and outputs for prediction.
- Prediction Engineering
Feature Engineering
1 Case Study
In this part, you will learn how to use feature engineering techniques to extract meaningful insights from temporal data, along with effective strategies for evaluating a model's performance and preparing to deploy it in the real world.
- Introduction
- Feature Types
- Deep Feature Synthesis: Primitives and Algorithms
- Deep Feature Synthesis: Stacking
Certificate of Completion from MIT IDSS.
Upon successful completion of the program, you will receive a professional certificate in data science from the MIT Institute for Data, Systems, and Society (IDSS).
Projects and Case Studies
Following a “learn by doing” pedagogy, the Data Science and Machine Learning Program offers you the opportunity to construct your understanding through solving real-world case studies and practice activities.
Below are samples of potential project topics and case studies.
Healthcare
Pima Indians Diabetes
Area of Project
Exploratory Data Analysis
Small Summary
Analyze the different aspects of Diabetes in the Pima Indians tribe.
Tools & Techniques used:
Python, EDA, Descriptive Statistics etc.
Entertainment
Movies Recommendation System
Area of Project
Recommendation Systems
Small Summary
Build your own recommendation system, like the one used by Netflix, that can recommend the best movies to a user.
Tools & Techniques used:
Python, Content based algorithms, Collaborative Filtering, Popularity recommendations, etc.
Transportation
NYC Taxi Trips
Area of Project
Predictive Analytics
Small Summary
Predict the trip duration of a New York taxi ride by building different types of features and evaluating them.
Tools & Techniques used:
Python, Regression, Feature Engineering, etc.
Research
Predicting Wages
Area of Project
Regression & Prediction
Small Summary
Predict wages and assess predictive performance using various characteristics of workers.
Tools & Techniques used:
Python, Regression, etc.
Media
Grouping News Stories
Area of Project
Clustering
Small Summary
Build your own clustering for online news stories—similar to how Google News organizes stories via auto-generated topics.
Tools & Techniques used:
Python, Clustering, NLP, etc.
Space
The Challenger Disaster
Area of Project
Classification and Hypothesis Testing
Small Summary
Estimate the likelihood of equipment failure in a rocket after launch.
Tools & Techniques used:
Python, Classification, Hypothesis testing, etc.
Manufacturing
Decision boundary of a deep neural network
Area of Project
Deep Learning
Small Summary
Experiment with one- or two-layer perceptrons to assess their decision boundaries.
Tools & Techniques used:
Python, Neural Networks, etc.
Healthcare
Identifying new Genes that cause Autism
Area of Project
Networking and Graphical Models
Small Summary
Use network-theoretic ideas to identify new candidate genes that might cause autism.
Tools & Techniques used:
Python, Networks, Graphical Models, etc.
MIT Faculty and Industry Experts
Learn from the vast knowledge of top MIT faculty in the field of Data Science and Machine Learning, along with experienced data science and machine learning practitioners from leading global organizations.
Program Faculty
Ankur Moitra
Rockwell International Career Development Associate Professor, Mathematics and IDSS, MIT
Caroline Uhler
Henry L. & Grace Doherty Associate Professor, EECS and IDSS, MIT
David Gamarnik
Nanyang Technological University Professor of Operations Research, Sloan School of Management and IDSS, MIT
Devavrat Shah
Professor, EECS and IDSS, MIT
Guy Bresler
Associate Professor, EECS and IDSS, MIT
Jonathan Kelner
Professor, Mathematics, MIT
Kalyan Veeramachaneni
Principal Research Scientist, Laboratory for Information and Decision Systems, MIT
Philippe Rigollet
Professor, Mathematics and IDSS, MIT
Stefanie Jegelka
X-Consortium Career Development Associate Professor, EECS and IDSS, MIT
Tamara Broderick
Associate Professor, EECS and IDSS, MIT
Victor Chernozhukov
Professor, Economics and IDSS, MIT
Program Mentors
Your Learning Experience
The Data Science and Machine Learning: Making Data-Driven Decisions Program is distinguished by its unique combination of MIT academic leadership, recorded lectures by MIT faculty, an application-based pedagogy, and personalized mentorship from industry experts.
LEARN WITH MIT FACULTY
Learn Data Science and Machine Learning with MIT Faculty
- Self-paced program with recorded lectures from MIT faculty in Data Science & Machine Learning.
- Program curriculum and design by world-renowned MIT faculty.
- Position yourself as a data science leader by gaining industry-valued skills.
PERSONALIZED AND INTERACTIVE
Personalized Mentorship and Support
- Weekly online mentorship from Data Science and Machine Learning experts.
- Small groups of learners for personalized guidance and support.
- Interact with like-minded peers from diverse backgrounds and geographies.
- Dedicated Program Manager provided by Great Learning, for academic and non-academic queries.
PRACTICAL AND HANDS-ON
Build your Data Science and Machine Learning Portfolio
- Demonstrate Data Science leadership by building a portfolio of 3 industry-relevant projects and 50+ case studies.
- Learn via practical applications to understand how data science and machine learning concepts translate into the real world.
Ratings & Reviews by learners
Batch of June 2022 | Material and Process Optimization at Procter and Gamble | Indonesia
I enrolled in this program as part of my preparation for PhD in Data Science. As I come from a different background, the program helped me build a strong foundation. It was well-structured and comprehensive. The video lectures by the MIT professors were very clear and made it easier to understand the concepts. The entire team was very supportive. I could reach out to my Program Manager with all the queries that I had.
Program Fees
Data Science and Machine Learning: Making Data-Driven Decisions
USD 1700
- Recorded lectures from world-renowned MIT Faculty
- Live Mentorship from Data Science & Machine Learning Experts
- 3 industry-relevant projects and 50+ real-world case studies
- Program Manager from Great Learning for Academic & Non-Academic Support
Candidates can pay the course fee through Credit/Debit Cards and Bank Transfer. For further details, please get in touch with our program team.
Cohort Start Dates
Online
To be announced
This program is delivered in collaboration with Great Learning. Great Learning is a professional learning company with a global footprint in 140+ countries. Its mission is to make professionals around the globe proficient and future-ready. Great Learning collaborates with MIT IDSS and provides industry experts, student counsellors, course support, and guidance to ensure students get hands-on training and live personalized mentorship on the application of concepts taught by the MIT IDSS faculty.