AWS Certified Machine Learning Specialty Exam Study Guide [MLS-C01]

Preparing for the AWS Certified Machine Learning Specialty exam? Don’t know where to start? This post is the AWS Certified Machine Learning Specialty study guide, with links for each objective in the exam domains.

I have curated a detailed list of articles from AWS documentation and other blogs for each objective of the AWS Certified Machine Learning (MLS-C01) exam. Please share the post within your circles so it can help others prepare for the exam.

AWS Certified Machine Learning Online Course

Pluralsight (Free Trial): AWS Certified ML Specialty Learning Path
LinkedIn Learning (Free Trial): AWS Machine Learning Essentials [Path]
Udemy: AWS Certified Machine Learning Course

AWS Certified Machine Learning Practice Test

Whizlabs Exam Questions: AWS ML [145 Practice Tests & 3 Labs]
Udemy Practice Test: AWS Cert. ML Specialty 75 Practice Questions

AWS Certified Machine Learning Preparation

Coursera: Getting Started with AWS Machine Learning
Amazon e-book (PDF): Mastering Machine Learning on AWS

Check out all the other AWS certificate study guides

Full Disclosure: Some of the links in this post are affiliate links. I receive a commission when you purchase through them.

Domain 1: Data Engineering – 20%

1.1 Create Data Repositories for Machine Learning

Identify data sources (e.g., content and location, primary sources such as user data)

Registry of Open Data on AWS

Supported data sources in QuickSight

Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)

Using Amazon S3 as a data repository

Amazon Redshift as a data source

Using Amazon RDS Database as an Amazon ML Datasource

Amazon Machine Learning & Amazon Elastic File System

Host instance storage volumes

1.2 Identify and Implement a Data-ingestion Solution

Data job styles/types (batch load, streaming)

Difference between batch and stream processing

Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads)

Job scheduling

Job scheduling in AWS Batch

Time-based schedules for jobs and crawlers

1.3 Identify and Implement a Data-transformation Solution

Transforming data in transit (ETL: Glue, EMR, AWS Batch)

How to extract, transform, and load data using AWS Glue

Handle ML-specific data using MapReduce (Hadoop, Spark, Hive)

Large-scale ML with Spark on Amazon EMR

Apache Hive on Amazon EMR

Apache Spark on Amazon EMR

Use Apache Spark with SageMaker

Perform interactive data processing using Spark in SageMaker Studio Notebooks

Domain 2: Exploratory Data Analysis – 24%

2.1 Sanitize and Prepare Data for Modeling

Identify and handle missing data, corrupt data, stop words, etc.

Missing values:
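Here is a minimal plain-Python sketch of two common strategies: filling gaps with the column median and dropping rows that lack required fields. It assumes missing entries are represented as `None`; the helper names `impute_median` and `drop_incomplete` are hypothetical and not part of any AWS service.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]

def drop_incomplete(rows, required):
    """Keep only rows (dicts) whose required fields are all present and not None."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]

ages = [34, None, 29, 41, None, 38]
print(impute_median(ages))  # → [34, 36.0, 29, 41, 36.0, 38]
```

Median imputation is robust to outliers, but it shrinks the spread of the column. For large or systematic gaps, model-based imputation or dropping the feature may fit better.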


Formatting, normalizing, augmenting, and scaling data
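To illustrate scaling, the sketch below implements min-max normalization and z-score standardization in plain Python. The function names are hypothetical. In a real pipeline you would compute the min/max or mean/standard deviation on the training split only and reuse those values for validation and test data, so that no information leaks from the held-out data.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

print(min_max_scale([10, 20, 30, 50]))  # → [0.0, 0.25, 0.5, 1.0]
```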




Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [Data labeling tools (Mechanical Turk, manual labor)])

What is data labeling?

Introduction to Amazon Mechanical Turk

AWS introduces a new way to label data for ML

Use Mechanical Turk with SageMaker for supervised learning

2.2 Perform Feature Engineering

Identify and extract features from data sets, including from data sources such as text, speech, image, public datasets, etc.

Feature engineering

Feature processing

Amazon Textract | Extract text data

Amazon Textract features

Extract data from images

Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, 1 hot encoding, reducing dimensionality of data)

Quantile binning transformation

Building a tokenization solution to mask sensitive data

ML-Powered anomaly detection for outliers

One-hot encoding

Run PCA in Amazon SageMaker

Perform a large-scale PCA
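The snippet below sketches two of these transformations in plain Python: quantile binning, which places roughly the same number of observations in each bin, and one-hot encoding of a categorical column. The cut-point rule here (taking sorted values at evenly spaced positions) is a simple illustrative choice and not necessarily how Amazon ML's quantile binning transformation computes its boundaries; all function names are hypothetical.

```python
from bisect import bisect_right

def quantile_bin_edges(values, n_bins):
    """Interior cut points that split the sorted values into n_bins groups of similar size."""
    if n_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the bin a value falls into (0 .. len(edges))."""
    return bisect_right(edges, value)

def one_hot(labels):
    """Map each categorical label to a 0/1 vector over the sorted vocabulary."""
    vocab = sorted(set(labels))
    index = {label: i for i, label in enumerate(vocab)}
    vectors = []
    for label in labels:
        vec = [0] * len(vocab)
        vec[index[label]] = 1
        vectors.append(vec)
    return vocab, vectors

values = [1, 2, 3, 4, 5, 6, 7, 8]
edges = quantile_bin_edges(values, 4)          # → [3, 5, 7]
print([assign_bin(v, edges) for v in values])  # → [0, 0, 1, 1, 2, 2, 3, 3]
print(one_hot(["red", "green", "red", "blue"]))
```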

2.3 Analyze and Visualize Data for Machine Learning

Graphing (scatter plot, time series, histogram, box plot)

Using scatter plots in Amazon QuickSight

Run a query that produces a time series visualization

Histograms in Amazon QuickSight

Using Box plots in QuickSight

Interpreting descriptive statistics (correlation, summary statistics, p value)

Statistics 101: Correlation

Summary statistics

P-value in Machine Learning
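To make correlation and summary statistics concrete, here is a small sketch that computes the Pearson correlation coefficient and a handful of summary statistics with Python's standard library. The helper names are hypothetical. A p-value additionally needs a sampling distribution (for example, from a t-test), so it is left out to keep the example short.

```python
from math import sqrt
from statistics import mean, median, pstdev

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def summarize(values):
    """A few common descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, so r is (numerically) 1
print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```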

Clustering (hierarchical, diagnosing, elbow plot, cluster size)

Hierarchical clustering

Diagnosis clusters: a new tool for analyzing the content of medical care

Elbow method in K-Means Clustering

Bounded clustering
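The elbow method plots a clustering cost (inertia: the sum of squared distances from each point to its nearest centroid) against the number of clusters k and looks for the point where adding clusters stops paying off. The sketch below uses a hypothetical `kmeans_1d` helper, a plain Lloyd's-algorithm K-means on one-dimensional data with a deterministic initialization, purely to show the idea; on real data you would use a library or the SageMaker built-in K-means algorithm.

```python
def kmeans_1d(points, k, iterations=100):
    """Lloyd's K-means on 1-D data; returns (centroids, inertia)."""
    ordered = sorted(points)
    # Deterministic init: k points spread evenly through the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

def elbow_curve(points, max_k):
    """Inertia for k = 1 .. max_k, ready to plot against k."""
    return {k: kmeans_1d(points, k)[1] for k in range(1, max_k + 1)}

points = [1, 2, 3, 10, 11, 12, 20, 21, 22]
for k, inertia in elbow_curve(points, 4).items():
    print(k, round(inertia, 2))  # 548.0, 127.5, 6.0, 4.5: the big drops stop at k = 3
```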

Domain 3: Modeling – 36%

3.1 Frame Business Problems as Machine Learning Problems

Determine when to use/when not to use ML

When to use Machine Learning?

When not to use machine learning?

Know the difference between supervised and unsupervised learning

Supervised & unsupervised learning

Differences between supervised & unsupervised learning

Selecting from among classification, regression, forecasting, clustering, recommendation, etc.

Difference between classification & regression

5 machine learning techniques for forecasting

K-means clustering with SageMaker

Building a customized recommender system in SageMaker

3.2 Select the Appropriate Model(s) for a Given Machine Learning Problem

XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning

XGBoost algorithm in Amazon SageMaker

Logistic regression detailed overview

K-means clustering with SageMaker

Linear regression with Amazon Machine Learning

An introduction to Decision Tree

Random Forest: A complete guide

What are Recurrent Neural Networks?

Forecast financial time series with RNN

A comprehensive guide to CNN

Use CNN to train forecasting models

Ensemble methods in Machine Learning

Transfer Learning: Machine Learning’s next frontier

Detecting hidden problems in transfer learning models

Express intuition behind models

Why do you need to explain machine learning models?

How to explain your ML models?

3.3 Train Machine Learning Models

Train validation test split, cross-validation

Training a model:

Validating models:
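As a rough illustration of these splits, the sketch below shuffles data into train/validation/test sets and generates k-fold cross-validation indices in plain Python. The split fractions, the seed, and the helper names are hypothetical choices; `k_fold_indices` assumes the data has already been shuffled.

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then carve off test and validation sets; the rest is training data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test, n_val = round(n * test_frac), round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # → 70 15 15
for train_idx, val_idx in k_fold_indices(10, 3):
    print(val_idx)  # folds of size 4, 3, 3
```

Cross-validation trains k models and averages their validation scores, which gives a steadier estimate than a single validation split at the cost of k times the training time.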

Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc.

Estimators, Loss Functions, Optimizers: Core of ML algorithms


Gradient Descent: A quick, simple introduction

Understanding loss functions in Machine Learning

Local & global minima explained with examples

What is convergence in machine learning?

When & why are batches used in machine learning?

Understand the applications of probability in ML
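Several of these ideas (optimizer, gradient descent, loss function, learning rate, batches, convergence) fit in one small example. The sketch below fits a line with mini-batch gradient descent on mean squared error in plain Python. The data, learning rate, epoch count, and batch size are hypothetical choices. Because the MSE loss of linear regression is convex, there are no local minima to get stuck in here; deep networks do not enjoy that guarantee.

```python
import random

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_linear_gd(xs, ys, lr=0.1, epochs=500, batch_size=4, seed=0):
    """Fit y ≈ w*x + b with mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the batches in a new order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            m = len(batch)
            # Partial derivatives of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / m
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / m
            # Step against the gradient, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]  # noise-free data generated from y = 2x + 1
w, b = fit_linear_gd(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))  # w and b should end up close to 2 and 1
```

With a much larger learning rate the updates overshoot and diverge; with a much smaller one, the same number of epochs may stop well short of convergence.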

Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark])

CPU vs. GPU for ML algorithms: Which is better?

What is distributed training?

Distributed ML vs. Federated Learning: Which is better?

Why should you use Spark for machine learning?

Introduction to Apache Spark & Analytics

Model updates and retraining

Retraining models on new data

Model retraining and deployment

Batch vs. real-time/online

Difference between Real-time & Batch processing

3.4 Perform Hyperparameter Optimization



Regularization type and amount

Cross validation

Cross-validation and hyperparameter tuning

Cross-validate your model with SageMaker

Model initialization

Initialize a neural network with random weights

Model Initialization: AWS DeepRacer

Neural network architecture (layers/nodes), learning rate, activation functions

Neural network architecture, components & algorithms

Configure the number of layers and nodes in a neural network

Impact of learning rate on Neural Network performance

Activation functions in Neural Networks
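For reference, the common activation functions can be written in a few lines of plain Python. This is an illustrative sketch, not code from any framework; deep learning libraries ship vectorized versions of these. The sigmoid and softmax are written in numerically stable forms so large inputs do not overflow `math.exp`.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1); branch on sign so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def relu(z):
    """Pass positive inputs through, zero out negative ones."""
    return max(0.0, z)

def tanh(z):
    """Squash z into (-1, 1)."""
    return math.tanh(z)

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0), relu(-3), relu(2.5))  # → 0.5 0.0 2.5
print(softmax([1000, 1000]))            # → [0.5, 0.5]
```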

Tree-based models (# of trees, # of levels)

Tree-based models: How do they work?

Tree-based algorithms

Linear models (learning rate)

Learning rate in regression models

Gradient descent in practice: Learning rate

3.5 Evaluate Machine Learning Models

Avoid overfitting/underfitting (detect and handle bias and variance)

Model fit: Underfitting vs. overfitting

Underfitting, overfitting and its solution

Bias-variance tradeoff in Machine Learning

SageMaker Clarify detects bias

Amazon SageMaker Clarify

Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)

Understanding AUC – ROC curve

Accuracy, Precision, Recall or F1?

What does RMSE really mean?

Confusion matrix

Simple guide to confusion matrix

Confusion Matrix | Amazon Comprehend
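The metrics above all derive from a handful of counts. The sketch below builds the binary confusion-matrix counts and computes accuracy, precision, recall, F1, RMSE, and ROC AUC in plain Python. The function names are hypothetical; the AUC here uses the pairwise definition (the probability that a random positive is scored above a random negative, with ties counting half), which equals the area under the ROC curve.

```python
from math import sqrt

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    """Root mean squared error for regression, in the units of the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))             # → (3, 1, 1, 3), i.e. (tp, fp, fn, tn)
print(classification_metrics(y_true, y_pred))       # every metric is 0.75 here
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```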

Offline and online model evaluation, A/B testing

Fundamentals of Machine Learning model evaluation

Validate a Machine Learning model

Model evaluation & business evaluation

Compare models using metrics (time to train a model, quality of model, engineering costs)

Visualize metrics while training models

Model Quality Metrics | Amazon SageMaker

Monitor model quality

Cross validation


Model support and validation

Domain 4: Machine Learning Implementation and Operations – 20%

4.1 Build Machine Learning Solutions for Performance, Availability, Scalability, Resiliency, and Fault Tolerance


Review the ML Model’s Predictive Performance


Deploy Multiple Instances Across Availability Zones



AWS environment logging and monitoring

Logging and monitoring

Build error monitoring

ML platform monitoring

Multiple regions, Multiple AZs

Regions and Endpoints | Amazon Machine Learning

Deploy multiple instances across AZs

AMI/golden image

What is the AWS Deep Learning AMI?

Docker containers

Why use Docker containers for ML development?

Using Docker containers with SageMaker

Machine Learning with containers and SageMaker

Auto Scaling groups

Automatically scale Amazon SageMaker models

Configure autoscaling inference endpoints in SageMaker
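SageMaker endpoints scale through Application Auto Scaling: you register the variant's instance count as a scalable target, then attach a scaling policy. The sketch below builds the request parameters for a target-tracking policy on the `SageMakerVariantInvocationsPerInstance` metric and sends them with boto3. The endpoint name, the `AllTraffic` variant name (a common default), the capacities, target value, cooldowns, policy name, and helper names are all hypothetical placeholders.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic", min_capacity=1,
                         max_capacity=4, target_invocations=70.0):
    """Build Application Auto Scaling parameters for one SageMaker endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical naming scheme
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Aim to keep average invocations per instance per minute near this value.
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # seconds; placeholder values
            "ScaleOutCooldown": 60,
        },
    }
    return register, policy

def apply_autoscaling(register, policy):
    """Send both requests; needs boto3 plus AWS credentials and permissions."""
    import boto3  # imported lazily so the builder above works without boto3
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**register)
    client.put_scaling_policy(**policy)

register, policy = autoscaling_requests("my-endpoint")  # "my-endpoint" is a placeholder
print(register["ResourceId"])  # → endpoint/my-endpoint/variant/AllTraffic
```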


Load balancing

Manage your ML lifecycle with MLflow & SageMaker

AWS best practices

Machine learning best practices in financial services

4.2 Recommend and Implement the Appropriate Machine Learning Services and Features for a Given Problem

ML on AWS (application services)

AWS service limits

Amazon SageMaker endpoints & quotas

Amazon Machine Learning endpoints & quotas

System limits in Amazon Machine Learning

Build your own model vs. SageMaker built-in algorithms

Use Amazon SageMaker built-in algorithms

Bring your own custom ML models with SageMaker

Infrastructure: (spot, instance types), cost considerations

Instance types for built-in algorithms

A quick guide to using spot instances with SageMaker

Overview of AWS Machine Learning infrastructure

4.3 Apply Basic AWS Security Practices to Machine Learning Solutions


Control access to ML resources with IAM

IAM in AWS Deep Learning containers

S3 bucket policies

Using S3 with Amazon ML

Grant Amazon ML permissions to read your data from S3

Security groups

Secure multi-account model deployment with SageMaker

Prepare an EFA-enabled security group


Secure SageMaker Studio connectivity using a private VPC

Direct access to SageMaker notebooks from Amazon VPC

Build secure ML environments with Amazon SageMaker


Building machine learning models with encrypted data

Protect data at rest using encryption

Protecting data in transit with encryption

Anonymize & manage data in your data lake

4.4 Deploy and Operationalize Machine Learning Solutions

Exposing endpoints and interacting with them

Creating a ML-powered REST API with Amazon SageMaker

Call a SageMaker model endpoint using Amazon API Gateway
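Whether the request comes through API Gateway, a Lambda function, or a script, the call that reaches the model is typically the SageMaker Runtime `InvokeEndpoint` API. The sketch below assumes a model that accepts header-less CSV (`text/csv`), as the built-in XGBoost algorithm does; the endpoint name and the `predict`/`to_csv_payload` helpers are hypothetical.

```python
import csv
import io

def to_csv_payload(rows):
    """Serialize feature rows as header-less CSV, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def predict(endpoint_name, rows):
    """Call a deployed SageMaker endpoint; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so to_csv_payload works without boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows),
    )
    # The body is a stream; its format depends on the model container.
    return response["Body"].read().decode("utf-8")

payload = to_csv_payload([[1.0, 2.5], [3, 4]])
print(repr(payload))  # → '1.0,2.5\n3,4\n'
```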

ML model versioning

Version control your production ML models

Model versioning

Register a model version

A/B testing

A/B testing ML models with Amazon SageMaker

Dynamic A/B testing for ML models with SageMaker
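On SageMaker, A/B testing usually means putting two production variants behind one endpoint and splitting traffic by weight: each variant receives its weight divided by the sum of all weights. The sketch below builds `CreateEndpointConfig` parameters for two variants and computes the expected traffic split. The model names, config and endpoint names, instance type, and helper names are hypothetical placeholders.

```python
def ab_endpoint_config(config_name, model_a, model_b, weight_a=0.9, weight_b=0.1,
                       instance_type="ml.m5.large"):
    """Build CreateEndpointConfig parameters with two weighted production variants."""
    variants = [
        {
            "VariantName": name,
            "ModelName": model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": weight,
        }
        for name, model, weight in (("VariantA", model_a, weight_a), ("VariantB", model_b, weight_b))
    ]
    return {"EndpointConfigName": config_name, "ProductionVariants": variants}

def traffic_share(config):
    """Expected fraction of requests per variant: its weight over the sum of all weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

def create_ab_endpoint(config, endpoint_name):
    """Create the endpoint config and endpoint; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3
    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(**config)
    sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config["EndpointConfigName"])

config = ab_endpoint_config("ab-test-config", "model-a", "model-b", weight_a=3.0, weight_b=1.0)
print(traffic_share(config))  # → {'VariantA': 0.75, 'VariantB': 0.25}
```

To shift traffic gradually without redeploying, SageMaker also offers the `UpdateEndpointWeightsAndCapacities` API, and `InvokeEndpoint` accepts a `TargetVariant` parameter for sending a request to a specific variant.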

Retrain pipelines

Automating model retraining and deployment

Evolve: Machine Learning Lens

ML debugging/troubleshooting

Debug your Machine Learning models

Analyze ML models using SageMaker Debugger

Troubleshoot Amazon SageMaker model deployments

This brings us to the end of the AWS Certified Machine Learning – Specialty [MLS-C01] Exam Preparation Study Guide.

What do you think? Let me know in the comments section if I have missed anything. I would also love to hear how your preparation is going!

In case you are preparing for other AWS certification exams, check out the AWS study guides for those exams.
