AWS Certified Machine Learning Specialty Exam Study Guide [MLS-C01]

AWS Certified Machine Learning Exam

Preparing for the AWS Certified Machine Learning Specialty exam? Don’t know where to start? This post is a study guide for the exam, with links for each objective in every exam domain.

I have curated a detailed list of articles from AWS documentation and other blogs for each objective of the AWS Certified Machine Learning (MLS-C01) exam. Please share the post within your circles so it can help others prepare for the exam.

AWS Certified Machine Learning Online Course

Pluralsight (Free Trial)	AWS Certified ML Specialty Learning Path
LinkedIn Learning (Free Trial)	AWS Machine Learning Essentials [Path]
Udemy	AWS Certified Machine Learning Course

AWS Certified Machine Learning Practice Test

Whizlabs Exam Questions	AWS ML [145 Practice Tests & 3 Labs]
Udemy Practice Test	AWS Cert. ML Specialty 75 Practice Questions

AWS Certified Machine Learning Preparation

Coursera	Getting Started with AWS Machine Learning
Amazon e-book (PDF)	Mastering Machine Learning on AWS

Check out all the other AWS certificate study guides

Full Disclosure: Some of the links in this post are affiliate links. I receive a commission when you purchase through them.

Domain 1: Data Engineering – 20%

1.1 Create Data Repositories for Machine Learning

Identify data sources (e.g., content and location, primary sources such as user data)

Registry of Open Data on AWS

Supported data sources in QuickSight

Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)

Using Amazon S3 as a data repository

Amazon Redshift as a data source

Using Amazon RDS Database as an Amazon ML Datasource

Amazon Machine Learning & Amazon Elastic File System

Host instance storage volumes

1.2 Identify and Implement a Data-ingestion Solution

Data job styles/types (batch load, streaming)

Difference between batch and stream processing

Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads)

o Kinesis

Getting started with Amazon Kinesis data streams

o Kinesis Analytics

Amazon Kinesis data analytics

o Kinesis Firehose

Build data streaming pipelines with Kinesis data firehose

o EMR

Process data using Amazon EMR

Optimize data processing with Amazon EMR

o Glue

Simplify data pipelines with AWS Glue

AWS Glue

Job scheduling

Job scheduling in AWS Batch

Time-based schedules for jobs and crawlers

1.3 Identify and Implement a Data-transformation Solution

Transforming data transit (ETL: Glue, EMR, AWS Batch)

How to extract, transform, and load data using AWS Glue

Handle ML-specific data using map reduce (Hadoop, Spark, Hive)

Large-scale ML with Spark on Amazon EMR

Apache Hive on Amazon EMR

Apache Spark on Amazon EMR

Use Apache Spark with SageMaker

Perform interactive data processing using Spark in SageMaker Studio Notebooks

Domain 2: Exploratory Data Analysis – 24%

2.1 Sanitize and Prepare Data for Modeling

Identify and handle missing data, corrupt data, stop words, etc.

Missing values:

Manage missing values in your target datasets

AWS Glue now supports missing value imputation

Amazon SageMaker support for missing values

Stopwords:

Stopwords in Amazon CloudSearch

Preprocessing data: Remove stopwords in SageMaker

Formatting, normalizing, augmenting, and scaling data

Formatting:

Understanding the data format for Amazon ML

Common data formats for training

Normalizing:

Normalization transformations

AWS Glue DataBrew: Visual data preparation tool to normalize data

Augmenting:

Easily train models using datasets labeled by SageMaker

Visualize SageMaker predictions with QuickSight

Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [Data labeling tools (Mechanical Turk, manual labor)])

What is data labeling?

Introduction to Amazon Mechanical Turk

AWS introduces a new way to label data for ML

Use Mechanical Turk with SageMaker for supervised learning

2.2 Perform Feature Engineering

Identify and extract features from data sets, including from data sources such as text, speech, image, public datasets, etc.

Feature engineering

Feature processing

Amazon Textract | Extract text data

Amazon Textract features

Extract data from images

Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, 1 hot encoding, reducing dimensionality of data)

Quantile binning transformation

Building a tokenization solution to mask sensitive data

ML-Powered anomaly detection for outliers

One-hot encoding

Run PCA in Amazon SageMaker

Perform a large-scale PCA
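
To make two of the concepts above concrete, here is a minimal sketch of quantile binning and one-hot encoding in plain Python. The data, the function names, and the nearest-rank rule used for the cut points are my own assumptions for illustration; the linked articles and AWS services may compute bins differently.

```python
from bisect import bisect_right


def quantile_bin_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split values into
    roughly equal-sized bins (simple nearest-rank rule, no interpolation)."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, edges):
    """Map a value to a bin index between 0 and len(edges)."""
    return bisect_right(edges, value)


def one_hot(category, vocabulary):
    """Encode a category as a 0/1 vector over a fixed, ordered vocabulary."""
    return [1 if category == item else 0 for item in vocabulary]


ages = [18, 22, 25, 31, 38, 44, 52, 67]      # made-up numeric feature
edges = quantile_bin_edges(ages, 4)          # three cut points for four bins
binned = [assign_bin(a, edges) for a in ages]

colors = ["red", "green", "blue", "green"]   # made-up categorical feature
vocab = sorted(set(colors))                  # ['blue', 'green', 'red']
encoded = [one_hot(c, vocab) for c in colors]

print(edges, binned)
print(encoded)
```

Each bin ends up with two of the eight values, which is the point of quantile binning: equal counts per bin rather than equal-width ranges.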

2.3 Analyze and Visualize Data for Machine Learning

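Graphing (scatter plot, time series, histogram, box plot)

Interpreting descriptive statistics (correlation, summary statistics, p value)

Clustering (hierarchical, diagnosing, elbow plot, cluster size)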

Domain 3: Modeling – 36%

3.1 Frame Business Problems as Machine Learning Problems

Determine when to use/when not to use ML

When to use Machine Learning?

When not to use machine learning?

Know the difference between supervised and unsupervised learning

Supervised & unsupervised learning

Differences between supervised & unsupervised learning

Selecting from among classification, regression, forecasting, clustering, recommendation, etc.

Difference between classification & regression

5 machine learning techniques for forecasting

K-means clustering with SageMaker

Building a customized recommender system in SageMaker

3.2 Select the Appropriate Model(s) for a Given Machine Learning Problem

XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, Ensemble, Transfer learning

XGBoost algorithm in Amazon SageMaker

Logistic regression detailed overview

K-means clustering with SageMaker

Linear regression with Amazon Machine Learning

An introduction to Decision Tree

Random Forest: A complete guide

What are Recurrent Neural Networks?

Forecast financial time series with RNN

A comprehensive guide to CNN

Use CNN to train forecasting models

Ensemble methods in Machine Learning

Transfer Learning: Machine Learning’s next frontier

Detecting hidden problems in transfer learning models

Express intuition behind models

Why do you need to explain machine learning models?

How to explain your ML models?

3.3 Train Machine Learning Models

Train validation test split, cross-validation

Training a model:

Train a Model with Amazon SageMaker

Incremental training of model in SageMaker

Training with Amazon EC2 Spot Instances

Validating models:

Validate a Machine Learning model

Cross-validation in Amazon Machine Learning

Model support and validation

Split the data into a training & evaluation set

Splitting your data
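
If you want to see the mechanics behind the split and cross-validation links above, the following is a minimal sketch in plain Python. The 70/15/15 proportions, the seed, and the helper names are assumptions chosen for illustration, not values recommended by AWS.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of rows and cut it into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n_rows, k):
    """Yield (train_idx, val_idx) pairs; every row is validated exactly once."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size


rows = list(range(100))                     # stand-in for 100 data records
train, val, test = train_val_test_split(rows)
folds = list(k_fold_indices(10, 3))         # 3-fold CV over 10 rows

print(len(train), len(val), len(test))
print(folds[0])
```

The held-out test list is never touched during training or tuning; the k-fold helper is what you would loop over when there is too little data to spare a fixed validation set.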

Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc.

Estimators, Loss Functions, Optimizers: Core of ML algorithms

Optimizers

Gradient Descent: A quick, simple introduction

Understanding loss functions in Machine Learning

Local & global minima explained with examples

What is convergence in machine learning?

When & why are batches used in machine learning?

Understand the applications of probability in ML
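
Several of the terms in this objective (loss function, gradient descent, learning rate, convergence) fit into one small example. The sketch below fits a straight line by batch gradient descent on mean squared error. The data, learning rate, and epoch count are assumptions picked so that this toy problem converges; real training jobs need their own tuning.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every point under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

This loss is convex, so there is a single global minimum; the local-minima links above matter for non-convex losses such as those of neural networks.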

Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark])

CPU vs. GPU for ML algorithms: Which is better?

What is distributed training?

Distributed ML vs. Federated Learning: Which is better?

Why should you use Spark for machine learning?

Introduction to Apache Spark & Analytics

Model updates and retraining

Retraining models on new data

Model retraining and deployment

o Batch vs. real-time/online

Difference between Real-time & Batch processing

3.4 Perform Hyperparameter Optimization

Regularization

Regularization type and amount

o Drop out

A gentle introduction to dropout for regularizing Neural Networks

An introduction to dropout regularization

o L1/L2

L1 and L2 regularization methods

Intuitions on L1 & L2 regularisation
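
As a small illustration of how L1 and L2 differ, the sketch below computes both penalties for the same weight vector and adds one of them to a data loss. The weights, the penalty strength `lam`, and the function names are made up for this example.

```python
def l1_penalty(weights, lam):
    """L1 (lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total loss = loss on the data + the chosen weight penalty."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    return data_loss + l2_penalty(weights, lam)


weights = [0.5, -2.0, 0.0]                 # made-up model weights
l1 = l1_penalty(weights, lam=0.1)
l2 = l2_penalty(weights, lam=0.1)
total = regularized_loss(1.0, weights, lam=0.1, kind="l2")
print(l1, l2, total)
```

Note how the large weight (-2.0) dominates the L2 penalty because it is squared, while L1 charges every unit of weight the same amount, which is why L1 tends to push small weights all the way to zero.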

Cross validation

Cross validation and hyperparameter tuning

Cross-validate your model with SageMaker

Model initialization

Initialize a neural network with random weights

Model Initialization: AWS DeepRacer

Neural network architecture (layers/nodes), learning rate, activation functions

Neural network architecture, components & algorithms

Configure the number of layers and nodes in a Neural Network

Impact of learning rate on Neural Network performance

Activation functions in Neural Networks

Tree-based models (# of trees, # of levels)

Tree-based models: How do they work?

Tree-based algorithms

Linear models (learning rate)

Learning rate in regression models

Gradient descent in practice: Learning rate

3.5 Evaluate Machine Learning Models

Avoid overfitting/underfitting (detect and handle bias and variance)

Model fit: Underfitting vs. overfitting

Underfitting, overfitting and its solution

Bias-variance tradeoff in Machine Learning

SageMaker Clarify detects bias

Amazon SageMaker Clarify

Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)

Understanding AUC – ROC curve

Accuracy, Precision, Recall or F1?

What does RMSE really mean?

Confusion matrix

Simple guide to confusion matrix

Confusion Matrix | Amazon Comprehend
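
The classification metrics in this section all come from the four cells of a binary confusion matrix, so it helps to compute them by hand once. Here is a minimal sketch in plain Python, with RMSE added for the regression case. The labels and predictions are made-up values, and the helper names are my own.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up model output
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
error = rmse([3.0, 5.0], [1.0, 5.0])
print(counts, metrics, error)
```

On imbalanced data, accuracy can look good while recall is poor, which is why the exam expects you to pick the metric that matches the business cost of false positives versus false negatives.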

Offline and online model evaluation, A/B testing

Fundamentals of Machine Learning model evaluation

Validate a Machine Learning model

Model evaluation & business evaluation

Compare models using metrics (time to train a model, quality of model, engineering costs)

Visualize metrics while training models

Model Quality Metrics | Amazon SageMaker

Monitor model quality

Cross validation

Cross-validation

Model support and validation

Domain 4: Machine Learning Implementation and Operations – 20%

4.1 Build Machine Learning Solutions for Performance, Availability, Scalability, Resiliency, and Fault Tolerance

Performance:

Review the ML Model’s Predictive Performance

Availability:

Deploy Multiple Instances Across Availability Zones

Scalability:

Amazon SageMaker: Infinitely Scalable Machine Learning Algorithms

Review this Whitepaper: Power Machine Learning at Scale

Scaling Machine Learning from 0 to millions of users

Resiliency:

Resiliency in Amazon SageMaker

AWS’ approach to operational resilience

AWS environment logging and monitoring

Logging and monitoring

o CloudTrail and CloudWatch

Logging Amazon ML API calls with CloudTrail

Log SageMaker API Calls with AWS CloudTrail

Monitor Amazon ML with CloudWatch metrics

Monitor SageMaker with CloudWatch

Use CloudWatch metrics to monitor SageMaker performance

o Build error monitoring

ML platform monitoring

Multiple regions, Multiple AZs

Regions and Endpoints | Amazon Machine Learning

Deploy multiple instances across AZ

AMI/golden image

What is the AWS Deep Learning AMI?

Docker containers

Why use Docker containers for ML development?

Using Docker containers with SageMaker

Machine Learning with containers and SageMaker

Auto Scaling groups

Automatically scale Amazon SageMaker models

Configure autoscaling inference endpoints in SageMaker

Rightsizing

o Instances

Right-sizing resources in Amazon SageMaker

How to choose the right instance type for ML inference?

o Provisioned IOPS

Optimize I/O for performance tuning of deep learning training

o Volumes

Customize your notebook volume size with SageMaker

Load balancing

Manage your ML lifecycle with MLflow & SageMaker

AWS best practices

Machine learning best practices in financial services

4.2 Recommend and Implement the Appropriate Machine Learning Services and Features for a Given Problem

ML on AWS (application services)

o Polly

Amazon Polly

Build a unique brand voice with Amazon Polly

o Lex

What is Amazon Lex?

Build effective conversations on Amazon Lex

o Transcribe

Amazon Transcribe

Transcribe speech-to-text in real-time with Amazon Transcribe

AWS service limits

Amazon SageMaker endpoints & quotas

Amazon Machine Learning endpoints & quotas

System limits in Amazon Machine Learning

Build your own model vs. SageMaker built-in algorithms

Use Amazon SageMaker built-in algorithms

Bring your own custom ML models with SageMaker

Infrastructure: (spot, instance types), cost considerations

Instance types for built-in algorithms

A quick guide to using spot instances with SageMaker

Overview of AWS Machine Learning infrastructure

o Using spot instances to train deep learning models using AWS Batch

Train deep learning models using spot instances

Running cost-effective batch workloads w/ AWS Batch & EC2 Spot Instances

4.3 Apply Basic AWS Security Practices to Machine Learning Solutions

IAM

Control access to ML resources with IAM

IAM in AWS Deep Learning containers

S3 bucket policies

Using S3 with Amazon ML

Grant Amazon ML permissions to read your data from S3

Security groups

Secure multi-account model deployment with SageMaker

Prepare an EFA-enabled security group

VPC

Secure SageMaker Studio connectivity using a private VPC

Direct access to SageMaker notebooks from Amazon VPC

Build secure ML environments with Amazon SageMaker

Encryption/anonymization

Building machine learning models with encrypted data

Protect data at rest using encryption

Protecting data in transit with encryption

Anonymize & manage data in your data lake

4.4 Deploy and Operationalize Machine Learning Solutions

Exposing endpoints and interacting with them

Creating a ML-powered REST API with Amazon SageMaker

Call a SageMaker model endpoint using Amazon API Gateway

ML model versioning

Version control your production ML models

Model versioning

A/B testing

A/B Testing ML models in production with Amazon SageMaker

Dynamic A/B testing for ML models with SageMaker
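
The core idea behind A/B testing models is weighted traffic splitting: each request goes to one model variant with a probability proportional to its weight. The sketch below only simulates that idea locally in plain Python and does not call any AWS API. The variant names, the 90/10 weights, and the seed are hypothetical.

```python
import random
from collections import Counter


def pick_variant(variants, rng):
    """Choose one variant name with probability proportional to its weight."""
    names = list(variants)
    weights = [variants[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical variants: 90% of traffic stays on the current model.
variants = {"model-a": 0.9, "model-b": 0.1}
rng = random.Random(42)   # fixed seed so the simulation is repeatable
n_requests = 10_000
counts = Counter(pick_variant(variants, rng) for _ in range(n_requests))
share_b = counts["model-b"] / n_requests
print(counts, share_b)
```

In a real test you would also log which variant served each request, so that the evaluation metrics can be compared per variant before shifting more traffic to the challenger.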

Retrain pipelines

Automating model retraining and deployment

Evolve: Machine Learning Lens

ML debugging/troubleshooting

Debug your Machine Learning models

Analyze ML models using SageMaker Debugger

Troubleshoot Amazon SageMaker model deployments

o Detect and mitigate drop in performance

Identify bottlenecks, improve utilization, & reduce costs with debugger

Optimize I/O for GPU performance of deep learning training

o Monitor performance of the model

Amazon SageMaker model monitor

Monitor in-production ML models using SageMaker model monitor

This brings us to the end of the AWS Certified Machine Learning – Specialty [MLS-C01] Exam Preparation Study Guide.

What do you think? Let me know in the comments section if I have missed anything. Also, I would love to hear how your preparation is going!

In case you are preparing for other AWS certification exams, check out the AWS study guides for those exams.

