AWS Certified Machine Learning Exam
Preparing for the AWS Certified Machine Learning Specialty exam? Don’t know where to start? This post is the AWS Certified Machine Learning Certificate Study Guide (with links to each objective in the exam domain).
I have curated a detailed list of articles from AWS documentation and other blogs for each objective of the AWS Certified Machine Learning (MLS-C01) exam. Please share this post with your circles so it can help others prepare for the exam.
AWS Certified Machine Learning Online Course
Pluralsight (Free Trial) | AWS Certified ML Specialty Learning Path |
LinkedIn Learning (Free Trial) | AWS Machine Learning Essentials [Path] |
Udemy | AWS Certified Machine Learning Course |
AWS Certified Machine Learning Practice Test
Whizlabs Exam Questions | AWS ML [145 Practice Tests & 3 Labs] |
Udemy Practice Test | AWS Cert. ML Specialty 75 Practice Questions |
AWS Certified Machine Learning Preparation
Coursera | Getting Started with AWS Machine Learning |
Amazon e-book (PDF) | Mastering Machine Learning on AWS |
Check out all the other AWS certificate study guides
Full Disclosure: Some of the links in this post are affiliate links. I receive a commission when you purchase through them.
Domain 1: Data Engineering – 20%
1.1 Create Data Repositories for Machine Learning
Identify data sources (e.g., content and location, primary sources such as user data)
Supported data sources in QuickSight
Determine storage mediums (e.g., DB, Data Lake, S3, EFS, EBS)
Using Amazon S3 as a data repository
Amazon Redshift as a data source
Using Amazon RDS Database as an Amazon ML Datasource
Amazon Machine Learning & Amazon Elastic File System
1.2 Identify and Implement a Data-ingestion Solution
Data job styles/types (batch load, streaming)
Difference between batch and stream processing
Data ingestion pipelines (Batch-based ML workloads and streaming-based ML workloads)
o Kinesis
Getting started with Amazon Kinesis data streams
o Kinesis Analytics
o Kinesis Firehose
Build data streaming pipelines with Kinesis data firehose
o EMR
Optimize data processing with Amazon EMR
o Glue
Job scheduling
Time-based schedules for jobs and crawlers
1.3 Identify and Implement a Data-transformation Solution
Transforming data in transit (ETL: Glue, EMR, AWS Batch)
How to extract, transform, and load data using AWS Glue
Handle ML-specific data using map reduce (Hadoop, Spark, Hive)
Large-scale ML with Spark on Amazon EMR
Use Apache Spark with SageMaker
Perform interactive data processing using Spark in SageMaker Studio Notebooks
Domain 2: Exploratory Data Analysis – 24%
2.1 Sanitize and Prepare Data for Modeling
Identify and handle missing data, corrupt data, stop words, etc.
Missing values:
Stopwords:
Formatting, normalizing, augmenting, and scaling data
Formatting:
Normalizing:
Augmenting:
Labeled data (recognizing when you have enough labeled data and identifying mitigation strategies [Data labeling tools (Mechanical Turk, manual labor)])
Introduction to Amazon Mechanical Turk
AWS introduces a new way to label data for ML
Use Mechanical Turk with SageMaker for supervised learning
2.2 Perform Feature Engineering
Identify and extract features from data sets, including from data sources such as text, speech, image, public datasets, etc.
Amazon Textract | Extract text data
Analyze/evaluate feature engineering concepts (binning, tokenization, outliers, synthetic features, one-hot encoding, reducing dimensionality of data)
Quantile binning transformation
Building a tokenization solution to mask sensitive data
ML-Powered anomaly detection for outliers
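Two of the feature engineering concepts named in objective 2.2, quantile binning and one-hot encoding, are easy to try by hand. The following is a minimal sketch using only the Python standard library. The helper names `quantile_bin` and `one_hot` and the sample data are my own illustrations, not part of any AWS API.

```python
from statistics import quantiles


def quantile_bin(values, n_bins=4):
    """Map each value to a bin index in 0..n_bins-1 using quantile cut points."""
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    # A value's bin is the number of cut points it exceeds.
    return [sum(v > c for c in cuts) for v in values]


def one_hot(labels):
    """Return the sorted category list and one 0/1 row per label."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[label] == j else 0 for j in range(len(categories))]
            for label in labels]
    return categories, rows


# Hypothetical sample data for illustration only.
ages = [18, 22, 25, 31, 38, 44, 52, 67]
bins = quantile_bin(ages, n_bins=4)

categories, encoded = one_hot(["red", "green", "red"])
```

Quantile binning puts roughly the same number of rows in each bin, which is why it copes better with skewed numeric features than equal-width binning.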
2.3 Analyze and Visualize Data for Machine Learning
Graphing (scatter plot, time series, histogram, box plot)
Using scatter plots in Amazon QuickSight
Run a query that produces a time series visualization
Histograms in Amazon QuickSight
Interpreting descriptive statistics (correlation, summary statistics, p value)
Clustering (hierarchical, diagnosing, elbow plot, cluster size)
Diagnosis clusters: a new tool for analyzing the content of medical care
Elbow method in K-Means Clustering
Domain 3: Modeling – 36%
3.1 Frame Business Problems as Machine Learning Problems
Determine when to use/when not to use ML
When not to use machine learning?
Know the difference between supervised and unsupervised learning
Supervised & unsupervised learning
Differences between supervised & unsupervised learning
Selecting from among classification, regression, forecasting, clustering, recommendation, etc.
Difference between classification & regression
5 machine learning techniques for forecasting
K-means clustering with SageMaker
Building a customized recommender system in SageMaker
3.2 Select the Appropriate Model(s) for a Given Machine Learning Problem
XGBoost, logistic regression, K-means, linear regression, decision trees, random forests, RNN, CNN, ensemble, transfer learning
XGBoost algorithm in Amazon SageMaker
Logistic regression detailed overview
K-means clustering with SageMaker
Linear regression with Amazon Machine Learning
An introduction to Decision Tree
Random Forest: A complete guide
What are Recurrent Neural Networks?
Forecast financial time series with RNN
Use CNN to train forecasting models
Ensemble methods in Machine Learning
Transfer Learning: Machine Learning’s next frontier
Detecting hidden problems in transfer learning models
Express intuition behind models
Why do you need to explain machine learning models?
How to explain your ML models?
3.3 Train Machine Learning Models
Train/validation/test split, cross-validation
Training a model:
Validating models:
Optimizer, gradient descent, loss functions, local minima, convergence, batches, probability, etc.
Estimators, Loss Functions, Optimizers: Core of ML algorithms
Gradient Descent: A quick, simple introduction
Understanding loss functions in Machine Learning
Local & global minima explained with examples
What is convergence in machine learning?
When & why are batches used in machine learning?
Understand the applications of probability in ML
Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs. non-Spark])
CPU vs. GPU for ML algorithms: Which is better?
Distributed ML vs. Federated Learning: Which is better?
Why should you use Spark for machine learning?
Introduction to Apache Spark & Analytics
Model updates and retraining
Model retraining and deployment
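Several terms in objective 3.3 (optimizer, gradient descent, loss function, convergence) fit together in one small loop. The following is a minimal sketch, assuming a one-feature linear model trained with full-batch gradient descent on mean squared error. The function name `fit_linear`, the learning rate, and the toy data are illustrative assumptions, not taken from the linked articles.

```python
def fit_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear(xs, ys)
```

Because this loss is convex, there is a single global minimum and the loop converges to it. A learning rate that is too large would make the updates diverge instead, which is the trade-off the learning-rate links in objective 3.4 discuss.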
3.4 Perform Hyperparameter Optimization
Regularization
Regularization type and amount
Cross validation
Cross validation and hyperparameter tuning
Cross-validate your model with SageMaker
Model initialization
Initialize a neural network with random weights
Model Initialization: AWS DeepRacer
Neural network architecture (layers/nodes), learning rate, activation functions
Neural network architecture, components & algorithms
Configure the number of layers and nodes in a Neural Network
Impact of learning rate on Neural Network performance
Activation functions in Neural Networks
Tree-based models (# of trees, # of levels)
Tree-based models: How do they work?
Linear models (learning rate)
Learning rate in regression models
Gradient descent in practice: Learning rate
3.5 Evaluate Machine Learning Models
Avoid overfitting/underfitting (detect and handle bias and variance)
Model fit: Underfitting vs. overfitting
Underfitting, overfitting, and their solutions
Bias-variance tradeoff in Machine Learning
SageMaker Clarify detects bias
Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)
Accuracy, Precision, Recall or F1?
Confusion matrix
Simple guide to confusion matrix
Confusion Matrix | Amazon Comprehend
Offline and online model evaluation, A/B testing
Fundamentals of Machine Learning model evaluation
Validate a Machine Learning model
Model evaluation & business evaluation
Compare models using metrics (time to train a model, quality of model, engineering costs)
Visualize metrics while training models
Model Quality Metrics | Amazon SageMaker
Cross validation
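The classification metrics listed in objective 3.5 (accuracy, precision, recall, F1 score) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python. The function name `binary_metrics` and the sample labels are hypothetical and only illustrate the formulas.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Precision answers "of the predicted positives, how many were right?", while recall answers "of the actual positives, how many were found?". F1 is their harmonic mean, so it is more informative than accuracy when the classes are imbalanced.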
Domain 4: Machine Learning Implementation and Operations – 20%
4.1 Build Machine Learning Solutions for Performance, Availability, Scalability, Resiliency, and Fault Tolerance
Performance:
Review the ML Model’s Predictive Performance
Availability:
Deploy Multiple Instances Across Availability Zones
Scalability:
Resiliency:
AWS environment logging and monitoring
o Build error monitoring
Multiple regions, Multiple AZs
Regions and Endpoints | Amazon Machine Learning
Deploy multiple instances across AZ
AMI/golden image
What is the AWS Deep Learning AMI?
Docker containers
Why use Docker containers for ML development?
Using Docker containers with SageMaker
Machine Learning with containers and SageMaker
Auto Scaling groups
Automatically scale Amazon SageMaker models
Configure autoscaling inference endpoints in SageMaker
Rightsizing
Load balancing
Manage your ML lifecycle with MLflow & SageMaker
AWS best practices
Machine learning best practices in financial services
4.2 Recommend and Implement the Appropriate Machine Learning Services and Features for a Given Problem
ML on AWS (application services)
AWS service limits
Amazon SageMaker endpoints & quotas
Amazon Machine Learning endpoints & quotas
System limits in Amazon Machine Learning
Build your own model vs. SageMaker built-in algorithms
Use Amazon SageMaker built-in algorithms
Bring your own custom ML models with SageMaker
Infrastructure: (spot, instance types), cost considerations
Instance types for built-in algorithms
A quick guide to using spot instances with SageMaker
Overview of AWS Machine Learning infrastructure
o Using spot instances to train deep learning models using AWS Batch
Train deep learning models using spot instances
Running cost-effective batch workloads w/ AWS Batch & EC2 Spot Instances
4.3 Apply Basic AWS Security Practices to Machine Learning Solutions
IAM
Control access to ML resources with IAM
IAM in AWS Deep Learning containers
S3 bucket policies
Grant Amazon ML permissions to read your data from S3
Security groups
Secure multi-account model deployment with SageMaker
Prepare an EFA-enabled security group
VPC
Secure SageMaker Studio connectivity using a private VPC
Direct access to SageMaker notebooks from Amazon VPC
Build secure ML environments with Amazon SageMaker
Encryption/anonymization
Building machine learning models with encrypted data
Protect data at rest using encryption
Protecting data in transit with encryption
Anonymize & manage data in your data lake
4.4 Deploy and Operationalize Machine Learning Solutions
Exposing endpoints and interacting with them
Creating an ML-powered REST API with Amazon SageMaker
Call a SageMaker model endpoint using Amazon API Gateway
ML model versioning
Version control your production ML models
A/B testing
A/B Testing ML models with Amazon SageMaker
Dynamic A/B testing for ML models with SageMaker
Retrain pipelines
Automating model retraining and deployment
ML debugging/troubleshooting
Debug your Machine Learning models
Analyze ML models using SageMaker Debugger
Troubleshoot Amazon SageMaker model deployments
o Detect and mitigate drop in performance
Identify bottlenecks, improve utilization, & reduce costs with debugger
Optimize I/O for GPU performance of deep learning training
o Monitor performance of the model
Amazon SageMaker model monitor
Monitor in-production ML models using SageMaker model monitor
This brings us to the end of the AWS Certified Machine Learning – Specialty [MLS-C01] Exam Preparation Study Guide.
What do you think? Let me know in the comments section if I have missed anything. I would also love to hear how your preparation is going!
In case you are preparing for other AWS certification exams, check out the AWS study guides for those exams.