MLA-C01 Study Guide | AWS ML Engineer Associate

MLA-C01 Preparation Details

This objective-by-objective study guide is for candidates preparing for the MLA-C01 AWS Certified Machine Learning Engineer Associate certification exam.

For every MLA-C01 exam objective, it lists the relevant official AWS documentation, key concepts, and curated resources in one place, so it works both for a first pass through the material and for last-minute revision.

AWS Machine Learning Engineer Prep

Coursera: AWS Machine Learning Engineer Associate Exam Prep
Udemy: AWS Certified Machine Learning Engineer Hands On!
Whizlabs: AWS Certified Machine Learning Engineer

Content Domain 1: Data Preparation for Machine Learning (ML)

Task 1.1: Ingest and store data

Knowledge of: Data formats and ingestion mechanisms (for example, validated and non-validated formats, Apache Parquet, JSON, CSV, Apache ORC, Apache Avro, RecordIO)

AWS Glue supported data formats and compression formats

Amazon SageMaker Feature Store offline store data format

AWS Cloud Data Ingestion Patterns and Practices

Knowledge of: How to use the core AWS data sources (for example, Amazon S3, Amazon EFS, Amazon FSx for NetApp ONTAP)

What is Amazon Simple Storage Service?

What is Amazon Elastic File System?

What is Amazon FSx for NetApp ONTAP?

Knowledge of: How to use AWS streaming data sources to ingest data (for example, Amazon Kinesis, Apache Flink, Apache Kafka)

What is Amazon Kinesis Data Streams?

Welcome to the Amazon MSK Developer Guide

What is Amazon Managed Service for Apache Flink?

Data sources and ingestion – Amazon SageMaker AI Feature Store

Knowledge of: AWS storage options, including use cases and tradeoffs

Choosing an AWS storage service

Machine Learning Lens – AWS Well-Architected Framework

Data Analytics Lens – AWS Well-Architected Framework

Skills in: Extracting data from storage (for example, Amazon S3, Amazon EBS, Amazon EFS, Amazon RDS, Amazon DynamoDB) by using relevant AWS service options (for example, Amazon S3 Transfer Acceleration, Amazon EBS Provisioned IOPS)

S3 Transfer Acceleration

Amazon EBS volume types

What is Amazon Relational Database Service?

Amazon DynamoDB Developer Guide

Skills in: Choosing appropriate data formats (for example, Parquet, JSON, CSV, ORC) based on data access patterns

AWS Glue supported data formats and compression formats

Best practices design patterns: Optimizing Amazon S3 performance
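
To make the access-pattern idea concrete, the sketch below contrasts a row-oriented layout (how CSV and JSON records are organized) with a column-oriented layout (how Parquet and ORC organize data). It uses plain Python structures only, not a real Parquet or ORC library, and the records and field names are invented for illustration.

```python
# Illustrative records; the field names are hypothetical.
rows = [
    {"user_id": 1, "country": "DE", "spend": 12.5},
    {"user_id": 2, "country": "US", "spend": 30.0},
    {"user_id": 3, "country": "US", "spend": 7.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: an aggregate over one field still walks every full record.
total_from_rows = sum(record["spend"] for record in rows)

# Column layout: the same aggregate reads one list and skips the other fields.
total_from_columns = sum(columns["spend"])

# Row layout: fetching one complete record is a single lookup.
second_record = rows[1]
```

As a rule of thumb, analytic scans over a few fields tend to favor columnar formats, while record-at-a-time reads and writes tend to favor row formats.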

Skills in: Ingesting data into Amazon SageMaker Data Wrangler and SageMaker Feature Store

Prepare ML data with Amazon SageMaker Data Wrangler

Create, store, and share features with Feature Store – Amazon SageMaker AI

Data sources and ingestion – Amazon SageMaker AI Feature Store

Skills in: Merging data from multiple sources (for example, by using programming techniques, AWS Glue, Apache Spark)

What is AWS Glue?

Apache Spark on Amazon EMR clusters

Connection types and options for ETL in AWS Glue

Skills in: Troubleshooting and debugging data ingestion and storage issues that involve capacity and scalability

Troubleshoot AWS Glue

Monitoring AWS Glue jobs

Amazon Kinesis Data Streams quotas and limits

Skills in: Making initial storage decisions based on cost, performance, and data structure

Choosing an AWS storage service

Storage options for Amazon SageMaker AI – Machine Learning Lens

Task 1.2: Transform data and perform feature engineering

Knowledge of: Data cleaning and transformation techniques (for example, detecting and treating outliers, imputing missing data, combining, deduplication)

What is AWS Glue DataBrew?

Prepare ML data with Amazon SageMaker Data Wrangler

Recommendations for choosing the right data preparation tool in SageMaker AI
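
The cleaning techniques named in this objective can be sketched in a few lines of standard-library Python. This is a minimal illustration, not how DataBrew or Data Wrangler implement them: the function names are hypothetical, the 1.5 x IQR rule is one common outlier convention among several, and median imputation is only one possible strategy.

```python
import statistics


def deduplicate(values):
    """Drop repeated items while keeping first-seen order."""
    return list(dict.fromkeys(values))


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Example data, invented for illustration.
unique_ids = deduplicate(["a", "b", "a", "c", "b"])
filled = impute_median([1, None, 3, 5])
flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
```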

Knowledge of: Feature engineering techniques (for example, data scaling and standardization, feature splitting, binning, log transformation, normalization)

Prepare ML data with Amazon SageMaker Data Wrangler

Create, store, and share features with Feature Store – Amazon SageMaker AI

Machine Learning Lens – AWS Well-Architected Framework
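
For quick revision, the feature engineering techniques listed above reduce to short formulas. The sketch below shows min-max normalization, z-score standardization, a log transform, and binning in standard-library Python. The function names and bin edges are assumptions made for this example, and the scaling functions assume the input values are not all identical.

```python
import bisect
import math
import statistics


def min_max_scale(values):
    """Normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: subtract the mean, divide by the population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Compress right-skewed, non-negative values with log(1 + x)."""
    return [math.log1p(v) for v in values]


def bin_index(value, edges):
    """Binning: index of the bucket a value falls into, given ascending edges."""
    return bisect.bisect_right(edges, value)


scaled = min_max_scale([2.0, 4.0, 6.0])
z_scores = standardize([2.0, 4.0, 6.0])
age_bucket = bin_index(25, [18, 35, 65])  # hypothetical age cut points
```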

Knowledge of: Encoding techniques (for example, one-hot encoding, binary encoding, label encoding, tokenization)

Prepare ML data with Amazon SageMaker Data Wrangler

What is AWS Glue DataBrew?

Built-in algorithms and pretrained models in Amazon SageMaker
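
The encoding techniques in this objective can be compared side by side with a small example. The sketch below is standard-library Python with hypothetical function names. It assigns label indices in sorted order (an arbitrary choice) and uses whitespace splitting as a stand-in for tokenization, which is far simpler than the subword tokenizers used by most language models.

```python
def fit_label_encoder(categories):
    """Label encoding: map each distinct category to an integer index."""
    return {cat: idx for idx, cat in enumerate(sorted(set(categories)))}


def one_hot(value, mapping):
    """One-hot encoding: a vector with a single 1 at the category's index."""
    vector = [0] * len(mapping)
    vector[mapping[value]] = 1
    return vector


def binary_code(value, mapping):
    """Binary encoding: the label index written as fixed-width base-2 digits."""
    width = max(1, (len(mapping) - 1).bit_length())
    return [int(bit) for bit in format(mapping[value], f"0{width}b")]


def whitespace_tokenize(text):
    """Naive tokenization: lowercase the text and split on whitespace."""
    return text.lower().split()


colors = ["red", "green", "blue", "green"]
mapping = fit_label_encoder(colors)
green_one_hot = one_hot("green", mapping)
red_binary = binary_code("red", mapping)
tokens = whitespace_tokenize("Deploy the Model")
```

One-hot vectors grow with the number of categories, while binary encoding needs only about log2 of that many columns, which is the usual reason to prefer it for high-cardinality features.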

Knowledge of: Tools to explore, visualize, or transform data and features (for example, SageMaker Data Wrangler, AWS Glue, AWS Glue DataBrew)

Prepare ML data with Amazon SageMaker Data Wrangler

What is AWS Glue DataBrew?

What is AWS Glue?

Recommendations for choosing the right data preparation tool in SageMaker AI

Knowledge of: Services that transform streaming data (for example, AWS Lambda, Spark)

What is AWS Lambda?

Using AWS Lambda with Amazon Kinesis – AWS Lambda

Apache Spark on Amazon EMR clusters

What is Amazon Managed Service for Apache Flink?

Knowledge of: Data annotation and labeling services that create high-quality labeled datasets

Use Amazon SageMaker Ground Truth to Label Data

Use Amazon SageMaker Ground Truth Plus to Label Data

Amazon Augmented AI (Amazon A2I) Developer Guide

Skills in: Transforming data by using AWS tools (for example, AWS Glue, DataBrew, Spark running on Amazon EMR, SageMaker Data Wrangler)

Prepare ML data with Amazon SageMaker Data Wrangler

What is AWS Glue DataBrew?

What is AWS Glue?

Apache Spark on Amazon EMR clusters

Skills in: Creating and managing features by using AWS tools (for example, SageMaker Feature Store)

Create, store, and share features with Feature Store – Amazon SageMaker AI

Data sources and ingestion – Amazon SageMaker AI Feature Store

Amazon SageMaker Feature Store offline store data format

Skills in: Validating and labeling data by using AWS services (for example, SageMaker Ground Truth, Amazon Mechanical Turk)

Use Amazon SageMaker Ground Truth to Label Data

Use Amazon SageMaker Ground Truth Plus to Label Data

Amazon Augmented AI (Amazon A2I) Developer Guide

Task 1.3: Ensure data integrity and prepare data for modeling

Knowledge of: Pre-training bias metrics for numeric, text, and image data (for example, class imbalance [CI], difference in proportions of labels [DPL])

Detect Pre-training Data Bias – Amazon SageMaker AI

Pre-training Bias Metrics – Amazon SageMaker AI

What Is Fairness and Model Explainability for Machine Learning Predictions? – Amazon SageMaker
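
The two example metrics can be computed by hand, which helps when interpreting a bias report. The sketch below follows the definitions as they are commonly stated for SageMaker Clarify: CI = (n_a - n_d) / (n_a + n_d) and DPL = q_a - q_d, where facet a is the advantaged group, facet d is the disadvantaged group, and q is the share of positive labels in each facet. Verify the definitions against the linked pages. The counts are invented, and this code does not call Clarify.

```python
def class_imbalance(n_a, n_d):
    """CI: ranges from -1 to 1; 0 means both facets are equally represented."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """DPL: gap between the positive-label rates of facet a and facet d."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical dataset: 800 rows in facet a, 200 rows in facet d.
ci = class_imbalance(n_a=800, n_d=200)

# 400 of 800 positive labels in facet a, 50 of 200 in facet d.
dpl = difference_in_proportions_of_labels(pos_a=400, n_a=800, pos_d=50, n_d=200)
```

A CI near 1 signals that facet d is heavily under-represented, and a positive DPL signals that facet a receives positive labels more often than facet d.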

Knowledge of: Strategies to address CI in numeric, text, and image datasets (for example, synthetic data generation, resampling)

Amazon SageMaker Autopilot – Handle Imbalanced Data

Fairness, model explainability and bias detection with SageMaker Clarify

Use Amazon SageMaker Ground Truth to Label Data

Knowledge of: Techniques to encrypt data

AWS Key Management Service Developer Guide

Protect data at rest using encryption – Amazon SageMaker AI

Securing, protecting, and managing data – Storage Best Practices for Data and Analytics Applications

Knowledge of: Data classification, anonymization, and masking

Amazon Macie User Guide

Dynamic data masking in Amazon Redshift

AWS Privacy Reference Architecture – AWS Prescriptive Guidance

Knowledge of: Implications of compliance requirements (for example, PII, PHI, data residency)

Data residency – AWS Whitepaper

Amazon Macie User Guide

AWS Privacy Reference Architecture – AWS Prescriptive Guidance

Skills in: Validating data quality (for example, by using DataBrew and AWS Glue Data Quality)

What is AWS Glue DataBrew?

AWS Glue Data Quality

Profiling data with AWS Glue DataBrew

Skills in: Identifying and mitigating sources of bias in data (for example, selection bias, measurement bias) by using AWS tools (for example, SageMaker Clarify)

Detect Pre-training Data Bias – Amazon SageMaker AI

Fairness, model explainability and bias detection with SageMaker Clarify

Pre-training Bias Metrics – Amazon SageMaker AI

Skills in: Preparing data to reduce prediction bias (for example, by using dataset splitting, shuffling, and augmentation)

Amazon SageMaker Autopilot – Handle Imbalanced Data

Use Amazon SageMaker Ground Truth to Label Data

Prepare – Machine Learning Lens – AWS Well-Architected Framework
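
Shuffling before splitting is the simplest of the techniques named here, and a seeded shuffle keeps the split reproducible. The following is a minimal standard-library sketch. The function name, the 80/10/10 fractions, and the seed are assumptions for illustration, and it does not stratify by label, which imbalanced datasets usually need.

```python
import random


def shuffle_and_split(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of the records, then cut train/validation/test slices."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is untouched
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


train, val, test = shuffle_and_split(range(100))
```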

Skills in: Configuring data to load into the model training resource (for example, Amazon EFS, Amazon FSx)

What is Amazon Elastic File System?

Amazon FSx for Lustre User Guide

Use File Systems in Amazon SageMaker Training Jobs

Storage options for Amazon SageMaker AI – Machine Learning Lens

Content Domain 2: ML Model Development

Task 2.1: Choose a modeling approach

Knowledge of: Capabilities and appropriate uses of ML algorithms to solve business problems

Types of Algorithms – Amazon SageMaker AI

Built-in algorithms and pretrained models in Amazon SageMaker

Machine Learning Lens – AWS Well-Architected Framework

Knowledge of: How to use AWS artificial intelligence (AI) services (for example, Amazon Translate, Amazon Transcribe, Amazon Rekognition, Amazon Bedrock) to solve specific business problems

Choosing an AWS machine learning service

What is Amazon Rekognition?

What is Amazon Transcribe?

What is Amazon Translate?

What is Amazon Bedrock?

Knowledge of: How to consider interpretability during model selection or algorithm selection

What Is Fairness and Model Explainability for Machine Learning Predictions? – Amazon SageMaker

Fairness, model explainability and bias detection with SageMaker Clarify

Amazon SageMaker Model Cards

Knowledge of: Amazon SageMaker AI built-in algorithms and when to apply them

Built-in algorithms and pretrained models in Amazon SageMaker

Types of Algorithms – Amazon SageMaker AI

Use Amazon SageMaker Built-in Algorithms or Pre-trained Models

Skills in: Assessing available data and problem complexity to determine the feasibility of an ML solution

Choosing an AWS machine learning service

Prepare – Machine Learning Lens – AWS Well-Architected Framework

Recommendations for choosing the right data preparation tool in SageMaker AI

Skills in: Comparing and selecting appropriate ML models or algorithms to solve specific problems

Amazon Bedrock or Amazon SageMaker AI?

Choosing an AWS machine learning service

Types of Algorithms – Amazon SageMaker AI

Skills in: Choosing built-in algorithms, foundation models, and solution templates (for example, in SageMaker JumpStart and Amazon Bedrock)

Amazon SageMaker JumpStart pretrained models

What is Amazon Bedrock?

Supported foundation models in Amazon Bedrock

Skills in: Selecting models or algorithms based on costs

Amazon Bedrock pricing

Amazon SageMaker pricing

Understanding intelligent prompt routing in Amazon Bedrock

Skills in: Selecting AI services to solve common business needs

Choosing an AWS machine learning service

Machine Learning (ML) and Artificial Intelligence (AI) – Overview of Amazon Web Services

Task 2.2: Train and refine models

Knowledge of: Elements in the training process (for example, epoch, steps, batch size)

Train a Model – Amazon SageMaker AI

Distributed training in Amazon SageMaker AI

Machine Learning Lens – AWS Well-Architected Framework
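
The relationship between epochs, steps, and batch size is simple arithmetic: one step processes one batch, and one epoch is one full pass over the training set. The sketch below assumes that convention. The function names and the `drop_last` flag (whether a final partial batch is discarded) are hypothetical, and frameworks differ on the default.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches (steps) needed to see every sample once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def total_steps(num_samples, batch_size, epochs, drop_last=False):
    """Total optimizer steps across all epochs."""
    return steps_per_epoch(num_samples, batch_size, drop_last) * epochs


# 10,000 samples with batch size 32: 312 full batches plus one partial batch.
per_epoch = steps_per_epoch(10_000, 32)
overall = total_steps(10_000, 32, epochs=5)
```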

Knowledge of: Methods to reduce model training time (for example, early stopping, distributed training)

Stop Training Jobs Early – Amazon SageMaker AI

Distributed training in Amazon SageMaker AI

SageMaker distributed model parallel best practices

Knowledge of: Factors that influence model size

Model compression overview – Amazon SageMaker AI

Distributed training in Amazon SageMaker AI

Optimized generative AI inference recommendations – Amazon SageMaker AI

Knowledge of: Methods to improve model performance

Automatic model tuning with SageMaker AI

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Machine Learning Lens – AWS Well-Architected Framework

Knowledge of: Benefits of regularization techniques (for example, dropout, weight decay, L1 and L2)

Prevent Overfitting in Machine Learning – Amazon SageMaker AI

Amazon SageMaker Model Monitor

AWS Cloud Adoption Framework for AI, ML, and Generative AI
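
To see what L1, L2, and weight decay do numerically, the sketch below computes each penalty for a small weight vector and applies one gradient step with weight decay. It is plain Python with hypothetical names, not code from any AWS service or training framework. Dropout is not shown because it needs a network to act on.

```python
def l1_penalty(weights, lam):
    """L1: lam * sum(|w|). Tends to push some weights to exactly zero."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2: lam * sum(w^2). Penalizes large weights more than small ones."""
    return lam * sum(w * w for w in weights)


def sgd_step_with_weight_decay(weights, grads, lr, decay):
    """One SGD step where each weight is also shrunk toward zero."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0, 0.5]
l1 = l1_penalty(weights, lam=0.1)  # 0.1 * (1 + 2 + 0.5)
l2 = l2_penalty(weights, lam=0.1)  # 0.1 * (1 + 4 + 0.25)

# With zero gradients, weight decay alone scales each weight by (1 - lr * decay).
decayed = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, decay=0.5)
```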

Knowledge of: Hyperparameter tuning techniques (for example, random search, Bayesian optimization)

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Automatic model tuning with SageMaker AI

How hyperparameter tuning works – Amazon SageMaker AI
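
Random search, the simpler of the two techniques named above, fits in a dozen lines. The sketch below is not SageMaker automatic model tuning: `random_search`, `toy_objective`, the search ranges, and the loss surface are all invented for illustration, and the objective is treated as a loss to minimize. Bayesian optimization differs in that it uses earlier results to pick the next candidate rather than sampling independently.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample each hyperparameter uniformly and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical loss surface with its minimum at lr=0.1, weight_decay=0.01."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["weight_decay"] - 0.01) ** 2


space = {"learning_rate": (0.001, 0.3), "weight_decay": (0.0, 0.1)}
best_params, best_score = random_search(toy_objective, space, n_trials=20, seed=0)
```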

Knowledge of: Model hyperparameters and their effects on model performance (for example, number of trees in a tree-based model, number of layers in a neural network)

Built-in algorithms and pretrained models in Amazon SageMaker

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Train a Model – Amazon SageMaker AI

Knowledge of: Methods to integrate models that were built outside SageMaker AI into SageMaker AI

Use your own training code – Amazon SageMaker AI

Custom model import – Amazon Bedrock

Amazon SageMaker Model Registry

Skills in: Using SageMaker AI built-in algorithms and common ML libraries to develop ML models

Built-in algorithms and pretrained models in Amazon SageMaker

Train a Model – Amazon SageMaker AI

Amazon SageMaker Python SDK

Skills in: Using SageMaker AI script mode with SageMaker AI supported frameworks to train models (for example, TensorFlow, PyTorch)

Train with script mode using the SageMaker Python SDK

Use TensorFlow with the SageMaker Python SDK

Use PyTorch with the SageMaker Python SDK

Skills in: Using custom datasets to fine-tune pre-trained models (for example, Amazon Bedrock, SageMaker JumpStart)

Customize an Amazon Bedrock model by fine-tuning

Amazon SageMaker JumpStart pretrained models

Fine-tune Amazon Bedrock models

Skills in: Performing hyperparameter tuning (for example, by using SageMaker AI automatic model tuning [AMT])

Automatic model tuning with SageMaker AI

Stop Training Jobs Early – Amazon SageMaker AI

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Skills in: Integrating automated hyperparameter optimization capabilities

Automatic model tuning with SageMaker AI

Amazon SageMaker Autopilot

Skills in: Preventing model overfitting, underfitting, and catastrophic forgetting (for example, by using regularization techniques, feature selection)

Detect Pre-training Data Bias – Amazon SageMaker AI

Amazon SageMaker Model Monitor

Stop Training Jobs Early – Amazon SageMaker AI

Skills in: Combining multiple training models to improve performance (for example, ensembling, stacking, boosting)

Amazon SageMaker Autopilot

Built-in algorithms and pretrained models in Amazon SageMaker

Machine Learning Lens – AWS Well-Architected Framework

Skills in: Reducing model size (for example, by altering data types, pruning, updating feature selection, compression)

Model compression overview – Amazon SageMaker AI

Optimized generative AI inference recommendations – Amazon SageMaker AI

Skills in: Managing model versions for repeatability and audits (for example, by using the SageMaker Model Registry)

Amazon SageMaker Model Registry

Amazon SageMaker Experiments – Manage ML experiments

Task 2.3: Analyze model performance

Knowledge of: Model evaluation techniques and metrics (for example, confusion matrix, heat maps, F1 score, accuracy, precision, recall, RMSE, ROC, AUC)

MLPER-03: Define relevant evaluation metrics – Machine Learning Lens

Evaluate the model – Amazon SageMaker AI

Metrics and validation – Amazon SageMaker Autopilot

Choose the best performing model using Amazon Bedrock evaluations
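
Most of the classification metrics in this objective derive from the four cells of a binary confusion matrix, and RMSE is the regression counterpart. The sketch below computes them in standard-library Python. The counts are invented, the function names are hypothetical, and the code assumes no denominator is zero.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
error = rmse([1, 1, 1, 1], [3, 3, 3, 3])
```

Here precision (0.8) is higher than recall (about 0.67), which means the model misses positives more often than it raises false alarms. F1 summarizes both in a single number.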

Knowledge of: Methods to create performance baselines

Amazon SageMaker Model Monitor

Amazon SageMaker Experiments – Manage ML experiments

MLPER-03: Define relevant evaluation metrics – Machine Learning Lens

Knowledge of: Methods to identify model overfitting and underfitting

Detect Pre-training Data Bias – Amazon SageMaker AI

Stop Training Jobs Early – Amazon SageMaker AI

Amazon SageMaker Model Monitor

Knowledge of: Metrics available in SageMaker Clarify to gain insights into ML training data and models

Fairness, model explainability and bias detection with SageMaker Clarify

What Is Fairness and Model Explainability for Machine Learning Predictions? – Amazon SageMaker

Pre-training Bias Metrics – Amazon SageMaker AI

Knowledge of: Convergence issues

SageMaker Training Compiler Troubleshooting

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Distributed training in Amazon SageMaker AI

Skills in: Selecting and interpreting evaluation metrics and detecting model bias

MLPER-03: Define relevant evaluation metrics – Machine Learning Lens

Fairness, model explainability and bias detection with SageMaker Clarify

Post-training Data and Model Bias Metrics – Amazon SageMaker AI

Skills in: Assessing tradeoffs between model performance, training time, and cost

Amazon Bedrock pricing

MLPER-03: Define relevant evaluation metrics – Machine Learning Lens

Understand the hyperparameter tuning strategies available in Amazon SageMaker AI

Skills in: Performing reproducible experiments by using AWS services

Amazon SageMaker Experiments – Manage ML experiments

Amazon SageMaker Pipelines

Amazon SageMaker Model Registry

Skills in: Comparing the performance of a shadow variant to the performance of a production variant

SageMaker shadow testing overview – Amazon SageMaker AI

Create a shadow test in Amazon SageMaker AI

Skills in: Using SageMaker Clarify to interpret model outputs

Fairness, model explainability and bias detection with SageMaker Clarify

What Is Fairness and Model Explainability for Machine Learning Predictions? – Amazon SageMaker

Feature Attributions that Use Shapley Values – Amazon SageMaker AI

Skills in: Using SageMaker Model Debugger to debug model convergence

Amazon SageMaker Debugger

Use SageMaker Debugger to debug model training

SageMaker Debugger built-in rules – Amazon SageMaker AI

Content Domain 3: Deployment and Orchestration of ML Workflows

Task 3.1: Select deployment infrastructure based on existing architecture and requirements

Knowledge of: Deployment best practices (for example, versioning, rollback strategies)

Deploy models for inference – Amazon SageMaker AI

Amazon SageMaker Model Registry

MLPER-12: Choose an optimal deployment option in the cloud – Machine Learning Lens

Knowledge of: AWS deployment services (for example, Amazon SageMaker AI)

Deploy models for inference – Amazon SageMaker AI

Model Hosting FAQs – Amazon SageMaker AI

Supported features – Amazon SageMaker AI inference options

Knowledge of: Methods to serve ML models in real time and in batches

Deploy models for real-time inference – Amazon SageMaker AI

Use Batch Transform to Get Inferences from Large Datasets – Amazon SageMaker AI

Deploy models with Amazon SageMaker Asynchronous Inference

Knowledge of: How to provision compute resources in production environments and test environments (for example, CPU, GPU)

Amazon SageMaker AI instance types

Optimized generative AI inference recommendations – Amazon SageMaker AI

Inference cost optimization best practices – Amazon SageMaker AI

Knowledge of: Model and endpoint requirements for deployment endpoints (for example, serverless endpoints, real-time endpoints, asynchronous endpoints, batch inference)

Supported features – Amazon SageMaker AI inference options

Deploy models with Amazon SageMaker Serverless Inference

Deploy models with Amazon SageMaker Asynchronous Inference

MLPER-12: Choose an optimal deployment option in the cloud – Machine Learning Lens

Knowledge of: How to choose appropriate containers (for example, provided or customized)

Use Your Own Inference Code with Hosting Services – Amazon SageMaker AI

Adapting your own Docker container to work with SageMaker AI

Amazon SageMaker AI pre-built Docker images

Knowledge of: Methods to optimize models on edge devices (for example, SageMaker Neo)

Optimize Machine Learning Models with SageMaker Neo

Getting Started with Edge Manager – Amazon SageMaker AI

Skills in: Evaluating performance, cost, and latency tradeoffs

Inference cost optimization best practices – Amazon SageMaker AI

MLPER-12: Choose an optimal deployment option in the cloud – Machine Learning Lens

Model Hosting FAQs – Amazon SageMaker AI

Skills in: Choosing the appropriate compute environment for training and inference based on requirements (for example, GPU or CPU specifications, processor family, networking bandwidth)

Optimized generative AI inference recommendations – Amazon SageMaker AI

Amazon SageMaker AI instance types

Inference cost optimization best practices – Amazon SageMaker AI

Skills in: Selecting the correct deployment orchestrator (for example, Apache Airflow, SageMaker Pipelines)

Amazon SageMaker Pipelines

What Is Amazon Managed Workflows for Apache Airflow?

Skills in: Selecting multi-model or multi-container deployments

Multi-model endpoints – Amazon SageMaker AI

Multi-container endpoints – Amazon SageMaker AI

Skills in: Selecting the correct deployment target (for example, SageMaker AI endpoints, Kubernetes, Amazon ECS, Amazon EKS, AWS Lambda)

Deploy models for inference – Amazon SageMaker AI

What is Amazon Elastic Kubernetes Service?

What is Amazon Elastic Container Service?

What is AWS Lambda?

Skills in: Choosing model deployment strategies (for example, real time, batch)

MLPER-12: Choose an optimal deployment option in the cloud – Machine Learning Lens

Supported features – Amazon SageMaker AI inference options

Inference cost optimization best practices – Amazon SageMaker AI

Task 3.2: Create and script infrastructure based on existing architecture and requirements

Knowledge of: Difference between on-demand and provisioned resources

Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock

Deploy models with Amazon SageMaker Serverless Inference

Inference cost optimization best practices – Amazon SageMaker AI

Knowledge of: How to compare scaling policies

Auto scaling policy overview – Amazon SageMaker AI

Automatic scaling of Amazon SageMaker AI models

What is Application Auto Scaling?

Knowledge of: Tradeoffs and use cases of infrastructure as code (IaC) options (for example, AWS CloudFormation, AWS CDK)

What is AWS CloudFormation?

Getting started with the AWS CDK

AWS CDK vs AWS CloudFormation – AWS Prescriptive Guidance

Knowledge of: Containerization concepts and AWS container services

What is Amazon Elastic Container Registry?

What is Amazon Elastic Container Service?

What is Amazon Elastic Kubernetes Service?

Adapting your own Docker container to work with SageMaker AI

Knowledge of: How to use SageMaker AI endpoint auto scaling policies to meet scalability requirements (for example, based on demand, time)

Automatic scaling of Amazon SageMaker AI models

Auto scaling policy overview – Amazon SageMaker AI

Autoscale an asynchronous endpoint – Amazon SageMaker AI

Skills in: Applying best practices to enable maintainable, scalable, and cost-effective ML solutions (for example, automatic scaling on SageMaker AI endpoints, dynamically adding Spot Instances, by using Amazon EC2 instances, by using Lambda behind the endpoints)

Automatic scaling of Amazon SageMaker AI models

Managed Spot Training in Amazon SageMaker AI

Inference cost optimization best practices – Amazon SageMaker AI

Skills in: Automating the provisioning of compute resources, including communication between stacks (for example, by using CloudFormation, AWS CDK)

What is AWS CloudFormation?

Getting started with the AWS CDK

Nested stacks – AWS CloudFormation

Skills in: Building and maintaining containers (for example, Amazon ECR, Amazon EKS, Amazon ECS, by using BYOC with SageMaker AI)

What is Amazon Elastic Container Registry?

Adapting your own Docker container to work with SageMaker AI

What is Amazon Elastic Kubernetes Service?

Skills in: Configuring SageMaker AI endpoints within the VPC network

Give SageMaker AI access to resources in your Amazon VPC

Use SageMaker AI with VPC endpoints – Amazon SageMaker AI

Skills in: Deploying and hosting models by using the SageMaker AI SDK

Deploy models for inference – Amazon SageMaker AI

Amazon SageMaker Python SDK

Skills in: Choosing specific metrics for auto scaling (for example, model latency, CPU utilization, invocations per instance)

Auto scaling policy overview – Amazon SageMaker AI

Automatic scaling of Amazon SageMaker AI models

Monitor Amazon SageMaker with Amazon CloudWatch
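
When invocations per instance is the scaling metric, the target value is usually derived from a load test. The sketch below uses a rule of thumb that appears in AWS load-testing guidance: target = peak requests per second one instance can sustain x a safety factor x 60. Treat the formula, the 0.5 safety factor, and the instance estimate as assumptions to verify against the linked pages. The function names are hypothetical and nothing here calls an AWS API.

```python
import math


def invocations_per_instance_target(max_rps, safety_factor=0.5):
    """Per-minute invocation target for one instance, derived from load testing."""
    return max_rps * safety_factor * 60


def estimated_instances(peak_invocations_per_minute, target_per_instance):
    """Rough count of instances needed to keep each one at or below the target."""
    return max(1, math.ceil(peak_invocations_per_minute / target_per_instance))


# Assumed load-test result: one instance sustains 20 requests per second.
target = invocations_per_instance_target(max_rps=20)
needed = estimated_instances(peak_invocations_per_minute=2500, target_per_instance=target)
```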

Task 3.3: Use automated orchestration tools to set up CI/CD pipelines

Knowledge of: Capabilities and quotas for AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy

What is AWS CodePipeline?

What is AWS CodeBuild?

What is AWS CodeDeploy?

Knowledge of: Automation and integration of data ingestion with orchestration services

Amazon SageMaker Pipelines

What is Amazon EventBridge?

What Is Amazon Managed Workflows for Apache Airflow?

Knowledge of: Version control systems and basic usage (for example, Git)

What is AWS CodeCommit?

Source control integrations for AWS CodePipeline

Knowledge of: CI/CD principles and how they fit into ML workflows

Amazon SageMaker Pipelines

MLOps – Machine Learning Lens

AWS Cloud Adoption Framework for AI, ML, and Generative AI

Knowledge of: Deployment strategies and rollback actions (for example, blue/green, canary, linear)

CodeDeploy deployment configurations

Blue/green deployments on AWS – AWS Whitepaper

SageMaker shadow testing overview – Amazon SageMaker AI
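
The difference between canary and linear traffic shifting is easiest to see as a schedule of (minute, percent of traffic on the new version). The sketch below is modeled loosely on how those strategies are generally described: canary shifts in two increments, linear shifts in equal increments at equal intervals. The function names are hypothetical and no CodeDeploy or SageMaker API is called.

```python
def canary_schedule(first_percent, wait_minutes):
    """Two increments: a small first slice, then the remainder after the wait."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Equal increments at equal intervals until all traffic has moved."""
    schedule = []
    minute, percent = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


canary = canary_schedule(first_percent=10, wait_minutes=5)
linear = linear_schedule(step_percent=25, interval_minutes=10)
```

In either strategy, a rollback sends all traffic back to the old version. The schedules differ only in how much traffic is exposed to the new version before a problem can be detected.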

Knowledge of: How code repositories and pipelines work together

What is AWS CodePipeline?

Source control integrations for AWS CodePipeline

Amazon SageMaker Pipelines

Skills in: Configuring and troubleshooting CodeBuild, CodeDeploy, and CodePipeline, including stages

What is AWS CodePipeline?

What is AWS CodeBuild?

Troubleshoot CodePipeline

Skills in: Applying continuous deployment flow structures to invoke pipelines (for example, Gitflow, GitHub Flow)

Source control integrations for AWS CodePipeline

What is AWS CodeCommit?

What is AWS CodePipeline?

Skills in: Using AWS services to automate orchestration (for example, to deploy ML models, automate model building)

Amazon SageMaker Pipelines

What is AWS CodePipeline?

AWS Step Functions Developer Guide

Skills in: Configuring training and inference jobs (for example, by using Amazon EventBridge rules, SageMaker Pipelines, CodePipeline)

Amazon SageMaker Pipelines

Run Amazon SageMaker Pipelines jobs from EventBridge

What is AWS CodePipeline?

Skills in: Creating automated tests in CI/CD pipelines (for example, integration tests, unit tests, end-to-end tests)

What is AWS CodeBuild?

Testing and validations – Amazon SageMaker AI

SageMaker shadow testing overview – Amazon SageMaker AI

Skills in: Building and integrating mechanisms to retrain models

Amazon SageMaker Pipelines

Amazon SageMaker Model Monitor

MLOps – Machine Learning Lens

Content Domain 4: ML Solution Monitoring, Maintenance, and Security

Task 4.1: Monitor model inference

Skills in: Monitoring models in production (for example, by using Amazon SageMaker Model Monitor)

Amazon SageMaker Model Monitor

Data and model quality monitoring with Amazon SageMaker Model Monitor

Model Monitor FAQs – Amazon SageMaker AI

Skills in: Monitoring workflows to detect anomalies or errors in data processing or model inference

Amazon SageMaker Model Monitor

Using Amazon CloudWatch alarms

Amazon SageMaker Pipelines

Skills in: Detecting changes in the distribution of data that can affect model performance (for example, by using SageMaker Clarify)

Bias drift for models in production – Amazon SageMaker AI

Feature attribution drift for models in production – Amazon SageMaker AI

Fairness, model explainability and bias detection with SageMaker Clarify

Skills in: Monitoring model performance in production by using A/B testing

SageMaker shadow testing overview – Amazon SageMaker AI

Safely validate models in production – Amazon SageMaker AI

MLPERF06-BP04 Monitor, detect, and handle model performance degradation – Machine Learning Lens

Task 4.2: Monitor and optimize infrastructure and costs

Knowledge of: Key performance metrics for ML infrastructure (for example, utilization, throughput, availability, scalability, fault tolerance)

Monitor Amazon SageMaker with Amazon CloudWatch

Amazon SageMaker AI runtime metrics

Performance Efficiency Pillar – AWS Well-Architected Framework

Knowledge of: Monitoring and observability tools to troubleshoot latency and performance issues (for example, AWS X-Ray, Amazon CloudWatch Lambda Insights, Amazon CloudWatch Logs Insights)

AWS X-Ray Developer Guide

Using Lambda Insights in Amazon CloudWatch

Analyzing log data with CloudWatch Logs Insights

Knowledge of: How to use AWS CloudTrail to log, monitor, and invoke re-training activities

What is AWS CloudTrail?

Log Amazon SageMaker API calls with AWS CloudTrail

Amazon SageMaker Pipelines

Knowledge of: Differences between instance types and how they affect performance (for example, memory optimized, compute optimized, general purpose, inference optimized)

Amazon SageMaker AI instance types

Optimized generative AI inference recommendations – Amazon SageMaker AI

Inference cost optimization best practices – Amazon SageMaker AI

Knowledge of: Capabilities of cost analysis tools (for example, AWS Cost Explorer, AWS Billing and Cost Management, AWS Trusted Advisor)

What is AWS Cost Explorer?

What is AWS Billing and Cost Management?

What is AWS Trusted Advisor?

Knowledge of: Cost tracking and allocation techniques (for example, resource tagging)

Tagging your Amazon SageMaker AI resources

Using AWS cost allocation tags
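
Tag-based cost allocation amounts to grouping billing line items by a tag value. The sketch below illustrates the idea on invented data. The line-item structure, the `team` tag key, and the function name are assumptions for this example and do not reflect the format of any AWS billing export.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Sum costs per tag value; items missing the tag fall into one bucket."""
    totals = defaultdict(float)
    for item in line_items:
        label = item.get("tags", {}).get(tag_key, untagged_label)
        totals[label] += item["cost"]
    return dict(totals)


# Hypothetical line items for training and hosting resources.
line_items = [
    {"resource": "training-job-1", "cost": 10.0, "tags": {"team": "fraud"}},
    {"resource": "endpoint-1", "cost": 5.5, "tags": {"team": "search"}},
    {"resource": "training-job-2", "cost": 4.5, "tags": {"team": "fraud"}},
    {"resource": "notebook-1", "cost": 2.0},
]

totals = cost_by_tag(line_items, "team")
```

The untagged bucket is often the most useful output, because it shows how much spend a tagging strategy has not yet covered.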

Skills in: Configuring and using tools to troubleshoot and analyze resources (for example, CloudWatch Logs, CloudWatch alarms)

What is Amazon CloudWatch Logs?

Using Amazon CloudWatch alarms

Monitor Amazon SageMaker with Amazon CloudWatch

Skills in: Creating CloudTrail trails

Creating a trail for an organization – AWS CloudTrail

Log Amazon SageMaker API calls with AWS CloudTrail

Skills in: Setting up dashboards to monitor performance metrics (for example, by using Amazon QuickSight, CloudWatch dashboards)

Using Amazon CloudWatch dashboards

What is Amazon QuickSight?

Skills in: Monitoring infrastructure (for example, by using Amazon EventBridge events)

What is Amazon EventBridge?

Run Amazon SageMaker Pipelines jobs from EventBridge

Monitor Amazon SageMaker with Amazon CloudWatch

Skills in: Rightsizing instance families and sizes (for example, by using SageMaker AI Inference Recommender and AWS Compute Optimizer)

SageMaker AI Inference Recommender

AWS Compute Optimizer User Guide

Skills in: Monitoring and resolving latency and scaling issues

Auto scaling policy overview – Amazon SageMaker AI

Automatic scaling of Amazon SageMaker AI models

Monitor Amazon SageMaker with Amazon CloudWatch

Skills in: Preparing infrastructure for cost monitoring (for example, by applying a tagging strategy)

Tagging your Amazon SageMaker AI resources

Using AWS cost allocation tags

AWS Tagging Best Practices – AWS Prescriptive Guidance

Skills in: Troubleshooting capacity concerns that involve cost and performance (for example, provisioned concurrency, service quotas, auto scaling)

Amazon SageMaker AI service quotas

Automatic scaling of Amazon SageMaker AI models

Automatically scale Provisioned Concurrency for a serverless endpoint – Amazon SageMaker AI

Skills in: Optimizing costs and setting cost quotas by using appropriate cost management tools (for example, AWS Cost Explorer, AWS Trusted Advisor, AWS Budgets)

What is AWS Cost Explorer?

What is AWS Trusted Advisor?

Managing your costs with AWS Budgets

Skills in: Optimizing infrastructure costs by selecting purchasing options (for example, Spot Instances, On-Demand Instances, Reserved Instances, SageMaker AI Savings Plans)

Managed Spot Training in Amazon SageMaker AI

Inference cost optimization best practices – Amazon SageMaker AI

SageMaker AI Savings Plans
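
For Managed Spot Training, the savings figure reported for a job compares billable seconds with total training seconds. A small sketch of that arithmetic follows; the function name and the sample numbers are illustrative.

```python
def managed_spot_savings_percent(training_seconds, billable_seconds):
    """Savings = (1 - billable / training) * 100."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    return (1 - billable_seconds / training_seconds) * 100


# Example: a job that ran for 500 seconds but was billed for 100 seconds.
print(round(managed_spot_savings_percent(500, 100), 2))
```

The two inputs correspond to the `TrainingTimeInSeconds` and `BillableTimeInSeconds` fields returned when describing a training job.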

Task 4.3: Secure AWS resources

Knowledge of: IAM roles, policies, and groups that control access to AWS services (for example, IAM, bucket policies, SageMaker Role Manager)

Identity and access management for Amazon SageMaker AI

How to use SageMaker AI execution roles

Amazon SageMaker Role Manager

AWS managed policies for Amazon SageMaker AI

Knowledge of: SageMaker AI security and compliance features

Security in Amazon SageMaker AI

Protect data at rest using encryption – Amazon SageMaker AI

Give SageMaker AI access to resources in your Amazon VPC

Knowledge of: Controls for network access to ML resources

Give SageMaker AI access to resources in your Amazon VPC

Use SageMaker AI with VPC endpoints

What is Amazon VPC?

Knowledge of: Security best practices for CI/CD pipelines

Security in AWS CodePipeline

Security in AWS CodeBuild

Securing DevOps – AWS Cloud Adoption Framework

Skills in: Configuring least privilege access to ML artifacts

Security best practices in IAM

Identity and access management for Amazon SageMaker AI

IAM Access Analyzer policy generation
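
As a hedged example of least privilege for ML artifacts, the sketch below builds an IAM policy document that allows read-only access to a single S3 prefix holding model artifacts. The bucket name, prefix, and statement IDs are hypothetical, and the function only produces JSON; it does not create anything in AWS.

```python
import json


def read_only_artifact_policy(bucket, prefix):
    """Build a policy limited to listing and reading one artifact prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListArtifactPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Restrict listing to keys under the artifact prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadArtifacts",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix names.
policy = read_only_artifact_policy("example-ml-artifacts", "models/churn/")
print(json.dumps(policy, indent=2))
```

Write or delete actions are deliberately left out; a training role that must upload artifacts would need a separate, equally narrow statement.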

Skills in: Configuring IAM policies and roles for users and applications that interact with ML systems

How to use SageMaker AI execution roles

Amazon SageMaker Role Manager

Creating IAM policies – AWS IAM

Skills in: Monitoring, auditing, and logging ML systems to ensure continued security and compliance

Log Amazon SageMaker API calls with AWS CloudTrail

Monitor Amazon SageMaker with Amazon CloudWatch

What is AWS Security Hub?

Skills in: Troubleshooting and debugging security issues

Troubleshoot IAM – AWS Identity and Access Management

What is IAM Access Analyzer?

Log Amazon SageMaker API calls with AWS CloudTrail

Skills in: Building VPCs, subnets, and security groups to securely isolate ML systems

What is Amazon VPC?

Subnets for your VPC – Amazon VPC

Security groups for your VPC – Amazon VPC

Give SageMaker AI access to resources in your Amazon VPC

This brings us to the end of the MLA-C01 AWS Certified Machine Learning Engineer Associate exam study guide.

What do you think? Let me know in the comments section if I have missed anything. I would also love to hear how your preparation is going!

In case you are preparing for other AWS certification exams, check out the AWS study guides for those exams.

