DEA-C01 Study Guide | AWS Data Engineer Associate

DEA-C01 Preparation Details

Preparing for the DEA-C01 AWS Certified Data Engineer Associate certification exam? Start here with a complete, objective-wise DEA-C01 study guide designed to help you pass faster.

This guide brings together official AWS documentation, key concepts, and curated resources for every DEA-C01 exam objective, making it ideal for both beginners and last-minute revision.

AWS Data Engineer Prep

Coursera	AWS Certified Data Engineer Associate Exam Prep
Udemy	AWS Certified Data Engineer Associate
Whizlabs	AWS Certified Data Engineer

Content Domain 1: Data Ingestion and Transformation

Task 1.1: Perform data ingestion

Skill 1.1.1: Read data from streaming sources (for example, Amazon Kinesis, Amazon Managed Streaming for Apache Kafka [Amazon MSK], Amazon DynamoDB Streams, AWS Database Migration Service [AWS DMS], AWS Glue, Amazon Redshift)

What is Amazon Kinesis Data Streams?

Welcome to the Amazon MSK Developer Guide

Change data capture for DynamoDB Streams – Amazon DynamoDB

Streaming ETL jobs in AWS Glue

Streaming ingestion to a materialized view – Amazon Redshift

What is AWS Database Migration Service?

Skill 1.1.2: Read data from batch sources (for example, Amazon S3, AWS Glue, Amazon EMR, AWS DMS, Amazon Redshift, AWS Lambda, Amazon AppFlow)

What is AWS Glue?

What is Amazon EMR?

What is Amazon AppFlow?

Getting started with Amazon S3

Skill 1.1.3: Implement appropriate configuration options for batch ingestion

AWS Glue ETL – AWS Prescriptive Guidance

What is AWS Glue?

AWS Cloud Data Ingestion Patterns and Practices

Skill 1.1.4: Consume data APIs

Using AWS services from the Lambda console – AWS Lambda

What is Amazon API Gateway?

AppFlow API Reference – Amazon AppFlow

Skill 1.1.5: Set up schedulers by using Amazon EventBridge, Apache Airflow, or time-based schedules for jobs and crawlers

What Is Amazon Managed Workflows for Apache Airflow?

Scheduling AWS Glue crawlers – AWS Glue

Scheduling AWS Glue jobs – AWS Glue

Amazon EventBridge Scheduler User Guide

Skill 1.1.6: Set up event triggers (for example, Amazon S3 Event Notifications, EventBridge)

Amazon S3 Event Notifications

What is Amazon EventBridge?

Overview of workflows in AWS Glue

Skill 1.1.7: Call a Lambda function from Kinesis

Using AWS Lambda with Amazon Kinesis – AWS Lambda

How Lambda processes records from stream and queue-based event sources

Process Amazon S3 event notifications with Lambda
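
To make this skill concrete, here is a minimal sketch of a Lambda handler behind a Kinesis event source mapping. It assumes producers write JSON payloads (an assumption, since the payload format is up to you) and uses the event shape in which each record's payload arrives base64-encoded under `kinesis.data`. The helper name `decode_records` is illustrative, and the `batchItemFailures` response only has an effect when partial batch responses are enabled on the mapping.

```python
import base64
import json


def decode_records(event):
    """Split a Kinesis event into decoded JSON payloads and failed sequence numbers."""
    decoded, failed = [], []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        try:
            # Kinesis delivers each payload base64-encoded.
            payload = base64.b64decode(kinesis["data"])
            decoded.append(json.loads(payload))
        except ValueError:
            # Covers malformed base64, bad UTF-8, and invalid JSON.
            failed.append(kinesis["sequenceNumber"])
    return decoded, failed


def handler(event, context):
    """Lambda entry point: process what decodes, report the rest for retry."""
    decoded, failed = decode_records(event)
    print(f"processed={len(decoded)} failed={len(failed)}")
    return {"batchItemFailures": [{"itemIdentifier": seq} for seq in failed]}
```

Reporting failures by sequence number lets the service retry from the failed record rather than replaying the whole batch.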

Skill 1.1.8: Create allowlists for IP addresses to allow connections to data sources

Security groups for your VPC – Amazon VPC

IP address allow lists – AWS Glue

VPC endpoints – Amazon VPC

Skill 1.1.9: Implement throttling and overcome rate limits (for example, DynamoDB, Amazon RDS, Kinesis)

Amazon Kinesis Data Streams quotas and limits

Error retries and exponential backoff in AWS

Best practices for DynamoDB – Amazon DynamoDB
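
The retry guidance linked above boils down to exponential backoff with jitter. Below is a minimal sketch of the "full jitter" variant. `ThrottledError` is a hypothetical stand-in for whatever throttling exception your client raises, and the default delays are illustrative rather than recommended values.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a service throttling error."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Retry `operation` on throttling, doubling the delay cap each attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time between 0 and the current cap.
            sleep(random.uniform(0, cap))
```

The random wait spreads retries from many clients over time, so they do not all hit the throttled service again at the same instant.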

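Skill 1.1.10: Manage fan-in and fan-out for streaming data distribution

Skill 1.1.11: Describe replayability of data ingestion pipelines

Skill 1.1.12: Define stateful and stateless data transactions

Task 1.2: Transform and process data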

Skill 1.2.1: Optimize container usage for performance needs (for example, Amazon EKS, Amazon ECS)

What is Amazon Elastic Kubernetes Service?

What is Amazon Elastic Container Service?

Amazon EMR on EKS – Amazon EMR

Skill 1.2.2: Connect to different data sources (for example, JDBC, ODBC)

Connection types and options for ETL in AWS Glue

Adding a JDBC connection to AWS Glue

Skill 1.2.3: Integrate data from multiple sources

What is AWS Glue?

Data integration – Analytics Lens

AWS Cloud Data Ingestion Patterns and Practices

Skill 1.2.4: Optimize costs while processing data

Best practices for cost optimization – Analytics Lens

Amazon EMR cost optimization – Amazon EMR

AWS Glue pricing and cost optimization

Skill 1.2.5: Implement data transformation services based on requirements (for example, Amazon EMR, AWS Glue, Lambda, Amazon Redshift)

What is AWS Glue?

What is Amazon EMR?

Data transformation with Amazon Redshift

What is AWS Lambda?

Skill 1.2.6: Transform data between formats (for example, from .csv to Apache Parquet)

Convert CSV to Parquet using AWS Glue

Writing data with Apache Spark – Amazon EMR

Columnar storage formats in AWS Glue

Skill 1.2.7: Troubleshoot and debug common transformation failures and performance issues

Monitoring AWS Glue jobs

Troubleshoot Amazon EMR

Monitor Amazon Redshift with Amazon CloudWatch

Skill 1.2.8: Create data APIs to make data available to other systems by using AWS services

What is Amazon API Gateway?

AWS AppSync – Developer Guide

Skill 1.2.9: Define volume, velocity, and variety of data (for example, structured data, unstructured data)

Data Analytics Lens – AWS Well-Architected Framework

AWS Cloud Data Ingestion Patterns and Practices

Skill 1.2.10: Integrate large language models (LLMs) for data processing

What is Amazon Bedrock?

Amazon Bedrock Data Automation

Task 1.3: Orchestrate data pipelines

Skill 1.3.1: Use orchestration services to build workflows for data ETL pipelines (for example, Lambda, EventBridge, Amazon MWAA, AWS Step Functions, AWS Glue workflows)

What Is Amazon Managed Workflows for Apache Airflow?

AWS Step Functions Developer Guide

Overview of workflows in AWS Glue

What is Amazon EventBridge?

Skill 1.3.2: Build data pipelines for performance, availability, scalability, resiliency, and fault tolerance

Data Analytics Lens – AWS Well-Architected Framework

Reliability Pillar – AWS Well-Architected Framework

AWS Step Functions error handling

Skill 1.3.3: Implement and maintain serverless workflows

AWS Step Functions Developer Guide

What is AWS Lambda?

What Is Amazon Managed Workflows for Apache Airflow?

Skill 1.3.4: Use notification services to send alerts (for example, Amazon SNS, Amazon SQS)

What is Amazon Simple Notification Service?

What is Amazon Simple Queue Service?

Set up Amazon SNS notifications – Amazon CloudWatch

Task 1.4: Apply programming concepts

Skill 1.4.1: Optimize code to reduce runtime for data ingestion and transformation

Performance Efficiency Pillar – AWS Well-Architected Framework

AWS Glue performance tuning

Amazon EMR performance optimization

Skill 1.4.2: Configure Lambda functions to meet concurrency and performance needs

Lambda function scaling – AWS Lambda

Managing Lambda reserved concurrency

Lambda performance optimization
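
A quick way to reason about concurrency needs is the rule of thumb that concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies it and adds a headroom percentage. The function names and the 20 percent default are illustrative assumptions, not AWS recommendations.

```python
import math


def estimated_concurrency(requests_per_second, avg_duration_seconds):
    """Rule of thumb: concurrent executions = request rate x average duration."""
    return requests_per_second * avg_duration_seconds


def concurrency_with_headroom(requests_per_second, avg_duration_seconds,
                              headroom_percent=20):
    """Round the estimate up after adding a safety margin for traffic spikes."""
    needed = estimated_concurrency(requests_per_second, avg_duration_seconds)
    return math.ceil(needed * (100 + headroom_percent) / 100)
```

For example, 100 requests per second at 0.5 seconds each keeps about 50 executions busy at once, which is the number to compare against your reserved or account-level concurrency.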

Skill 1.4.3: Use programming languages and frameworks for data engineering (for example, Python, SQL, Scala, R, Java, Bash, PowerShell)

AWS Glue programming ETL scripts in Python

Apache Spark on Amazon EMR clusters

Using Amazon Redshift with SQL

Skill 1.4.4: Use software engineering best practices for data engineering (for example, version control, testing, logging, monitoring)

Monitoring AWS Glue jobs

What is AWS CodeCommit?

Monitoring and observability – Machine Learning Lens

Skill 1.4.5: Use Infrastructure as Code (IaC) to deploy data engineering solutions

What is AWS CloudFormation?

Getting started with the AWS CDK

Skill 1.4.6: Use the AWS Serverless Application Model (AWS SAM) to package and deploy serverless data pipelines (for example, Lambda functions, Step Functions, DynamoDB tables)

What is the AWS Serverless Application Model (AWS SAM)?

AWS SAM resource and property reference

Deploying serverless applications with AWS SAM

Skill 1.4.7: Use and mount storage volumes from within Lambda functions

Using Lambda with Amazon EFS – AWS Lambda

Configuring file system access for Lambda functions

Skill 1.4.8: Use infrastructure as code (IaC) for repeatable resource deployment (for example, AWS CloudFormation and AWS CDK)

What is AWS CloudFormation?

Getting started with the AWS CDK

AWS CDK vs AWS CloudFormation – AWS Prescriptive Guidance

Skill 1.4.9: Describe continuous integration and continuous delivery (CI/CD) (implementation, testing, and deployment of data pipelines)

What is AWS CodePipeline?

What is AWS CodeBuild?

What is AWS CodeDeploy?

Skill 1.4.10: Define distributed computing

Amazon EMR architecture and service layers

Apache Spark on Amazon EMR clusters

Data Analytics Lens – AWS Well-Architected Framework

Skill 1.4.11: Describe data structures and algorithms (for example, graph data structures and tree data structures)

Amazon Neptune User Guide

Data Analytics Lens – AWS Well-Architected Framework

AWS Cloud Data Ingestion Patterns and Practices
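
Graph data structures show up directly in data engineering: a pipeline is a directed acyclic graph (DAG) of tasks. As a small illustration, here is a sketch of Kahn's algorithm, which orders tasks so that every task runs after its dependencies. The task names in the usage note are made up for the example.

```python
from collections import deque


def topological_order(dependencies):
    """Order tasks so each runs after the tasks it depends on.

    `dependencies` maps a task name to the list of tasks it depends on.
    """
    tasks = set(dependencies)
    for upstream in dependencies.values():
        tasks.update(upstream)

    indegree = {task: 0 for task in tasks}
    downstream = {task: [] for task in tasks}
    for task, upstream in dependencies.items():
        for dep in upstream:
            indegree[task] += 1
            downstream[dep].append(task)

    # Start with tasks that have no unmet dependencies (sorted for determinism).
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("cycle detected: not a valid DAG")
    return order
```

Calling `topological_order({"load": ["transform"], "transform": ["extract"], "report": ["load"]})` yields the run order extract, transform, load, report.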

Content Domain 2: Data Store Management

Task 2.1: Choose a data store

Skill 2.1.1: Implement the appropriate storage services for specific cost and performance requirements (for example, Amazon Redshift, Amazon EMR, AWS Lake Formation, Amazon RDS, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon MSK)

Choosing an AWS database service

Amazon Redshift – Big Data Analytics Options on AWS

What is AWS Lake Formation?

Data Analytics Lens – AWS Well-Architected Framework

Skill 2.1.2: Configure the appropriate storage services for specific access patterns and requirements (for example, Amazon Redshift, Amazon EMR, Lake Formation, Amazon RDS, DynamoDB)

Amazon DynamoDB Developer Guide

Best practices for DynamoDB – Amazon DynamoDB

What is Amazon Relational Database Service?

Working with columnar data and Apache Parquet – Amazon EMR

Skill 2.1.3: Apply storage services to appropriate use cases (for example, using indexing algorithms like HNSW with Amazon Aurora PostgreSQL and using Amazon MemoryDB for fast key/value pair access)

What is Amazon MemoryDB?

Perform vector similarity search using pgvector on Amazon Aurora PostgreSQL

Amazon Aurora User Guide

Skill 2.1.4: Integrate migration tools into data processing systems (for example, AWS Transfer Family)

What is AWS Transfer Family?

What is AWS Database Migration Service?

What is AWS DataSync?

Skill 2.1.5: Implement data migration or remote access methods (for example, Amazon Redshift federated queries, Amazon Redshift materialized views, Amazon Redshift Spectrum)

Querying data with federated queries in Amazon Redshift

Creating materialized views in Amazon Redshift

Amazon Redshift Spectrum

Skill 2.1.6: Manage locks to prevent access to data (for example, Amazon Redshift, Amazon RDS)

Lock and LockManager tables – Amazon Redshift

Locking in Amazon RDS for PostgreSQL

Skill 2.1.7: Manage open table formats (for example, Apache Iceberg)

Using Apache Iceberg with AWS Glue

S3 Tables and Apache Iceberg – Amazon S3

Apache Iceberg on Amazon EMR

Skill 2.1.8: Describe vector index types (for example, HNSW, IVF)

Perform vector similarity search using pgvector on Amazon Aurora PostgreSQL

Amazon OpenSearch Service vector database capabilities

Vector search in Amazon MemoryDB
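
To build intuition for IVF (inverted file) indexes, here is a toy sketch. It is heavily simplified: the centroids are passed in by hand instead of being learned with k-means, and the class name `ToyIVF` is made up for this example. Vectors are bucketed by nearest centroid, and a query scans only the `nprobe` closest buckets, trading recall for speed. HNSW works differently (it walks a layered proximity graph) and is not shown here.

```python
import math


class ToyIVF:
    """Toy inverted-file index: one bucket of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def _closest_cells(self, vector, count):
        # Rank centroid ids by Euclidean distance to the vector.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: math.dist(vector, self.centroids[i]))
        return order[:count]

    def add(self, key, vector):
        cell = self._closest_cells(vector, 1)[0]
        self.buckets[cell].append((key, vector))

    def search(self, query, k=1, nprobe=1):
        # Only the nprobe nearest buckets are scanned, not the whole dataset.
        candidates = []
        for cell in self._closest_cells(query, nprobe):
            candidates.extend(self.buckets[cell])
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [key for key, _ in candidates[:k]]


index = ToyIVF(centroids=[(0, 0), (10, 10)])
for key, vector in {"a": (1, 1), "b": (0, 2), "c": (9, 9),
                    "d": (11, 10), "e": (6, 6)}.items():
    index.add(key, vector)
```

With this data, a query at (4.9, 4.9) with `nprobe=1` scans only the first bucket and returns "a", even though "e" is closer. Raising `nprobe` to 2 finds "e". That gap is the recall trade-off behind approximate indexes.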

Task 2.2: Understand data cataloging systems

Skill 2.2.1: Use data catalogs to consume data from the data’s source

AWS Glue Data Catalog

Data cataloging – Storage Best Practices for Data and Analytics Applications

Skill 2.2.2: Build and reference a technical data catalog (for example, AWS Glue Data Catalog, Apache Hive metastore)

AWS Glue Data Catalog

Connecting to a Hive metastore – Amazon EMR

Using the AWS Glue Data Catalog as the Metastore for Amazon EMR

Skill 2.2.3: Discover schemas and use AWS Glue crawlers to populate data catalogs

Using AWS Glue crawlers

AWS Glue crawlers and classifiers

Skill 2.2.4: Synchronize partitions with a data catalog

Managing partitions for ETL output in AWS Glue

AWS Glue Data Catalog partitions
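
Partition synchronization is mostly about spotting partitions that exist in storage but not in the catalog. The sketch below shows the idea for Hive-style `key=value` paths. It is only an illustration with made-up function names; in practice a crawler, `MSCK REPAIR TABLE` in Athena, or the catalog's partition APIs would do this work.

```python
def parse_partition(object_key):
    """Extract Hive-style key=value segments from an object key, in path order."""
    pairs = []
    for segment in object_key.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            pairs.append((name, value))
    return tuple(pairs)


def missing_partitions(object_keys, catalog_partitions):
    """Return partitions found in storage but absent from the catalog."""
    missing = []
    for key in object_keys:
        partition = parse_partition(key)
        if (partition and partition not in catalog_partitions
                and partition not in missing):
            missing.append(partition)
    return missing
```

Each entry in the result is a partition that would still need to be registered before queries can see its data.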

Skill 2.2.5: Create new source or target connections for cataloging (for example, AWS Glue)

Adding a connection to a data store in AWS Glue

Connection types and options for ETL in AWS Glue

Skill 2.2.6: Create and manage business data catalogs (for example, Amazon SageMaker Catalog)

Amazon SageMaker Catalog – SageMaker AI

AWS Glue Data Catalog

Task 2.3: Manage the lifecycle of data

Skill 2.3.1: Perform load and unload operations to move data between Amazon S3 and Amazon Redshift

Loading data from Amazon S3 – COPY command – Amazon Redshift

UNLOAD – Amazon Redshift

Tutorial: Loading data from Amazon S3 – Amazon Redshift

Skill 2.3.2: Manage S3 Lifecycle policies to change the storage tier of S3 data

Managing your storage lifecycle – Amazon S3

Setting lifecycle configuration on a bucket – Amazon S3

Skill 2.3.3: Expire data when it reaches a specific age by using S3 Lifecycle policies

Managing your storage lifecycle – Amazon S3

Expiring objects – Amazon S3 Lifecycle
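
Tiering and expiration usually live in the same lifecycle rule. Here is a sketch that builds such a rule as a plain dictionary. The prefix, day counts, and rule ID format are illustrative assumptions. With boto3, a dictionary of this shape is what `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument, but the sketch deliberately makes no AWS calls.

```python
def build_lifecycle_configuration(prefix, transitions, expire_after_days):
    """Build one rule that tiers objects under `prefix`, then expires them.

    `transitions` is a list of (days, storage_class) pairs,
    for example [(30, "STANDARD_IA"), (90, "GLACIER")].
    """
    ordered = sorted(transitions)
    if ordered and expire_after_days <= ordered[-1][0]:
        raise ValueError("expiration must come after the last transition")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in ordered
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

The guard against expiring before the last transition is a local sanity check added for the example, not a complete validation of lifecycle rules.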

Skill 2.3.4: Manage S3 versioning and DynamoDB TTL

Using versioning in S3 buckets

Expiring items by using DynamoDB Time to Live (TTL)
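
DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp in seconds. A minimal sketch of computing that value follows. The attribute name `expires_at` is a hypothetical choice. Because expired items are removed in the background rather than at the exact expiry moment, the second helper shows how a reader could filter them out in the meantime.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(created_at, retention_days):
    """Return the expiry time as whole epoch seconds, the format TTL expects."""
    expires_at = created_at + timedelta(days=retention_days)
    return int(expires_at.timestamp())


def is_expired(item, now, ttl_attribute="expires_at"):
    """Treat items past their TTL as gone, even if not yet physically deleted."""
    ttl = item.get(ttl_attribute)
    return ttl is not None and ttl <= int(now.timestamp())


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = {"pk": "session#1", "expires_at": ttl_epoch_seconds(created, 30)}
```

Using timezone-aware datetimes avoids accidentally interpreting local time as UTC when computing the epoch value.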

Skill 2.3.5: Delete data to meet business and legal requirements

Deleting objects – Amazon S3

Managing your storage lifecycle – Amazon S3

Skill 2.3.6: Protect data with appropriate resiliency and availability

Reliability Pillar – AWS Well-Architected Framework

What is AWS Backup?

Disaster Recovery of Workloads on AWS – AWS Whitepaper

Task 2.4: Design data models and schema evolution

Skill 2.4.1: Design schemas for Amazon Redshift, DynamoDB, and Lake Formation

Amazon Redshift database developer guide

DynamoDB core components – Amazon DynamoDB

What is AWS Lake Formation?

Skill 2.4.2: Address changes to the characteristics of data

Schema evolution in AWS Glue

AWS Glue Schema Registry

Using Apache Iceberg with AWS Glue

Skill 2.4.3: Perform schema conversion (for example, by using the AWS Schema Conversion Tool [AWS SCT] and AWS DMS Schema Conversion)

What is the AWS Schema Conversion Tool?

Converting database schemas using DMS Schema Conversion

What is AWS Database Migration Service?

Skill 2.4.4: Establish data lineage by using AWS tools (for example, Amazon SageMaker ML Lineage Tracking and Amazon SageMaker Catalog)

Amazon SageMaker ML Lineage Tracking

Amazon SageMaker Catalog

AWS Glue Data Catalog

Skill 2.4.5: Describe best practices for indexing, partitioning strategies, compression, and other data optimization techniques

Amazon Redshift engineering’s advanced table design playbook

Best practices for designing and using partition keys in DynamoDB

Best practices design patterns: Optimizing Amazon S3 performance

Skill 2.4.6: Describe vectorization concepts (for example, Amazon Bedrock knowledge base)

Retrieve data and generate AI responses with Amazon Bedrock Knowledge Bases

Titan Embeddings G1 – Amazon Bedrock

Vector search in Amazon MemoryDB

Content Domain 3: Data Operations and Support

Task 3.1: Automate data processing by using AWS services

Skill 3.1.1: Orchestrate data pipelines (for example, Amazon Managed Workflows for Apache Airflow [Amazon MWAA], AWS Step Functions)

What Is Amazon Managed Workflows for Apache Airflow?

AWS Step Functions Developer Guide

Overview of workflows in AWS Glue

Migrating workloads from AWS Data Pipeline to Step Functions

Skill 3.1.2: Troubleshoot Amazon managed workflows

Troubleshooting Amazon MWAA

Monitoring and metrics for Amazon MWAA

Troubleshoot AWS Step Functions

Skill 3.1.3: Call SDKs to access Amazon features from code

AWS SDKs and Tools

Boto3 – AWS SDK for Python

AWS SDK for Java Developer Guide

Skill 3.1.4: Use the features of AWS services to process data (for example, Amazon EMR, Amazon Redshift, AWS Glue)

What is AWS Glue?

What is Amazon EMR?

Amazon Redshift database developer guide

Data Analytics Lens – AWS Well-Architected Framework

Skill 3.1.5: Consume and maintain data APIs

What is Amazon API Gateway?

Amazon Redshift Data API

Amazon Athena API Reference

Skill 3.1.6: Prepare data for transformation (for example, AWS Glue DataBrew and Amazon SageMaker Unified Studio)

What is AWS Glue DataBrew?

Prepare ML data with Amazon SageMaker Data Wrangler

Amazon SageMaker Unified Studio

Skill 3.1.7: Query data (for example, Amazon Athena)

What is Amazon Athena?

Running SQL queries with Amazon Athena

Using Amazon Redshift to query external data

Skill 3.1.8: Use AWS Lambda to automate data processing

What is AWS Lambda?

Using AWS Lambda with other services

Using Lambda with Amazon Kinesis – AWS Lambda

Skill 3.1.9: Manage events and schedulers (for example, Amazon EventBridge)

What is Amazon EventBridge?

Amazon EventBridge Scheduler User Guide

Amazon EventBridge rules

Task 3.2: Analyze data by using AWS services

Skill 3.2.1: Visualize data by using AWS services and tools (for example, DataBrew, Amazon QuickSight)

What is Amazon QuickSight?

What is AWS Glue DataBrew?

Visualizing data in Amazon QuickSight

Skill 3.2.2: Verify and clean data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler)

What is AWS Glue DataBrew?

Prepare ML data with Amazon SageMaker Data Wrangler

Create a notebook – Amazon Athena

Skill 3.2.3: Use SQL in Amazon Redshift and Athena to query data or to create views

Amazon Redshift SQL commands

Creating views in Amazon Redshift

Running SQL queries with Amazon Athena

Skill 3.2.4: Use Athena notebooks that use Apache Spark to explore data

Using Athena notebooks with Apache Spark

Create a notebook – Amazon Athena

Skill 3.2.5: Describe tradeoffs between provisioned services and serverless services

Amazon Redshift Serverless overview

Amazon EMR Serverless overview

What is AWS Glue?

Skill 3.2.6: Define data aggregation, rolling average, grouping, and pivoting

Aggregate functions in Amazon Redshift

Window functions in Amazon Redshift

SQL reference for Athena
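
The four terms in this skill are easier to remember with a tiny worked example. The sketch below uses plain Python on made-up sales rows. In Redshift or Athena you would express the same ideas with `GROUP BY`, aggregate functions, and window functions.

```python
from collections import defaultdict, deque

# Illustrative data only.
sales = [
    {"day": 1, "region": "east", "amount": 10},
    {"day": 1, "region": "west", "amount": 20},
    {"day": 2, "region": "east", "amount": 30},
    {"day": 2, "region": "west", "amount": 40},
    {"day": 3, "region": "east", "amount": 50},
]


def group_sum(rows, key, value):
    """Grouping + aggregation: total `value` per distinct `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def rolling_average(values, window):
    """Rolling average over the current value and up to window-1 previous ones."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


def pivot(rows, index, columns, value):
    """Pivot: turn distinct `columns` values into fields of each `index` row."""
    table = defaultdict(dict)
    for row in rows:
        table[row[index]][row[columns]] = row[value]
    return dict(table)


daily_totals = group_sum(sales, "day", "amount")  # {1: 30, 2: 70, 3: 50}
```

Feeding the daily totals into `rolling_average` with a window of 2 smooths the series to 30.0, 50.0, 60.0.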

Task 3.3: Maintain and monitor data pipelines

Skill 3.3.1: Extract logs for audits

AWS CloudTrail User Guide

Database audit logging – Amazon Redshift

Monitoring AWS Glue jobs

Skill 3.3.2: Deploy logging and monitoring solutions to facilitate auditing and traceability

What is Amazon CloudWatch?

Log Amazon EMR API calls with AWS CloudTrail

Monitor AWS Glue using Amazon CloudWatch

Skill 3.3.3: Use notifications during monitoring to send alerts

What is Amazon Simple Notification Service?

Set up Amazon SNS notifications – Amazon CloudWatch

Using Amazon CloudWatch alarms

Skill 3.3.4: Troubleshoot performance issues

Troubleshoot AWS Glue performance

Amazon Redshift engineering’s advanced table design playbook

Diagnose and troubleshoot Amazon EMR

Skill 3.3.5: Use AWS CloudTrail to track API calls

What is AWS CloudTrail?

Log Amazon Redshift API calls with CloudTrail

Log AWS Glue API calls with AWS CloudTrail

Skill 3.3.6: Troubleshoot and maintain pipelines (for example, AWS Glue, Amazon EMR)

Troubleshoot AWS Glue

Troubleshoot Amazon EMR

Monitoring Amazon MWAA

Skill 3.3.7: Use Amazon CloudWatch Logs to log application data (with a focus on configuration and automation)

What is Amazon CloudWatch Logs?

Working with log groups and log streams – Amazon CloudWatch Logs

Enabling continuous logging for AWS Glue

Skill 3.3.8: Analyze logs with AWS services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights, big data application logs)

Analyzing log data with CloudWatch Logs Insights

What is Amazon OpenSearch Service?

Querying Amazon S3 data with Amazon Athena using the Glue Data Catalog

Task 3.4: Ensure data quality

Skill 3.4.1: Run data quality checks while processing the data (for example, checking for empty fields)

AWS Glue Data Quality

Overview of data quality in AWS Glue DataBrew
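
As a sketch of what an "empty field" check does under the hood, the snippet below counts missing, null, or blank values per required field and applies a threshold. The function names and the threshold parameter are assumptions for illustration. Managed services such as AWS Glue Data Quality and DataBrew express the same idea as declarative rules.

```python
def completeness_report(rows, required_fields):
    """Count rows where each required field is missing, None, or blank."""
    empty_counts = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                empty_counts[field] += 1
    return empty_counts


def passes_completeness(rows, required_fields, max_empty_ratio=0.0):
    """Pass only if every field's share of empty values is within the limit."""
    report = completeness_report(rows, required_fields)
    total = len(rows) or 1  # Avoid dividing by zero on an empty batch.
    return all(count / total <= max_empty_ratio for count in report.values())
```

Running the check while the batch is in flight lets a pipeline quarantine bad records before they reach downstream tables.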

Skill 3.4.2: Define data quality rules (for example, DataBrew)

Creating and managing rules in AWS Glue DataBrew

AWS Glue Data Quality

Skill 3.4.3: Investigate data consistency (for example, DataBrew)

What is AWS Glue DataBrew?

Profiling data with AWS Glue DataBrew

Skill 3.4.4: Describe data sampling techniques

Profiling data with AWS Glue DataBrew

Sample types in AWS Glue DataBrew
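
Two sampling techniques worth being able to describe are reservoir sampling (a uniform random sample from a stream of unknown length) and stratified sampling (a fixed number of rows per group). The sketch below implements both with the standard library. The names and the seeded generator are illustrative choices, not how any AWS service implements sampling.

```python
import random
from collections import defaultdict


def reservoir_sample(stream, k, rng):
    """Algorithm R: keep a uniform random sample of k items in one pass."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Replace a kept item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


def stratified_sample(rows, key, per_group, rng):
    """Draw up to `per_group` rows from each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

Stratifying protects rare groups that a plain random sample might miss entirely, while reservoir sampling avoids loading the full dataset into memory.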

Skill 3.4.5: Implement data skew mechanisms

Optimizing Amazon Redshift performance – data distribution styles

Handling data skew in Apache Spark on Amazon EMR

AWS Glue performance tuning – common pitfalls
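
One common skew mitigation in Spark-style processing is key salting: appending a small suffix to a hot key so its rows spread across several partitions. The sketch below simulates hash partitioning with `zlib.crc32` to show the effect. The partition count, salt count, and function names are all illustrative assumptions.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Deterministic hash partitioner (stand-in for an engine's shuffle hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key, sequence, num_salts):
    """Append a rotating salt so one hot key becomes several distinct keys."""
    return f"{key}#{sequence % num_salts}"


def partition_counts(keys, num_partitions, num_salts=1):
    """Count rows per partition, optionally salting keys first."""
    counts = Counter()
    for i, key in enumerate(keys):
        routed = salted_key(key, i, num_salts) if num_salts > 1 else key
        counts[partition_for(routed, num_partitions)] += 1
    return counts


hot_rows = ["hot"] * 800
unsalted = partition_counts(hot_rows, num_partitions=8)
salted = partition_counts(hot_rows, num_partitions=8, num_salts=8)
```

Without salting, all 800 rows land in one partition. With salting they are split across several. The cost is that any aggregation on salted keys needs a second step to combine the partial results per original key.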

Content Domain 4: Data Security and Governance

Task 4.1: Apply authentication mechanisms

Skill 4.1.1: Update VPC security groups

Security groups for your VPC – Amazon VPC

VPC security groups – Amazon Redshift

Skill 4.1.2: Create and update AWS Identity and Access Management (IAM) groups, roles, endpoints, and services

IAM identities (users, user groups, and roles)

IAM roles – AWS Identity and Access Management

VPC endpoints – Amazon VPC

Skill 4.1.3: Create and rotate credentials for password management (for example, AWS Secrets Manager)

What is AWS Secrets Manager?

Rotate AWS Secrets Manager secrets

Using an AWS Secrets Manager VPC endpoint

Skill 4.1.4: Set up IAM roles for access (for example, AWS Lambda, Amazon API Gateway, AWS CLI, AWS CloudFormation)

IAM roles for Lambda – AWS Lambda

Control access to a REST API with IAM permissions – Amazon API Gateway

IAM roles for AWS CloudFormation

Skill 4.1.5: Apply IAM policies to roles, endpoints, and services (for example, S3 Access Points, AWS PrivateLink)

Managing data access with Amazon S3 Access Points

What is AWS PrivateLink?

Policies and permissions in AWS Identity and Access Management

Skill 4.1.6: Describe the differences between managed services and unmanaged services

Choosing an AWS analytics service

Data Analytics Lens – AWS Well-Architected Framework

Skill 4.1.7: Use domain, domain units, and projects for SageMaker Unified Studio

Amazon SageMaker Unified Studio

Amazon SageMaker Catalog

Task 4.2: Apply authorization mechanisms

Skill 4.2.1: Create custom IAM policies when a managed policy does not meet the needs

Creating IAM policies – AWS IAM

IAM policy examples – AWS IAM

Skill 4.2.2: Store application and database credentials (for example, Secrets Manager, AWS Systems Manager Parameter Store)

What is AWS Secrets Manager?

AWS Systems Manager Parameter Store

Choose between Secrets Manager and Parameter Store – AWS Decision Guides

Skill 4.2.3: Provide database users, groups, and roles access and authority in a database (for example, for Amazon Redshift)

Security in Amazon Redshift

Users, groups, and permissions in Amazon Redshift

Managing access to Amazon Redshift with IAM

Skill 4.2.4: Manage permissions through AWS Lake Formation (for Amazon Redshift, Amazon EMR, Amazon Athena, and Amazon S3)

What is AWS Lake Formation?

Working with other AWS services – AWS Lake Formation

Lake Formation permissions reference

Skill 4.2.5: Apply authorization methods that address business needs (role-based, tag-based, and attribute-based)

Attribute-based access control (ABAC) with IAM

Tagging IAM resources

Lake Formation tag-based access control

Skill 4.2.6: Construct custom policies that meet the principle of least privilege

Security best practices in IAM

Creating IAM policies – AWS IAM

IAM Access Analyzer policy generation
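
A least-privilege policy names specific actions and specific resources instead of wildcards. As a sketch, the function below builds a read-only policy scoped to one S3 prefix as a plain dictionary. The bucket name, prefix, and `Sid` values in the example are hypothetical, and the wildcard check is a simple illustration rather than a substitute for IAM Access Analyzer.

```python
def read_only_prefix_policy(bucket, prefix):
    """Allow listing and reading only under one prefix of one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def has_broad_grants(policy):
    """Flag statements that use '*' or 'service:*' actions, or a '*' resource."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any(a == "*" or a.endswith(":*") for a in actions):
            return True
        if statement["Resource"] == "*":
            return True
    return False


policy = read_only_prefix_policy("example-data-lake", "curated/")
```

Serializing the dictionary with `json.dumps(policy)` produces the policy document you would attach to a role.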

Task 4.3: Ensure data encryption and masking

Skill 4.3.1: Apply data masking and anonymization according to compliance laws or company policies

Remove PII from conversations by using sensitive information filters – Amazon Bedrock

Amazon Macie User Guide

Dynamic data masking in Amazon Redshift
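
Masking and pseudonymization are related but different: masking hides part of a value for display, while pseudonymization replaces it with a stable token so records can still be joined. The sketch below shows one possible approach to each using the standard library. The function names are made up, and in a real pipeline the key would come from a secret store rather than code.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a value with a keyed SHA-256 token (same input, same token)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email):
    """Keep the first character and the domain; star out the rest."""
    local, separator, domain = email.partition("@")
    if not separator or not local:
        # Not a recognizable address: mask everything.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain
```

A keyed hash is used instead of a bare hash so that someone without the key cannot rebuild the mapping by hashing guessed values.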

Skill 4.3.2: Use encryption keys to encrypt or decrypt data (for example, AWS Key Management Service [AWS KMS])

AWS Key Management Service Developer Guide

AWS KMS concepts

Securing, protecting, and managing data – Storage Best Practices for Data and Analytics Applications

Skill 4.3.3: Configure encryption across AWS account boundaries

Allow key usage across AWS accounts – AWS KMS

Using cross-account access with Amazon S3

Skill 4.3.4: Enable encryption in transit or before transit for data

Encryption in transit – AWS

AWS Certificate Manager overview

Encryption of data in transit – Amazon Redshift

Task 4.4: Prepare logs for audit

Skill 4.4.1: Use AWS CloudTrail to track API calls

What is AWS CloudTrail?

Creating a trail for an organization – AWS CloudTrail

Log Amazon S3 data events using CloudTrail

Skill 4.4.2: Use Amazon CloudWatch Logs to store application logs

What is Amazon CloudWatch Logs?

Working with log groups and log streams – Amazon CloudWatch Logs

Skill 4.4.3: Use AWS CloudTrail Lake for centralized logging queries

What is AWS CloudTrail Lake?

Query your AWS CloudTrail Lake event data

Skill 4.4.4: Analyze logs by using AWS services (for example, Athena, CloudWatch Logs Insights, Amazon OpenSearch Service)

Analyzing log data with CloudWatch Logs Insights

Querying Amazon S3 data with Amazon Athena using the Glue Data Catalog

What is Amazon OpenSearch Service?

Skill 4.4.5: Integrate various AWS services to perform logging (for example, Amazon EMR in cases of large volumes of log data)

Log Amazon EMR API calls with AWS CloudTrail

Configure Amazon EMR to send log files to Amazon S3

Publishing flow logs to CloudWatch Logs – Amazon VPC

Task 4.5: Understand data privacy and governance

Skill 4.5.1: Grant permissions for data sharing (for example, data sharing for Amazon Redshift)

Amazon Redshift data sharing overview

Managing data sharing across accounts – Amazon Redshift

Skill 4.5.2: Implement PII identification (for example, Amazon Macie with Lake Formation)

What is Amazon Macie?

Using managed data identifiers in Amazon Macie

What is AWS Lake Formation?

Skill 4.5.3: Implement data privacy strategies to prevent backups or replications of data to disallowed AWS Regions

Service control policies (SCPs) – AWS Organizations

S3 Replication overview – Amazon S3

AWS Config rules for compliance

Skill 4.5.4: View configuration changes that have occurred in an account (for example, AWS Config)

What is AWS Config?

AWS Config rules for compliance

Viewing the resource timeline in the AWS Config console

Skill 4.5.5: Maintain data sovereignty

Data residency – AWS Whitepaper

Service control policies (SCPs) – AWS Organizations

Geographic cross-Region inference – Amazon Bedrock

Skill 4.5.6: Manage data access through Amazon SageMaker Catalog projects

Amazon SageMaker Catalog

Amazon SageMaker Unified Studio

Skill 4.5.7: Describe governance data framework and data sharing patterns

Data Analytics Lens – AWS Well-Architected Framework

What is AWS Lake Formation?

AWS Cloud Adoption Framework for AI, ML, and Generative AI – Security Perspective

This brings us to the end of the DEA-C01 AWS Certified Data Engineer Associate exam study guide.

What do you think? Let me know in the comments section if I have missed anything. I would also love to hear how your preparation is going!

In case you are preparing for other AWS certification exams, check out the AWS study guides for those exams.
