Amazon AWS Certified Data Engineer - Associate Exam Practice Questions

Questions for the Amazon DEA-C01 exam were updated on: Dec 01, 2025

Page 1 of 13. Viewing questions 1-15 of 190.

Question 1

A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The
job contains a sort-merge join of two large and equally sized DataFrames.
The job is failing with the following error: "No space left on device."
Which solution will resolve the error?

  • A. Use the AWS Glue Spark shuffle manager.
  • B. Deploy an Amazon Elastic Block Store (Amazon EBS) volume for the job to use.
  • C. Convert the sort-merge join in the job to be a broadcast join.
  • D. Convert the DataFrames to DynamicFrames, and perform a DynamicFrame join in the job.
Answer: D

Question 2

A data engineer is using the Apache Iceberg framework to build a data lake that contains 100 TB of
data. The data engineer wants to run AWS Glue Apache Spark jobs that use the Iceberg framework.
Which combination of steps will meet these requirements? (Select TWO.)

  • A. Create a key named --conf for an AWS Glue job. Set Iceberg as a value for the --datalake-formats job parameter.
  • B. Specify the path to a specific version of Iceberg by using the --extra-jars job parameter. Set Iceberg as a value for the --datalake-formats job parameter.
  • C. Set Iceberg as a value for the --datalake-formats job parameter.
  • D. Set the --enable-auto-scaling parameter to true.
  • E. Add the --job-bookmark-option job-bookmark-enable parameter to an AWS Glue job.
Answer: A, E
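For context, the --datalake-formats and --extra-jars settings referenced in the options are supplied as job parameters. A minimal sketch with boto3, assuming hypothetical job, role, script, and JAR locations:

```python
import boto3  # AWS SDK for Python

glue = boto3.client("glue")

# The job name, role ARN, and S3 paths below are hypothetical placeholders.
glue.create_job(
    Name="iceberg-etl-job",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    GlueVersion="4.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/iceberg_etl.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Enables the Iceberg data lake framework for the job.
        "--datalake-formats": "iceberg",
        # Optionally pins a specific Iceberg version by supplying its JARs.
        "--extra-jars": "s3://my-bucket/jars/iceberg-spark-runtime.jar",
    },
)
```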

Question 3

A company uses an Amazon Redshift cluster as a data warehouse that is shared across two
departments. To comply with a security policy, each department must have unique access
permissions.
Which solution will meet this requirement?

  • A. Group tables and views for each department into dedicated schemas. Manage permissions at the schema level.
  • B. Group tables and views for each department into dedicated databases. Manage permissions at the database level.
  • C. Update the names of the tables and views to follow a naming convention that contains the department names. Manage permissions based on the new naming convention.
  • D. Create an IAM user group for each department. Use identity-based IAM policies to grant table and view permissions based on the IAM user group.
Answer: A
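A minimal sketch of the schema-level approach in option A, issued through the Redshift Data API; the cluster, database, schema, and user group names are hypothetical:

```python
import boto3  # AWS SDK for Python

redshift_data = boto3.client("redshift-data")

# Hypothetical identifiers used for illustration only.
cluster_id = "shared-warehouse"
database = "dev"

statements = [
    # One schema per department; permissions are managed at the schema level.
    "CREATE SCHEMA IF NOT EXISTS finance;",
    "CREATE SCHEMA IF NOT EXISTS marketing;",
    "GRANT USAGE ON SCHEMA finance TO GROUP finance_users;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA finance TO GROUP finance_users;",
    "GRANT USAGE ON SCHEMA marketing TO GROUP marketing_users;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA marketing TO GROUP marketing_users;",
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser="admin",  # hypothetical database user
        Sql=sql,
    )
```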

Question 4

A data engineer notices slow query performance on a highly partitioned table that is in Amazon
Athena. The table contains daily data for the previous 5 years, partitioned by date. The data engineer wants
to improve query performance and to automate partition management. Which solution will meet
these requirements?

  • A. Use an AWS Lambda function that runs daily. Configure the function to manually create new partitions in AWS Glue for each day's data.
  • B. Use partition projection in Athena. Configure the table properties by using a date range from 5 years ago to the present.
  • C. Reduce the number of partitions by changing the partitioning schema from daily to monthly granularity.
  • D. Increase the processing capacity of Athena queries by allocating more compute resources.
Answer: B
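A minimal sketch of option B: partition projection is enabled through table properties, here applied with the Athena API. The table name, partition column (dt), and S3 locations are hypothetical:

```python
import boto3  # AWS SDK for Python

athena = boto3.client("athena")

# Hypothetical table, partition column, and S3 locations for illustration only.
ddl = """
ALTER TABLE sales_daily SET TBLPROPERTIES (
  'projection.enabled'        = 'true',
  'projection.dt.type'        = 'date',
  'projection.dt.format'      = 'yyyy-MM-dd',
  'projection.dt.range'       = '2020-12-01,NOW',
  'storage.location.template' = 's3://my-data-bucket/sales/dt=${dt}/'
)
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```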

Question 5

A company uses AWS Glue Apache Spark jobs to handle extract, transform, and load (ETL) workloads.
The company has enabled logging and monitoring for all AWS Glue jobs. One of the AWS Glue jobs
begins to fail. A data engineer investigates the error and wants to examine metrics for all individual
stages within the job. How can the data engineer access the stage metrics?

  • A. Examine the AWS Glue job and stage details in the Spark UI.
  • B. Examine the AWS Glue job and stage metrics in Amazon CloudWatch.
  • C. Examine the AWS Glue job and stage logs in AWS CloudTrail logs.
  • D. Examine the AWS Glue job and stage details by using the run insights feature on the job.
Answer: A
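For the Spark UI to show stage details, Spark event logging must be enabled on the job run. A minimal sketch, assuming a hypothetical job name and log bucket:

```python
import boto3  # AWS SDK for Python

glue = boto3.client("glue")

# Writes Spark event logs to S3 so the job's stages can be examined in the Spark UI.
# The job name and bucket path are hypothetical.
glue.start_job_run(
    JobName="nightly-etl",
    Arguments={
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": "s3://my-bucket/spark-event-logs/",
    },
)
```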

Question 6

A company wants to combine data from multiple software as a service (SaaS) applications for
analysis.
A data engineering team needs to use Amazon QuickSight to perform the analysis and build
dashboards. A data engineer needs to extract the data from the SaaS applications and make the data
available for QuickSight queries.
Which solution will meet these requirements in the MOST operationally efficient way?

  • A. Create AWS Lambda functions that call the required APIs to extract the data from the applications. Store the data in an Amazon S3 bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight
  • B. Use AWS Lambda functions as Amazon Athena data source connectors to run federated queries against the SaaS applications. Create an Athena data source and a dataset in QuickSight.
  • C. Use Amazon AppFlow to create a flow for each SaaS application. Set an Amazon S3 bucket as the destination. Schedule the flows to extract the data to the bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
  • D. Export the data from the SaaS applications as Microsoft Excel files. Create a data source and a dataset in QuickSight by uploading the Excel files.
Answer: C

Question 7

A company wants to migrate a data warehouse from Teradata to Amazon Redshift. Which solution
will meet this requirement with the LEAST operational effort?

  • A. Use AWS Database Migration Service (AWS DMS) Schema Conversion to migrate the schema. Use AWS DMS to migrate the data.
  • B. Use the AWS Schema Conversion Tool (AWS SCT) to migrate the schema. Use AWS Database Migration Service (AWS DMS) to migrate the data.
  • C. Use AWS Database Migration Service (AWS DMS) to migrate the data. Use automatic schema conversion.
  • D. Manually export the schema definition from Teradata. Apply the schema to the Amazon Redshift database. Use AWS Database Migration Service (AWS DMS) to migrate the data.
Answer: B

Question 8

A company has a data pipeline that uses an Amazon RDS instance, AWS Glue jobs, and an Amazon S3
bucket. The RDS instance and AWS Glue jobs run in a private subnet of a VPC and in the same
security group.
A user made a change to the security group that prevents the AWS Glue jobs from connecting to the
RDS instance. After the change, the security group contains a single rule that allows inbound SSH
traffic from a specific IP address.
The company must resolve the connectivity issue.
Which solution will meet this requirement?

  • A. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the security group as the source.
  • B. Add an inbound rule that allows all TCP traffic on all UDP ports. Set the private IP address of the RDS instance as the source.
  • C. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the DNS name of the RDS instance as the source.
  • D. Replace the source of the existing SSH rule with the private IP address of the RDS instance. Create an outbound rule with the same source, destination, and protocol as the inbound SSH rule.
Answer: A
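A minimal sketch of option A with boto3; the security group ID is hypothetical, and the rule references the group itself as the source so resources in the group (the AWS Glue connection and the RDS instance) can reach each other:

```python
import boto3  # AWS SDK for Python

ec2 = boto3.client("ec2")

# Hypothetical security group shared by the RDS instance and the AWS Glue jobs.
sg_id = "sg-0123456789abcdef0"

# Self-referencing inbound rule: allow all TCP ports from members of the same group.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": sg_id}],
        }
    ],
)
```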

Question 9

A company receives marketing campaign data from a vendor. The company ingests the data into an
Amazon S3 bucket every 40 to 60 minutes. The data is in CSV format. File sizes are between 100 KB
and 300 KB.
A data engineer needs to set up an extract, transform, and load (ETL) pipeline to upload the content
of each file to Amazon Redshift.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Create an AWS Lambda function that connects to Amazon Redshift and runs a COPY command. Use Amazon EventBridge to invoke the Lambda function based on an Amazon S3 upload trigger.
  • B. Create an Amazon Data Firehose stream. Configure the stream to use an AWS Lambda function as a source to pull data from the S3 bucket. Set Amazon Redshift as the destination.
  • C. Use Amazon Redshift Spectrum to query the S3 bucket. Configure an AWS Glue Crawler for the S3 bucket to update metadata in an AWS Glue Data Catalog.
  • D. Create an AWS Database Migration Service (AWS DMS) task. Specify an appropriate data schema to migrate. Specify the appropriate type of migration to use.
Answer: D

Question 10

A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a
physical data model.
The data engineer encounters a de-normalized table that is growing in size. The table does not have a
suitable column to use as the distribution key.
Which distribution style should the data engineer use to meet these requirements with the LEAST
maintenance overhead?

  • A. ALL distribution
  • B. EVEN distribution
  • C. AUTO distribution
  • D. KEY distribution
Answer: B
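For reference, the distribution style is declared in the table DDL. A minimal sketch through the Redshift Data API using the EVEN style that the answer key selects; the cluster, table, and column names are hypothetical:

```python
import boto3  # AWS SDK for Python

redshift_data = boto3.client("redshift-data")

# Hypothetical denormalized table; DISTSTYLE can be AUTO, EVEN, ALL, or KEY.
ddl = """
CREATE TABLE sales_denormalized (
    order_id      BIGINT,
    order_ts      TIMESTAMP,
    customer_name VARCHAR(256),
    product_name  VARCHAR(256),
    amount        DECIMAL(12, 2)
)
DISTSTYLE EVEN
SORTKEY (order_ts);
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="admin",  # hypothetical database user
    Sql=ddl,
)
```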

Question 11

A data engineer is troubleshooting an AWS Glue workflow that occasionally fails. The engineer
determines that the failures are a result of data quality issues. A business reporting team needs to
receive an email notification any time the workflow fails in the future.
Which solution will meet this requirement?

  • A. Create an Amazon Simple Notification Service (Amazon SNS) FIFO topic. Subscribe the team's email account to the SNS topic. Create an AWS Lambda function that initiates when the AWS Glue job state changes to FAILED. Set the SNS topic as the target.
  • B. Create an Amazon Simple Notification Service (Amazon SNS) standard topic. Subscribe the team's email account to the SNS topic. Create an Amazon EventBridge rule that triggers when the AWS Glue Job state changes to FAILED. Set the SNS topic as the target.
  • C. Create an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Subscribe the team's email account to the SQS queue. Create an AWS Config rule that triggers when the AWS Glue job state changes to FAILED. Set the SQS queue as the target.
  • D. Create an Amazon Simple Queue Service (Amazon SQS) standard queue. Subscribe the team's email account to the SQS queue. Create an Amazon EventBridge rule that triggers when the AWS Glue job state changes to FAILED. Set the SQS queue as the target.
Answer: B
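A minimal sketch of option B with boto3; the topic name, rule name, and email address are hypothetical, and the event pattern matches the Glue Job State Change event that is emitted when a job run fails:

```python
import json

import boto3  # AWS SDK for Python

events = boto3.client("events")
sns = boto3.client("sns")

# Hypothetical topic and subscription; the topic's access policy must also
# allow events.amazonaws.com to publish to it.
topic_arn = sns.create_topic(Name="glue-workflow-failures")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="reporting-team@example.com")

# Match AWS Glue job runs that transition to the FAILED state.
events.put_rule(
    Name="glue-job-failed",
    EventPattern=json.dumps(
        {
            "source": ["aws.glue"],
            "detail-type": ["Glue Job State Change"],
            "detail": {"state": ["FAILED"]},
        }
    ),
)
events.put_targets(
    Rule="glue-job-failed",
    Targets=[{"Id": "notify-team", "Arn": topic_arn}],
)
```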

Question 12

A data engineer needs to run a data transformation job whenever a user adds a file to an Amazon S3
bucket. The job will run for less than 1 minute. The job must send the output through an email
message to the data engineer. The data engineer expects users to add one file every hour of the day.
Which solution will meet these requirements in the MOST operationally efficient way?

  • A. Create a small Amazon EC2 instance that polls the S3 bucket for new files. Run transformation code on a schedule to generate the output. Use operating system commands to send email messages.
  • B. Run an Amazon Elastic Container Service (Amazon ECS) task to poll the S3 bucket for new files. Run transformation code on a schedule to generate the output. Use operating system commands to send email messages.
  • C. Create an AWS Lambda function to transform the data. Use Amazon S3 Event Notifications to invoke the Lambda function when a new object is created. Publish the output to an Amazon Simple Notification Service (Amazon SNS) topic. Subscribe the data engineer's email account to the topic.
  • D. Deploy an Amazon EMR cluster. Use EMR File System (EMRFS) to access the files in the S3 bucket. Run transformation code on a schedule to generate the output to a second S3 bucket. Create an Amazon Simple Notification Service (Amazon SNS) topic. Configure Amazon S3 Event Notifications to notify the topic when a new object is created.
Answer: C
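A minimal sketch of the Lambda handler in option C; the environment variable name and the transformation itself are hypothetical placeholders. S3 Event Notifications invoke the function with the new object's bucket and key, and the output is published to the SNS topic:

```python
import os
from urllib.parse import unquote_plus

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")
sns = boto3.client("sns")

# Hypothetical environment variable holding the SNS topic ARN.
TOPIC_ARN = os.environ["OUTPUT_TOPIC_ARN"]


def handler(event, context):
    """Invoked by an S3 Event Notification when a user adds a file to the bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Placeholder transformation; the real job's logic would go here.
        transformed = body.upper()

        # Email the output to the subscribed data engineer.
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject=f"Transformation output for {key}",
            Message=transformed,
        )
```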

Question 13

A company manages an Amazon Redshift data warehouse. The data warehouse is in a public subnet
inside a custom VPC. A security group allows only traffic from within itself. A network ACL is open to
all traffic.
The company wants to generate several visualizations in Amazon QuickSight for an upcoming sales
event. The company will run QuickSight Enterprise edition in a second AWS account inside a public
subnet within a second custom VPC. The new public subnet has a security group that allows
outbound traffic to the existing Redshift cluster.
A data engineer needs to establish connections between Amazon Redshift and QuickSight.
QuickSight must refresh dashboards by querying the Redshift cluster.
Which solution will meet these requirements?

  • A. Configure the Redshift security group to allow inbound traffic on the Redshift port from the QuickSight security group.
  • B. Assign Elastic IP addresses to the QuickSight visualizations. Configure the QuickSight security group to allow inbound traffic on the Redshift port from the Elastic IP addresses.
  • C. Confirm that the CIDR ranges of the Redshift VPC and the QuickSight VPC are the same. If CIDR ranges are different, reconfigure one CIDR range to match the other. Establish network peering between the VPCs.
  • D. Create a QuickSight gateway endpoint in the Redshift VPC. Attach an endpoint policy to the gateway endpoint to ensure only specific QuickSight accounts can use the endpoint.
Answer: A

Question 14

A data engineer is optimizing query performance in Amazon Athena notebooks that use Apache
Spark to analyze large datasets that are stored in Amazon S3. The data is partitioned. An AWS Glue
crawler updates the partitions.
The data engineer wants to minimize the amount of data that is scanned to improve efficiency of
Athena queries.
Which solution will meet these requirements?

  • A. Apply partition filters in the queries.
  • B. Increase the frequency of AWS Glue crawler invocations to update the data catalog more often.
  • C. Organize the data that is in Amazon S3 by using a nested directory structure.
  • D. Configure Spark to use in-memory caching for frequently accessed data.
Answer: A
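A minimal sketch of option A in a PySpark notebook cell; the database, table, and partition column names are hypothetical. Filtering on the partition column lets Spark prune partitions so only the matching S3 prefixes are read:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical Glue Data Catalog table partitioned by a 'dt' column.
events = spark.table("analytics.web_events")

# The filter on the partition column is pushed down, so Spark reads only the
# S3 prefixes for the requested dates instead of scanning the whole table.
recent = events.filter(
    (F.col("dt") >= "2024-01-01") & (F.col("dt") < "2024-02-01")
)

recent.groupBy("dt").count().show()
```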

Question 15

A retail company stores order information in an Amazon Aurora table named Orders. The company
needs to create operational reports from the Orders table with minimal latency. The Orders table
contains billions of rows, and over 100,000 transactions can occur each second.
A marketing team needs to join the Orders data with an Amazon Redshift table named Campaigns in
the marketing team's data warehouse. The operational Aurora database must not be affected.
Which solution will meet these requirements with the LEAST operational effort?

  • A. Use AWS Database Migration Service (AWS DMS) Serverless to replicate the Orders table to Amazon Redshift. Create a materialized view in Amazon Redshift to join with the Campaigns table.
  • B. Use the Aurora zero-ETL integration with Amazon Redshift to replicate the Orders table. Create a materialized view in Amazon Redshift to join with the Campaigns table.
  • C. Use AWS Glue to replicate the Orders table to Amazon Redshift. Create a materialized view in Amazon Redshift to join with the Campaigns table.
  • D. Use federated queries to query the Orders table directly from Aurora. Create a materialized view in Amazon Redshift to join with the Campaigns table.
Answer: C
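Whichever replication path is chosen, the join itself is a Redshift materialized view. A minimal sketch through the Redshift Data API; the identifiers and column names are hypothetical, and the Orders table is assumed to already be replicated into Redshift:

```python
import boto3  # AWS SDK for Python

redshift_data = boto3.client("redshift-data")

# Hypothetical columns; refreshing the view keeps the report current without
# touching the operational Aurora database.
sql = """
CREATE MATERIALIZED VIEW campaign_order_stats AS
SELECT c.campaign_id,
       c.campaign_name,
       COUNT(o.order_id)  AS order_count,
       SUM(o.order_total) AS revenue
FROM orders o
JOIN campaigns c ON o.campaign_id = c.campaign_id
GROUP BY c.campaign_id, c.campaign_name;
"""

redshift_data.execute_statement(
    ClusterIdentifier="marketing-dw",  # hypothetical cluster
    Database="marketing",
    DbUser="admin",  # hypothetical database user
    Sql=sql,
)
```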
