AWS Certified Data Analytics - Specialty exam practice questions

Questions for the AWS Certified Data Analytics - Specialty (DAS-C01) exam were updated on Nov 30, 2024

Page 1 out of 11. Viewing questions 1-15 out of 164

Question 1

A manufacturing company wants to create an operational analytics dashboard to visualize metrics from equipment in near-
real time. The company uses Amazon Kinesis Data Streams to stream the data to other applications. The dashboard must
automatically refresh every 5 seconds. A data analytics specialist must design a solution that requires the least possible
implementation effort.
Which solution meets these requirements?

  • A. Use Amazon Kinesis Data Firehose to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
  • B. Use Apache Spark Streaming on Amazon EMR to read the data in near-real time. Develop a custom application for the dashboard by using D3.js.
  • C. Use Amazon Kinesis Data Firehose to push the data into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Visualize the data by using OpenSearch Dashboards (Kibana).
  • D. Use AWS Glue streaming ETL to store the data in Amazon S3. Use Amazon QuickSight to build the dashboard.
Answer: C

Explanation:
Kinesis Data Firehose can read from the existing Kinesis data stream and deliver the records directly into Amazon OpenSearch Service, and OpenSearch Dashboards (Kibana) supports an automatic refresh interval as short as a few seconds, so option C meets the 5-second refresh requirement with the least implementation effort; QuickSight dashboards do not offer a 5-second auto-refresh.
Reference: https://aws.amazon.com/blogs/big-data/analyze-a-time-series-in-real-time-with-aws-lambda-amazon-kinesis-and-amazon-dynamodb-streams/
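
A minimal boto3 sketch of feeding one equipment metric into the Firehose delivery stream described in option C (the stream name and payload fields are placeholders; in practice the delivery stream can also use the company's existing Kinesis data stream as its source):

import json
import time
import boto3

firehose = boto3.client("firehose")

# Placeholder metric payload for one piece of equipment.
metric = {
    "equipment_id": "press-07",
    "temperature_c": 81.4,
    "vibration_mm_s": 2.3,
    "timestamp": int(time.time()),
}

# The delivery stream is assumed to be configured with an Amazon OpenSearch
# Service destination, where OpenSearch Dashboards visualizes the index.
firehose.put_record(
    DeliveryStreamName="equipment-metrics-to-opensearch",
    Record={"Data": (json.dumps(metric) + "\n").encode("utf-8")},
)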

Question 2

An IoT company wants to release a new device that will collect data to track sleep overnight on an intelligent mattress.
Sensors will send data that will be uploaded to an Amazon S3 bucket. About 2 MB of data is generated each night for each
bed. Data must be processed and summarized for each user, and the results need to be available as soon as possible. Part
of the process consists of time windowing and other functions. Based on tests with a Python script, every run will require
about 1 GB of memory and will complete within a couple of minutes.
Which solution will run the script in the MOST cost-effective way?

  • A. AWS Lambda with a Python script
  • B. AWS Glue with a Scala job
  • C. Amazon EMR with an Apache Spark script
  • D. AWS Glue with a PySpark job
Answer: A
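
Explanation:
A Python job that needs about 1 GB of memory and finishes within a couple of minutes per 2 MB nightly file fits well within AWS Lambda limits, and Lambda bills only for the duration used, so option A is the most cost-effective. A minimal sketch of the handler, assuming the function is triggered by the S3 upload event and pandas is available through a Lambda layer (column names and paths are placeholders):

import io
import boto3
import pandas as pd  # assumed to be packaged as a Lambda layer

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # One night's sensor file for one bed, delivered by the S3 event notification.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_csv(io.BytesIO(body), parse_dates=["timestamp"])

    # Example time windowing: summarize readings into 5-minute windows per user.
    summary = (
        df.set_index("timestamp")
          .resample("5min")
          .agg({"heart_rate": "mean", "movement": "sum"})
    )

    s3.put_object(
        Bucket=bucket,
        Key="summaries/" + key.rsplit("/", 1)[-1],
        Body=summary.to_csv().encode("utf-8"),
    )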

Question 3

A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests a visualization in
Amazon QuickSight that provides the ability to easily identify revenue trends across cities and stores. The visualization also
helps identify outliers that need to be examined with further analysis.
Which visual type in QuickSight meets the sales team's requirements?

  • A. Geospatial chart
  • B. Line chart
  • C. Heat map
  • D. Tree map
Answer: A

Explanation:
Reference: https://docs.aws.amazon.com/quicksight/latest/user/geospatial-charts.html

Question 4

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data
Streams and is writing the data to Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the
S3 buffer size and is using 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and
transform the data to a different format before writing the data back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs are frequently
showing an OutOfMemoryError error.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)

  • A. Place the small files into one S3 folder. Define one single table for the small S3 files in AWS Glue Data Catalog. Rerun the AWS Glue ETL jobs against this AWS Glue table.
  • B. Create an AWS Lambda function to merge small S3 files and invoke them periodically. Run the AWS Glue ETL jobs after successful completion of the Lambda function.
  • C. Run the S3DistCp utility in Amazon EMR to merge a large number of small S3 files before running the AWS Glue ETL jobs.
  • D. Use the groupFiles setting in the AWS Glue ETL job to merge small S3 files and rerun AWS Glue ETL jobs.
  • E. Update the Kinesis Data Firehose S3 buffer size to 128 MB. Update the buffer interval to 900 seconds.
Answer: D E

Explanation:
Reference: https://docs.aws.amazon.com/glue/latest/dg/grouping-input-files.html
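
Options D and E avoid new billed resources: groupFiles coalesces many small files into larger in-memory groups, and a larger Firehose buffer produces fewer, larger objects in the first place. A minimal sketch of the groupFiles setting in a Glue PySpark job (the S3 path and group size are placeholders):

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Group small JSON files into ~128 MB in-memory partitions so the driver does
# not have to track every object individually.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/firehose-output/"],
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "134217728",
    },
    format="json",
)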

Question 5

A company using Amazon QuickSight Enterprise edition has thousands of dashboards, analyses, and datasets. The
company struggles to manage and assign permissions for granting users access to various items within QuickSight. The
company wants to make it easier to implement sharing and permissions management.
Which solution should the company implement to simplify permissions management?

  • A. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign individual users permissions to these folders.
  • B. Use QuickSight folders to organize dashboards, analyses, and datasets. Assign group permissions by using these folders.
  • C. Use AWS IAM resource-based policies to assign group permissions to QuickSight items.
  • D. Use QuickSight user management APIs to provision group permissions based on dashboard naming conventions.
Answer: B

Explanation:
Reference: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/quicksight/update-folder-permissions.html
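
A minimal boto3 sketch of the folder-based group grant from option B (the account ID, folder ID, group ARN, and action list are illustrative, not part of the question):

import boto3

quicksight = boto3.client("quicksight")

# Grant a QuickSight group access to everything placed in one folder instead of
# managing permissions on thousands of individual assets.
quicksight.update_folder_permissions(
    AwsAccountId="111122223333",
    FolderId="sales-reporting",
    GrantPermissions=[
        {
            "Principal": "arn:aws:quicksight:us-east-1:111122223333:group/default/sales-viewers",
            "Actions": ["quicksight:DescribeFolder"],  # example viewer action set
        }
    ],
)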

Question 6

A manufacturing company uses Amazon Connect to manage its contact center and Salesforce to manage its customer
relationship management (CRM) data. The data engineering team must build a pipeline to ingest data from the contact
center and CRM system into a data lake that is built on Amazon S3.
What is the MOST efficient way to collect data in the data lake with the LEAST operational overhead?

  • A. Use Amazon Kinesis Data Streams to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
  • B. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon Kinesis Data Streams to ingest Salesforce data.
  • C. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
  • D. Use Amazon AppFlow to ingest Amazon Connect data and Amazon Kinesis Data Firehose to ingest Salesforce data.
Answer: C

Explanation:
Reference: https://aws.amazon.com/kinesis/data-firehose/?kinesis-blogs.sort-by=item.additionalFields.createdDate&kinesis-blogs.sort-order=desc
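
Amazon Connect can stream contact records through a Kinesis Data Firehose delivery stream configured on the instance, and Amazon AppFlow is the managed connector for pulling Salesforce objects into S3, so option C needs no custom ingestion code. A minimal boto3 sketch of running an already-defined AppFlow flow on demand (the flow name is a placeholder):

import boto3

appflow = boto3.client("appflow")

# The flow definition (Salesforce source, S3 destination, field mapping) is
# assumed to have been created beforehand; this simply starts a run.
appflow.start_flow(flowName="salesforce-accounts-to-datalake")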

Question 7

A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few
hours and read-only queries are run throughout the day and evening. There is a
particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries
are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime.
What is the MOST cost-effective solution?

  • A. Enable concurrency scaling in the workload management (WLM) queue.
  • B. Add more nodes using the AWS Management Console during peak hours. Set the distribution style to ALL.
  • C. Use elastic resize to quickly add nodes during peak times. Remove the nodes when they are not needed.
  • D. Use a snapshot, restore, and resize operation. Switch to the new target cluster.
Answer: A
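
Explanation:
Concurrency scaling automatically adds transient cluster capacity when read queries start queuing during the morning peak and releases it afterward, so there is no resize and no downtime. A minimal boto3 sketch of enabling it on a workload management (WLM) queue (the parameter group name and queue definition are illustrative):

import json
import boto3

redshift = boto3.client("redshift")

# Illustrative manual WLM configuration with concurrency scaling turned on for
# the main queue; adapt it to the cluster's existing WLM setup.
wlm_config = [
    {
        "user_group": [],
        "query_group": [],
        "query_concurrency": 5,
        "concurrency_scaling": "auto",
    },
    {"short_query_queue": True},
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="custom-dw-params",
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }
    ],
)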

Question 8

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as
.csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a
recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time,
and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)

  • A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
  • B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
  • C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
  • D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
  • E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
Answer: B C

Explanation:
Reference: https://www.upsolver.com/blog/apache-parquet-why-use https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/
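
A minimal boto3 sketch of option B, using an Athena CTAS statement to rewrite the raw CSV data as partitioned Parquet (database, table, column, and bucket names are placeholders):

import boto3

athena = boto3.client("athena")

# CTAS: columnar Parquet plus partitioning means each analyst query scans far
# less data than the raw CSV files.
ctas = """
CREATE TABLE analytics_db.sales_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-bucket/curated/sales_parquet/',
    partitioned_by = ARRAY['sale_date']
) AS
SELECT item_id, store_id, revenue, sale_date
FROM analytics_db.sales_raw_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)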

Question 9

A company has a marketing department and a finance department. The departments are storing data in Amazon S3 in their
own AWS accounts in AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data.
The departments have some databases and tables that share common names.
The marketing department needs to securely access some tables from the finance department.
Which two steps are required for this process? (Choose two.)

  • A. The finance department grants Lake Formation permissions for the tables to the external account for the marketing department.
  • B. The finance department creates cross-account IAM permissions to the table for the marketing department role.
  • C. The marketing department creates an IAM role that has permissions to the Lake Formation tables.
Answer: A B

Explanation:
Granting Lake Formation Permissions
Creating an IAM role (AWS CLI)
Reference: https://docs.aws.amazon.com/lake-formation/latest/dg/lake-formation-permissions.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html
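
A minimal boto3 sketch of the cross-account grant from option A, run in the finance account (the account ID, database, and table names are placeholders):

import boto3

lakeformation = boto3.client("lakeformation")

# Share one cataloged finance table with the marketing account; the marketing
# account's administrator can then grant it onward to its own principals.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "111122223333"},  # marketing account
    Resource={
        "Table": {
            "DatabaseName": "finance_db",
            "Name": "cost_center_summary",
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)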

Question 10

A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from
AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis
with visualizations of logs in a way that requires minimal development effort.
Which solution meets these requirements?

  • A. Use an AWS Glue crawler to create and update a table in the AWS Glue Data Catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
  • B. Create a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon OpenSearch Service (Amazon Elasticsearch Service). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use OpenSearch Dashboards (Kibana) for data visualizations.
  • C. Create an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.
  • D. Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
Answer: A
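
Explanation:
A Glue crawler keeps the Data Catalog in sync with the WAF logs already sitting in S3, Athena charges only per query for the infrequent ad-hoc analysis, and QuickSight provides the visualizations, so option A is the low-cost, low-effort choice. A minimal boto3 sketch of the crawler setup (the crawler name, role ARN, database, and S3 path are placeholders):

import boto3

glue = boto3.client("glue")

# Crawl the prefix where Kinesis Data Firehose delivers the AWS WAF logs and
# register the resulting table in the Glue Data Catalog for Athena.
glue.create_crawler(
    Name="waf-logs-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="waf_logs_db",
    Targets={"S3Targets": [{"Path": "s3://example-waf-logs-bucket/firehose/"}]},
)
glue.start_crawler(Name="waf-logs-crawler")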

Question 11

An analytics software as a service (SaaS) provider wants to offer its customers business intelligence (BI) reporting
capabilities that are self-service. The provider is using Amazon QuickSight to build these reports. The data for the reports
resides in a multi-tenant database, but each customer should only be able to access their own data.
The provider wants to give customers two user role options:
Read-only users for individuals who only need to view dashboards.

Power users for individuals who are allowed to create and share new dashboards with other users.

Which QuickSight feature allows the provider to meet these requirements?

  • A. Embedded dashboards
  • B. Table calculations
  • C. Isolated namespaces
  • D. SPICE
Answer: C

Explanation:
Reference: https://docs.aws.amazon.com/quicksight/latest/user/provisioning-users.html
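
Namespaces isolate each tenant's users and shared assets inside a single QuickSight account, and the reader and author roles map to the two requested user types, which is what option C provides. A minimal boto3 sketch, assuming one namespace per customer (the account ID, namespace, emails, and user names are placeholders):

import boto3

quicksight = boto3.client("quicksight")
account_id = "111122223333"

# One isolated namespace per tenant.
quicksight.create_namespace(
    AwsAccountId=account_id,
    Namespace="tenant-acme",
    IdentityStore="QUICKSIGHT",
)

# A read-only user and a power user (author) inside that namespace.
quicksight.register_user(
    AwsAccountId=account_id,
    Namespace="tenant-acme",
    IdentityType="QUICKSIGHT",
    Email="viewer@example.com",
    UserRole="READER",
    UserName="acme-viewer",
)
quicksight.register_user(
    AwsAccountId=account_id,
    Namespace="tenant-acme",
    IdentityType="QUICKSIGHT",
    Email="author@example.com",
    UserRole="AUTHOR",
    UserName="acme-author",
)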

Question 12

A company is sending historical datasets to Amazon S3 for storage. A data engineer at the company wants to make these
datasets available for analysis using Amazon Athena. The engineer also wants to encrypt the Athena query results in an S3
results location by using AWS solutions for encryption. The requirements for encrypting the query results are as follows:
Use custom keys for encryption of the primary dataset query results.

Use generic encryption for all other query results.

Provide an audit trail for the primary dataset queries that shows when the keys were used and by whom.

Which solution meets these requirements?

  • A. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the primary dataset. Use SSE-S3 for the other datasets.
  • B. Use server-side encryption with customer-provided encryption keys (SSE-C) for the primary dataset. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the other datasets.
  • C. Use server-side encryption with AWS KMS managed customer master keys (SSE-KMS CMKs) for the primary dataset. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the other datasets.
  • D. Use client-side encryption with AWS Key Management Service (AWS KMS) customer managed keys for the primary dataset. Use S3 client-side encryption with client-side keys for the other datasets.
Answer: C

Explanation:
Reference: https://d1.awsstatic.com/product-marketing/S3/Amazon_S3_Security_eBook_2020.pdf
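
Only SSE-KMS produces AWS CloudTrail entries showing who used the key and when, so the primary dataset's query results need a KMS customer master key while the other results can use S3 managed keys, as in option C. A minimal boto3 sketch (bucket names, databases, and the key ARN are placeholders):

import boto3

athena = boto3.client("athena")

# Primary dataset: encrypt the query results with a KMS CMK so key usage is audited.
athena.start_query_execution(
    QueryString="SELECT * FROM primary_db.events LIMIT 10",
    QueryExecutionContext={"Database": "primary_db"},
    ResultConfiguration={
        "OutputLocation": "s3://example-results-bucket/primary/",
        "EncryptionConfiguration": {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        },
    },
)

# Other datasets: generic encryption with S3 managed keys is sufficient.
athena.start_query_execution(
    QueryString="SELECT count(*) FROM other_db.logs",
    QueryExecutionContext={"Database": "other_db"},
    ResultConfiguration={
        "OutputLocation": "s3://example-results-bucket/other/",
        "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
    },
)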

Question 13

A company is providing analytics services to its sales and marketing departments. The departments can access the data
only through their business intelligence (BI) tools, which run queries on Amazon Redshift using an Amazon Redshift internal
user to connect. Each department is assigned a user in the Amazon Redshift database with the permissions needed for that
department. The marketing data analysts must be granted direct access to the advertising table, which is stored in Apache
Parquet format in the marketing S3 bucket of the company data lake. The company data lake is managed by AWS Lake
Formation. Finally, access must be limited to the three promotion columns in the table. Which combination of steps will meet
these requirements? (Choose three.)

  • A. Grant permissions in Amazon Redshift to allow the marketing Amazon Redshift user to access the three promotion columns of the advertising external table.
  • B. Create an Amazon Redshift Spectrum IAM role with permissions for Lake Formation. Attach it to the Amazon Redshift cluster.
  • C. Create an Amazon Redshift Spectrum IAM role with permissions for the marketing S3 bucket. Attach it to the Amazon Redshift cluster.
  • D. Create an external schema in Amazon Redshift by using the Amazon Redshift Spectrum IAM role. Grant usage to the marketing Amazon Redshift user.
  • E. Grant permissions in Lake Formation to allow the Amazon Redshift Spectrum role to access the three promotion columns of the advertising table.
  • F. Grant permissions in Lake Formation to allow the marketing IAM group to access the three promotion columns of the advertising table.
Answer: B D E
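
Explanation:
The cluster needs a Redshift Spectrum IAM role that Lake Formation recognizes (B), an external schema plus a usage grant so the marketing database user can reach the table (D), and a column-level Lake Formation grant that restricts the role to the three promotion columns (E). A minimal boto3 sketch of the column-level grant, with the matching Redshift SQL shown as comments (ARNs, names, and column names are placeholders):

import boto3

lakeformation = boto3.client("lakeformation")

# Allow the Spectrum role to read only the three promotion columns.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/RedshiftSpectrumRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "marketing_db",
            "Name": "advertising",
            "ColumnNames": ["promotion_1", "promotion_2", "promotion_3"],
        }
    },
    Permissions=["SELECT"],
)

# In Amazon Redshift, the external schema and the user grant look roughly like:
#   CREATE EXTERNAL SCHEMA marketing_ext
#   FROM DATA CATALOG DATABASE 'marketing_db'
#   IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftSpectrumRole';
#   GRANT USAGE ON SCHEMA marketing_ext TO marketing_user;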

Question 14

A data engineer is using AWS Glue ETL jobs to process data at frequent intervals. The processed data is then copied into
Amazon S3. The ETL jobs run every 15 minutes. The AWS Glue Data Catalog partitions need to be updated automatically
after the completion of each job.
Which solution will meet these requirements MOST cost-effectively?

  • A. Use the AWS Glue Data Catalog to manage the data catalog. Define an AWS Glue workflow for the ETL process. Define a trigger within the workflow that can start the crawler when an ETL job run is complete.
  • B. Use the AWS Glue Data Catalog to manage the data catalog. Use AWS Glue Studio to manage ETL jobs. Use the AWS Glue Studio feature that supports updates to the AWS Glue Data Catalog during job runs.
  • C. Use an Apache Hive metastore to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.
  • D. Use the AWS Glue Data Catalog to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.
Answer: A

Explanation:
In the referenced example, upon successful completion of both jobs, an event trigger named Fix/De-dupe succeeded starts a crawler named Update schema.
Reference: https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html
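
A minimal boto3 sketch of the workflow trigger from option A, which starts the partition crawler as soon as the ETL job succeeds (the workflow, job, and crawler names are placeholders):

import boto3

glue = boto3.client("glue")

# Conditional trigger inside a Glue workflow: when the 15-minute ETL job run
# finishes successfully, start the crawler that refreshes the catalog partitions.
glue.create_trigger(
    Name="start-partition-crawler",
    WorkflowName="fifteen-minute-etl",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Logical": "AND",
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "process-and-copy-to-s3",
                "State": "SUCCEEDED",
            }
        ],
    },
    Actions=[{"CrawlerName": "update-partitions-crawler"}],
)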

Question 15

A company hosts an on-premises PostgreSQL database that contains historical data. An internal legacy application uses the
database for read-only activities. The company's business team wants to move the data to a data lake in Amazon S3 as
soon as possible and enrich the data for analytics.
The company has set up an AWS Direct Connect connection between its VPC and its on-premises network. A data analytics
specialist must design a solution that achieves the business team's goals with the least operational overhead.
Which solution meets these requirements?

  • A. Upload the data from the on-premises PostgreSQL database to Amazon S3 by using a customized batch upload process. Use the AWS Glue crawler to catalog the data in Amazon S3. Use an AWS Glue job to enrich and store the result in a separate S3 bucket in Apache Parquet format. Use Amazon Athena to query the data.
  • B. Create an Amazon RDS for PostgreSQL database and use AWS Database Migration Service (AWS DMS) to migrate the data into Amazon RDS. Use AWS Data Pipeline to copy and enrich the data from the Amazon RDS for PostgreSQL table and move the data to Amazon S3. Use Amazon Athena to query the data.
  • C. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Create an Amazon Redshift cluster and use Amazon Redshift Spectrum to query the data.
  • D. Configure an AWS Glue crawler to use a JDBC connection to catalog the data in the on-premises database. Use an AWS Glue job to enrich the data and save the result to Amazon S3 in Apache Parquet format. Use Amazon Athena to query the data.
Answer: D
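
Explanation:
A Glue crawler with a JDBC connection can catalog the on-premises PostgreSQL tables over the existing Direct Connect link, a Glue job enriches the data and writes Parquet to S3, and Athena queries it in place, so option D avoids any database migration or extra infrastructure. A minimal boto3 sketch of the JDBC connection and crawler (the connection details, role ARN, and names are placeholders):

import boto3

glue = boto3.client("glue")

# JDBC connection that reaches the on-premises database through the VPC
# attached to the Direct Connect link.
glue.create_connection(
    ConnectionInput={
        "Name": "onprem-postgres",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://onprem-host:5432/history",
            "USERNAME": "glue_reader",
            "PASSWORD": "example-password",  # use AWS Secrets Manager in practice
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)

# Crawler that catalogs the PostgreSQL schema so Glue jobs and Athena can use it.
glue.create_crawler(
    Name="onprem-postgres-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="historical_db",
    Targets={"JdbcTargets": [{"ConnectionName": "onprem-postgres", "Path": "history/public/%"}]},
)
glue.start_crawler(Name="onprem-postgres-crawler")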
