Questions for the AWS Certified Data Analytics - Specialty (DAS-C01) exam were updated on Nov 30, 2024.
A manufacturing company wants to create an operational analytics dashboard to visualize metrics from equipment in near-
real time. The company uses Amazon Kinesis Data Streams to stream the data to other applications. The dashboard must
automatically refresh every 5 seconds. A data analytics specialist must design a solution that requires the least possible
implementation effort.
Which solution meets these requirements?
Answer: B
Explanation:
Reference: https://aws.amazon.com/blogs/big-data/analyze-a-time-series-in-real-time-with-aws-lambda-amazon-kinesis-and-amazon-dynamodb-streams/
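The answer choices are not reproduced in this dump, but the referenced blog describes a pattern in which an AWS Lambda function consumes the Kinesis stream and maintains aggregates in Amazon DynamoDB that the dashboard can poll every few seconds. A minimal sketch of that consumer, assuming a hypothetical table name and record layout:

```python
import base64
import json
from decimal import Decimal

import boto3

# Hypothetical table name and record layout; the real ones would come from
# the equipment stream's schema.
TABLE = boto3.resource("dynamodb").Table("equipment-metrics")


def handler(event, context):
    """Consume a batch of Kinesis records and upsert per-device aggregates.

    The dashboard then polls DynamoDB every 5 seconds for fresh values
    instead of reading the stream directly.
    """
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        TABLE.update_item(
            Key={"device_id": payload["device_id"]},
            UpdateExpression="SET last_value = :v, updated_at = :t",
            ExpressionAttributeValues={
                ":v": Decimal(str(payload["value"])),  # DynamoDB needs Decimal, not float
                ":t": payload["timestamp"],
            },
        )
```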
An IoT company wants to release a new device that will collect data to track sleep overnight on an intelligent mattress.
Sensors will send data that will be uploaded to an Amazon S3 bucket. About 2 MB of data is generated each night for each
bed. Data must be processed and summarized for each user, and the results need to be available as soon as possible. Part
of the process consists of time windowing and other functions. Based on tests with a Python script, every run will require
about 1 GB of memory and will complete within a couple of minutes.
Which solution will run the script in the MOST cost-effective way?
Answer: A
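Given roughly 2 MB per bed per night, about 1 GB of memory, and a runtime of a couple of minutes, the workload fits comfortably within AWS Lambda limits, which is typically the most cost-effective fit for this profile. A sketch of an S3-triggered handler, assuming a hypothetical bucket layout and a stand-in for the tested Python logic:

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")


def summarize(raw: bytes) -> dict:
    # Stand-in for the tested windowing/summarization script.
    return {"bytes_processed": len(raw)}


def handler(event, context):
    """Triggered by s3:ObjectCreated for each night's ~2 MB upload.

    Configure the function with ~1024 MB of memory and a timeout of a few
    minutes to match the tested profile; there is nothing running between
    uploads, so there is no idle cost.
    """
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(
            Bucket=bucket,
            Key=f"summaries/{key}.json",
            Body=json.dumps(summarize(raw)).encode(),
        )
```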
A retail company has 15 stores across 6 cities in the United States. Once a month, the sales team requests an Amazon QuickSight visualization that makes it easy to identify revenue trends across cities and stores. The visualization also helps identify outliers that warrant further analysis.
Which visual type in QuickSight meets the sales team's requirements?
Answer: A
Explanation:
Reference: https://docs.aws.amazon.com/quicksight/latest/user/geospatial-charts.html
A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data
Streams and is writing the data to Amazon S3 through Amazon Kinesis Data Firehose. The company uses an S3 buffer size of 10 MB and a buffer interval of 90 seconds. The company runs an AWS Glue ETL job to merge and transform the data to a different format before writing it back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs now frequently fail with an OutOfMemoryError.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)
Answer: A, D
Explanation:
Reference: https://docs.aws.amazon.com/glue/latest/dg/grouping-input-files.html
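The referenced page covers AWS Glue file grouping, which mitigates out-of-memory failures caused by very large numbers of small input files (a common side effect of small Firehose buffers). A sketch of a Glue PySpark read with grouping enabled, using a hypothetical S3 path:

```python
# Runs inside an AWS Glue ETL (PySpark) job. Grouping coalesces many small
# Firehose-delivered objects into larger per-task reads, which is the
# out-of-memory mitigation the referenced page describes.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

frame = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-bucket/firehose-output/"],  # hypothetical path
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "134217728",  # aim for ~128 MB per group
    },
    format="json",
)
```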
A company using Amazon QuickSight Enterprise edition has thousands of dashboards, analyses, and datasets. The
company struggles to manage and assign permissions for granting users access to various items within QuickSight. The
company wants to make it easier to implement sharing and permissions management.
Which solution should the company implement to simplify permissions management?
Answer: B
Explanation:
Reference: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/quicksight/update-folder-permissions.html
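The reference points at QuickSight folder permissions: placing assets in shared folders lets a single grant cascade to everything in the folder. A minimal boto3 sketch with hypothetical account, folder, and group identifiers:

```python
import boto3

quicksight = boto3.client("quicksight")

# Hypothetical account ID, folder ID, and group ARN. Granting view access on
# the shared folder cascades to every dashboard, analysis, and dataset
# placed inside it, replacing thousands of per-asset grants.
quicksight.update_folder_permissions(
    AwsAccountId="111122223333",
    FolderId="marketing-shared",
    GrantPermissions=[
        {
            "Principal": "arn:aws:quicksight:us-east-1:111122223333:group/default/marketing",
            "Actions": ["quicksight:DescribeFolder"],
        }
    ],
)
```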
A manufacturing company uses Amazon Connect to manage its contact center and Salesforce to manage its customer
relationship management (CRM) data. The data engineering team must build a pipeline to ingest data from the contact
center and CRM system into a data lake that is built on Amazon S3.
What is the MOST efficient way to collect data in the data lake with the LEAST operational overhead?
Answer: B
Explanation:
Reference: https://aws.amazon.com/kinesis/data-firehose/
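The answer choices are not shown, but the reference suggests Kinesis Data Firehose for the Amazon Connect side (contact records streamed through Kinesis into S3), with the Salesforce CRM data typically handled by a managed connector. A sketch of the Firehose half, with hypothetical ARNs:

```python
import boto3

firehose = boto3.client("firehose")

# Hypothetical ARNs. Amazon Connect streams contact records into the Kinesis
# data stream; this delivery stream lands them in the S3 data lake with no
# consumer code to operate. The Salesforce side would be a separate managed
# flow (for example, Amazon AppFlow writing to the same bucket).
firehose.create_delivery_stream(
    DeliveryStreamName="connect-ctr-to-s3",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/connect-ctr",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-read-stream",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-write-s3",
        "BucketARN": "arn:aws:s3:::example-data-lake",
        "Prefix": "connect/ctr/",
    },
)
```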
A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few
hours and read-only queries are run throughout the day and evening. There is a
particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries
are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime.
What is the MOST cost-effective solution?
Answer: A
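The choices are omitted here; for bursts of queued read-only queries with no downtime allowed, Amazon Redshift concurrency scaling is the usual cost-effective fit, since it adds transient capacity only during the burst. A sketch that enables it on a WLM queue via a hypothetical parameter group:

```python
import json

import boto3

redshift = boto3.client("redshift")

# Hypothetical parameter group and WLM layout. Setting concurrency_scaling
# to "auto" on a queue lets Redshift attach transient clusters for queued
# read-only queries during the morning burst and release them afterward,
# with no downtime and cost only while the extra capacity is in use.
wlm = [
    {
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,
        "concurrency_scaling": "auto",
    }
]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="analytics-pg",
    Parameters=[
        {
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm),
        }
    ],
)
```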
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as
.csv files that are stored in Amazon S3. The company's analysts use Amazon Athena to run SQL queries against a
recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time,
and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)
Answer: B, C
Explanation:
Reference: https://www.upsolver.com/blog/apache-parquet-why-use
https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/
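The two references point at columnar formats and partitioning. A sketch of an Athena CTAS statement, issued through boto3, that rewrites the CSV data as partitioned, Snappy-compressed Parquet; database, table, column, and bucket names are hypothetical:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical names throughout. CTAS rewrites the raw CSV data as
# partitioned, Snappy-compressed Parquet so queries against a recent date
# range scan only the matching partitions.
ctas = """
CREATE TABLE analytics.events_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-analytics/curated/events/',
    partitioned_by = ARRAY['event_date']
) AS
SELECT *, date(ingest_ts) AS event_date
FROM analytics.events_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    ResultConfiguration={
        "OutputLocation": "s3://example-analytics/athena-results/"
    },
)
```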
A company has a marketing department and a finance department. The departments are storing data in Amazon S3 in their
own AWS accounts in AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data.
The departments have some databases and tables that share common names.
The marketing department needs to securely access some tables from the finance department.
Which two steps are required for this process? (Choose two.)
Answer: A, B
Explanation:
Granting Lake Formation Permissions
Creating an IAM role (AWS CLI)
Reference: https://docs.aws.amazon.com/lake-formation/latest/dg/lake-formation-permissions.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html
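Because the accounts share database and table names, the cross-account grant is usually paired with a resource link on the consuming side. A sketch of both halves with hypothetical account IDs and names; the first call runs in the finance account, the second in the marketing account:

```python
import boto3

# Runs in the finance account: share the table with the marketing account
# through Lake Formation. Account IDs and names are hypothetical.
boto3.client("lakeformation").grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "111122223333"},  # marketing acct
    Resource={
        "Table": {
            "CatalogId": "444455556666",  # finance account ID
            "DatabaseName": "sales",
            "Name": "transactions",
        }
    },
    Permissions=["SELECT"],
)

# Runs in the marketing account: a resource link gives the shared table a
# distinct local name, sidestepping the database/table name collisions.
boto3.client("glue").create_table(
    DatabaseName="finance_links",
    TableInput={
        "Name": "finance_transactions_link",
        "TargetTable": {
            "CatalogId": "444455556666",
            "DatabaseName": "sales",
            "Name": "transactions",
        },
    },
)
```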
A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from
AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis
with visualizations of logs in a way that requires minimal development effort.
Which solution meets these requirements?
Answer: D
An analytics software as a service (SaaS) provider wants to offer its customers business intelligence (BI) reporting
capabilities that are self-service. The provider is using Amazon QuickSight to build these reports. The data for the reports
resides in a multi-tenant database, but each customer should only be able to access their own data.
The provider wants to give customers two user role options:
Read-only users for individuals who only need to view dashboards.
Power users for individuals who are allowed to create and share new dashboards with other users.
Which QuickSight feature allows the provider to meet these requirements?
Answer: D
Explanation:
Reference: https://docs.aws.amazon.com/quicksight/latest/user/provisioning-users.html
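The reference covers QuickSight user provisioning: the READER role matches the read-only requirement and AUTHOR matches the power-user requirement. A boto3 sketch with hypothetical account, namespace, and email values:

```python
import boto3

quicksight = boto3.client("quicksight")

# Hypothetical account ID, namespace, and emails. READER matches the
# read-only requirement; AUTHOR users can create and share new dashboards.
for email, role in [
    ("viewer@example.com", "READER"),
    ("builder@example.com", "AUTHOR"),
]:
    quicksight.register_user(
        AwsAccountId="111122223333",
        Namespace="tenant-a",  # a per-tenant namespace isolates customers
        IdentityType="QUICKSIGHT",
        Email=email,
        UserName=email,
        UserRole=role,
    )
```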
A company is sending historical datasets to Amazon S3 for storage. A data engineer at the company wants to make these
datasets available for analysis using Amazon Athena. The engineer also wants to encrypt the Athena query results in an S3
results location by using AWS solutions for encryption. The requirements for encrypting the query results are as follows:
Use custom keys for encryption of the primary dataset query results.
Use generic encryption for all other query results.
Provide an audit trail for the primary dataset queries that shows when the keys were used and by whom.
Which solution meets these requirements?
Answer: A
Explanation:
Reference: https://d1.awsstatic.com/product-marketing/S3/Amazon_S3_Security_eBook_2020.pdf
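A sketch of the two encryption settings on the Athena side: SSE-KMS with a customer managed key for the primary dataset (each use of the key is then recorded in AWS CloudTrail, satisfying the audit requirement) and SSE-S3 for everything else. Key ARN, table, and bucket names are hypothetical:

```python
import boto3

athena = boto3.client("athena")

# Primary dataset: SSE-KMS with a customer managed key; every use of the key
# is recorded in AWS CloudTrail (who used it and when).
athena.start_query_execution(
    QueryString="SELECT * FROM primary_dataset LIMIT 10",
    ResultConfiguration={
        "OutputLocation": "s3://example-results/primary/",
        "EncryptionConfiguration": {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        },
    },
)

# All other datasets: generic SSE-S3 encryption, no key management required.
athena.start_query_execution(
    QueryString="SELECT * FROM other_dataset LIMIT 10",
    ResultConfiguration={
        "OutputLocation": "s3://example-results/other/",
        "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
    },
)
```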
A company is providing analytics services to its sales and marketing departments. The departments can access the data
only through their business intelligence (BI) tools, which run queries on Amazon Redshift using an Amazon Redshift internal
user to connect. Each department is assigned a user in the Amazon Redshift database with the permissions needed for that
department. The marketing data analysts must be granted direct access to the advertising table, which is stored in Apache
Parquet format in the marketing S3 bucket of the company data lake. The company data lake is managed by AWS Lake
Formation. Finally, access must be limited to the three promotion columns in the table. Which combination of steps will meet
these requirements? (Choose three.)
Answer: B, D, E
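For the column-level requirement, Lake Formation can grant SELECT on just the three promotion columns. A boto3 sketch with hypothetical role, database, table, and column names:

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical role, database, table, and column names. TableWithColumns
# limits the marketing analysts to exactly the three promotion columns.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/marketing-analysts"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "data_lake",
            "Name": "advertising",
            "ColumnNames": ["promo_1", "promo_2", "promo_3"],
        }
    },
    Permissions=["SELECT"],
)
```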
A data engineer is using AWS Glue ETL jobs to process data at frequent intervals. The processed data is then copied into
Amazon S3. The ETL jobs run every 15 minutes. The AWS Glue Data Catalog partitions need to be updated automatically
after the completion of each job.
Which solution will meet these requirements MOST cost-effectively?
Answer: A
Explanation:
In the AWS Glue workflow example from the reference, upon successful completion of both jobs, an event trigger ("Fix/De-dupe succeeded") starts a crawler ("Update schema").
Reference: https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html
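A sketch of the conditional trigger the referenced workflow page describes, wired so a crawler refreshes Data Catalog partitions whenever the ETL job succeeds; job, crawler, and workflow names are hypothetical:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical workflow, job, and crawler names. The conditional trigger
# starts the partition-updating crawler whenever the 15-minute ETL job
# succeeds, so the Data Catalog stays current with no scheduler to run.
glue.create_trigger(
    Name="run-crawler-after-etl",
    WorkflowName="quarter-hour-pipeline",
    Type="CONDITIONAL",
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "process-and-copy-to-s3",
                "State": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"CrawlerName": "update-partitions"}],
    StartOnCreation=True,
)
```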
A company hosts an on-premises PostgreSQL database that contains historical data. An internal legacy application uses the
database for read-only activities. The company's business team wants to move the data to a data lake in Amazon S3 as
soon as possible and enrich the data for analytics.
The company has set up an AWS Direct Connect connection between its VPC and its on-premises network. A data analytics
specialist must design a solution that achieves the business team's goals with the least operational overhead.
Which solution meets these requirements?
Answer: B
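The choices are not reproduced, but for moving an on-premises PostgreSQL database into S3 with minimal operational overhead over an existing Direct Connect link, AWS DMS is the usual managed option. A sketch of a full-load replication task with hypothetical ARNs:

```python
import json

import boto3

dms = boto3.client("dms")

# Hypothetical ARNs. With Direct Connect already in place, a DMS full-load
# task copies the read-only PostgreSQL tables straight into the S3 data
# lake; enrichment can then happen downstream (for example, in AWS Glue).
dms.create_replication_task(
    ReplicationTaskIdentifier="postgres-to-s3-full-load",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",
    MigrationType="full-load",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "all-tables",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```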