Questions for the DATABRICKS CERTIFIED DATA ENGINEER ASSOCIATE were updated on: Dec 01, 2025
A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they
have the appropriate permissions.
In which location can the data engineer review their permissions on the table?
C
Which type of workloads are compatible with Auto Loader?
A
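Auto Loader is built for incrementally ingesting new files from cloud storage as a streaming workload. A minimal sketch of an Auto Loader read (the file format, schema location, and landing path are assumptions for illustration):

# Auto Loader incrementally picks up new files from the landing path as a stream.
df = (spark.readStream
      .format("cloudFiles")                                 # Auto Loader source
      .option("cloudFiles.format", "json")                  # incoming file format (assumed)
      .option("cloudFiles.schemaLocation", "/tmp/schema")   # schema tracking path (assumed)
      .load("/tmp/landing"))                                # landing path (assumed)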
Which SQL keyword can be used to convert a table from a long format to a wide format?
B
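The keyword in question is PIVOT, which rotates distinct row values into columns, turning a long table into a wide one. A minimal sketch, assuming a hypothetical sales table with quarter and revenue columns:

# PIVOT turns the distinct values of `quarter` into columns (long -> wide).
wide_df = spark.sql("""
    SELECT * FROM sales
    PIVOT (SUM(revenue) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
""")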
Which two components function in the DB platform architecture’s control plane? (Choose two.)
BE
Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint
violation.
A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if the location details have not been provided by an employee, the pipeline is terminated.
How can this scenario be implemented?
B
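The distinction the question targets: ON VIOLATION DROP ROW silently discards offending rows, while ON VIOLATION FAIL UPDATE aborts the update, which is what the termination requirement calls for. A minimal Python sketch using the Delta Live Tables expectation decorators (the table and column names are assumptions for illustration):

import dlt

@dlt.table
@dlt.expect_or_fail("location_provided", "location IS NOT NULL")  # FAIL UPDATE: terminate on violation
def travel_reimbursements():
    # Hypothetical upstream source; any row missing a location aborts the update.
    return spark.read.table("raw_reimbursements")

# By contrast, @dlt.expect_or_drop(...) would silently drop the offending rows instead.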
Which method should a Data Engineer apply to ensure Workflows are being triggered on schedule?
C
Identify a scenario to use an external table.
A Data Engineer needs to create a Parquet bronze table and wants to ensure that it is stored at a specific path in an external location.
Which type of table can be created in this scenario?
A
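An external (unmanaged) table keeps its data at a caller-specified path rather than in the metastore-managed location. A minimal sketch (the table name and storage path are assumptions):

# The LOCATION clause pins the table's data to a specific external path.
spark.sql("""
    CREATE TABLE bronze_events
    USING PARQUET
    LOCATION 'abfss://container@account.dfs.core.windows.net/bronze/events'
""")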
Identify how the count_if function and count where x is null can be used.
Consider a table random_values with below data.
What would be the output of below query?
select count_if(col1 > 1) as count_a, count(*) as count_b, count(col1) as count_c from random_values
[table data not reproduced here; col1 includes a NULL value]
A
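The three aggregates treat NULLs differently: count(*) counts every row, count(col1) skips NULLs, and count_if(col1 > 1) counts only rows where the predicate evaluates to true (a comparison against NULL is not true). A sketch with assumed sample data, since the original table contents were shown as an image:

# Hypothetical sample standing in for random_values; the real data was not reproduced.
spark.createDataFrame([(0,), (1,), (2,), (None,)], "col1 int") \
     .createOrReplaceTempView("random_values")

spark.sql("""
    SELECT count_if(col1 > 1) AS count_a,   -- predicate true only for 2   -> 1
           count(*)           AS count_b,   -- all rows, NULL included     -> 4
           count(col1)        AS count_c    -- non-NULL values only        -> 3
    FROM random_values
""").show()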
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data,
and then perform a streaming write into a new table.
The code block used by the data engineer is below:
Which line of code should the data engineer use to fill in the blank if they only want the query to execute a micro-batch to process data every 5 seconds?
D
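The blank is filled by a processing-time trigger, which runs one micro-batch at the given interval. A minimal sketch (the table names and checkpoint path are assumptions):

# Execute a micro-batch every 5 seconds via a processing-time trigger.
(spark.readStream.table("source_table")
    .writeStream
    .trigger(processingTime="5 seconds")               # the line that fills in the blank
    .option("checkpointLocation", "/tmp/checkpoint")   # assumed path
    .toTable("new_table"))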
The Delta transaction log for the ‘students’ table is shown using the ‘DESCRIBE HISTORY students’ command. A Data Engineer needs to query the table as it existed before the UPDATE operation listed in the log.
Which command should the Data Engineer use to achieve this? (Choose two.)
AB
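Either time-travel form works: VERSION AS OF with the version number just before the UPDATE, or TIMESTAMP AS OF with a timestamp preceding it. A sketch in which the version number and timestamp are assumptions; the real values come from the DESCRIBE HISTORY output:

# Query the table as of the version immediately before the UPDATE (value assumed).
spark.sql("SELECT * FROM students VERSION AS OF 2").show()

# Equivalent form using a timestamp that precedes the UPDATE (value assumed).
spark.sql("SELECT * FROM students TIMESTAMP AS OF '2025-01-01T00:00:00'").show()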
A new data engineering team has been assigned to an ELT project. The team will need full privileges on the table sales to fully manage the project.
Which command can be used to grant full permissions on the table to the new data engineering team?
A
Explanation:
To grant full privileges on a table such as 'sales' to a group like 'team', the correct SQL command in
Databricks is:
GRANT ALL PRIVILEGES ON TABLE sales TO team;
This command assigns all available privileges, including SELECT, INSERT, UPDATE, DELETE, and any
other data manipulation or definition actions, to the specified team. This is typically necessary when
a team needs full control over a table to manage and manipulate it as part of a project or ongoing
maintenance.
Reference:
Databricks documentation on SQL permissions: SQL Permissions in Databricks
A data engineer needs access to a table new_table, but they do not have the correct permissions.
They can ask the table owner for permission, but they do not know who the table owner is.
Which approach can be used to identify the owner of new_table?
A.
There is no way to identify the owner of the table
B.
Review the Owner field in the table's page in the cloud storage solution
C.
Review the Permissions tab in the table's page in Data Explorer
D.
Review the Owner field in the table’s page in Data Explorer
D
Explanation:
To find the owner of a table in Databricks, one can utilize the Data Explorer feature. The Data
Explorer provides detailed information about various data objects, including tables. By navigating to
the specific table's page in Data Explorer, a data engineer can review the Owner field, which
identifies the individual or role that owns the table. This information is crucial for obtaining the
necessary permissions or for any administrative actions related to the table.
Reference:
Databricks documentation on Data Explorer: Using Data Explorer in Databricks
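Besides the Data Explorer UI, the owner can typically also be read programmatically; a sketch using DESCRIBE TABLE EXTENDED, whose detailed output includes an Owner row:

# The detailed description lists Owner alongside location, provider, and other metadata.
spark.sql("DESCRIBE TABLE EXTENDED new_table").show(truncate=False)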
A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they
only want the associated SQL endpoint to be running when it is necessary. The dashboard has
multiple queries on multiple datasets associated with it. The data that feeds the dashboard is
automatically processed using a Databricks Job.
Which approach can the data engineer use to minimize the total running time of the SQL endpoint
used in the refresh schedule of their dashboard?
B
Explanation:
To minimize the total running time of the SQL endpoint used in the refresh schedule of a dashboard
in Databricks, the most effective approach is to utilize the Auto Stop feature. This feature allows the
SQL endpoint to automatically stop after a period of inactivity, ensuring that it only runs when
necessary, such as during the dashboard refresh or when actively queried. This minimizes resource
usage and associated costs by ensuring the SQL endpoint is not running idle outside of these
operations.
Reference:
Databricks documentation on SQL endpoints: SQL Endpoints in Databricks
A data engineer and data analyst are working together on a data pipeline. The data engineer is
working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is
working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming
input. They now want to migrate their pipeline to use Delta Live Tables.
Which change will need to be made to the pipeline when migrating to Delta Live Tables?
A
Explanation:
When migrating to Delta Live Tables (DLT) with a data pipeline that involves different programming
languages across various data layers, the migration does not require unifying the pipeline into a
single language. Delta Live Tables support multi-language pipelines, allowing data engineers and
data analysts to work in their preferred languages, such as Python for data engineering tasks (raw,
bronze, and silver layers) and SQL for data analytics tasks (gold layer). This capability is particularly
beneficial in collaborative settings and leverages the strengths of each language for different stages
of data processing.
Reference:
Databricks documentation on Delta Live Tables: Delta Live Tables Guide
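Concretely, the engineer's layers can stay in a Python notebook while the analyst's gold layer lives in a separate SQL notebook attached to the same pipeline. A minimal sketch of the Python side (dataset names and the source path are assumptions); the SQL gold table would be defined in its own notebook using CREATE LIVE TABLE syntax:

import dlt

@dlt.table
def bronze_trips():
    # Streaming raw source ingested in Python via Auto Loader (path assumed).
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/tmp/raw"))

@dlt.table
def silver_trips():
    # dlt.read_stream references another dataset defined in the same pipeline.
    return dlt.read_stream("bronze_trips").dropDuplicates()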
A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.
The pipeline is configured to run in Production mode using Continuous Pipeline Mode.
What is the expected outcome after clicking Start to update the pipeline assuming previously
unprocessed data exists and all definitions are valid?
A.
All datasets will be updated once and the pipeline will shut down. The compute resources
will be terminated.
B.
All datasets will be updated at set intervals until the pipeline is shut down. The compute
resources will persist to allow for additional testing.
C.
All datasets will be updated once and the pipeline will shut down. The compute resources
will persist to allow for additional testing.
D.
All datasets will be updated at set intervals until the pipeline is shut down. The compute
resources will be deployed for the update and terminated when the pipeline is stopped.
D
Explanation:
In Delta Live Tables (DLT), when configured to run in Continuous Pipeline Mode, particularly in a
production environment, the system is designed to continuously process and update data as it
becomes available. This mode keeps the compute resources active to handle ongoing data
processing and automatically updates all datasets defined in the pipeline at predefined intervals.
Once the pipeline is manually stopped, the compute resources are terminated to conserve resources
and reduce costs. This mode is suitable for production environments where datasets need to be kept
up-to-date with the latest data.
Reference:
Databricks documentation on Delta Live Tables: Delta Live Tables Guide