Questions for the PROFESSIONAL MACHINE LEARNING ENGINEER were updated on: Sep 08, 2024
Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your
platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control
of the model's code, serving, and deployment. You will use Kubeflow Pipelines for the ML platform. To save time, you want
to build on existing resources and use managed services instead of building a completely new model. How should you build
the classifier?
D
You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company
is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide
range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that
will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for
building Extract, Transform, Load (ETL) processes. Which service should you use?
D
Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written.
You have a large training dataset that is structured like this:
You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should
you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?
C
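The answer options and the dataset layout are not reproduced above, so the following is only a hedged sketch of one common way to avoid leakage in this kind of task: assign whole authors, not individual articles, to a subset, so that no author's writing style appears in more than one subset. The (author, text) record shape and the function name split_by_author are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_author(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (author, text) pairs so each author lands in exactly one subset.

    Hypothetical helper: the record layout is assumed, not taken from the source.
    """
    # Group every article under its author.
    by_author = defaultdict(list)
    for author, text in examples:
        by_author[author].append((author, text))

    # Shuffle the authors (not the articles) reproducibly.
    authors = sorted(by_author)
    random.Random(seed).shuffle(authors)

    # Cut the shuffled author list at roughly 80% / 10% / 10%.
    n_train = round(len(authors) * fractions[0])
    n_test = round(len(authors) * fractions[1])
    groups = {
        "train": authors[:n_train],
        "test": authors[n_train:n_train + n_test],
        "eval": authors[n_train + n_test:],
    }
    return {
        name: [ex for a in members for ex in by_author[a]]
        for name, members in groups.items()
    }


# Toy data: 10 hypothetical authors with 3 articles each.
articles = [(f"author_{i}", f"text {i}-{j}") for i in range(10) for j in range(3)]
subsets = split_by_author(articles)
author_sets = {name: {a for a, _ in rows} for name, rows in subsets.items()}
```

Note that this keeps the 80-10-10 proportion at the author level; the proportion of articles matches only when authors contribute similar numbers of texts.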
You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep
Learning VM Image, you receive the following error: The resource
'projects/deeplearning-platforn/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80' was not found. What should you do?
A
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data
engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses,
1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label
map: [drivers_license, passport, credit_card]. Which loss function should you use?
D
Explanation:
Use sparse_categorical_crossentropy, which takes each label as a single integer class index rather than a one-hot vector.
Examples for the above 3-class classification problem: [1], [2], [3]
Reference: https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
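To make the difference between the two label encodings concrete, here is a minimal pure-Python sketch (not Keras code) that computes the loss for one example both ways. The logits are invented for illustration, and the helper functions are hand-written stand-ins for the library losses. Keras expects zero-based class indices, so the sketch uses the position of each class in the label map.

```python
import math

LABEL_MAP = ["drivers_license", "passport", "credit_card"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the label is a single integer class index."""
    return -math.log(probs[class_index])


# Hypothetical model output for one image (made-up logits).
probs = softmax([2.0, 0.5, 0.1])

# The same "passport" label in both encodings.
sparse_label = LABEL_MAP.index("passport")  # zero-based index
one_hot_label = [0.0, 1.0, 0.0]

loss_sparse = sparse_categorical_crossentropy(sparse_label, probs)
loss_one_hot = categorical_crossentropy(one_hot_label, probs)
```

Both encodings give the same loss value; the sparse form simply avoids materializing the one-hot vector.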
Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is
performing poorly due to a change in the distribution of the input data. How should you address the input differences in
production?
C
You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset,
you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train
several classification models, but none of them converge. How should you resolve the class imbalance problem?
B
Explanation:
Reference: https://towardsdatascience.com/convolution-neural-networks-a-beginners-guide-implementing-a-mnist-hand-written-digit-8aa60330d022
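The answer options are not reproduced above, so the following is only a sketch of one widely used remedy for extreme class imbalance, not necessarily the keyed option: downsample the majority (non-failure) class until positives make up a target fraction, then upweight the kept negatives by the downsampling factor so the weighted class totals stay the same. The 10% target, the (features, label) record shape, and the function name are assumptions.

```python
import random


def downsample_and_upweight(examples, target_positive_fraction=0.10, seed=0):
    """Return (features, label, weight) triples with a rebalanced class mix.

    Hypothetical helper: label 1 marks a failure reading, label 0 a normal one.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]

    # How many negatives to keep so positives reach the target fraction.
    keep = round(len(positives) * (1 - target_positive_fraction)
                 / target_positive_fraction)
    keep = min(keep, len(negatives))
    sampled = rng.sample(negatives, keep)

    # Each kept negative stands in for `factor` original negatives.
    factor = len(negatives) / keep if keep else 1.0

    weighted = [(x, y, 1.0) for x, y in positives]
    weighted += [(x, y, factor) for x, y in sampled]
    rng.shuffle(weighted)
    return weighted


# Toy data: 1,000 readings, 5 of them failures (0.5% positives).
readings = [((i,), 1 if i < 5 else 0) for i in range(1000)]
balanced = downsample_and_upweight(readings)
```

The weights would then be passed to the training loop as per-example sample weights, so the model sees a balanced stream of examples while the loss still reflects the original class frequencies.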
You need to design a customized deep neural network in Keras that will predict customer purchases based on their
purchase history. You want to explore model performance using multiple model architectures, store training data, and be
able to compare the evaluation metrics in the same dashboard. What should you do?
C
You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query
against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve
this in the easiest way possible. What should you do?
A
You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to
organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?
A
You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted
marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing
can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with
AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that AutoML fits
the best model to your data?
D
You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is
performing worse on the validation data. You want the model to be resilient to overfitting. Which strategy should you use
when retraining the model?
D
Your organization's call center has asked you to develop a model that analyzes customer sentiments in each call. The call
center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region
in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science
team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to
select components for data processing and for analytics. How should the data pipeline be designed?
B
You are training a ResNet model on AI Platform using TPUs to visually categorize types of defects in automobile engines.
You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to
reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data
dataset? (Choose two.)
A E
You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial,
because the production model is required to keep up with market changes. Since being deployed to production, the model
hasn't changed; however, the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady
decline in model accuracy?
D