Google Professional Cloud DevOps Engineer Real Exam Questions
The questions for Professional Cloud DevOps Engineer were last updated on Jan 12, 2025.
- Exam Code: Professional Cloud DevOps Engineer
- Exam Name: Google Cloud Certified - Professional Cloud DevOps Engineer Exam
- Certification Provider: Google
- Latest update: Jan 12, 2025
Your company follows Site Reliability Engineering principles. You are writing a postmortem for an incident, triggered by a software change, that severely affected users. You want to prevent severe incidents from happening in the future.
What should you do?
- A . Identify engineers responsible for the incident and escalate to their senior management.
- B . Ensure that test cases that catch errors of this type are run successfully before new software releases.
- C . Follow up with the employees who reviewed the changes and prescribe practices they should follow in the future.
- D . Design a policy that will require on-call teams to immediately call engineers and management to discuss a plan of action if an incident occurs.
B
Explanation:
The best way to prevent severe incidents from happening in the future is to ensure that test cases that catch errors of this type are run successfully before new software releases. This is aligned with the Site Reliability Engineering principle of testing for reliability.
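A release gate of this kind can be sketched in a few lines. This is a minimal sketch, assuming the new regression tests are written with Python's `unittest`. `parse_port`, `RegressionTests`, and `release_allowed` are hypothetical names for illustration, not part of any real pipeline.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


class RegressionTests(unittest.TestCase):
    """Hypothetical regression tests added after the postmortem for this class of error."""

    def test_rejects_out_of_range_port(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)


def release_allowed() -> bool:
    """Run the regression suite; allow the release only if every test passes."""
    suite = unittest.TestLoader().loadTestsFromTestCase(RegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    if not release_allowed():
        raise SystemExit("Release blocked: regression tests failed.")
    print("Release allowed.")
```

The gate answers "did the tests that would have caught this incident pass?" rather than "who caused it?", which is why it fits a blameless postmortem.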
You use Cloud Build to build your application. You want to reduce the build time while minimizing cost and development effort.
What should you do?
- A . Use Cloud Storage to cache intermediate artifacts.
- B . Run multiple Jenkins agents to parallelize the build.
- C . Use multiple smaller build steps to minimize execution time.
- D . Use larger Cloud Build virtual machines (VMs) by using the machine-type option.
A
Explanation:
https://cloud.google.com/storage/docs/best-practices
https://cloud.google.com/build/docs/speeding-up-builds#caching_directories_with_google_cloud_storage
Caching directories with Google Cloud Storage: To increase the speed of a build, reuse the results from a previous build. You can copy the results of a previous build to a Google Cloud Storage bucket, use the results for faster calculation, and then copy the new results back to the bucket. Use this method when your build takes a long time and produces a small number of files that do not take long to copy to and from Google Cloud Storage.
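The restore, build, save cycle described above can be sketched as follows. This is a minimal sketch, assuming a local directory stands in for the Cloud Storage bucket. In Cloud Build the two copy steps would typically be `gsutil` copy steps instead. The `build` function and its artifact names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def restore_cache(cache_dir: Path, work_dir: Path) -> bool:
    """Copy cached results into the workspace; return True on a cache hit."""
    if not cache_dir.exists():
        return False
    shutil.copytree(cache_dir, work_dir, dirs_exist_ok=True)
    return True


def save_cache(work_dir: Path, cache_dir: Path) -> None:
    """Copy the (possibly updated) results back to the cache location."""
    shutil.copytree(work_dir, cache_dir, dirs_exist_ok=True)


def build(work_dir: Path) -> int:
    """Hypothetical build: only 'compile' artifacts that are not already present."""
    compiled = 0
    for name in ["a.o", "b.o", "c.o"]:
        artifact = work_dir / name
        if not artifact.exists():
            artifact.write_text("compiled")
            compiled += 1
    return compiled


def cached_build(cache_dir: Path, work_dir: Path) -> int:
    """Restore from cache, build, then save results back; returns units compiled."""
    work_dir.mkdir(parents=True, exist_ok=True)
    restore_cache(cache_dir, work_dir)
    compiled = build(work_dir)
    save_cache(work_dir, cache_dir)
    return compiled


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"  # stands in for a gs:// bucket
        first = cached_build(cache, Path(tmp) / "build1")
        second = cached_build(cache, Path(tmp) / "build2")
        print(first, second)  # 3 0
```

The second build starts on a fresh workspace yet compiles nothing, because the cached artifacts are restored first. The trade-off the documentation mentions is visible here: every build pays the copy cost, so this only helps when the copied files are small relative to the work they save.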
You support a production service that runs on a single Compute Engine instance. You regularly need to spend time on recreating the service by deleting the crashing instance and creating a new instance based on the relevant image. You want to reduce the time spent performing manual operations while following Site Reliability Engineering principles.
What should you do?
- A . File a bug with the development team so they can find the root cause of the crashing instance.
- B . Create a Managed Instance Group with a single instance and use health checks to determine the system status.
- C . Add a Load Balancer in front of the Compute Engine instance and use health checks to determine the system status.
- D . Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the crashed instance promptly after it has crashed.
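The mechanism behind option B (a managed instance group whose health check triggers automatic recreation) can be modeled in a few lines. This is a minimal sketch of the idea only, not the Compute Engine API. `SingleInstanceGroup`, `probe`, and `unhealthy_threshold` are hypothetical names, and the threshold of three consecutive failures is an assumed value.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SingleInstanceGroup:
    """Hypothetical model of a managed instance group with a target size of 1."""
    probe: Callable[[str], bool]     # health check: True if the instance is healthy
    unhealthy_threshold: int = 3     # consecutive failures before recreating
    instance_id: int = 0
    failures: int = 0
    recreations: List[str] = field(default_factory=list)

    @property
    def instance_name(self) -> str:
        return f"instance-{self.instance_id}"

    def tick(self) -> None:
        """Run one health-check interval; recreate the instance if it stays unhealthy."""
        if self.probe(self.instance_name):
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.unhealthy_threshold:
            self.recreations.append(self.instance_name)
            self.instance_id += 1    # replacement built from the same template/image
            self.failures = 0


if __name__ == "__main__":
    # The first instance is "crashing"; any replacement is healthy.
    group = SingleInstanceGroup(probe=lambda name: name != "instance-0")
    for _ in range(5):
        group.tick()
    print(group.instance_name, group.recreations)  # instance-1 ['instance-0']
```

The manual "delete the crashing instance and create a new one from the image" step becomes a loop that runs without a human, which is the toil reduction the question asks for.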
Your organization recently adopted a container-based workflow for application development. Your team develops numerous applications that are deployed continuously through an automated build pipeline to the production environment. A recent security audit alerted your team that the code pushed to production could contain vulnerabilities and that the existing tooling around virtual machine (VM) vulnerabilities no longer applies to the containerized environment. You need to ensure the security and patch level of all code running through the pipeline.
What should you do?
- A . Set up Container Analysis to scan and report Common Vulnerabilities and Exposures.
- B . Configure the containers in the build pipeline to always update themselves before release.
- C . Reconfigure the existing operating system vulnerability software to exist inside the container.
- D . Implement static code analysis tooling against the Docker files used to create the containers.
A
Explanation:
https://cloud.google.com/binary-authorization
Binary Authorization is a deploy-time security control that ensures only trusted container images are deployed on Google Kubernetes Engine (GKE) or Cloud Run. With Binary Authorization, you can require images to be signed by trusted authorities during the development process and then enforce signature validation when deploying. By enforcing validation, you can gain tighter control over your container environment by ensuring only verified images are integrated into the build-and-release process.
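Whichever scanner produces the vulnerability report, the pipeline still needs a policy that turns the list of CVEs into a pass/fail decision. This is a minimal sketch, assuming the findings have already been fetched into a list. `Finding`, its fields, the severity ordering, and the `HIGH` threshold are hypothetical and do not reproduce the Container Analysis response schema. The CVE IDs are placeholders.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity ordering, lowest to highest.
SEVERITY_ORDER = ["MINIMAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported for a container image (hypothetical shape)."""
    cve_id: str
    severity: str


def blocking_findings(findings: List[Finding], threshold: str = "HIGH") -> List[Finding]:
    """Return findings at or above the threshold (unknown severities raise ValueError)."""
    limit = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f.severity) >= limit]


def gate(findings: List[Finding], threshold: str = "HIGH") -> bool:
    """Allow promotion to production only when no blocking findings remain."""
    return not blocking_findings(findings, threshold)


if __name__ == "__main__":
    report = [Finding("CVE-EXAMPLE-1", "LOW"), Finding("CVE-EXAMPLE-2", "CRITICAL")]
    print(gate(report), [f.cve_id for f in blocking_findings(report)])
```

Failing loudly on an unrecognized severity is a deliberate choice: silently treating it as low would let an unclassified finding through the gate.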
You support a high-traffic web application with a microservice architecture. The home page of the application displays multiple widgets containing content such as the current weather, stock prices, and news headlines. The main serving thread makes a call to a dedicated microservice for each widget and then lays out the homepage for the user. The microservices occasionally fail; when that happens, the serving thread serves the homepage with some missing content. Users of the application are unhappy if this degraded mode occurs too frequently, but they would rather have some content served instead of no content at all. You want to set a Service Level Objective (SLO) to ensure that the user experience does not degrade too much.
What Service Level Indicator (SLI) should you use to measure this?
- A . A quality SLI: the ratio of non-degraded responses to total responses
- B . An availability SLI: the ratio of healthy microservices to the total number of microservices
- C . A freshness SLI: the proportion of widgets that have been updated within the last 10 minutes
- D . A latency SLI: the ratio of microservice calls that complete in under 100 ms to the total number of microservice calls
A
Explanation:
https://cloud.google.com/blog/products/gcp/available-or-not-that-is-the-question-cre-life-lessons
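The quality SLI in option A is a simple ratio of good events to total events. This is a minimal sketch, assuming each homepage response is already labeled `True` when every widget rendered and `False` when it was served in degraded mode. The 99% target is a hypothetical SLO value.

```python
from typing import Iterable


def quality_sli(responses: Iterable[bool]) -> float:
    """Fraction of homepage responses served with all widgets present."""
    responses = list(responses)
    if not responses:
        return 1.0  # design choice: no traffic counts as meeting the objective
    return sum(responses) / len(responses)


def meets_slo(responses: Iterable[bool], target: float = 0.99) -> bool:
    """Compare the measured quality SLI against a (hypothetical) SLO target."""
    return quality_sli(responses) >= target


if __name__ == "__main__":
    window = [True] * 995 + [False] * 5   # 5 degraded pages out of 1,000
    print(quality_sli(window), meets_slo(window))  # 0.995 True
```

Unlike a per-microservice health ratio, this measures what the user actually received: one flaky widget service counts only as often as it degrades a page.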
You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices.
What should you do first?
- A . Call individual stakeholders to explain what happened.
- B . Develop a post-mortem to be distributed to stakeholders.
- C . Send the Incident State Document to all the stakeholders.
- D . Require the engineer responsible to write an apology email to all stakeholders.
Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users divided between the zones. You can fail over from one zone to another, but it causes a 10-minute service disruption for the affected users. You typically experience a database failure once per quarter and can detect it within five minutes. You are cataloging the reliability risks of a new real-time chat feature for your product.
You catalog the following information for each risk:
• Mean Time to Detect (MTTD) in minutes
• Mean Time to Repair (MTTR) in minutes
• Mean Time Between Failure (MTBF) in days
• User Impact Percentage
The chat feature requires a new database system that takes twice as long to successfully fail over between zones. You want to account for the risk of the new database failing in one zone.
What would be the values for the risk of database failover with the new system?
- A . MTTD: 5, MTTR: 10, MTBF: 90, Impact: 33%
- B . MTTD: 5, MTTR: 20, MTBF: 90, Impact: 33%
- C . MTTD: 5, MTTR: 10, MTBF: 90, Impact: 50%
- D . MTTD: 5, MTTR: 20, MTBF: 90, Impact: 50%
B
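Explanation:
Detection is unchanged (MTTD 5 minutes); the new database takes twice as long to fail over (MTTR 2 × 10 = 20 minutes); database failures still occur about once per quarter (MTBF 90 days); and because users are divided between three zones, losing one zone affects about a third of them (impact 33%).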
https://www.atlassian.com/incident-management/kpis/common-metrics
https://linkedin.github.io/school-of-sre/
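Once the four values are cataloged, a common way to compare risks (assumed here; the question does not state it) is to estimate expected bad minutes per year as (MTTD + MTTR) × impact × (365 / MTBF). This is a minimal sketch of that arithmetic; the function name is hypothetical.

```python
def bad_minutes_per_year(mttd_min: float, mttr_min: float,
                         mtbf_days: float, impact: float) -> float:
    """Expected user-weighted outage minutes per year for one cataloged risk."""
    incidents_per_year = 365 / mtbf_days
    return (mttd_min + mttr_min) * impact * incidents_per_year


if __name__ == "__main__":
    current = bad_minutes_per_year(5, 10, 90, 1 / 3)  # existing database
    new_db = bad_minutes_per_year(5, 20, 90, 1 / 3)   # twice the failover time
    print(f"current: {current:.1f}, new database: {new_db:.1f}")
    # current: 20.3, new database: 33.8
```

Doubling the failover time raises the expected cost from about 20 to about 34 bad minutes per year, even though MTTD, MTBF, and impact are unchanged.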
You are developing a strategy for monitoring your Google Cloud Platform (GCP) projects in production using Stackdriver Workspaces. One of the requirements is to be able to quickly identify and react to production environment issues without false alerts from development and staging projects. You want to ensure that you adhere to the principle of least privilege when providing relevant team members with access to Stackdriver Workspaces.
What should you do?
- A . Grant relevant team members read access to all GCP production projects. Create Stackdriver workspaces inside each project.
- B . Grant relevant team members the Project Viewer IAM role on all GCP production projects. Create Stackdriver workspaces inside each project.
- C . Choose an existing GCP production project to host the monitoring workspace. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.
- D . Create a new GCP monitoring project, and create a Stackdriver Workspace inside it. Attach the production projects to this workspace. Grant relevant team members read access to the Stackdriver Workspace.
D
Explanation:
"A Project can host many Projects and appear in many Projects, but it can only be used as the scoping project once. We recommend that you create a new Project for the purpose of having multiple Projects in the same scope."
You deploy a new release of an internal application during a weekend maintenance window when there is minimal user traffic. After the window ends, you learn that one of the new features isn’t working as expected in the production environment. After an extended outage, you roll back the new release and deploy a fix. You want to modify your release process to reduce the mean time to recovery so you can avoid extended outages in the future.
What should you do? (Choose 2 answers.)
- A . Before merging new code, require 2 different peers to review the code changes.
- B . Adopt the blue/green deployment strategy when releasing new code via a CD server.
- C . Integrate a code linting tool to validate coding standards before any code is accepted into the repository.
- D . Require developers to run automated integration tests on their local development environments before release.
- E . Configure a CI server. Add a suite of unit tests to your code and have your CI server run them on commit and verify any changes.
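Blue/green deployment (option B) shortens recovery because the previous release keeps running on the idle environment, so rolling back is a traffic switch rather than a redeploy. This is a minimal sketch, assuming a router that sends all traffic to exactly one of two environments. `BlueGreenRouter` and its methods are hypothetical names, not a real load balancer API.

```python
from typing import Dict


class BlueGreenRouter:
    """Hypothetical router that sends all traffic to one of two environments."""

    def __init__(self, live_version: str) -> None:
        self.envs: Dict[str, str] = {"blue": live_version, "green": ""}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Install the new release on the idle environment; live users are unaffected."""
        self.envs[self.idle] = version

    def switch(self) -> None:
        """Cut traffic over to the idle environment (the same call performs a rollback)."""
        self.live = self.idle

    def serving(self) -> str:
        return self.envs[self.live]


if __name__ == "__main__":
    router = BlueGreenRouter("v1")
    router.deploy_to_idle("v2")
    router.switch()
    print(router.serving())  # v2
    router.switch()          # feature broken in production: roll back
    print(router.serving())  # v1
```

Because v1 is never torn down during the release, recovery time is bounded by how quickly the switch is made, not by how long a rebuild and redeploy take.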