Exam sharing & Tips — How can I pass the Databricks Certified Data Engineer Associate in 2 weeks

Hanson Chiu
5 min read · Aug 11, 2022

Databricks has been named a leader in both the Gartner DBMS and DSML 2021 Magic Quadrants. As a data consultant who always needs to keep up with the latest technology trends, I made the Databricks certification my top-priority exam goal.

This story is a quick share of how I allocated my time to earn the Databricks Certified Data Engineer Associate efficiently in 2 weeks.

What is Databricks?

In case you are reading this story but have no idea what Databricks is, here is a quick introduction:

  • It’s not a programming language. Databricks is a database management system / data science & machine learning platform that supports various programming languages for data work (e.g. SQL, Python, R, Scala)
  • All your data, analytics and AI on one platform: a key selling point of Databricks, which is designed as a lakehouse solution that unifies a data team’s operations and supports data engineering, data science and data analyst tasks (e.g. dashboards, ad-hoc SQL queries)
  • It adopts modern data technologies such as Delta Lake, MLflow and Spark Structured Streaming to optimise data operations and processing within Databricks

A bit of background about myself

I’m not an absolute beginner. My SQL and Python knowledge helped me fast-track quite a lot of topics, especially the SQL / Python related syntax (e.g. CTEs, joins, UDFs).

  • 6+ years of SQL experience
  • 2+ years of Python experience
  • 0 knowledge of Databricks, Lakehouse design and Delta Lake

If you have zero experience in SQL / Python, it may take more time to get familiar with the basic syntax and to work through the lab sessions.

Exam details

The Databricks Certified Data Engineer Associate certification exam assesses an individual’s ability to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This includes an understanding of the Lakehouse Platform and its workspace, its architecture, and its capabilities.

The exam details below were taken from Databricks as of Aug 2022; always check the Databricks website for the latest details.

Duration: 90 minutes

Questions: 45 multiple-choice questions (there are no multiple-answer questions)

Questions distribution:

  • Databricks Lakehouse Platform — 24% (11/45)
  • ELT with Spark SQL and Python — 29% (13/45)
  • Incremental Data Processing — 22% (10/45)
  • Production Pipelines — 16% (7/45)
  • Data Governance — 9% (4/45)
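
As a quick sanity check on the distribution above, the question counts follow from rounding each percentage of the 45 questions. Here is a minimal Python sketch; the variable and function names are my own, chosen only for illustration:

```python
# Topic weights in percent, as listed above (Databricks, Aug 2022).
TOTAL_QUESTIONS = 45

topic_weights = {
    "Databricks Lakehouse Platform": 24,
    "ELT with Spark SQL and Python": 29,
    "Incremental Data Processing": 22,
    "Production Pipelines": 16,
    "Data Governance": 9,
}


def questions_per_topic(weights, total):
    """Round each topic's share of the total to a whole number of questions."""
    return {topic: round(total * pct / 100) for topic, pct in weights.items()}


counts = questions_per_topic(topic_weights, TOTAL_QUESTIONS)

for topic, n in counts.items():
    print(f"{topic}: {n}/{TOTAL_QUESTIONS}")
```

The first two topics alone account for more than half of the questions, which is worth keeping in mind when deciding where to spend study time.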

Cost: $200

Result: The exam result is available as soon as you finish the exam. The badge will be available within a few working days

Strategy and time allocation

I followed the learning path recommended by Databricks, as listed below.

My company is a Databricks partner, so we can access their online courses at no cost. In my experience, the online courses covered all the exam topics in sufficient depth. You should still set up your own lab to practice the code and syntax, and check the online documentation for certain concepts (e.g. Auto Loader, and checkpointing and how it tracks the position in a stream).

Source: Databricks
  1. Fundamentals of the Databricks Lakehouse Platform Accreditation (Course Duration: 30 mins)
  2. Data Engineering with Databricks (Course Duration: 12 hours)
  3. Exam Information: Databricks Certified Associate Data Engineer (available for additional fee) (Course Duration: < 15 mins)
  4. Certification Overview: Databricks Certified Data Engineer Associate Exam (Course Duration: 1 hour)

Total course time: around 14 hours (I normally watch at 1.5x speed, so about 9 hours in total)
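
The totals above can be reproduced with a few lines of Python. This is only a rough sketch: the dictionary keys are shortened course names, and I count the "< 15 mins" course as a full 15 minutes, so the result is an upper bound.

```python
# Course durations in minutes, taken from the list above.
course_minutes = {
    "Fundamentals of the Databricks Lakehouse Platform Accreditation": 30,
    "Data Engineering with Databricks": 12 * 60,
    "Exam Information": 15,  # listed as "< 15 mins", so treated as an upper bound
    "Certification Overview": 60,
}

PLAYBACK_SPEED = 1.5  # the speed I normally watch at

total_hours = sum(course_minutes.values()) / 60
hours_at_speed = total_hours / PLAYBACK_SPEED

print(f"Total course time: {total_hours:.2f} hours")
print(f"At {PLAYBACK_SPEED}x speed: {hours_at_speed:.2f} hours")
```

Swap in your own playback speed to estimate how many evenings the courses will take you.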

My time allocation

Week 1:

On average, I studied 1–2 hours a day. You can also follow the labs and practice as you go through the course, but it will take more time. I simply read and understood the syntax without setting up a lab on Azure/AWS/GCP.

Week 2:

During the lab practice, I read quite a lot of articles and Databricks documentation explaining the available parameters and options, but you could simply follow the lab flow from Databricks to dive deep topic by topic.

Useful links / materials

  • Practice exam: I highly recommend trying the practice exam a few times before taking the actual one
  • Exam Preparation slides: the slides used in the online courses
  • Course Code repo: the source code used in the lab sessions, organised by topic
  • Databricks Notebook viewer: in case you don’t have a trial for AWS/Azure/GCP or you are too lazy to set one up, you can use the viewer to open the dbc file

I hope these tips help guide your exam preparation. Stay tuned for my next exam sharing and tips!

Hanson Chiu

Digital & Data Enthusiast | Tech Exam Machine | Cloud computing | My Linkedin Profile: https://www.linkedin.com/in/hanson-chiu-a53272137/