fbpx

The Interview Study Guide For Data Engineers

Interviewing for any technical position generally requires preparing, studying and long-all day interviews. As a data engineer, what we need to study is not always clear. Some positions require...

Interviewing for any technical position generally requires preparing, studying and long-all day interviews. As a data engineer, what we need to study is not always clear. Some positions require Hadoop, others SQL, some roles require understanding statistics while still others require heavy amounts of system design.

We have gathered many of the resources that we have used to study and get jobs at companies int the FAANG family as well as other major tech companies. We have yet to find one that requires you to know anything about Hadoop during the interview so that has not bee included in this study guide.

However, as part of this study guide we have created the checklist. Sometimes, studying for interviews can feel like you are getting nothing done. Thus, this checklist will help you keep track so you know where to improve and what you have done. You can find the checklist here.

Download The Data Engineering Interview Checklist

Outline

  1. SQL Pre-Video Problems
  2. SQL Videos
  3. SQL Post Video Problems
  4. Database, Data Warehouse And ETL Design
  5. Algorithm And Data Structures Programming Problems
  6. Operational Programming Problems
  7. System Design Videos
  8. Udemy Courses
  9. Books

SQL 

As a data engineer it is almost inevitable that you will get some SQL questions. As someone who has participated in many interviews for a lot of top tech companies like Amazon and Capital One. They usually follow some similar patterns.

Typically there will be at least one question that requires an aggregation with a filter, another that requires a few joins and then one that requires a subquery. Along with that, there might be a few curve ball questions that require self-joins, recursions and analytic functions. So let’s look at a couple concepts that are good to cover

SQL Pre-Video Problems

These first few problems will help you gauge where you are on different concepts. That way you can take notes on the study guide and go back and review what you feel you were not comfortable with.

Join with Aggregation

262. Trips and Users

Problem: The Trips table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are both foreign keys to the Users_Id at the Users table. Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).

This is usually the 3rd or 4th style of question.

Sql Questions With Complex Logic

Sometimes, you get lucky and only have to deal with SQL questions that involve one point o logic. Sometimes it gets a little trickier. For instance this question asks you to both find cities with 3 consecutive rows and has populations over 100.

This is usually pretty easy to do when you are working, but can sometimes be a little difficult when you are in the middle of an interview.

 

601. Human Traffic of Stadium

Problem: X city built a new stadium, each day many people visit it and the stats are saved as these columns: idvisit_datepeople

Please write a query to display the records which have 3 or more consecutive rows and the amount of people more than 100(inclusive).

 

185. Department Top Three Salaries

Problem: Write a SQL query to find employees who earn the top three salaries in each of the department. For the above tables, your SQL query should return the following rows (order of rows does not matter).

 

197. Rising Temperature

Problem: Given a Weather table, write a SQL query to find all dates’ Ids with higher temperature compared to its previous (yesterday’s) dates.

 

Advanced Join

626. Exchange Seats

Problem: Mary is a teacher in a middle school and she has a table seat storing students’ names and their corresponding seat ids.

The column id is continuous increment.

Mary wants to change seats for the adjacent students.

Can you write a SQL query to output the result for Mary?

 

Simple Joins

The Report

Problem: Ketty gives Eve a task to generate a report containing three columns: NameGrade and MarkKetty doesn’t want the NAMES of those students who received a grade lower than 8. The report must be in descending order by grade — i.e. higher grades are entered first. If there is more than one student with the same grade (8-10) assigned to them, order those particular students by their name alphabetically. Finally, if the grade is lower than 8, use “NULL” as their name and list them by their grades in descending order. If there is more than one student with the same grade (1-7) assigned to them, order those particular students by their marks in ascending order.

Write a query to help Eve.

 

Ranking, Row Numbers And Analytic Functions

177. Nth Highest Salary

Problem: Write a SQL query to get the nth highest salary from the Employeetable.

 

Complex Self Joins

Symmetric Pairs

Problem: You are given a table, Functions, containing two columns: and Y.

Two pairs (X1, Y1) and (X2, Y2) are said to be symmetric pairs if X1 = Y2 and X2 = Y1.

Write a query to output all such symmetric pairs in ascending order by the value of X.

If you need more SQL try these one as well:

Occupations

Ollivander’s Inventory

 

Videos:

IQ15: 6 SQL Query Interview Questions

 

Learning about ROW_NUMBER and Analytic Functions

Advanced Implementation Of Analytic Functions Running Total

Advanced Implementation Of Analytic Functions Median

Wise Owl SQL Videos

Post Video SQL Problems

Once you have finished watching the SQL videos above. Consider trying the new problems below. Try to see if you feel like you are improving. Again, note down any specific topics you feel weak on.

Binary Tree Nodes

595. Big Countries

626. Exchange Seats

Weather Observation Station 18

Challenges

Print Prime Numbers

SQL Interview Questions: 3 Tech Screening Exercises (For Data Analysts)

 

Databases, ETL and Data Warehouses

database interview questions

For database, ETL and data warehouse design questions we have gathered and even created some videos we hope will help you out when it comes to explaining your design in an interview. In addition, we have listed out a few plausible database/DW concepts you could attempt to design out on your own.

*We are working on making similar videos. Sign up to get future emails about our videos.

Designing A Traditional Relational Database Video

Data Warehouse Design Video

ETL Design Video

 

 

Self Practice Problems:

For this part of your interview practice we are going to list out a few business systems that you can try to design out. First we recommend designing a relational database, then thinking about how you would design an ETL and DW that rely on that relational DB.

*In addition, we have found it very common that interviewers will base their interview questions off of your design. So think about some of the questions you could answer with your DB and list them out.

Design a Database/ETL and DW for a:

  • Dating App
  • Bicycle Rental Service
  • Music Streaming App
  • Job Search Website
  • Udemy like website

These are just a few ideas. We hope they help you have a clearer idea of what you can practice modeling and designing. Take some time to think about how users interact with these websites before getting started.

Programming Problems

Data engineers do a significant amount of programming in there daily life. There are several specific languages data engineers use. In particular, Python is arguable the most common.

If the role requires a lot of Hadoop work, then Java is also a useful language to have. There are a few other useful languages like Java and Powershell (if you work at a Microsoft shop).

There are two types of questions we have experienced. Some interviewers will ask you more operational questions. Others will ask classic algorithm and data structure questions.

Below are list of them…

Algorithms And Data Structures

Before going to deep into data structure and algorithms. Let’s do a quick check to see how you are currently doing in this area. We have listed out 8 leetcode problems that vary in difficulty. Try these out and try to gauge yourself on how long it takes you as well as how many hints you needed. If you are following along with the study guide, then note this down. At the end of this list are a few more questions. So once you have watched all the videos, consider doing those problems and see if you feel like you are improving!

Pre-Study Problems

  1. 985. Sum of Even Numbers After Queries
  2. 657. Robot Return to Origin
  3. 961. N-Repeated Element in Size 2N Array
  4. 110. Balanced Binary Tree
  5. 3. Longest Substring Without Repeating Characters
  6. 19. Remove Nth Node From End of List
  7. 23. Merge k Sorted Lists
  8. 31. Next Permutation

 

Now that you have gone through these 8 questions, and shaken off the rust. Let’s start reviewing these concepts.

Data Structures

Data Structures & Algorithms #1 – What Are Data Structures?

Data Structures: Linked Lists

Data Structures: Trees

Data Structures: Heaps

Data Structures: Hash Tables

Data Structures: Stacks and Queues

Data Structures: Crash Course Computer Science #14

Data Structures: Tries

 

Algorithms

Python Algorithms for Interviews

Algorithms: Graph Search, DFS and BFS

Algorithms: Binary Search

Algorithms: Recursion

Algorithms: Bubble Sort

Algorithms: Merge Sort

Algorithms: Quicksort

 

Big O Notation

Introduction to Big O Notation and Time Complexity (Data Structures & Algorithms #7)

Some Interview Walk Throughs

Amazon Coding Interview Question – Recursive Staircase Problem

Google Coding Interview – Universal Value Tree Problem

Google Coding Interview Question and Answer #1: First Recurring Character

 

Post Study Problems

Once you have finished the videos above. Consider trying the algorithm and data structure problems below. Make sure you keep track of how comfortable you felt when working on the problems.

  1. Bigger Is Greater
  2. 6. ZigZag Conversion
  3. 7. Reverse Integer
  4. 40. Combination Sum II
  5. 43. Multiply Strings
  6. Larry’s Array
  7. Short Palindrome
  8. 65. Valid Number

 

If you still feel like you need help, then consider taking a course on algorithms and data structures.

Operational Programming Problems

Operational interview questions are harder to prep for. There are no “classic” interview questions here. However, they are also often easier to figure out on the spot. Algorithm interview questions usually have some sort of trick. Like the balanced brackets problem. If you don’t know you need to use queues, it will be very difficult to get to the correct answer.

Where as operational problems will be more focused on workflows, and businesses processes. So as long as you are good at walking through real problems, this should be easier. Here are some problems that are great for prepping. We find it is helpful to know how to use arrays and dictionaries. Beyond that, there isn’t too much more required.

Kangaroo Problem

 

interview data engineer

 

Breaking Records

Find A String

itertools.permutations()

No Idea!

Days of the programmer

Leaderboard

Word Order

Sherlock And Squares

Equalize The Array

Apples And Oranges

More Operational Style Questions

 

data engineer interview questions

 

 

 

System Design Questions

System design questions aren’t that common. However, this is a good skill to practice because you never know when an interviewer suddenly asks you to design a mobile app for the next Uber (it has happened to us). You don’t want to get stuck on pieces you don’t know when doing high level design. It quickly shows what you have 0 clue about. For instance, what is a CDN, what type of Webserver are you going to use for a ASP.NET site, where does the API layer go. All of these are basic questions you don’t want to get stumped on.

So spending a few minutes watching the videos below will help you be prepared in case your interviewer shocks you with their problem.

Tiny url

Parking Lot System

Whats App

Uber design

Instagram

Tinder Service

Courses

If you have finished all of the questions and videos above and still feel like you need to review, then consider taking the courses below. They will help you get some new perspective. The truth is, these courses will pretty much cover the same concepts covered in the videos we have provided. However, sometimes, hearing the same information from multiple sources is helpful.

The Coding Interview Bootcamp: Algorithms + Data Structures

Data Warehouse Concepts: Basic to Advanced concepts

Data Structures and Algorithms Bootcamp

Books

Along with the courses, it might be helpful to read some programming, data warehousing design books. These are other key concepts worth covering.

Cracking The Code

The Algorithm Design Manual

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

 

Conclusion

We do hope this list will help you prepare for your next data engineering interview. If you need the checklist to help keep track then you can find the checklist here. Please let us know if you have any questions or need any future help. Please do sign up for our email list to continue to receive free content like this!

 

Are You Interested In Learning About Data Science Or Tech?

10 Amazing Articles On Python Programming And Machine Learning

Learning Data Science: Our Top 25 Data Science Courses

The Best And Only Python Tutorial You Will Ever Need To Watch

Engineering Dashboards, Metrics And Algorithms Part 2

Read Last Weeks Top Ten Article For Python Libraries

The Best And Last Python Tutorial You Will Ever Watch

How Algorithms Can Become Unethical and Biased

Dynamically Bulk Inserting CSV Data Into A SQL Server

4 Must Have Skills For Data Scientists

SQL Best Practices — Designing An ETL Video

 

 

Categories
Uncategorized
No Comment

Leave a Reply

*

*

Sponsors

RELATED BY