
Databricks Interviews

Databricks | L4 | Bangalore | May 2024
Databricks
SDE II · Bangalore · 1.6 years
April 14, 2025 · 4 reads

Summary

I underwent an L4 interview process at Databricks in Bangalore, which included multiple technical and system design rounds. Despite some initial challenges and mixed feedback, my packet was accepted by the Hiring Committee after internal advocacy. However, I ultimately decided to withdraw my application due to a change in personal circumstances.

Full Experience

Sharing the interview experience of my friend. They sent me this and I have pasted it here as is XD. I don't remember the questions exactly as it has been some time. I also faintly remember the interviewers running the code during the interview, but I am not sure.

Phone screen: The interviewer was from the Amsterdam office.

Problem Statement: Create a keyValueDataStore that monitors the load on it.

Had to implement these functions.

class keyValueDataStore {
    // stores the value under the given key
    string put(string key, string value) {
        return "";
    }
    // returns the value for the given key, or "" if absent
    string get(string key) {
        return "";
    }
    // returns the number of get calls in the last 5 mins
    int measureGetLoad() {
        return 0;
    }
    // returns the number of put calls in the last 5 mins
    int measurePutLoad() {
        return 0;
    }
};

Used a hashmap as the key-value store and a queue to store pairs of timestamps and values. Any other solution with higher time or space complexity was not accepted. I faced an issue here as I didn't know how to get the current timestamp in C++, so I decided to treat it as a black-box class that would return the timestamp.

After this I had to write working tests. This was tricky because the timestamps needed to be faked, so I passed a mock class into the solution class in the tests. Using this fake class I could supply hardcoded timestamps, allowing me to test different scenarios.
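For reference, here is a minimal sketch of how that solution fits together (my reconstruction, not the exact interview code; names like Clock and SystemClock are my own): a hashmap holds the data, each operation records a timestamp in a deque and evicts anything older than 5 minutes, and the clock is injected so tests can pass a fake subclass returning hardcoded timestamps.

#include <chrono>
#include <deque>
#include <string>
#include <unordered_map>

// Abstract clock so tests can inject hardcoded timestamps (the "black box"
// class mentioned above). All names here are illustrative assumptions.
struct Clock {
    virtual long long nowSeconds() = 0;
    virtual ~Clock() = default;
};

struct SystemClock : Clock {
    long long nowSeconds() override {
        using namespace std::chrono;
        return duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
    }
};

class KeyValueDataStore {
public:
    explicit KeyValueDataStore(Clock& clock) : clock_(clock) {}

    void put(const std::string& key, const std::string& value) {
        recordCall(putTimes_);
        store_[key] = value;
    }

    std::string get(const std::string& key) {
        recordCall(getTimes_);
        auto it = store_.find(key);
        return it == store_.end() ? "" : it->second;
    }

    // number of get calls in the last 5 minutes
    int measureGetLoad() {
        evict(getTimes_);
        return static_cast<int>(getTimes_.size());
    }

    // number of put calls in the last 5 minutes
    int measurePutLoad() {
        evict(putTimes_);
        return static_cast<int>(putTimes_.size());
    }

private:
    static constexpr long long kWindowSeconds = 5 * 60;

    void recordCall(std::deque<long long>& times) {
        times.push_back(clock_.nowSeconds());
        evict(times);
    }

    void evict(std::deque<long long>& times) {
        long long cutoff = clock_.nowSeconds() - kWindowSeconds;
        while (!times.empty() && times.front() < cutoff) times.pop_front();
    }

    Clock& clock_;
    std::unordered_map<std::string, std::string> store_;
    std::deque<long long> getTimes_, putTimes_;
};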

The interviewer filled in the feedback after 2 weeks, and I was told it was negative, mentioning that I was not fully familiar with C++ and did not write clean code. This was extremely surprising, as knowing the timestamp function offhand is rare and that was the only thing I was not familiar with; I had told the interviewer the same and they did not help me in any way. I was also confident I had written good code with many helper functions and good variable names. I said the same to my recruiter and asked for one more try, which the recruiter gave.

Phone screen 2: The interviewer was from the India office this time.

Problem Statement: Create a revenue service that supports the following operations:

insert(int revenue): returns a new, unique customerId and associates the revenue with that customerId.

insert(revenue, refererid): does all of the above and also adds the input revenue to the customer matching the referer id.

get_k_lowest_revenue(int k, int minRevenue): returns the k customerIds with the lowest revenue that is at least minRevenue.

I gave an O(k * log n) solution for the above, where customerId and revenue pairs were stored in a set.

The follow-up was to optimise the query time to O(1). For this we have to store the answer for every node during insertion, which results in O(n^2) insertion time.
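Roughly, the set-based version looks like the sketch below (my reconstruction under my own assumptions about the API; I assume the set is ordered by (revenue, customerId) so the query can start at lower_bound): referral inserts re-key the referrer's entry, and the query walks at most k entries.

#include <set>
#include <unordered_map>
#include <vector>

class RevenueService {
public:
    int insert(int revenue) {
        int id = nextId_++;
        revenueOf_[id] = revenue;
        byRevenue_.insert({revenue, id});
        return id;
    }

    int insert(int revenue, int referrerId) {
        int id = insert(revenue);
        // Add the new revenue to the referrer as well: erase + reinsert keeps
        // the set ordered by the updated revenue.
        int old = revenueOf_[referrerId];
        byRevenue_.erase({old, referrerId});
        revenueOf_[referrerId] = old + revenue;
        byRevenue_.insert({old + revenue, referrerId});
        return id;
    }

    // k customers with the lowest revenue that is still >= minRevenue.
    std::vector<int> get_k_lowest_revenue(int k, int minRevenue) {
        std::vector<int> result;
        for (auto it = byRevenue_.lower_bound({minRevenue, 0});
             it != byRevenue_.end() && static_cast<int>(result.size()) < k; ++it) {
            result.push_back(it->second);  // customerId
        }
        return result;
    }

private:
    int nextId_ = 0;                           // customerIds start at 0
    std::unordered_map<int, int> revenueOf_;   // customerId -> current revenue
    std::set<std::pair<int, int>> byRevenue_;  // (revenue, customerId)
};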

After the interview I asked the interviewer some questions, which they answered. Since the interviewer and I had similar backgrounds, we chatted informally for some more time. They asked why I was doing another phone round, and I told them I had been rejected in the first one, which really surprised them. They looked at my code from the previous round and told me it looked really good; the testing part was not even expected from candidates. After that they told me to remain calm in interviews and try to keep things simple instead of using complex implementations to prove my knowledge. It was a really fun conversation.

Another follow-up that has been asked to others: what if a new function is required, get_nested_referral(customerId, referraldepth), which returns the customerId's revenue plus referraldepth additional revenue amounts. Example:

insert(10) -> customer 0: 10
insert(20) -> customer 1: 20
insert(30, 0) -> customer 2: 30 and customer 0: 40
get_nested_referral(0, 0) -> returns 10
get_nested_referral(0, 1) -> returns 40

Optimise this for space. Then optimize for time.
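A hedged sketch of one interpretation that matches the example above (depth 0 means only the customer's own revenue; each extra depth level also adds the own revenue of referrals that many hops away). The class and method names are mine, and this naive recursive version is just the baseline from which the space and time optimisations would follow.

#include <unordered_map>
#include <vector>

class ReferralRevenueService {
public:
    // referrerId = -1 means "no referrer" (my convention for this sketch).
    int insert(int revenue, int referrerId = -1) {
        int id = nextId_++;
        ownRevenue_[id] = revenue;
        if (referrerId >= 0) referrals_[referrerId].push_back(id);
        return id;
    }

    // Own revenue plus the own revenue of referrals up to referralDepth hops away.
    long long get_nested_referral(int customerId, int referralDepth) {
        long long total = ownRevenue_[customerId];
        if (referralDepth > 0) {
            for (int child : referrals_[customerId]) {
                total += get_nested_referral(child, referralDepth - 1);
            }
        }
        return total;
    }

private:
    int nextId_ = 0;
    std::unordered_map<int, int> ownRevenue_;            // customerId -> own revenue
    std::unordered_map<int, std::vector<int>> referrals_;  // customerId -> direct referrals
};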

Technical Round 1: The interviewer was from the US office. My internet was really bad that day and I couldn't hear the interviewer properly because of the poor connection, so I wasn't able to give my best performance. Anyway:

Problem Statement: Similar to this: https://leetcode.com/discuss/post/5743277/databricks-l5-sse-technical-phone-screen-l4zi/. But I had to return the result for an IP according to the first rule it matches. I gave a linear-time solution similar to the one above. There were multiple edge cases to take care of.

In the follow-up they asked how to query the result for multiple IP addresses efficiently. I suggested using a trie, and the interviewer agreed. Then they asked what happens if, instead of a single IP address, we have to check whether a whole CIDR block is allowed or disallowed. This turned into an interval-handling problem; I thought of multiple approaches but couldn't solve it.
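For the single-IP trie idea, here is a rough sketch of what I had in mind (my reconstruction, not a confirmed solution): store each CIDR rule in a binary trie keyed by its prefix bits, remember the smallest rule index terminating at each node, and while walking an IP's bits keep the minimum index seen, which corresponds to the first matching rule. The CIDR-block query variant is the part that turned into the interval problem and is not handled here.

#include <cstdint>
#include <memory>

struct TrieNode {
    int ruleIndex = -1;                     // earliest rule ending at this prefix, -1 if none
    std::unique_ptr<TrieNode> child[2];
};

class CidrRuleTrie {
public:
    // prefix: the network address as a 32-bit integer, prefixLen: number of leading bits.
    void addRule(uint32_t prefix, int prefixLen, int ruleIndex) {
        TrieNode* node = &root_;
        for (int i = 0; i < prefixLen; ++i) {
            int bit = (prefix >> (31 - i)) & 1;
            if (!node->child[bit]) node->child[bit] = std::make_unique<TrieNode>();
            node = node->child[bit].get();
        }
        if (node->ruleIndex == -1 || ruleIndex < node->ruleIndex) node->ruleIndex = ruleIndex;
    }

    // Returns the index of the first (lowest-index) matching rule, or -1 if none match.
    int firstMatch(uint32_t ip) const {
        const TrieNode* node = &root_;
        int best = node->ruleIndex;
        for (int i = 0; i < 32 && node; ++i) {
            int bit = (ip >> (31 - i)) & 1;
            node = node->child[bit].get();
            if (node && node->ruleIndex != -1 && (best == -1 || node->ruleIndex < best))
                best = node->ruleIndex;
        }
        return best;
    }

private:
    TrieNode root_;
};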

Technical Round 2: The interviewer was from the India office.

Problem Statement: Given a source string and a ref string, represent the ref string using blocks from the source string. For example, with source: abc1234abcd and ref: abcd123, we can define a cover as cov1 = [(7, 11), (3, 6)] or cov2 = [(7, 10), (10, 11), (3, 6)].

Implement a function delete(cover, index) that deletes the character at the given index of the ref string and returns the new cover.

I said that just splitting the interval in which the index falls should work, and the interviewer agreed.
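Assuming half-open source intervals as in the example (so (7, 11) covers source indices 7..10), the delete step could look like this sketch (names and details are my own):

#include <utility>
#include <vector>

using Cover = std::vector<std::pair<int, int>>;  // (start, end) intervals into the source string

// Returns a new cover for the ref string with the character at refIndex removed:
// the interval containing that position is split around the deleted character.
Cover deleteIndex(const Cover& cover, int refIndex) {
    Cover result;
    int consumed = 0;  // how many ref characters the intervals seen so far cover
    for (auto [start, end] : cover) {
        int len = end - start;
        if (refIndex >= consumed && refIndex < consumed + len) {
            int splitAt = start + (refIndex - consumed);   // source position of the deleted char
            if (splitAt > start) result.push_back({start, splitAt});
            if (splitAt + 1 < end) result.push_back({splitAt + 1, end});
        } else {
            result.push_back({start, end});
        }
        consumed += len;
    }
    return result;
}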

Follow-up: return the maximal cover after deleting, which means no two adjacent intervals should be mergeable. The solution involves checking 3 intervals for the deletion case and 4 for the insertion case, and I had to prove why. (I can't seem to remember the exact questions here; this is what I had noted down earlier :( )

Design round: The interviewer was from the US office.

This round is unique to Databricks. The interviewer said they were not looking for a distributed system and were not concerned with multiple machines; we had to build a solution considering only one machine. The expectation was to write some pseudocode as well.

Problem Statement: Design a durable key-value store. This is a typical system design question that you can find easily, but of course with focus on the constraints mentioned above. We talked about different cases and optimisations. It was pretty fun.
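The single-machine skeleton boils down to something like the sketch below (a rough sketch under my assumptions, not the actual interview solution): an append-only write-ahead log that is flushed before the in-memory map is updated, plus a replay of the log on startup. Compaction, checksums, and concurrency are deliberately left out.

#include <fstream>
#include <string>
#include <unordered_map>

class DurableKVStore {
public:
    explicit DurableKVStore(const std::string& logPath) : logPath_(logPath) {
        replayLog();                            // rebuild the map from the existing log, if any
        log_.open(logPath_, std::ios::app);
    }

    void put(const std::string& key, const std::string& value) {
        log_ << key << '\t' << value << '\n';   // assumes keys/values contain no tab or newline
        log_.flush();                           // real durability would also fsync the file descriptor
        map_[key] = value;
    }

    std::string get(const std::string& key) {
        auto it = map_.find(key);
        return it == map_.end() ? "" : it->second;
    }

private:
    void replayLog() {
        std::ifstream in(logPath_);
        std::string line;
        while (std::getline(in, line)) {
            auto tab = line.find('\t');
            if (tab != std::string::npos) map_[line.substr(0, tab)] = line.substr(tab + 1);
        }
    }

    std::string logPath_;
    std::ofstream log_;
    std::unordered_map<std::string, std::string> map_;
};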

To prepare for this I studied concurrency problems and DDIA for knowledge of system design.

Afterwards I asked some questions about the problems he worked on. Given that the interview was about system design, the interviewer started describing some systems he had built at Databricks, like a new file system, and the challenges he faced with respect to scale. From there we went on to discuss the differences between the tech markets in India and the US, his background, and a lot more casual stuff. We ran an hour over the interview time. We kind of connected over our shared passion for systems and tech in general.

Behavioral: After all that I had a chat with the site lead, where he pitched Databricks to me and prepared me for the behavioral interview.

After that I had the behavioral round with standard questions.

After that I had a discussion with the hiring manager for team matching.

Then my packet went to the HC, where it took some time to get accepted as I had 3 LH and 1 H ratings. My interview scores weren't good, and Databricks generally doesn't accept people with these scores, but the site lead collected data from me and from my references, created a strong packet, and then defended me in the HC review.

The end: The complete process took 4 months. My circumstances had changed drastically since I applied, so I decided to drop out of the process.

Interview Questions (7)

Q1
KeyValueData Store with Load Monitoring
Data Structures & Algorithms

Create a keyValueDataStore that monitors the load on it. Had to implement these functions.

class keyValueDataStore {
    // stores the value under the given key
    string put(string key, string value) {
        return "";
    }
    // returns the value for the given key, or "" if absent
    string get(string key) {
        return "";
    }
    // returns the number of get calls in the last 5 mins
    int measureGetLoad() {
        return 0;
    }
    // returns the number of put calls in the last 5 mins
    int measurePutLoad() {
        return 0;
    }
};
Q2
Revenue Service with Referral Tracking
Data Structures & Algorithms

Create a revenue service that supports the following operations:

insert(int revenue): returns a new, unique customerId and associates the revenue with that customerId.

insert(revenue, refererid): does all of the above and also adds the input revenue to the customer matching the referer id.

get_k_lowest_revenue(int k, int minRevenue): returns the k customerIds with the lowest revenue that is at least minRevenue.

Q3
Nested Referral Revenue Calculation
Data Structures & Algorithms

What if a new function is required: get_nested_referral(customerId, referraldepth), which returns the customerId's revenue plus referraldepth additional revenue amounts. Example:

insert(10) -> customer 0: 10
insert(20) -> customer 1: 20
insert(30, 0) -> customer 2: 30 and customer 0: 40
get_nested_referral(0, 0) -> returns 10
get_nested_referral(0, 1) -> returns 40

Optimise this for space. Then optimize for time.

Q4
IP Address Matching with CIDR Blocks
Data Structures & Algorithms

Similar to this: https://leetcode.com/discuss/post/5743277/databricks-l5-sse-technical-phone-screen-l4zi/. But I had to return the result for an IP according to the first rule it matches. I gave a linear-time solution similar to the one above. There were multiple edge cases to take care of.

In the follow-up they asked how to query the result for multiple IP addresses efficiently. I suggested using a trie, and the interviewer agreed. Then they asked what happens if, instead of a single IP address, we have to check whether a whole CIDR block is allowed or disallowed. This turned into an interval-handling problem; I thought of multiple approaches but couldn't solve it.

Q5
Represent Ref String using Source String Blocks with Deletion
Data Structures & Algorithms

Given a source string and a ref string, represent the ref string using blocks from the source string. For example, with source: abc1234abcd and ref: abcd123, we can define a cover as cov1 = [(7, 11), (3, 6)] or cov2 = [(7, 10), (10, 11), (3, 6)].

Implement a function delete(cover, index) that deletes the character at the given index of the ref string and returns the new cover.

Q6
Durable Key Value Store Design (Single Machine Focus)
System Design

Design a durable key-value store. This is a typical system design question that you can find easily, but of course with focus on the things mentioned above. We talked about different cases and optimisations. It was pretty fun. This round is unique to Databricks: the interviewer said they were not looking for a distributed system and were not concerned with multiple machines; we had to build a solution considering only one machine. The expectation was to write some pseudocode as well.

Q7
Standard Behavioral Questions
Behavioral

Standard behavioral questions were asked.

Preparation Tips

To prepare for this I studied concurrency problems and DDIA for knowledge of system design.

Databricks Interview SDE
Databricks
SDE I
March 11, 2025 · 83 reads

Summary

I interviewed for a Software Development Engineer (SDE) position at Databricks. The process involved a technical phone screen and two subsequent coding rounds, covering algorithms, data structures, and a challenging machine coding system design problem.

Full Experience

My interview journey for the SDE role at Databricks started with a phone screen. The question I received was precisely the one detailed in a specific LeetCode Discuss post concerning a BFS problem. Following this, I proceeded to two coding rounds.

The first round focused on algorithms, where I was tasked with a problem identical to one I'd found on Stack Overflow: finding a path between two nodes in a K-th order Fibonacci tree. The second, a machine coding round, challenged me to design a Map structure capable of put(string key, string value) and get(string key) operations. Crucially, it also required instrumentation methods like measure_put_load() and measure_get_load() to report the average number of calls within a 5-minute rolling window. A significant follow-up involved designing this system to handle a high volume of concurrent calls arriving in rapid succession, such as multiple operations within milliseconds. This design problem bore conceptual similarities to the LeetCode 'Time Based Key-Value Store' problem, but with added real-time load measurement and concurrency considerations.
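For the concurrency follow-up, one way to handle many calls landing within milliseconds is to guard the rolling window with a mutex and record millisecond-resolution timestamps, returning the count of calls in the last 5 minutes (a sketch under those assumptions; the names are mine, not the interviewer's reference solution, and the "average" variant would divide this count by the window length).

#include <chrono>
#include <deque>
#include <mutex>

class RollingLoadCounter {
public:
    // Record one call; safe to invoke from multiple threads.
    void record() {
        std::lock_guard<std::mutex> lock(mutex_);
        times_.push_back(nowMs());
        evict();
    }

    // Number of calls recorded in the last 5 minutes.
    int load() {
        std::lock_guard<std::mutex> lock(mutex_);
        evict();
        return static_cast<int>(times_.size());
    }

private:
    static long long nowMs() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }

    void evict() {
        long long cutoff = nowMs() - 5 * 60 * 1000;
        while (!times_.empty() && times_.front() < cutoff) times_.pop_front();
    }

    std::mutex mutex_;
    std::deque<long long> times_;
};

If the window gets very hot, the deque of raw timestamps could be replaced with fixed per-second buckets so memory stays bounded regardless of call volume.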

Interview Questions (3)

Q1
BFS Problem (Databricks Phone Screen)
Data Structures & Algorithms

This question was exactly the same as described in the linked LeetCode Discuss post, which involved a Breadth-First Search (BFS) problem.

Q2
Finding a path between two nodes in a K-th order Fibonacci tree
Data Structures & Algorithms

The problem involved finding a path between two given nodes within a K-th order Fibonacci tree. I was asked this problem exactly as described in the provided Stack Overflow link.

Q3
Design a Time-Windowed Key-Value Store with Load Instrumentation
System Design

I was asked to design a Map structure with put(string key, string value) and get(string key) methods. Additionally, it needed instrumentation methods like measure_put_load() and measure_get_load() to return the average calls made within a 5-minute window. A follow-up asked about handling multiple calls coming in fractions of a second (e.g., 10 calls at 0.01, 0.02, etc.). The problem was similar to the LeetCode 'Time Based Key-Value Store' problem but with these specific instrumentation and concurrency requirements.
