Databricks

Summary

I underwent an L4 interview process at Databricks in Bangalore, which included multiple technical and system design rounds. Despite some initial challenges and mixed feedback, my packet was accepted by the Hiring Committee after internal advocacy. However, I ultimately decided to withdraw my application due to a change in personal circumstances.

Full Experience

Sharing the interview experience of my friend. They sent me this and I have pasted it here as is XD. I don't remember the questions exactly, as it has been some time. I also faintly remember the interviewers running the code during the interviews, but I am not sure.

Phone screen: The interviewer was from the Amsterdam office.

Problem Statement: Create a keyValueData store which monitors the load on it.

I had to implement these functions.

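class keyValueDataStore {
public:
    // stores value under key; the return value was not specified, so a placeholder is returned
    string put(string key, string value) {
        return "";
    }
    // returns the value stored under key (the key parameter is assumed; the stub as remembered omitted it)
    string get(string key) {
        return "";
    }
    // returns the number of get calls in the last 5 minutes
    int measureGetLoad() {
        return 0;
    }
    // returns the number of put calls in the last 5 minutes
    int measurePutLoad() {
        return 0;
    }
};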

Used a hashmap as the key-value store and a queue to store pairs of timestamp and value. Any other solution with higher time or space complexity was not accepted. I faced an issue here as I didn't know how to get the current timestamp in C++, so I decided to assume a black-box class that returns the timestamp.

After this I had to write working tests. This was tricky because the timestamps have to be faked. So I decided to pass a mock class to the solution class in the tests. Using this fake class I could supply hardcoded timestamps, which let me test different scenarios.
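A minimal sketch of the approach described above, assuming timestamps never go backwards and that the window excludes a call made exactly 5 minutes ago. The names `Clock`, `SystemClock`, `FakeClock`, and `LoadMonitoredStore` are hypothetical, and the queues here hold only timestamps, which is enough for counting. It is not the code written in the interview.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical clock interface so tests can inject fixed timestamps.
struct Clock {
    virtual ~Clock() = default;
    virtual std::int64_t nowSeconds() const = 0;
};

// Production clock backed by std::chrono::steady_clock.
struct SystemClock : Clock {
    std::int64_t nowSeconds() const override {
        using namespace std::chrono;
        return duration_cast<seconds>(steady_clock::now().time_since_epoch()).count();
    }
};

// Test clock: the test sets `now` by hand.
struct FakeClock : Clock {
    std::int64_t now = 0;
    std::int64_t nowSeconds() const override { return now; }
};

class LoadMonitoredStore {
public:
    explicit LoadMonitoredStore(const Clock& clock) : clock_(clock) {}

    void put(const std::string& key, const std::string& value) {
        record(putTimes_);
        data_[key] = value;
    }

    // Returns the stored value, or an empty string when the key is absent.
    std::string get(const std::string& key) {
        record(getTimes_);
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }

    int measureGetLoad() { return countRecent(getTimes_); }
    int measurePutLoad() { return countRecent(putTimes_); }

private:
    static constexpr std::int64_t kWindowSeconds = 5 * 60;

    // Drops timestamps that fell out of the window. Each timestamp is pushed
    // once and popped once, so the cost is amortized O(1) per call.
    void evict(std::deque<std::int64_t>& times) {
        const std::int64_t cutoff = clock_.nowSeconds() - kWindowSeconds;
        while (!times.empty() && times.front() <= cutoff) times.pop_front();
    }

    void record(std::deque<std::int64_t>& times) {
        evict(times);
        times.push_back(clock_.nowSeconds());
    }

    int countRecent(std::deque<std::int64_t>& times) {
        evict(times);
        return static_cast<int>(times.size());
    }

    const Clock& clock_;
    std::unordered_map<std::string, std::string> data_;
    std::deque<std::int64_t> getTimes_;
    std::deque<std::int64_t> putTimes_;
};
```

In a test, construct a `FakeClock`, pass it to `LoadMonitoredStore`, and change `now` between calls to move time forward. Production code would pass a `SystemClock` instead.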

The interviewer filed the feedback after 2 weeks, and I was told it was negative: it said I was not fully familiar with C++ and did not write clean code. This was extremely surprising, as needing to know the timestamp function is really rare and that was the only thing I was not familiar with. I had also told the interviewer the same, and they did not help me in any way. I was also confident that I wrote good code, with many helper functions and good variable names. I told my recruiter the same and asked for one more try, which the recruiter granted.

Phone screen 2: The interviewer was from the India office this time.

Problem Statement: Create a revenue service that supports the following operations:

insert(int revenue): returns a new, unique customerId and associates the revenue with that customerId.

insert(revenue, refererid): does all of the above and also adds the input revenue to the customer whose id matches refererid.

get_k_lowest_revenue(int k, int minRevenue): returns the k lowest customerIds that have at least minRevenue revenue.

Gave an O(k * logn) solution for the above, where customerId and revenue pairs were stored in a set.

The followup was to optimise the query time to O(1). For this we have to store the answer for every node during insertion, which results in O(n^2) insertion time.
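A minimal sketch of the set-based solution, under one reading of the problem: the query returns the ids of the k customers with the lowest revenue among those with at least minRevenue. The class and member names are hypothetical, customer ids are assumed to start at 0, and refererId is assumed to be valid.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

class RevenueService {
public:
    // Registers a new customer and returns its id (0, 1, 2, ...).
    int insert(int revenue) {
        const int id = static_cast<int>(revenue_.size());
        revenue_.push_back(revenue);
        byRevenue_.insert(Entry(revenue, id));
        return id;
    }

    // Registers a new customer and credits the same revenue to the referrer.
    // Assumes refererId is an id returned by an earlier insert.
    int insert(int revenue, int refererId) {
        const int id = insert(revenue);
        // The set is ordered by revenue, so the referrer's entry is re-keyed.
        byRevenue_.erase(Entry(revenue_[refererId], refererId));
        revenue_[refererId] += revenue;
        byRevenue_.insert(Entry(revenue_[refererId], refererId));
        return id;
    }

    // Ids of up to k customers with the lowest revenue >= minRevenue,
    // in increasing order of revenue (ties broken by id).
    std::vector<int> getKLowestRevenue(int k, int minRevenue) const {
        std::vector<int> ids;
        if (k <= 0) return ids;
        auto it = byRevenue_.lower_bound(Entry(minRevenue, 0));
        for (; it != byRevenue_.end() && ids.size() < static_cast<std::size_t>(k); ++it) {
            ids.push_back(it->second);
        }
        return ids;
    }

private:
    using Entry = std::pair<long long, int>;  // (revenue, customerId)
    std::vector<long long> revenue_;          // indexed by customerId
    std::set<Entry> byRevenue_;
};
```

With this layout each insert costs O(log n), and a query is one `lower_bound` followed by at most k iterator steps.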

After the interview, I asked the interviewer some questions, which they answered. Since the interviewer and I had similar backgrounds, we talked informally for some more time. They asked me why I was doing another phone round, and I told them I had been rejected in the first one, which surprised them a lot. They looked at my code from the previous round and told me it looked really good, and that the testing part was not even expected from candidates. After that they told me to remain calm in interviews and to keep things simple instead of using complex implementations to prove my knowledge. It was a really fun conversation.

Another followup that has been asked to others is: What if a new function is required:

get_nested_referral(customerId, referraldepth): returns customerid's revenue plus referraldepth additional revenue amounts.

Example:
insert(10) -> 0: 10
insert(20) -> 1: 20
insert(30, 0) -> 2: 30 and 0: 40
get_nested_referral(0, 0) -> returns 10
get_nested_referral(0, 1) -> returns 40

Optimise this for space. Then optimize for time.
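The statement above is loosely worded, so the following is only one interpretation that reproduces the example: the result is the customer's own inserted revenue plus the own revenue of every customer reachable within referraldepth referral hops. The sketch keeps O(n) space and walks the referral tree per query. The names `ReferralTree` and `getNestedReferral` are hypothetical, and ids are assumed to be valid.

```cpp
#include <vector>

class ReferralTree {
public:
    // Registers a customer with no referrer and returns its id.
    int insert(long long revenue) {
        own_.push_back(revenue);
        referred_.emplace_back();
        return static_cast<int>(own_.size()) - 1;
    }

    // Registers a customer that was referred by refererId.
    int insert(long long revenue, int refererId) {
        const int id = insert(revenue);
        referred_[refererId].push_back(id);
        return id;
    }

    // Own revenue of customerId plus the own revenue of everyone within
    // `depth` referral hops below it. Depth 0 means the customer alone.
    long long getNestedReferral(int customerId, int depth) const {
        long long total = own_[customerId];
        if (depth > 0) {
            for (int child : referred_[customerId]) {
                total += getNestedReferral(child, depth - 1);
            }
        }
        return total;
    }

private:
    std::vector<long long> own_;              // revenue passed to insert, per id
    std::vector<std::vector<int>> referred_;  // direct referrals, per id
};
```

A query here visits every customer within the requested depth. A time-optimised variant would trade memory for precomputed per-depth sums.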

Technical Round 1: The interviewer was from the US office. My internet was really bad that day and I could not hear the interviewer properly because of the poor connection, so I was not able to give my best performance. Anyway,

Problem Statement: Similar to this: https://leetcode.com/discuss/post/5743277/databricks-l5-sse-technical-phone-screen-l4zi/ But I had to return the result for an IP according to the first rule it matches. So I gave a linear-time solution similar to the above. There were multiple edge cases to take care of.

In the followup they asked how to query the result for multiple IP addresses efficiently. I suggested using a trie, to which the interviewer agreed. They then asked what to do if, instead of a single IP address, we have to check whether a whole CIDR block is allowed or disallowed. This turned into an interval handling problem; I thought of multiple approaches but couldn't solve it.
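A minimal sketch of the linear first-match scan, not the trie followup. The rule format is an assumption, since the linked problem is not reproduced here: each rule is an allow/deny action plus an IPv4 CIDR string, and a bare address is treated as /32. The names `Decision`, `Rule`, and `firstMatch` are hypothetical, and input validation is deliberately light.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

enum class Decision { Allow, Deny, NoMatch };

struct Rule {
    Decision action;   // Allow or Deny
    std::string cidr;  // "a.b.c.d/len", or a bare address meaning /32
};

// Parses dotted-quad IPv4 text into a 32-bit value; returns false on bad input.
bool parseIpv4(const std::string& text, std::uint32_t& out) {
    std::istringstream in(text);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = 0;
        char dot = 0;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        if (i < 3 && (!(in >> dot) || dot != '.')) return false;
        value = (value << 8) | static_cast<std::uint32_t>(octet);
    }
    char extra = 0;
    if (in >> extra) return false;  // trailing characters
    out = value;
    return true;
}

// True when ip lies inside the block described by cidr.
bool cidrContains(const std::string& cidr, std::uint32_t ip) {
    const std::size_t slash = cidr.find('/');
    std::uint32_t base = 0;
    if (!parseIpv4(cidr.substr(0, slash), base)) return false;
    int len = 32;
    if (slash != std::string::npos) {
        std::istringstream in(cidr.substr(slash + 1));
        if (!(in >> len) || len < 0 || len > 32) return false;
    }
    // A /0 block matches everything; shifting by 32 is avoided explicitly.
    const std::uint32_t mask = len == 0 ? 0u : ~std::uint32_t{0} << (32 - len);
    return (ip & mask) == (base & mask);
}

// Scans rules in order and returns the action of the first one that matches.
Decision firstMatch(const std::vector<Rule>& rules, const std::string& ip) {
    std::uint32_t addr = 0;
    if (!parseIpv4(ip, addr)) return Decision::NoMatch;
    for (const Rule& rule : rules) {
        if (cidrContains(rule.cidr, addr)) return rule.action;
    }
    return Decision::NoMatch;
}
```

Because the scan stops at the first match, rule order matters: a narrow deny placed before a broad allow wins for addresses inside the narrow block.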

Technical Round 2: The interviewer was from the India office.

Problem Statement: Given a source string and a ref string, represent the ref string using blocks from the source string. For example:

source: abc1234abcd
ref: abcd123

We can use a cover to define cov1 = [(7, 11), (3, 6)] or cov2 = [(7, 10), (10, 11), (3, 6)].

Implement a function to delete an index from the ref string and return the new cover: delete(cover, index)

I said that just splitting the cover interval in which the index is present should work, to which the interviewer agreed.
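A minimal sketch of the splitting idea, assuming the intervals are half-open [start, end) ranges into the source string, which is what the example suggests. The function is named `deleteIndex` because `delete` is a C++ keyword, and `expand` is a hypothetical helper that rebuilds the ref string from a cover. It does not merge neighbours, so the result is not necessarily a maximal cover.

```cpp
#include <string>
#include <utility>
#include <vector>

using Interval = std::pair<int, int>;  // half-open [start, end) into source
using Cover = std::vector<Interval>;

// Rebuilds the ref string that a cover describes.
std::string expand(const std::string& source, const Cover& cover) {
    std::string ref;
    for (const Interval& iv : cover) {
        ref += source.substr(iv.first, iv.second - iv.first);
    }
    return ref;
}

// Removes the character at position `index` of the ref string by splitting
// the interval that contains it. An out-of-range index leaves the cover as is.
Cover deleteIndex(const Cover& cover, int index) {
    Cover result;
    int refPos = 0;  // ref position at which the current interval starts
    for (const Interval& iv : cover) {
        const int len = iv.second - iv.first;
        if (index >= refPos && index < refPos + len) {
            const int cut = iv.first + (index - refPos);  // source position to drop
            if (iv.first < cut) result.push_back(Interval(iv.first, cut));
            if (cut + 1 < iv.second) result.push_back(Interval(cut + 1, iv.second));
        } else {
            result.push_back(iv);
        }
        refPos += len;
    }
    return result;
}
```

For cov1 above, deleting index 1 (the 'b' of "abcd123") splits (7, 11) into (7, 8) and (9, 11), and the new cover expands to "acd123".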

Followup: return the maximal cover after deleting, which means no 2 adjacent intervals should be mergeable. The solution involves checking 3 intervals for the deletion case and 4 for the insertion case. Had to prove why. (I can't seem to remember the exact questions here. This is what I had noted down earlier :( )

Design round: The interviewer was from the US office.

This round is unique to Databricks. The interviewer said that they were not looking for a distributed system and were not concerned with multiple machines; we had to build a solution considering only one machine. The expectation was also to write some pseudocode.

Problem Statement: Design a durable key value store. This is a typical system design question that you can find easily, but of course with a focus on the points mentioned above. We talked about different cases and optimisations. It was pretty fun.
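The write-up does not say which design was discussed, so the following is only one common single-machine technique, sketched under stated assumptions: every put is appended to a log file and fsync'd before it is acknowledged, and the log is replayed on startup. It assumes a POSIX system and that keys and values contain no tab or newline characters. The class name `DurableKvStore` and the record format are hypothetical, and there are no checksums, no compaction, and no concurrency control.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

class DurableKvStore {
public:
    explicit DurableKvStore(const std::string& path) {
        // Recovery: replay the log so later records overwrite earlier ones.
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) {
            const std::size_t tab = line.find('\t');
            if (tab == std::string::npos) continue;  // skip malformed record
            data_[line.substr(0, tab)] = line.substr(tab + 1);
        }
        fd_ = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd_ < 0) throw std::runtime_error("cannot open log file");
    }

    ~DurableKvStore() {
        if (fd_ >= 0) ::close(fd_);
    }

    DurableKvStore(const DurableKvStore&) = delete;
    DurableKvStore& operator=(const DurableKvStore&) = delete;

    // Appends the record and forces it to disk before updating memory,
    // so an acknowledged put survives a crash.
    void put(const std::string& key, const std::string& value) {
        const std::string record = key + '\t' + value + '\n';
        const ssize_t written = ::write(fd_, record.data(), record.size());
        if (written != static_cast<ssize_t>(record.size()) || ::fsync(fd_) != 0) {
            throw std::runtime_error("log write failed");
        }
        data_[key] = value;
    }

    // Copies the value into `value` and returns true when the key exists.
    bool get(const std::string& key, std::string& value) const {
        const auto it = data_.find(key);
        if (it == data_.end()) return false;
        value = it->second;
        return true;
    }

private:
    int fd_ = -1;
    std::unordered_map<std::string, std::string> data_;
};
```

The cost of this choice is one fsync per put. Batching several writes per fsync and compacting the log are natural optimisations to discuss next.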

To prepare for this I studied concurrency problems and DDIA for knowledge of system design.

Afterwards I asked some questions about the problems he had worked on. Since it was a system design interview, the interviewer started describing some systems he had built at Databricks, such as a new file system, and the challenges he faced with respect to scale. From there we went on to discuss the differences between the tech markets of India and the US, his background, and a lot more casual stuff. We went 1 hour over the interview time. We kind of connected over our passion for systems and tech in general.

Behavioral: After all that I had a short chat with the site lead, where he pitched Databricks to me and prepared me for the behavioral interview.

After that I had the behavioral round with standard questions.

After that I had a discussion with the hiring manager for team matching.

Then my packet went to the HC, where it took some time to get accepted as I had 3 LH and 1 H ratings. My interview scores weren't good, and Databricks generally doesn't accept people with these scores. But the site lead collected data from me and from my references, created a strong packet, and then defended me in the HC review.

The end: The complete process took 4 months. My circumstances had changed drastically since I applied, so I decided to drop out of the process.


Databricks | L4 | Bangalore | May 2024
