Databricks | SDE- II | Interview Experience
Summary
I applied for an SDE-2 role at Databricks and went through five rounds, including a technical screening, two DSA rounds, an LLD round, and multiple team matching discussions. The interview verdict is currently pending.
Full Experience
Hi,
I had applied for the SDE-2 role at Databricks using their career portal via LinkedIn and recruiter reached out to after couple of days and scheduled my interviews one by one.
Round - 1: Technical Screening
Fibonacci trees are binary trees which are recursively defined as follows:
- T_0 is empty
- T_1 consists of a single node.
- T_n consists of a root node, with T_{n-2} as its left child, and T_{n-1} as its right child. Examples:
T_0:
T_1: *
T_2: *
*T_3: * / \
- *
* Now, in order to be able to identify each node within tree, we enumerated the nodes in each tree using DFS pre-order.Write a function that given nodes s and e in a Fibonacci tree of order n returns the shortest path from s to e in the form of a sequence of moves: "U" (up), "L" (left), and "R" (right).
Example output:
fibPath(order = 3, start = 1, end = 3) == "URR" fibPath(order = 4, start = 1, end = 4) == "URL" fibPath(order = 5, start = 3, end = 7) == "UURLR"
- Successfully solved the question using pre-computing fibonacci series and then using that to find the start and end in the tree.
- Later asked for the follow-up if we iterate the tree in in-order what difference will be in the solution.
- Solved by generating ranges for each of the CIDR rules and then checking if the given IP lies in the range or not. Used long long to store the ranges of the IPs (can use bitset if you are comfortable with it).
- Downloads a remote file from a given uri with clients calling the CachedFile class with different start and end ranges.
- Optimise the algorithm to supports repeated reads efficiently
Minimizes repeated network calls- Solved using bucketing the ranges in small chunk size at the cache layer and storing the buckets in the disk for faster retrivel rather than network call
- Only retrieving once for the Storage client via network call for each bucket.
- Implemented thread pool to fetch the chunks parallely from storage client.
- Discussed various trade offs like duplicate chunk fetching in threads, failure scenarios, etc.
- Had multiple team matching rounds with different HMs.
- Asked common leadership and past scenarios related questions.
Round - 2: DSA
Given a set of rules, implement the function access_ok to see if IP address is allowed or denied. Also, print the rules index because of which it is getting allowed or denied. If none of the rules match, deny it.
vector<vector<string>> rules = { {"ALLOW", "192.168.100.0/24"}, -> 192.168.100.2 {"DENY", "192.168.0.5/30"}, {"ALLOW", "192.168.1.1/22"}, {"ALLOW", "1.0.0.0/8"}, {"ALLOW", "2.3.4.9"}, {"DENY", "8.8.8.8/1"}, {"ALLOW", "5.6.7.8"} };
access_ok(rules, ip_address) -> true/false
Round - 3: DSA
Design Tic Tac Toe
for n * m, matrix and k subsequent tokens Follow up: is_robot to let AI play randomly
Round - 4: LLD
You are given an interface StorageClient that allows downloading files from a remote storage system:
getFileSize(uri) → returns the size of the file.
fetch(uri, offset, length, buffer) → fetches length bytes starting at offset into buffer.
Your task is to design a class CachedFile that:
Round - 5: Team Matching
Verdict: Pending
Interview Questions (4)
Shortest Path in Fibonacci Trees
Fibonacci trees are binary trees which are recursively defined as follows:
- T_0 is empty
- T_1 consists of a single node.
- T_n consists of a root node, with T_{n-2} as its left child, and T_{n-1} as its right child.
Examples:
T_0:
T_1: *
T_2: * \n *
T_3: * / \n * * \n *
Now, in order to be able to identify each node within tree, we enumerated the nodes in each tree using DFS pre-order.
Write a function that given nodes s and e in a Fibonacci tree of order n returns the shortest path from s to e in the form of a sequence of moves: "U" (up), "L" (left), and "R" (right).
Example output:
fibPath(order = 3, start = 1, end = 3) == "URR" fibPath(order = 4, start = 1, end = 4) == "URL" fibPath(order = 5, start = 3, end = 7) == "UURLR"
IP Address Access Control (CIDR Rules)
Given a set of rules, implement the function access_ok to see if IP address is allowed or denied. Also, print the rules index because of which it is getting allowed or denied. If none of the rules match, deny it.
vector<vector<string>> rules = { {"ALLOW", "192.168.100.0/24"}, -> 192.168.100.2 {"DENY", "192.168.0.5/30"}, {"ALLOW", "192.168.1.1/22"}, {"ALLOW", "1.0.0.0/8"}, {"ALLOW", "2.3.4.9"}, {"DENY", "8.8.8.8/1"}, {"ALLOW", "5.6.7.8"} };
access_ok(rules, ip_address) -> true/false
Design Tic Tac Toe (n*m, k tokens)
Design Tic Tac Toe
for n * m, matrix and k subsequent tokens
Follow up: is_robot to let AI play randomly
Design CachedFile Class for Remote Storage
You are given an interface StorageClient that allows downloading files from a remote storage system:
getFileSize(uri) → returns the size of the file.
fetch(uri, offset, length, buffer) → fetches length bytes starting at offset into buffer.
Your task is to design a class CachedFile that:
- Downloads a remote file from a given uri with clients calling the CachedFile class with different start and end ranges.
- Optimise the algorithm to supports repeated reads efficiently
- Minimizes repeated network calls