Goldman Sachs | Data Analyst | Interview Experience | Dec 2020 [Offer]

Summary

I successfully interviewed for the Data Analyst role at Goldman Sachs in December 2020, ultimately receiving an offer after three comprehensive rounds that covered my projects, data structures and algorithms, and system design.

Full Experience

Round 1: Technical Discussion - Projects

The first round kicked off with a standard 'Tell me about yourself' and a discussion of my resume, including my lack of internships. I confidently highlighted my 'mini-torrent' and 'wikipedia search engine' projects. We went deep into 'mini-torrent': I explained how I built it step by step, architectural details such as handling slow trackers (active/passive mode with two trackers), and how I used SHA-1 to hash both individual chunks and complete files.
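The chunk-plus-file hashing idea can be sketched as follows. This is a minimal illustration, not the actual mini-torrent code. It assumes a hypothetical 512 KiB chunk size and computes both the per-chunk SHA-1 digests and the whole-file digest in a single pass over a binary stream. `hash_stream`, `hash_bytes` and `verify_chunk` are illustrative names.

```python
import hashlib
import io
from typing import BinaryIO, List, Tuple

# Hypothetical piece size; the write-up does not say what mini-torrent used.
CHUNK_SIZE = 512 * 1024  # 512 KiB


def hash_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Tuple[List[str], str]:
    """Return (per-chunk SHA-1 digests, whole-file SHA-1 digest) in one pass."""
    whole = hashlib.sha1()
    chunk_digests = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk_digests.append(hashlib.sha1(chunk).hexdigest())  # lets a peer verify one piece
        whole.update(chunk)  # incrementally builds the complete-file hash
    return chunk_digests, whole.hexdigest()


def hash_bytes(data: bytes, chunk_size: int = CHUNK_SIZE) -> Tuple[List[str], str]:
    """Convenience wrapper for in-memory data."""
    return hash_stream(io.BytesIO(data), chunk_size)


def verify_chunk(chunk: bytes, expected_digest: str) -> bool:
    """Re-hash a received chunk and compare it with the advertised digest."""
    return hashlib.sha1(chunk).hexdigest() == expected_digest
```

For a real file, `with open(path, "rb") as f: hash_stream(f)` avoids loading the whole file into memory. The per-chunk digests let a downloader reject a single corrupted piece from one peer, and the whole-file digest confirms the final assembled file.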

Next, we moved to my 'wikipedia search engine'. The interviewer presented a scenario: how would I handle repeated search requests when each request may be served by any one of several servers chosen at random? I proposed caching and was asked to detail its implementation with appropriate data structures and optimizations, specifically explaining an LRU cache with memory considerations. I felt confident in my explanations throughout this round.
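One way to express "LRU with memory considerations" is a cache bounded by total stored size rather than by entry count. The sketch below is a hypothetical illustration, not the solution given in the interview. It uses `collections.OrderedDict` as the recency list and assumes the caller supplies an approximate byte size for each cached result. For the multi-server scenario, such a cache would need to be shared (for example, placed in front of the servers) so that repeated queries hit it regardless of which server is picked. That placement is an assumption.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Query-result cache evicting least-recently-used entries once a byte budget is exceeded."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.items: "OrderedDict[str, tuple]" = OrderedDict()  # query -> (results, size)

    def get(self, query: str) -> Optional[Any]:
        if query not in self.items:
            return None
        self.items.move_to_end(query)  # mark as most recently used
        return self.items[query][0]

    def put(self, query: str, results: Any, size: int) -> None:
        if size > self.max_bytes:
            return  # a single result larger than the whole budget is not cached
        if query in self.items:
            self.used -= self.items.pop(query)[1]  # replace an existing entry
        self.items[query] = (results, size)
        self.used += size
        while self.used > self.max_bytes:
            _, (_, evicted_size) = self.items.popitem(last=False)  # oldest entry first
            self.used -= evicted_size
```

Both `get` and `put` run in amortized O(1), because `OrderedDict` supports moving a key to the end and popping from the front in constant time.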

Round 2: Competitive Programming - Data Structures & Algorithms

This round was purely focused on competitive programming. After a brief 'tell me about yourself', I was given my first problem:

Find Duplicate in Array (N+1 size, 1 to N values): Given an array of size n+1 whose values are all in the range 1 to n (inclusive), so that at least one value must repeat, find any one repeated value. I discussed various approaches, but the interviewer was pushing for the optimal solution using the slow and fast pointer technique (Floyd's cycle detection), which I initially struggled to articulate.
Minimum Rating in Streaming Data (Top N/3): The second scenario involved a website with a rating system where ratings arrive as a stream and can be very large. The goal was to display only the top N/3 ratings and identify the minimum rating among those displayed at any point in time. My first thought was a max-heap, then a vector; after the interviewer hinted that I should reconsider heap data structures, I pivoted to a min-heap and proposed an efficient solution.

For every competitive programming question, I was required to write code.

Round 3: System Design & Conceptual

The final round started with a light-hearted question about my interview experience. I again brought up my 'mini-torrent' project, but noticing the interviewer's lack of interest, I switched to my 'wikipedia search engine', which immediately caught his attention. During my explanation, he interrupted with scenario-based questions about its architecture:

Handling Natural Language Phrases in Search Engine: The interviewer asked how my search engine would handle natural-language input. He had some difficulty framing the question, but after some clarification I understood that he wanted to know how to find corresponding or related documents when a user enters a phrase as the query. I successfully explained my solution.
Distributed Inverted Indexing with K-way Merge: Given a very large dataset spread across multiple servers, with the ability to perform a K-way merge, I was asked to propose a solution for building an inverted index. I suggested running a K-way merge sort on the individual data segments on each server, then merging the sorted data from all servers to build the inverted index, which is a standard approach.
Autocomplete Suggestion (Next Word Prediction): The last question was about implementing an autocomplete feature: if a user types 'Goldman', how would my system suggest 'Sachs' as the next word? I provided a solution, acknowledging that it might not be optimal but was a valid starting point.

When invited to ask questions at the end, I declined, since my doubts had already been cleared in previous rounds.

Interview Questions (5)

Find Duplicate in Array (N+1 size, 1 to N values)

Data Structures & Algorithms·Medium

Given an array of size n+1, indexed from 1, whose values are all in the range 1 to n, at least one value must be repeated. Return any one value that is repeated.
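A sketch of the slow/fast pointer (Floyd's cycle detection) solution the interviewer was pushing for is shown below. It uses a 0-indexed Python list. Treat each value as a "next" pointer: `i -> nums[i]`. No value equals 0, so index 0 is never pointed to and serves as the entry of a path that must end in a cycle. The cycle's entrance is a value pointed to by two different indices, which is a duplicate. This runs in O(n) time with O(1) extra space and does not modify the array. The function name is illustrative.

```python
from typing import List


def find_duplicate(nums: List[int]) -> int:
    """Return one repeated value from a list of n+1 ints drawn from 1..n."""
    # Phase 1: advance slow by one step and fast by two until they meet inside the cycle.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart slow from the entry; moving both one step at a time,
    # they meet at the cycle's entrance, which is the duplicated value.
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
```

For example, `find_duplicate([1, 3, 4, 2, 2])` returns `2`.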

Minimum Rating in Streaming Data (Top N/3)

Data Structures & Algorithms·Medium

You have a rating system on your website where ratings can be in any range, potentially very large. Ratings arrive as a stream, and you want to show only the top N/3 ratings on your website. Determine the minimum rating displayed on the website at any point in time.
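If N were fixed, a single min-heap capped at N/3 entries would be enough. The problem doesn't define N, though. This sketch assumes N is the number of ratings received so far, so the displayed set grows as ratings arrive. Under that assumption, a min-heap of the displayed ratings can be paired with a max-heap of the rest and rebalanced after each insert. The root of the min-heap is then the answer in O(1), and each insert costs O(log N). The class and method names are hypothetical.

```python
import heapq
from typing import List, Optional


class TopThirdTracker:
    """Track the minimum of the top N//3 ratings, where N = ratings seen so far (assumption)."""

    def __init__(self) -> None:
        self.top: List[int] = []   # min-heap: ratings currently displayed
        self.rest: List[int] = []  # max-heap via negation: ratings not displayed
        # Invariant: every rating in `top` is >= every rating in `rest`.

    def add(self, rating: int) -> None:
        if self.top and rating > self.top[0]:
            heapq.heappush(self.top, rating)
        else:
            heapq.heappush(self.rest, -rating)
        target = (len(self.top) + len(self.rest)) // 3
        # Rebalance so exactly `target` ratings are displayed.
        while len(self.top) > target:
            heapq.heappush(self.rest, -heapq.heappop(self.top))
        while len(self.top) < target:
            heapq.heappush(self.top, -heapq.heappop(self.rest))

    def min_displayed(self) -> Optional[int]:
        """Smallest displayed rating, or None while fewer than 3 ratings have arrived."""
        return self.top[0] if self.top else None
```

Feeding the stream `1, 7, 8, 3, 4, 5, 2, 9, 10` yields the minimum displayed ratings `None, None, 8, 8, 8, 7, 7, 8, 8`.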

Handling Natural Language Phrases in Search Engine

System Design

If a user enters a natural-language phrase as a query in your search engine, how would you return relevant results? The interviewer specifically wanted to know how to find corresponding or related documents when the user's input is a natural-language phrase.
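The write-up doesn't describe the solution that was given. One common baseline, shown here purely as an illustration, tokenizes the phrase, drops stop words, and ranks documents by TF-IDF over the remaining terms, so rarer words in the phrase count for more. The stop-word list and function names are hypothetical. A production engine would add stemming, phrase/proximity scoring, and an inverted index instead of scanning every document.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "is", "how", "what", "does", "do"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def rank_documents(query: str, docs: Dict[int, str]) -> List[int]:
    """Return ids of documents sharing at least one query term, best TF-IDF score first."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq: Counter = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each document counts once per term

    n_docs = len(docs)
    query_terms = tokenize(query)
    scores: Dict[int, float] = {}
    for doc_id, counts in term_counts.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if counts[term]:  # Counter returns 0 for missing terms
                tf = counts[term] / length
                idf = math.log(n_docs / doc_freq[term]) + 1.0  # rarer terms weigh more
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)
```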

Distributed Inverted Indexing with K-way Merge

System Design

Given a very large dataset, and assuming you have two or more servers where you can perform a K-way merge operation to consolidate all data onto a single server, describe your solution to build an inverted index.
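The approach described in Round 3 (sort locally on each server, then K-way merge the sorted outputs) can be sketched in a single process as follows. Each "server" emits a sorted run of `(term, doc_id)` postings. `heapq.merge` then performs the K-way merge lazily, and consecutive postings are grouped by term into posting lists. This assumes document ids are globally unique across servers and uses naive whitespace tokenization. The function names are hypothetical.

```python
import heapq
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Posting = Tuple[str, int]  # (term, doc_id)


def local_sorted_run(docs: Dict[int, str]) -> List[Posting]:
    """Per-server step: emit de-duplicated (term, doc_id) pairs for local documents, sorted."""
    pairs = {(word, doc_id) for doc_id, text in docs.items() for word in text.lower().split()}
    return sorted(pairs)


def build_inverted_index(runs: Iterable[List[Posting]]) -> Dict[str, List[int]]:
    """Merge step: K-way merge the sorted runs, then group postings by term."""
    merged = heapq.merge(*runs)  # yields pairs in global sorted order, one item at a time
    index: Dict[str, List[int]] = {}
    for term, group in groupby(merged, key=lambda posting: posting[0]):
        doc_ids: List[int] = []
        for _, doc_id in group:
            if not doc_ids or doc_ids[-1] != doc_id:  # skip duplicate postings
                doc_ids.append(doc_id)
        index[term] = doc_ids
    return index
```

Because `heapq.merge` only holds one pending item per run, the merge step's memory grows with K rather than with the total dataset size, which is what makes the approach suitable for data too large for one machine.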

Autocomplete Suggestion (Next Word Prediction)

System Design

If a user has typed 'Goldman', how would you design a system to suggest 'Sachs' as the next word?
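The write-up doesn't say which solution was proposed. A simple starting point is a bigram model: count which word follows each word in past queries (or a text corpus), then suggest the most frequent followers. The training phrases and class name below are hypothetical. A real system would typically also use longer context, a trie for prefix completion, and precomputed top-k lists.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List


class NextWordSuggester:
    """Suggest likely next words from bigram counts over past phrases."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def train(self, phrases: Iterable[str]) -> None:
        for phrase in phrases:
            words = phrase.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += 1  # how often `nxt` follows `prev`

    def suggest(self, word: str, k: int = 3) -> List[str]:
        followers = self.counts.get(word.lower(), Counter())
        return [w for w, _ in followers.most_common(k)]
```

After training on phrases such as "Goldman Sachs careers", "Goldman Sachs internship" and "Goldman rule", `suggest("Goldman")` ranks `"sachs"` first because it follows "goldman" most often.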
