Flipkart

Summary

I interviewed for a Data Scientist 2 role at Flipkart, navigating through screening, in-depth math modeling, data science system design, coding, and two hiring manager rounds, and ultimately received an offer.

Full Experience

My interview experience and offer: Flipkart Data Scientist 2 role (Grade 9)
Off-campus opportunity via referral.

a) Screening round :
i) 30-minute interview with Research Director
ii) General interview, revolved around my past work and some behavioral questions

b) Depth in Math Modelling :
i) 1 hr interview with Principal Data Scientist / Director
ii) The DMM round focused on Logistic Regression and CLIP/BLIP models, testing both theoretical rigor and practical experience. The Logistic Regression questions covered Binary Cross Entropy, Bayes' Theorem, Bernoulli/binomial distributions, L2 regularization, and the derivation of the loss function from the underlying probabilistic model. The CLIP/BLIP questions covered the architecture, the contrastive loss for image-text pairs, limitations in semantic understanding, the Q-Former's role in capturing meaning, and how a batch (e.g., 32 pairs) is handled. There were also questions on evaluation (ROC-AUC, PR-AUC, Median Rank, and model calibration) and on strategies for data imbalance, hard negatives, and mitigating catastrophic forgetting. The round requires a strong grasp of machine learning theory and of vision-language model applications.
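
For reference, the Logistic Regression topics listed above (Bernoulli distribution, Binary Cross Entropy, L2 regularization, loss derivation) connect in one short derivation. This is a standard textbook sketch, not the interviewer's exact question; the notation (x_i, y_i, w, b, lambda, N) is assumed here for illustration.

```latex
% Model: each label y_i \in \{0, 1\} is Bernoulli with success probability p_i
p_i = \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Likelihood of N independent samples
L(w, b) = \prod_{i=1}^{N} p_i^{\,y_i} (1 - p_i)^{1 - y_i}

% Averaged negative log-likelihood = Binary Cross Entropy
\mathcal{L}_{\mathrm{BCE}}(w, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]

% L2 regularization: by Bayes' Theorem this is MAP estimation with a
% zero-mean Gaussian prior on w (up to a rescaling of \lambda)
\mathcal{L}(w, b) = \mathcal{L}_{\mathrm{BCE}}(w, b) + \frac{\lambda}{2} \lVert w \rVert_2^2

% Gradient used for learning
\nabla_w \mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)\, x_i + \lambda w
```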

c) Depth in Data Science :
i) 1 hr interview with Senior Data Scientist
ii) The DDS round focused on ML system design, particularly building an end-to-end pipeline from an abstract problem. I was asked to design a solution for credit risk modeling—for example, Flipkart launching a new EMI feature where we need to decide EMI approvals (a binary classification task). The challenge involved a cold start scenario with no existing user data and no access to users' bank information. For every approach I took, I was asked follow-up questions like "why this?" and "why not that?" to justify my choices and demonstrate a clear understanding of the reasoning behind each decision.

Whether you clear these two rounds is decided by the average rating across them. If you clear them, you move on to the next round, which is the coding round.

d) Coding round :
i) 2 hr interview with Senior Data Scientist
ii) The coding round emphasized problem-solving approach and coding skills. Internet access was allowed only for referencing documentation. I was given a fraud detection problem and tasked with writing end-to-end code, covering everything from data loading to model inference.

e) Hiring Manager round (Foundational Model Team (FMT)) :
i) A 1-hour interview with the manager of the team.
ii) The questions in this round were tailored to the team's focus. Since it was a Foundational Model team, the technical discussion revolved entirely around Generative AI.
iii) The HM round focused on understanding my past work, my approach, and the reasoning behind it. It also included questions on what the team is currently working on, such as fine-tuning LLMs and VLMs and alignment methods like DPO and PPO, followed by behavioral and values-based questions.

Since confirmation from the FMT team was taking time, I had another HM round with the Cleartrip team.

f) Hiring Manager round (Cleartrip team) :
i) A 1-hour interview with the manager of the team.
ii) The HM round focused on my past work, approach, and reasoning. I was also asked about the team's current focus areas, such as dynamic pricing. I was given a problem: set discounts across multiple shoe brands, taking competitors into account, and build a model to maximize revenue.

Compensation details : https://leetcode.com/discuss/post/6740368/flipkart-data-scientist-2-bangalore-by-a-yfrk/

Interview Questions (5)

Logistic Regression & Vision-Language Models (CLIP/BLIP) Theory

Other

The DMM round focused on Logistic Regression and CLIP/BLIP models, testing both theoretical rigor and practical experience. The Logistic Regression questions covered Binary Cross Entropy, Bayes' Theorem, Bernoulli/binomial distributions, L2 regularization, and the derivation of the loss function from the underlying probabilistic model. The CLIP/BLIP questions covered the architecture, the contrastive loss for image-text pairs, limitations in semantic understanding, the Q-Former's role in capturing meaning, and how a batch (e.g., 32 pairs) is handled. There were also questions on evaluation (ROC-AUC, PR-AUC, Median Rank, and model calibration) and on strategies for data imbalance, hard negatives, and mitigating catastrophic forgetting. The round requires a strong grasp of machine learning theory and of vision-language model applications.
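
To make the "contrastive loss for image-text pairs" and the batch question concrete, here is a minimal pure-Python sketch of a CLIP-style symmetric contrastive loss. It assumes the embeddings are already L2-normalized and uses a fixed temperature of 0.07 for illustration; the function names are hypothetical and this is not the interviewer's reference solution.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    image_emb[i] and text_emb[i] are assumed to be L2-normalized
    embeddings of the i-th matching image-text pair.
    """
    n = len(image_emb)
    # logits[i][j] = similarity of image i and text j, scaled by temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in text_emb] for img in image_emb]
    # image -> text: row i should assign the highest probability to column i
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # text -> image: column j should assign the highest probability to row j
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)
```

With a batch of 32 pairs, the similarity matrix is 32 x 32: each image has 1 matching caption on the diagonal and 31 in-batch negatives, and the same holds for each caption. Hard negatives are off-diagonal entries whose similarity is close to the diagonal one.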

ML System Design for Credit Risk Modeling (Cold Start)

System Design

The DDS round focused on ML system design, particularly building an end-to-end pipeline from an abstract problem. I was asked to design a solution for credit risk modeling—for example, Flipkart launching a new EMI feature where we need to decide EMI approvals (a binary classification task). The challenge involved a cold start scenario with no existing user data and no access to users' bank information. For every approach I took, I was asked follow-up questions like "why this?" and "why not that?" to justify my choices and demonstrate a clear understanding of the reasoning behind each decision.
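
The post does not say which approach was proposed. As one common way to frame the final approve/reject step of such a binary classification task, a predicted default probability can be compared against a break-even threshold derived from expected profit. The sketch below is illustrative only; `margin` and `loss_given_default` are hypothetical business inputs, not values from the interview.

```python
def approval_threshold(margin, loss_given_default):
    """Largest default probability at which approving still breaks even.

    Expected profit of approving is (1 - p) * margin - p * loss_given_default,
    which is positive exactly when p < margin / (margin + loss_given_default).
    """
    return margin / (margin + loss_given_default)

def approve_emi(p_default, margin, loss_given_default):
    """Approve the EMI request only if expected profit is positive."""
    return p_default < approval_threshold(margin, loss_given_default)
```

In a cold start, where no repayment labels exist yet, a typical "why not that?" follow-up is how `p_default` is obtained at all; a rule like this only covers the decision once some score is available.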

End-to-End Fraud Detection System Coding

Data Structures & Algorithms

The coding round emphasized problem-solving approach and coding skills. Internet access was allowed only for referencing documentation. I was given a fraud detection problem and tasked with writing end-to-end code, covering everything from data loading to model inference.
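
The actual dataset and expected solution are not described in the post. As a rough idea of what "data loading to model inference" can look like, here is a standard-library-only sketch that uses a synthetic CSV and a class-weighted logistic regression. The column names (`amount`, `hour`, `is_fraud`), the data generator, and all hyperparameters are assumptions made for illustration.

```python
import csv
import io
import math
import random

def make_csv(n=400, fraud_rate=0.1, seed=0):
    """Generate a small synthetic transactions CSV (stand-in for a real file)."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["amount", "hour", "is_fraud"])
    for _ in range(n):
        fraud = rng.random() < fraud_rate
        amount = rng.gauss(900 if fraud else 100, 50)
        hour = rng.gauss(3 if fraud else 14, 2)
        writer.writerow([round(amount, 2), round(hour, 2), int(fraud)])
    return buf.getvalue()

def load(csv_text):
    """Data loading: parse CSV text into a feature matrix and labels."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    X = [[float(r["amount"]), float(r["hour"])] for r in rows]
    y = [int(r["is_fraud"]) for r in rows]
    return X, y

def fit_scaler(X):
    """Per-feature mean and standard deviation, fitted on training data only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [max(1e-9, (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5)
            for c, m in zip(cols, means)]
    return means, stds

def scale(X, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, epochs=300, lr=0.5, l2=1e-3):
    """Class-weighted logistic regression trained with batch gradient descent."""
    n, d = len(X), len(X[0])
    pos = sum(y)
    w_pos = n / (2 * pos)        # up-weight the rare fraud class
    w_neg = n / (2 * (n - pos))
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = (w_pos if yi else w_neg) * (p - yi)
            for j in range(d):
                gw[j] += g * xi[j]
            gb += g
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, scaler, X):
    """Inference: fraud probability for raw (unscaled) feature rows."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi in scale(X, scaler)]

# End-to-end run: load -> split -> scale -> train -> infer
X, y = load(make_csv())
split = int(0.75 * len(X))
scaler = fit_scaler(X[:split])
model = train(scale(X[:split], scaler), y[:split])
probs = predict_proba(model, scaler, X[split:])
flagged = [p >= 0.5 for p in probs]
```

In a real round, accuracy alone would be a poor metric given the class imbalance; precision, recall, and PR-AUC on the held-out split are the more natural things to report.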

Generative AI & LLM/VLM Fine-tuning Concepts

Other

The questions in this round were tailored to the team's focus. Since it was a Foundational Model team, the technical discussion revolved entirely around Generative AI. The HM round focused on understanding my past work, my approach, and the reasoning behind it. It also included questions on what the team is currently working on, such as fine-tuning LLMs and VLMs and alignment methods like DPO and PPO, followed by behavioral and values-based questions.
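
For readers revising DPO before a similar round, the per-pair DPO objective is short enough to write out. The sketch below assumes the summed log-probabilities of the chosen and rejected responses have already been computed under the policy and under a frozen reference model; the function name and the default `beta=0.1` are illustrative choices, not details from the interview.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy being trained or under the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as softplus(-margin) for numerical stability
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The contrast usually probed in interviews: DPO optimizes directly on preference pairs without a separate reward model, while PPO-based RLHF first trains a reward model and then optimizes the policy against it with a clipped policy-gradient objective.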

Dynamic Pricing Model for Maximizing Revenue

System Design

I was also asked about the team's current focus areas, such as dynamic pricing. I was given a problem: set discounts across multiple shoe brands, taking competitors into account, and build a model to maximize revenue.
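
The post does not describe the solution that was discussed. One simple way to structure the problem is to model demand per brand as a function of the price relative to the competitor's, then search over candidate discounts for the one with the highest expected revenue. Everything in the sketch below is hypothetical: the exponential demand curve, the parameter names, and the brand values.

```python
import math

def expected_demand(price, competitor_price, base_demand, sensitivity):
    """Hypothetical demand curve: demand falls as our price rises
    relative to the competitor's price."""
    return base_demand * math.exp(-sensitivity * (price / competitor_price - 1.0))

def best_discount(list_price, competitor_price, base_demand, sensitivity,
                  candidates=(0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30)):
    """Grid-search the discount that maximizes expected revenue for one brand."""
    def revenue(discount):
        price = list_price * (1.0 - discount)
        return price * expected_demand(price, competitor_price,
                                       base_demand, sensitivity)
    return max(candidates, key=revenue)

# Two made-up brands that differ only in price sensitivity
brands = {
    "brand_a": dict(list_price=100.0, competitor_price=90.0,
                    base_demand=500, sensitivity=1.0),
    "brand_b": dict(list_price=100.0, competitor_price=90.0,
                    base_demand=500, sensitivity=0.5),
}
discounts = {name: best_discount(**params) for name, params in brands.items()}
```

Under this toy model the more price-sensitive brand gets the deeper discount. In practice the sensitivity would have to be estimated from historical sales or price experiments, and cross-brand substitution would couple the brands rather than letting each be optimized separately.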
