Meesho Data Scientist-1 Interview Experience | Bangalore

Summary

I participated in the Meesho Data Challenge 2024, won, and was invited to interview for a Data Scientist-1 role in Bangalore. The interview process consisted of four rounds covering DSA, SQL, Statistics, ML/DL, system design, and project discussions. I received a confirmation from HR on the same day as my final interview round.

Full Experience

Participated in Meesho Data Challenge 2024 (won!) and hence got the chance to interview for open roles at Meesho.

HR contacted me on 10th April 2025 and shared the interview details:

Round 1: DSA, SQL, Statistics & Mathematics, Basic ML/DL/GenAI Knowledge
Round 2: ML Depth
Round 3: ML Breadth (case study based)
Round 4: Overall Discussion

Round 1 (17/04/2025)

What do you understand by p-value?
If we fail to reject the null hypothesis, do we then accept the alternate hypothesis?
We want to sample values between [0, 1] uniformly, but the samples should remain within the unit circle. How do you do it?
Explain bias-variance tradeoff
While training on a large dataset, the training loss keeps decreasing but the validation loss is stagnant. How will you address it?
Coding
- Given a list of citizens in a country, with each citizen's birth date and death date, find the maximum population at any point in history
- Find the max number of libraries that can be installed from 'Libraries_needed'. You are given three lists:
  Libraries_needed = ['pandas', 'numpy']
  Libraries_wd_no_preq = ['A', 'B', 'C']
  Libraries_wd_preq = [['A', 'B'], ['B']]
  Here the elements at index 0 of Libraries_wd_preq are the prerequisites for 'pandas', and so on.
Explain ReLU activation function
What is the drawback of using ReLU?
Explain positional encoding in transformers
Why did the authors use sine and cosine? Why can’t we use binary values?

Round 2

First (21/04/2025)
- Explain layer and batch normalization
- Discussed potential alternate solution of Meesho Hackathon project
- Convex and non-convex loss functions
- CNNs (projects are CV based)
- Discussion on pooling techniques
- Segmentation basics and the pooling techniques used in it
- Classification metrics: class imbalance, comparison between ROC-AUC and PR-AUC
- If the loss is NaN, what are the possible reasons?
Second (25/04/2025): This was an additional interview with another panel.
- Bias-variance tradeoff
- Regularization techniques
- Why does L1 regularization create sparsity?
- Optimal batch size and why?
- Why do we move in the direction of the negative gradient?
- Maximum Likelihood Estimation and Maximum a Posteriori Estimation
- Cross Entropy Loss reasoning and relation to KL divergence
- If all weights are initialized to the same value, what would happen?
- Dropout and what happens at training time
- Multi-GPU training, parallelism, PEFT, QLoRA (mentioned in my resume)

Round 3 (30/04/2025)

In-depth Q&A on my preferred project (Tip: if you have a project where you built a model from scratch, discuss that one)
How does ViT work?
We want to build a visual search system for Meesho. How will you approach it?
- Focus on model building
- Which model will you select and why?
- How do you train it?
- Evaluation and Business Metrics

Later that day I received confirmation from HR.

Interview Questions (30)

1.

What is p-value?

Other

What do you understand by p-value?

2.

Null vs Alternate Hypothesis

Other

If we fail to reject the null hypothesis, do we then accept the alternate hypothesis?

3.

Sample Uniformly within Unit Circle

Data Structures & Algorithms

We want to sample values between [0, 1] uniformly, but the samples should remain within the unit circle. How do you do it?
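
One common way to approach this is rejection sampling. The sketch below is only an illustration, not necessarily the answer the interviewer was looking for. It assumes the question means drawing a point (x, y) with both coordinates uniform in [0, 1] and keeping it only if it falls inside the unit circle; the function name sample_in_unit_circle is made up for this example.

```python
import random


def sample_in_unit_circle(rng=random):
    """Draw (x, y) uniformly from the part of [0, 1] x [0, 1] inside the unit circle."""
    while True:
        x = rng.random()  # uniform in [0, 1)
        y = rng.random()
        if x * x + y * y <= 1.0:  # reject points that fall outside the circle
            return x, y


# Usage: draw a few points with a seeded generator for reproducibility.
rng = random.Random(42)
points = [sample_in_unit_circle(rng) for _ in range(1000)]
```

Because the accepted region is a quarter disc of area pi/4 inside a unit square, roughly 78.5% of the candidate points are kept, so the loop terminates quickly in practice.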

4.

Explain Bias-Variance Tradeoff

Other

Explain bias-variance tradeoff

5.

Address Stagnant Validation Loss

Other

While training on a large dataset, the training loss keeps decreasing but the validation loss is stagnant. How will you address it?

6.

Max Population Throughout History

Data Structures & Algorithms

Given a list of citizens in a country, with each citizen's birth date and death date, find the maximum population at any point in history
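
A typical way to solve this kind of problem is a sweep over sorted birth and death events. The sketch below is one possible solution under stated assumptions: dates are given as integer years, and a person is counted as alive from the birth year up to (but not including) the death year. The function name max_population and the tuple input format are chosen for illustration.

```python
def max_population(people):
    """people: list of (birth_year, death_year). Returns (max_alive, first_year_reached)."""
    events = []
    for birth, death in people:
        events.append((birth, 1))   # population goes up at birth
        events.append((death, -1))  # population goes down at death
    # Tuples sort by year first; within the same year, -1 (death) sorts before +1 (birth).
    events.sort()

    best = current = 0
    best_year = None
    for year, delta in events:
        current += delta
        if current > best:
            best, best_year = current, year
    return best, best_year


# Usage
result = max_population([(1900, 1950), (1920, 1980), (1940, 1990), (1950, 2000)])
```

Sorting dominates the cost, so this runs in O(n log n) time for n citizens.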

7.

Max Installable Libraries with Prerequisites

Data Structures & Algorithms

Find the max number of libraries that can be installed from 'Libraries_needed'.
You are given three lists:
        Libraries_needed = ['pandas', 'numpy']
        Libraries_wd_no_preq = ['A', 'B', 'C']
        Libraries_wd_preq = [['A', 'B'], ['B']]

        Here the elements at index 0 of Libraries_wd_preq are the prerequisites for 'pandas', and so on.
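
The problem statement as noted here leaves some details open, so the sketch below is only one possible reading. It assumes that libraries in Libraries_wd_no_preq can always be installed, and that a needed library can be installed once all of its prerequisites are installed (a prerequisite may itself be another needed library). The function name max_installable is hypothetical.

```python
def max_installable(needed, no_prereq, prereqs):
    """Count the libraries in `needed` whose prerequisites can all be installed."""
    installed = set(no_prereq)             # these install unconditionally
    requires = dict(zip(needed, prereqs))  # library -> list of its prerequisites

    progress = True
    while progress:                        # repeat until no new library can be added
        progress = False
        for lib, deps in requires.items():
            if lib not in installed and all(d in installed for d in deps):
                installed.add(lib)
                progress = True
    return sum(1 for lib in needed if lib in installed)


# Usage with the lists from the question
count = max_installable(["pandas", "numpy"], ["A", "B", "C"], [["A", "B"], ["B"]])
```

Repeating the pass until nothing changes handles prerequisite chains, and libraries stuck in a dependency cycle are simply never installed.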

8.

Explain ReLU Activation Function

Other

Explain ReLU activation function

9.

Drawback of ReLU

Other

What is the drawback of using ReLU?
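
A commonly cited drawback is the "dying ReLU" problem: the gradient is zero for negative inputs, so a unit whose pre-activation stays negative stops receiving updates. The minimal sketch below illustrates this and shows Leaky ReLU as one frequently used alternative; the slope alpha=0.01 is an illustrative default, not a value from the interview.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Zero gradient for non-positive inputs: no learning signal flows back.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    # A small non-zero slope keeps some gradient for negative inputs.
    return 1.0 if x > 0 else alpha


# Usage: compare gradients at a negative pre-activation.
grads_at_minus_two = (relu_grad(-2.0), leaky_relu_grad(-2.0))
```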

10.

Explain Positional Encoding in Transformers

Other

Explain positional encoding in transformers
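
For reference, the sinusoidal scheme from the original Transformer paper uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The sketch below computes that table in plain Python; the function name positional_encoding is chosen for illustration, and real implementations usually vectorize this with a tensor library.

```python
import math


def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Usage
pe = positional_encoding(seq_len=4, d_model=8)
```

Every value stays in [-1, 1] regardless of position, and each pair of dimensions oscillates at a different wavelength.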

11.

Sine/Cosine vs Binary for Positional Encoding

Other

Why did the authors use sine and cosine? Why can’t we use binary values?

12.

Explain Layer and Batch Normalization

Other

Explain layer and batch normalization
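
The core difference is the axis over which statistics are computed: batch normalization normalizes each feature across the samples in a batch, while layer normalization normalizes each sample across its own features. The sketch below shows only that difference on a list-of-lists batch; it deliberately leaves out the learnable scale and shift parameters and the running statistics that batch normalization uses at inference time.

```python
def _normalize(values, eps=1e-5):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) across the samples in the batch."""
    columns = [_normalize(list(col), eps) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch, eps=1e-5):
    """Normalize each sample (row) across its own features."""
    return [_normalize(row, eps) for row in batch]


# Usage: 2 samples, 3 features
batch = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
bn = batch_norm(batch)
ln = layer_norm(batch)
```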

13.

Convex and Non-Convex Loss Functions

Other

Convex and non-convex loss functions

14.

Discuss CNNs

Other

CNNs (projects are CV based)

15.

Discuss Pooling Techniques

Other

Discussion on pooling techniques

16.

Segmentation Basics and Pooling Techniques

Other

Segmentation basics and the pooling techniques used in it

17.

Classification Metrics (ROC-AUC vs PR-AUC)

Other

Classification metrics: class imbalance, comparison between ROC-AUC and PR-AUC

18.

Possible Reasons for NaN Loss

Other

If the loss is NaN, what are the possible reasons?

19.

Bias-Variance Tradeoff

Other

Bias-variance tradeoff

20.

Regularization Techniques

Other

Regularization techniques

21.

Why L1 Regularization Creates Sparsity

Other

Why does L1 regularization create sparsity?
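
One way to see the effect is a one-dimensional problem with a closed-form answer. Minimizing 0.5*(w - z)^2 + lam*|w| gives the soft-thresholding rule, which returns exactly zero whenever |z| <= lam, whereas the L2 version 0.5*(w - z)^2 + 0.5*lam*w^2 only scales z down. The sketch below is an illustration of that contrast; the function names are made up for this example.

```python
def l1_solution(z, lam):
    """argmin over w of 0.5*(w - z)**2 + lam*abs(w): soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small inputs are set exactly to zero


def l2_solution(z, lam):
    """argmin over w of 0.5*(w - z)**2 + 0.5*lam*w**2: uniform shrinkage."""
    return z / (1.0 + lam)


# Usage: a small unregularized weight of 0.3 with penalty strength 0.5
small_weight = (l1_solution(0.3, 0.5), l2_solution(0.3, 0.5))
```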

22.

Optimal Batch Size and Reasoning

Other

Optimal batch size and why?

23.

Why Move in Negative Gradient Direction

Other

Why do we move in the direction of the negative gradient?
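
The gradient points in the direction of steepest local increase, so a small step against it lowers the loss. The sketch below checks this numerically on a toy function f(x, y) = x^2 + 3y^2, which is an assumption chosen purely for illustration, along with the learning rate of 0.1.

```python
def f(x, y):
    return x ** 2 + 3 * y ** 2


def grad_f(x, y):
    return (2 * x, 6 * y)


def step(x, y, lr, sign):
    """Move along the gradient (sign=+1) or against it (sign=-1)."""
    gx, gy = grad_f(x, y)
    return x + sign * lr * gx, y + sign * lr * gy


# Usage: start at (1, 1) and compare both directions.
start = f(1.0, 1.0)
downhill = f(*step(1.0, 1.0, 0.1, -1))
uphill = f(*step(1.0, 1.0, 0.1, +1))
```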

24.

MLE vs MAP Estimation

Other

Maximum Likelihood Estimation and Maximum a Posteriori Estimation
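
A small worked example can make the contrast concrete. The sketch below assumes a coin-flip setting with a Beta(a, b) prior on the heads probability, which is an illustrative choice and not something from the interview. The MLE is the observed frequency, while the MAP estimate is the mode of the Beta posterior.

```python
def mle_heads_probability(heads, flips):
    """MLE of the heads probability: the observed frequency."""
    return heads / flips


def map_heads_probability(heads, flips, a, b):
    """MAP estimate under a Beta(a, b) prior: mode of Beta(heads + a, tails + b).

    Assumes the posterior mode is interior (for example a >= 1, b >= 1 with data).
    """
    return (heads + a - 1) / (flips + a + b - 2)


# Usage: 7 heads in 10 flips
mle = mle_heads_probability(7, 10)
map_with_prior = map_heads_probability(7, 10, a=2, b=2)
map_flat_prior = map_heads_probability(7, 10, a=1, b=1)
```

With the flat Beta(1, 1) prior the MAP estimate coincides with the MLE, and the Beta(2, 2) prior pulls the estimate toward 0.5.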

25.

Cross Entropy Loss and KL Divergence

Other

Cross Entropy Loss reasoning and relation to KL divergence
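
The standard identity here is H(p, q) = H(p) + KL(p || q), so minimizing cross entropy over q is the same as minimizing the KL divergence when the target distribution p is fixed. The sketch below verifies the identity numerically on two small discrete distributions picked arbitrarily for illustration.

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Usage
p = [0.7, 0.2, 0.1]  # target distribution
q = [0.5, 0.3, 0.2]  # model prediction
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
```

For a one-hot target the entropy term is zero, so the cross entropy and the KL divergence are equal.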

26.

Consequences of Same Weight Initialization

Other

If all weights are initialized to the same value, what would happen?

27.

Dropout During Training

Other

Dropout and what happens at training time
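
A minimal sketch of the widely used "inverted dropout" formulation: during training each activation is zeroed with probability p_drop and the survivors are scaled by 1 / (1 - p_drop), so the expected activation is unchanged and nothing needs rescaling at inference. The function signature is made up for this example and assumes p_drop < 1.

```python
import random


def dropout(activations, p_drop, training, rng=random):
    """Inverted dropout on a list of activations."""
    if not training or p_drop == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - p_drop
    # Keep each unit with probability `keep`, and rescale the kept ones.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Usage
rng = random.Random(0)
train_out = dropout([1.0] * 10000, p_drop=0.5, training=True, rng=rng)
eval_out = dropout([1.0, 2.0, 3.0], p_drop=0.5, training=False)
```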

28.

Multi-GPU Training, Parallelism, PEFT, QLoRA

Other

Multi-GPU training, parallelism, PEFT, QLoRA (mentioned in my resume)

29.

How does Vision Transformer (ViT) work?

Other

How does ViT work?

30.

Design a Visual Search System for Meesho

System Design

We want to build a visual search system for Meesho. How will you approach it?

- Focus on model building
- Which model will you select and why?
- How do you train it?
- Evaluation and Business Metrics
