Meesho Data Scientist-1 Interview Experience | Bangalore
Summary
I participated in the Meesho Data Challenge 2024, won, and was invited to interview for a Data Scientist-1 role in Bangalore. The interview process consisted of four rounds covering DSA, SQL, Statistics, ML/DL, system design, and project discussions. I received a confirmation from HR on the same day as my final interview round.
Full Experience
Participated in Meesho Data Challenge 2024 (won!) and hence got the chance to interview for open roles at Meesho.
HR contacted me on 10th April 2025 and shared the interview details:
- Round 1: DSA, SQL, Statistics & Mathematics, Basic ML/DL/GenAI Knowledge
- Round 2: ML Depth
- Round 3: ML Breadth (case study based)
- Round 4: Overall Discussion
Round 1 (17/04/2025)
- What do you understand by p-value?
- If we fail to reject the null hypothesis, do we then accept the alternative hypothesis?
- We want to sample values in [0, 1] uniformly, but the samples must remain within the unit circle. How would you do it?
- Explain bias-variance tradeoff
- When training on a large dataset, the training loss keeps decreasing but the validation loss is stagnant. How would you address it?
- Coding
  - Given a list of citizens of a country, with each citizen's birth date and death date, find the maximum population at any point in history.
  - Find the maximum number of libraries from 'Libraries_needed' that can be installed. You are given three lists:
    Libraries_needed = ['pandas', 'numpy']
    Libraries_wd_no_preq = ['A', 'B', 'C']
    Libraries_wd_preq = [['A', 'B'], ['B']]
    Here the elements at index 0 of Libraries_wd_preq are the prerequisites for 'pandas', and so on.
- Explain ReLU activation function
- What is the drawback of using ReLU?
- Explain positional encoding in transformers
- Why did the authors use sine and cosine? Why can't we use binary values?
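For the maximum population coding question above, one common approach is an event sweep: turn every birth into a +1 event and every death into a -1 event, sort the events by date, and track the running total. The sketch below is only an illustration, not the solution I wrote in the interview. It assumes each citizen is a (birth, death) pair of comparable values such as years, and that a person stops counting at their death date; the function name `max_population` and the input format are my own assumptions.

```python
def max_population(citizens):
    """Return the largest number of citizens alive at the same time.

    Assumes `citizens` is an iterable of (birth, death) pairs with
    birth <= death, and that a person is counted from birth up to,
    but not including, death (an assumption; change the tie-break if
    the death date should still count).
    """
    events = []
    for birth, death in citizens:
        events.append((birth, 1))    # population grows at birth
        events.append((death, -1))   # population shrinks at death

    # Sort by date; at equal dates -1 sorts before +1, so deaths are
    # applied before births that fall on the same date.
    events.sort()

    alive = 0
    best = 0
    for _, delta in events:
        alive += delta
        best = max(best, alive)
    return best
```

Sorting dominates the cost, so this runs in O(n log n) time for n citizens.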
Round 2
First (21/04/2025)
- Explain layer and batch normalization
- Discussed a potential alternative solution for the Meesho Hackathon project
- Convex and non-convex loss functions
- CNNs (my projects are CV-based)
- Discussion on pooling techniques
- Segmentation basics and the pooling techniques used there
- Classification metrics: class imbalance, comparison between ROC-AUC and PR-AUC
- If the loss is NaN, what are the possible reasons?
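For the layer versus batch normalization question above, the key difference is the axis over which the mean and variance are computed. A minimal sketch in plain Python, assuming the input is a list of samples with equal-length feature lists; the helper names are hypothetical, and the learnable scale and shift parameters and the running statistics used at inference time are left out on purpose.

```python
import math

def _normalize(values, eps=1e-5):
    # Subtract the mean and divide by the standard deviation of `values`.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(batch):
    """Normalize each feature (column) across the samples in the batch."""
    columns = [_normalize(list(col)) for col in zip(*batch)]
    # Transpose back so the result has one row per sample again.
    return [list(row) for row in zip(*columns)]

def layer_norm(batch):
    """Normalize each sample (row) across its own features."""
    return [_normalize(row) for row in batch]
```

Because `batch_norm` depends on the other samples in the batch, its statistics become noisy for very small batches, whereas `layer_norm` gives the same output for a sample regardless of batch size.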
Second (25/04/2025): This was an additional interview with another panel.
- Bias-variance tradeoff
- Regularization techniques
- Why does L1 regularization create sparsity?
- Optimal batch size and why?
- Why do we move in the direction of the negative gradient?
- Maximum Likelihood Estimation and Maximum a Posteriori Estimation
- Cross Entropy Loss reasoning and relation to KL divergence
- What happens if all weights are initialized to the same value?
- Dropout and what happens at training time
- Multi GPU training, parallelism, PEFT, QLoRA (mentioned in my resume)
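For the cross-entropy and KL divergence question above, the relation is H(p, q) = H(p) + KL(p || q). Since H(p) depends only on the true labels, minimizing cross-entropy with respect to the model is equivalent to minimizing the KL divergence. The sketch below checks this numerically for discrete distributions; the function names are my own, and it assumes q is strictly positive wherever p is.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); assumes q_i > 0 where p_i > 0.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); same assumption on q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For one-hot labels H(p) is zero, so the cross-entropy loss and the KL divergence coincide exactly.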
Round 3 (30/04/2025)
- In-depth Q&A on my preferred project (Tip: if you have a project where you built a model from scratch, discuss that one)
- How does ViT work?
- We want to build a visual search system for Meesho. How will you approach it?
  - Focus on model building
  - Which model will you select and why?
  - How do you train it?
  - Evaluation and Business Metrics
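For the visual search question above, a common framing is embedding retrieval: encode catalog images and the query image into vectors, rank catalog items by similarity, and evaluate with an offline retrieval metric. The sketch below is a toy illustration of the ranking and evaluation steps only, not the answer I gave. It assumes the embeddings already exist (for example from an image encoder such as a ViT), and every name in it (`top_k`, `hit_rate_at_k`, the dictionary layout) is hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, catalog, k):
    """Return the ids of the k catalog items most similar to the query.

    `catalog` maps item id -> embedding vector (assumed non-zero).
    """
    ranked = sorted(
        catalog,
        key=lambda item_id: cosine_similarity(query, catalog[item_id]),
        reverse=True,
    )
    return ranked[:k]

def hit_rate_at_k(queries, catalog, relevant, k):
    """Fraction of queries with at least one relevant item in the top k.

    `queries` maps query id -> embedding; `relevant` maps query id ->
    set of catalog ids considered correct matches for that query.
    """
    hits = 0
    for query_id, embedding in queries.items():
        retrieved = top_k(embedding, catalog, k)
        if relevant[query_id] & set(retrieved):
            hits += 1
    return hits / len(queries)
```

The linear scan in `top_k` is fine for a toy example; at catalog scale an approximate nearest neighbour index would normally take its place, and offline metrics like this one would be tracked next to business metrics such as click-through or conversion.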
Later that day I received confirmation from HR.