Microsoft Senior Applied Scientist (L63) Interview Experience

April 6, 2026

Summary

I completed a multi‑round interview process for a Senior Applied Scientist role at Microsoft, passing the phone screen and four onsite rounds covering ML fundamentals, coding, and problem solving, and ultimately received an offer.

Full Experience

Background: Senior Data Scientist with ~3 YOE

Interview Process

  1. Phone Screen (60 min)
     • Format: Coding + Problem Solving
     • Problem Solving: Behavioral scenarios and use cases
     • Coding: Min Stack + follow-ups
     • Outcome: Passed to onsite

  2. Onsite Loop (4 rounds, 60 min each)
     Note: For two of the rounds, the recruiter's prep material did not match the actual format.

    1. Round 1: ML Fundamentals + ML Coding
       • Actual Format: As described
       • ML Coding: Implement K-means from scratch
       • Follow-up: How would you vectorize this implementation? (I struggled a bit with matrix broadcasting)

    2. Round 2: ML Problem Solving + ML System Design
       • Actual Format: ML fundamentals + coding (no system design)
       • ML Questions (that I remember):

      • Reinforcement learning: Thompson sampling vs epsilon-greedy, explore vs exploit tradeoffs
      • Calibration: Platt scaling
      • Imbalanced data: Downsampling majority class

      Coding: Find the maximum number of points on a line (given a 2D array of points). I spent time handling floating-point precision loss but arrived at the optimized solution.

    3. Round 3: Data Analysis + Applied Sciences
       • Actual Format: ML questions + coding
       • ML Questions:

      • Offline metrics higher than online - why and how to address?
      • Data drift: Covariate shift vs label drift
      • Statistical tests for drift detection
      • Cold start problem for new ads
      • Explore/exploit tradeoffs
      • BERT vs GPT architecture and differences
      • Off-policy learning: "You have logged data from a model trained on an old policy, how would you fit a new model to update the policy?" (Found this confusing)

      Coding: Implement self-attention and masked self-attention. I got the mask syntax slightly wrong, but the code was otherwise correct and optimal.

    4. Round 4: Problem Solving + Coding (HackerRank)
       • Format: As described
       • Coding: Merge intervals
       • ML Fundamentals: Bias-variance tradeoff, bagging, boosting, calibration, drift
       • Behavioral: Standard behavioral questions (don't remember specifics)
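For reference, the Min Stack problem from the phone screen can be sketched like this (a minimal version; the actual follow-ups were not specified):

```python
class MinStack:
    """Stack supporting push/pop/top/get_min, all O(1), by keeping a
    parallel stack of running minimums."""

    def __init__(self):
        self._stack = []
        self._mins = []

    def push(self, x):
        self._stack.append(x)
        # Track the minimum of everything currently on the stack.
        self._mins.append(x if not self._mins else min(x, self._mins[-1]))

    def pop(self):
        self._mins.pop()
        return self._stack.pop()

    def top(self):
        return self._stack[-1]

    def get_min(self):
        return self._mins[-1]
```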

Key Takeaways

  • Prepare for coding in every round
  • ML fundamentals are crucial - specific topics depend on the team and role, but prepare for those thoroughly
  • Coding spans theory to implementation - be ready for everything from LeetCode to implementing ML algorithms from scratch

Outcome

Offer

Interview Questions (13)

1.

Implement K-means from scratch

Data Structures & Algorithms

Implement the K‑means clustering algorithm from scratch. After the implementation, discuss how you would vectorize the algorithm to improve performance, especially handling matrix broadcasting.
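A minimal NumPy sketch of what a vectorized answer could look like, with the distance computation done via broadcasting rather than nested loops (function and parameter names are my own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means; the distance matrix is computed with broadcasting."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # (n, 1, d) - (1, k, d) -> (n, k, d); summing over d gives an
        # (n, k) matrix of squared distances -- no Python loop over points.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The broadcasting trick (`X[:, None, :] - centers[None, :, :]`) is exactly the kind of follow-up the round probed.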

2.

Find maximum number of collinear points

Data Structures & Algorithms

Given a 2‑D array of points, find the maximum number of points that lie on a single straight line. The solution should handle floating‑point precision issues efficiently.
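One standard way to sidestep the floating-point issue is to bucket slopes as gcd-reduced integer pairs instead of float ratios; a sketch:

```python
from math import gcd
from collections import defaultdict

def max_points_on_line(points):
    """O(n^2): for each anchor point, bucket the other points by their
    gcd-reduced slope (dx, dy). Integer keys avoid float precision loss."""
    if len(points) <= 2:
        return len(points)
    best = 0
    for i, (x1, y1) in enumerate(points):
        slopes = defaultdict(int)
        dup, local = 0, 0
        for x2, y2 in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            if dx == 0 and dy == 0:
                dup += 1          # duplicate of the anchor point
                continue
            g = gcd(dx, dy)
            dx, dy = dx // g, dy // g
            # Normalize sign so (1, -2) and (-1, 2) hash to the same key.
            if dx < 0 or (dx == 0 and dy < 0):
                dx, dy = -dx, -dy
            slopes[(dx, dy)] += 1
            local = max(local, slopes[(dx, dy)])
        best = max(best, local + dup + 1)
    return best
```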

3.

Thompson sampling vs. epsilon‑greedy

Other

Explain the differences between Thompson sampling and epsilon‑greedy in reinforcement learning, and discuss the explore vs. exploit trade‑off.
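The contrast is easiest to see side by side on a Bernoulli bandit: epsilon-greedy explores uniformly at random with a fixed probability, while Thompson sampling explores by sampling plausible success rates from a Beta posterior. A small sketch (parameter names and setup are mine, not from the interview):

```python
import numpy as np

def run_bandit(true_probs, n_steps=5000, eps=0.1, strategy="thompson", seed=0):
    """Bernoulli bandit comparing two explore/exploit strategies."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    wins, pulls = np.zeros(k), np.zeros(k)
    total_reward = 0
    for _ in range(n_steps):
        if strategy == "thompson":
            # Sample a plausible success rate per arm from Beta(wins+1,
            # losses+1); exploration falls off as the posterior sharpens.
            samples = rng.beta(wins + 1, pulls - wins + 1)
            arm = int(samples.argmax())
        else:  # epsilon-greedy: explore uniformly with fixed probability eps
            if rng.random() < eps or pulls.sum() == 0:
                arm = int(rng.integers(k))
            else:
                arm = int((wins / np.maximum(pulls, 1)).argmax())
        reward = rng.random() < true_probs[arm]
        wins[arm] += reward
        pulls[arm] += 1
        total_reward += reward
    return total_reward
```

Note the key difference: epsilon-greedy keeps wasting a fixed `eps` fraction of pulls forever, while Thompson sampling's exploration decays naturally as uncertainty shrinks.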

4.

Calibration with Platt scaling

Other

Describe how Platt scaling works for calibrating the probabilities of a binary classifier.
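Platt scaling fits a sigmoid `p = sigmoid(a*s + b)` on the classifier's raw scores against held-out labels. A minimal gradient-descent sketch (in practice you would fit this with a logistic regression library on a validation set):

```python
import numpy as np

def platt_scale(scores, labels, n_iters=2000, lr=0.1):
    """Fit sigmoid(a*s + b) mapping raw scores to calibrated probabilities
    by minimizing log loss with gradient descent on (a, b)."""
    a, b = 1.0, 0.0
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = p - y                      # dLoss/dlogit for log loss
        a -= lr * (grad * s).mean()
        b -= lr * grad.mean()
    return lambda x: 1.0 / (1.0 + np.exp(-(a * np.asarray(x, float) + b)))
```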

5.

Handling imbalanced data via downsampling

Other

How would you handle an imbalanced dataset by downsampling the majority class?
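The mechanics are simple; a NumPy sketch that subsamples every class down to the minority count (helper name is mine):

```python
import numpy as np

def downsample_majority(X, y, seed=0):
    """Randomly subsample each class down to the minority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]
```

Worth mentioning the trade-off in the interview: downsampling discards majority-class information, so alternatives like class weights or upsampling are often compared against it.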

6.

Offline metrics higher than online metrics

Other

Why might offline evaluation metrics be higher than online (A/B test) metrics, and how would you address this discrepancy?

7.

Data drift: covariate shift vs. label drift

Other

Differentiate between covariate shift and label drift as forms of data drift, and explain how each can be detected.

8.

Statistical tests for drift detection

Other

What statistical tests can be used to detect data drift in a production ML system?
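A common answer is the two-sample Kolmogorov-Smirnov test per feature (training window vs. production window). The statistic itself is just the maximum gap between empirical CDFs; a self-contained sketch (in practice `scipy.stats.ks_2samp` gives the statistic plus a p-value):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between empirical CDFs.
    A large value suggests the two samples (e.g. training vs. production
    values of a feature) come from different distributions."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```

Other tests worth naming: chi-squared for categorical features, and population stability index (PSI) as a monitoring heuristic.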

9.

Cold start problem for new ads

Other

Explain the cold‑start problem when introducing new advertisements into a recommendation system and possible mitigation strategies.

10.

BERT vs. GPT architectures

Other

Compare the architectures of BERT and GPT models, highlighting key differences in design and typical use‑cases.

11.

Off‑policy learning from logged data

Other

Given logged data generated by an old policy, how would you train a new model to update the policy (off‑policy learning)?
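One standard framing (not necessarily the one the interviewer wanted) is inverse propensity scoring: reweight each logged reward by the ratio of the new policy's action probability to the logging policy's, so the logged data behaves as if it were collected under the new policy. A sketch:

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs, clip=10.0):
    """Inverse propensity scoring estimate of a new policy's value from
    data logged under an old policy. Each reward is weighted by
    pi_new(a|x) / pi_old(a|x); clipping trades variance for bias."""
    w = np.minimum(target_probs / logging_probs, clip)
    return float((w * rewards).mean())
```

This estimator is unbiased (before clipping) whenever the logging policy has nonzero probability on every action the new policy might take; doubly robust estimators refine it with a reward model.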

12.

Implement self‑attention and masked self‑attention

Data Structures & Algorithms

Implement the self‑attention mechanism and its masked variant as used in transformer models.
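A minimal single-head NumPy sketch, with the causal mask applied as additive negative-infinity logits before the softmax (the part I fumbled in the round; weight names `Wq`/`Wk`/`Wv` are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv, mask=False):
    """Scaled dot-product self-attention over a (n, d) sequence X.
    If mask=True, apply a causal mask so position i cannot attend to j > i."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n, n) attention logits
    if mask:
        n = scores.shape[0]
        # Keep the lower triangle; push future positions to ~-inf.
        scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```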

13.

Merge intervals

Data Structures & Algorithms

Given a collection of intervals, merge all overlapping intervals and return the resulting list of non‑overlapping intervals.
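The standard sort-and-sweep solution, for reference:

```python
def merge_intervals(intervals):
    """Sort by start, then sweep: extend the current interval while the
    next one overlaps it, otherwise start a new interval. O(n log n)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # overlap: extend
        else:
            merged.append([start, end])
    return merged
```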
