Summary
I interviewed at Avaamo in December 2024 for a Senior Machine Learning Engineer role, with three rounds focused on RAG, BERT, LLMs, and vector databases. The first round was the most challenging, testing foundational knowledge in transformer-based models.
Full Experience
I interviewed at Avaamo in December 2024, and these are the questions the interviews revolved around. I have 3.5 years of experience as a data scientist, and the role was Senior Machine Learning Engineer.
Question List for the 3 rounds
- Explain the basics of RAG architecture and its components.
- What are different parsing and chunking strategies?
- How does chunking impact the quality of retrieval in RAG?
- What is BERT pretraining using MLM and NSP?
- How do bi-encoders and cross-encoders differ in architecture and use cases?
- What are the pros and cons of bi-encoders vs cross-encoders?
- How do you fine-tune embedding models? What loss functions are used (e.g., triplet loss)?
- What are the basics of LLMs and their fine-tuning approaches (PEFT, LoRA, instruction tuning, adapters)?
- What are decoding strategies in LLMs (temperature, top-k, top-p, beam search)?
- What are some techniques for evaluating RAG systems?
- How would you optimize latency in a RAG pipeline?
- What are some ANN (Approximate Nearest Neighbor) algorithms used in vector databases?
- What is the difference between using pretrained models vs fine-tuned models in RAG?
- What are some vector database fundamentals and retrieval configurations?
- What is semantic caching and how is it useful?
- How do encoders function in LLM generation tasks?
- What prompting techniques are used in real-world applications?
- How would you optimize or improve the performance of a GenAI classification system?
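To make one of the topics above concrete: the decoding-strategies question (temperature, top-k, top-p) can be sketched as a single sampling function. This is my own minimal illustration, not something from the interview; it assumes NumPy and raw per-token logits, and the function name `sample_next_token` is hypothetical.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample the next token id from raw logits using common decoding knobs."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float) / temperature  # temperature scaling

    if top_k is not None:
        # Keep only the k highest-logit tokens; mask the rest to -inf.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits >= kth, logits, -np.inf)

    # Softmax (numerically stable); masked tokens get probability 0.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_p is not None:
        # Nucleus sampling: keep the smallest set of tokens whose
        # cumulative probability reaches top_p, then renormalize.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))
```

For example, `top_k=1` reduces to greedy decoding (always the argmax), while a low `top_p` with one dominant logit collapses the nucleus to that single token.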
The rounds were similar, with the first being the most challenging: it tested the basics of, and hands-on experience with, transformer-based models. Their work centers on building a framework for chatbot development, and they use their own proprietary frameworks to solve problems, so fundamentals are essential.
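Another topic from the list, semantic caching, is easy to sketch: instead of matching a query string exactly, you reuse a stored answer whenever a new query's embedding is close enough to a cached one. This is my own toy illustration using cosine similarity over NumPy vectors; the class name, threshold, and API are assumptions, not anything Avaamo described.

```python
import numpy as np

class SemanticCache:
    """Toy semantic cache: return a stored answer when a new query's
    embedding has cosine similarity >= threshold with a cached query."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.embeddings = []  # unit vectors for cached queries
        self.answers = []

    @staticmethod
    def _unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    def put(self, embedding, answer):
        self.embeddings.append(self._unit(embedding))
        self.answers.append(answer)

    def get(self, embedding):
        if not self.embeddings:
            return None  # cache miss: nothing stored yet
        q = self._unit(embedding)
        # Dot product of unit vectors == cosine similarity.
        sims = np.stack(self.embeddings) @ q
        best = int(np.argmax(sims))
        return self.answers[best] if sims[best] >= self.threshold else None
```

A near-duplicate query (a slightly rotated embedding) hits the cache and skips the expensive retrieval-plus-generation path, which is exactly why it comes up in RAG latency discussions.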