Kotak - Data Engineer
Summary
I interviewed for a Data Engineer position at Kotak and faced a Bar Raiser round which included challenging SQL, PySpark, and data modeling design problems.
Full Experience
I recently had my Bar Raiser round interview for the Data Engineer role at Kotak. The interview focused on a variety of data-related challenges, including complex SQL queries, PySpark scripting, and data model design for real-world scenarios. I was presented with several problems covering different aspects of data engineering, testing my ability to handle data transformations, aggregations, and schema design. The round was quite intensive, and I had to demonstrate my problem-solving skills across various data platforms and concepts.
Interview Questions (4)
Given a table EMPLOYEE (which should be interpreted as MATCHES given the context) with columns Team1 (TEXT), Team2 (TEXT), and MatchResult (INTEGER).
CREATE TABLE EMPLOYEE ( Team1 TEXT NOT NULL, Team2 TEXT NOT NULL, MatchResult INTEGER );
-- insert INSERT INTO EMPLOYEE VALUES ("RR" ,"KKR" ,2); INSERT INTO EMPLOYEE VALUES("MI" ,"CSK" ,2); INSERT INTO EMPLOYEE VALUES("RCB" ,"KXP" ,1); INSERT INTO EMPLOYEE VALUES("DD", "RR", 0); INSERT INTO EMPLOYEE VALUES("KKR", "RR", 1); INSERT INTO EMPLOYEE VALUES("CSK", "RCB", 2); INSERT INTO EMPLOYEE VALUES("KXP" ,"DD", 2);
Match Result descriptions:
- 1 => Match won by Team 1
- 2 => Match won by Team 2
- 0 => Draw
Output should have the following columns: Team, Played, Won, Lost, Draw.
Example Output:
Team Played Won Lost Draw
RR 3 0 2 1
CSK 2 1 1 0
RCB 2 2 0 0
Given two tables/dataframes: Sales and Exchange.
Sales Table:
sales_date sales_amount currency
01-Jan-16 500 INR
01-Jan-16 100 GBP
02-Jan-16 1000 INR
02-Jan-16 150 GBP
03-Jan-16 1500 INR
Exchange Table:
source_currency target_currency exchange_rate eff_start_date eff_end_date
INR USD 0.014 31-Dec-15 02-Jan-16
INR USD 0.015 02-Jan-16
GBP USD 1.32 20-Dec-15
GBP USD 1.3 01-Jan-16
GBP USD 1.35 10-Jan-16
Find date-wise sales in USD, considering the effective exchange rates for each date. Exchange rates are effective from eff_start_date up to (but not including) eff_end_date if eff_end_date is present. If eff_end_date is null, it means the rate is effective indefinitely from eff_start_date.
Use Case: Design a data model for a customer segmentation and analysis system. The company wants to segment customers based on various factors such as purchasing behavior, demographics, and engagement. Analytical reports are needed to understand customer trends.
Requirements:
- Include tables for customers, transactions, demographics, and segments.
- Consider necessary columns for each table.
- Write a SQL query to retrieve the count of customers in each segment.
Use Case: Design a data model for a social media analytics platform. The system should store information about users, posts, comments, likes, and user interactions. The company wants to analyze user engagement, popular posts, and comment trends.
Requirements:
- Include tables for users, posts, comments, likes, and interactions.
- Consider necessary columns for each table.
- Write a SQL query to retrieve the top 5 posts with the highest engagement.
Additional notes provided by the candidate for potential tables/columns:
users:id,name,numberpost:post_id,post_content,user_id,no_of_likes,[comment_id]comment:id,post_id,comment_id,no_of_likes,[interactions]