MathCo Cloud Engineer II Interview Experience

Cloud Engineer II · Rejected
August 20, 2025

Summary

I interviewed for the Cloud Engineer II role at MathCo, going through two rounds. The first round covered Python, SQL, and PySpark, which went well, leading to the second round. The final round, however, was brief and focused on PySpark and orchestration, and I was ultimately rejected.

Full Experience

I initiated contact with a recruiter via email, which led to my first interview round being scheduled. This round focused on technical skills across Python, Pandas, SQL, Spark, and Azure. After a brief self-introduction, I was asked several questions about Azure services: differentiating between ADF, Logic Apps, and Azure Functions, and explaining Blob storage.

Following the Azure discussion, I was given a Python and Pandas task. I needed to write a script to read a CSV sales file, group the data by region to calculate total sales, identify the top three regions by sales, and then output these results to a new CSV file. I implemented this using standard Pandas operations.
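
Roughly what I wrote, reconstructed from memory — the file names and the region/sales column names are assumed for illustration, not the interview's actual data:

```python
import pandas as pd

# Read the sales data (file name and column names are placeholders).
df = pd.read_csv("sales.csv")

# Total sales per region, then keep the top 3 regions by total.
totals = (
    df.groupby("region", as_index=False)["sales"].sum()
      .sort_values("sales", ascending=False)
      .head(3)
)

# Write the result to a new CSV without the index column.
totals.to_csv("top_regions.csv", index=False)
```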

Next, I tackled an SQL problem. Given orders(order_id, customer_id, order_date, amount) and customers(customer_id, name, region) tables, my task was to find the top three customers based on their total order amount within the last six months, including their name and region. I used CTEs and joins to construct my query.
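
A sketch of the query shape, shown here as Spark SQL run through PySpark since the exact dialect wasn't specified in the interview; the temp-view registration of `orders` and `customers` is assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("top-customers").getOrCreate()

# Assumes orders and customers are already registered as temp views or catalog tables.
top_customers = spark.sql("""
    WITH recent AS (                -- CTE: per-customer totals over the last 6 months
        SELECT customer_id, SUM(amount) AS total_amount
        FROM orders
        WHERE order_date >= ADD_MONTHS(CURRENT_DATE, -6)
        GROUP BY customer_id
    )
    SELECT c.name, c.region, r.total_amount
    FROM recent r
    JOIN customers c ON c.customer_id = r.customer_id
    ORDER BY r.total_amount DESC
    LIMIT 3
""")
top_customers.show()
```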

The first round closed with a PySpark task. The goal was to load a large dataset of user activity logs, filter it to a specific date range, group by user to count actions, and save the result as a Parquet file. I used a SparkSession and standard PySpark functions to get this done, and after completing these tasks I was informed that I had been selected for the next round.
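
A minimal sketch of the approach; the input path, column names, and date range are placeholders rather than the interview's actual data:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("user-activity").getOrCreate()

# Load the activity logs (path and schema assumed for illustration).
logs = spark.read.parquet("/data/user_activity_logs")

result = (
    logs.filter(F.col("event_date").between("2025-01-01", "2025-06-30"))
        .groupBy("user_id")
        .agg(F.count("*").alias("action_count"))
)

# Persist the per-user aggregate as Parquet.
result.write.mode("overwrite").parquet("/data/user_action_counts")
```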

My second interview round was less engaging. The interviewer seemed somewhat uninterested. After a short introduction, they inquired about my real-time experience with PySpark, which I honestly didn't have much to elaborate on. They then asked about best practices for workflow orchestration, where I discussed using tools like Airflow or ADF, modular DAGs, error handling, parameterization, and logging. The final question was a simple Python task: writing a function to sum any number of inputs using *args. The interview concluded rather abruptly after only about 10 minutes. Unfortunately, I was rejected after this round. Overall, it felt like it wasn't my day.
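
The *args question really was as simple as it sounds; something like:

```python
def total(*args):
    """Return the sum of any number of numeric inputs."""
    return sum(args)

total(1, 2, 3)   # 6
total()          # 0 — sum of no arguments
```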

Interview Questions (6)

Q1
Azure Services and Differences
Other

Explain Azure Data Factory (ADF), Logic Apps, Azure Functions, and Blob storage. Discuss the differences between ADF, Logic Apps, and Azure Functions.

Q2
Python Pandas Sales Data Analysis
Data Structures & Algorithms · Easy

Write a Python script to:

  • Read a CSV file with sales data
  • Group by region and calculate total sales
  • Identify the top 3 regions by sales
  • Output results to a new CSV

Q3
SQL Top 3 Customers by Recent Orders
Data Structures & Algorithms · Medium

Tables:

  • orders(order_id, customer_id, order_date, amount)
  • customers(customer_id, name, region)

Task:
Find the top 3 customers by total order amount in the last 6 months. Include customer name and region.

Q4
PySpark User Activity Log Analysis
Data Structures & Algorithms · Medium

Given a large dataset of user activity logs:

  • Load the data using PySpark.
  • Filter logs for a specific date range.
  • Group by user and count actions.
  • Save the result as a Parquet file.

Q5
Orchestration Best Practices
System Design

Discuss best practices for workflow orchestration.
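
A minimal Airflow (2.4+) sketch of the practices I talked about — retries for error handling, params for parameterization, modular single-purpose tasks, and task-level logging. The DAG itself and all names are illustrative, not something from the interview:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 2,                          # error handling: retry transient failures
    "retry_delay": timedelta(minutes=5),
}

def extract(**context):
    # Airflow injects params at runtime; print() goes to the task log.
    source = context["params"]["source_path"]
    print(f"Extracting from {source}")

def transform(**context):
    print("Transforming extracted data")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
    params={"source_path": "/data/raw"},   # parameterization instead of hard-coding
) as dag:
    # Modular, single-purpose tasks chained explicitly.
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2
```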

Q6
Python Function to Sum Arbitrary Inputs
Data Structures & Algorithms · Easy

Write a function that gives the sum of any number of inputs using *args.
