Axon | Phone Interview | Audio Segments
Summary
I had a phone interview with Axon for a Machine Learning Engineer role and was given a challenging audio segmentation coding problem, which I unfortunately couldn't solve within the given time.
Full Experience
I recently had a phone interview for a Machine Learning Engineer position at Axon. The interviewer presented a coding challenge focused on audio segmentation. I was allocated approximately 22 minutes to devise a solution. Despite my efforts, I was unable to complete a successful implementation within the allotted time.
Interview Questions (1)
You have an array of (audio) frames representing an audio file, and a text file with the corresponding transcription. Can you segment the audio into sets of monologues for each speaker?
The transcription file follows the format: TIMESTAMP(S)\tSpeaker\tText
FPS: 20
Example:
Input:
audio_frames = [f1, f2, f3]
transcription = [10\tJohn\t'Hi Kate', 12\tKate\t'Hi John', ...]
Output:
"John" : [f1, ... f11], [f23, ... f25],
"Kate" : ...
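For anyone preparing for a similar question, here is a minimal Python sketch of one possible approach. It is not the interviewer's expected solution, and it rests on assumptions the question does not confirm: each timestamp is taken to be the start of an utterance in seconds, a speaker's turn is taken to last until the next line's timestamp (or the end of the audio), and the frame index is taken to be timestamp * FPS. The function name segment_by_speaker is hypothetical. The frame numbers in the example above are treated as illustrative only, since they do not pin down the timestamp-to-frame mapping.

```python
from collections import defaultdict

FPS = 20  # frames per second, as given in the problem statement


def segment_by_speaker(audio_frames, transcription, fps=FPS):
    """Group frames into per-speaker monologues.

    Assumptions (not confirmed by the question): each line is
    "TIMESTAMP\tSpeaker\tText", the timestamp is the utterance start in
    seconds, and a turn lasts until the next line's timestamp.
    """
    # Parse each line into a (start_frame, speaker) pair.
    turns = []
    for line in transcription:
        timestamp, speaker, _text = line.split("\t", 2)
        # Clamp so a timestamp past the end of the audio yields no frames.
        start = min(int(float(timestamp) * fps), len(audio_frames))
        turns.append((start, speaker))
    turns.sort(key=lambda turn: turn[0])

    # Merge consecutive turns by the same speaker into one monologue.
    merged = []
    for start, speaker in turns:
        if not merged or merged[-1][1] != speaker:
            merged.append((start, speaker))

    # Each monologue runs from its start frame to the next monologue's start.
    segments = defaultdict(list)
    for i, (start, speaker) in enumerate(merged):
        end = merged[i + 1][0] if i + 1 < len(merged) else len(audio_frames)
        if end > start:
            segments[speaker].append(audio_frames[start:end])
    return dict(segments)
```

Under these assumptions, frames before the first timestamp are left unassigned, and back-to-back lines from the same speaker end up in a single monologue. The whole thing is one pass over the transcript after sorting, so it runs in O(n log n) for n transcript lines plus the cost of slicing the frames.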