Google Interview: Filesystem Handling
Summary
I recently interviewed at Google, where I tackled a problem involving filesystem handling, specifically calculating directory sizes and discussing optimization strategies within a Python dictionary representation.
Full Experience
During my interview at Google, I was presented with a problem that simulated a filesystem using a Python dictionary. Each node could be a file with a size or a directory containing children. My primary task was to implement a function, get_size(), to determine the total size for any given key in the filesystem. Following this, we explored various approaches to optimize the size calculation, particularly focusing on caching techniques. The discussion also extended to the fundamental properties and invariants expected in a robust filesystem, such as the absence of cycles, ensuring every file belongs to a directory, and maintaining unique children per directory.
Interview Questions (1)
You have a file system represented as a Python dictionary. Each node is either a file or a directory. Directories can contain other directories or files as children. For example:
fs = {
    1: { "type": "directory", "name": "root", "children": [2, 3] },
    2: { "type": "directory", "name": "dir", "children": [4, 5] },
    4: { "type": "file", "name": "file1", "size": 100 },
    5: { "type": "file", "name": "file2", "size": 200 },
    3: { "type": "file", "name": "file3", "size": 300 }
}
Tasks:
- Write a function get_size() that returns the total size for a given key.
- Discuss how to optimise the size calculation with caching, if needed.
- Discuss what is expected from a typical filesystem. I could come up with the following invariants:
- No cycles: a directory cannot include itself, directly or indirectly.
- Every file belongs to at least one directory.
- No two directories share the same child.
- No directory references a non-existent child key.
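For the first two tasks, a minimal sketch of what I had in mind: a recursive traversal that sums file sizes, with an optional memoization dictionary so repeated queries on the same subtree are answered in O(1). This assumes the tree is not modified between queries; any write to the filesystem would need to invalidate the affected cache entries.

```python
def get_size(fs, key, cache=None):
    """Return the total size of the file or directory at `key`.

    `cache` (key -> size) memoizes results so repeated queries avoid
    re-traversing subtrees. Valid only while `fs` is unmodified.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]

    node = fs[key]
    if node["type"] == "file":
        size = node["size"]
    else:
        # Directory: sum the sizes of all children recursively.
        size = sum(get_size(fs, child, cache) for child in node["children"])

    cache[key] = size
    return size


fs = {
    1: {"type": "directory", "name": "root", "children": [2, 3]},
    2: {"type": "directory", "name": "dir", "children": [4, 5]},
    4: {"type": "file", "name": "file1", "size": 100},
    5: {"type": "file", "name": "file2", "size": 200},
    3: {"type": "file", "name": "file3", "size": 300},
}
print(get_size(fs, 1))  # 600
```

Passing a shared `cache` dict across calls gives the caching behaviour the interviewer asked about; without it, each call is a plain O(n) traversal of the subtree.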
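The invariants above can also be checked mechanically. Below is a hypothetical `validate()` helper (not part of the interview prompt) that verifies them for this dictionary representation: no dangling child keys, no child shared between two directories, no cycles, and no orphan files.

```python
def validate(fs):
    """Return True if `fs` satisfies the invariants listed above."""
    parents = {}  # child key -> parent directory key
    for key, node in fs.items():
        if node["type"] != "directory":
            continue
        for child in node["children"]:
            if child not in fs:
                return False  # references a non-existent child key
            if child in parents:
                return False  # two directories share the same child
            parents[child] = key

    # No cycles: since each node has at most one parent, walking the
    # parent chain from any node must terminate.
    for key in fs:
        seen = set()
        cur = key
        while cur in parents:
            if cur in seen:
                return False  # cycle detected
            seen.add(cur)
            cur = parents[cur]

    # Every file belongs to at least one directory.
    for key, node in fs.items():
        if node["type"] == "file" and key not in parents:
            return False
    return True
```

Because each child is required to have a unique parent, cycle detection reduces to following parent links upward, which keeps the whole check at O(n) over nodes and edges.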