Honestly, one of the toughest interview processes I have been through. You definitely need solid computer vision experience to get through it, and I mean hands-on experience, because the questions center on practical matters such as deployment and configuration. The 1st round was difficult, and I answered around 65-70% of the questions. When I made it to the 2nd round, the interview was over in 15 minutes because the questions were so narrowly focused. Know your resume and projects inside and out, along with the tech used in them (CNNs, transformers, etc.).
Both interviewers were super helpful though, explaining certain concepts that I didn't understand. Also, HR did regular follow-ups.
Interview questions [1]
Question 1
Edge devices
Transformers
YOLO (feature extractors; if YOLO uses grids, how does it capture spatial information; how to reduce all the bounding boxes for an object to a single one, i.e. non-maximum suppression)
Difference between YOLO and R-CNN (and its faster variants) in terms of performance and speed.
GPU optimization for a trained model (an important one, asked in both interviews): if I have a model that runs at 30 FPS on a GPU, how would I tune it to reach 50 or 60 FPS?
Image matting - since my project revolved around it
Types of convolutions
Skip connections
Difference between AlexNet and ResNet
How to extract key-value pairs from text using NLP
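
For anyone preparing, the non-maximum suppression part of the YOLO question can be sketched in a few lines. This is only a minimal illustration and not the answer the interviewers were looking for: the `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the 0.5 IoU threshold are all my own assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first.

    The threshold default is an assumed value for illustration.
    """
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this toy example the second box overlaps the first heavily and is dropped, while the third box is far away and survives. Real detectors typically run this per class and use a vectorized library implementation rather than a Python loop.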