In a research scientist interview, you'll be expected to show that you have the necessary technical knowledge and expertise pertaining to the specific position you're applying for. Some of the common topics include basic statistical methods, machine learning concepts, and case study analysis. Also, the interviewer will most likely assess your communication and interpersonal skills, which are essential for effective teamwork and funding acquisition.

10,868 Research Scientist interview questions shared by candidates

Here are three of the top research scientist interview questions and how to answer them:

How to answer: This kind of question asks you to recall a specific machine learning concept, along with its conditions and applications. Avoid overcomplicating it. Give a simple, straightforward answer that shows you have a solid grasp of the concept.

How to answer: The interviewer wants to evaluate your problem-solving skills. Carefully choose a challenging situation that best reflects your ability to solve problems and explain what you did to overcome it. Preferably, the problem should be one that's relevant to your desired position.

How to answer: If you have successfully secured research funding in the past, talk about the methods you used. If not, highlight the abilities that can help you acquire funding, such as grant writing and networking skills.

Data Scientist Intern was asked...February 25, 2012

↳

The above answer is also wrong:

Node findSecondLargest(Node root) {
    // If the tree is null or a single node, return null (no second largest)
    if (root == null || (root.left == null && root.right == null))
        return null;
    Node parent = null, child = root;
    // Find the rightmost child
    while (child.right != null) {
        parent = child;
        child = child.right;
    }
    // If the rightmost child has no left child, its parent is the second largest
    if (child.left == null)
        return parent;
    // Otherwise, the rightmost node of the left subtree is the second largest
    child = child.left;
    while (child.right != null)
        child = child.right;
    return child;
}

↳

Find the rightmost element. If it is a right child with no children of its own, return its parent. Otherwise, return the largest element of its left subtree.

↳

One addition: when the tree has no right branch, the root is the largest element and has no parent. So it is better to keep track of both a parent pointer and a current pointer. If they differ, the candidate's original method works well; if they are the same (the root case), find the largest element of the left branch.
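The approach discussed in this thread can be sketched in Python as follows. This is a minimal illustration, not the interviewer's reference solution: the `Node` class and the function name are made up for the example, and the tree is assumed to be a binary search tree with distinct keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical minimal BST node used only for this sketch.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def find_second_largest(root: Optional[Node]) -> Optional[Node]:
    # An empty tree or a single node has no second-largest element.
    if root is None or (root.left is None and root.right is None):
        return None
    parent, current = None, root
    # Walk to the rightmost node, which holds the largest key.
    while current.right is not None:
        parent, current = current, current.right
    # No left subtree under the largest node: its parent is second largest.
    if current.left is None:
        return parent
    # Otherwise the answer is the rightmost node of that left subtree.
    current = current.left
    while current.right is not None:
        current = current.right
    return current


tree = Node(5, Node(3, Node(1), Node(4)), Node(8, Node(7)))
second = find_second_largest(tree)  # node holding 7
```

The root-only-has-a-left-branch case raised in the last reply is handled by the same code path: the loop never moves, so the search continues in the root's left subtree.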

Data Scientist was asked...April 6, 2013

↳

We cannot say what caused the spike, since a causal relationship cannot be established from observational data alone. But we can compare the monthly averages by performing a hypothesis test (for example, a one-way ANOVA) and rejecting the null hypothesis if the F-statistic is significant.
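One common way to compare several group means is a one-way ANOVA F-test. The sketch below computes only the F-statistic in plain Python; the numbers in `monthly_uploads` are made-up placeholders, not real upload data, and in practice the statistic would be compared against an F distribution with (k-1, N-k) degrees of freedom to get a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of samples."""
    k = len(groups)                          # number of groups (months)
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical toy samples, one list per month.
monthly_uploads = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(monthly_uploads)
```

A large F-statistic only says the monthly means differ; as the reply notes, it does not say why.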

↳

The photos are definitely Halloween pictures. Segment by country and date, and check for a continual rise in photo uploads leading up to October 31st and for a few days afterwards to account for lag. There are also a ton of product questions like this on InterviewQuery.com for data scientists.

↳

Hypothesis: the photos are Halloween pictures. Test: look at upload trends in countries that do not observe Halloween, as a sort of counterfactual analysis.

Data Scientist & Machine Learning was asked...October 19, 2013

↳

Arrays are more efficient for accessing elements by index, while linked lists are better for inserting or deleting elements. The choice between the two data structures depends on the specific requirements of the problem being solved.

↳

Stacks and queues differ in their order of processing (last in, first out versus first in, first out), in their operations for adding and removing elements, and in their usage scenarios. The choice between the two data structures depends on the specific requirements of the problem being solved.
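The difference in processing order can be shown with a short Python sketch. Using a `list` as the stack and `collections.deque` as the queue is one conventional choice in Python, not the only one.

```python
from collections import deque

items = ["a", "b", "c"]

# Stack: last in, first out. list.append / list.pop work on the same end.
stack = []
for item in items:
    stack.append(item)
popped = [stack.pop() for _ in range(len(items))]

# Queue: first in, first out. deque.popleft removes from the front in O(1).
queue = deque()
for item in items:
    queue.append(item)
dequeued = [queue.popleft() for _ in range(len(items))]
```

Here `popped` comes out in reverse insertion order, while `dequeued` preserves insertion order.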

↳

A hash table is a data structure that allows for efficient insertion, deletion, and lookup of key-value pairs. It is based on the idea of hashing, which maps each key to an index in an array using a hash function. The hash function takes a key as input and returns an index in the array; different keys can map to the same index.

To handle such collisions, some form of collision resolution is used, such as separate chaining or open addressing. In separate chaining, each index in the array holds a linked list, and each key-value pair is stored in a node of the corresponding list. When a collision occurs, the new key-value pair is added to the list at that index. In open addressing, when a collision occurs, a different index in the array is searched for to store the new key-value pair. There are several techniques for open addressing, such as linear probing, quadratic probing, and double hashing.

Hash tables have an average-case time complexity of O(1) for insertion, deletion, and lookup, making them a highly efficient data structure for many applications, such as database indexing, caching, and compiler symbol tables. However, their worst-case time complexity can be as bad as O(n), for example when many keys collide into the same bucket or when an insertion triggers a resize of the table.
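As an illustration of separate chaining, here is a minimal sketch. The class name and its methods are hypothetical, each chain is a Python list rather than a true linked list, and resizing is deliberately left out to keep the example short.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (no resizing)."""

    def __init__(self, capacity=8):
        # One chain (a list of (key, value) pairs) per array slot.
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key):
        # Map the key's hash onto a slot; distinct keys may share a slot.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # new key: append to the chain
        self._size += 1

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def remove(self, key):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size


# With capacity 2, the integer keys 1 and 3 land in the same bucket.
table = ChainedHashTable(capacity=2)
table.put(1, "one")
table.put(2, "two")
table.put(3, "three")
```

If every key landed in one bucket, each operation would scan the whole chain, which is the O(n) worst case mentioned above.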

Data Scientist was asked...March 1, 2016

↳

CREATE temporary table likes (
    userid int not null,
    pageid int not null
);

CREATE temporary table friends (
    userid int not null,
    friendid int not null
);

insert into likes VALUES (1, 101), (1, 201), (2, 201), (2, 301);
insert into friends VALUES (1, 2);

select f.userid, l.pageid
from friends f
join likes l ON l.userid = f.friendid
LEFT JOIN likes r ON (r.userid = f.userid AND r.pageid = l.pageid)
where r.pageid IS NULL;

↳

select w.userid, w.pageid
from (
    select f.userid, l.pageid
    from rollups_new.friends f
    join rollups_new.likes l ON l.userid = f.friendid
) w
left join rollups_new.likes l on w.userid = l.userid and w.pageid = l.pageid
where l.pageid is null

↳

Use EXCEPT:

-- for each user, the unique pages that are liked by their friends
select f.user_id, l.page_id
from friends f
inner join likes l on f.fd_id = l.user_id
group by f.user_id, l.page_id
Except
select user_id, page_id
from likes
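The anti-join and EXCEPT approaches from this thread can be checked against each other with Python's built-in `sqlite3` module. This sketch reuses the sample rows and column names from the first reply; running it on SQLite rather than the original database is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes (userid INT NOT NULL, pageid INT NOT NULL);
    CREATE TABLE friends (userid INT NOT NULL, friendid INT NOT NULL);
    INSERT INTO likes VALUES (1, 101), (1, 201), (2, 201), (2, 301);
    INSERT INTO friends VALUES (1, 2);
""")

# Anti-join: keep friend-liked pages with no matching like by the user.
anti_join_rows = conn.execute("""
    SELECT DISTINCT f.userid, l.pageid
    FROM friends f
    JOIN likes l ON l.userid = f.friendid
    LEFT JOIN likes r ON r.userid = f.userid AND r.pageid = l.pageid
    WHERE r.pageid IS NULL
    ORDER BY f.userid, l.pageid
""").fetchall()

# EXCEPT: friend-liked (user, page) pairs minus the user's own likes.
except_rows = conn.execute("""
    SELECT f.userid, l.pageid
    FROM friends f
    JOIN likes l ON l.userid = f.friendid
    EXCEPT
    SELECT userid, pageid FROM likes
    ORDER BY 1, 2
""").fetchall()

conn.close()
```

With this sample data both queries return the single row (1, 301): user 1's friend likes pages 201 and 301, and user 1 already likes 201.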

Data Scientist was asked...May 9, 2016

↳

I can't share the solution to the ads analysis challenge, but I would recommend getting in touch with the book's author. The book was really useful for preparing for all these interviews. For the SQL, do a full outer join between the lifetime count and the last-day count, then sum the two.

↳

Can you post your solution for the ads analysis from the take-home challenge book? I also bought the book and am interested in comparing solutions. Also, can you post how you solved the SQL question?

↳

For the SQL, I think both should work: an outer join between the lifetime count and the new-day count, summing the columns with NULLs replaced by 0; or a UNION ALL of the two, followed by a GROUP BY and a SUM.
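The UNION ALL variant described in this thread might look like the following sketch, run here through Python's `sqlite3` module. The original question and schema are not shown on this page, so the table names `lifetime_counts` and `daily_counts`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one running total and one new-day total per user.
    CREATE TABLE lifetime_counts (user_id INT, cnt INT);
    CREATE TABLE daily_counts (user_id INT, cnt INT);
    INSERT INTO lifetime_counts VALUES (1, 10), (2, 5);
    INSERT INTO daily_counts VALUES (2, 3), (3, 7);
""")

# Stack both tables, then sum per user. Users present in only one table
# are kept automatically, so no NULL handling is needed.
updated_counts = conn.execute("""
    SELECT user_id, SUM(cnt) AS cnt
    FROM (
        SELECT user_id, cnt FROM lifetime_counts
        UNION ALL
        SELECT user_id, cnt FROM daily_counts
    ) AS combined
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

conn.close()
```

The full outer join version would give the same totals, provided the NULLs on the unmatched side are replaced with 0 before adding.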

Data Scientist was asked...March 29, 2015

↳

At the start, A has 8 and B has 6. Let A's number be x and B's number be y, so after the round A has 8+x-y and B has 6-x+y. B wins the first round with probability 10/36, A wins with probability 21/36, and the round is tied (forcing another round) with probability 5/36. After a tie the counts are 7 and 7, so A has 7+x-y and B has 7-x+y; now B wins with probability 15/36, A wins with probability 15/36, and a tie has probability 6/36 = 1/6. So B wins in round 2 with probability 5/36*15/36. For rounds 3, 4, and so on, every tie resets the counts to 7 and 7, so the per-round probabilities do not change, and B wins in round r (for r >= 2) with probability 5/36*(6/36)^(r-2)*15/36, where r is the total number of rounds.

↳

So many answers... Here's my version: In round 1, B wins only if it gets 3 or more stones than A, which is (A,B) = (1,4), (1,5), (1,6), (2,5), (2,6), (3,6), or 6 cases out of all 36 possibilities. So B has a 1/6 chance to win. To draw, B needs to get exactly 2 stones more than A, which is (A,B) = (1,3), (2,4), (3,5), (4,6), or 1/9. Entering the second round, the stones are equal, so the chance of a draw becomes 1/6 and the chance for either player to win is 5/12. So the final answer, by round, is (1/6, 1/9*5/12, 1/9*(1/6)*5/12, ..., 1/9*(1/6)^(n-2)*5/12).

↳

I don't get it. Shouldn't the probability of B winning, given a tie in the 1st round, be 15/36? Given a tie in the 1st round, in the 2nd round Nb > Na happens if (B,A) is (2,1), (3,1/2), (4,1/2/3), (5,1/2/3/4), or (6,1/2/3/4/5), which totals 15 out of 36.
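The counts in the first reply can be checked by brute-force enumeration. The original question is not shown on this page, so this sketch assumes the model that reply states: each player draws a number from 1 to 6 (the 36 equally likely pairs suggest two fair dice), A ends the round with a+x-y stones and B with b-x+y, and whoever holds more stones wins. The function names are made up for the example.

```python
from fractions import Fraction
from itertools import product


def round_outcomes(a_stones, b_stones):
    """Return (P(A wins), P(B wins), P(tie)) for one round."""
    a_wins = b_wins = ties = 0
    for x, y in product(range(1, 7), repeat=2):
        a_after = a_stones + x - y
        b_after = b_stones - x + y
        if a_after > b_after:
            a_wins += 1
        elif b_after > a_after:
            b_wins += 1
        else:
            ties += 1
    return Fraction(a_wins, 36), Fraction(b_wins, 36), Fraction(ties, 36)


first = round_outcomes(8, 6)   # round 1 starts at 8 vs 6
later = round_outcomes(7, 7)   # after any tie both hold 7


def b_wins_in_round(r):
    """Probability that B wins exactly in round r under the assumed model."""
    if r == 1:
        return first[1]
    # Tie in round 1, ties in rounds 2..r-1, then B wins round r.
    return first[2] * later[2] ** (r - 2) * later[1]
```

Under these assumptions the enumeration reproduces 21/36, 10/36 and 5/36 for the first round and 15/36, 15/36 and 6/36 afterwards. A different reading of the rules, such as the one in the second reply, would change the counts.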

Data Scientist was asked...April 25, 2015

↳

def mergeSort(lst):
    # Integer division so the split point is a valid slice index in Python 3
    split = len(lst) // 2
    left = lst[:split]
    right = lst[split:]
    # Recurse until both halves hold at most one element
    if len(left) > 1 or len(right) > 1:
        left = mergeSort(left)
        right = mergeSort(right)
    return merge(left, right)

def merge(A, B):
    # Merge two sorted lists into one sorted list
    i = 0
    j = 0
    sorted_list = []
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            sorted_list.append(A[i])
            i += 1
        else:
            sorted_list.append(B[j])
            j += 1
    # At most one of the two lists still has elements left
    if i < len(A):
        sorted_list.extend(A[i:])
    elif j < len(B):
        sorted_list.extend(B[j:])
    return sorted_list

↳

# Named "values" to avoid shadowing the built-in list type
values = ["1", "4", "2", "10", "5"]
values = [int(i) for i in values]
values.sort()
print(values)

↳

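def mysort(arr):
    # Lists of length 0 or 1 are already sorted
    if len(arr) <= 1:
        return arr
    # Pivot choice is an assumption; the original line was lost in extraction
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return mysort(left) + middle + mysort(right)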

Data Scientist was asked...January 7, 2015

↳

Assuming CTR is defined as total number of clicks / total number of impressions (not counting each unique user's action):

select appid, total_click_ct * 1.0 / total_imp_ct as ctr
from (
    select appid,
           sum(case when flag = 'imp' then 1 else 0 end) as total_imp_ct,
           sum(case when flag = 'click' then 1 else 0 end) as total_click_ct
    from table
    where ts > x and ts < y
    group by appid
) table2
order by ctr desc;

↳

Re: assessing the quality of an app, some ideas:

If it's your company's app, you can look at the number of downloads and at server traffic. You can survey your users, measure profitability, and track how often your users refer others to the app.

If it's not your app, you can look at ratings in the app store, cost, and reviews, and at how well it does against your competitors on cost and ratings.

You can look at specific features: Does it do what's needed? Does it look good? Does it have a bunch of unnecessary features that nobody uses or wants? Does it clutter up your storage or run very slowly, i.e. is it optimized?

You can look at the company's reputation. If you don't know anything about them, look at their website: Does it look professional? Does it have demos? Can you test the app with a free trial before you buy?

↳

SELECT appid, sum(click) / count(*) AS ctr FROM dialoglog GROUP BY appid;
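A per-app click-through rate can be sketched with conditional sums, run here through Python's `sqlite3` module. The schema is not shown on this page, so the `events` table, its `appid`, `flag` and `ts` columns, and the sample rows are hypothetical, loosely following the first reply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (appid TEXT, flag TEXT, ts INT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "imp", 1), ("a", "imp", 2), ("a", "imp", 3), ("a", "imp", 4),
        ("a", "click", 5),
        ("b", "imp", 1), ("b", "imp", 2),
        ("b", "click", 3),
    ],
)

# SUM over a CASE counts matching rows; COUNT(DISTINCT CASE ...) would not.
# Multiplying by 1.0 avoids integer division, NULLIF avoids dividing by zero.
ctr_rows = conn.execute(
    """
    SELECT appid,
           SUM(CASE WHEN flag = 'click' THEN 1 ELSE 0 END) * 1.0
           / NULLIF(SUM(CASE WHEN flag = 'imp' THEN 1 ELSE 0 END), 0) AS ctr
    FROM events
    WHERE ts > ? AND ts < ?
    GROUP BY appid
    ORDER BY ctr DESC
    """,
    (0, 10),
).fetchall()

conn.close()
```

In this toy data app "b" has 1 click on 2 impressions and app "a" has 1 click on 4, so "b" ranks first.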

Data Scientist was asked...May 11, 2018

↳

For the product question: This is an example of selection bias. The subset of users who opted in to the security features is not representative of the underlying population of FB users; they are probably more concerned about how their data is being used. Further, because the survey was completed on a voluntary basis, it introduces another source of selection bias. These individuals are more likely to be engaged with the FB product than the general population, and care enough to complete the survey. Once again, these users are not a representative (or, for that matter, random) sample of the underlying population of FB users. This would explain why Group 1 had a much lower satisfaction rate than Group 2: they represent a specific subset of FB's users who care deeply about this issue, and having the security feature turned on indicates that they take data privacy seriously.

For the SQL question: This is probably not the most efficient, but it is the only thing that came to my mind:

with all_users as (
    select userA as userID, count(1) as interactions
    from table
    where date = current_date - interval '1 day'
    group by userA
    union all
    select userB as userID, count(1) as interactions
    from table
    where date = current_date - interval '1 day'
    group by userB
),
engaged as (
    select userID, sum(interactions) as totalInteractions
    from all_users
    group by userID
    having sum(interactions) >= 5
)
select count(1) as greaterThanFive
from engaged

↳

select c.user_id, sum(c.cnt) as sum_cnt
from (
    SELECT USER_A as user_id, count(user_b) cnt
    FROM interactions
    group by user_a
    union all
    SELECT USER_B as user_id, count(user_a) cnt
    FROM interactions
    group by user_B
) c
group by c.user_id
having sum(c.cnt) > 5;

↳

WITH CTE AS (
    SELECT date, User_a AS user FROM users
    UNION ALL
    SELECT date, User_b AS user FROM users
)
SELECT date, user, COUNT(*) AS interactions
FROM CTE
WHERE date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
GROUP BY 1, 2
HAVING interactions > 5
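The counting logic shared by these SQL answers (stack both user columns, count per user, filter on a threshold) can be sketched in Python. The function name and sample pairs are hypothetical, and whether the cutoff is "more than five" or "at least five" depends on the question wording, which is not shown here; this sketch uses a strict comparison.

```python
from collections import Counter


def users_over_threshold(interactions, threshold=5):
    """Count interactions per user across both sides of each pair."""
    counts = Counter()
    for user_a, user_b in interactions:
        # Each row counts once for each participant, like UNION ALL in SQL.
        counts[user_a] += 1
        counts[user_b] += 1
    return {user: n for user, n in counts.items() if n > threshold}


# Hypothetical interaction log for a single day.
pairs = [("u1", f"u{i}") for i in range(2, 8)] + [("u2", "u3")]
engaged = users_over_threshold(pairs)
```

Here "u1" appears in six rows, while "u2" and "u3" appear in two rows each, so only "u1" passes the filter.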

Data Scientist was asked...April 29, 2019

↳

SELECT country,
       SUM(count_of_request_sent) AS total_count_of_request_sent,
       SUM((count_of_request_sent * percent_of_request_failed) / 100) AS total_count_of_request_failed
FROM table
GROUP BY country;

↳

SELECT country,
       SUM(requests_sent),
       SUM((percent_failed / 100) * requests_sent)
FROM requests
GROUP BY country;

↳

My assumption is that the request_sent column is not an integer. It might just hold a status such as "failed" or "successful". It is best to verify this with the interviewer. That means we may not be able to take the sum of request_sent and have to use a count instead.

WITH CTE AS (
    SELECT country,
           COUNT(request_sent) AS request_ct,
           (perc_failed * COUNT(request_sent)) AS total_failed
    FROM T1
    GROUP BY country, perc_failed
)
SELECT country, SUM(request_ct), SUM(total_failed)
FROM CTE
GROUP BY 1
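The first two answers in this thread weight each row's failure percentage by its request count before summing, rather than averaging percentages. A small Python sketch of that aggregation follows; the row layout (country, requests sent, percent failed on a 0 to 100 scale) and the sample numbers are hypothetical.

```python
from collections import defaultdict


def failures_by_country(rows):
    """Return {country: (total_sent, total_failed)} from per-row percentages."""
    totals = defaultdict(lambda: [0, 0.0])
    for country, sent, pct_failed in rows:
        totals[country][0] += sent
        # Convert the percentage back into a count before summing.
        totals[country][1] += sent * pct_failed / 100
    return {country: (sent, failed) for country, (sent, failed) in totals.items()}


# Hypothetical rows: (country, requests_sent, percent_failed).
rows = [("US", 100, 10.0), ("US", 300, 5.0), ("CA", 50, 20.0)]
summary = failures_by_country(rows)
```

For the "US" rows a plain average of the percentages would give 7.5%, while the weighted totals give 25 failures out of 400 requests, or 6.25%.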
