Research Scientist Interview Questions in Minneapolis-St. Paul, MN

Research Scientist Interview Questions in Minneapolis-St. Paul, MN

In a research scientist interview, you'll be expected to show that you have the necessary technical knowledge and expertise pertaining to the specific position you're applying for. Some of the common topics include basic statistical methods, machine learning concepts, and case study analysis. Also, the interviewer will most likely assess your communication and interpersonal skills, which are essential for effective teamwork and funding acquisition.

10,868 Research Scientist interview questions shared by candidates

Top Research Scientist Interview Questions & How to Answer

Here are three of the top research scientist interview questions and how to answer them:

Question #1: What is X concept? What are its assumptions and how do you apply it?

How to answer: Basically, such an interview question asks for a textbook recall of a certain machine learning concept and its conditions and applications. Avoid overcomplicating it. Just give a simple and straightforward answer that shows that you have a solid grasp of the concept.

Question #2: Provide an example of a problem you faced in your previous role and how you solved it.

How to answer: The interviewer wants to evaluate your problem-solving skills. Carefully choose a challenging situation that best reflects your ability to solve problems and explain what you did to overcome it. Preferably, the problem should be one that's relevant to your desired position.

Question #3: How would you obtain research funding?

How to answer: If you had successfully secured research funding in the past, you can talk about some of the methods you used. If not, highlight the abilities you possess that can help you acquire funding, such as grant writing skills and networking skills.

Top Interview Questions

Sort: Relevance|Popular|Date
Data Scientist Intern was asked...February 25, 2012

Find the second largest element in a Binary Search Tree

15 Answers

The above answer is also wrong; Node findSceondLargest(Node root) { // If tree is null or is single node only, return null (no second largest) if (root==null || (root.left==null && root.right==null)) return null; Node parent = null, child = root; // find the right most child while (child.right!=null) { parent = child; child = child.right; } // if the right most child has no left child, then it's parent is second largest if (child.left==null) return parent; // otherwise, return left child's rightmost child as second largest child = child.left; while (child.right!=null) child = child.right; return child; } Less

find the right most element. If this is a right node with no children, return its parent. if this is not, return the largest element of its left child. Less

One addition is the situation where the tree has no right branch (root is largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers, if different, the original method by the candidate works well, if the same (which means the root situation), find the largest of its left branch. Less

Show More Responses

You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?

3 Answers

We cannot say what has caused the spike since causal relationship cannot be established with observed data. But we can compare the averages of all the months by performing a hypothesis testing and rejecting the null hypothesis if the F1 score is significant. Less

The photos are definitely Halloween pictures. Segment by country and date and check for a continual rise in photo uploads leading up to October 31st and a few days after for the lag. There's also a ton of these product questions like this on for data scientists Less

Hypothesis: the photos are Halloween pictures. Test: look at upload trends in countries that do not observe Halloween as a sort of counter-factual analysis. Less


The three data structure questions are: 1. the difference between linked list and array; 2. the difference between stack and queue; 3. describe hash table.

4 Answers

Arrays are more efficient for accessing elements , while linked list are better for inserting or deleting elements, the choice between the two data structure depends on the specific requirements of the problem being solved. Less

Stack and queues have different order of processing, operations for adding and removing elements, and usage scenarios.The choice between the two data structure depends on the specific requirements of the problem being solved Less

A hash table is a data structure that allows for efficient insertion, deletion, and lookup of key-value pairs. It is based on the idea of hashing, which involves mapping each key to a specific index in an array using a hash function. The hash function takes a key as input and returns a unique index in the array. In order to handle collisions (when two or more keys map to the same index), some form of collision resolution mechanism is used, such as separate chaining or open addressing. In separate chaining, each index in the array is a linked list, and each key-value pair is stored in a node in the corresponding linked list. When a collision occurs, the new key-value pair is added to the end of the linked list at the corresponding index. In open addressing, when a collision occurs, a different index in the array is searched for to store the new key-value pair. There are several techniques for open addressing, such as linear probing, quadratic probing, and double hashing. Hash tables have an average case time complexity of O(1) for insertion, deletion, and lookup operations, making them a highly efficient data structure for many applications, such as database indexing, caching, and compiler symbol tables. However, their worst-case time complexity can be as bad as O(n) in rare cases, such as when there are many collisions and the hash table needs to be resized. Less

Show More Responses

Write an SQL query that makes recommendations using the pages that your friends liked. Assume you have two tables: a two-column table of users and their friends, and a two-column table of users and the pages they liked. It should not recommend pages you already like.

40 Answers

CREATE temporary table likes ( userid int not null, pageid int not null ) CREATE temporary table friends ( userid int not null, friendid int not null ) insert into likes VALUES (1, 101), (1, 201), (2, 201), (2, 301); insert into friends VALUES (1, 2); select f.userid, l.pageid from friends f join likes l ON l.userid = f.friendid LEFT JOIN likes r ON (r.userid = f.userid AND r.pageid = l.pageid) where r.pageid IS NULL; Less

select w.userid, w.pageid from ( select f.userid, l.pageid from rollups_new.friends f join rollups_new.likes l ON l.userid = f.friendid) w left join rollups_new.likes l on w.userid=l.userid and w.pageid=l.pageid where l.pageid is null Less

Use Except select f.user_id, l.page_id from friends f inner join likes l on f.fd_id = l.user_id group by f.user_id, l.page_id -- for each user, the unique pages that liked by their friends Except select user_id, page_id from likes Less

Show More Responses

Data challenge was very similar to the ads analysis challenge on the book the collection of data science takehome challenge, so that was easy (if you have done your homework). SQL was: you have a table where you have date, user_id, song_id and count. It shows at the end of each day how many times in her history a user has listened to a given song. So count is cumulative sum. You have to update this on a daily basis based on a second table that records in real time when a user listens to a given song. Basically, at the end of each day, you go to this second table and pull a count of each user/song combination and then add this count to the first table that has the lifetime count. If it is the first time a user has listened to a given song, you won't have this pair in the lifetime table, so you have to create the pair there and then add the count of the last day. Onsite: lots of ads related and machine learning questions. How to build an ad model, how to test it, describe a model. I didn't do well in some of these.

18 Answers

Can't tell you the solution of the ads analysis challenge. I would recommend getting in touch with the book author though. It was really useful to prep for all these interviews. SQL is a full outer join between life time count and last day count and then sum the two. Less

Can you post here your solution for the ads analysis from the takehome challenge book. I also bought the book and was interested in comparing the solutions. Also can you post here how you solved the SQL question? Less

for the SQL, I think both should work. Outer join between lifetime count and new day count and then sum columns replacing NULLs with 0, or union all between those two, group by and then sum. Less

Show More Responses

Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, ..., n rounds?

16 Answers

Because at the beginning time, A has 8 and B has 6, so let A:x and B:y, then A:8+x-y and B:6-x+y; so there are 10/36 prob of B wins. And A wins prob is 21/36 and the equal prob for next round is 5/36. So for B wins at round prob is 10/36. And if they are equal and to have another round, the number has changed to 7 and 7. So A:7+x-y and B:7-x+y, so this time B wins has prob 15/36 and A wins has prob 15/36. And the equal to have another round is 6/36=1/6. So overall B wins in 2 rounds has prob 5/36*15/36. And for round 3,4,...etc, since after each equal round, the number will go back to 7 and 7 so the prob will not change. So B wins in round 3,4,...n has prob 5/36*(6/36)^(r-2)*15/36. r means the number of the total rounds. Less

So many answers...Here's my version: For round1, B win only if it gets 3 or more stones than A, which is (A,B) = (1,4) (1,5) (1, 6) (2, 5) (2,6) (3,6) which is 6 cases out of all 36 probabilities. So B has 1/6 chance to win. To draw, B needs to get exactly 2 stones more than A, which is (A, B) = (1,3) (2,4) (3,5) (4,6) or 1/9. Entering the second round, all stones should be equal, so the chance to draw become 1/6, and the chance for either to win is 5/12. So the final answer is (1/6, 1/9*5/12, (1/9)^2*5/12, .....(1/9)^(n-1)*5/12) ) Less

I don't get it. Shouldn't prob of B winning given it's tie at 1st round be 15/36? given it's tie at 1st round, at the 2nd round Nb > Na can happen if (B,A) is (2,1), (3,1/2),(4,1/2/3), (5,1/2/3/4),(6,1/2/3/4/5), which totals 15 out of 36. Less

Show More Responses

Write a sorting algorithm for a numerical dataset in Python.

8 Answers

def mergeSort(lst): split = len(lst) / 2 left = lst[:split] right = lst[split:] if len(left) > 1 or len(right) > 1: left = mergeSort(left) right = mergeSort(right) return merge(left, right) def merge(A, B): i=0 j=0 sorted_list = [] while i < len(A) and j < len(B): if A[i] <= B[j]: sorted_list.append(A[i]) i += 1 else: sorted_list.append(B[j]) j += 1 if i < len(A): sorted_list.extend(A[i:]) elif j < len(B): sorted_list.extend(B[j:]) return sorted_list Less

list = ["1", "4", "2", "10", "5"] list = [ int(i) for i in list ] list.sort() print (list) Less

def mysort(arr): if len(arr) pivot] return mysort(left) + middle + mysort(right) Less

Show More Responses

data question: dialoglog (userid int appid int type char , a flag either "imp" or "click" ds timestamp ) How would you access the quality of app? How to compute click-through rate (in mySQL)?

8 Answers

Assuming ctr is defined as total number clicks / total number of impressions (not counting each unique user's action) select appid, total_click_ct / total_imp_ct as ctr from ( select appid, count(distinct case when flag = 'imp' then 1 else 0 end) as total_imp_ct, count(distinct case when flag = 'click' then 1 else 0 end) as total_click_ct, from table where ts > x and ts < y group by appid) table2 order by ctr desc; Less

re: assessing the quality of an app, some ideas: if it's your company's app, you can look at numbers of downloads and server traffic. You can survey your users, measure profitability, and track how often your users refer others to get the app. if it's not your app, you can look at ratings in the app store, cost, read reviews, and look at how well it does against your competitors' cost and ratings. you can look at specific features: does it do what's needed? does it look good? does it have a bunch of unnecessary features that nobody uses or wants? Does it clutter up your storage or run very slowly, i.e. is it optimized? you can look at the company's reputation, if you don't know anything about them, look at their website, does it look professional? does it have demos? can you test the app with a free trial before you buy? Less

SELECT appid, sum(click) / count(*) AS crt FROM dialoglog GROUP BY appid;

Show More Responses

SQL question: given a table of interaction between users (user_a | user_b | day), find number of users who had more than 5 interactions yesterday (assume there is only one unique interaction between a pair of users per day). Product Question: A user satisfaction survey was conducted for two groups of facebook users (each with 50 k sample size). Group1: who had enabled certain login security features Group 2: who had not enabled these security features. It was found that user satisfaction with group1 was 30% lower than with group 2. Why do you think so? Comment on how the survey was conducted?

7 Answers

For the product question: This is an example of selection bias. The subset of users who opted in for the security features are not representative of the underlying population of FB users. They are probably more concerned about how their data is being used. Further, because the survey would have been completed on a voluntary basis, it represents another source of selection bias. These individuals are more likely to be engaged with the FB product than the general population, and care enough to complete the survey. Once again, these users are not a representative (or for that matter random) sample from the underlying population of FB users. This would explain why Group 1 had a much lower satisfaction rate as compared to Group 2, as they represent a specific subset of FB's users who care deeply about this issue, and by the virtue that they have the security feature on is indicative of the fact that they take data privacy seriously. For the SQL question: This is probably not the most efficient, but is the only thing that came to my mind: with all_users as ( select userA as userID, count(1) as interactions from table where date = current_date - interval 'day' group by userA union all select userB as userID, count(1) as interactions from table where date = current_date - interval 'day' group by userB ), engaged as ( select userID, sum(interactions) as totalInteractions from all_users group by userID having totalInteractions >= 5 ) select count(1) as greaterThanFive from engaged Less

select c.user_id , sum(c.cnt) as sum_cnt from ( SELECT USER_A as user_id,count(user_b) cnt FROM interactions group by user_a union all SELECT USER_B as user_id,count(user_a) cnt FROM interactions group by user_B) c group by c.user_id having sum(c.cnt)>5 ; Less

WITH CTE AS ( SELECT date, User_a AS user FROM users UNION ALL SELECT date, User_b AS user FROM users) SELECT date, user, count(DISTINCT user) AS interactions FROM CTE where date = DATE_SUB(date INTERVAL 1 day) GROUP BY 1,2 HAVING interactions > 5 Less

Show More Responses

Given a table with columns country (with two-letter country abbreviation), count of requests sent, percentage of requests sent failed, condense down to this data grouped by country: country (one row per country), total count of requests sent, total count of requests sent failed.

7 Answers

SELECT country, SUM(count_of_request_sent) AS total_count_of_request_sent, SUM((count_of_request_sent * percent_of_request_failed) / 100) AS total_count_of_request_failed FROM table GROUP BY country; Less

SELECT country, SUM(requests_sent), SUM((percent_failed/100) * requests_sent) FROM requests GROUP BY country; Less

My assumption is that the requests_sent columns is not an integer. It might just have status as per 'failed", "successful" etc. Best to verify from the interviewer. That means we may not be able to take the sum of teh request_sent but have to use count instead. WITH CTE AS ( SELECT country, request_sent, (perc_failed * count(request_sent)) AS total_failed FROM T1 ) SELECT country, COUNT(request_sent), SUM(total_failed) GROUP BY 1 Less

Show More Responses
Viewing 1 - 10 of 10,868 interview questions

Glassdoor has 10,868 interview questions and reports from Research scientist interviews in Minneapolis-St. Paul, MN. Prepare for your interview. Get hired. Love your job.