# 5K

Senior Data Scientist interview questions shared by candidates

## Top Interview Questions

Senior Data Scientist was asked...October 21, 2014

### How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), it is very likely that the questions were answered at random.
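The alpha check described above can be sketched roughly as follows. This is a minimal illustration, assuming responses are stored as a respondents-by-items numeric matrix; the function name `cronbach_alpha` and the simulated data are hypothetical, not from the original answer. Note that alpha is computed over a group of respondents, so in practice it would be compared across subgroups rather than per individual.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a 2-D array: rows = respondents, columns = items."""
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]                                # number of survey items
    item_variances = x.var(axis=0, ddof=1)        # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)    # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated data (assumption): 200 respondents, 5 items.
rng = np.random.default_rng(0)
trait = rng.normal(size=(200, 1))
consistent = trait + 0.5 * rng.normal(size=(200, 5))   # items share one underlying trait
random_answers = rng.integers(1, 6, size=(200, 5))      # answers picked at random from 1..5

alpha_consistent = cronbach_alpha(consistent)
alpha_random = cronbach_alpha(random_answers)
print(round(alpha_consistent, 2), round(alpha_random, 2))
```

Internally consistent answers should give an alpha close to 1, while randomly filled answers should give a value near 0.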

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

We need to plot a histogram for each question in the survey to see the distribution of answers to that question. The histograms will likely follow a normal distribution if the selections are truthful. If more than half of one respondent's answers lie outside the 95% confidence interval of the corresponding histograms, that response will be categorized as random, falling outside the mean plus tw

### How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

1. Develop a list of shows/movies that are representative of different taste categories (see the note below).
2. Obtain rankings of the items in the list from the 2 users.
3. Use Spearman's rho (or another test that works with rankings) to assess dependence/congruence between the 2 people's rankings.

Note: to find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
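The rank comparison in step 3 can be sketched as below. This is a minimal illustration, assuming both users rank the same titles with no ties (so the closed-form version of Spearman's rho applies); the function name and the show titles are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no tied ranks.

    rank_a, rank_b: dicts mapping title -> rank (1 = most preferred).
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both users must rank the same set of titles")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two titles")
    # Sum of squared rank differences across titles.
    d_squared = sum((rank_a[title] - rank_b[title]) ** 2 for title in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of five titles by two users.
user_1 = {"Show A": 1, "Show B": 2, "Show C": 3, "Show D": 4, "Show E": 5}
user_2 = {"Show A": 2, "Show B": 1, "Show C": 3, "Show D": 5, "Show E": 4}

rho = spearman_rho(user_1, user_2)
print(rho)  # 0.8
```

A value near 1 means the two users order the titles almost identically, and a value near -1 means their orderings are reversed.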

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. InterviewQuery.com has a more in-depth answer.

It's essential to demonstrate that you can really go deep... there are plenty of follow-up questions and (sometimes tangential) angles to explore. There are a lot of Senior Data Scientist experts who've worked at Netflix, who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully. prepfully.com/practice-interviews

### They check for your attitude, your approach and your anxiety level! I solved both the case studies and I know I rocked the behavioral and data challenge rounds, but still ended up not being selected. Why?

Can you please tell me more about the HackerRank round? What type of questions were there? Also, what about the SQL?

Hi, you mentioned that you have cases compiled together. Can you please share them here? It would be really helpful. Thanks in advance.

I didn't practice the case solving approach enough. Though I had the math right, my approach was messy with too many papers and switching between many papers.

### Case 1: Given APR, interchange fee, average monthly balance, average monthly spend, and a loss rate of 3%, calculate the profit per customer. Then justify whether it is profitable to give cash back to customers. Case 2: There are two ways of campaigning for credit cards. 1. Email: 10% of applicants become customers; each representative can verify 10 email applications per hour and is paid \$25/hr. 2. Chat: 20% of applicants become customers; each representative can respond to 4 applications per hour and is paid \$25/hr. Profit per customer in both cases is \$100. Which is more profitable, email or chat? Draw the graph of profit vs. number of applicants. Consider a scenario where there are only 5 representatives to handle applications. In this case, which is more profitable, email or chat? Calculate the breakeven point for the number of representatives at which chat becomes more profitable than email.

Hi, could you please let me know if the candidates who were selected had a 5th round?

First question: Email: 10% * 10 = 1, so a representative gets 1 customer per hour for \$25. Chat: 20% * 4 = 0.8, so a representative gets 0.8 customers per hour for \$25. Email is more profitable per representative hour.

Second question: Email costs \$25 to get one customer. Chat costs \$25/0.8 = \$31.25 to get one customer. The profit per customer is \$100 in both cases. Assume the profit is before paying the representatives. Email graph: line with slope (100 - 25)/10, since 10 applicants yield one customer. Chat graph: line with slope (100 - 31.25)/5, since 5 applicants yield one customer. Not sure how to solve the rest of the questions.
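The arithmetic for Case 2 can be checked with a short sketch. It assumes, as the answer above does, that the \$100 profit per customer is before representative wages; the function and constant names are hypothetical. It reports profit both per applicant (the slope of the profit vs. applicants graph) and per representative hour (the figure that matters when the number of representatives is fixed).

```python
PROFIT_PER_CUSTOMER = 100.0   # dollars, assumed to be before wages
WAGE_PER_HOUR = 25.0          # dollars per representative hour

CHANNELS = {
    "email": {"conversion": 0.10, "apps_per_hour": 10},
    "chat": {"conversion": 0.20, "apps_per_hour": 4},
}

def profit_per_applicant(channel):
    """Expected profit from one applicant, net of the handling cost."""
    c = CHANNELS[channel]
    handling_cost = WAGE_PER_HOUR / c["apps_per_hour"]
    return c["conversion"] * PROFIT_PER_CUSTOMER - handling_cost

def profit_per_rep_hour(channel):
    """Expected profit from one hour of one representative's time."""
    return CHANNELS[channel]["apps_per_hour"] * profit_per_applicant(channel)

for name in CHANNELS:
    print(name, profit_per_applicant(name), profit_per_rep_hour(name))
# email 7.5 75.0
# chat 13.75 55.0
```

Under these assumptions chat earns more per applicant, while email earns more per representative hour, so which channel wins depends on whether applicants or representatives are the scarce resource.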

Basic profit and loss calculations. The case round is 1 hour, and these can be completed in 45 minutes. They assess your thought process and your accuracy in doing the calculations. But I am not sure on what basis they finally select someone for the fifth round. You may do well in the case round and still not be called for the fifth round.

### A gas station has 30 gallons of gasoline worth 1.20 per gallon and some worth 1.40 per gallon. How many gallons of the 1.40 brand must the owner mix in to produce gasoline that costs 1.28 per gallon?

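Let x be the gallons at 1.20 and y the gallons at 1.40:

$$
\begin{aligned}
\frac{1.2x + 1.4y}{x + y} &= 1.28 \\
1.2x + 1.4y &= 1.28x + 1.28y \\
1.4y - 1.28y &= 1.28x - 1.2x \\
0.12y &= 0.08x \\
y &= \tfrac{2}{3}x
\end{aligned}
$$

If x = 30, then y = 30 × (2/3) = 20.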

Taking 30 gallons as the total amount of gas the station has available, x + y = 30, so x = 30 - y:

$$
\begin{aligned}
\frac{1.2x + 1.4y}{30} &= 1.28 \\
1.2(30 - y) + 1.4y &= 38.4 \\
36 - 1.2y + 1.4y &= 38.4 \\
0.2y &= 2.4 \\
y &= 12
\end{aligned}
$$

Let x be the number of gallons priced at 1.40/gallon. Total gallons: 30 + x. Total price: 1.20(30) + 1.40x. The average price must be 1.28/gallon, so (1.20(30) + 1.40x)/(30 + x) = 1.28. Doing the algebra and solving for x gives x = 20.
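The mixture answer can be verified numerically. This sketch assumes the reading used in the first and third answers, where the 30 gallons are all priced at 1.20 and the unknown is how much 1.40 gasoline to add (the second answer instead treats 30 gallons as the total and gets 12). The function name is hypothetical.

```python
def gallons_to_add(base_gallons, base_price, added_price, target_price):
    """Gallons of the pricier gasoline needed to reach the target blended price.

    Solves (base_gallons*base_price + y*added_price) / (base_gallons + y) = target_price for y.
    """
    return base_gallons * (target_price - base_price) / (added_price - target_price)

y = gallons_to_add(30, 1.20, 1.40, 1.28)
blended_price = (30 * 1.20 + y * 1.40) / (30 + y)
print(round(y, 6), round(blended_price, 6))  # 20.0 1.28
```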

### 1. Given the sample (id, status): (1, active), (2, active), (3, active), (4, pending), (5, expired), (6, expired), (7, expired), (8, pending). Pull the unique statuses that show up 3 times consecutively; e.g. from the sample, the output would be 'active', 'expired'. 2. Given the sample (employee, in_out, time): (A, IN, 6:00), (B, IN, 7:00), (A, OUT, 8:00), (C, IN, 9:30), (A, IN, 9:00), (A, OUT, 10:00), (B, OUT, 11:00), (C, OUT, 10:00). Determine which employees are in the building at 10:30.

I was perturbed since I thought this was going to be a Behavioral Interview. I could not answer.

select distinct status from (select *, case when status = lead(status, 1) over (order by id) and status = lead(status, 2) over (order by id) then 1 else 0 end as consecutive from tab) t where consecutive = 1
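The same idea can be sketched in Python for comparison. This is an illustration only, assuming the rows fit in memory as (id, status) tuples and that "consecutive" means adjacent when ordered by id; the function name is hypothetical.

```python
from itertools import groupby

def statuses_with_run(rows, run_length=3):
    """Return statuses that appear at least `run_length` times in a row, ordered by id."""
    ordered = sorted(rows, key=lambda row: row[0])   # order by id, like ORDER BY id
    result = []
    # groupby collapses adjacent rows that share the same status into one run.
    for status, run in groupby(ordered, key=lambda row: row[1]):
        if len(list(run)) >= run_length and status not in result:
            result.append(status)
    return result

sample = [
    (1, "active"), (2, "active"), (3, "active"), (4, "pending"),
    (5, "expired"), (6, "expired"), (7, "expired"), (8, "pending"),
]
print(statuses_with_run(sample))  # ['active', 'expired']
```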

with cte as (select *, row_number() over (partition by employee order by time) as rnk from tab) select distinct a.employee from cte as a join cte as b on a.employee = b.employee and a.rnk = b.rnk - 1 where a.in_out = 'IN' and b.in_out = 'OUT' and a.time <= '10:30' and b.time > '10:30' (this assumes time is stored as a time type and that every IN has a matching OUT)
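An alternative way to reason about the second problem is to take each employee's latest event at or before 10:30 and keep those whose latest event is IN. The sketch below assumes times are "H:MM" strings within a single day; the helper names are hypothetical.

```python
from datetime import time

def parse(hhmm):
    """Turn a string such as '9:30' into a datetime.time."""
    hours, minutes = hhmm.split(":")
    return time(int(hours), int(minutes))

def in_building(events, at):
    """Employees whose most recent event at or before `at` is an IN."""
    cutoff = parse(at)
    latest = {}
    # Process events chronologically so later events overwrite earlier ones.
    for employee, in_out, stamp in sorted(events, key=lambda e: parse(e[2])):
        if parse(stamp) <= cutoff:
            latest[employee] = in_out
    return sorted(emp for emp, state in latest.items() if state == "IN")

events = [
    ("A", "IN", "6:00"), ("B", "IN", "7:00"), ("A", "OUT", "8:00"),
    ("C", "IN", "9:30"), ("A", "IN", "9:00"), ("A", "OUT", "10:00"),
    ("B", "OUT", "11:00"), ("C", "OUT", "10:00"),
]
print(in_building(events, "10:30"))  # ['B']
```

Unlike the pairing approach, this also counts an employee who has badged in but has no OUT record yet.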

### Given a list, create a new list that does not include the duplicates of the original list.

a = old list, b = new list. Code: a = set(a); b = list(a)
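One caveat with the set-based approach is that a set does not keep the original order. A minimal order-preserving sketch, assuming the list elements are hashable (the function name is hypothetical):

```python
def dedupe(items):
    """Return a new list without duplicates, keeping the first occurrence of each item."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))

original = [3, 1, 3, 2, 1]
print(dedupe(original))  # [3, 1, 2]
```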

Maybe they were asking to do it in-place. In that case, switch the duplicate elements to the end.

Python, 4 lines of code.

### A time that you helped someone on the team.

Could you tell us how the cases were?

The cases are more math than business and involve a lot of calculation and unit conversion.

Could you please tell me what I should practice for the Python HackerRank test assessment? Thank you in advance.

### If you can build a perfect (100% accuracy) classification model to predict some customer behavior, what will be the problem in application?

Distribution shift. You can never guarantee your train or test distribution covers future observations.

Then we have a deterministic problem, so what is the point of building a model at all?