# 40K Data Analyst I interview questions shared by candidates

## Top Interview Questions

Data Scientist Intern was asked...February 25, 2012

### Find the second largest element in a Binary Search Tree

15 Answers

The above answer is also wrong:

```java
Node findSecondLargest(Node root) {
    // If the tree is empty or has only a single node, there is no second largest.
    if (root == null || (root.left == null && root.right == null)) return null;
    Node parent = null, child = root;
    // Find the rightmost (largest) node.
    while (child.right != null) {
        parent = child;
        child = child.right;
    }
    // If the rightmost node has no left child, its parent is the second largest.
    if (child.left == null) return parent;
    // Otherwise, the rightmost node of its left subtree is the second largest.
    child = child.left;
    while (child.right != null) child = child.right;
    return child;
}
```

Find the rightmost element. If it is a right child with no children, return its parent; otherwise, return the largest element of its left subtree.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case, the largest node has no parent. So it's better to keep track of both the parent and current pointers: if they differ, the candidate's original method works well; if they are the same (the root case), find the largest element of the root's left branch.
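The answers above special-case the rightmost node and its parent. A different technique, named plainly here as an alternative rather than any poster's method, is a reverse in-order traversal (right subtree, node, left subtree): it visits BST keys in descending order, so the second node visited is the second largest, and a root with no right branch needs no special handling. This is a minimal sketch assuming unique integer keys; the `Node` class, `insert` helper, and `kthLargest` name are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class KthLargestBst {
    // Hypothetical BST node with an integer key.
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Hypothetical helper: standard BST insertion (smaller keys go left).
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Iterative reverse in-order traversal: right subtree, node, left subtree.
    // Keys come out in descending order, so the k-th node popped is the k-th largest.
    // Returns null when the tree has fewer than k nodes.
    static Integer kthLargest(Node root, int k) {
        Deque<Node> stack = new ArrayDeque<>();
        Node current = root;
        int count = 0;
        while (current != null || !stack.isEmpty()) {
            while (current != null) {   // walk as far right as possible
                stack.push(current);
                current = current.right;
            }
            current = stack.pop();      // next largest key
            if (++count == k) return current.key;
            current = current.left;     // then visit its left subtree
        }
        return null;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int key : new int[] {50, 30, 70, 20, 40, 60, 80}) root = insert(root, key);
        System.out.println(kthLargest(root, 2)); // 70
    }
}
```

Calling `kthLargest(root, 2)` answers the original question; the same loop also gives any k-th largest key without extra cases.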


### Case interview: basic business problem (if product X costs Capital One \$4.00 per unit, with an \$800 sunk cost, and we charge X dollars along with a \$10 annual fee, how many do we need to sell to break even, etc.). Followed by a longer discussion of more complex problems that the situation might entail.

7 Answers

Actually, to correct the above response: if the selling price for 1 product is \$794 (to cover the \$4 cost of the product itself), only 1 unit needs to be sold, since \$794 + \$10 = \$800 + \$4.

How is it possible to answer this without knowing the selling price per product? The number of units you need to sell to break even depends on the selling price per product. Since the selling price is given as X, you could sell it for \$790 and break even by selling just 1 product (\$790 per product + \$10 annual fee, assuming the fee is per product). But I may be misunderstanding the information provided. Any feedback?

Erin, it probably shouldn't be. The revenue side should be (unit price + 10) × number of units sold. Dapo has assumed that the unit price and the number of units sold are the same, since the letter X is used to describe both variables. That might be correct, but it would be an odd quirk of the question; I'd bet that you can ask what the unit price is.
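Taking the revenue side as (unit price + 10) × units, as described above, break-even for a price X and n units requires $(X + 10)\,n = 800 + 4n$, so $n = 800 / (X + 6)$, rounded up to a whole unit. A minimal sketch, assuming the \$10 annual fee is collected once per unit sold and treating the \$800 as a fixed cost to recover; the class and method names are hypothetical.

```java
public class BreakEven {
    static final double FIXED_COST = 800.0; // the $800 up-front (sunk) cost from the question
    static final double UNIT_COST = 4.0;    // $4.00 cost per unit
    static final double ANNUAL_FEE = 10.0;  // $10 annual fee, assumed collected once per unit

    // Smallest whole number of units n with (price + fee) * n >= fixed + unitCost * n,
    // i.e. n = ceil(fixed / (price + fee - unitCost)).
    static long breakEvenUnits(double price) {
        double marginPerUnit = price + ANNUAL_FEE - UNIT_COST;
        if (marginPerUnit <= 0) {
            throw new IllegalArgumentException("each unit loses money; break-even is impossible");
        }
        return (long) Math.ceil(FIXED_COST / marginPerUnit);
    }

    public static void main(String[] args) {
        System.out.println(breakEvenUnits(794)); // 1, matching the $794 answer
        System.out.println(breakEvenUnits(10));  // 50
        System.out.println(breakEvenUnits(0));   // 134: the fee alone leaves a $6 margin per unit
    }
}
```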

Data Analyst was asked...December 23, 2012

### If you have 10 bags of marbles with 10 marbles each, and one bag has marbles that weigh differently from the others, how would you identify that bag with a single weighing?

6 Answers

Assume every normal marble weighs 10 g and each marble in the heavy bag weighs 11 g. Take 1 marble from bag 1, 2 from bag 2, 3 from bag 3, and so on up to 10 from bag 10, and weigh them all together; call the result W. If bag 5 contains the heavy marbles, the total weight will be W = 10 + 20 + 30 + 40 + 55 + 60 + 70 + 80 + 90 + 100 = 555, whereas if all marbles weighed 10 g it would be 550. So the number of the heavy bag is always the measured weight minus 550. In general, with N bags and a 1 g difference per marble, the heavy bag X is found from the single weighing W as $X = W - 10 \cdot \frac{N(N+1)}{2}$.
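A minimal simulation of that weighing scheme, assuming (as the answer does) that both the standard marble weight and the per-marble difference are known in advance; without them a single reading can be ambiguous. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class OddMarbleBag {
    // Weight of the sample when bag i (1-based) contributes i marbles.
    static double weighSample(double[] marbleWeightByBag) {
        double total = 0;
        for (int i = 0; i < marbleWeightByBag.length; i++) {
            total += (i + 1) * marbleWeightByBag[i];
        }
        return total;
    }

    // Recovers the odd bag's number from one reading.
    // Assumes every normal marble weighs `standard` and each odd marble differs by `diff`
    // (positive if heavier, negative if lighter); both must be known beforehand.
    static int findOddBag(double measured, int bags, double standard, double diff) {
        double expected = standard * bags * (bags + 1) / 2.0; // e.g. 10 g * 55 = 550 g
        return (int) Math.round((measured - expected) / diff);
    }

    public static void main(String[] args) {
        double[] weights = new double[10];
        Arrays.fill(weights, 10.0);   // 10 g marbles
        weights[4] = 11.0;            // bag 5 holds the 11 g marbles
        double w = weighSample(weights);
        System.out.println(w);                              // 555.0
        System.out.println(findOddBag(w, 10, 10.0, 1.0));   // 5
    }
}
```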

Terribly worded question: you never specified that we are given the normal and heavy weights. Without that information, the up-voted solution here doesn't work.

Using a series sum [N(N+1)/2] will not work in this case, as we haven't been given enough specifics (such as "what's the standard weight of the marbles?"). In fact, any solution that depends on the summed weight will have overlapping outcomes and fail. Suppose the standard weight is 10 ounces per marble and bag 1 is the exception at 12 ounces per marble, for a total weight of 552. Now suppose instead that bag 2 is the exception at 11 ounces per marble: also a total weight of 552. There is no way to distinguish between the two cases. Given the setup, it appears the interviewer did expect the answer to use some variation of a summed series, and I suspect the poster has paraphrased the question and missed some key language; as best I can tell, it is not solvable as stated. Google will expect you to provide a generalized solution that can be automated, so "add the bags one at a time", while simple and clever, would probably not be accepted as a "weigh only once" solution, and if accepted it would be a lesser answer because it does not provide a codeable business solution.


### How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 Answers

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), it is very likely that the questions were answered at random.
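For k items, Cronbach's alpha is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, where $\sigma_i^2$ is the variance of item i and $\sigma_X^2$ is the variance of respondents' total scores. A minimal sketch, assuming a complete numeric response matrix (rows are respondents, columns are items) and sample variances throughout; the class name and example scores are hypothetical. Alpha is computed for the survey as a whole, so on its own it signals noisy answering in aggregate rather than pointing at specific individuals.

```java
public class CronbachAlpha {
    // Sample variance (n - 1 denominator); requires at least two values.
    static double variance(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double ss = 0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        return ss / (xs.length - 1);
    }

    // scores[r][i] is respondent r's numeric answer to item i (no missing values).
    // Undefined (division by zero) if every respondent has the same total score.
    static double alpha(double[][] scores) {
        int n = scores.length;
        int k = scores[0].length;
        double sumItemVariance = 0;
        for (int i = 0; i < k; i++) {
            double[] item = new double[n];
            for (int r = 0; r < n; r++) item[r] = scores[r][i];
            sumItemVariance += variance(item);
        }
        double[] totals = new double[n];
        for (int r = 0; r < n; r++) {
            for (double s : scores[r]) totals[r] += s;
        }
        return (k / (k - 1.0)) * (1.0 - sumItemVariance / variance(totals));
    }

    public static void main(String[] args) {
        double[][] scores = { {4, 5, 4}, {2, 2, 3}, {5, 4, 5}, {1, 2, 1} };
        System.out.printf("alpha = %.3f%n", alpha(scores));
    }
}
```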

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the responses.
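A minimal sketch of that paired-question check, assuming answers on a 1 to `scaleMax` Likert scale and a hand-built list of item pairs that ask the same thing, where the second item may be reverse-worded. The class name, the `tolerance`, and the `maxMismatches` threshold are hypothetical choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsistencyCheck {
    // Each pair is {itemA, itemB, reversed}. If reversed == 1, itemB is a reverse-worded
    // version of itemA, so a consistent answer satisfies b = scaleMax + 1 - a.
    // Returns the indices of respondents with more than maxMismatches inconsistent pairs.
    static List<Integer> flagInconsistent(int[][] answers, int[][] pairs,
                                          int scaleMax, int tolerance, int maxMismatches) {
        List<Integer> flagged = new ArrayList<>();
        for (int r = 0; r < answers.length; r++) {
            int mismatches = 0;
            for (int[] p : pairs) {
                int a = answers[r][p[0]];
                int b = answers[r][p[1]];
                int expectedB = p[2] == 1 ? scaleMax + 1 - a : a;
                if (Math.abs(b - expectedB) > tolerance) mismatches++;
            }
            if (mismatches > maxMismatches) flagged.add(r);
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical 1-5 answers: item 1 reverses item 0; item 3 restates item 2.
        int[][] answers = { {5, 1, 4, 4}, {5, 5, 1, 4} };
        int[][] pairs = { {0, 1, 1}, {2, 3, 0} };
        System.out.println(flagInconsistent(answers, pairs, 5, 1, 0)); // [1]
    }
}
```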

We need to look at the histogram of each survey question to see the distribution of answers to that question. If the selections are truthful, each question's histogram will likely follow a normal distribution. If more than half of one respondent's answers fall outside the 95% confidence interval of the corresponding histograms, the response will be categorized as random, i.e. it falls outside the mean plus tw…
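A minimal sketch of the rule described above, assuming numeric answers and taking mean ± z × (sample standard deviation) with z = 1.96 as each question's "95%" band; the class name and example data are hypothetical. Because it flags unusual answers rather than random ones, a respondent who answers at random near the middle of the scale can still pass.

```java
import java.util.ArrayList;
import java.util.List;

public class OutlierRespondents {
    // Flags respondents for whom more than half of their answers fall outside
    // mean +/- z * sd of the corresponding question (sd = sample standard deviation).
    static List<Integer> flag(double[][] answers, double z) {
        int n = answers.length;
        int k = answers[0].length;
        double[] mean = new double[k];
        double[] sd = new double[k];
        for (int q = 0; q < k; q++) {
            for (double[] row : answers) mean[q] += row[q];
            mean[q] /= n;
            double ss = 0;
            for (double[] row : answers) ss += (row[q] - mean[q]) * (row[q] - mean[q]);
            sd[q] = Math.sqrt(ss / (n - 1));
        }
        List<Integer> flagged = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            int outside = 0;
            for (int q = 0; q < k; q++) {
                if (Math.abs(answers[r][q] - mean[q]) > z * sd[q]) outside++;
            }
            if (outside * 2 > k) flagged.add(r); // strictly more than half
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical data: nine respondents answer 3 on both questions, one answers 5.
        double[][] answers = new double[10][];
        for (int i = 0; i < 9; i++) answers[i] = new double[] {3, 3};
        answers[9] = new double[] {5, 5};
        System.out.println(flag(answers, 1.96)); // [9]
    }
}
```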


### You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?

3 Answers

We cannot say what caused the spike, since a causal relationship cannot be established from observational data alone. But we can compare the monthly averages with a hypothesis test (for example, a one-way ANOVA) and reject the null hypothesis if the F statistic is significant.

The photos are definitely Halloween pictures. Segment by country and date and check for a continual rise in photo uploads leading up to October 31st, plus a few days after to allow for lag.

Hypothesis: the photos are Halloween pictures. Test: look at upload trends in countries that do not observe Halloween, as a sort of counterfactual analysis.
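One way to make that counterfactual concrete is to compute each country's October "lift" (October picture uploads divided by the average of the other eleven months) and compare it between countries that do and do not observe Halloween. A minimal sketch, assuming twelve monthly counts per country with January first; the class name and the numbers in `main` are made up for illustration.

```java
public class OctoberLift {
    // October's uploads relative to the mean of the other eleven months.
    // monthly[0] is January, so monthly[9] is October.
    static double octoberLift(long[] monthly) {
        if (monthly.length != 12) throw new IllegalArgumentException("expected 12 months");
        double others = 0;
        for (int m = 0; m < 12; m++) {
            if (m != 9) others += monthly[m];
        }
        return monthly[9] / (others / 11.0);
    }

    // Mean lift across a group of countries.
    static double meanLift(long[][] countries) {
        double sum = 0;
        for (long[] country : countries) sum += octoberLift(country);
        return sum / countries.length;
    }

    public static void main(String[] args) {
        // Made-up monthly picture uploads (thousands) for one country in each group.
        long[][] halloween = { {100, 100, 100, 100, 100, 100, 100, 100, 100, 180, 100, 100} };
        long[][] noHalloween = { {100, 100, 100, 100, 100, 100, 100, 100, 100, 105, 100, 100} };
        System.out.printf("Halloween countries: %.2f%n", meanLift(halloween));
        System.out.printf("Other countries:     %.2f%n", meanLift(noHalloween));
    }
}
```

If the lift appears only in the Halloween group, that supports the hypothesis; a similar lift in both groups points to some other cause.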

### How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 Answers

1) Develop a list of shows/movies that are representative of different taste categories (more on this later). 2) Obtain rankings of the items in the list from the 2 users. 3) Use Spearman's rho (or another test that works with rankings) to assess the dependence/congruence between the 2 people's rankings. * To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.
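For two users who rank the same n items with no ties, Spearman's rho is $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks given to item i. A minimal sketch, assuming both users rank every item in the shared list from 1 to n; ties or partially overlapping lists would need the general form (Pearson correlation on ranks). The class and user names are hypothetical.

```java
public class SpearmanRho {
    // rankA[i] and rankB[i] are the ranks two users gave item i (1..n, no ties).
    static double spearman(int[] rankA, int[] rankB) {
        int n = rankA.length;
        if (n < 2 || rankB.length != n) {
            throw new IllegalArgumentException("need two rankings of the same n >= 2 items");
        }
        long sumSquaredDiffs = 0;
        for (int i = 0; i < n; i++) {
            long d = rankA[i] - rankB[i];
            sumSquaredDiffs += d * d;
        }
        // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
        return 1.0 - 6.0 * sumSquaredDiffs / (n * ((double) n * n - 1));
    }

    public static void main(String[] args) {
        int[] alice = {1, 2, 3, 4, 5};
        int[] bob = {2, 1, 3, 4, 5};
        System.out.println(spearman(alice, bob)); // 0.9
    }
}
```

A value of 1 means identical rankings, -1 means exactly reversed rankings, and values near 0 mean little monotonic agreement.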

Look at the mean average precision of the movies the users actually watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better.

It's essential to demonstrate that you can really go deep: there are plenty of follow-up questions and (sometimes tangential) angles to explore.


### How do you take millions of users with hundreds of transactions each, across tens of thousands of products, and group the users into meaningful segments?

3 Answers

You can group similar users and similar items by calculating how close pairs of users and pairs of items are. Jaccard similarity (or its complement, Jaccard distance) is a common approach when building graphs of item × user relationships. For each user you have a vector of the N items they had the potential to buy. For each product you have a vector of the M users that bought that product. Using these vectors, you can calculate a user × user similarity matrix and a product × product similarity matrix. The similarity between users u1 and u2 is $f(u_1, u_2) = \frac{|u_1 \cap u_2|}{|u_1| + |u_2| - |u_1 \cap u_2|}$, and likewise for products: $f(p_1, p_2) = \frac{|p_1 \cap p_2|}{|p_1| + |p_2| - |p_1 \cap p_2|}$. You do this for each of the M² user pairs and N² product pairs. Then you rank each row of the product matrix and the user matrix. This gives you a ranking for each row; for example, "product p1's closest products are p4, p600, p5, etc." These rankings reflect purchase behavior, similar to Amazon's "people who bought this also bought..." This only uses the purchase graph. You could also segment users by the price of the items they bought: someone who bought a MacBook Retina probably has enough money to buy another expensive laptop, but kids who only paid \$30 for headphones probably don't.
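A minimal sketch of one row of the ranked user × user matrix described above, assuming each user is represented by the set of product IDs they bought; the class name and purchase data are hypothetical. Computing every row this way is quadratic in the number of users, which gets expensive at millions of users.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JaccardNeighbors {
    // Jaccard similarity: intersection size / union size; 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        int intersection = 0;
        for (String item : a) {
            if (b.contains(item)) intersection++;
        }
        int union = a.size() + b.size() - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    // Other users ordered from most to least similar to `user`.
    static List<String> rankNeighbors(String user, Map<String, Set<String>> purchases) {
        Set<String> mine = purchases.get(user);
        List<String> others = new ArrayList<>();
        for (String other : purchases.keySet()) {
            if (!other.equals(user)) others.add(other);
        }
        others.sort(Comparator.comparingDouble(
                (String other) -> jaccard(mine, purchases.get(other))).reversed());
        return others;
    }

    public static void main(String[] args) {
        // Hypothetical purchase histories: user -> set of product IDs.
        Map<String, Set<String>> purchases = Map.of(
                "u1", Set.of("p1", "p2", "p3"),
                "u2", Set.of("p1", "p2"),
                "u3", Set.of("p3", "p4", "p5", "p6"),
                "u4", Set.of("p7"));
        System.out.println(rankNeighbors("u1", purchases)); // [u2, u3, u4]
    }
}
```

The same `jaccard` function applies to products when each product is represented by the set of users who bought it.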

That is one way, but clustering algorithms can also help do it more efficiently.

Of course there are many ways to segment the market, but Apple already has several segments that I believe work.

First is the Mac line, and within it the education market. This includes 3 segments: instructors, students, and schools. Instructors are more likely to spend more on a single product and to buy software relevant to their subjects, and these decisions will influence their students to do the same; students, however, generally seek a "value" product and buy software based on requirements. Schools, on the other hand, buy a large number of computers and software at once, which also affects instructor and student purchases. So selling to schools raises sales in both other categories, and selling to instructors raises sales to students. This is just the first segment. You also have corporate customers, which are similar to education.

Now let's move to the iPhone segment. Within this segment you have to ask why people buy an iPhone. There is the high-tech segment, meaning those who always want the newest and best. Then there is the mid-tech segment: people who don't think it's logical to swap out their phones every year, so they wait two years before buying a new one.

Now let's move to the iPad. Interestingly, this segment spans both business and leisure. The business segment wants an iPad because it lets them get work done faster and more easily. The leisure market wants an iPad because it provides entertainment and helps them relax.

Then let's go to the iPod, the product that put Apple on the fast track to stardom. I believe the biggest segment for the iPod would be parents wanting a gift for their kids or something to keep them entertained. Because the iPhone also acts as an iPod, some sales spill over to the iPhone, although the iPod touch does offer an affordable alternative for those who do not want an iPhone, and the iPod Nano captures the convenience segment.

These are just the segments for Apple's main products.

### If you have a 5×5×5 cubic, what is the outside surface area?

3 Answers

It's surface area, not volume, so the answer should be 6 × (5 × 5) = 150.

Looking at the first answer makes me doubtful about what a cubic means. If it's just a cube, then the answer should be 6 × (5 × 5), as mentioned above.

The simplest answer will be 5×5×5 − 3×3×3 = 98
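The answers above compute two different quantities: 6 × 5² = 150 is the surface area of a 5 × 5 × 5 cube, while 5³ − 3³ = 98 counts the 1 × 1 × 1 unit cubes in its outer layer (everything except the hidden 3 × 3 × 3 interior). A small sketch computing both for an n × n × n cube; the class and method names are hypothetical.

```java
public class CubeSurface {
    // Surface area of an n x n x n cube: six n x n faces.
    static long surfaceArea(long n) {
        return 6 * n * n;
    }

    // Unit cubes that touch the surface: all n^3 minus the hidden (n - 2)^3 interior.
    static long surfaceUnitCubes(long n) {
        if (n <= 2) return n * n * n; // no hidden interior
        long inner = n - 2;
        return n * n * n - inner * inner * inner;
    }

    public static void main(String[] args) {
        System.out.println(surfaceArea(5));      // 150
        System.out.println(surfaceUnitCubes(5)); // 98
    }
}
```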

### The three data structure questions were: 1. the difference between a linked list and an array; 2. the difference between a stack and a queue; 3. describe a hash table.

1 Answer

Wow... pathetically easy

### Describe yourself

57 Answers

Let me be clear: they have fooled many people using different phone numbers, different email IDs, and different names. So if you are reassured by one or two good reviews here, don't get fooled again, because it's a trap. They post reviews here so that they can fool and manipulate you again. This happened to me; that's why I'm saying this.

So if you get a call from a company saying that you are selected, first visit their office, get a hard copy of your offer letter, do some research on Google, ask people, and then decide. And if anyone tells you that you first need to complete a certification before the offer is final, don't choose that company. They are fooling you by taking your money and wasting your time.

Yes, actually, we should take action on this. We should get our money back.

Viewing 1 - 10 of 39,645 Interview Questions