In a data analyst interview, employers ask questions that help ascertain your level of technical skills, including familiarity with data analysis software, data analysis terminology, and statistical methods. Interviewers may also use guesstimate questions to learn about your problem-solving skills and creative thinking.

58,985 Data Analyst interview questions shared by candidates

Here are three data analyst interview questions and how to answer them:

How to answer: Data analysts use software to perform many of their job duties. Interviewers ask this question to determine your current skill level and how much training you may require. List and discuss any data analysis software you have used. Be sure to mention specific software training and give examples of how you used the software on the job.

How to answer: With this question, interviewers are testing your understanding of the most commonly used statistical methods, such as the simplex algorithm, imputation, the Bayesian method, and the Markov process. List and discuss data analysis methods, including the specific benefits of each process. If possible, include examples of how you used specific statistical methods on the job.

How to answer: Guesstimate questions are often used in data analyst interviews to test your analytical thinking and problem-solving methods. Share how you would work through the problem, including the data sets you would need, how you would go about finding data, and the methods you would use to calculate an estimated answer.

Data Scientist Intern was asked...February 25, 2012

↳

The above answer is also wrong; Node findSceondLargest(Node root) { // If tree is null or is single node only, return null (no second largest) if (root==null || (root.left==null && root.right==null)) return null; Node parent = null, child = root; // find the right most child while (child.right!=null) { parent = child; child = child.right; } // if the right most child has no left child, then it's parent is second largest if (child.left==null) return parent; // otherwise, return left child's rightmost child as second largest child = child.left; while (child.right!=null) child = child.right; return child; } Less

↳

find the right most element. If this is a right node with no children, return its parent. if this is not, return the largest element of its left child. Less

↳

One addition is the situation where the tree has no right branch (root is largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers, if different, the original method by the candidate works well, if the same (which means the root situation), find the largest of its left branch. Less

Data Analyst was asked...November 8, 2010

↳

How is it possible to answer this without knowing the selling price per product? # of quantities to sell in order to breakeven depends on how much the selling price per product is? Since the selling price says X, you can sell it for $790 and breakeven by just selling 1 product ($790 per product + $10 annual fee (assuming it's per product)). But I may be misunderstanding the information provided. Any feedback? Less

↳

let's say we should sell y+1 units to break even 800 + 4.y = y*x + 10*y 800 = y(x + 10 - 4) 800/(x+6) = y so we should sell (800/(x+6)) + 1 units Less

↳

Actually, to correct the above response, if the selling price for 1 product is $794 (to cover the cost price of the product itself ($4) ), only 1 quantity needs to be sold. ($794 + $10) = ($800 + $4). Less

Data Analyst was asked...December 24, 2012

↳

Assume all marbles are from 10g and the heavier one is 11g Take 1 marble from bag1 ,2 from bag2, 3 from bag3, 4 from bag4, 5 from bag5, 6 from bag6, 7 from bag7, 8 from bag8, 9 from bag9,10 from bag 10..and weigh them together Let it be W. So if bag5 contains the heavy marbles The total weight (W) will be 10+20+30+40+55+60+70+80+90+100 = 555. where as if all were of 10g it should have been 550. meaning the bag which is heavy will always be MeasuredWeight - 550 Mathematically if bag X is the one which is heavy, X can be found using one weighing of sample (W) - (N(N+1)/2) Less

↳

Terribly worded question; you never specified that we are given the normal and heavy weights. Without that information, the votes up solution here doesn’t work. Less

↳

Using a series sum [N(N+1)/2], will not work in this case, as we've not been provided with enough specifics (such as "whats the standard weight of the marbles?). In fact any solution dependent on summed weight will have overlapping solution space and fail. Consider if the standard weight is 10 ounce per marble, and bag 1 is the exception at 12 ounce per marble, for a total weight of 552. Now consider if bag 2 is the exception at 11 ounce per marble - also for a total weight of 552. There is no way to distinguish between the two cases. Given the setup, if appears the interviewer did expect the answer to use some variation of summed series, and I suspect the poster has paraphrased the question and missed some key language - and as best I can tell is not solvable as stated. Google will expect you to provide a generalized solution that can be automated, so the "add the bags one at a time", while simple and clever, would probably not be acceptable as a "weigh only once" solution, and if accepted would be a lesser answer as it does not provide a code-able business solution. Less

Senior Data Scientist was asked...October 21, 2014

↳

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below .5), it is very likely that the questions were answered at random. Less

↳

I would design the test in a way that certain information is asked two different ways. if two answers disagree with each other I would seriously doubt the validity of the answers. Less

↳

We need to find the histograms of the questions in the survey to see the distribution of each answer in each question. All question histograms will likely follow the normal distribution if they are truthful selection. If one response with more than of half of total answers being located outside of 95% confidential interval in each histogram, the response will be categorized as random fall out of mean plus tw Less

Data Scientist was asked...April 6, 2013

↳

We cannot say what has caused the spike since causal relationship cannot be established with observed data. But we can compare the averages of all the months by performing a hypothesis testing and rejecting the null hypothesis if the F1 score is significant. Less

↳

The photos are definitely Halloween pictures. Segment by country and date and check for a continual rise in photo uploads leading up to October 31st and a few days after for the lag. There's also a ton of these product questions like this on InterviewQuery.com for data scientists Less

↳

Hypothesis: the photos are Halloween pictures. Test: look at upload trends in countries that do not observe Halloween as a sort of counter-factual analysis. Less

Data Scientist & Machine Learning was asked...October 19, 2013

↳

Arrays are more efficient for accessing elements , while linked list are better for inserting or deleting elements, the choice between the two data structure depends on the specific requirements of the problem being solved. Less

↳

Stack and queues have different order of processing, operations for adding and removing elements, and usage scenarios.The choice between the two data structure depends on the specific requirements of the problem being solved Less

↳

A hash table is a data structure that allows for efficient insertion, deletion, and lookup of key-value pairs. It is based on the idea of hashing, which involves mapping each key to a specific index in an array using a hash function. The hash function takes a key as input and returns a unique index in the array. In order to handle collisions (when two or more keys map to the same index), some form of collision resolution mechanism is used, such as separate chaining or open addressing. In separate chaining, each index in the array is a linked list, and each key-value pair is stored in a node in the corresponding linked list. When a collision occurs, the new key-value pair is added to the end of the linked list at the corresponding index. In open addressing, when a collision occurs, a different index in the array is searched for to store the new key-value pair. There are several techniques for open addressing, such as linear probing, quadratic probing, and double hashing. Hash tables have an average case time complexity of O(1) for insertion, deletion, and lookup operations, making them a highly efficient data structure for many applications, such as database indexing, caching, and compiler symbol tables. However, their worst-case time complexity can be as bad as O(n) in rare cases, such as when there are many collisions and the hash table needs to be resized. Less

Senior Data Scientist was asked...June 7, 2015

↳

1) Develop a list of shows/movies that are representative of different taste categries (more on this later) 2) Obtain ranking of the items in the list from 2 users 3) Use Spearman's rho (or other test that works with rankings) to assess dependence/conguence between the 2 people's rankings. * To find shows/movies to include in the measurement instrument, maybe do cluster analysis on large number of viewer's viewing habits. Less

↳

Look at the mean average precision of the movies that the users watch out of the rankings. So if out of 10 recommended movies one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better. InterviewQuery.com has it more in depth of an answer. Less

↳

It's essential to demonstrate that you can really go deep... there are plenty of followup questions and (sometimes tangential) angles to explore. There's a lot of Senior Data Scientist experts who've worked at Netflix, who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully. prepfully.com/practice-interviews Less

Data Scientist was asked...November 6, 2013

↳

You can group similar users and similar items by calculating the distance between like users and items. Jaccard distance is a common approach when building graphs of items x users relationships. For each user you have a vector of N items that they had the potential to buy. For each product you have a vector of M users that bought that product. You can calculate a euclidean distance matrix of user x user pairs and product x product pairs using these vectors. Calculating the distance between u1 and u2: f(u1, u2) = intersection(u1, u2) / (len(u1) + len(u2) - intersection(u1, u2)) same with products: f(p1, p2) = intersection(p1, p2) / (len(p1) + len(p2) - intersection(p1, p2)) You do this for each of the N^2 and M^2 pairs. Then you rank each row of the euclidean matrices for the product matrix and the users matrix. This will give you rows of rankings for each user; Example: "product p1's closest products p4, p600, p5, etc..." These rankings are according to purchase behavior. Similar to Amazon's "people who bought this also bought..." This is only working with the purchase graph. You could segment users by price of item bought. Someone who bought a Macbook retina probably have enough money to buy an another expensive laptop but kids of only paid $30 for headphones probably don't. Less

↳

That is one way but also clustering algorithms can help in doing it in a more efficient ways Less

↳

Of course there are many ways to separate the market. But apple has already got several segments that I believe work. First is the Mac line, within this is The education market. This includes 3 segments. Instructors, Students, and Schools. Instructors will be more likely to spend more on a single product, and buy software relevant to their subjects, but these decisions will influence there students to do the same, but generally students will seek a "value" product, and will buy software based on requirements. School on the other hand will buy a large amount of Computers and software at once, which also effect instructor and student purchases. So selling to schools will raise the sales in both other categories, and selling to instructors will raise the sales for students. This is just the first segment. You also have corporate industries which are similar to Education. Now lets move to the iPhone Segment within this segment you have to ask, why do people buy iPhone. There is the High-Tech segment, meaning those who always want the newest and best. Then you have the Mid-Tech segment. These are those that don't feel it is logical to flip out phones each year, they wait for two years before buying a phone. Now lets move into iPad. Interestingly this segment can move from business, to leisure. The business segment seeks to have an iPad because it allows them to get work done faster and easier. The leisure market seeks to have an iPad because it brings them entertainment and helps them relax. Then lets go to iPod. The wonder of the iPod, the product that sent Apple on a crash course to stardom. I believe the greatest segment for the iPod would be parents wanting to get a gift for kids / something to keep kids entertained. because the iPhone acts as a iPod there is a spill of sales that goes to iPhone, although the iPod touch does offer an affordable alternatives to those who do not want an iPhone. Although the iPod Nano does capture the convenience segment. These are just the segments for the Main Products of apple. Less

Data Analyst was asked...May 22, 2012

↳

Its surface area...not volume .So answer should be 6*(5*5)

↳

Looking at the first answer makes me doubtful about what a cubic means. If its a cube only, then the answer should be 6*(5*5), as mentioned above. Less

↳

The simplest answer will be 5*5*5-3*3*3=98

Data Analyst was asked...January 9, 2020

↳

So if you get any call from a company sating that you are selected then first visit their office get your hard copy of offer letter do some research on google ask people then think. And if anyone is saying you first you need to do certification then it will be final then dont choose that company. They are fooling you by taking your money and wasting your time. Less

↳

So have you paid money then visited? These are looters.

↳

Ya we will report it to police and cybercrime their all numbers their all Email ids and all from so called company people to specialist learning institute we will report everyones number Less

data scientistdata specialistbusiness analystsas analystresearch analystassociate analystsql analystreporting analystdata sciencedatabase analystapplication analystpricing analystdata modelerdatabase marketing analyststatistical analysttableau developersales analystshipping and receiving supervisoroperations analyst