Data Scientist Interview Questions in Washington, DC


In a data scientist interview, expect employers to ask questions that assess your data modeling, problem-solving, and programming skills. Be prepared to answer general questions that test your knowledge of statistics and data science. You should also be ready to answer open-ended questions that test your creativity, communication skills, and formal education in data modeling and programming.

69,271 Data Scientist interview questions shared by candidates

Top Data Scientist Interview Questions & How to Answer

Here are three data scientist interview questions and how to answer them:

Question #1: Which data modeling techniques do you prefer and why?

How to answer: Turning data into understandable and actionable information is a critical part of the data scientist's job. This question allows employers to understand your data modeling skills and background. List your preferred data modeling techniques and discuss their benefits, such as ease of use and flexibility.

Question #2: How would you detect bogus Instagram accounts used for scamming consumers?

How to answer: Questions like this one allow an employer to test your problem-solving skills. When answering open-ended questions such as these, feel free to ask clarifying questions and use whiteboards to demonstrate your coding and diagramming skills. Share your thought process as you work through the problem.

Question #3: Describe circumstances that require a list, tuple, or set in Python.

How to answer: Interviewers will use questions such as this one to test your Python programming skills. Review Python basics such as lists, tuples, and sets before your interview. You should be able to explain when and how each tool is used by data scientists.
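A minimal sketch of the usual rules of thumb, using made-up example data: a list for an ordered sequence that changes, a tuple for a fixed record (which, being immutable and hashable, can also be a dictionary key), and a set for deduplication and fast membership tests.

```python
# List: ordered and mutable, so it suits a sequence that grows or changes.
daily_uploads = [120, 95, 143]
daily_uploads.append(110)

# Tuple: ordered and immutable, so it suits a fixed record and can be a dict key.
grid_cell = (3, 4)                  # e.g. (row, column)
visits_by_cell = {grid_cell: 7}

# Set: unordered unique values with average O(1) membership tests.
user_ids = [101, 102, 101, 103, 102]
unique_users = set(user_ids)        # duplicates are dropped

print(len(daily_uploads))           # 4
print(visits_by_cell[(3, 4)])       # 7
print(sorted(unique_users))         # [101, 102, 103]
print(102 in unique_users)          # True
```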

Top Interview Questions

Data Scientist Intern was asked...February 25, 2012

Find the second largest element in a Binary Search Tree

15 Answers

The above answer is also wrong:

    Node findSecondLargest(Node root) {
        // If the tree is null or has only a single node, return null (no second largest)
        if (root == null || (root.left == null && root.right == null)) return null;
        Node parent = null, child = root;
        // Find the rightmost child
        while (child.right != null) {
            parent = child;
            child = child.right;
        }
        // If the rightmost child has no left child, then its parent is the second largest
        if (child.left == null) return parent;
        // Otherwise, return the left child's rightmost descendant as the second largest
        child = child.left;
        while (child.right != null) child = child.right;
        return child;
    }

Find the rightmost element. If it has no left child, return its parent; otherwise, return the largest element of its left subtree.

One addition is the situation where the tree has no right branch (the root is the largest). In this special case, the largest node has no parent, so it's better to keep track of both the parent and current pointers. If they differ, the candidate's original method works well; if they are the same (the root case), find the largest element of the root's left branch.
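For readers who want to try the approach from the answers above, here is a Python sketch of the same rightmost-node logic. The `Node` class and the `insert` helper are hypothetical scaffolding for building a test tree, and the sketch assumes distinct keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: int) -> Node:
    """Plain BST insert; a hypothetical helper used only to build example trees."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root


def find_second_largest(root: Optional[Node]) -> Optional[Node]:
    # An empty tree or a single node has no second largest element.
    if root is None or (root.left is None and root.right is None):
        return None
    parent, child = None, root
    # Walk right to the largest node, remembering its parent.
    while child.right is not None:
        parent, child = child, child.right
    # If the largest node has a left subtree, the answer is that subtree's maximum.
    # This also covers the case where the root itself is the largest node.
    if child.left is not None:
        child = child.left
        while child.right is not None:
            child = child.right
        return child
    # Otherwise the parent of the largest node is the second largest.
    return parent


root = None
for v in [50, 30, 70, 60, 65]:
    root = insert(root, v)
print(find_second_largest(root).value)  # 65
```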

Capital One

Case interview: basic business problem (if product X costs Capital One $4.00 per unit, with an $800 sunk cost, and we charge X dollars along with a $10 annual fee, how many do we need to sell to break even, etc.). This was followed by a longer discussion of more complex problems that the situation might entail.

8 Answers

How is it possible to answer this without knowing the selling price per product? The number of units to sell in order to break even depends on the selling price per product. Since the selling price is given as X, you could sell it for $790 and break even by selling just one product ($790 per product + $10 annual fee, assuming the fee is per product). But I may be misunderstanding the information provided. Any feedback?

Let y be the number of units needed to break even:

800 + 4y = xy + 10y
800 = y(x + 10 - 4) = y(x + 6)
y = 800 / (x + 6)

Since y is usually not a whole number, we should sell 800 / (x + 6) units, rounded up to the next whole unit.

Actually, to correct the above response: if the selling price for one product is $794 (to also cover the $4 cost of the product itself), only one unit needs to be sold, since $794 + $10 = $800 + $4.
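The break-even arithmetic can be checked with a small function. This sketch assumes, as the answers do, that the $10 fee is collected once per unit sold and that the $800 is a one-time fixed cost; the function name and defaults are illustrative. It rounds up with `math.ceil`, so a quantity that breaks even exactly is not overcounted.

```python
import math


def break_even_units(price, unit_cost=4.0, annual_fee=10.0, fixed_cost=800.0):
    """Smallest whole number of units whose revenue covers all costs.

    Assumes the annual fee is charged per unit sold and the fixed cost is paid once.
    """
    margin = price + annual_fee - unit_cost  # contribution per unit: x + 10 - 4
    if margin <= 0:
        raise ValueError("each sale must contribute a positive margin")
    return math.ceil(fixed_cost / margin)


print(break_even_units(794))  # 1
print(break_even_units(30))   # 23
```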

Data Analyst was asked...December 24, 2012

If you have 10 bags of marbles with 10 marbles each, and one bag has marbles that weigh differently from the others, how would you identify that bag with a single weighing?

6 Answers

Assume all normal marbles weigh 10 g and the heavier ones weigh 11 g. Take 1 marble from bag 1, 2 from bag 2, 3 from bag 3, and so on up to 10 from bag 10, and weigh them all together; call the result W. If bag 5 contains the heavy marbles, the total weight will be 10 + 20 + 30 + 40 + 55 + 60 + 70 + 80 + 90 + 100 = 555 g, whereas if all marbles weighed 10 g it would be 550 g. So the number of the heavy bag is always W - 550. In general, with N bags, the heavy bag X can be found from one weighing as X = (W - 10 * N(N+1)/2) / (11 - 10).

Terribly worded question; you never specified that we are given the normal and heavy weights. Without that information, the upvoted solution here doesn't work.

Using a series sum [N(N+1)/2] will not work in this case, as we haven't been provided with enough specifics (such as "what's the standard weight of the marbles?"). In fact, any solution dependent on summed weight will have an overlapping solution space and fail. Consider if the standard weight is 10 ounces per marble, and bag 1 is the exception at 12 ounces per marble, for a total weight of 552. Now consider if bag 2 is the exception at 11 ounces per marble: also a total weight of 552. There is no way to distinguish between the two cases. Given the setup, it appears the interviewer did expect the answer to use some variation of a summed series, and I suspect the poster has paraphrased the question and missed some key language; as best I can tell, it is not solvable as stated. Google will expect you to provide a generalized solution that can be automated, so "add the bags one at a time", while simple and clever, would probably not be acceptable as a "weigh only once" solution, and if accepted would be a lesser answer, as it does not provide a codeable business solution.
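A sketch of the single-weighing method from the first answer, under the assumption the later replies point out it needs: both the normal and the odd marble weights are known in advance. `weigh_sample` simulates the weighing and `find_odd_bag` inverts it; both names are hypothetical. The last line reproduces the 552-ounce collision described in the reply above.

```python
def weigh_sample(odd_bag, normal, odd, n_bags=10):
    """Simulate the one weighing: k marbles taken from bag k, for k = 1..n_bags."""
    return sum(k * (odd if k == odd_bag else normal) for k in range(1, n_bags + 1))


def find_odd_bag(total_weight, normal, odd, n_bags=10):
    """Recover the odd bag's number from the single weighing.

    Assumes both the normal and the odd marble weights are known.
    """
    marbles = n_bags * (n_bags + 1) // 2      # 1 + 2 + ... + n_bags
    excess = total_weight - normal * marbles  # extra (or missing) weight
    bag = round(excess / (odd - normal))      # bag k contributes k * (odd - normal)
    if not 1 <= bag <= n_bags:
        raise ValueError("weight is inconsistent with the stated assumptions")
    return bag


print(weigh_sample(5, 10, 11))   # 555
print(find_odd_bag(555, 10, 11))  # 5
# Without known weights, different scenarios can produce the same total:
print(weigh_sample(1, 10, 12), weigh_sample(2, 10, 11))  # 552 552
```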


How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections?

4 Answers

This is a very basic psychometrics question. Calculate Cronbach's alpha for the survey items. If it is low (below 0.5), it is very likely that the questions were answered at random.

I would design the test so that certain information is asked in two different ways. If the two answers disagree with each other, I would seriously doubt the validity of the answers.

We need to build a histogram for each survey question to see the distribution of answers to that question. If the selections are truthful, every question's histogram will likely follow a normal distribution. If more than half of a respondent's answers fall outside the central 95% range of the corresponding histograms (roughly the mean plus or minus two standard deviations), the response will be categorized as random.
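Cronbach's alpha, as suggested in the first answer, is alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores) for k items. The sketch below computes it with the standard-library `statistics` module on made-up responses; the 0.5 cutoff is the answer's rule of thumb, not a universal standard. Note that alpha summarizes the consistency of the survey as a whole rather than flagging particular respondents.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(pvariance(scores) for scores in items)
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
unrelated = [[1, 1], [1, 3], [3, 1], [3, 3]]
print(round(cronbach_alpha(consistent), 3))  # 1.0
print(round(cronbach_alpha(unrelated), 3))   # 0.0
```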


You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it?

3 Answers

We cannot say what caused the spike, since a causal relationship cannot be established from observational data. But we can compare the averages of all the months by performing a hypothesis test (such as a one-way ANOVA) and rejecting the null hypothesis if the F statistic is significant.

The photos are definitely Halloween pictures. Segment by country and date, and check for a steady rise in photo uploads leading up to October 31st, plus a few days after to allow for lag.

Hypothesis: the photos are Halloween pictures. Test: look at upload trends in countries that do not observe Halloween, as a sort of counterfactual analysis.
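The hypothesis test in the first answer can be run as a one-way ANOVA, whose test statistic is F. This sketch computes F from groups of observations, for example daily upload counts per month; turning F into a p-value needs the F distribution, which `scipy.stats.f_oneway` handles if SciPy is available. A significant F says the months differ, not why, which is where the Halloween counterfactual comes in.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for H0: every group (e.g., month) has the same mean."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean([x for g in groups for x in g])
    # Between-group variation: how far each month's mean is from the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: day-to-day scatter inside each month.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical daily photo-upload counts (thousands) for three months.
september = [40, 42, 39, 41]
october = [55, 60, 58, 57]
november = [41, 43, 40, 42]
print(one_way_anova_f([september, october, november]))
```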


The three data structure questions are: 1. the difference between a linked list and an array; 2. the difference between a stack and a queue; 3. describe a hash table.

4 Answers

Arrays are more efficient for accessing elements by index (constant time), while linked lists are better for inserting or deleting elements at a known position, since no other elements have to be shifted. The choice between the two data structures depends on the specific requirements of the problem being solved.
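To make the trade-off concrete, here is a minimal singly linked list next to Python's built-in `list` (which is a dynamic array). The class names are illustrative; the comments note the complexity of each operation.

```python
class ListNode:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt


class LinkedList:
    """Minimal singly linked list (a sketch, not a production container)."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): only the head pointer changes; no elements are shifted.
        self.head = ListNode(value, self.head)

    def get(self, index):
        # O(n): the chain must be walked node by node to reach position `index`.
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None or index < 0:
            raise IndexError(index)
        return node.value


# Python's built-in list is a dynamic array: arr[i] is O(1),
# but arr.insert(0, x) is O(n) because every element shifts right.
arr = [2, 3]
arr.insert(0, 1)

ll = LinkedList()
for v in [3, 2, 1]:
    ll.push_front(v)
print(arr[2], ll.get(2))  # 3 3
```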

Stacks and queues differ in their processing order (a stack is last-in, first-out; a queue is first-in, first-out), in their operations for adding and removing elements, and in their usage scenarios. The choice between the two data structures depends on the specific requirements of the problem being solved.
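A short illustration of the two processing orders, using Python's built-in `list` as a stack and `collections.deque` as a queue (a `deque` removes from the left end in O(1), whereas `list.pop(0)` is O(n)).

```python
from collections import deque

stack = []               # LIFO: push and pop at the same end
for task in ["a", "b", "c"]:
    stack.append(task)
print(stack.pop())       # c  (last in, first out)

queue = deque()          # FIFO: add at one end, remove from the other
for task in ["a", "b", "c"]:
    queue.append(task)
print(queue.popleft())   # a  (first in, first out)
```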

A hash table is a data structure that allows for efficient insertion, deletion, and lookup of key-value pairs. It is based on hashing: each key is mapped to an index in an array using a hash function, which takes a key as input and returns an index into the array. Because different keys can map to the same index (a collision), some form of collision resolution is used, such as separate chaining or open addressing. In separate chaining, each index in the array holds a linked list, and each key-value pair is stored in a node of the corresponding list; when a collision occurs, the new key-value pair is added to the end of the list at that index. In open addressing, when a collision occurs, a different index in the array is searched for to store the new key-value pair; common techniques include linear probing, quadratic probing, and double hashing. Hash tables have an average-case time complexity of O(1) for insertion, deletion, and lookup, which makes them a highly efficient data structure for applications such as database indexing, caching, and compiler symbol tables. However, their worst-case time complexity is O(n), for example when many keys collide into the same bucket; an individual insertion that triggers a resize also costs O(n), although resizing is amortized O(1) per insertion.
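A compact sketch of separate chaining as described above. For brevity each bucket is a Python list of (key, value) pairs rather than a linked list, and the load-factor limit of 2 and the doubling policy are assumptions, not part of the answer.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket is a list of (key, value)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def __len__(self):
        return self._size

    def _bucket(self, key):
        # Map the key to a bucket index; different keys may share a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite in place
                return
        bucket.append((key, value))  # new key (possibly a collision): extend the chain
        self._size += 1
        if self._size > 2 * len(self._buckets):  # assumed load-factor limit of 2
            self._resize()

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return
        raise KeyError(key)

    def _resize(self):
        # O(n) rehash into twice as many buckets; amortized O(1) per insertion.
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for k, v in bucket:
                self._bucket(k).append((k, v))


table = ChainedHashTable(capacity=2)
for i in range(20):
    table.put(f"user{i}", i)
table.delete("user3")
print(len(table), table.get("user7"))  # 19 7
```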


How would you build and test a metric to compare two users' ranked lists of movie/TV show preferences?

4 Answers

1) Develop a list of shows/movies that are representative of different taste categories (more on this later).
2) Obtain rankings of the items in the list from the 2 users.
3) Use Spearman's rho (or another statistic that works with rankings) to assess dependence/congruence between the 2 people's rankings.
* To find shows/movies to include in the measurement instrument, maybe do a cluster analysis on a large number of viewers' viewing habits.

Look at the mean average precision of the movies that the users watch out of the rankings. So if, out of 10 recommended movies, one user prefers the third and the other user prefers the sixth, the recommendation engine of the user who preferred the third would be better.

It's essential to demonstrate that you can really go deep; there are plenty of follow-up questions and (sometimes tangential) angles to explore. There are a lot of Senior Data Scientist experts who've worked at Netflix who provide this sort of practice through mock interviews. There's a whole list of them curated on Prepfully.
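Spearman's rho, mentioned in the first answer, reduces to rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) when there are no ties, where d is the difference in rank positions for each title. This sketch assumes both users rank the same set of distinct titles; the titles are made up. For rankings with ties, `scipy.stats.spearmanr` is a ready-made alternative.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rho for two users' rankings of the same titles (no ties).

    Each argument lists titles from most to least preferred.
    """
    if sorted(ranking_a) != sorted(ranking_b) or len(set(ranking_a)) != len(ranking_a):
        raise ValueError("both users must rank the same distinct titles")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two titles")
    position_b = {title: i for i, title in enumerate(ranking_b)}
    # Sum of squared rank differences for each title.
    d_squared = sum((i - position_b[title]) ** 2 for i, title in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


user_1 = ["Heat", "Alien", "Up", "Jaws"]
user_2 = ["Alien", "Heat", "Up", "Jaws"]
print(spearman_rho(user_1, user_2))                  # close agreement
print(spearman_rho(user_1, list(reversed(user_1))))  # -1.0
```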


How do you take millions of users with hundreds of transactions each, across tens of thousands of products, and group the users into meaningful segments?

3 Answers

You can group similar users and similar items by calculating the distance between like users and items. Jaccard distance is a common approach when building graphs of item-by-user relationships. For each user you have a vector of the N items that they had the potential to buy, and for each product you have a vector of the M users who bought that product. Using these vectors, you can calculate a similarity matrix of user-by-user pairs and of product-by-product pairs. The Jaccard similarity between users u1 and u2 is f(u1, u2) = intersection(u1, u2) / (len(u1) + len(u2) - intersection(u1, u2)), and the same applies to products: f(p1, p2) = intersection(p1, p2) / (len(p1) + len(p2) - intersection(p1, p2)). (The Jaccard distance is 1 - f.) You do this for each of the M^2 user pairs and N^2 product pairs. Then you rank each row of the user and product similarity matrices. This gives you a row of rankings for each user and each product, for example: "product p1's closest products are p4, p600, p5, etc." These rankings reflect purchase behavior, similar to Amazon's "people who bought this also bought..." This approach works only with the purchase graph. You could also segment users by the price of the items they bought: someone who bought a Retina MacBook probably has enough money to buy another expensive laptop, but kids who only paid $30 for headphones probably don't.
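The similarity computation from the answer above can be sketched as follows, using sets of product IDs per user. The data are made up, and the brute-force loop over all pairs is O(M^2) in the number of users, so at the scale in the question one would typically switch to an approximate technique such as MinHash with locality-sensitive hashing; that substitution is a suggestion, not part of the answer.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """|A ∩ B| / (|A| + |B| - |A ∩ B|), the answer's f(); the Jaccard distance is 1 - f."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


def closest_users(purchases, top_k=3):
    """purchases maps user ID -> set of product IDs bought.

    Brute force over all user pairs, so only suitable for small samples.
    """
    scored = {user: [] for user in purchases}
    for u, v in combinations(purchases, 2):
        s = jaccard_similarity(purchases[u], purchases[v])
        scored[u].append((s, v))
        scored[v].append((s, u))
    # Highest similarity first; ties broken by user ID so the output is deterministic.
    return {
        user: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]]
        for user, pairs in scored.items()
    }


purchases = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p3", "p9"},
}
print(closest_users(purchases, top_k=2))
# {'u1': ['u2', 'u3'], 'u2': ['u1', 'u3'], 'u3': ['u1', 'u2']}
```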

That is one way, but clustering algorithms can also do it more efficiently.

Of course, there are many ways to segment the market, but Apple already has several segments that I believe work. First is the Mac line, and within it the education market, which includes three segments: instructors, students, and schools. Instructors are more likely to spend more on a single product and to buy software relevant to their subjects, and these decisions influence their students to do the same, although students generally seek a "value" product and buy software based on requirements. Schools, on the other hand, buy a large number of computers and software at once, which also affects instructor and student purchases. So selling to schools raises sales in both other categories, and selling to instructors raises sales to students. This is just the first segment; corporate customers are similar to education.

Now let's move to the iPhone segment. Here you have to ask why people buy an iPhone. There is the high-tech segment: those who always want the newest and best. Then there is the mid-tech segment: those who don't think it's logical to swap out phones every year, and who wait two years before buying a new one.

Now let's move to the iPad. Interestingly, this segment spans business and leisure. The business segment wants an iPad because it lets them get work done faster and more easily; the leisure segment wants one because it provides entertainment and helps them relax.

Then let's go to the iPod, the product that sent Apple on its path to stardom. I believe the greatest segment for the iPod is parents wanting a gift for their kids, or something to keep kids entertained. Because the iPhone also works as an iPod, some sales spill over to the iPhone, although the iPod touch offers an affordable alternative for those who do not want an iPhone, and the iPod nano captures the convenience segment.

These are just the segments for Apple's main products.


If you have a 5*5*5 cubic, what is the outside surface area?

3 Answers

It's surface area, not volume, so the answer should be 6*(5*5) = 150.

Looking at the first answer makes me doubtful about what "cubic" means. If it's just a cube, then the answer should be 6*(5*5), as mentioned above.

The simplest answer will be 5*5*5-3*3*3=98
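The two readings in the answers above give different numbers: the surface area of the cube versus the count of unit cubes on its outer layer, which is what 5*5*5 - 3*3*3 computes. A quick check of both:

```python
side = 5
surface_area = 6 * side ** 2                     # six faces, each side x side
outer_unit_cubes = side ** 3 - (side - 2) ** 3   # unit cubes with at least one exposed face
print(surface_area)      # 150
print(outer_unit_cubes)  # 98
```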

Acuverco Pharmaceuticals

Describe yourself

56 Answers

So if you get a call from a company saying that you have been selected, first visit their office, get a hard copy of your offer letter, do some research on Google, ask people, and then think. And if anyone says you first need to complete a certification before the offer becomes final, don't choose that company. They are fooling you by taking your money and wasting your time.

So did you pay money and then visit? These are looters.

Yes, we will report it to the police and the cybercrime cell: all of their phone numbers and email IDs, from the so-called company's people to the specialist learning institute. We will report everyone's number.



Glassdoor has 69,271 interview questions and reports from Data Scientist interviews in Washington, DC.