# Data scientists Interview Questions

data scientists interview questions shared by candidates

## Top Interview Questions

How do you take millions of users with 100's of transactions each, amongst 10k's of products and group the users together in a meaningful segments? Of course there are many ways to separate the market. But apple has already got several segments that I believe work. First is the Mac line, within this is The education market. This includes 3 segments. Instructors, Students, and Schools. Instructors will be more likely to spend more on a single product, and buy software relevant to their subjects, but these decisions will influence there students to do the same, but generally students will seek a "value" product, and will buy software based on requirements. School on the other hand will buy a large amount of Computers and software at once, which also effect instructor and student purchases. So selling to schools will raise the sales in both other categories, and selling to instructors will raise the sales for students. This is just the first segment. You also have corporate industries which are similar to Education. Now lets move to the iPhone Segment within this segment you have to ask, why do people buy iPhone. There is the High-Tech segment, meaning those who always want the newest and best. Then you have the Mid-Tech segment. These are those that don't feel it is logical to flip out phones each year, they wait for two years before buying a phone. Now lets move into iPad. Interestingly this segment can move from business, to leisure. The business segment seeks to have an iPad because it allows them to get work done faster and easier. The leisure market seeks to have an iPad because it brings them entertainment and helps them relax. Then lets go to iPod. The wonder of the iPod, the product that sent Apple on a crash course to stardom. I believe the greatest segment for the iPod would be parents wanting to get a gift for kids / something to keep kids entertained. because the iPhone acts as a iPod there is a spill of sales that goes to iPhone, although the iPod touch does offer an affordable alternatives to those who do not want an iPhone. Although the iPod Nano does capture the convenience segment. These are just the segments for the Main Products of apple. You can group similar users and similar items by calculating the distance between like users and items. Jaccard distance is a common approach when building graphs of items x users relationships. For each user you have a vector of N items that they had the potential to buy. For each product you have a vector of M users that bought that product. You can calculate a euclidean distance matrix of user x user pairs and product x product pairs using these vectors. Calculating the distance between u1 and u2: f(u1, u2) = intersection(u1, u2) / (len(u1) + len(u2) - intersection(u1, u2)) same with products: f(p1, p2) = intersection(p1, p2) / (len(p1) + len(p2) - intersection(p1, p2)) You do this for each of the N^2 and M^2 pairs. Then you rank each row of the euclidean matrices for the product matrix and the users matrix. This will give you rows of rankings for each user; Example: "product p1's closest products p4, p600, p5, etc..." These rankings are according to purchase behavior. Similar to Amazon's "people who bought this also bought..." This is only working with the purchase graph. You could segment users by price of item bought. Someone who bought a Macbook retina probably have enough money to buy an another expensive laptop but kids of only paid $30 for headphones probably don't. That is one way but also clustering algorithms can help in doing it in a more efficient ways |

You are compiling a report for user content uploaded every month and notice a spike in uploads in October. In particular, a spike in picture uploads. What might you think is the cause of this, and how would you test it? |

Find the second largest element in a Binary Search Tree |

The three data structure questions are: 1. the difference between linked list and array; 2. the difference between stack and queue; 3. describe hash table. |

How would you test if survey responses were filled at random by certain individuals, as opposed to truthful selections? |

How would you build and test a metric to compare two user's ranked lists of movie/tv show preferences? |

You're about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle? |

Write a function that takes in two sorted lists and outputs a sorted list that is their union. |

generating a sorted vector from two sorted vectors. |

Why is data important? (or something along those lines; I'd call this an unexpected question) |

**1**–

**10**of

**851**Interview Questions

## See Interview Questions for Similar Jobs

- Software Engineer
- Data Analyst
- Senior Data Scientist
- Senior Software Engineer
- Analyst
- Business Analyst
- Quantitative Analyst
- Research Scientist
- Software Developer
- Intern
- Product Manager
- Associate
- Consultant
- Data Engineer
- Director
- Software Development Engineer
- Senior Data Analyst
- Vice President
- Manager
- Financial Analyst