# 932

Statistician Intern interview questions shared by candidates

## Top Interview Questions

### When given a data set, what is the first step you take before you start analyzing it?

I would say that we should visualize the data first to get a sense of it. How many attributes does it contain? What are the ranges of the attributes? What is the data type of each attribute (ordinal, categorical, ratio, interval)? How might they be related? Which are the predictors and which are the response(s)? These are typically the first things I'd like to know when investigating a data set that I have no previous knowledge of.
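A minimal sketch of that first look, assuming the data set is already loaded as a list of dictionaries (one per row). The function name `summarize_columns` and the sample rows are hypothetical, made up for illustration only.

```python
from collections import Counter


def summarize_columns(rows):
    """Report, per column, the missing count, distinct count, and a rough kind."""
    summary = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        info = {
            "n_missing": len(values) - len(present),
            "n_distinct": len(set(present)),
        }
        if present and len(numeric) == len(present):
            # Every observed value is a number: report the range.
            info["kind"] = "numeric"
            info["range"] = (min(numeric), max(numeric))
        else:
            # Otherwise treat the column as categorical: report the most common level.
            info["kind"] = "categorical"
            info["top"] = Counter(present).most_common(1)[0][0] if present else None
        summary[col] = info
    return summary


# Hypothetical rows, for illustration only.
rows = [
    {"age": 23, "plan": "basic", "churned": "no"},
    {"age": 41, "plan": "pro", "churned": "yes"},
    {"age": 35, "plan": "basic", "churned": None},
]
summary = summarize_columns(rows)
for name, info in summary.items():
    print(name, info)
```

This only covers attribute counts, types, and ranges; relationships between attributes would still need plots or correlation checks.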

I said we should clean it up and try to see which variables we can get rid of by checking for collinearity. The interviewer's follow-up question was: what about even before this? I said, "Determine the goal, as in figure out what we want to accomplish." She then asked, "How do we do this?" We just kept going in a circle with this question...

### What does a p-value mean?

The probability that the model assumed in the null hypothesis generates a result at least as extreme as the observed data.
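As a small worked illustration of that definition, here is a sketch of a one-sided exact binomial p-value. The coin-flip scenario (9 heads in 10 flips, null hypothesis of a fair coin) and the function name `binomial_p_value_upper` are assumptions chosen for the example, not part of the original question.

```python
from math import comb


def binomial_p_value_upper(k, n, p0=0.5):
    """P(X >= k) when X ~ Binomial(n, p0), i.e. the upper-tail p-value."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical data: 9 heads in 10 flips, null hypothesis p0 = 0.5.
p_value = binomial_p_value_upper(9, 10, 0.5)
print(p_value)  # (C(10,9) + C(10,10)) / 2**10 = 11/1024
```

Here "at least as extreme" means 9 or 10 heads; a two-sided test would also count the equally unlikely outcomes in the lower tail.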

### One probability question and one R coding question. Teams A and B are playing a game with 7 matches. A has probability p to win a match. What's the probability A wins the game at the 7th match?

Team A will win the game at the 7th match if and only if (1) it wins exactly 3 of the first 6 matches and (2) it wins the last match. Thus, [C(6,3)*p^3*(1-p)^3] * p = C(6,3)*p^4*(1-p)^3.

C(6,3)*p^3*(1-p)^3*p

By the definition of conditional probability:

P({A wins last match} AND {A wins 4 matches out of 7}) = P({A wins last match} | {A wins 4 matches out of 7}) * P({A wins 4 matches out of 7})

The first factor is easy to compute. Given that A wins 4 of the 7 matches, each match is equally likely to be one of those wins, so:

P({A wins last match} | {A wins 4 matches out of 7}) = 4/7

The second factor can be computed using a Binomial(prob=p, n=7) distribution, with k=4:

P({A wins 4 matches out of 7}) = binom(7, 4) * p^4 * (1-p)^3

So overall:

P({A wins last match} AND {A wins 4 matches out of 7}) = 4/7 * binom(7, 4) * p^4 * (1-p)^3 = binom(6, 3) * p^4 * (1-p)^3

That is, the final answer is binom(6, 3) * p^4 * (1-p)^3.
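The closed form can be sanity-checked by brute force. The sketch below assumes the game is a best-of-seven series that ends as soon as one team has 4 wins, and that matches are independent. It enumerates all 2^7 win/loss sequences and sums the probability of those in which A clinches exactly at match 7. The function names are made up for this example.

```python
from itertools import product
from math import comb


def p_win_at_match_seven(p):
    """Closed form from the answers above: C(6,3) * p^4 * (1-p)^3."""
    return comb(6, 3) * p**4 * (1 - p) ** 3


def p_win_at_match_seven_bruteforce(p):
    """Enumerate every 7-match sequence and find when the series is decided."""
    total = 0.0
    for outcome in product([0, 1], repeat=7):  # 1 means A wins that match
        a_wins = b_wins = 0
        for match_number, a_won in enumerate(outcome, start=1):
            a_wins += a_won
            b_wins += 1 - a_won
            if a_wins == 4 or b_wins == 4:
                break  # series decided at this match
        if a_wins == 4 and match_number == 7:
            wins = sum(outcome)
            total += p**wins * (1 - p) ** (7 - wins)
    return total


print(p_win_at_match_seven(0.5))            # 20 / 128
print(p_win_at_match_seven_bruteforce(0.5))
```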

### What will you do if your manager asks you to do something which you think is wrong?

I would go along with it up to a certain point to see if I can learn something new from it. If it keeps happening, I would seriously consider changing jobs.

I will present my case; if he doesn't agree with me, I will still follow him.

### In the coding interview: The first question was to develop a function which takes a numeric list and generates the results of the CDF of the empirical distribution for each number in that list. The second was to run a logistic regression (one line of code, very simple). The third was FizzBuzz. The fourth was to generate an algorithm to add all the digits in any given number (i.e., 132 => 1+3+2 = 6).

The first one was by far the most challenging question; I did not get it correct. The second, third, and fourth were easy, but I ran out of time and did not completely answer the fourth.
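For reference, here is one possible sketch of the first and fourth questions in Python. The interview may have used R, and the function names here are hypothetical. The empirical CDF at a value x is taken to be the fraction of list entries less than or equal to x.

```python
from bisect import bisect_right


def ecdf(values):
    """Return, for each entry, the fraction of entries <= that entry."""
    sorted_values = sorted(values)
    n = len(sorted_values)
    # bisect_right counts how many sorted entries are <= v.
    return [bisect_right(sorted_values, v) / n for v in values]


def digit_sum(number):
    """Add up the decimal digits of an integer, ignoring its sign."""
    return sum(int(digit) for digit in str(abs(number)))


print(ecdf([3, 1, 2, 2]))  # [1.0, 0.25, 0.75, 0.75]
print(digit_sum(132))      # 6
```

Sorting once and using binary search keeps the ECDF at O(n log n) rather than comparing every pair of entries.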

Was the coding on a whiteboard, in an IDE, or in a third-party coding app?

### (1) 20 hard drives were monitored for 100 hours. 4 of them failed, and the failure times were recorded. Provide a good estimator of the mean lifetime of a hard drive. Any assumptions? (2) Compare two algorithms. The old algorithm was running in 50 countries, and its running time over 4 consecutive weeks was recorded. Then the new algorithm was tested in the same countries, with 4 weeks of records for each country. So in total, we have 50 * 8 measurements. How would you compare these two algorithms?

Repeated-measures ANOVA?

Paired t-test, ANOVA, bootstrap...
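A rough sketch of one way to approach both parts, with every number below invented for illustration. For part (1), it assumes lifetimes are exponential and that the 16 surviving drives are right-censored at 100 hours. Under that assumption the maximum-likelihood estimate of the mean lifetime is the total observed running time divided by the number of failures. For part (2), it averages the 4 weekly measurements per country and computes a paired t statistic on the per-country differences. This is the simplest of the options named above, not necessarily what the interviewer wanted.

```python
from math import sqrt
from statistics import mean, stdev


def exponential_mean_lifetime(failure_times, n_units, window):
    """Censored-data MLE of the mean, assuming exponential lifetimes."""
    n_failed = len(failure_times)
    if n_failed == 0:
        raise ValueError("need at least one observed failure")
    # Survivors each contribute the full monitoring window.
    total_time = sum(failure_times) + (n_units - n_failed) * window
    return total_time / n_failed


def paired_t_statistic(old_weeks, new_weeks):
    """Paired t statistic on per-country means (one list of weeks per country)."""
    diffs = [mean(new) - mean(old) for old, new in zip(old_weeks, new_weeks)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical failure times (hours) for the 4 drives that failed.
estimate = exponential_mean_lifetime([20, 45, 60, 75], n_units=20, window=100)
print(estimate)  # (200 + 16 * 100) / 4 = 450.0

# Hypothetical running times for 3 countries, 4 weeks each.
old = [[10, 10, 10, 10], [12, 12, 12, 12], [11, 11, 11, 11]]
new = [[9, 9, 9, 9], [10, 10, 10, 10], [10, 10, 10, 10]]
t_stat = paired_t_statistic(old, new)
print(t_stat)  # compare against a t distribution with len(old) - 1 degrees of freedom
```

Averaging within country discards the week-to-week structure; a repeated-measures ANOVA or a mixed model would keep it.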

### Whether I wanted to have children, and when

The question is outright illegal. Now, how hard do you want to be on the interviewer? Say "I prefer not to answer"? Or "Wow, I can't believe you asked this illegal question"?

Wow. Legal only if you brought up kids first, I think? But the nerve of asking. No thanks.

### With a data set from a normal distribution, suppose you can't get any observation greater than 5. How would you estimate the mean?

bootstrap

Technically, the MLE doesn't exist in this case if \bar{x} < 5, but the restricted MLE max{5, \bar{x}} has asymptotically optimal properties, and hence that's how the mean is generally estimated. Any standard first course in mathematical statistics should cover this.
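Another possible reading of the question is that the sample is right-truncated at 5: values above 5 are never recorded. A minimal sketch under that reading follows, with the extra assumption that the standard deviation is known. In that case the MLE of the mean is the value of mu at which the mean of the truncated distribution equals the sample mean, and it can be found by bisection. The function names and data are hypothetical.

```python
from math import erfc, exp, pi, sqrt
from statistics import mean


def truncated_mean(mu, sigma, cutoff):
    """Mean of Normal(mu, sigma^2) conditioned on being below `cutoff`."""
    alpha = (cutoff - mu) / sigma
    pdf = exp(-0.5 * alpha * alpha) / sqrt(2 * pi)
    cdf = 0.5 * erfc(-alpha / sqrt(2))
    return mu - sigma * pdf / cdf


def fit_truncated_normal_mean(data, sigma, cutoff, tol=1e-10):
    """Solve truncated_mean(mu) = sample mean for mu by bisection."""
    xbar = mean(data)
    # The truncated mean is always below mu, so mu = xbar is a lower bound.
    lo, hi, step = xbar, xbar + sigma, sigma
    while truncated_mean(hi, sigma, cutoff) < xbar:
        hi += step
        step *= 2  # note: can underflow if xbar sits very close to the cutoff
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, sigma, cutoff) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical observations, all below the cutoff of 5; sigma = 1 is assumed known.
data = [3.2, 4.1, 4.7, 4.0]
mu_hat = fit_truncated_normal_mean(data, sigma=1.0, cutoff=5.0)
print(mu_hat)  # larger than the sample mean, since truncation biases xbar downward
```

If the standard deviation is unknown, both parameters have to be estimated jointly, which needs a two-dimensional numerical optimizer rather than this one-dimensional search.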