## Interview Question

Quantitative Analyst Interview

-Mountain View, CA

# You observe a sample of measurements coming from a fixed length ruler. If the object is shorter than the ruler you observe the actual measurement. Otherwise you observe the length of the ruler. What would be a good estimator of the ruler length?

5

Get rid of the measurements that are equal to the ruler length, then take the average of the rest of the measurements, which lie in the range (0, ruler_length). The ruler length is 2 times this average value (this assumes the object lengths are uniformly distributed).
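A minimal simulation sketch of this idea, assuming (the question does not say so) that object lengths are uniform on (0, `max_object`) with `max_object` longer than the ruler. The names `true_length`, `max_object` and `estimate_ruler_length` are illustrative, and the censored readings are taken to be those equal to the largest observed value.

```python
import random

def estimate_ruler_length(measurements):
    """Twice the mean of the measurements strictly below the largest observed value.

    Assumes the largest observed value is the ruler length (the censored readings)
    and that the uncensored lengths are uniform on (0, ruler_length).
    """
    cap = max(measurements)
    uncensored = [x for x in measurements if x < cap]
    if not uncensored:
        raise ValueError("need at least one uncensored measurement")
    return 2 * sum(uncensored) / len(uncensored)

random.seed(0)
true_length = 12.0   # hypothetical ruler length
max_object = 20.0    # hypothetical upper bound on object lengths
# Each reading is the object length, capped at the ruler length.
sample = [min(random.uniform(0, max_object), true_length) for _ in range(100_000)]
estimate = estimate_ruler_length(sample)
print(round(estimate, 2))
```

If the lengths are not uniform, twice the mean of the uncensored values will generally not recover the ruler length.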

Anonymous on

2

This is a censored regression problem. You could use the Tobit estimator, where the censoring occurs at the length of the ruler (say 12): y_i = 12 if y*_i >= 12 and y_i = y*_i if y*_i < 12, where y*_i = beta*x_i + u_i and u_i ~ N(0, sigma^2). Note that this assumes that the error term is normally distributed, but that's a standard regression assumption. See https://en.wikipedia.org/wiki/Tobit_model

Anonymous on

0

As a follow-up to the above, I forgot to mention that you of course then need to construct the log-likelihood function, as described later in that Wikipedia article: https://en.wikipedia.org/wiki/Tobit_model#The_likelihood_function

Anonymous on

0

Please ignore my previous two answers above. I misread the question and thought it was a regression problem, when it wasn't.

Anonymous on

0

If we know the distribution... I'd analyse the tail of the cumulative distribution function.

Anonymous on

0

round(central tendency) * 2

Anonymous on

1

I came up with a solution: if we know the distribution of the actual lengths, then the expected fraction of censored measurements (those reported as the ruler length rather than the true length) should equal the probability that the actual length is greater than the length of the ruler, and we can solve that equation for the ruler length. Not sure this is correct, and I will interview on Friday. Good luck to me.
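A rough sketch of that matching idea, assuming the length distribution is known and normal. The parameters `mu=10.0` and `sigma=3.0`, the ruler length of 12, and the helper name `estimate_from_censored_fraction` are all made up for illustration. Censored readings are identified as those equal to the largest observed value.

```python
import random
from statistics import NormalDist

def estimate_from_censored_fraction(measurements, dist):
    """Match the observed censored fraction to the known tail probability.

    Solves P(length > L) = censored fraction, i.e. L = F^{-1}(1 - fraction).
    """
    cap = max(measurements)
    fraction = sum(1 for x in measurements if x == cap) / len(measurements)
    return dist.inv_cdf(1 - fraction)

random.seed(1)
known = NormalDist(mu=10.0, sigma=3.0)  # hypothetical known length distribution
true_length = 12.0                       # hypothetical ruler length
# Negative draws are vanishingly rare with these parameters and are ignored here.
sample = [min(random.gauss(10.0, 3.0), true_length) for _ in range(50_000)]
estimate = estimate_from_censored_fraction(sample, known)
print(round(estimate, 2))
```

Note that once the censored readings can be identified this way, the largest observed value already equals the ruler length, so the quantile match mainly serves as a consistency check on the assumed distribution.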

Anonymous on

1

Assuming that the measurements are on a continuous scale, you would have a lot of mass on the point exactly corresponding to the ruler's length, so you could use something akin to a mode I'd imagine.

Anonymous on

0

The mode should work, right? The length of the ruler is likely to be the only specific value that shows up more than once in the data.
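A small sketch of the mode idea on made-up readings, assuming continuous measurements so that the only repeated value is the censored one. `mode_estimate` is an illustrative helper; it falls back to the maximum when nothing repeats.

```python
from collections import Counter

def mode_estimate(measurements):
    """Return the most frequent value if it repeats, else fall back to the maximum."""
    value, count = Counter(measurements).most_common(1)[0]
    return value if count > 1 else max(measurements)

sample = [3.2, 12.0, 7.9, 12.0, 5.4, 12.0, 11.1]  # made-up readings
print(mode_estimate(sample))  # 12.0
```

With rounded or discretised measurements other values can repeat too, so in practice the mode and the maximum should be checked against each other.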

Jacob Curtis on

0

L^{hat} = N/(N+1) * max(X1,X2,X3,...XN) is an unbiased estimator

Anonymous on

0

L^{hat} = 2*sum(X)/N is another unbiased estimator

Anonymous on

0

The length of the ruler would be the censored value in the data. If you draw the histogram of observed values, there should be a mass on the largest value, which is the length of the ruler. The more observations you have, the better the estimate.

Anonymous on

0

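I think it should be (N+1)/N * max(X1,...,XN): if the measurements are uniform on (0, L), the expected value of the maximum is N/(N+1) * L, so you have to scale the maximum up, not down. Does anyone agree with me?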
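The two scalings of the sample maximum proposed in this thread can be compared with a quick Monte Carlo check. This is only a sketch under the assumption that the measurements are uniform on (0, L) with no censoring. The values `n = 5` and `true_length = 12.0` and the helper `mean_of_estimator` are illustrative.

```python
import random

def mean_of_estimator(scale, n, true_length, trials, rng):
    """Average of scale * max(sample) over many uniform(0, true_length) samples."""
    total = 0.0
    for _ in range(trials):
        total += scale * max(rng.uniform(0, true_length) for _ in range(n))
    return total / trials

rng = random.Random(2)
n, true_length, trials = 5, 12.0, 100_000
shrunk = mean_of_estimator(n / (n + 1), n, true_length, trials, rng)    # N/(N+1) * max
inflated = mean_of_estimator((n + 1) / n, n, true_length, trials, rng)  # (N+1)/N * max
print(round(shrunk, 2), round(inflated, 2))
```

Under these assumptions the first average should land near 8.33 and the second near 12, which matches E[max] = N/(N+1) * L for a uniform sample.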

Anonymous on

0

Should first ask whether we have some prior knowledge about the object length distribution

Anonymous on

2

My initial answer was to use the MAX of the sample. That, however, is a biased estimator. How can you account for the bias and come up with an unbiased estimator? I think this is where you need to start making assumptions about the distribution. Assuming a uniform distribution would allow you to estimate the bias.

Anonymous on

0

What kind of distribution is it? What do we do with the data? What precision do we need? What happens if we "lose" the oversize data?

Ludovico Grossi on
