Google Interview Question
1,069 Interview Reviews |
Back to all Google Interview Questions & Reviews
Interview questions and reviews posted anonymously by interview candidates
Interview Question for Software Engineer at Google:
You're writing an application that receives a stream of individual items of data. The stream may be very long or very short, but you have no way of knowing how long it is (i.e. there's no trick to figuring out the size of the stream of data). How would you go about choosing m items such that any subset of m items was equally likely? (Not an even distribution of values, but just that any m items are equally likely to be chosen). So for example, m=1000, and the number of items in the stream, n, may be 1000, or 10000, or 100000000, or much much larger; there is no way to know how many.
Helpful Question?
Yes |
No
Inappropriate?
Answers & Comments (2)
Helpful Answer?
Yes |
No
Inappropriate?
Members can
answer or comment on this question
–
Join Now (It's Free) or
Sign In
0 of 0 people found this helpful
by Shards:
1. Pick the first m members in the stream (if there is less than m, just return the entire list).
2. For each of the subsequent member of the list, pick the member with m/n probability, where m is the number of elements already seen. If the member is picked, replace 1 member at random from the already selected m members with the new member.
We can prove that this is random via induction. If there is m members (or less), each member has probability 1, which satisfy the requirement that probability is m/N. For the induction step, we assume that for the first k-1 members, we pick them with probability m/k-1, so in step k, we have 2 cases to consider:
(a) the k-th member has m/k probability of being picked, so the induction step holds for k-th member.
(b) for all the previous member, the probability of being picked is now P(k-th member not picked)*P(original) + P(k-th member picked)*P(original)*P(member is not dropped) = (k-m)/k * m/(k-1) + m/k * m/(k-1) * (m-1)/m = m/(k-1) * [(k-m)/k + (m-1)/k] = m/k, which is the required probability.