"Every business collects data, and it's the job of the data scientist to analyze, interpret, and communicate that information in a way that will help drive company decisions. In an interview, expect to answer technical questions about your ability to perform quantitative tests as well as create clear visualizations of large, complex data sets. Come ready to discuss past projects you've worked on and how you communicate data findings clearly and concisely in order to help solve business-related problems."
You're about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 3 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle?
37 Answers
Bayesian stats: you should estimate the prior probability that it's raining on any given day in Seattle. If you mention this or ask, the interviewer will tell you to use 25%. Then it's straightforward:
P(raining | Yes,Yes,Yes) = Prior(raining) * P(Yes,Yes,Yes | raining) / P(Yes,Yes,Yes)
P(Yes,Yes,Yes) = P(raining) * P(Yes,Yes,Yes | raining) + P(not-raining) * P(Yes,Yes,Yes | not-raining) = 0.25*(2/3)^3 + 0.75*(1/3)^3 = 0.25*(8/27) + 0.75*(1/27)
P(raining | Yes,Yes,Yes) = 0.25*(8/27) / ( 0.25*(8/27) + 0.75*(1/27) )
**Bonus points if you notice that you don't need a calculator since all the 27's cancel out and you can multiply top and bottom by 4.
P(raining | Yes,Yes,Yes) = 8 / ( 8 + 3 ) = 8/11
But honestly, you're going to Seattle, so the answer should always be: "YES, I'm bringing an umbrella!" (yeah yeah, unless your friends mess with you ALL the time ;)
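For anyone who wants to check the arithmetic above, here is a minimal sketch in Python using exact fractions. The variable names are my own, and the 25% prior is the value the interviewer is said to supply, not something stated in the question itself.

```python
from fractions import Fraction

prior_rain = Fraction(1, 4)   # assumed prior, per the answer above
p_truth = Fraction(2, 3)      # each friend tells the truth with probability 2/3
n_friends = 3

# Likelihood of three "yes" answers under each weather state
like_rain = p_truth ** n_friends        # raining: all three are truthful
like_dry = (1 - p_truth) ** n_friends   # not raining: all three are lying

# Bayes' rule: P(raining | Yes,Yes,Yes)
posterior_rain = (prior_rain * like_rain) / (
    prior_rain * like_rain + (1 - prior_rain) * like_dry
)
print(posterior_rain)  # 8/11
```

Changing prior_rain to Fraction(1, 2) gives 8/9, which is where several of the other answers in this thread end up.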
I thought about this a little differently from a non-bayes perspective. It's raining if any ONE of the friends is telling the truth, because if they are telling the truth then it is raining. If all of them are lying, then it isn't raining because they told you that it was raining. So what you want is the probability that any one person is telling the truth. Which is simply 1-Pr(all lie) = 26/27 Anyone let me know if I'm wrong here!
Here's another perspective on how to answer a question like this: Bring an umbrella. It's Seattle - if it's not raining right now, it probably will be by the time you get there.
I flagged Nub data scientist's answer as useful, because it shows an interesting flaw in reasoning. The 3 random variables are not to be treated as intrinsically independent. Only conditioned on the truth (raining/not raining) are they independent.
Isn't the answer 2/3? The key thing is that they are ALL saying "Yes". You can't have all 3 say yes and have some people lying and some people telling the truth. It either is raining or it isn't. Not both. They either are all lying or all telling the truth. Since they are all in agreement (all lying or all truthful), they are essentially voting as one person. What is the probability that one person is telling the truth? 2/3
Answer from a frequentist perspective: Suppose there was one person. P(YES|raining) is twice (2/3 / 1/3) as likely as P(LIE|notraining), so the P(raining) is 2/3. If instead n people all say YES, then they are either all telling the truth, or all lying. The outcome that they are all telling the truth is (2/3)^n / (1/3)^n = 2^n times as likely as the outcome that they are all lying. Thus P(raining | ALL YES) = 2^n / (2^n + 1) = 8/9 for n=3. Notice that this corresponds exactly to the Bayesian answer when prior(raining) = 1/2.
I'm not sure why it's not just as simple as this: All three friends say it is raining. Each friend has prob. 1/3 of lying. Since the friends all say the same thing, they are either all telling the truth or all lying. The question asks what is the probability that it is raining. This is equivalent to asking, what is the probability that all three friends are telling the truth. And that is equivalent to asking, what is the probability that not one of them is lying. Since the friends were asked independently, this should equal 1 - (1/3 * 1/3 * 1/3) = 0.962. Ah. Looks like my answer agrees with "nub data scientist". What is the probability that both he and I are wrong? :-)
TLP and nub data scientists, Your answers include possibilities which are not feasible; we cannot have any combination of 2/3 and 1/3 together... what about (2/3)^3?
I agree with TLP and nub scientist. For me, the question is really (1 - the odds that all three of your friends are lying to you) Clearly 1 - 1/3 * 1/3 * 1/3. It's convenient that they all gave the same answer, otherwise it would be more difficult.
Let Y denote rain, N denote no rain.
Actual   Answer probability
------------------------------------------
Y =>     8/27 YYY, 1/27 NNN, 12/27 YYN, 6/27 YNN
N =>     1/27 YYY, 8/27 NNN, 6/27 YYN, 12/27 YNN
So, P(Y|YYY) = 8/(8+1) = 8/9
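The table above can be reproduced by brute-force enumeration. This is only a sketch with names I made up (answer_distribution and so on), and the last step assumes rain and no rain are equally likely beforehand, which the table seems to take for granted without saying so.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

p_truth = Fraction(2, 3)  # chance a friend tells the truth

def answer_distribution(raining):
    """Probability of each count of "yes" answers from 3 friends."""
    # A friend says "yes" when truthful and raining, or lying and dry.
    p_yes = p_truth if raining else 1 - p_truth
    dist = Counter()
    for answers in product([True, False], repeat=3):
        prob = Fraction(1)
        for said_yes in answers:
            prob *= p_yes if said_yes else 1 - p_yes
        dist[sum(answers)] += prob  # key = number of "yes" answers
    return dist

rain = answer_distribution(True)    # row "Y =>" of the table
dry = answer_distribution(False)    # row "N =>" of the table

# Assumes equal prior weight on rain and no rain.
posterior = rain[3] / (rain[3] + dry[3])
print(posterior)  # 8/9
```

Here rain[3], rain[2], rain[1], rain[0] correspond to the YYY, YYN, YNN, NNN columns of the first row.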
The probability of raining is that they are all telling the truth, therefore, (2/3)^3.
P(rain / yes yes yes) = (2/3)^3 / ((2/3)^3 + (1/3)^3) =(8/27) / ((8/27) + (1/27)) = 8 / (8 +1) = 8/9
26/27 is incorrect. That is the number of times that at least one friend would tell you the truth (i.e., 1 - probability that they would all lie: 1/27). What you have to figure out is the odds of it raining | (i.e., given) all 3 friends told you the same thing. Because they all say the same thing, they must all either be lying or they must all be telling the truth. What are the odds that they would all lie and that they would all tell the truth? In 1/27 times, they would all lie and in 8/27 times they would all tell the truth. So there are 9 ways in which all your friends would tell you the same thing. And in 8 of them (8 out of 9) they would be telling you the truth.
There is an obvious conceptual reason as to why several answers here (ones that don't use Bayes' formula) are incorrect. The probability in question has to depend on the probability of rain in Seattle. If, for the sake of discussion, it ALWAYS rains in Seattle, i.e. P(rain)=1, then the required prob. is always 1 as well. Likewise if it's a place where it never rains, or if the question asks about the prob. of it raining elephants given the 3 friends said yes, it'd be still 0. I believe this is a std. textbook example of the Bayes' formula, anything short of that I don't think will work out.
Please correct me if incorrect. But I would just prefer to condition. Either they are all telling the truth and it is raining, or they are all lying and it is not raining.
P(rain)=P(rain|truth,truth,truth)*P(truth,truth,truth)+P(rain|lie,lie,lie)*P(lie,lie,lie)
Notice that truth does not mean yes it is raining, it simply corresponds to them telling the truth. Since they said yes, IF they were lying and we knew they were lying then the probability of rain would be zero, thus eliminating the second term.
P(rain)=P(rain|3xtruth)*P(3xtruth)
and the probability of the truth is (2/3)^3 and the probability of rain if they are telling the truth is 1. I did a little skipping of steps, since truth doesn't equal yes, but I just sort of meshed it together towards the end.
YES=yes,yes,yes T=truth, truth, truth L=lie,lie,lie P(Rain|YES)=P(Rain|YES,T)*P(T)+P(Rain|YES,L)*P(L) P(Rain|YES,L)=0==> whats the probability of rain given we know that they are lying and theyve told us it is raining. P(Rain|YES)=P(Rain|YES,T)*P(T) P(Rain|YES,T)=1==> whats the probability of it raining given that they are telling the truth and have told us its raining then P(T)=(2/3)^3 its obvious. why in the world would i do bayesian methods when its certain
I think the first answer is incorrect. The basic flaw is that it is assumed that all three friends lie together or are honest together, so it does not take the cases of Yes.no.Yes or Yes.Yes.no ...etc. For the correct answer we need to update the posterior probability after each yes, so assuming P(raining) = 0.75 prior probability:
P(raining | yes) = (2/3)*0.75 / ( (2/3)*0.75 + (1/3)*0.25 ) = 6/7
P(raining | yes,yes) = (6/7)*(2/3) / ( 6/7*2/3 + 1/7*1/3 ) = 12/13
P(raining | yes,yes,yes) = (12/13)*(2/3) / ( 12/13*2/3 + 1/13*1/3 ) = 24/25
I don't see the interview saying that all friends are sitting together, so they are independent, which means they can lie separately.
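The one-yes-at-a-time updating above is easy to sketch in code. The function names here are mine, not from any answer in the thread. Running it suggests the gap from 8/11 comes from the 0.75 prior rather than from the method: with a 0.25 prior the same sequential updates land on 8/11, matching the all-at-once Bayes calculation.

```python
from fractions import Fraction

P_TRUTH = Fraction(2, 3)  # chance a friend tells the truth

def update_on_yes(prior):
    """One Bayes update after a single friend answers "yes"."""
    rain_and_yes = prior * P_TRUTH            # raining, friend truthful
    dry_and_yes = (1 - prior) * (1 - P_TRUTH)  # not raining, friend lying
    return rain_and_yes / (rain_and_yes + dry_and_yes)

def posterior_after_yeses(prior, n):
    """Apply the single-answer update n times in a row."""
    p = prior
    for _ in range(n):
        p = update_on_yes(p)
    return p

print(posterior_after_yeses(Fraction(3, 4), 3))  # 24/25
print(posterior_after_yeses(Fraction(1, 4), 3))  # 8/11
```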
I agree with (2/3)^3.
Interview Candidate solves this problem using Bayesian stats despite the fact that not enough information is given to do Bayesian probability analysis, i.e. he had to pull the probability of it raining in Seattle out of thin air when it was not given in the interview question. With only the information from the interview question, we have to assume that friends are either all lying or all telling the truth. Let truth=T and lie=L P(TTT)=8/27, P(LLL)=1/27, P(TLL)=2/27, P(TTL)=4/27. But we know that they all had the same answer, so we must compare P(TTT) to P(LLL). P(TTT) is 8 times more likely than P(LLL), so we have P(All same answers|TTT)=8/9, P(All same answers|LLL)=1/9. Therefore the solution given ONLY THE INFORMATION GIVEN is P(Rain)=8/9, P(Dry)=1/9.
This problem requires the marginal probability of rain to solve, following Interview Candidate's answer. M.B. provides the rationale behind why the bayes approach is necessary: if the pr(rain) = 0, then the pr(rain|y, y, y) = 0. (maybe it is July in Seattle). A few conceptual problems in many answers that I want to point out: 1) There is lots of conflation between Pr(truth) and Pr(Y). Pr(truth) = Pr(Y|R) does not equal Pr(Y). 2) Consider there is only a single friend and they say yes, the logical conclusion from a lot of these answers is that Pr(Rain|Yes) = Pr(Yes|Rain) = 2/3, which is not correct. Bayes' rule is very clear in this simpler case. 3) The friends' answers are conditionally independent assuming no collusion. The combinations of their honesty/lying adds no additional information. The marginal probabilities are not independent, Pr(y,y,y) does not equal pr(y)^3, it equals pr(y,y,y,rain) + pr(y,y,y, no rain), the integration of the joint space over rain. Using conditional independence and bayes rule, this becomes: pr(y|rain)^3*pr(rain) + pr(y|no rain)^3(1-pr(rain)). A more general solution using Pr(rain) = r. Pr(rain|y,y,y) = Pr(y,y,y|rain)*pr(rain)/pr(y,y,y) #Bayes' formula pr(y,y,y|rain) = pr(y|rain)^3 = (2/3)^3 #conditional independence pr(y,y,y) = pr(y|rain)^3*pr(rain) + pr(y|no rain)^3*pr(no rain) #by definition, see point 3 the answer: r*(2/3)^3 / [r*(2/3)^3 + (1 - r)*(1/3)^3]
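To see how much the result depends on the prior, the general formula r*(2/3)^3 / [r*(2/3)^3 + (1 - r)*(1/3)^3] can be evaluated for a few values of r. This is a rough sketch; the function name and the particular priors tried are my own choices.

```python
from fractions import Fraction

def posterior_rain(r, n=3, p_truth=Fraction(2, 3)):
    """Pr(rain | n "yes" answers) for a prior Pr(rain) = r."""
    yes_given_rain = p_truth ** n        # conditional independence
    yes_given_dry = (1 - p_truth) ** n
    return r * yes_given_rain / (r * yes_given_rain + (1 - r) * yes_given_dry)

for r in [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]:
    print(r, "->", posterior_rain(r))
```

A prior of 0 stays at 0 and a prior of 1 stays at 1 no matter what the friends say, while 1/4 gives 8/11 and 1/2 gives 8/9.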
It should be (2/3)^3, I think zen and todo is correct.
As a big dumb animal, I have to write out a probability tree and think about this simply. You only have 2 scenarios where all three say it is raining (all three are telling the truth - raining OR all three are lying - not raining). Assume the probability of rain is 0.5 for simplicity.
P(Rain and YYY) = 1/2 * 2/3 * 2/3 * 2/3 = 8/54
P(Not Rain and YYY) = 1/2 * 1/3 * 1/3 * 1/3 = 1/54
Thus P(Rain | YYY) = P(Rain and YYY) / [P(Rain and YYY) + P(Not Rain and YYY)] = 8 / (8+1) = 8/9
I know it isn't the most mathematically rigorous or syntactically correct solution, but I'd bet a pretty penny that the answer is 8/9 with the following assumptions (P(rain) = 0.5 and naive bayes - friends didn't collaborate).
Most of the answers/comments made all unconditional assumptions except a few reasonings that lead to the 8/9 probability. Note that the question states that "Each of your friends has a 2/3 chance of telling you the truth". This essentially means P(raining, yes) + P (non-raining, no) = 2/3. Any attempts to interpret this as conditional probability P(raining | yes) = 2/3 or P(yes | raining) = 2/3 are making other assumptions.
8/27 is not the answer. For the weather to be nice in this case, all 3 of your friends NEED to have lied to you. Therefore the odds are 1/27.
It's really shocking to see how many people post incorrect answers here with such confidence. That said, Bayes' rule is somewhat counterintuitive if you're not familiar with probability theory. Let P(y|r) = prob of each yes given raining = 2/3, P(y|n) = prob yes given not raining = 1/3. Let P(r) = probability of rain = 1/4 given the prior knowledge. P(n) = probability of no rain = 3/4.
P(r | y^3) = ( P(y^3 | r) P(r) ) / ( P(y^3 | r) P(r) + P(y^3 | n) P(n) )
= ( P(y | r)^3 P(r) ) / ( P(y | r)^3 P(r) + P(y | n)^3 P(n) )
= ( (2/3)^3 (1/4) ) / ( (2/3)^3 (1/4) + (1/3)^3 (3/4) )
= (2/27) / ( (2/27) + (.75/27) )
= 2/2.75 = 8/11
What if the answer is 50% since the chance of rain and not rain does not depend on what your friends tell you.
In the absence of further information, the only correct answer is that the posterior probability of rain p is in the interval [0, 1]. In the absence of further information any prior is as good as any other, so by implication the posterior can take any value as well. The interval for p can be restricted to (0, 1) on the assumption that the question to the friends would not be posed if the prior is absolute certainty whether it will rain or not. With the further assumption that the prior probability is measured with limited precision (e.g. rounded to a percentage point), the posterior would be in the interval (0.075, 1). If the alternative assumption is made that information from the friends will be requested only if it had any chance to move the posterior below or above 0.5, the posterior interval for the probability is (0.5, 1). Any more precise answer than that requires further information about the prior which is not supplied in the original problem formulation. Also note that even a precise answer about the probability of rain is not sufficient to answer the question whether an umbrella should be brought or not.
Assume probability of raining in Seattle P(R) = 1/4 Assume friend says Y 50% of the time (Theoretical probability) P(Y) = 1/2 Probability of friend saying yes given its raining P(Y/R) = 2/3 Probability of 3 friends saying yes given its raining = P(YYY/R) = 8/27 Probability of 3 friends saying yes = P(YYY) = 1/8 P(R/YYY) * P(YYY) = P(YYY/R)*P(R) P(R/YYY) = 8/27*1/4/(1/8) = 16/27 (About 59%) A posterior probability of 59% given 3 yes and a prior probability of 25% sounds reasonable to me
The probability of each of the friend say "YES" is 2/3 * 2/3 * 2/3 = 8/27. Now the probability that it is actually raining in Seattle depends on that how do I select them to phone. There is only three way to select and phone them. So, the probability that it is actually raining in Seattle is 3 * (8/27) = 8/9.
Probability that it is raining given that all 3 of them said "yes" = P(AT LEAST one of them is telling the truth) = P(exactly 1 of them telling the truth) + P(2 of them telling the truth) + P(all 3 of them telling the truth) P(exactly 1 of them telling the truth) = P(of first person telling truth) * P(of 2nd person telling lie) * P(of 3rd person telling a lie) = (2/3) * (1/3) * (1/3) = 2/27 + P(exactly 2 of them telling the truth) = P(of first person telling truth) * P(of 2nd person telling the truth) * P(of 3rd person telling a lie) = (2/3) * (2/3) * (1/3) = 4/27 + P(exactly 3 of them telling the truth) = P(of first person telling truth) * P(of 2nd person telling the truth) * P(of 3rd person telling the truth) = (2/3) * (2/3) * (2/3) = 8/27 ANSWER: Probability that it is raining given that all 3 of them said "yes" = P(AT LEAST one of them is telling the truth) = P(exactly 1 of them telling the truth) + P(2 of them telling the truth) + P(all 3 of them telling the truth) = (2/27) + (4/27) + (8/27) = 14/27
Rule of conditional probability states P(A|B) = P( A & B ) / P(B) Reformulating to this case, P(Rain | 3Y) = P(R & 3Y) / P(3Y) P(R & 3Y) = 2/3 ^3 (if it is raining, then they must all speak the truth) = 8/27 (one could multiply probability of rain here. I assumed as prior) P(3y) = all truth or all lie = 2/3 ^ 3 + 1/3 ^3 = 9/27 hence P(R | 3Y) = 8/9
Let X be the probability it's raining. Obviously we want P(X|all three say yes). Now let Y be the probability at least one of them is lying. If Y = 0 it's easy to solve, if not then not so easy. Now you keep going.
Obvious, bayesian is a way to go...
There is a way to easily confirm the right answer. Just write a computer simulation and run it a few million times, which I did. If the long term chance of rain in Seattle is 25%, the chance it is raining now, given the YYY answers and the 2/3 truth 1/3 lying, is 73% (rounded to whole number), which is the same as 8/11, so the reasoning with the Bayesian math is correct.
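The original simulation code was not posted, so the following is only my guess at what such a check might look like, assuming a 25% long-run chance of rain and friends who lie independently. The function name, trial count, and seed are arbitrary.

```python
import random

def simulate(trials=1_000_000, p_rain=0.25, p_truth=2/3, seed=0):
    """Estimate P(rain | all three friends say yes) by simulation."""
    rng = random.Random(seed)
    all_yes = 0
    rain_and_all_yes = 0
    for _ in range(trials):
        raining = rng.random() < p_rain
        # A friend says "yes" if truthful and raining, or lying and dry,
        # i.e. exactly when "is truthful" matches "is raining".
        answers = [(rng.random() < p_truth) == raining for _ in range(3)]
        if all(answers):
            all_yes += 1
            rain_and_all_yes += raining
    return rain_and_all_yes / all_yes

print(simulate())  # should land close to 8/11, about 0.727
```

Only the trials where all three friends say yes are counted, which is what conditioning on YYY means in practice.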
This can easily be solved without Bayes: There are two cases:
Case 1: It is raining and all friends are telling the truth: 0.25*(2/3)^3 = 1/4*8/27
Case 2: It is not raining and all friends are lying: 0.75*(1/3)^3 = 3/4*1/27
Probability: P(E) = Case1 / (Case1+Case2) = (1/4*8/27) / (3/4*1/27 + 1/4*8/27) = 2 / (11/4) = 8/11
Bayes should yield 8/11. For examples of other questions asked by top tier data science companies check out: https://datascienceprep.com/
P(3yes) = all true or all lie = 8/27 + 1/27 = 9/27 = 1/3 P(3True) = (2/3)**3 = 8/27 P(3Lie) = (1/3)**3 = 1/27 P(rain|3yes) = P(rain&3yes)/P(3yes) = (8/27)/(1/3) = 8/9