Data scientist Interview Questions | Glassdoor

# Data scientist Interview Questions

"Every business collects data, and it's the job of the data scientist to analyze, interpret, and communicate that information in a way that will help drive company decisions. In an interview, expect to answer technical questions about your ability to perform quantitative tests as well as create clear visualizations of large, complex data sets. Come ready to discuss past projects you've worked on and how you communicate data findings clearly and concisely in order to help solve business-related problems."

## Top Interview Questions

Sort: RelevancePopular Date

Sep 12, 2013

Mar 5, 2015

Mar 23, 2017
 Given an list A of objects and another list B which is identical to A except that one element is removed, find that removed element.21 AnswersSelect * from A except Select * from BI think it is a coding in algorithm rather than SQL query. So here is my take: def ret_miss(A, B): k = len(A) if k == 2: if A == B: return A elif A == B: return A n = k/2 print A[n], B[n] if A[n] == B[n]: A= A[n:] B=B[n:] else: A=A[:n+1] B=B[:n+1] print A,B return ret_miss(A,B) This works nicely actually.Show More ResponsesIn Python: (just numbers) def rem_elem_num(listA,listB): sumA = 0 sumB = 0 for i in listA: sumA += i for j in listB: sumB += j return sumA-sumB (general) def rem_elem(listA, listB): dictB = {} for j in listB: dictB[j] = None for i in listA: if i not in dictB: return iHow about this in python, will this work? x = set(listA)-set(listB) print(x)All these supposed answers are missing the point, and this question isn't even worded correctly. It should be lists of NUMBERS, not "objects". Anyway, the question is asking how you figure out the number that is missing from list B, which is identical to list A except one number is missing. Before getting into the coding, think about it logically - how would you find this? The answer of course is to sum all the numbers in A, sum all the numbers in B, subtract the sum of B from the sum of A, and that gives you the number.# python code def missing_obj(original_lst, new_lst): for x in new_lst: original_lst.remove(x) return original_lstselect b.element from b left join a on b.element = a.element where a.element is nulltwo ways to do it using sql: 1. select * from A where not in (select * from B) -- assuming you know what element you're looking for 2. select * from (select * from A UNION select * from B) having count(element) = 1 -- again assuming you know the elementIn R: removed_element <- A[which(!A %in% B)] removed_elementDepends on the kind of elements in the lists. If they're numbers, sum(A) minus sum(B) will give the missing element. If they're characters/strings, just dump the elements of A into a dictionary and check each element in B for existence in A.[i for i in A if i not in [j for j in B]]SQL: select a.list as a, b.list as b from ListA as a full join ListB as b on a.list = b.list where a.list eq '' OR b.list eq "" ;Show More Responsesmissing_letters = [] for letter in A: if letter in B: pass else: missing_letters.append(letter) print (missing_letters)Python: sum(A)-sum(B)In SQL: SELECT A.object FROM A LEFT JOIN B ON A.object = B.object WHERE B.object IS NULL;Careful, there could be a repeated object that's being removed. i.e. A = [3, 4, 5, 6, 5] B = [3, 4, 6, 5] This is how I would do it on Python (works for numbers and strings) def missingval(lA, lB): a = sorted(lA) b = sorted(lB) c = None for i in range(len(b)): if a[i] != b[i]: c = a[i] break if c is None: c = a[-1] print(c, "was removed from list A")A = [1,2,3,4,5,6,7,8] B = [1,2,3,4,5,6,8] [i for i in A if i not in B]find the sum of the two list and subtract. ans = sum(a) - sum(b) where a and b are list of numbers.XOR all elements One or more comments have been removed. Please see our Community Guidelines or Terms of Service for more information.

Mar 29, 2017

Mar 28, 2015
 Consider a game with 2 players, A and B. Player A has 8 stones, player B has 6. Game proceeds as follows. First, A rolls a fair 6-sided die, and the number on the die determines how many stones A takes over from B. Next, B rolls the same die, and the exact same thing happens in reverse. This concludes the round. Whoever has more stones at the end of the round wins and the game is over. If players end up with equal # of stones at the end of the round, it is a tie and another round ensues. What is the probability that B wins in 1, 2, ..., n rounds?19 AnswersKey is to define the problem correctly. Let Na be the # player A rolls, and Nb be the number player B rolls. Then at the end of the first round A will have (8 + Na - Nb) and B will have (6 - Na + Nb) stones. So all you need is to compute Pr{6 - Na + Nb > 8 + Na - Nb} for round 1 victory. Subsequent rounds are even easier since all subsequent rounds can only start when the stone count is equal.I got [1/6, 15/36, 15/36, 15/36, ...]For B to Win 6 - Na + Nb > 8 +Na - Nb or, Nb - Na > 1 which can happen if (B,A) is (3,1),(4,1),(4,2),(5,1),(5,2),(5,3),(6,1),(6,2),(6,3),(6,4) Hence prob of B winning is = 10/36 = 5/18 For any round to ensue, the prior round has to end in a draw, or 6 - Na + Nb = 8 +Na - Nb or, Nb - Na = 1, for which prob = 4/36 = 1/9 Once equal, B's prob of winning is if total stones with B > total stones with A or 13/36 possibility of another draw = 1/6 For B to win in nth round, prob = 1/9x1/6^(n-2)x13/36Show More ResponsesFor B to Win 6 - Na + Nb > 8 +Na - Nb or, Nb - Na > 1 which can happen if (B,A) is (3,1),(4,1),(4,2),(5,1),(5,2),(5,3),(6,1),(6,2),(6,3),(6,4) Hence prob of B winning is = 10/36 = 5/18 For any round to ensue, the prior round has to end in a draw, or 6 - Na + Nb = 8 +Na - Nb or, Nb - Na = 1, for which prob = 5/36 Once equal, B's prob of winning is if total stones with B > total stones with A or 13/36 possibility of another draw = 1/6 For B to win in nth round, prob = 5/36x1/6^(n-2)x13/36I don't get it. Shouldn't prob of B winning given it's tie at 1st round be 15/36? given it's tie at 1st round, at the 2nd round Nb > Na can happen if (B,A) is (2,1), (3,1/2),(4,1/2/3), (5,1/2/3/4),(6,1/2/3/4/5), which totals 15 out of 36.I am getting a bit different result. In first round A player (8 stones) has 6 certain ways to win where the sum of both dices is (5,4 or -4,-5 when B draws). Since there is 36 ways in total, i took that 50% of those can be a tie. So, P of winning for A player in first round is: (6+15)/36=21/36. What am I missing?Mistake, P for A player to win in first round might be: (6+10)/36 based on 6 ways to win for sure, 10 to tie, 10 to lose and 10 to win from the 36-6 ways left.Depending on what A rolls (1-6), and then what B rolls (1-6), you are given 36 different possibilites: and then you can make a web out of that to see in what scenarios B will win in round 1. B has a 10/36 chances to win in round 1: (A rolls 1 and B rolls above 4) + (A rolls 2 and B rolls above 3) + ... A has a 21/36 chance to win in round 2 using the same idea and they will tie with 5/36 chances. If they tie, then the game goes to round 2 with A and B having the same number of chips, and so after that it'll be 1/2 chances.So many answers...Here's my version: For round1, B win only if it gets 3 or more stones than A, which is (A,B) = (1,4) (1,5) (1, 6) (2, 5) (2,6) (3,6) which is 6 cases out of all 36 probabilities. So B has 1/6 chance to win. To draw, B needs to get exactly 2 stones more than A, which is (A, B) = (1,3) (2,4) (3,5) (4,6) or 1/9. Entering the second round, all stones should be equal, so the chance to draw become 1/6, and the chance for either to win is 5/12. So the final answer is (1/6, 1/9*5/12, (1/9)^2*5/12, .....(1/9)^(n-1)*5/12) )Because at the beginning time, A has 8 and B has 6, so let A:x and B:y, then A:8+x-y and B:6-x+y; so there are 10/36 prob of B wins. And A wins prob is 21/36 and the equal prob for next round is 5/36. So for B wins at round prob is 10/36. And if they are equal and to have another round, the number has changed to 7 and 7. So A:7+x-y and B:7-x+y, so this time B wins has prob 15/36 and A wins has prob 15/36. And the equal to have another round is 6/36=1/6. So overall B wins in 2 rounds has prob 5/36*15/36. And for round 3,4,...etc, since after each equal round, the number will go back to 7 and 7 so the prob will not change. So B wins in round 3,4,...n has prob 5/36*(6/36)^(r-2)*15/36. r means the number of the total rounds.On the first round, B can win if (A,B) rolls: (1,4) , (1,5) , (1,6) , (2,5) , (2,6) , (3,6) - so there are 6 out of 36 possibilities where B wins. P(B1) [read, probability that B wins in round 1] = 1/6 On the first round, B can tie if (A,B) rolls: (1,3) , (2,4) , (3,5) , (4,6) - so there are 4 out of 36 possibilites where B ties. If B ties with A, there is a second round, where B wins with probability 15/36, A wins with probability 15/36 = 5/12, and they tie with probability 6/36. So P(B2) = 1/9 * (5/12) After the second round, the game only continues if both players have equal number of points, and the probability of a tie in each game is 1/6, so P(B3) = (1/9)*(1/6)*(5/12) Generally, P(Bn) = (1/9)*(1/6)^(n-2)*(5/12)B wins only: if A throws 4 and B throws 6. if A throws 3 and B throws 5 or 6. if A throws 2 and B throws 4, 5 or 6. if A throws 1 and B throws 3,4, 5 or 6. Hence, B wins 10/36. Also the chance of tie is 5/36 ( all events there is one possibility of tie except when A throws 6). B has to throw one greater than A. After the first throw there are only 14 (hence 7 each). Now the chances of tie are when A and B throw the same. Which means 6/36. But probability of A and B winning are the same hence that would be 15/36.Show More ResponsesBuild a 6x6 table of possible rolls and outcomes. For round 1: 5 result in a tie -> P(tie) = 5/36 21 result in A winning -> P(Awins) = 21/36 10 result in B winning -> P(Bwins) = 10/36 For round 2: (which can only start if they are tied with 7 stones each) 6 result in a tie -> P(tie) = 6/36 15 result in A winning -> P(Awins) = 15/36 15 result in B winning -> P(Bwins) = 15/36 To generalize the formula for B wins in n rounds: P(BinfirstRound) = (5/36) P(BinNrounds; N>1) = (5/36)*(1/6)^(n-2)*15/36This is a poorly worded question. Probability of the game being over in Round 1 is 10/36. Do they continue playing after round 1 ? However, if the game has to go one, the probability is 15/36 (difference between A and B is equal and greater than 1). The game will keep going only if it is 7-7, the probability of 7-7 after round 1 is 6/36. How can B win in so many rounds ? Should it not be over the moment B wins a round and has more stones ? What am I missing ?Let x be what A rolls and y be what b rolls B wins in first round if: 6+y-x > 8+x-y 2y > 2 + 2x Or y > 1 + x Possible scenarios: (1,3) (1 ,4) (1,5) (1,6) (2,4) (2,5) (2,6) (3, 5) (3,6) (4,6) Probability(B wins in 1st round) = 10/36 The game draws in first round if : y = 1+x (1,2) (2,3) (3,4) (4,5) (5,6) Probability (draws in 1st round) = 5/36 Probabilty (draw in 2nd round ) = 6/36 Now with equal stones, probability of B winning all events where y > x 5+4+3+2+1 = 15 P(b wins in R2) = 15/36 Total probability that B wins the game = 10/36 + (5/36) * (6/36)*n-3 * 15/36Bi - player B wins the ith round Ti - round i is a Tie = player A got m, player B got n We want to find the probability: P(T1, T2, ..., Ti-1, Bi) # ( , ) means AND = P(Bi | T1, ..., Ti-1)*P(T1)*...*P(Ti-1)= (P()+,,, +P())*(4/6*1/6)*(1/6)*...*(1/6) #(...) = i-2 times # P()+,,, +P() here we sum all the probabilities where player B wins #(4/6*1/6) getting Tie at the first round # (1/6)*...*(1/6) here we multiply i-2 cases were we got tie after a tie One or more comments have been removed. Please see our Community Guidelines or Terms of Service for more information.

Feb 25, 2012
 Find the second largest element in a Binary Search Tree16 Answersfind the right most element. If this is a right node with no children, return its parent. if this is not, return the largest element of its left child.One addition is the situation where the tree has no right branch (root is largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers, if different, the original method by the candidate works well, if the same (which means the root situation), find the largest of its left branch.if (root == null || (!root.hasRightChild() ) { return null;} else return findSecondGreatest(root, root.getValue()); value findSecondGreatest(Node curr, value oldValue) { if(curr.hasRightChild()) { return (findSecondGreatest( curr.getRightChild(), curr.value)); } else return oldValue; }Show More ResponsesThe above answer is also wrong; Node findSceondLargest(Node root) { // If tree is null or is single node only, return null (no second largest) if (root==null || (root.left==null && root.right==null)) return null; Node parent = null, child = root; // find the right most child while (child.right!=null) { parent = child; child = child.right; } // if the right most child has no left child, then it's parent is second largest if (child.left==null) return parent; // otherwise, return left child's rightmost child as second largest child = child.left; while (child.right!=null) child = child.right; return child; }Soln by "mindpower" works. Thank you. I am trying to solve a similar problem Find the 2nd nearest high(in in-order traversal) value for a given node Eg: Given nums: 12 7 14 3, construct a BST. If the given value is: 7 then we should return 14 (in the sort order: 3, 7, 12, 14) if the given value is: 3 then we should return 12 (in the sort order: 3, 7, 12, 14)Generic solution in C# for any k. Notice that this example can be easily changed to find the k-th smallest node by doing a depth-first recursion on root.Left first, and then a tail recursion on root.Right. public Node GetKthLargest(int k) { return GetKthLargest(ref k, this.Root); } Node GetKthLargest(ref int k, Node root) { if (root == null || k < 1) return null; var node = GetKthLargest(ref k, root.Right); if (node != null) return node; if (--k == 0) return root; return GetKthLargest(ref k, root.Left); }recursion is not needed. SecondLargest(Node root, Node secondLarge) { if(root.right==null) return root.left; Node secondLargest = root; while(secondLargest.right.right==null) secondLargest=secondLargest.right; return secondLargest; }int getmax(node *root) { if(root->right == NULL) { return root->d; } return getmax(root->right); } int secondmax(node *root) { if(root == NULL) { return -1; } if(root->right == NULL && root->left != NULL) { return getmax(root->left); } if(root->right != NULL) { if(root->right->right == NULL && root->right->left == NULL) { return root->d; } } return secondmax(root->right); }In-order traverse the tree. The second last element in the array in the answer.In Python: def find_second_largest_bst_element(root, parent=None): if parent is None: # BST root if root.right is None: # no right subtree if root.left is not None: # if a left subtree exists... return root.left else: # root is the only element of the BST return False else: if root.right is None: # right-most element if root.left is not None: # left subtree exists return root.left else: # leaf return parent else: # check right subtree find_second_largest_bst_element(root.right, root) find_second_largest_bst_element(root)For kth smallest, descend the left subtree first. class Node: def __init__(self, value, left=None, right=None): self.value = value self.left = left self.right = right def findKthLargest(root, k): global count if root is None: return findKthLargest(root.right, k) count += 1 if count == k: print root.value return findKthLargest(root.left, k) count = 0 r = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32))) findKthLargest(r, 3)// solution in java // main routine Node findSecondMax(Node root) { if(root == null || (root.left == null && root.right == null) return null; else { Node max = findMax(root); return (max.parent == null) ? findMax(max.left) : max.parent; } } //helper routine, recursive implementation.... can also be done non-recursively Node findMax(Node root) { return (root.right == null) ? root : findMax(root.right); }Show More ResponsesFind the largest number in the binary tree and delete it. And again find the largest number. Short and fast.Reverse in-order traversal of the BST, keeping a count of # of visited nodes. This methods works great to return the kth largest element in a BST.mindpower's solution looks right One or more comments have been removed. Please see our Community Guidelines or Terms of Service for more information.

Nov 23, 2016

Apr 22, 2017