Interview Question

Data Scientist Interview

-

Meta

Given the following data: Table: searches Columns: date STRING date of the search, search_id INT the unique identifier of each search, user_id INT the unique identifier of the searcher, age_group STRING ('<30', '30-50', '50+'), search_query STRING the text of the search query Sample Rows: date | search_id | user_id | age_group | search_query -------------------------------------------------------------------- '2020-01-01' | 101 | 9991 | '<30' | 'justin bieber' '2020-01-01' | 102 | 9991 | '<30' | 'menlo park' '2020-01-01' | 103 | 5555 | '30-50' | 'john' '2020-01-01' | 104 | 1234 | '50+' | 'funny cats' Table: search_results Columns: date STRING date of the search action, search_id INT the unique identifier of each search, result_id INT the unique identifier of the result, result_type STRING (page, event, group, person, post, etc.), clicked BOOLEAN did the user click on the result? Sample Rows: date | search_id | result_id | result_type | clicked -------------------------------------------------------------------- '2020-01-01' | 101 | 1001 | 'page' | TRUE '2020-01-01' | 101 | 1002 | 'event' | FALSE '2020-01-01' | 101 | 1003 | 'event' | FALSE '2020-01-01' | 101 | 1004 | 'group' | FALSE Over the last 7 days, how many users made more than 10 searches? You notice that the number of users that clicked on a search result about a Facebook Event increased 10% week-over-week. How would you investigate? How do you decide if this is a good thing or a bad thing? The Events team wants to up-rank Events such that they show up higher in Search. How would you determine if this is a good idea or not?

AnswerAdd Tags

Interview Answers

3 Answers

20

Over the last 7 days, how many users made more than 10 searches? WITH search_agg as( SELECT user_id, count(search_id) as count_search FROM search WHERE date>current_date -interval '7 days' GROUP BY user_id HAVING count_search>10) SELECT count (distinct user_id) FROM search_agg

Anonymous on

4

DROP VIEW IF EXISTS users_clicked; DROP TABLE IF EXISTS searches; CREATE TABLE searches ( "Date" VARCHAR(10), search_id int , user_id int, age_group VARCHAR(25), search_query VARCHAR(255) ); INSERT INTO searches ("Date", search_id, user_id, age_group, search_query) VALUES ('2020-01-01', 101, 9991, ' 4; CREATE VIEW users_clicked as( SELECT s."Date", s.search_id, s.user_id, s.age_group, sr.clicked FROM searches as s Left join search_results as sr on s.search_id = sr.search_id); SELECT '-----users_clicked------' as msg; SELECT * from users_clicked; -- how many costumers clicked over the past 10 days SELECT count(distinct user_id) filter (where clicked=True and CURRENT_DATE - TO_DATE("Date", 'YYY-MM-DD')<=10) From users_clicked;

Thoughts? on

5

import pandas as pd from datetime import datetime,timedelta df = pd.read_csv(searches) df['date'] = pd.to_datetime(df['date']).dt.date last_7 = (df['date'] - datetime.now().date())10) num_searches_per_user_over_10 = num_searches_per_user.sum() percent_searches_per_user_over_10 = num_searches_per_user.mean()

Frank on

Add Answers or Comments

To comment on this, Sign In or Sign Up.