When given a data set, what is the first step you take before you start analyzing it?
I said we should clean it up and check for collinearity to see which variables we can get rid of. The interviewer's follow-up question was: what about even before this? I said, "Determine the goal, as in figure out what we want to accomplish." She then asked, "How do we do this?" We just kept going in a circle with this question...
I would say that we should visualize the data first to get a sense of it. How many attributes does it contain? What are the ranges of the attributes? What is the data type of each attribute (ordinal, categorical, ratio, interval)? How might they be related? What are the predictors, and what are the response(s)? These are typically the first things I'd like to know when investigating a data set that I have no previous knowledge of.
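For what it's worth, here is a rough sketch of that kind of first look in plain Python, before any plotting. Everything in it is made up for illustration: the `first_look` helper, the toy `rows`, and the column names are assumptions, not anything from the interview. It only separates numeric from categorical attributes; deciding whether something is ordinal, interval, or ratio still takes knowledge of what the column means.

```python
from collections import Counter


def _to_number(value):
    """Return value as a float, or None if it cannot be parsed as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def first_look(rows):
    """Summarize each attribute of a list of dict records (hypothetical helper)."""
    summary = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        missing = len(values) - len(present)
        numbers = [_to_number(v) for v in present]
        if present and all(n is not None for n in numbers):
            # Every non-missing value parses as a number: report the range.
            summary[col] = {
                "kind": "numeric",
                "missing": missing,
                "min": min(numbers),
                "max": max(numbers),
            }
        else:
            # Otherwise count distinct levels and the most frequent ones.
            counts = Counter(present)
            summary[col] = {
                "kind": "categorical",
                "missing": missing,
                "levels": len(counts),
                "top": counts.most_common(3),
            }
    return summary


# Toy records, invented for this example.
rows = [
    {"age": "34", "city": "Boston", "income": "52000"},
    {"age": "29", "city": "Austin", "income": ""},
    {"age": "41", "city": "Boston", "income": "61000"},
]

report = first_look(rows)
for col, info in report.items():
    print(col, info)
```

A summary like this answers the "how many attributes, what ranges, what types" questions quickly; how the attributes relate to each other and which one is the response still depend on the goal of the analysis.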
Feb 22, 2010