"Every business collects data, and it's the job of the data scientist to analyze, interpret, and communicate that information in a way that will help drive company decisions. In an interview, expect to answer technical questions about your ability to perform quantitative tests as well as create clear visualizations of large, complex data sets. Come ready to discuss past projects you've worked on and how you communicate data findings clearly and concisely in order to help solve business-related problems."
How do you take millions of users with 100's of transactions each, amongst 10k's of products and group the users together in a meaningful segments?
Of course there are many ways to separate the market. But apple has already got several segments that I believe work. First is the Mac line, within this is The education market. This includes 3 segments. Instructors, Students, and Schools. Instructors will be more likely to spend more on a single product, and buy software relevant to their subjects, but these decisions will influence there students to do the same, but generally students will seek a "value" product, and will buy software based on requirements. School on the other hand will buy a large amount of Computers and software at once, which also effect instructor and student purchases. So selling to schools will raise the sales in both other categories, and selling to instructors will raise the sales for students. This is just the first segment. You also have corporate industries which are similar to Education. Now lets move to the iPhone Segment within this segment you have to ask, why do people buy iPhone. There is the High-Tech segment, meaning those who always want the newest and best. Then you have the Mid-Tech segment. These are those that don't feel it is logical to flip out phones each year, they wait for two years before buying a phone. Now lets move into iPad. Interestingly this segment can move from business, to leisure. The business segment seeks to have an iPad because it allows them to get work done faster and easier. The leisure market seeks to have an iPad because it brings them entertainment and helps them relax. Then lets go to iPod. The wonder of the iPod, the product that sent Apple on a crash course to stardom. I believe the greatest segment for the iPod would be parents wanting to get a gift for kids / something to keep kids entertained. because the iPhone acts as a iPod there is a spill of sales that goes to iPhone, although the iPod touch does offer an affordable alternatives to those who do not want an iPhone. Although the iPod Nano does capture the convenience segment. These are just the segments for the Main Products of apple.
You can group similar users and similar items by calculating the distance between like users and items. Jaccard distance is a common approach when building graphs of items x users relationships. For each user you have a vector of N items that they had the potential to buy. For each product you have a vector of M users that bought that product. You can calculate a euclidean distance matrix of user x user pairs and product x product pairs using these vectors. Calculating the distance between u1 and u2: f(u1, u2) = intersection(u1, u2) / (len(u1) + len(u2) - intersection(u1, u2)) same with products: f(p1, p2) = intersection(p1, p2) / (len(p1) + len(p2) - intersection(p1, p2)) You do this for each of the N^2 and M^2 pairs. Then you rank each row of the euclidean matrices for the product matrix and the users matrix. This will give you rows of rankings for each user; Example: "product p1's closest products p4, p600, p5, etc..." These rankings are according to purchase behavior. Similar to Amazon's "people who bought this also bought..." This is only working with the purchase graph. You could segment users by price of item bought. Someone who bought a Macbook retina probably have enough money to buy an another expensive laptop but kids of only paid $30 for headphones probably don't.
That is one way but also clustering algorithms can help in doing it in a more efficient ways