Andreas Weigend
Stanford University
Stat 252 and MS&E 238

Data Mining and Electronic Business


Homework 6

(due Sunday, May 20, by 5PM)
Please complete Task 1 and one of the three Options of Task 2.

Task 1:

Our time slot for the final exam is Wednesday, June 13, 2007, from 12:15PM to 3:15PM in the classroom (Gates B03). There will be no final exam. However, given that the Monday of the first week of classes had no class yet, and given that Memorial Day had no class, we want to offer a final, exciting class on that day if you are interested. Please enter your thoughts on what topics related to the class you want to hear about in our Topic wishlist wiki. Please also put your contribution in a short email to stat252spring2007@gmail.com so it can be evaluated by the TAs, with an indication of whether you would be able to attend that day. And please feel free to send any further thoughts about the content for June 13, as well as feedback about the course in general, to Prof. Weigend directly.

Task 2:

These are the three options for Task 2; you need to complete one of them.
  • Option 1: If your group is one of the finalists for the Consumer Confidence Index (CCI) project, continue to work on formulating an index. Prof. Weigend sent you an email on Saturday, May 12, with more detailed instructions.
  • Option 2 (may be done in groups of up to 4 people): Implement a prediction market on any software platform of your choice. (If you are ambitious, you might consider designing your own software; a minimal market-maker sketch appears after this list.) The objective of this exercise is for you to obtain first-hand experience setting up the market. Your assignment should thoroughly document the implementation of your prediction market. At a bare minimum, please address the following issues:
    • What software platform did you use? Why did you choose this particular platform?
    • How did you advertise your prediction market? How do players enter?
    • Describe the types of contracts being traded in your market. Who is responsible for writing these contracts?
    • Describe the dynamics of the market transactions. What insights have you gained based on the market transactions?
    • What were the major difficulties that you faced in both setting up and running your prediction market?
  • Option 3 (to be done individually): This option consists of 3 parts. It is designed to give you a deeper background understanding of recommender systems for the class. If you decide to do Option 3, please submit about one page for each of the following 3 parts:
    • Part 1 -- the early days. More than a decade ago (only 3 years after the first Web browser was created, thus in retrospect already a very, very long time ago in Web history), the ACM published an issue on Recommender Systems. The Introduction by Resnick and Varian takes you through the issue, and you should be able to download the specific articles you are most interested in. Please answer on one page: How do you evaluate the progress in the field since then? Which of the promises have come true? Which ones have not? What has happened instead?
    • Part 2 -- current approach. One of the most important methodological contributions to the field has been the application of relational probabilistic models to recommender systems. Familiarize yourself with the key ideas of this tool and describe in one page how it differs from the standard approach, such as Amazon.com's item-to-item collaborative filtering (sketched after this list).
    • Part 3 -- Netflix contest. The online movie rental company Netflix is running a data mining contest asking participants to predict how much a user will enjoy a movie based on his/her previous movie ratings; see http://www.netflixprize.com/. Besides marketing, the hope for Netflix might have been to get ideas on how to improve their algorithm. Please take the following questions as a starting point for a one-page critical evaluation of the Million Dollar prize (which has so far been withheld, since no participant has met the goal of a 10% improvement): Was the task set up well? In particular, was the evaluation set up appropriately? Discuss "ground truth" and offline vs. online evaluation (a minimal RMSE sketch appears after this list). How would you have set this up instead? What were the key learnings?
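
A note for Option 2: if you decide to design your own software, a common starting point is Hanson's logarithmic market scoring rule (LMSR), an automated market maker that quotes prices from a cost function. The Python sketch below is purely illustrative; the class name, the liquidity parameter b, and the toy trade are our own choices, not a reference to any particular platform.

    import math

    class LMSRMarketMaker:
        """Hanson's logarithmic market scoring rule (LMSR) for one
        binary contract. b is the liquidity parameter: larger b means
        trades move the price more slowly (and bounds the sponsor's
        worst-case loss at b*ln(2))."""

        def __init__(self, b=100.0):
            self.b = b
            self.q_yes = 0.0  # YES shares sold so far
            self.q_no = 0.0   # NO shares sold so far

        def _cost(self, q_yes, q_no):
            # Cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b))
            return self.b * math.log(math.exp(q_yes / self.b) +
                                     math.exp(q_no / self.b))

        def price_yes(self):
            # Instantaneous price = market's implied probability of YES
            e_yes = math.exp(self.q_yes / self.b)
            e_no = math.exp(self.q_no / self.b)
            return e_yes / (e_yes + e_no)

        def buy(self, outcome, shares):
            """Charge a trader for buying `shares` of 'YES' or 'NO'."""
            before = self._cost(self.q_yes, self.q_no)
            if outcome == "YES":
                self.q_yes += shares
            else:
                self.q_no += shares
            return self._cost(self.q_yes, self.q_no) - before

    mm = LMSRMarketMaker(b=50.0)
    print(mm.price_yes())            # 0.5 before any trades
    print(mm.buy("YES", 20))         # cost of 20 YES shares
    print(round(mm.price_yes(), 3))  # price rises after the buy

The implied price doubles as the market's probability estimate for the YES outcome, which is exactly the quantity a prediction market is supposed to elicit.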
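As a reference point for Part 2 of Option 3, here is a minimal sketch of the standard item-to-item approach: represent each item by the vector of ratings it has received, and rank other items by the similarity of their rating vectors. The ratings dictionary is toy data and the function names are our own.

    from collections import defaultdict
    from math import sqrt

    # Toy ratings, user -> {movie: rating}; purely illustrative data.
    ratings = {
        "alice": {"Matrix": 5, "Titanic": 1, "Alien": 4},
        "bob":   {"Matrix": 4, "Alien": 5},
        "carol": {"Titanic": 5, "Matrix": 1},
    }

    def item_vectors(ratings):
        # Invert to item -> {user: rating}
        vecs = defaultdict(dict)
        for user, items in ratings.items():
            for item, r in items.items():
                vecs[item][user] = r
        return vecs

    def cosine(v, w):
        # Cosine similarity of two sparse rating vectors
        shared = set(v) & set(w)
        if not shared:
            return 0.0
        dot = sum(v[u] * w[u] for u in shared)
        norm_v = sqrt(sum(x * x for x in v.values()))
        norm_w = sqrt(sum(x * x for x in w.values()))
        return dot / (norm_v * norm_w)

    def similar_items(item, vecs):
        # Rank all other items by similarity to `item`
        return sorted(((cosine(vecs[item], vecs[other]), other)
                       for other in vecs if other != item),
                      reverse=True)

    vecs = item_vectors(ratings)
    print(similar_items("Matrix", vecs))

Note what this baseline does not do: it models no attributes of users or items and no dependencies between them, which is precisely where relational probabilistic models differ.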
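For Part 3, recall that Netflix scores submissions offline by root mean squared error (RMSE) on a held-out set of ratings, and the Million Dollar goal is a 10% improvement over the RMSE of their Cinematch system (0.9514 on the quiz set). A minimal sketch of that offline evaluation, with made-up ratings and predictions, might look like:

    from math import sqrt

    def rmse(predicted, actual):
        # Root mean squared error, the Netflix Prize scoring metric
        assert len(predicted) == len(actual)
        return sqrt(sum((p - a) ** 2
                        for p, a in zip(predicted, actual)) / len(actual))

    # Made-up held-out ratings and model predictions (not contest data)
    actual    = [4, 3, 5, 2, 4]
    predicted = [3.2, 3.9, 4.1, 2.9, 3.3]

    score = rmse(predicted, actual)
    baseline = 0.9514  # Cinematch RMSE on the quiz set, per the rules
    print("RMSE: %.4f" % score)
    print("Improvement over Cinematch: %.1f%%" % (100 * (1 - score / baseline)))

When you discuss "ground truth" and offline vs. online evaluation, consider what this setup can and cannot measure: it rewards accurate reproduction of past ratings, not better recommendations in a live system.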