Andreas Weigend
Stanford University
Stat 252 and MS&E 238

Data Mining and Electronic Business



Class 6 (05/13/07) - Prediction Market/Reputation System/Social Networks


Topics Addressed:

I. Prediction Markets at Google

II. Prediction Markets Software

III. Reputation Systems

IV. Social Networks



I. Prediction Markets at Google

Presenter: Bo Cowgill (website ), Project Manager of Google Prediction Market (blog), Stanford University Graduate '03

Origins, Objectives and Implementation

  • Objective:
    1. Empowering average employee
    2. Improve awareness
    3. Quantify unquantifiable
    4. Improve morale
    5. Set goals based on the probability of success (quarterly planning)
  • 25 markets per quarter for 2 years.
  • Markets about:
    • Product launches (number of Gmail users)
    • Office/position openings (will this important position be filled on time)
    • Employee quality of life (when will gym re-open)
    • Competitors’ actions
    • Fun markets (who will win the NBA championship)
  • Implementation
    • Every quarter get new cash, can't use cash from previous quarter
    • Continuous double auction with limit orders
    • Alternative would be with market makers.
    • People can not see each other’s position.

Growth and Usage

  • Market stats
    • Number of shares and number of trades has grown over time (due to added liquidity, more participants, and more cash/user)
    • Approximately 68,000 trades had occurred by the end of 2006, through a total of 2000 trade accounts
    • Long tail distribution
      • One participant has made 3100 trades in 1 year 8 months (~5 per day)
      • Approximately 400 participants have made 2 or less trades
    • A little more than 50% of trades are made by engineers (but this reflects the overall structure of the company).
    • People that are higher up (closer to CEO) and have a long tenure at Google are more likely to participate.
  • Individual performance
    • Geography was a better predictor than job function of success
    • Finance and software engineers have done better (best explained by bots?)
    • "Batting Average" increases slightly with market participation (evidence of learning or do individuals who are not good quit?)
    • Success for certain individuals has been consistent from quarter to quarter
  • Market set up:
    • Typically 5 “options”
      external image happened_by_predicted_all_weeks-707724.GIF
      external image happened_by_predicted_all_weeks-707724.GIF
    • E.g., when does a certain project launch: Jan, Feb, Mar, Apr, May or later
  • Analysis of Predictiveness
    • Average winning price 0 weeks before closing was $55; average losing price was $15
    • Some predicted better (close to $100 the week of closing), whereas others like the Bush v Gore election was still 50/50 on election day
  • Rewards
    • Money earned through trading is used to buy raffle tickets
    • Prizes are then awarded through a random drawing
    • Financial rewards seem not as strong as ‘Reputation’

Further Research


II. Prediction Markets software (open source - from sourceforge:)

Zocalo Prediction Markets

26.81%
98.70%

2,451

2005-05-31
2007-04-19
5,019
external image dbimage.php?id=79004&cached=1
external image dbimage.php?id=79004&cached=1
Zocalo is a toolkit for building Prediction Markets, markets in securities that pay out depending on outcomes of future events. They provide estimates of the likelihood of specific outcomes that are more reliable than other sources of predictions.

Serotonin Prediction Markets

93.74%
84.70%

28,723

2006-07-13
2006-07-15
267
A web-based (Java) prediction market. Users can place funds (real or play) on their prediction of the outcome of some future event. Observers have noted that such markets are better able to predict the actual outcome than experts.

Idea Futures Prediction Market

100.00%
78.78%

39,840

2005-03-23
2005-08-26
2,501
A web-based prediction market (Idea Futures) system. In prediction markets, the commodities traded are claims about future events; the market price gives a consensus probability about the event's likelihood of coming true.
Download

Commercial prediction market software / services:

This site from HSXresearch makes it very simple and clear:

############EXAMPLE##########
Will the US Dollar go up today?
Will Housing prices plunge this quarter?
Will Gold prices rise or fall today?
Will Crude Oil prices go up next week?
Buyers say YES. Sellers say NO.
Whoever is right earns $100.

###############################

III. Reputation Systems

Presenter: Ramesh Johari (email, webpage), Assistant Professor, Management Science and Engineering, Stanford University
Description: A system to recommend members of a community to each other by other members within the same community. Two examples consistently offered is the feedback system in Ebay and social networking system of Linkedin.
  • Types of Reputation: Reputation measures
    • Intrinsic quality: Not what a user does, but who they are
      • An Ebay seller who consistently offers high quality items.
      • A well connected or influential person on Linkedin
    • Good behavior: Actions by user
      • An Ebay buyer who faithfully reports a transaction ("inside" the system)
      • An Ebay seller who truthfully advertises his goods ("outside" the system)
      • Linkedin user who faithfully forwards an introduction
    • Fundamental Problem
      • Incentive problem because the rater are also rating each other
      • Need good behavior "inside" the system in order for the reputation system to work, i.e. to provide good information about the "intrinsic quality" and the "good behavior outside the system". Need to design reputation system to address this (example is ebay's reply to negative feedback option)
    • Asymmetric information exist for users to exploit
      • Hidden Informations (adverse selection) example: Seller advertise on Ebay has information about item that buyer does not
    • Hidden Action (moral hazard)
      • Evaluating whether there was good behavior
      • Example is Principle agent problem. If a company hires an agent for their expertise to make decision, the agent may act in her own interest instead of the company.
  • Sufficient statistics: reputation system provides compressed representation of past interactions
    • Define the metrics needed to sufficiently describe reputation and past interactions
    • For Ebay uses feedback scores, which contains both the rating as well as number of feedbacks
    • A better system will weigh present behavior more than past behavior (exponential weighting, moving window) to reduce incentive to cheat.
    • Other metrics: cause of negative rating, value of past transactions, distinction between buyer and seller feedback
    • Ratings can be skewed
      • Users will tend to filter out the good reviews and focus on reading the bad to judge another user.
      • When there's a skew in one of the bins, the relative value of the rating decreases
  • Cheap Pseudonyms: alternate identity acquired at no cost
    • incentive to exploit high reputation for short-term gain
    • to impost cost ("friction") to a new identity
      • explicit cost: cost to create new identity
      • implicit cost: time to build new reputation
      • provide identification
    • Sybil Attack, attack on reputation system using forged identities.
  • Recommendation and reputation
  • Preference aggregation and units of measurement
    • How do we aggregate individual ratings?
      • Key problem - difficult to quantify units of measurement: What does "4.3 stars" mean to you?
      • Can this process of aggregation be personalized?
    • Theories from Economics: No voting scheme can accurately reflect society's preferences
      • Condorcet Paradox :
        • Person 1 prefers A to B to C
        • Person 2 prefers B to C to A
        • Person 3 prefers C to A to B
        • The following are all true: (A > B) ; (B > C) ; (C > A)
        • Therefore preferences are not transitive, and the result depends on the voting scheme
      • Arrow's Impossibility Theorem:Under "reasonable" assumptions, the only way to aggregate preferences is a dictatorship.
      • Moral: The process of aggregation matters!

III. Social Networks

  • Theories and Concepts

    • "Six degrees of separation": refers to the theory that anyone on the planet can be connected to any other person on the planet through a chain of acquaintances that has no more than five intermediaries. Also referred to as the “small-world phenomenon”.
      • The concept was first proposed in 1967 by renowned social psychologist Stanley Milgram
      • Experimental proofs:
        • Milgram's small world experiment: Milgram asked 96 randomly selected people around the country to send a piece of mail to an acquaintance, who would send the mail along to another acquaintance, in an attempt to reach a designated "target" person in Boston. The messages that actually made it to their destination passed through an average of six people.
        • One modern version of the experiment: Email experiments by Peter Sheridan Dodds and colleagues at Columbia University, which involved 60,000+ participants from 166 different countries. The experiment confirmed that, in social searches, a message initiated by a random person reaches its destination in a median of 5 to 7 steps, depending on the separation of source and target. (Full paper published in Science, August, 2003.)
      • Fun stuff: try the Kevin Bacon game, a game based on a variation of the concept of six degree of separation that states that any actor can be linked, through their film roles, to Kevin Bacon (try to find an actor that is not connected to Kevin Bacon in the game – not very easy =)).
    • Popular theories/concepts related to social networks:
      • Strength of the Weak Ties: Developed by sociologist Mark Granovetter (now a professor at Stanford) in 1973 (full paper). In simple words: our inner circle has similar info, similar interests and similar contacts as we do. When we attempting to do something outside of what's typical for us (finding a job/changing careers), it's our weak ties that support and inform us. We don’t know what we don’t know, but maybe our weak ties do.
      • Structural Holes: For its application in social networks, see Ronald Burt’s “Structural Holes”. In simple words, you increase your “social capital” if you position yourself as a brokerage between two otherwise unconnected groups/social networks.
  • Research in Social Networks

    • Example: Marc Smith's work at Microsoft Research applying Social Network Analysis (click for the VIDEO of his talk at Stanford CS547: Human-Computer Interaction Seminar ). Here are some highlights of the talk (the video has a lot more interesting analyses/representation of data, but unfortunately they can't be screenshot):
      • Netscan: using social network analysis method, Netscan can monitor and analyze user behaviors on online communities (in the case, Microsoft USENET forums), including 1) finding newsgroups where others share your unique interests; 2) monitoring the health of newsgroups related to your interests and pursuits; 3) staying informed on current events and the latest trends; 4) locating sources for technical assistance and information; 5) tracking the participation of your favorite authors across Usenet newsgroups, etc. The following is an example of the representation of the hierarchy of forums under rec.music (colors indicate changes over the past month (red means increase in messages, green decrease); sizes of boxes indicate popularity of the topic):
      • netscan_boxplot_small.gif
        netscan_boxplot_small.gif
      • SNARF (the Social Network and Relationship Finder): an email client that promotes different alternatives to handle email triage, which provides a quicker (arguably better) overview of unread emails, ranked by its importance (as measured by a number of social network metrics). Screenshots of SNARF's main window and its "Thread View":
      • snarfMainwindow.jpg
        snarfMainwindow.jpg
        snarfThreadView.jpg
        snarfThreadView.jpg
    • Other interesting work
      • Click to see a collection of interesting studies using Social Network Analysis from visualcomplexity.com. The websites "intends to be a unified resource space for anyone interested in the visualization of complex networks. The project's main goal is to leverage a critical understanding of different visualization methods, across a series of disciplines, as diverse as Biology, Social Networks or the World Wide Web" (click on the snapshot below for the "social networks" collection; each mini-graph represents a project).
      • visComplexity.png
        visComplexity.png
      • (There are 55 studies on "Social Networks", many of which present network tools that visualize relationships in a variety of "communities", ranging from one's friendship network , jazz musicians , emails , the novel Les Miserables , and online communities such as Flickr , mailing list , and message boards , etc. )
  • Data Collection and Usage

    • How/Where can we get data about the proximity of people?
      • Email traffic
        • Sending/receiving emails
        • Mailing lists
      • IM (instant messaging)
        • "Buddies" list
          • Different "groups" for "buddies" - further triage of sub-networks
        • Initiation of communication/relationships
        • Response to communications
      • Blogs
        • Commenting behavior
        • Linking behavior
      • Online commuities
        • Facebook
          • Appearing in the same pictures
          • Group/"Network" memberships (for example, the "Bay Area" network)
          • Sending in-community emails
          • Testimonials
          • Initiation of frienship (adding people as "Friends")
        • LinkedIn (professional networking site)
          • Contacts' list
          • Endorsements
      • Online traffic
        • Visits to one's personal/professional website (number of visits, lengths/depths of visits)
        • Linking to one's personal/prefessional website
        • Domain names (e.g., residing on the same domain)
      • RSS feeds
        • Subsribing
      • Mobile phones
        • Stored contacts
        • Calls made/received, weighted by
          • Call initiation
          • Length of conversation
          • Non-response (ignoring vs. turning off)
          • Explicitly sending to voicemail
          • Replying of voicemail
        • SMS (Short Message Service)
      • Face-to-face communications (especially useful for people who are no so tech-savvy)
        • People living in the same community
        • People involved in the same (face-to-face) activities
    • Usage
      • Viral marketing (aka viral advertisting): refer to marketing techniques that use pre-existing social networks to produce increases in brand awareness, through self-replicating viral processes, analogous to the spread of pathological and computer viruses. It can be word-of-mouth delivered or enhanced by the network effects of the Internet.
        • Example: communication software, such as skype (you have to also download the software to communicate to those who use it)
      • Hard: blocking of communications/contacts
      • Soft: ranking of communications/contacts


Initial Contributors