Andreas Weigend
Stanford University
Stat 252 and MS&E 238

Data Mining and Electronic Business

Class 1 (April 9)

  • Topics addressed

    • What is the class about?
    • People and data, specifically the evolution of web and user behavior, and what this means for e-businesses
      • Collection of data, modeling of human behavior, and actionable insights gained
    • 5 levels of architecture
      • Data collection - Collection of data generated by customer on the web, such as "click" data, transactional data and search data; insufficient in establishing causality
      • Experimentation - Determine causality and relationships underlying data
      • Participation - Users actively contribute content (e.g., editing wikis, tagging blogs)
      • Interaction - Users engage with other users and their content (Flickr, Facebook and other social networks, also discussion forums and message boards)
      • Community - Expands upon participation and interaction (MySpace)
    • Better data collection is possible by moving to a higher architectural level, and indeed we have witnessed this trend in the evolution of the web
  • Distinctions made

    • Intention Economy vs. Attention Economy
      • Intention – The traditional economic system of bi-directional exchange; e.g., searching for a book on Amazon; buying a drink at a café.
        • The key problem in the intention economy is Ranking - Statistical Modeling is employed to rank search results. However, there is no standard way to evaluate search results.
      • Attention - In an information-overloaded society, value is measured in terms of how much mindshare or attention we are willing and able to devote to something.
        • An example is Attensa, an RSS feed reader that lets you subscribe to other people's feeds. The idea: if you know somebody who is better informed than you in an area of shared interest, it is more efficient to subscribe to the same sources of information as this person than to search for these sources on your own.
        • The attention economy leads to a complete dissolution of institutions.
      • Another dimension of this shift towards attention economy is personal influence vs. institutional influence.
        • Consider traditional ads vs. targeted, personalized ads. The traditional business model of most printed media is to sell ad space; in the information economy, ads are targeted to the specific interests of the individual (Google AdSense).
        • People trust information/recommendations from an individual based on his/her established credibility within the community. People increasingly use social networks as a filter for information.
        • Question of anonymity vs. identification: How do we establish credibility of an anonymous contributor/recommender?
        • A recent survey revealed that 62% of the content that teenagers/students read comes from people they know (i.e., they use their social filters, since there is so much rubbish information around)
    • Collective vs. Individual intelligence
      • A course wiki taps into the collective intelligence of the class, engaging all its members, whereas a traditional website only allows the individual to be involved in its creation
      • Algorithmic vs. Social Search
        • Computers and A.I. vs. Human Knowledge – can computers produce better search results than humans? More general question of the quality of search – what defines a relevant search?
        • Yahoo! Answers is an example of social (human) search where the questions are posted to everyone.
        • Illumio, by contrast, is a social search example that targets questions at selected people instead of everyone. It is a desktop client that indexes a user's computer files to build a user profile, which it then uses to identify which users are most likely to know the answer to a problem posted by someone in the community. Unlike Google Desktop Search, Illumio does not index the disk's contents online.
        • Amazon's Mechanical Turk lets users put questions to the rest of the world for a fee (an implication of the economization of life)
        • Exploring other approaches to social search - one possible direction is searching the bookmarks/tags/etc. of your close networks for more relevant results. This concept is from a Yahoo! researcher who gave a talk at Stanford a couple of months ago. He used the search query "Lisa" as an example (which in this case refers to Large Installation System Administration, a conference). If you search for this in Y! Search, you do not find a good match on the first screen; however, searching for "Lisa" among your colleagues' bookmarks could bump the most relevant result to the top (which was shown in his demo).
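The bookmark-based re-ranking described above can be sketched roughly as follows. This is only an illustration, not the Yahoo! researcher's actual method: the function name `rerank_by_bookmarks`, the example URLs, and the scoring rule (number of colleagues who bookmarked a result, with the original rank breaking ties) are all assumptions.

```python
from collections import Counter

def rerank_by_bookmarks(results, colleague_bookmarks):
    """Re-rank search results by how many colleagues bookmarked each URL.

    results: list of URLs in the search engine's original order.
    colleague_bookmarks: dict mapping colleague name -> set of bookmarked URLs.
    """
    counts = Counter()
    for bookmarks in colleague_bookmarks.values():
        for url in bookmarks:
            counts[url] += 1
    # Sort by bookmark count (descending); the original position breaks ties.
    return sorted(results, key=lambda url: (-counts[url], results.index(url)))

# Hypothetical data for the query "Lisa" (URLs are made up for illustration).
results = [
    "example.com/lisa-simpson",
    "example.com/lisa-name-meaning",
    "example.com/lisa-conference",
]
colleague_bookmarks = {
    "alice": {"example.com/lisa-conference"},
    "bob": {"example.com/lisa-conference", "example.com/other"},
}
reranked = rerank_by_bookmarks(results, colleague_bookmarks)
print(reranked[0])
```

With this made-up data, the conference page moves from third place to first because two colleagues bookmarked it, while the remaining results keep their original relative order.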
      • Prediction markets, another application of collective intelligence – Futures contracts for elections and upcoming events. The idea is that a free market will provide more accurate predictions than traditional research.
      • Google Zeitgeist provides a compilation of the top search terms in the US and around the world.
    • Maintenance vs. Discovery
      • Maintenance – focus is on maintaining relationships and keeping with what is familiar. Facebook is maintenance in the sense that it is primarily used to maintain relationships.
      • Discovery – focus is primarily on building new relationships and experiencing the novel. Online dating sites fit into this category.
    • Auto-creation vs. Static content
      • In auto-creation, new content is created without user effort, whereas traditionally users have to upload new content manually.
      • Facebook is an example of auto-creation, since the content of an individual's webpage is automatically updated by newsfeeds from their friends.
    • Search vs. Discovery
      • Limitations of search vs. the increasingly important role of serendipitous discovery, i.e., stumbling upon something unexpected and pleasing.
      • Progression from search to discovery
  • Insights gained

    • Mental model
      • If your data influences the system you will put more thought into the data you provide. Put another way, a user is only willing to give as much as he/she expects to get back.
      • Example: Users will give more thought to their Netflix ratings than to their Zagat ratings, because Netflix ratings feed back into the recommendations they receive
    • Success of e-businesses depends on ability to glean key data from vast quantities of available information
      • With e-business, data selection and functionalization of this data through experimentation is critical
      • Contrast with Wall St., where everybody works with the same data set and the way they look at it is what makes the difference.
    • Economization of life
      • As we move towards an ever-more digital world, our actions and interactions leave behind a digital trail of data which can be measured and analyzed by companies. Businesses use this data to improve user experience. You in turn benefit from the improved experience.
        • Example: Hitwise has detailed internet usage data which it sells to companies looking to gain insights into consumer behavior and to target their marketing. Much of this data comes from its relationships with ISPs. The GM of Hitwise maintains an interesting blog.
    • Experiment Design and Importance of Appropriate Metrics
      • Experiments should be run in parallel rather than sequentially. A simple way to see why: suppose you sell blue umbrellas on Saturday and yellow umbrellas on Sunday, and it rains on Saturday but not on Sunday. Any difference in sales is confounded with the weather, so how valid is the data?
      • The metrics should be agreed upon beforehand
      • Importance of Metrics
        • Amazon's checkout/recommendation page – should the purchase button be placed on left or right side of page? What are relevant metrics in this case?
          • (1) Conversion rate - percentage of visits placing an order (A 1% increase was observed with the cart on the right); (2) Order size: number of additional items in cart.
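A parallel experiment of the kind described above might be sketched like this. It is a minimal illustration under stated assumptions, not Amazon's actual procedure: the function names, the hash-based assignment, the visit and order counts, and the choice of a two-proportion z-test are all hypothetical.

```python
import hashlib
import math

def assign_variant(visitor_id):
    """Assign each visitor to one layout at random but consistently, so both
    variants run in parallel (same days, same weather, same promotions)."""
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return "left" if int(digest, 16) % 2 == 0 else "right"

def conversion_rate(orders, visits):
    """Metric agreed upon beforehand: fraction of visits that place an order."""
    return orders / visits

def two_proportion_z(orders_a, visits_a, orders_b, visits_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conversion_rate(orders_a, visits_a)
    p_b = conversion_rate(orders_b, visits_b)
    pooled = (orders_a + orders_b) / (visits_a + visits_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts, made up purely for illustration.
z, p_value = two_proportion_z(orders_a=500, visits_a=10_000,
                              orders_b=600, visits_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because visitors are split by a hash of their ID rather than by day, both button placements see the same external conditions, which is the point of the umbrella example.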
    • Levels of Analysis and Actionability
      • Pyramid model (page->visit->customer->network)
        • Frame individual pages (content) within a visit (intention, situation, mode)
        • Combine multiple visits to build a profile of a customer (demographics, behavior, personalization-based)
        • Aggregate customers to a network (social network research)
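The page -> visit -> customer -> network roll-up above could be sketched as follows. This is only an assumed data layout for illustration: the 30-minute inactivity timeout, the tuple format of `page_views`, and all function names are hypothetical choices, not something prescribed in class.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds; assumed inactivity timeout separating visits

def group_into_visits(page_views):
    """Frame pages within visits.

    page_views: list of (customer_id, timestamp_seconds, url) tuples.
    Returns: dict customer_id -> list of visits, each visit a list of URLs.
    """
    visits = defaultdict(list)
    last_seen = {}
    for customer, ts, url in sorted(page_views, key=lambda v: (v[0], v[1])):
        if customer not in last_seen or ts - last_seen[customer] > SESSION_GAP:
            visits[customer].append([])  # start a new visit
        visits[customer][-1].append(url)
        last_seen[customer] = ts
    return dict(visits)

def customer_profiles(visits):
    """Combine multiple visits into a simple per-customer profile."""
    return {c: {"visits": len(v), "pages": sum(len(p) for p in v)}
            for c, v in visits.items()}

def build_network(friendships):
    """Aggregate customers into a network: customer -> set of neighbours."""
    network = defaultdict(set)
    for a, b in friendships:
        network[a].add(b)
        network[b].add(a)
    return dict(network)

# Hypothetical click data.
page_views = [("u1", 0, "/home"), ("u1", 60, "/book/1"),
              ("u1", 7200, "/home"), ("u2", 100, "/home")]
visits = group_into_visits(page_views)
profiles = customer_profiles(visits)
network = build_network([("u1", "u2")])
print(profiles)
```

Each function corresponds to one layer of the pyramid, so a model or action can be attached at whichever level it applies.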
      • Models and Actions at each layer
        • Modeling Customer Behavior – the role of Behavioral Economics
          • Model: Attitude towards Complexity – Study done on conversion rate of customers given selection of 6 vs. 24 jams to sample.
          • Action: Apply results to business – e.g., the number of matches displayed, or the contrast between Woot and Zazzle
        • Modeling at Network Layer – Customer Lifetime Value
          • Model: Intrinsic value (how much the customer spends) and network value (how much the customer can get others to spend, i.e., his/her degree of influence)
          • Action: Amazon’s Share the Love Program; across-the-board service adjustments
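The split between intrinsic and network value above can be illustrated with a small sketch. The attribution rule used here (a fixed fraction of each influenced customer's spend is credited to the influencer) is an assumption made for illustration, as are the names and numbers; it is not a formula given in class.

```python
def customer_value(customer, spend, influence):
    """Return (intrinsic, network) value for one customer.

    spend: dict customer -> amount that customer spends.
    influence: dict customer -> {other customer: assumed fraction of the
               other's spend attributed to this customer's influence}.
    """
    intrinsic = spend[customer]
    network = sum(fraction * spend[other]
                  for other, fraction in influence.get(customer, {}).items())
    return intrinsic, network

# Hypothetical numbers, made up for illustration.
spend = {"ann": 100.0, "ben": 300.0, "cat": 500.0}
influence = {"ann": {"ben": 0.5, "cat": 0.2}}

ann_intrinsic, ann_network = customer_value("ann", spend, influence)
ben_intrinsic, ben_network = customer_value("ben", spend, influence)
print(ann_intrinsic + ann_network, ben_intrinsic + ben_network)
```

In this made-up example, ann spends the least herself but has the larger total value once her influence on others is counted, which is the kind of customer a referral program is meant to reward.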
    • Pricing of digital goods
      • Versioning - set different price levels and experiment, but be aware of how this may be perceived by consumers. (For certain items priced very low, probably because of a sale or offer, Amazon shows 'Click here to see the price' instead of the price itself, since the vendor does not want the price displayed directly on the page for branding and other reasons. Clicking the link takes the user to a separate page where this is explained and the low price is quoted.)
      • Let customers decide which price group they'd like to sign up for
    • Recommendation systems
      • 20% of Amazon's revenues are due to people clicking on recommendations.
      • Cleverset is a leader in this space.
      • Can make recommendations by modeling products or modeling people
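The two approaches above (modeling people vs. modeling products) might be sketched as follows. This is a toy illustration using Jaccard similarity over purchase sets; the similarity measure, function names, and data are assumptions, and it does not describe how Amazon or Cleverset actually compute recommendations.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_people(user, baskets):
    """Modeling people: score unseen items by how similar their buyers are to `user`."""
    scores = {}
    for other, items in baskets.items():
        if other == user:
            continue
        similarity = jaccard(baskets[user], items)
        for item in items - baskets[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda i: (-scores[i], i))

def recommend_by_products(user, baskets):
    """Modeling products: score unseen items by how similar their buyer sets
    are to the buyer sets of items `user` already owns."""
    buyers = {}
    for person, items in baskets.items():
        for item in items:
            buyers.setdefault(item, set()).add(person)
    scores = {}
    for candidate, candidate_buyers in buyers.items():
        if candidate in baskets[user]:
            continue
        scores[candidate] = sum(jaccard(candidate_buyers, buyers[owned])
                                for owned in baskets[user])
    return sorted(scores, key=lambda i: (-scores[i], i))

# Hypothetical purchase histories.
baskets = {
    "ann": {"book", "lamp"},
    "ben": {"book", "lamp", "desk"},
    "cat": {"book", "pen"},
}
people_recs = recommend_by_people("ann", baskets)
product_recs = recommend_by_products("ann", baskets)
print(people_recs, product_recs)
```

On this tiny data set both approaches rank the same item first, but on real data they can diverge, since one compares customers to customers and the other compares products to products.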

· Summary of links to resources and examples mentioned in class


· Readings

· Homework

· Initial contributors