viral marketing

  • epidemics
  • information diffusion
  • viral marketing
  • temporal networks

Team

Chris, Alex, Fabio, Donna, Minji, Ellen

Project

Computing tasks:
1. Read Leskovec’s paper
2. Implement Leskovec’s stochastic model
3. Import data into MATLAB (or alternative SW)
– Get the data from here: http://socialnetworks.mpi-sws.org/data-wosn2009.html
– Build an undirected graph (for now) using the first two columns of the text file: http://socialnetworks.mpi-sws.mpg.de/data/facebook-links.txt.gz
4. Extract a subgraph of 1,000 nodes using BFS
5. Run Leskovec’s model on the subgraph
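As an illustrative sketch of steps 3 and 4 (not required course code), the edge-list import and the BFS extraction can be done in pure Python; the file is assumed to be the unzipped facebook-links.txt, and the starting node is a placeholder:

```python
from collections import defaultdict, deque

def build_graph(edge_lines):
    """Build an undirected adjacency list (dict of neighbor sets)
    from the first two columns of each line of facebook-links.txt."""
    adj = defaultdict(set)
    for line in edge_lines:
        cols = line.split()
        if len(cols) < 2:
            continue
        u, v = cols[0], cols[1]
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_subgraph(adj, source, max_nodes=1000):
    """Collect up to max_nodes nodes by breadth-first search from source,
    then keep only the edges with both endpoints inside the sample."""
    seen = {source}
    queue = deque([source])
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
                if len(seen) >= max_nodes:
                    break
    return {n: adj[n] & seen for n in seen}
```

Usage would be roughly `bfs_subgraph(build_graph(open("facebook-links.txt")), some_node_id)`; the same can be done in MATLAB with `graph` and `bfsearch`.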

Policy tasks:
1. Read Aral’s paper
2. Think about effective marketing strategies

COMM students: have a look at the questions listed in this document and try to articulate a paragraph-long answer to them. Submit your answers to the blog.

NETS students: due on Friday, April 3
1. Read and review the following papers:
a. http://cs.stanford.edu/people/jure/pubs/viral-tweb.pdf
b. “Identification of Influential Spreaders in Complex Networks” by Kitsak et al., Nature Physics, 2010.
2. For the following steps, use the network imported in Phase I.
3. Design viral marketing campaigns based on the following criteria:
a. Largest degrees
b. Eigenvector centrality
c. k-core decomposition (see Kitsak’s paper)
d. Cascading size (to be covered in class)
4. Code algorithms to implement each one of the above criteria:
a. Run your algorithms on the network extracted in Phase I
b. Use a number of seeds that varies from 1 to 20
c. Compare the effectiveness of each strategy
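Criteria (a)–(c) can be sketched without any graph library; the following is an illustrative pure-Python version (adjacency stored as a dict of neighbor sets), not the code the assignment mandates:

```python
def top_degree_seeds(adj, k):
    """Criterion (a): the k nodes with the largest degrees."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

def top_eigenvector_seeds(adj, k, iters=100):
    """Criterion (b): the k nodes with the highest eigenvector
    centrality, estimated by power iteration on the adjacency lists."""
    x = {n: 1.0 for n in adj}
    for _ in range(iters):
        y = {n: sum(x[m] for m in adj[n]) for n in adj}
        norm = max(y.values()) or 1.0  # avoid division by zero
        x = {n: y[n] / norm for n in adj}
    return sorted(x, key=x.get, reverse=True)[:k]

def core_numbers(adj):
    """Criterion (c): k-core decomposition by repeatedly peeling the
    minimum-degree node (the coreness of Kitsak et al.)."""
    deg = {n: len(adj[n]) for n in adj}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        node = min(remaining, key=deg.get)
        k = max(k, deg[node])  # core number never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                deg[nbr] -= 1
    return core
```

Criterion (d), cascade size, is simulation-based and is covered in class.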

Due on Friday, April 10.
All the students in the group should collaborate to solve the tasks below.
1. Code an algorithm to visualize the state of the campaign over time
a. Find a network layout that allows you to visualize the network in a convenient manner
b. Each node should change colors, from green to red, depending on the adoption level of each node
c. Edges should light up whenever they transmit information in a particular time
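As a sketch of the color rule in (b), one simple approach is to map each node’s adoption level in [0, 1] to a hex color interpolated from green to red; the actual drawing (layout choice in (a), edge highlighting in (c)) depends on the plotting tool you pick:

```python
def adoption_color(level):
    """Map an adoption level in [0, 1] to an RGB hex color fading
    from green (0, not adopted) to red (1, fully adopted)."""
    level = max(0.0, min(1.0, level))  # clamp out-of-range input
    r = round(255 * level)
    g = round(255 * (1.0 - level))
    return "#{:02x}{:02x}00".format(r, g)
```

These hex strings can be fed directly to most plotting libraries (e.g. as per-node colors in a scatter/graph plot), one frame per time step.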
Due on Friday, April 17.
All the students in the group should collaborate to solve the tasks below.
1. Consider other spreading criteria to increase influence
2. If you are a Facebook engineer, how would you define the edge probabilities (weight/bandwidth) in order to increase the effectiveness of a viral marketing campaign?
3. Think about the implications of this research for implementations and policy.
You should prepare, as a group, a slide show and present your work in class on Friday, April 24. The presentation should be less than 20 minutes long. Send the slides to Josh (email below) no later than Thursday, April 23!
 

Do not forget to update the blog with your advances! You will have to register your email the first time you post. If you have any figures, send them to Joshua Becker: jbecker[at]asc.upenn.edu. We will add them to the projects portfolio!


Sandra Bailon, viral marketing

Comments 11

  1. Stefania Maiman

In Phase II of our project, we examined how viral marketing strategies differ depending on which nodes in the network were seeded, based on largest degrees, eigenvector centrality, k-core decomposition, and cascading size. After many simulations aimed at obtaining roughly 15% adoption, we settled on a spreading probability, p, of 0.016. Our results are organized by method and number of seeds and show the fraction of nodes that adopt the “product”, averaged over 100 simulations. Here are our results:
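The comment does not include the simulation code itself; assuming a standard independent-cascade process with spreading probability p (each new adopter gets a single chance to convert each neighbor), one run and the 100-run average could be sketched as:

```python
import random

def simulate_cascade(adj, seeds, p=0.016, rng=None):
    """One independent-cascade run: each newly adopting node tries once
    to convert each neighbor with probability p. Returns the adopted
    fraction of the network."""
    rng = rng or random.Random()
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for nbr in adj[node]:
                if nbr not in adopted and rng.random() < p:
                    adopted.add(nbr)
                    nxt.append(nbr)
        frontier = nxt
    return len(adopted) / len(adj)

def mean_spread(adj, seeds, p=0.016, runs=100, seed=0):
    """Average adoption fraction over repeated seeded simulations."""
    rng = random.Random(seed)
    return sum(simulate_cascade(adj, seeds, p, rng) for _ in range(runs)) / runs
```

The p value, the single-attempt rule, and the fixed random seed are all assumptions for illustration; the group may have used a different spread model.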

    Largest Degree:
    Number of Seeds: 1 Result: 0.15403
    Number of Seeds: 2 Result: 0.152234
    Number of Seeds: 3 Result: 0.154102
    Number of Seeds: 4 Result: 0.156543
    Number of Seeds: 5 Result: 0.155355
    Number of Seeds: 6 Result: 0.156065
    Number of Seeds: 7 Result: 0.155578
    Number of Seeds: 8 Result: 0.156184
    Number of Seeds: 9 Result: 0.156853
    Number of Seeds: 10 Result: 0.157841
    Number of Seeds: 11 Result: 0.156584
    Number of Seeds: 12 Result: 0.156567
    Number of Seeds: 13 Result: 0.157253
    Number of Seeds: 14 Result: 0.158218
    Number of Seeds: 15 Result: 0.15832
    Number of Seeds: 16 Result: 0.158069
    Number of Seeds: 17 Result: 0.15629
    Number of Seeds: 18 Result: 0.157587
    Number of Seeds: 19 Result: 0.157921
    Number of Seeds: 20 Result: 0.15752

    Eigenvector Centrality:
    Number of Seeds: 1 Result: 0.001657
    Number of Seeds: 2 Result: 0.0452
    Number of Seeds: 3 Result: 0.058827
    Number of Seeds: 4 Result: 0.069913
    Number of Seeds: 5 Result: 0.062396
    Number of Seeds: 6 Result: 0.064276
    Number of Seeds: 7 Result: 0.099681
    Number of Seeds: 8 Result: 0.079465
    Number of Seeds: 9 Result: 0.087644
    Number of Seeds: 10 Result: 0.103731
    Number of Seeds: 11 Result: 0.11207
    Number of Seeds: 12 Result: 0.123481
    Number of Seeds: 13 Result: 0.144904
    Number of Seeds: 14 Result: 0.152007
    Number of Seeds: 15 Result: 0.150062
    Number of Seeds: 16 Result: 0.155263
    Number of Seeds: 17 Result: 0.157112
    Number of Seeds: 18 Result: 0.155026
    Number of Seeds: 19 Result: 0.157909
    Number of Seeds: 20 Result: 0.158783

    K-core:
    Note: When analyzing the size of the cores for implementing the k-core decomposition, we noticed that the highest core (47) was made up of ~500 nodes. Therefore, seeding the top core causes tremendous spread since seeding 500 nodes is much more significant than seeding 20 nodes. Here are the results of seeding the entire core:

    Core: 1 Result: 0.113559
    Core: 2 Result: 0.164686
    Core: 3 Result: 0.175547
    Core: 4 Result: 0.183317
    Core: 5 Result: 0.186323
    Core: 6 Result: 0.187406
    Core: 7 Result: 0.18787
    Core: 8 Result: 0.194621
    Core: 9 Result: 0.188718
    Core: 10 Result: 0.19481
    Core: 11 Result: 0.19472
    Core: 12 Result: 0.193636
    Core: 13 Result: 0.190666
    Core: 14 Result: 0.204797
    Core: 15 Result: 0.206568
    Core: 16 Result: 0.201897
    Core: 17 Result: 0.200081
    Core: 18 Result: 0.199878
    Core: 19 Result: 0.198581
    Core: 20 Result: 0.194038
    Core: 21 Result: 0.214042
    Core: 22 Result: 0.200447
    Core: 23 Result: 0.191034
    Core: 24 Result: 0.214792
    Core: 25 Result: 0.241468
    Core: 26 Result: 0.164804
    Core: 27 Result: 0.176777
    Core: 28 Result: 0.164914
    Core: 29 Result: 0.169519
    Core: 30 Result: 0.170274
    Core: 31 Result: 0.171262
    Core: 32 Result: 0.173851
    Core: 33 Result: 0.166429
    Core: 34 Result: 0.16973
    Core: 35 Result: 0.165838
    Core: 36 Result: 0.168541
    Core: 37 Result: 0.173326
    Core: 38 Result: 0.177772
    Core: 39 Result: 0.159496
    Core: 40 Result: 0.166992
    Core: 41 Result: 0.163682
    Core: 42 Result: 0.16334
    Core: 43 Result: 0.162834
    Core: 44 Result: 0.161622
    Core: 45 Result: 0.16168
    Core: 46 Result: 0.168362
    Core: 47 Result: 0.192433

To avoid this naive implementation, we decided to combine k-core decomposition with the other algorithms so that we could select the top 20 nodes within the highest core. For example, when combining k-core with degree centrality, we selected the top 20 nodes from the highest core based on their degrees:

    Number of Seeds: 1 Result: 0.137649
    Number of Seeds: 2 Result: 0.152938
    Number of Seeds: 3 Result: 0.151646
    Number of Seeds: 4 Result: 0.152765
    Number of Seeds: 5 Result: 0.152946
    Number of Seeds: 6 Result: 0.15526
    Number of Seeds: 7 Result: 0.154436
    Number of Seeds: 8 Result: 0.154334
    Number of Seeds: 9 Result: 0.155494
    Number of Seeds: 10 Result: 0.155742
    Number of Seeds: 11 Result: 0.156498
    Number of Seeds: 12 Result: 0.154422
    Number of Seeds: 13 Result: 0.154506
    Number of Seeds: 14 Result: 0.154106
    Number of Seeds: 15 Result: 0.15596
    Number of Seeds: 16 Result: 0.155022
    Number of Seeds: 17 Result: 0.154045
    Number of Seeds: 18 Result: 0.156237
    Number of Seeds: 19 Result: 0.156936
    Number of Seeds: 20 Result: 0.155538
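Assuming the core numbers have already been computed (e.g. by standard min-degree peeling), the combined k-core + degree selection described above reduces to a short helper; this is an illustrative sketch, not the group’s actual code:

```python
def top_core_by_degree(adj, core, k):
    """Select up to k seeds from the highest k-core, ranked by degree,
    instead of seeding the entire (~500-node) top core.
    adj: dict mapping node -> set of neighbors.
    core: dict mapping node -> its core number."""
    kmax = max(core.values())
    top_core = [n for n in adj if core[n] == kmax]
    return sorted(top_core, key=lambda n: len(adj[n]), reverse=True)[:k]
```

Swapping the degree key for a cascade-centrality score gives the second combined variant reported below.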

    Next, we also selected the top nodes from the highest core based on their cascade centralities:

    Number of Seeds: 1 Result: 0.137649
    Number of Seeds: 2 Result: 0.152938
    Number of Seeds: 3 Result: 0.151646
    Number of Seeds: 4 Result: 0.152765
    Number of Seeds: 5 Result: 0.152946
    Number of Seeds: 6 Result: 0.15526
    Number of Seeds: 7 Result: 0.154436
    Number of Seeds: 8 Result: 0.154334
    Number of Seeds: 9 Result: 0.155494
    Number of Seeds: 10 Result: 0.155742
    Number of Seeds: 11 Result: 0.156498
    Number of Seeds: 12 Result: 0.154422
    Number of Seeds: 13 Result: 0.154506
    Number of Seeds: 14 Result: 0.154106
    Number of Seeds: 15 Result: 0.15596
    Number of Seeds: 16 Result: 0.155022
    Number of Seeds: 17 Result: 0.154045
    Number of Seeds: 18 Result: 0.156237
    Number of Seeds: 19 Result: 0.156936
    Number of Seeds: 20 Result: 0.155538

And finally, we selected one node from each core, choosing each node based on its cascade centrality, to get these results:

    Number of Seeds: 1 Result: 0.043358
    Number of Seeds: 2 Result: 0.06721
    Number of Seeds: 3 Result: 0.112706
    Number of Seeds: 4 Result: 0.130003
    Number of Seeds: 5 Result: 0.140617
    Number of Seeds: 6 Result: 0.151632
    Number of Seeds: 7 Result: 0.150573
    Number of Seeds: 8 Result: 0.153809
    Number of Seeds: 9 Result: 0.154747
    Number of Seeds: 10 Result: 0.157013
    Number of Seeds: 11 Result: 0.157295
    Number of Seeds: 12 Result: 0.158308
    Number of Seeds: 13 Result: 0.156674
    Number of Seeds: 14 Result: 0.15841
    Number of Seeds: 15 Result: 0.159814
    Number of Seeds: 16 Result: 0.158921
    Number of Seeds: 17 Result: 0.15867
    Number of Seeds: 18 Result: 0.159403
    Number of Seeds: 19 Result: 0.159429
    Number of Seeds: 20 Result: 0.157662

    Cascade (using hill climbing algorithm to determine size of cascade):
    Number of Seeds: 1 Results: 0.141957
    Number of Seeds: 2 Results: 0.148079
    Number of Seeds: 3 Results: 0.150383
    Number of Seeds: 4 Results: 0.153696
    Number of Seeds: 5 Results: 0.153699
    Number of Seeds: 6 Results: 0.153551
    Number of Seeds: 7 Results: 0.153865
    Number of Seeds: 8 Results: 0.153313
    Number of Seeds: 9 Results: 0.15433
    Number of Seeds: 10 Results: 0.154668
    Number of Seeds: 11 Results: 0.154373
    Number of Seeds: 12 Results: 0.153658
    Number of Seeds: 13 Results: 0.152733
    Number of Seeds: 14 Results: 0.154657
    Number of Seeds: 15 Results: 0.153191
    Number of Seeds: 16 Results: 0.155978
    Number of Seeds: 17 Results: 0.155468
    Number of Seeds: 18 Results: 0.154967
    Number of Seeds: 19 Results: 0.1565
    Number of Seeds: 20 Results: 0.15717
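The hill climbing mentioned above can be sketched as greedy marginal-gain maximization over Monte-Carlo spread estimates (in the style of Kempe et al.); this is an illustrative pure-Python version, with p, the number of runs, and the random seed as tunable assumptions:

```python
import random

def estimate_spread(adj, seeds, p, runs, rng):
    """Monte-Carlo estimate of the mean independent-cascade size."""
    total = 0
    for _ in range(runs):
        adopted = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for node in frontier:
                for nbr in adj[node]:
                    if nbr not in adopted and rng.random() < p:
                        adopted.add(nbr)
                        nxt.append(nbr)
            frontier = nxt
        total += len(adopted)
    return total / runs

def hill_climb_seeds(adj, k, p=0.016, runs=50, seed=0):
    """Greedily add the node with the largest estimated marginal gain
    in cascade size until k seeds are chosen."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(k):
        candidates = [n for n in adj if n not in seeds]
        best = max(candidates,
                   key=lambda n: estimate_spread(adj, seeds + [n], p, runs, rng))
        seeds.append(best)
    return seeds
```

Because each greedy step re-simulates the cascade for every candidate, this is by far the most expensive of the four criteria; increasing `runs` trades time for a less noisy ranking.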

After analyzing the results obtained from several simulations of each algorithm, we found that our results converge to roughly the same spread of approximately 15% when seeding 20 nodes. Therefore, at this point in the project we do not have statistically significant results that would let us prefer one method over another. We believe this is due to the shape of our graph, or rather the sparsity of the network (average degree ~18). Once we have a more visual representation, we may be able to find a better algorithm or tune our p value to increase the spread.

  2. Ellen Lee

Minji – Here are the notes from the first meeting with Donna, Chris, and me.
    Chris & Donna – Please feel free to correct or update anything I say here. I may have misinterpreted some information.
    Chris – Were you able to reach Victor and ask if the brain group can join us instead?

    NOTES FROM 1ST OUTSIDE MEETING:

    1. We need NETS people, a Coder and Cloud person:
Alex and Fabio aren’t participating in the group anymore. One idea we discussed was asking Victor to have the NETS students from the brain project join our team. Chris said that he’ll contact Victor to ask.

    2. Overview of NETS data:
The dataset is a time graph from Facebook: a list of links and wall posts from the early days of Facebook. There are four categories (DVD, music, book, and video recommendations) spread via email referral. Chris admitted that he’s not sure about the trends/context, since Facebook has changed quite a bit since the data was collected. It could be used to understand product preferences. The data itself is a series of numbers with “stochastic probability,” where some information could be “random happenings” or “noise with potential meaning.” They show evident effects, but the cause is unclear.

    3. Background, Talents, and Goals:
    I asked Chris and Donna about their backgrounds, talents, and goals to get a better feel of how we can best utilize our skills to what end since the overall direction of the project is still undecided. Minji, maybe you can also add a little bit about yourself and the startup you’re working on?

    Chris: identifies as a Math person for NETS side of things but can also code. Is ready to join a startup that brokers data as middlemen between the Big Data producers and users by anonymizing it for the data producers’ privacy. They also build statistical models based on the data collected.

Donna: also identifies as a Math person on the NETS side but can also code. She is developing and marketing a new social media app, Toast, with a friend. Donna showed the app to us and it’s super neat! It has a very local focus on close friends and the activities they do (hanging out at restaurants, coffee shops, etc.), sort of a scrapbook. Their business model will most likely be selling data to local companies, who can use the info to better market to the app users. Much of what she’s doing in this project is hyper-relevant.

    1. Minji Kwak

      Hi all,

      Sorry it took so long to respond. It’s been crazy between Spring Break and midterms.
      Thanks for the update Ellen!

      Looks like we really need other engineering students on our team. Hopefully the brain people can join us.

      As for the NETS data, does that imply we will focus on Facebook for our project?

A little about my background, talents, and startup. I have a lot of experience in various marketing functions, mostly from the fashion/luxury/retail industry. I like analytics, strategy, and social media content management. I think they’re pretty relevant to the viral marketing aspect of this project; I also really love behavioral economics, which I think will play a strong role in enticing virality. Donna’s startup sounds very cool! Mine isn’t a social network; I’m working with a team of physicists, web developers, and film students on an online film distribution platform. Basically, many films are made each year that never see release; there are also foreign films that have trouble picking up distribution in the US. Our platform will stream those on a unique payment model for a limited amount of time, gaining these films more attention so they can eventually be picked up by bigger distributors. I work on marketing in a dual role, marketing to both consumers and filmmakers: enticing the former to use our platform and the latter to collaborate with us. It’s pre-launch, so there is a lot of initial brand strategy and PR involved.

Hope that helps in terms of seeing where my skills fit. I love marketing, so it’s great to work on this project. Please do update me on anything about the social platform we are focusing on or the status of the engineering side of the team. Thanks!

      Minji

      1. Ellen Lee

Hi Minji,

Thanks for providing your experience and interests here! I’m afraid I’m not involved in any startups, but I do have some experience doing online marketing work for an artist, which involves WordPress (web design and blogging), Facebook, and Instagram. I’m also involved with dotCOMM, a blog for COMM majors by COMM majors, as an editor. These jobs naturally demand that I know as much about viral marketing as possible, though I don’t have any chance of making a pretty penny like some of you!

As far as your question about the focus being Facebook specifically, my guess is no. I think we’re still at an early stage, and isn’t it possible that Victor might give us new data from other sources like Twitter or Amazon? Though I think what you’re implying is right on… It reminds me of the paper we read before spring break by boyd and Crawford, “Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon.” The tool of research can define the content. Viral marketing is technically anything that’s contagious, right? Actual word of mouth — where people talk about products in coffee shops, school, etc. — is a powerful factor, but it’s difficult to measure with Big Data. So most likely we’ll be limited to a platform like social media.

  3. Ellen Lee

    Testing! My computer has been a problem. I’m uploading this via my phone. I will upload my real feedback very soon. Thank you!

  4. Donna Lee

Thanks to Minji, who provided a nice summary of our reading. There are a few more things I would like to point out from the reading, and I will continue to write about them in terms of marketing strategy.
First, the authors argue that “viral product design may be more effective in encouraging new product adoption than traditional marketing strategies” (1633). I think we should go over this part closely in our project. We have to ask questions such as: what types of platforms do we have, can we utilize, and can we afford? Are there more creative and newer technological marketing strategies that we can use other than Facebook? These questions seem essential in the first part of our project. More than anything else, we should decide WHICH PLATFORM we are going to use and FROM WHOSE PERSPECTIVE.
Secondly, as the authors point out, we should consider the importance of social cost along with the financial cost of marketing. Since it has become much easier to send out information indiscriminately, “wrong” marketing strategies can quickly overwhelm people with marketing information, resulting in a negative brand image in the end. When we build an equation for the effectiveness of a marketing strategy, this component seems very important. By adding this parameter, we should look for the most effective stopping point instead of pushing outreach indefinitely.
Lastly, it would be interesting to do a cross-study between different products (not the different media platforms we are going to use for marketing) to see how much impact different product types have on marketing strategy. I cannot think of a specific example off the top of my head, but different products should be more suitable for different platforms. This part would mostly integrate sociological research on how people process information and what kind of expectations people have about receiving information. By meeting those expectations, or by coming up with a completely novel platform, we can penetrate markets more effectively. I’m currently working on a new social media platform with a friend, figuring out how to spread the product to as many people as possible. It would be interesting to apply ideas gathered from our project to a real situation. And I’m willing to share what happens in reality, too!

  5. Christopher Hockenbrocht

    A more informative post on what was done:

1. The paper. Leskovec’s model studies stochastic graph processes for four different types of networks. The model associates each edge with a recommendation for a product, stored as ‘metadata’ on the edge. Most notably, the model is time dependent and only counts the first purchase from a series of recommendations within the timeframe of a week (although the data analysis in the paper isn’t entirely dependent on this first purchase). The authors contrast this model with other models (SIS, SIR, SIRS, etc.), generally arguing that those aren’t the most applicable to studying viral marketing, despite the linguistic connection (“viral”). They look at various rates in the four dataset types (DVD, book, music, and video recommendation networks), and at subgraphs and the trends within them. Most notably, they treat strongly connected components as communities and study how communities merge. From there, the authors look at how the number of recommendations affects the probability of purchase and how various economic incentives (e.g., the possibility of a discount) affect purchase. They also, curiously, examine at what time of day things were purchased in the various networks, the amount of recommendation within certain communities, and various other facets, especially the probabilities within a network of responding to purchases.

Quite frankly, we will likely have to use an expanded set of more than 1,000 nodes to really apply this model to our dataset of Facebook friends, and work on classification.

  6. Christopher Hockenbrocht

I have a subset of nodes. I’m very interested to see whether any cascading effects can be observed in such a small dataset. Shortly I will begin modeling with Leskovec’s model, using the overlay of wall-post data with the friend network of the small subset of the Facebook data. Sorry to leave this post somewhat uninformative, but I am proceeding slowly with implementing this complex model. We will soon need to come up with a classification system for Facebook wall posts to gain the full benefit of the Leskovec model.

  7. Minji Kwak

    Hi all,

    I wanted to (1) sum up findings in Aral’s papers and (2) throw out general ideas for implementation in a viral marketing strategy, particularly for Facebook.

Aral’s paper examines whether broadcast notifications or personalized invites are more effective in driving adoption of a Facebook app. The research design randomized users of the app into having the broadcast feature, the invite feature, or neither. (Sidenote: he makes a distinction between viral characteristics and viral features, the former being more about content and psychological effects and the latter being the features that allow sharing.) Those who adopted *from* a share were randomized into a group as well. The findings show that while broadcast led to higher overall adoption than invites (with the no-sharing baseline being lowest), it had a smaller marginal increase than invites. This can be explained by the fact that less effort goes into a broadcast (just notifications sent out to the network) than into an invite (a personalized message): broadcasts are easier to send out, BUT since invites are more personalized and thought out, a larger share of their recipients adopt. In other words, broadcast = more people but a lower rate of adoption within the pool; invites = fewer people but a higher rate of adoption within the pool. These are the basic findings. Other details concerned alternative explanations of adoption rates that are related to viral features but not directly caused by them; most of these pointed to the fact that viral features alone do not fully explain sharing. Lastly, the discussion suggested that since broadcast and invite features garnered high adoption rates and were relatively cheaper than display ads, shifting more budget to these kinds of viral features may be more effective in the long run.

    It is important to note that this research was done in partnership with a Facebook app company, and thus, findings may be different based on different types of apps/companies/platforms.

    Now, onto the fun part – marketing strategy!

Facebook: again, I think depending on what it *is* that we want shared, the approach we use will be different. Since the overall findings point to invites having higher marginal adoption yet notification broadcasts having higher overall adoption, I want to see if a combination of the two trumps both. This may be complicated by the fact that some users will inherently prefer one over the other. Still, an experiment with a design similar to Aral’s, comparing invite/notification/both/baseline, could be illuminating. Also, these viral features may work for apps and games, but what about other services? I would find it hard to use a “broadcast” notification for the startup I’m in, which is about distribution of indie films. However, an invite strategy would work much better to create a sense of exclusivity: you only gain entrance through invites. It’d be interesting to see if certain strategies work significantly better for certain businesses. Also, on the point about ad budget for viral features vs. display, I think it’s a bit extreme to rule out display altogether. Sure, a viral feature may boost adoption, but if someone has never heard of the product or doesn’t know what it is, then what’s the point? I get a lot of “invites” to play some random game; I don’t know what they are, so I don’t sign up. A display ad may boost awareness and help drive higher rates of adoption.

    I think a good model of the invite-only strategy is how Google utilizes it for certain product launches. For example, its new Inbox is currently invite only. While it may seem counterintuitive for viral marketing, the idea here is that people want what they can’t have — and they’ll talk about it. I didn’t even know what Inbox was till a few people on my (ironically enough) Facebook feed started posting about wanting an invite. In this way, a viral feature also becomes a viral psychological characteristic.

    TL;DR – Invites and notifications boost adoption rates. We should test out these ideas for different types of business models on Facebook. Invite-exclusivity may be an idea to fiddle around with to gain WOM.

    -Minji

Leave a Reply

Your email address will not be published. Required fields are marked *