## Team

Ethan, John, Emmanuel, Gabriel, Yunhee, Tiffani

## Project

1. Read Mislove’s paper (click on name)

2. Import data into MATLAB (or alternative SW)

– Get the data from here: http://socialnetworks.mpi-sws.org/data-wosn2009.html

3. Replicate Figures 1, 2, 3, and 4 from Mislove’s paper.
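The link file can be loaded outside MATLAB as well. Here is a minimal Python sketch, assuming the WOSN2009 file is a whitespace-separated list of `user1 user2 timestamp` rows with `\N` marking links whose creation time is unknown (the format is an assumption, inferred from the `'\N'` check in the MATLAB code later in the comments):

```python
def load_links(path):
    """Parse an edge list; returns (u, v, ts) tuples, with ts=None for \\N rows.

    Assumes whitespace-separated "user1 user2 timestamp" lines, where the
    timestamp field is \\N when the link predates the crawl (an assumption).
    """
    links = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip malformed or header lines
            u, v, ts = parts[0], parts[1], parts[2]
            links.append((int(u), int(v), None if ts == "\\N" else int(ts)))
    return links
```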

Policy tasks:

1. Read Burt’s paper (click on name)

2. Think about how you can become a broker in a network

NETS students: due on Friday, April 3

1. Read and review Leskovec’s paper on Microscopic Evolution:

a. http://www.cs.cornell.edu/~lars/kdd08.pdf

2. Import the temporal network, including time stamps for the links, in your software of choice:

a. http://socialnetworks.mpi-sws.org/data-wosn2009.html

3. Find the person with the median degree and the person with the maximum degree.

4. Plot the graphs around these persons, with radius one.

5. Extract the subgraph with no time stamp (NaN). We call this graph the *initial graph*. Then, count the number of edges, wedges, triangles, spoons, quadrangles and pentagons in this graph. We call this set of subgraphs the *relevant* subgraphs.
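For task 5, the first three counts can be read off the adjacency matrix with standard closed-walk identities; here is a small Python sketch (NumPy for illustration; these are the same identities the MATLAB snippets in the comments use):

```python
import numpy as np

def count_small_subgraphs(A):
    """Edges, wedges, and triangles of an undirected graph, via
    edges = tr(A^2)/2, wedges = sum_v C(d_v, 2), triangles = tr(A^3)/6."""
    A = np.asarray(A, dtype=np.int64)
    d = A.sum(axis=0)
    edges = int(np.trace(A @ A)) // 2
    wedges = int(np.sum(d * (d - 1) // 2))
    triangles = int(np.trace(A @ A @ A)) // 6
    return edges, wedges, triangles

# Example: a triangle with one pendant edge (a "spoon" on 4 nodes)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
print(count_small_subgraphs(A))  # (4, 5, 1)
```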

All the students in the group should collaborate to solve the tasks below.

1. Design an *efficient* algorithm to count how many relevant subgraphs are created by the addition of an edge.

2. Implement Leskovec’s microscopic evolution algorithm and apply it to study the evolution of subgraphs over time in the temporal Facebook graph.

3. Compare the evolution of subgraphs in the real network and in Leskovec’s model.
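One possible shape for the efficient counting algorithm in task 1 is purely local updates: when edge (u, v) arrives, only counts touching u and v change. The Python sketch below covers just the three smallest relevant subgraphs; quadrangles, spoons, and pentagons would need slightly wider neighborhood scans.

```python
from collections import defaultdict

def add_edge(adj, counts, u, v):
    """Add undirected edge (u, v) and update running subgraph counts.

    New wedges are centered at u or v, so there are deg(u) + deg(v) of
    them (degrees taken before the insertion); each common neighbor of
    u and v closes exactly one new triangle."""
    counts["edges"] += 1
    counts["wedges"] += len(adj[u]) + len(adj[v])
    counts["triangles"] += len(adj[u] & adj[v])
    adj[u].add(v)
    adj[v].add(u)

adj = defaultdict(set)
counts = {"edges": 0, "wedges": 0, "triangles": 0}
for u, v in [(1, 2), (2, 3), (1, 3), (3, 4)]:  # triangle plus a pendant
    add_edge(adj, counts, u, v)
print(counts)  # {'edges': 4, 'wedges': 5, 'triangles': 1}
```

Replaying the time-stamped edges through such an update routine gives the per-edge subgraph trajectory without recounting from scratch.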

All the students in the group should collaborate to solve the tasks below.

1. How would you modify Leskovec’s algorithm to obtain an evolution of subgraphs that is closer to the real network?

2. What are the implications of this research for how we think about social capital and the opportunities that networks create for individuals and groups?

3. How can we rewire social networks to increase brokerage potential?

Do not forget to update the blog with your progress! You will have to register your email the first time you post. If you have any figures, send them to Joshua Becker: jbecker[at]asc.upenn.edu. We will add them to the projects portfolio!

## Comments (12)

Here is a link to our Phase IV answers!

https://docs.google.com/document/d/1j2n9xmlUaSQIyarY56_Wij8Gx-aEb5j7oaJZZsqDX44/edit

Here’s a gameplan:

Content for Presentation

1) Background:

Nets: Leskovec’s Microscopic Evolution

Comm: What did they read?

2) What We Have Done:

Phase II: Extracting the initial graph, and counting the relevant subgraphs.

We were given a Facebook graph on ~60,000 nodes. We reduced it to the first 6,000.

We extracted the initial graph by allowing only edges with a ‘\N’ (missing) timestamp between those first 6,000 nodes.

We counted the number of relevant subgraphs of the initial graph.

Phase III: Counting subgraphs over time, and implementing two new models.

We designed algorithms to count the increase in number of relevant subgraphs for each added edge.

3) Still TO DO:

We will find the number of relevant subgraphs over time for the real network. We start with the initial graph and for each added edge, we update the number of relevant subgraphs and plot this over time.

We will implement preferential attachment, and track the number of relevant subgraphs over time.

We will implement random-random triadic closure, and track the number of relevant subgraphs over time.

We will compare these two models to the results for the actual network.

Phase IV: Modifying our model to better replicate the evolution of the real network.

We will use a probabilistic combination of preferential attachment and triadic closure, and find which ratios work well to replicate the real network’s evolution.
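The probabilistic combination can be sketched as follows (Python for illustration; the mixing probability, the +1 degree smoothing, and the fallback when no 2-hop candidate exists are our own assumptions, not fitted values):

```python
import random

def next_neighbor(adj, src, p):
    """Pick the destination for a new edge from src.

    adj is a symmetric adjacency dict (node -> set of neighbors). With
    probability p the destination is chosen by preferential attachment
    (degree-proportional); otherwise by random-random triadic closure
    (a random neighbor of a random neighbor of src)."""
    candidates = [n for n in adj if n != src and n not in adj[src]]
    if not candidates:
        return None
    if random.random() < p or not adj[src]:
        # preferential attachment, with +1 smoothing so isolated
        # nodes remain reachable
        weights = [len(adj[n]) + 1 for n in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]
    # random-random triadic closure
    mid = random.choice(sorted(adj[src]))
    two_hop = [n for n in adj[mid] if n != src and n not in adj[src]]
    return random.choice(two_hop) if two_hop else random.choice(candidates)
```

Sweeping p between 0 and 1 and recounting the relevant subgraphs at each step would give the ratio comparison described above.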

**We should think about other modifications we can make to the model to further increase performance.**

2. What are the implications of this research for how we think about social capital and the opportunities that networks create for individuals and groups?

3. How can we rewire social networks to increase brokerage potential?

We should also try to answer these questions…

For a detailed analysis of the model we are using to evolve an accurate network, look here:

https://docs.google.com/document/d/1y-fKze33dd_cpDWZa7SXC5vNjgQqAAF_Of_vHmKY_N8/

At the end, it includes the model itself, and suggestions for calculating the relevant parameters. It also begins a discussion on how to validate the model so we can look at performance.

First 5 median degree nodes:

97

134

208

274

462

Only maximum degree node:

2332

Review of Microscopic Evolution of Social Networks

This paper addresses the evolution of networks with regard to node arrival and edge creation, which collectively lead to macroscopic properties of networks. The researchers utilized four large online social networks (Flickr, Delicious, Yahoo Answers, and LinkedIn) with full temporal information about node and edge arrivals. In contrast to other research in this area, this paper attempts to validate models by looking at the microscopic node behavior, rather than comparing the resulting macroscopic network statistics.

The researchers found a lot of variation in node arrival rates across the four networks, so the node arrival rate is an input to their model. Upon arrival, a new node draws its lifetime and keeps adding edges until it reaches the end of that lifetime. The edge inter-arrival times follow a power law with exponential cut-off. The researchers found that edge initiations accelerate with age (node degree), which leads to power-law out-degree distributions. They also found that 30%-60% of edges are local and close triangles. Therefore, they decided to have the source node choose an intermediate node uniformly from its neighbors, and then this node does the same. The researchers argue that this simple model accurately reflects the true evolution of networks, while maintaining enough flexibility to model networks with very different structures and evolutions.
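The node lifecycle described above can be sketched as follows (Python for illustration; the paper fits the lifetime and gap parameters per network, so the values here are placeholders, and rejection sampling is just one way to draw from a power law with exponential cutoff):

```python
import math
import random

def sample_gap(alpha=0.9, beta=0.002, cap=10_000):
    """Rejection-sample an edge inter-arrival gap g >= 1 from a density
    proportional to g^(-alpha) * exp(-beta * g).
    alpha, beta, and cap are placeholder values, not fitted ones."""
    while True:
        g = random.randint(1, cap)
        if random.random() < g ** (-alpha) * math.exp(-beta * g):
            return g

def edge_times(lifetime, **gap_params):
    """Times (since node arrival) at which a node adds edges: it keeps
    drawing gaps until its sampled lifetime runs out."""
    t, times = 0, []
    while True:
        t += sample_gap(**gap_params)
        if t > lifetime:
            return times
        times.append(t)
```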

The implications of this research are potentially quite useful. One implication is that all we need to know about a network to model its evolution on a macroscopic scale is the node arrival rate, the lifetime distribution, and the parameters of the gap distribution. This could allow us to evolve a large-scale social network years into the future to assess its macroscopic properties and compare them to the present. This type of research could be combined with research into information propagation to see what would happen to a piece of information in the predicted future evolution of a network. These are just a few of the implications of this research.

Review of “On the Evolution of User Interaction in Facebook”

This paper produces some very interesting findings. There has been a good bit of research into activity networks, primarily driven by a desire to determine the strength of ties with information other than binary connections. In this paper, however, the goal is to study the evolution of these activity networks both on a microscopic and macroscopic level. That is, they set out to determine how pairs of users interact (microscopic), and how the changing patterns of interaction on the activity network affect the overall structure of the network (macroscopic).

The researchers presented some very interesting findings. The study broke user pairs into two buckets: those which sent more than 5 wall posts in the first year of activity and those which sent fewer than 5. In the low-activity case, they found that in most cases the first wall post between user pairs was not triggered by the link creation. They found that interactions between pairs of users who interact infrequently are likely triggered by site mechanisms. Over 54% of the interactions between these infrequently interacting friends can be directly attributed to Facebook’s birthday reminders (something with which I have first-hand experience).

In the high-activity case, the median time to first interaction is only 6 days, instead of 74 days in the low-activity case. They also found that the activity level of user pairs tends to decrease over time, and that highly active user pairs exhibit this decreasing trend to a lesser degree. It was also surprising to discover that the activity network itself changes rapidly: over the course of only one month, 70% of links in the activity network disappear. Many graph-theoretic metrics (average node degree, average clustering coefficient, average path length) are also relatively stable despite these rapid shifts in graph structure.

It is worth noting some of the measurement methodologies that this research study utilized. For one, the researchers were only able to collect data for users who have public profiles and are part of the New Orleans regional network. These two factors likely add some regional and personal bias to the data collected (for example, requiring users to have public profiles might say something more substantial about a person, i.e., that they interact more or less on social networks). Another limitation is that this study only uses two measures of user interaction (friendship links and wall posts); Facebook, however, offers messaging, gaming, photos, pokes, likes, comments, and many more. While these surely are shortcomings, there is no direct reason to believe they significantly alter the data collected (or the conclusions made).

One experiment worth running is to attempt to find the increase in probability that two friends will interact in a given month after both friends become friends with the same new person in a short amount of time.

Here’s some MATLAB code and answers for step 5 of Phase II:

```matlab
% Create the full graph matrix A from edges with no timestamp ('\N')
A = sparse(63731,63731);
for i = 1:length(F1)
    if strcmp(Date(i), '\N')
        A(F1(i),F2(i)) = 1;
        A(F2(i),F1(i)) = 1;
    end
end

% Create the initial matrix B, the subgraph of A on the first 6,000 nodes
B = A(1:6000,1:6000);

% Eigenvalues of B (eig needs a full matrix)
e = eig(full(B));

% Degree vector for B; used to compute the number of wedges
degreesInitial = full(sum(B,2));

% Edges of B, checked in 2 different ways
edges  = sumEigsPower(e,2) / 2;
edges2 = sum(degreesInitial) / 2;

% Wedges of B
wedges = 0;
for i = 1:length(degreesInitial)
    if degreesInitial(i) > 1
        wedges = wedges + nchoosek(degreesInitial(i),2);
    end
end

% Triangles of B
triangles = sumEigsPower(e,3) / 6;

% Quadrangles of B: closed 4-walks give
% tr(A^4) = 8*quadrangles + 4*wedges + 2*edges
quadrangles = (sumEigsPower(e,4) - 4*wedges - 2*edges) / 8;

% Not sure how to compute number of spoons...
spoons = 1;

% Guess on formula for computing number of pentagons...
pentagons = (sumEigsPower(e,5) - 10*spoons - 12*triangles) / 10;

% Takes a vector and a power, and sums the elements of the vector each
% raised to that power.
function s = sumEigsPower(vector,power)
    s = sum(vector.^power);
end
```

———————————————————————————–

Number of relevant subgraphs for the initial graph B. The actual graph A was on ~63,000 nodes, which was too many for computation, so I used a subgraph B that contains the first 6,000 nodes of A.

Edges: 54,534

Wedges: 2,375,307

Triangles: 119,490

Quadrangles: 2,602,600

Spoons: ?

Pentagons: 98,861,000, before taking out 10*spoons…?
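On the open spoon and pentagon questions above: one candidate answer, which we have only checked numerically on small graphs rather than proved, is to count spoons (a triangle with one pendant edge) directly from per-node triangle counts and then use the closed-5-walk decomposition tr(A^5) = 30·triangles + 10·spoons + 10·pentagons. A Python sketch:

```python
import numpy as np

def spoon_pentagon_counts(A):
    """Count spoons and pentagons of an undirected graph.

    Uses spoons = sum_v t_v * (d_v - 2), where t_v = (A^3)_vv / 2 is the
    number of triangles through v, plus the candidate decomposition
    tr(A^5) = 30*triangles + 10*spoons + 10*pentagons."""
    A = np.asarray(A, dtype=np.int64)
    d = A.sum(axis=0)
    A3 = A @ A @ A
    t_v = np.diag(A3) // 2                 # triangles through each node
    triangles = int(np.trace(A3)) // 6
    spoons = int(np.sum(t_v * (d - 2)))
    tr_A5 = int(np.trace(A3 @ A @ A))
    pentagons = (tr_A5 - 30 * triangles - 10 * spoons) // 10
    return spoons, pentagons

# Sanity check on K5, where the counts are easy to get by hand:
# C(5,3) = 10 triangles, 10*3*2 = 60 spoons, 5!/(5*2) = 12 pentagons.
K5 = np.ones((5, 5), dtype=np.int64) - np.eye(5, dtype=np.int64)
print(spoon_pentagon_counts(K5))  # (60, 12)
```

Note this decomposition differs from the guessed coefficients in the MATLAB snippet, so the quadrangle and pentagon numbers above would be worth recomputing.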

Here’s a review of Leskovec’s paper on Microscopic Evolution:

In this study, the researchers used data from four real-world networks, in which they knew the time each edge was introduced into the network. This allowed them to create a model that accurately reproduces these real-world networks by comparing the likelihood of adding specific edges given the current state of the network. The optimal model’s parameters should allow the synthetic network to most closely replicate the real-world network’s behavior at every time step. They were able to find optimal parameters by using the maximum likelihood principle: they compared the probability that each edge observed in the real networks would be added under candidate models with known parameters.

Overall, they found that the preferential attachment model performed very well, and naturally accounted for most of the phenomena found in the network, most importantly the power-law degree distribution of the nodes. However, the preferential attachment model does not produce edges according to the relative distance between nodes, which is the other most important characteristic of these real-world networks. As mentioned, in these real networks as many as ~60% of added edges complete triangles in the graph. The likelihood of adding an edge decreases exponentially with increasing geodesic distance. They found that this fact, combined with the fact that the number of nodes within a certain distance increases exponentially, means that the likelihood of forming non-local connections decreases roughly double-exponentially.

They used a few different models to compare the likelihood of adding edges to optimize parameters. These models contained biases based on the degree of each node (as in preferential attachment), age of each node, number of common neighbors between nodes, time passed since addition of an edge to a node, and combinations of these parameters. Models considering the formation of triangles as a bias were also considered.

After the application of maximum likelihood, they formed a model that succeeded in very accurately replicating the evolution of real world networks. This model uses the fact that edge initiation closely follows the degree bias of preferential attachment, producing a power law with exponential cutoff. It also takes into account that edge destinations are most often local, and thus allows for many triangle creating edges – as apparent in the real networks. Finally, the model even considers how each node’s time between adding a new edge is relevant. With all of this information, they created an accurate model that parallels the nature of the evolution of real world networks.

Phase 2 answers by Tiffany Chan and Yunhee Park

https://docs.google.com/a/sas.upenn.edu/document/d/1pzYyTxG2UTmXPX9uvdAfZT6Q9L_bPIApy97Yrqd4RK4/edit

Hi, here is my summary and reflection on Burt’s article,

In his article, Burt outlines the mechanism by which brokerage provides social capital. He explains that brokerage becomes social capital through providing a vision of options otherwise unseen across the structural holes between groups. First, he hypothesizes that people who stand near the holes in a social structure are more likely to generate good ideas. Because opinion and behavior are more homogeneous within than between groups, those who are connected across groups are more capable of developing alternative ways of thinking and behaving. New ideas emerge from selection and syntheses across the structural holes between groups. These brokers are more likely to express ideas, less likely to have ideas dismissed, and more likely to have their ideas evaluated as valuable.

In addition, Burt explains that social capital occurs where people have an advantage due to their location in a social structure. Here, he brings in the concept of division of labor, in which people realize the incentives to integrate rather than specialize. People at great geographic distance can communicate with one another through surprisingly few intermediaries because of bridges between social worlds.

Through a comprehensive analysis of archival and survey data on several hundred managers in a large company, Burt identifies evidence of a vision advantage associated with brokerage. The setting presented numerous opportunities for brokerage, and managers who brokered connections across structural holes were rewarded with higher compensation, positive performance evaluations, and promotions. The results show that managers whose networks spanned structural holes were more likely to express an idea and to discuss it with colleagues, to have the idea engaged by senior management, and to have it judged valuable (p. 386).

In conclusion, Burt elucidates that good ideas do, in fact, emerge from the interaction of social worlds. However, he states that these ideas spread in a way that would continue segregation between the worlds.


Hey guys,

Here are my thoughts on Burt’s article:

What is interesting about Burt’s article is that he believes the key to innovation is not creating a completely new idea, but recognizing the opportunity to re-use an ordinary idea from another group. This comes down to how brokers utilize information gained from other groups. To become a broker in a network, one has to stand near the holes in a social structure, which gives a higher exposure to good ideas. Simply speaking, a broker serves as a bridge connecting two clusters that hold non-redundant information, while a structural hole is the space between the people one is connected to. There are two representations of a structural hole: cohesion and structural equivalence. Cluster A in figure 1 is an example of cohesion, where relationships within the group are denser and stronger than those between groups. Structural equivalence, as shown in cluster D in figure 1, means that the cluster has strong relations with another group. The benefits of structural holes are that they maximize the number of non-redundant contacts and make it easy to identify and draw on primary contacts from different social worlds.

In his experiment, Burt asked managers in an electronics company to come up with an idea to improve the supply chain. The managers were then asked whom they discussed the idea with, whom they discuss supply-chain issues with in general, whether those contacts discuss ideas with one another, and so on. The results showed that people whose networks bridge structural holes have higher compensation, positive performance evaluations, more promotions, and more good ideas. As a result, these brokers are more likely to express ideas, less likely to have their ideas dismissed by judges, and more likely to have their ideas evaluated as valuable.

Brokerage provides social capital, and this matters because networks are a resource in themselves: having access to these networks grants brokers knowledge that other people may not have access to, as well as the experience needed to translate information across groups. Moreover, this access provides the opportunity to explore good ideas, since brokers are constantly receiving new ideas from other clusters.

Tiffany