Literature study: Structure and Dynamics of Information Pathways in Online Media

a highly efficient stochastic version of NETRATE, called INFOPATH

Manuel Gomez-Rodriguez, Jure Leskovec, Bernhard Schölkopf:
Structure and dynamics of information pathways in online media. WSDM 2013: 23-32

ABSTRACT

	Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved, and such networks are often dynamic and change over time. In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

	We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one-year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than for on-going news events. Clusters of news media sites and blogs often emerge and vanish in a matter of days for on-going news events. Major social movements and events involving the civilian population, such as Libya's civil war or Syria's uprising, lead to an increased number of information pathways among blogs as well as an overall increase in the network centrality of blogs and social media sites.

	Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications - Data mining
	General Terms: Algorithms; Experimentation.
	Keywords: Networks of diffusion, Information cascades, Blogs, News media, Meme-tracking, Social networks

1. INTRODUCTION

We assume there is an unobserved dynamic network that changes over time, while we observe the node infection times of many different contagions spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

	For example, in the case of information diffusion, the contagion represents a piece of information [16, 18] and infection events correspond to times when nodes mention or copy the information from one of their neighbors in the network.

	In the context of network diffusion, we often observe the temporal traces of diffusion while the pathways over which contagion spreads remain hidden. In other words, we observe the times when each node gets infected by the contagion, but the edges of the network that gave rise to the diffusion remain unobservable. For example, we can often measure and observe the time when people decide to adopt a new behavior while we do not explicitly observe which neighbor in the social network influenced them to do so. In the case of information diffusion, we often observe people (or media sites) talking about a new piece of information without explicitly observing the path it took in the information diffusion network to reach the particular node of interest. Similarly, epidemiologists often observe when a person gets sick but usually cannot tell who infected her. In all these examples, one can observe the infection events themselves while not knowing over which edges of the network the contagions spread. Therefore, one of the fundamental research problems in the context of network diffusion is inferring the structure of networks over which various types of contagions spread [10]. Moreover, many times networks over which contagions diffuse are not static but change over time. Depending on the type of contagion, the time of the day, or the death of existing nodes and the birth of new ones, the underlying network may dynamically change and shift over time.

	In recent years, several network inference algorithms have been developed [9, 10, 12, 20, 24, 30]. Some approaches infer only the network structure [10, 30], while others infer not only the network structure but also the strength or the average latency of every edge in the network [9, 20]. However, to the best of our knowledge, previous work has always assumed networks to be static and contagion pathways to be constant over time. In most cases, though, networks are dynamic, and contagion pathways change over time, depending upon the contagions that propagate through them [22, 28]. For example, a blog can increase its popularity abruptly after one of its posts turns viral; this may create new edges in the information transmission network, and so the content the blog produces in the future will likely spread to larger parts of the network. Similarly, at any given time a particular unexpected event may occur and a topic or piece of news may become very popular for a limited period of time. This again will lead to different emerging and vanishing information pathways, and thus to a time-varying underlying network. In order to better understand these temporal changes, one needs to reconstruct the time-varying structure and underlying temporal dynamics of these networks and then study the information pathways of real-world events, topics or content.

	Our approach to time-varying network inference. In this paper we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the node infection times of many different contagions spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network. We develop an efficient on-line dynamic network inference algorithm, INFOPATH, that allows us to infer daily networks of information diffusion between online media sites over a one-year period using more than 179 million different contagions diffusing over the underlying media network.

	We model diffusion processes as discrete networks of fully continuous temporal processes occurring at different rates, building on our previous work [9, 11]. Our model allows information to propagate at different rates across different edges by adopting a data-driven approach, where only the recorded temporal diffusion events are used. The model considers the information which propagates through the network due only to diffusion, while ignoring any external sources [22]. However, our original diffusion model considered only static networks [9]. Here, we generalize the model and develop a new inference method to support dynamic networks. Our time-varying network inference algorithm, INFOPATH, uses stochastic gradient [26] to provide estimates of the time-varying structure and temporal dynamics of the inferred network. The framework enables us to study the temporal evolution of information pathways in the online media space.

	We apply the INFOPATH algorithm to synthetic as well as real Web information propagation data. We study 179 million different information cascades spreading among 3.3 million blog and news media sites over a one-year period, from March 2011 to February 2012. Results on synthetic data show that INFOPATH is able to track changes in the topology of dynamic networks and provides accurate on-line estimates of the time-varying transmission rates of the edges of the network. INFOPATH is also robust across network topologies and temporal trends of edge transmission dynamics.

	Experiments on large-scale real news and social media data lead to interesting insights and findings. For example, we find that the information pathways over which general recurrent topics propagate remain more stable over time, while unexpected events lead to dramatically changing information pathways. Clusters of mainstream news and blogs often emerge and vanish in a matter of days, and our on-line algorithm is able to uncover such structures. News events that involve large-scale social movements, such as the Libyan civil war, Egypt's revolution or Syria's uprising, result in a greater increase in information transfer among blogs than among mainstream media. Perhaps surprisingly, the numbers of mainstream media sites and blogs among the most influential nodes for most topics or news events are comparable. However, we find that growing numbers of influential blogs on some topics or news events are often temporally correlated with large-scale social movements (e.g., the Occupy Wall Street movement in Sept-Nov 2011).

	Further related work. Previous methods for inferring diffusion networks [9, 10, 12, 20] also use a generative probabilistic model for modeling cascading processes over networks. NETINF [10] and MULTITREE [12] infer the network connectivity using submodular optimization. NETRATE [9] and CONNIE [20] infer not only the network connectivity but also transmission rates of infection or prior probabilities of infection using convex optimization. Moreover, there have also been attempts to model information diffusion without assuming the existence of an underlying network [32, 33].

	However, to the best of our knowledge, all previous approaches to network inference assume the network and the underlying dynamics of the edges to be constant, i.e., the network structure and the transmission rates of each edge do not change over time. Therefore, they consider the pathways over which information propagates to be time-invariant. The main contribution of this paper is to combine stochastic gradient and the diffusion model introduced in [9] to develop an efficient on-line network inference algorithm that provides time-varying estimates of the edges of a network and the transmission rates of each edge. This allows us to detect how information pathways emerge and vanish over time, and identify when nodes produce highly viral content.

	The remainder of the paper is organized as follows: in Section 2, we revisit the model of diffusion and state the dynamic network inference problem. Section 3 describes the proposed time-varying network inference method, called INFOPATH. Section 4 evaluates INFOPATH quantitatively and qualitatively using synthetic and real diffusion data. We conclude with a discussion of results in Section 5.

	Various pairwise (edge) transmission likelihood models.

2. PROBLEM FORMULATION

	In this section, we build on our fully continuous time model of diffusion [9, 11]. We start by briefly describing the generative model for the observed data. We then revisit how to compute the likelihood of a cascade using the model and state the continuous time network inference problem for both static and dynamic networks. Throughout the section, we explicitly point out which assumptions of the original model need to be extended in order to support dynamic networks.

	Observed data. For now let's consider a single static directed network. Multiple contagions propagate over the edges of the network. As a contagion spreads from infected to non-infected nodes over the edges of the network, it creates a cascade. For each contagion $c$, we observe a cascade $t^c$, which is simply a record of observed node infection times during a time window of length $T^c$. In an information propagation setting, each cascade corresponds to a different piece of information and the infection time of a node is simply the time when the node first mentioned the piece of information $c$.
	Several contagions may propagate over the same network, but we assume each contagion propagates independently of the others. Given a set of node infection times of many different contagions, our goal is to infer the underlying dynamic network over which contagions propagated. We apply the Maximum Likelihood principle in order to infer the network that most likely generated the observed data. We proceed by assuming a static network and describe the generative model of information diffusion. We then generalize the model to dynamic networks.

	Pairwise transmission likelihood. The first step in modeling diffusion dynamics is to consider pairwise node interaction. For every pair of nodes $(j, i)$, we define a pairwise transmission rate $\alpha_{j,i}$ which models how frequently information spreads from node $j$ to node $i$, i.e., the strength of edge $(j, i)$. We pay attention to the rather general case of heterogeneous pairwise transmission rates, i.e., infections can occur at different transmission rates over different edges of the network. As $\alpha_{j,i} \to 0$, the expected transmission time from node $j$ to node $i$ becomes arbitrarily long. In contrast with the original model [9], we will later allow the transmission rates $\alpha_{j,i}$ to change over time. In particular, we will allow the transmission rates $\alpha_{j,i}$ to change across cascades but not within a cascade. Allowing edge transmission rates to dynamically increase and decay over time will enable us to infer time-varying diffusion networks.

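	The shape of the conditional transmission likelihood may depend on the particular setting in which propagation takes place (information, influence, diseases, etc.). In some scenarios it may be possible to estimate a non-parametric likelihood, while in others expert knowledge can be used to decide on a parametric model. For simplicity, we consider three well-known parametric models of the edge transmission rate: exponential, power-law and Rayleigh, as shown in Table 1. Exponential and power-law likelihoods have been used to model information propagation in social and information networks.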

	For each edge: likelihood function, survival function, and hazard function (the instantaneous infection rate).
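Table 1 is not reproduced in these notes, so the following is only a sketch of the three parametric models using the standard closed forms for these distributions, which are assumed to match the table. Each hypothetical helper takes the delay `dt` between the infection of node j and node i and a rate `alpha`, and returns the likelihood, survival and hazard values. The minimum delay `delta` of the power-law model is an assumed parameter.

```python
import math

def exponential(dt, alpha):
    """Return (likelihood, survival, hazard) for a delay dt >= 0."""
    survival = math.exp(-alpha * dt)
    hazard = alpha                      # constant instantaneous infection rate
    return hazard * survival, survival, hazard

def power_law(dt, alpha, delta=1.0):
    """Power-law model; no transmission is possible before the delay delta."""
    if dt < delta:
        return 0.0, 1.0, 0.0
    survival = (dt / delta) ** (-alpha)
    hazard = alpha / dt                 # hazard decays with the elapsed time
    return hazard * survival, survival, hazard

def rayleigh(dt, alpha):
    """Rayleigh model; the hazard grows linearly with the elapsed time."""
    survival = math.exp(-0.5 * alpha * dt * dt)
    hazard = alpha * dt
    return hazard * survival, survival, hazard
```

In all three cases the likelihood is the product of the hazard and the survival function, which is why only those two quantities are computed explicitly.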

	Likelihood of a cascade. Consider some node $i$ in a directed network. Node $i$ can get infected by any of its parents (i.e., nodes pointing to $i$). Once infected, node $i$ can then also spread the contagion to its children (i.e., nodes $i$ points to). As in the independent cascade model [13], we assume that a node gets infected once the first parent infects it (i.e., a node can get infected only once). Then, the likelihood of infection of node $i$ at time $t_i$ given a collection of previously infected nodes $(t_1, \ldots, t_N \mid t_k \le t_i)$ results from summing over the likelihoods of the mutually disjoint events that each node is the first parent that generated the infection event of our node $i$:
	Perhaps surprisingly, our continuous time model of diffusion is a particular case of Aalen's additive regression model, frequently used in survival theory analysis [3]. In Aalen's model, the hazard function, or instantaneous infection rate, of node $i$ is parametrized as $\alpha_{i,0}(t) + \alpha(t)^{T} s_i(t)$, where $\alpha(t)$ is a vector that accounts for the effect of a collection of observable covariates $s(t)$ and $\alpha_{i,0}(t)$ is a baseline. It is easy to show that the hazard function of node $i$ at time $t_i$ for the three pairwise transmission models: exponential, power-law and Rayleigh, has the following form:
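The two formulas announced in the passage above did not survive in these notes. As a hedged reconstruction from the verbal description, and assuming $f(\cdot)$, $S(\cdot)$ and $H(\cdot)$ denote the pairwise likelihood, survival and hazard functions, the first-parent sum and the hazard of node $i$ could be written as follows. The covariate notation $s_j(t_i)$ is an assumption chosen to match the Aalen form.

```latex
% Likelihood that node i is infected at t_i: node j is the first parent,
% and i survived every other previously infected node k.
f(t_i \mid t_1, \ldots, t_N \setminus t_i; A)
  = \sum_{j : t_j < t_i} f(t_i \mid t_j; \alpha_{j,i})
    \prod_{k \neq j,\; t_k < t_i} S(t_i \mid t_k; \alpha_{k,i})

% Hazard of node i: additive in the rates of the previously infected nodes.
h_i(t_i) = \sum_{j : t_j < t_i} H(t_i \mid t_j; \alpha_{j,i})
         = \sum_{j : t_j < t_i} \alpha_{j,i} \, s_j(t_i),
\qquad
s_j(t_i) =
\begin{cases}
  1             & \text{exponential} \\
  1/(t_i - t_j) & \text{power-law} \\
  t_i - t_j     & \text{Rayleigh}
\end{cases}
```

Since $f = H \cdot S$ for every pair, the first expression factors into the product of all survival terms times the summed hazard, which is what makes the log-likelihood tractable.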

	Dynamic network inference problem.
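The problem statement itself (Eq. 8) is not reproduced in these notes. As a rough sketch only, assume it is a maximum likelihood problem over the cascades observed up to time t, with non-negative transmission rates. The code below evaluates such an objective for the exponential model. The exponential down-weighting of older cascades, the `decay` parameter and both function names are illustrative assumptions, not taken from the paper.

```python
import math

def cascade_log_likelihood(cascade, alpha, horizon):
    """Log-likelihood of one cascade under the exponential model.

    cascade: dict node -> infection time (only infected nodes appear)
    alpha:   dict (j, i) -> transmission rate; missing edges count as 0
    horizon: end of the observation window
    """
    nodes = {n for edge in alpha for n in edge} | set(cascade)
    infected = {n: t for n, t in cascade.items() if t <= horizon}
    total = 0.0
    for i in nodes:
        t_i = infected.get(i, horizon)   # uninfected nodes survive until the horizon
        hazard_sum, parents = 0.0, 0
        for j, t_j in infected.items():
            if t_j < t_i:
                rate = alpha.get((j, i), 0.0)
                total -= rate * (t_i - t_j)   # log-survival of i against j
                hazard_sum += rate            # exponential hazard is the rate
                parents += 1
        if i in infected and parents > 0:     # the first infected node has no parent term
            if hazard_sum <= 0.0:
                return -math.inf
            total += math.log(hazard_sum)
    return total

def weighted_negative_log_likelihood(cascades, alpha, t, decay=1.0):
    """Objective at time t; older cascades get a smaller (assumed) weight."""
    objective = 0.0
    for cascade in cascades:
        start = min(cascade.values())
        if start > t:
            continue                          # cascade has not started yet
        weight = math.exp(-decay * (t - start))
        objective -= weight * cascade_log_likelihood(cascade, alpha, horizon=t)
    return objective
```

Minimizing this quantity over non-negative `alpha` separately at each time point t would give one network estimate per time point, which is the sense in which the inferred network is time-varying.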

3. THE INFOPATH ALGORITHM

	The problem defined by Eq. 8 is convex for the three transmission models we consider. Therefore we can aim to find the unique optimal solution at any given time point t:
	THEOREM 1 ([9]). Given log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, the network inference problem defined by Eq. 8 is convex in A.
	Stochastic gradient (SG) methods have been shown to be extremely successful for taking advantage of the structure exhibited by the optimization problem stated in Eq. 8. They have received increasing attention in the machine learning literature [4, 5, 7, 8, 29]. Although many optimization methods based on stochastic gradient descent have been proposed, we have found that in practice the basic projected stochastic gradient method [26] works well for our problem. Other more sophisticated methods, like the stochastic average gradient [29] or incremental average gradient [7] do not offer a significant advantage. Therefore, we proceed with the basic stochastic gradient method in the remainder of the paper.
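To make the idea of a projected stochastic gradient step concrete, here is a minimal sketch for the exponential model. It is not the authors' implementation: the step size, the initial rate `init`, the `floor` guard against division by zero and the function names are all assumptions made for illustration. Each step takes one cascade, follows the negative gradient of its log-likelihood, and projects the rates back onto the non-negative orthant.

```python
import random

def sgd_step(alpha, cascade, horizon, nodes, step=0.05, init=0.5, floor=0.05):
    """One projected stochastic gradient step on a single cascade."""
    infected = {n: t for n, t in cascade.items() if t <= horizon}
    for i in nodes:
        t_i = infected.get(i, horizon)
        parents = [j for j, t_j in infected.items() if t_j < t_i]
        if not parents:
            continue
        rate_sum = sum(alpha.get((j, i), init) for j in parents)
        for j in parents:
            grad = t_i - infected[j]                  # from the log-survival term
            if i in infected:
                grad -= 1.0 / max(rate_sum, floor)    # from the log-hazard term
            # Gradient step followed by projection onto alpha >= 0.
            alpha[(j, i)] = max(alpha.get((j, i), init) - step * grad, 0.0)
    return alpha

def fit(cascades, horizon, nodes, n_iter=2000, seed=0, alpha=None):
    """Run n_iter steps on randomly sampled cascades.

    Passing the previous estimate as alpha warm-starts the next time point.
    """
    rng = random.Random(seed)
    alpha = {} if alpha is None else alpha
    for _ in range(n_iter):
        sgd_step(alpha, rng.choice(cascades), horizon, nodes)
    return alpha
```

For a single edge whose observed delay is always 0.5, the gradient vanishes at a rate of 1/0.5 = 2, so `fit([{"a": 0.0, "b": 0.5}], horizon=10.0, nodes=["a", "b"])` should settle close to a rate of 2 on the edge `("a", "b")`.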

	Evolution of node centrality. Having studied the dynamics of edges in the network, we now move towards investigating the network centrality of blogs and mainstream media sites over time for different topics and world events. To measure the network centrality of node $S$ in the network at time $t$, we first compute the shortest path length from $S$ to any other node $R$ in the network. Then the centrality of node $S$ is defined as $\sum_R 1/d(S,R)$, where $d(S,R)$ is the shortest path length from $S$ to $R$ (if $R$ is not reachable from $S$ then $d(S,R) = \infty$). For networks with core-periphery structure, nodes with high centrality are typically located in the "central" core of the network. Figure 6 plots the percentage of blogs among the top 100 most central sites over time for eight different topics/events of 2011. Perhaps surprisingly, we observe there is about the same number of mainstream media and blogs in the top-100 most central nodes for most networks: the share of blogs in the top-100 does not typically decrease below 30% or increase above 70%. For some topics, mainstream media are always more central (e.g., baseball and NBA in Figures 6(a, b)). In contrast, for other topics, blogs dominate mainstream media for significant amounts of time (e.g., Gaddafi in Fig. 6(c)). Centrality of mainstream media and blogs can be relatively constant (Fig. 6(a,b)) or more time-varying (Fig. 6(c,h)). We find that a significant rise in the number of central blogs is often temporally correlated with increasing social unrest (e.g., the Occupy Wall Street movement in Sept-Nov 2011 in Fig. 6(f)).
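The centrality measure defined above can be sketched with a breadth-first search. This version assumes an unweighted directed graph, so $d(S,R)$ is a hop count; whether the paper uses hop counts or weighted path lengths is not stated in these notes. The function name and graph representation are illustrative.

```python
from collections import deque

def centrality(graph, source):
    """Sum of 1/d(source, R) over every node R reachable from source.

    graph: dict node -> iterable of successor nodes (directed, unweighted).
    Unreachable nodes have d = infinity and therefore contribute 0.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(1.0 / d for node, d in dist.items() if node != source)

graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
print(centrality(graph, "a"))  # 1 + 1 + 1/2 = 2.5
```

A node with no outgoing paths, such as `"d"` here, gets centrality 0.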

	Accuracy on real data. So far, we have used memes to trace the flow of information over the Web and have made several qualitative observations about the structure and dynamics of information pathways in online media. We now proceed and attempt to also quantitatively evaluate INFOPATH on real data. In the case of real data, the ground-truth information diffusion network is impossible to obtain. However, we can use the temporal dynamics of hyperlinks created between news sites as a proxy for real information flow. Thus, by observing the times when sites create hyperlinks, our goal is to infer the 'targets' of the links (i.e., infer the hyperlink network from the hyperlink times).

5. CONCLUSION

	All previous network inference algorithms have assumed diffusion networks to be static. Therefore, they have considered the pathways over which information propagates to be static over time. In contrast, we developed an algorithm for time-varying network inference, INFOPATH. Our algorithm provides on-line time-varying estimates of the edges of the network as well as the dynamic edge transmission rates, which allows us to detect how information pathways emerge and vanish over time.

	We evaluated our algorithm on synthetic data and demonstrated that INFOPATH successfully tracks changes in the topology of dynamic networks, provides accurate on-line estimates of the time-varying edge transmission rates and is also robust across network topologies, edge transmission models and patterns of evolution of edge transmission rates.

	We also ran INFOPATH on real data and investigated how real networks and information pathways evolve over time. We found that information pathways over which general recurrent topics propagate remain relatively stable across time. In contrast, major real-world events lead to dramatic changes and shifts in the information pathways. We observed that clusters of mainstream news and blogs often emerge and vanish in a matter of days. We discovered that there is an earlier and greater increase in information transfer among blogs than among mainstream media for news involving the general population and social unrest, such as the Libyan civil war, the Egyptian revolution, Syria's uprising and the Occupy Wall Street movement.

	Our work also opens various avenues for future work. For example, a rigorous theoretical analysis of the convergence of our stochastic gradient descent method would provide further insights into its performance. Moreover, we notice that many times the changes in the inferred network structure could be attributed to sudden external real-world events. This opens two interesting questions. How can diffusion network inference be combined with methods for detecting external influence in networks [22]? And how can dynamic network inference be extended to detect unexpected real-world events based on a stream of documents? Last, many times not only information but also the sentiment attached to a piece of information spreads through the network [19]. It would be interesting to think about inference of signed networks, where the positive/negative valence of an edge models the sentiment relationship between a pair of nodes. Overall, such methods would allow us to improve our understanding of the current landscape of news coverage, the role that news media plays in framing the discussion of important topics, and the evolving ecosystem that news media occupies.

readme.txt

	Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved and need to be inferred from the diffusion data. Moreover, such networks are often dynamic and change over time.

	We have developed an on-line algorithm, INFOPATH, that relies on stochastic gradient descent to efficiently infer dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

	For more information about the procedure see: Structure and Dynamics of Information Pathways in On-line Media Manuel Gomez-Rodriguez, Jure Leskovec and Bernhard Schölkopf http://snap.stanford.edu/infopath/

	In order to compile on MacOS: 'make' OR 'make opt'. In order to compile in Linux: 'make linux' OR 'make opt_linux'. The code should also work in Windows but you will need to edit the Makefile. 'make opt' and 'make opt_linux' compile the optimized (fast) version of the code.

	Usage: Infer the network given a text file with cascades (nodes and timestamps): ./infer -i:cascades.txt All arguments are shown any time ./infer is run.

	Format input cascades: The cascades input file should have two blocks separated by a blank line.
	- A first block with a line per node. The format of every line is <id>,<name>
	- A second block with a line per cascade. The format of every line is <cascade id>;<id>,<timestamp>,<id>,<timestamp>,<id>,<timestamp>...
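As an illustration of the input format described above, the following sketch reads such a file's contents into Python structures. It is not part of the INFOPATH distribution. It assumes node ids are integers and timestamps are numeric, which the readme does not state, and `parse_cascades` is a hypothetical helper name.

```python
def parse_cascades(text):
    """Parse the two-block cascade format into (nodes, cascades)."""
    nodes, cascades = {}, {}
    in_cascades = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # The first blank line after the node block starts the cascade block.
            in_cascades = in_cascades or bool(nodes)
            continue
        if not in_cascades:
            node_id, name = line.split(",", 1)
            nodes[int(node_id)] = name
        else:
            cascade_id, _, body = line.partition(";")
            fields = body.split(",")
            # Fields alternate: node id, timestamp, node id, timestamp, ...
            cascades[cascade_id] = [
                (int(fields[k]), float(fields[k + 1]))
                for k in range(0, len(fields) - 1, 2)
            ]
    return nodes, cascades

sample = "0,siteA.com\n1,siteB.org\n\n7;0,10.5,1,12.0\n8;1,3.0\n"
nodes, cascades = parse_cascades(sample)
```

Here `nodes` maps each id to its name and `cascades` maps each cascade id (kept as a string) to a list of (node id, timestamp) pairs.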

	Additional Tool: In addition, generate_nets is also provided. It allows building time-varying Kronecker and Forest-Fire networks and generating cascades with exponential, power-law and Rayleigh transmission models. Please run it without any argument to see how to use it.

Posted on 2021-08-10 10:59 by 海阔凭鱼跃越
