a highly efficient stochastic version of NETRATE, called INFOPATH
- Manuel Gomez-Rodriguez, Jure Leskovec, Bernhard Schölkopf:
Structure and dynamics of information pathways in online media. WSDM 2013: 23-32
ABSTRACT
Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network.
Many times networks over which contagions spread are unobserved, and such networks are often dynamic and change over time.
In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data.
We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network.
The task then is to infer the edges and the dynamics of the underlying network.

We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than for on-going news events. Clusters of news media sites and blogs often emerge and vanish in a matter of days for on-going news events. Major social movements and events, such as the Libyan civil war or Syria's uprising, increase information transfer among blogs more than among mainstream media.

Categories and Subject Descriptors: H.2.8 [Database Management]
Keywords: diffusion networks, information cascades, blogs, news media, meme tracking, social networks

1. INTRODUCTION
We assume there is an unobserved dynamic network that changes over time, while we observe the node infection times of many different contagions spreading over the edges of the network.
The task then is to infer the edges and the dynamics of the underlying network.
For example, in case of information diffusion, the contagion represents a piece of information [16, 18] and infection events correspond to the times when nodes mention or copy the information from one of their neighbors in the network.

In the context of network diffusion, we often observe the temporal traces of diffusion while the pathways over which contagions spread remain hidden.
In other words, we observe the times when each node gets infected by the contagion, but the edges of the network that gave rise to the diffusion remain unobservable.
For example, we can often measure and observe the time when people decide to adopt a new behavior while we do not explicitly observe which neighbor in the social network influenced them to do so. In case of information diffusion, we often observe people (or media sites) talking about a new piece of information without explicitly observing the path it took in the information diffusion network to reach the particular node of interest.
Similarly, epidemiologists often observe when a person gets sick but usually cannot tell who infected her. In all these examples, one can observe the infection events themselves while not knowing over which edges of the network the contagions spread.
Therefore, one of the fundamental research problems in the context of network diffusion is inferring the structure of networks over which various types of contagions spread [10]. Moreover, many times the networks over which contagions spread are not static but change over time. Depending on the type of contagion, the time of day, or the death of existing nodes and the birth of new ones, the underlying network may dynamically change and shift over time.

In recent years, several network inference algorithms have been developed [9, 10, 12, 20, 24, 30]. Some approaches infer only the network structure [10, 30], while others infer not only the network structure but also [...]
However, to the best of our knowledge, previous work has always assumed networks to be static and contagion pathways to be constant over time. In most cases, though, networks are dynamic, and contagion pathways change over time, depending upon the contagions that propagate through them [22, 28].
For example, a blog can increase its popularity abruptly after [...] This again will lead to different emerging and vanishing information pathways, and thus to a time-varying underlying network.
In order to better understand these temporal changes, one needs to reconstruct the time-varying structure and underlying temporal dynamics of these networks and then study the information pathways of real-world events, topics or content.

Our approach to time-varying network inference. In this paper we investigate the problem of inferring dynamic networks based on information diffusion data.
We assume there is an unobserved dynamic network that changes over time, while we observe the node infection times of many different contagions spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.
We develop an efficient on-line dynamic network inference algorithm, INFOPATH, that allows us to infer daily networks of information diffusion between online media sites over a one year period using more than 179 million different contagions diffusing over the underlying media network.

We model diffusion processes as discrete networks of fully continuous [...] Our model allows information to propagate [...] The model considers the information which propagates [...]
Here, we generalize the model and develop a new inference method to support dynamic networks.
Our time-varying network inference algorithm, INFOPATH, uses stochastic gradients [26] to estimate the time-varying structure and temporal dynamics of the inferred network. The framework enables us to study the temporal evolution of information pathways in the online media space.

We apply the INFOPATH algorithm to synthetic as well as real Web information propagation data.
We study 179 million different information cascades spreading among 3.3 million blog and news media sites over a one year period, from March 2011 till February 2012. Results on synthetic data show that INFOPATH is able to track changes in the topology of dynamic networks and provides accurate on-line estimates of the time-varying transmission rates of the edges of the network. INFOPATH is also robust across network topologies, edge transmission models and edge transmission rate evolution patterns.

Experiments on large-scale real news and social media data lead to interesting insights and findings. For example, we find that the information pathways over which general recurrent topics propagate remain more stable over time, while unexpected events lead to dramatically changing information pathways. Clusters of mainstream news and blogs often emerge and vanish in a matter of days, and our on-line algorithm is able to uncover such structures. News events that involve large-scale social movements, such as the Libyan civil war, Egypt's revolution or Syria's uprising, result in a greater increase in information transfer among blogs than among mainstream media. Perhaps surprisingly, the numbers of mainstream media sites and blogs among the most influential nodes for most topics or news events are comparable. However, we find that growing numbers of influential blogs on some topics or news events are often temporally correlated with large-scale social movements (e.g., the Occupy Wall Street movement in Sept-Nov 2011).

Further related work.
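Previous methods for inferring diffusion [...] MULTITREE [12] infers the network connectivity using submodular optimization.
NETRATE [9] and CONNIE [20] infer not only the network connectivity but also the transmission rates of infection or the prior probabilities of infection, using convex optimization. In addition, there have been attempts to model information diffusion without assuming the existence of an underlying network [33, 32].
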
However, to the best of our knowledge, all previous approaches to network inference assume that both the network and the underlying dynamics of the edges are constant, i.e., the network structure and the transmission rate of each edge do not change over time.
Therefore, they consider the pathways over which information propagates to be time-invariant.
The main contribution of this paper is an algorithm for time-varying network inference, INFOPATH, that provides on-line time-varying estimates of the edges of the network as well as of their dynamic transmission rates.
This allows us to detect how information pathways emerge and vanish over time, and identify [...]

The remainder of the paper is organized as follows: in Section 2, we revisit the model of diffusion and state the dynamic network inference problem. Section 3 describes the proposed time-varying network inference method, called INFOPATH. Section 4 evaluates INFOPATH quantitatively and qualitatively using synthetic and real diffusion data. We conclude with a discussion of results in Section 5.

Various edge transmission likelihood models.

2. PROBLEM FORMULATION
In this section, we build on our fully continuous time model of diffusion and state the dynamic network inference problem.

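Observed data. For now let us consider a single static directed network. Multiple contagions spread over the edges of the network. As a contagion spreads over the edges of the network from infected to non-infected nodes, it creates a cascade. For each contagion c, we observe a cascade t^c, which is simply a record of the observed node infection times within a time window of length T^c. In an information propagation setting, each cascade corresponds to a different piece of information, and the infection time of a node is simply the time when the node first mentioned that piece of information.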
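As a small illustration of this data model, the sketch below groups raw mention records into cascades. The record layout `(node, contagion, time)`, the function name `build_cascades`, and the choice to measure times from the first infection in each cascade are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def build_cascades(mentions, window):
    """Group mention records into cascades of first-mention times.

    mentions -- iterable of (node, contagion, time) records (assumed layout)
    window   -- length of the observation window, counted from the first
                infection of each cascade (an assumption of this sketch)
    """
    first_mention = defaultdict(dict)
    for node, contagion, t in mentions:
        seen = first_mention[contagion]
        # A node is infected only once: keep its earliest mention.
        if node not in seen or t < seen[node]:
            seen[node] = t

    cascades = {}
    for contagion, times in first_mention.items():
        start = min(times.values())
        # Keep only infections that fall inside the observation window.
        cascades[contagion] = {
            node: t - start for node, t in times.items() if t - start <= window
        }
    return cascades


mentions = [
    ("a", "m1", 5.0), ("b", "m1", 7.0), ("a", "m1", 9.0),
    ("c", "m1", 30.0), ("b", "m2", 1.0),
]
cascades = build_cascades(mentions, window=10.0)
```

Nodes that never mention a piece of information, or that mention it only after the window closes, are simply absent from that cascade, which matches the idea that only infection times inside the window are observed.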
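
Multiple contagions may propagate over the same network, but we assume each contagion to propagate independently of the others.
We apply the Maximum Likelihood principle to infer the network that most likely generated the observed data. We first assume a static network and describe a generative model of information diffusion. We then generalize the model to dynamic networks.
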
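Pairwise transmission likelihood.
The first step in modeling diffusion [...] We focus on the fairly general case of heterogeneous pairwise transmission rates, i.e., infections can occur at different transmission rates over different edges of the network. Allowing edge transmission rates to increase and decay dynamically over time will enable us to infer time-varying diffusion networks.

The shape of the conditional transmission likelihood may depend on the particular setting in which propagation takes place (information, influence, diseases, etc.). In some scenarios, it may be possible to estimate a non-parametric likelihood, while in others expert knowledge may be used to decide upon a parametric model. For simplicity, we consider three well-known parametric models of edge transmission rates: exponential, power-law and Rayleigh, as shown in Table 1. Exponential and power-law likelihoods have been used to model information propagation in social and information networks.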
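To make the three parametric models concrete, here is a hedged sketch that evaluates the likelihood, survival function and hazard for a delay `dt` between the infection of a parent and of its child. Table 1 is not reproduced in these notes, so the formulas below are the textbook forms of the exponential, power-law (Pareto) and Rayleigh distributions, and the paper's exact parametrization may differ. The function names and the lower bound `delta` of the power-law model are assumptions.

```python
import math


def exponential(alpha, dt):
    """Return (likelihood, survival, hazard) for a delay dt > 0."""
    survival = math.exp(-alpha * dt)
    hazard = alpha                      # constant instantaneous infection rate
    return hazard * survival, survival, hazard


def power_law(alpha, dt, delta=1.0):
    """Return (likelihood, survival, hazard); defined only for dt > delta."""
    if dt <= delta:
        raise ValueError("power-law model requires dt > delta")
    survival = (dt / delta) ** (-alpha)
    hazard = alpha / dt                 # decays as the delay grows
    return hazard * survival, survival, hazard


def rayleigh(alpha, dt):
    """Return (likelihood, survival, hazard) for a delay dt > 0."""
    survival = math.exp(-alpha * dt ** 2 / 2.0)
    hazard = alpha * dt                 # grows linearly with the delay
    return hazard * survival, survival, hazard
```

In each case the likelihood is the product of the hazard and the survival function, so the three quantities in the table carry the same information in different forms.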

Table 1 columns: transmission likelihood, edge survival function, hazard function (instantaneous infection rate).

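Likelihood of a cascade. Consider some node i in a directed network. As in the independent cascade model [13], we assume that a node gets infected once the first parent infects it (i.e., a node can only be infected once).
Perhaps surprisingly, our continuous time model of diffusion is a special case of Aalen's additive regression model, which is often used in survival theory analysis [3]. In Aalen's model, the hazard function, or instantaneous infection rate, of node i is parametrized as a_{i,0}(t) + a(t)^T s_i(t), where a(t) is a vector that accounts for the effect of the collection of observable covariates s_i(t), and a_{i,0}(t) is a baseline. For the three pairwise transmission models (exponential, power-law and Rayleigh), it is easy to show that the hazard function of node i at time t_i is: [...]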
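The cascade likelihood itself is not reproduced in these notes. As a hedged sketch, the code below computes the log-likelihood of one cascade under the exponential model, assuming that parents act independently and that a node is infected by the first parent that succeeds. Under those assumptions each node contributes the log survival of every edge from an earlier-infected node, plus the log of the summed hazards if the node was infected inside the window. The function name and the argument layout are hypothetical.

```python
import math


def cascade_log_likelihood(times, rates, nodes, horizon):
    """Log-likelihood of one cascade under exponential transmission.

    times   -- dict: infected node -> infection time (uninfected nodes absent)
    rates   -- dict: (j, i) -> transmission rate of edge j -> i (missing = 0)
    nodes   -- iterable with every node of the network
    horizon -- length of the observation window
    """
    total = 0.0
    for i in nodes:
        t_i = times.get(i, math.inf)
        end = min(t_i, horizon)   # node i is exposed until infection or window end
        hazard = 0.0
        has_earlier_node = False
        for j, t_j in times.items():
            if j == i or t_j >= end:
                continue
            has_earlier_node = True
            a = rates.get((j, i), 0.0)
            total -= a * (end - t_j)   # log survival of edge j -> i
            hazard += a                # exponential hazard is constant
        # The first node of a cascade has no earlier node; it is treated as given.
        if t_i <= horizon and has_earlier_node:
            total += math.log(hazard) if hazard > 0 else -math.inf
    return total


ll = cascade_log_likelihood(
    times={"a": 0.0, "b": 0.5},
    rates={("a", "b"): 2.0, ("a", "c"): 0.1},
    nodes=["a", "b", "c"],
    horizon=10.0,
)
```

In this example node b contributes log(2) - 1 and the uninfected node c contributes -1, so a larger rate on the edge a -> c would be penalized because c survived the whole window.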

Dynamic network inference problem.

3. THE INFOPATH ALGORITHM
The problem defined by Eq. 8 is convex for the three transmission models we consider:
THEOREM 1 ([9]). Given log-concave survival functions and [...]
Therefore, we aim to find the unique optimal solution at any given time point t.

Stochastic gradient (SG) methods have been shown to be extremely [...] the basic projected stochastic gradient method [26] works well for [...]
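The update rule is not reproduced in these notes, so the following is only a minimal sketch of a projected stochastic gradient step, not the INFOPATH implementation. It assumes the exponential model, a static network, a single target node whose incoming rates are estimated, and a constant step size. It also projects onto rates no smaller than a small positive `floor` so that the logarithm of the summed hazards stays defined. All names and default values are hypothetical, and the time-dependent weighting of cascades that a dynamic network would need is omitted.

```python
import random


def projected_sgd(cascades, target, horizon, n_iters=20000, step=0.01,
                  init=1.0, floor=1e-3, seed=0):
    """Estimate exponential rates of the edges pointing into `target`.

    cascades -- list of dicts: infected node -> infection time
    Returns a dict: candidate parent -> estimated transmission rate.
    """
    rng = random.Random(seed)
    alpha = {}
    for _ in range(n_iters):
        times = rng.choice(cascades)          # sample one cascade per step
        t_i = times.get(target, float("inf"))
        end = min(t_i, horizon)
        parents = [j for j, t_j in times.items() if j != target and t_j < end]
        if not parents:
            continue
        for j in parents:
            alpha.setdefault(j, init)
        infected = t_i <= horizon
        hazard = sum(alpha[j] for j in parents)
        for j in parents:
            # Gradient of the negative log-likelihood with respect to alpha[j].
            grad = end - times[j]
            if infected:
                grad -= 1.0 / hazard
            # Gradient step followed by projection onto [floor, infinity).
            alpha[j] = max(floor, alpha[j] - step * grad)
    return alpha


# Synthetic check: node "a" infects node "b" with true rate 2.0.
data_rng = random.Random(1)
cascades = [{"a": 0.0, "b": data_rng.expovariate(2.0)} for _ in range(2000)]
estimate = projected_sgd(cascades, "b", horizon=100.0)
```

Because each step touches only one cascade, the cost per update does not grow with the number of cascades, which is the usual reason to prefer stochastic over batch gradients on large data.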

Evolution of node centrality.
Having studied the dynamics of [...]

Accuracy on real data. So far, we have used memes to trace the [...]

5. CONCLUSION
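All previous network inference algorithms have assumed diffusion networks to be static. Therefore, they consider the pathways over which information propagates to be static over time. In contrast, we developed an algorithm for time-varying network inference, INFOPATH. Our algorithm provides on-line time-varying estimates of the edges of the network as well as of the dynamic edge transmission rates, which allows us to detect how information pathways emerge and vanish over time.

We evaluated our algorithm on synthetic data and demonstrated that INFOPATH successfully tracks changes in the topology of dynamic networks, provides accurate on-line estimates of the time-varying edge transmission rates, and is robust across network topologies, edge transmission models and edge transmission rate evolution patterns.

We also ran INFOPATH on real data and investigated how real networks and information pathways evolve over time. We found that the information pathways over which general recurrent topics propagate remain relatively stable over time. In contrast, major real-world events lead to dramatic changes and shifts in the information pathways. We observed that clusters of mainstream news and blogs often emerge and vanish in a matter of days. We found that, for news involving the civil population, information transfer among blogs increases more than among mainstream media.

Our work also opens various avenues for future work. For example, a rigorous theoretical analysis of the convergence of the stochastic gradient method would provide further insight into its performance. Moreover, we note that changes in the inferred network structure can often be attributed to unexpected external real-world events. This raises two interesting questions. How can diffusion network inference be combined with methods for detecting external influence on the network [22]? How can dynamic network inference be extended to detect unexpected real-world events?
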
readme.txt
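Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved and need to be inferred from diffusion data. Moreover, such networks are often dynamic and change over time.

We have developed an on-line algorithm, INFOPATH, that relies on stochastic gradient descent to efficiently infer dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

For more information about the procedure see:
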
In order to compile on MacOS: 'make' OR 'make opt'.

Usage: Infer the network given a text file with cascades (nodes and timestamps):
./infer -i:cascades.txt
All arguments are shown any time ./infer is run.

Format input cascades: The cascades input file should have two blocks separated by a blank line.

Additional Tool: In addition, generate_nets is also provided. It allows building time-varying Kronecker and Forest-Fire networks and generating cascades with exponential, power-law and Rayleigh transmission models. Run it without any argument to see how to use it.