
Complex Network Analysis


Tags: complex networks


Document summary

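A complex network is a network that exhibits some or all of the following properties: self-organization, self-similarity, attractors, small-world structure, and scale-free structure. In mathematical terms, it is a graph whose topological structure is sufficiently complex. Complex networks have characteristics that simple networks, such as lattice networks and random graphs, do not have …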

Document preview

Dynamic Networks
Kathleen M. Carley
kathleen.carley@cs.cmu.edu
Center for Computational Analysis of Social and Organizational Systems
http://www.casos.cs.cmu.edu/

Overview
• Networks
• Common metrics
• Dynamics
• Spatial
• Where does data come from

April 2012 Copyright © Kathleen M. Carley, CASOS, ISR, SCS, CMU

Network Science - Connecting the Dots and Trails to Predict and Explain Behavior
Identify Groups / Threats Using Network and Geo-Spatial Information
How does space constrain and enable networks

What is a Network?
Ties Between Nodes (links)
• Who do you like or respect?
• Transfer of resources
• Authority lines
• Association or affiliation
• Alliance
• Substitution
• Precedence
• Proximity
Nodes
• People
• Units of action
• Coalition partners
• Departments
• Resources
• Ideas or Skills
• Events
• Nation-states
Networks are ubiquitous

Internet Example
[Figure: network diagram with nodes labeled AOL, Microsoft, Yahoo, AT&T]
Some Nodes Stand Out

What is Dynamic Network Analysis?
• The study of how entities are constrained and enabled by the relations among them and the processes that lead to change in these relations
• Combines social network analysis, link analysis, multi-agent modeling, machine learning, graph theory, and non-parametric statistics
• Complex Meta-Networks: multiple networks, multiple types of nodes, multiple relations
• Key Issues: Scalability, Robustness, Flexibility, Error
  – Relations among nodes are flexible and vary in strength and certainty
  – Node membership may be questionable
  – Networks may be large (10^6 nodes)
  – Classes of data may not be discoverable

Dynamic Network Analysis
• The Network Perspective
  – It’s not just the elements (composition) of a system, but how they are put together
  – Non-reductionist, holistic
• What are networks and how do you analyze them?
• Social Network Analysis, Link Analysis, Network Text Analysis, Dynamic Network Analysis
• Network Elites
• Groups and clustering
• Consensus and networks
• Network Topology
• Compare and contrast networks
• Network dynamics
• Network Visualization

Where are Dynamic Network Analysis Models Used
• Designing adaptive teams for Command and Control
• Evaluating organizational structures and evaluating changes such as downsizing
  – E.g., hospitals, health departments NY Dutchess County
• Estimating effectiveness and adaptability of new structures
  – E.g., SSG – Comcargru, Army Unit of Action, CPOF (IRAQ)
• Estimating size, shape and vulnerabilities in organizational designs and covert networks
  – E.g., NASA, Counter-terrorism, drug, terrorist, tax-avoiders
• Network management and IT intervention/effectiveness analysis
  – E.g., NASA, Knowledge Wall in JTF, supply chains, various companies
• Impact analysis of actions in asymmetric warfare situations
• Impact on cities of weaponized biological or chemical attacks
• Identifying key actors and emergent groups
  – E.g., Counter terrorism, Health Units, Merchant Marine
• Prevention and intervention
  – E.g., IRS tax avoidance interventions

Connect & Dis-Connect the Dots!
Standard Social Network Measures
Rank | Degree                | Betweenness           | Closeness
1    | 0.417 Mohamed Atta    | 0.334 Nawaf Alhazmi   | 0.571 Mohamed Atta
2    | 0.389 Marwan AlShehhi | 0.318 Mohamed Atta    | 0.537 Nawaf Alhazmi
3    | 0.278 Hani Hanjour    | 0.227 Hani Hanjour    | 0.507 Hani Hanjour
4    | 0.278 Nawaf Alhazmi   | 0.158 Marwan AlShehhi | 0.500 Marwan AlShehhi

So – why is this hard?
• The Network
  – Vast quantities of data
  – Multi-mode – people, events, etc.
  – Multi-plex – many connections e.g.
financial and authority
• The Information
  – Intentional misinformation – e.g., aliases
  – Inaccurate information – e.g., typos
  – Out-of-date information
  – Incomplete information
• Dynamic
  – Learning
  – Recruitment
  – Attrition
  – …

Typical Mainstream Data Structure
Variables (attributes) in columns, cases (individuals) in rows
Case | Age | Gender | Education    | Income | State
1001 | 28  | 0      | < highschool | 28,000 | PA
1002 | 35  | 1      | highschool   | 54,000 | MA
1003 | 26  | 0      | < highschool | 26,000 | MA
1004 | 40  | 1      | Bachelors    | 65,000 | MA
1005 | 24  | 0      | highschool   | 27,500 | PA
1006 | 55  | 1      | Ph.D.        | 82,800 | PA
1007 | 31  | 1      | Ph.D.        | 73,000 | MA
1008 | M   | 0      | highschool   | 33,500 | PA
…
Analysis consists of correlating attributes, regression, ANOVA …

The Network Perspective
• Standard Statistics: Attributes. Atomistic; actors as independent. Actor state matters.
• Social Network Analysis: Relations. Interdependence; actors constrained and enabled by links. Actor state irrelevant.
• Dynamic Network Analysis: Relations + Attributes. Actors constrained and enabled by links. Actor state impacts perception of and use of links.
[Figure: contact network with nodes labeled by state (NY, LA, SF, NJ, PA, GA, TX, FL). Discovery of HIV: sexual contacts among gay men w/ unusual cancers, traced by Bill Darrow of the CDC]

Illustrative Networks
[Figure panels: High School Dating; Physicist Collaborations; Contagion of TB; Fresh Water Food Web; Sexual Contacts; The Internet; Topic Network (Email); Email Profile; al Qaida 2004]
Nodes have attributes that matter

Haitian SMS Phone Co-Occurrence Network

London Riots User Network

Informal and Formal Structure
Each person is embedded in many networks
[Figure labels: Normal Person, Drug Networks, Cocaine User, Family, Work, Friend]
People with Different Roles have Different Networks

Amygdala?
• Controls processing and memory of emotional reactions
• Suggests building ties is an “emotional” act
• Since emotion after an event increases the chance the event will be remembered, recall of alters is higher if
  – Interaction is repeated (chance)
  – Interaction is emotional
[Figure labels: Joy, Sadness, Trust, Disgust, Fear, Anger, Surprise, Anticipation. From Wikipedia]

The Size of Your Network Depends on the Size of Your Brain
Bickart, Wright, Dautoff, Dickerson and Barrett, 2010. “Amygdala volume and social network size in humans.” Nature Neuroscience

Dunbar’s Number
150 (well, actually 147.8)
• Hypothesized in 1993 – the number of individuals that one person could follow based on extrapolations of neocortex size
• Wide variance – the original 95% CI was between ~100 and ~230, and was due to an extrapolation “well beyond” those known for primates
Adapted from Dunbar 1993

How Many People Do You Know?
3-5, ~15, ~50, ~150, ~500, ~2000
Adapted from Zhou et al 2005

In other words
• Network topology matters
  – Networks are not random
  – Not all networks are scale free
• Informal and formal networks differ
• Individual’s ego network
  – Constrains and enables behavior
  – Individual differences result in differences in ego net composition
  – Ties are emotion + frequency
  – Psycho-physical constraints impose limits
• Overall size
  – Coordination constraints impose limits
  – Vary by media
Social Networks are Linked to Other Networks
• WHO: People, Teams, Organizations
• WHAT: Tasks, Events
• HOW: Resources, Knowledge
• WHERE: Location
• WHY: Beliefs
[Chart: series Baseline, Without Top Conservative, Without Top Liberal; axis values not recoverable]

Meta-Network: multi-mode, multi-plex, multi-level
People by node class:
• People x People: Social Network
• People x Organizations: Affiliation Network
• People x Expertise: Knowledge Network
• People x Activities: Assignment Network
• People x Events: Participation Network
• People x Locations: Presence Network
Networks among the remaining node classes (Organizations, Expertise, Activities, Events, Locations), in slide order: Organizational, Capability, Information, Action, Needs, Precedence, Participation, Contributing Expertise, Contributing Activity, Precedence, Presence, Availability, Happenings, Happenings, Border

[Figure: tool-chain workflow; labels only partially legible. Recoverable: Web Scraper; Build Network; Text Mining; Data to Model; Process Data; Collection; Analyze (Statistics, SNA, DNA, Link Analysis); Assess Change, What-if Analysis (Multi-agent DNA)]

Meta-Network
[Table: example meta-matrix stored as DyNetML. One row per named individual (among them Abdul Rahman Yasin, Abu Abbas, Hisham Al Hussein, Abu Madja, Hamsiraji Ali, Abdurajak Janjalani, Jamal Mohammad Khalifa, Osama bin Laden, Saddam Hussein, Muwafak al-Ani); meta-matrix entity columns Agent, Knowledge, Resource, Task-Event, Organization, Location, Role, Attribute. Cell contents not recoverable from the extraction]

*ORA: From Networks to Patterns
ORA: a DNA statistical analysis tool for locating patterns and identifying vulnerabilities
• Organized by function not measure; e.g.,
  – Key Entity Report
  – Group Locator Report
• Import/Export tools
• Linkage to mysql
• Visualization components
• Batch, web, thick-client
• Can handle large (10^6-node) networks quickly

Internet Alliances
[Figure: nodes labeled Microsoft, AOL, AT&T, Yahoo]

Network of Critical Actors

Identify: Who are the Key Players?
Drilling down… *ORA’s Key Entity Report shows 3 agents critical to operations.
Narrow our focus from set of interstitial members to small group of leaders.

Network Analysis Enables Management and Disruption
[Figure annotations: “A vulnerability to exploit!”, “This guy needs help!”]

Centralities
• Degree Centrality – Node with the most connections
• Betweenness Centrality – Node in the most best paths
• Requires symmetric data
• Eigenvector Centrality – Node connected best overall
• Doesn’t work if there are components
• Closeness Centrality – Node that is closest to all other nodes
Issue: Measures are highly correlated

Who Is “Key”?

Who Is “Key”?
Betweenness Centrality
Betweenness Centrality
• Frequency with which a node lies along the shortest path between two other nodes
• Computed as: $b_k = \sum_{i \neq k \neq j} g_{ikj} / g_{ij}$, where $g_{ij}$ is the number of geodesic paths from i to j and $g_{ikj}$ is the number of those paths that pass through k
• Index of potential for gate-keeping, brokering, controlling the flow, and also of liaising otherwise separate parts of the network
• Interpreted as indicating power and access to diversity of what flows; potential for synthesizing
• Sometimes interpreted as “connecting” groups
• Very “expensive” to compute

Closeness Centrality
• Measured as:
  – Sum of distances to all other nodes
  – Computed as marginals of symmetric geodesic distance matrix
• Closeness is an inverse measure of centrality
• Index of expected time until arrival for given node of whatever is flowing through the network
  – Gossip network: central player hears things first

Eigenvector Centrality
• A node will have a high score if it is connected to many nodes that are themselves highly connected
• Computed as: $\lambda V = A V$, where A is the adjacency matrix of the network and V is eigenvector centrality. V is the principal eigenvector of A, and $\lambda$ is the corresponding (largest) eigenvalue
• Indicator of popularity and group-bonding
• Like degree, this is an index of exposure, risk
• Tends to identify centers of large cliques
• Often identified as leader of self-contained group, sometimes referred to as leader of leaders
• Very “expensive” to compute
Adapted from Steve Borgatti 2004

Cutpoints
• Nodes which, if deleted, would disconnect net
[Figure: nodes labeled Bob, Bonnie, Biff, Bill, Betty, Betsy]
© Steve Borgatti 2004

Moving Beyond Single Measures
Issue: Centrality measures are highly correlated
[Figure: axes Betweenness and Degree; annotations “A Bridge!”, “Sink? Or Source?”]
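The betweenness and closeness definitions on the slides above can be made concrete with a small sketch. This is not ORA's implementation; it is a minimal pure-Python version of Brandes-style accumulation for an undirected, unweighted graph, and the function names (`shortest_path_data`, `betweenness`, `closeness_distance_sum`) and the five-node path graph are illustrative assumptions.

```python
from collections import deque

def shortest_path_data(adj, source):
    """BFS from source: distances, geodesic counts, predecessors, visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}      # sigma[v] = number of geodesics source -> v
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:        # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v precedes w on a geodesic
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def betweenness(adj):
    """Sum of g_ikj / g_ij over unordered pairs i, j (undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        _, sigma, preds, order = shortest_path_data(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):    # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # each unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in score.items()}

def closeness_distance_sum(adj):
    """Sum of geodesic distances to all reachable nodes (lower = more central)."""
    return {s: sum(shortest_path_data(adj, s)[0].values()) for s in adj}

# Hypothetical example: a path A - B - C - D - E
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
bc = betweenness(path)
far = closeness_distance_sum(path)
print(bc)
print(far)
```

On this path graph the middle node C lies on the most geodesics (betweenness 4) and has the smallest distance sum (6), which illustrates why the slides call closeness an inverse measure and why the two measures tend to be correlated.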
SNA Insufficient
• Centralities – Communication
  – Degree – most connected
  – Betweenness – most paths
• Exclusivities – Expertise
  – Knowledge – special expertise
  – Task – special experience
• Demands/Loads – Roles
  – Cognitive demand – emergent leader
  – Workload

Meta-Network KEY ACTORS
• Degree Centrality: in the know
• High Betweenness and not Degree: connects groups
• Cognitive Demand: emergent leader
• Task exclusivity: critical ability
• Eigenvector: central core
• Betweenness: many paths
• Resource exclusivity: mobilize resources
• Knowledge exclusivity: mobilize info

Assignment Network: Assignment Redundancy
• Average number of redundant agents assigned to tasks. An agent is redundant if there is already an agent assigned to the task.
• Redundancy occurs only when more than one agent is assigned to a task. Define the assignment redundancy for task j as follows: $d_j = \max\{0, \mathrm{sum}(AT(:, j)) - 1\}$, $1 \le j \le |T|$
• Then Assignment Redundancy $= \left( \sum_{j=1}^{|T|} d_j \right) / |T|$

Knowledge Exclusivity Index
• Detects agents who have singular knowledge.
• The Knowledge Exclusivity Index (KEI) for agent i is defined as follows: $\mathrm{KEI}_i = \sum_{j=1}^{|K|} AK(i, j) \cdot e^{\,1 - \mathrm{sum}(AK(:, j))}$
• The values are then normalized to be in [0,1] by dividing by the maximum KEI value.

Additional Specialized Measures Exist
Particularly Ones Using Multiple Matrices
• Performance
  – Diffusion
  – Accuracy
• Loads
  – Cognitive demand
  – Workload
  – Potential Work Load
• Congruency – fit
  – Communication
  – Knowledge
  – Resource
• Need for Negotiation
• Under Supply
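The Assignment Redundancy and Knowledge Exclusivity Index definitions above translate almost directly into code. The sketch below is a minimal pure-Python reading of those two formulas, not ORA's own code. It assumes that `AT` is a binary agent-by-task matrix and `AK` a binary agent-by-knowledge matrix, both stored as lists of rows; the function names and the small example matrices are hypothetical.

```python
import math

def assignment_redundancy(AT):
    """Mean over tasks of d_j = max(0, agents assigned to task j - 1)."""
    n_tasks = len(AT[0])
    d = []
    for j in range(n_tasks):
        assigned = sum(row[j] for row in AT)   # sum(AT(:, j))
        d.append(max(0, assigned - 1))
    return sum(d) / n_tasks

def knowledge_exclusivity(AK):
    """KEI_i = sum_j AK(i,j) * exp(1 - sum(AK(:,j))), scaled by the maximum."""
    n_items = len(AK[0])
    col_sums = [sum(row[j] for row in AK) for j in range(n_items)]
    raw = [
        sum(row[j] * math.exp(1 - col_sums[j]) for j in range(n_items))
        for row in AK
    ]
    top = max(raw)
    # normalize to [0, 1]; guard against a matrix with no knowledge ties
    return [r / top if top > 0 else 0.0 for r in raw]

# Hypothetical data: 3 agents x 3 tasks, and 3 agents x 2 knowledge items
AT = [[1, 1, 0],
      [1, 0, 0],
      [1, 0, 1]]
AK = [[1, 1],
      [0, 1],
      [0, 0]]

redundancy = assignment_redundancy(AT)
kei = knowledge_exclusivity(AK)
print(redundancy)
print(kei)
```

In the example, task 0 has three agents (two redundant) and the other tasks have one each, so the redundancy is 2/3. Agent 0 is the only holder of knowledge item 0 and therefore gets the maximum KEI of 1 after normalization.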
Cognitive Demand
• The cognitive effort the individual has to do on average
• How many people do you interact with
• How many tasks do you do
[Slide label: CENTRALITY]
• How much knowledge do you have
• How much knowledge is needed to do the tasks
• How many people do you need to interact with to do the tasks
• How many other tasks and so people depend on you
• How many other tasks and so people do you depend on

Key Entities
Indiana Jones & Marion Ravenwood are key characters; Rene Belloq & Sallah also rank high in importance.
[Figure panels: Key Agents; Key Knowledge; Key Locations]

Key Entities: Resources
Dominant Resource (total degree centrality)
The Total Degree Centrality of a node is the normalized sum of its row and column degrees. Individuals or organizations who are "in the know" are those who are linked to many others and so, by virtue of their position, have access to the ideas, thoughts, and beliefs of many others. Individuals who are "in the know" are identified by degree centrality in the relevant social network. Those who are ranked high on this metric have more connections to others in the same network. The scientific name of this measure is total degree centrality and it is calculated on the agent by agent matrices.
Input: all networks based on the node class(es) Resource
Rank | Resource                                | Value | Unscaled
1    | Ark of the Covenant                     | 0.277 | 72.000
2    | truck                                   | 0.127 | 33.000
3    | bullwhip                                | 0.123 | 32.000
4    | torch                                   | 0.112 | 29.000
5    | Headpiece for Staff - Ravenwood's half  | 0.112 | 29.000
6    | pistol                                  | 0.104 | 27.000
7    | machine gun                             | 0.085 | 22.000
8    | fire                                    | 0.081 | 21.000
9    | rope                                    | 0.081 | 21.000
10   | car                                     | 0.069 | 18.000
Ark of the Covenant is a dominant resource.

What Have We Learned?
• Indy is an important character, given a variety of relevant measures
  – Indy ranked in top 3 in 94% of measures calculated
  – Marion Ravenwood, Sallah, & Rene Belloq are also important (i.e., top-ranked in a high percentage of measures)
  – German Agents, while identified as important, is an entity that represents various extras who wore Nazi uniforms in bit parts
• Knowing of the Well of Souls & the Ark of the Covenant is important
• The Ark of the Covenant is the most important resource in the movie
• The Raven Saloon & Tanis Ruins are important locations

Sudan

Total Degree Centrality for Political Agents of Sudan in 2003-2008 (Sudanese only)
[Chart series: Garang, Taha, Mayardit, Bashir. 10/21/2010 Data Extraction]

Cognitive Demand for Political Agents of Sudan in 2003-2008 (Sudanese only)
[Chart series: Mayardit, Bashir, Garang, Taha, Machar, Minnawi. 10/21/2010 Data Extraction]

Sudan – Key Actors
Rank | Degree              | Betweenness         | Eigenvector
1    | omar_al_bashir      | omar_al_bashir      | omar_al_bashir
2    | john_garang         | john_garang         | salva_kiir_mayardit
3    | george_w_bush       | george_w_bush       | john_garang
4    | salva_kiir_mayardit | salva_kiir_mayardit | luis_moreno_ocampo
5    | yoweri_museveni     | mustafa_fadhil      | ali_osman_taha
6    | ali_osman_taha      | saddam_hussein      | george_w_bush
7    | joseph_kony         | keith_richards      | yoweri_museveni
8    | kofi_annan          | barack_obama        | hosni_mubarak
9    | barack_obama        | ali_osman_taha      | joseph_kony
10   | hosni_mubarak       | usama_bin_laden     | thabo_mbeki
Findings on Sudan
• At the macro-level – little change
• Tribal interactions have coalesced into more formal consistent lines of alliances and conflict
• Bashir’s influence increased between 2003-2008 and Minnawi shows as an emergent leader
• Political “brokers” continually changing – situational volatility
• Harbored terrorists show as key actors only from a “global” perspective
• Rise in power of Dinka
• Conflict logic changes to enable creation of S. Sudan
Year | Entries shown on slide
2003 | Ecology, Land_Water_Use
2004 | Land_Water_Use, Ideology
2005 | Ideology, Economy
2006 | Ideology, Economy
2007 | Economy, Ideology
2008 | Land_Water_Use, Economy

Groups!

Motivation

What is a Group?
• Any number of entities considered as a unit
• Nominal group – “named” collective e.g., nurses
• Collection of entities with features in common
• Small Group
  – 3-15 members
  – Able to communicate freely and openly with all of the other members of the group
  – Norms
  – Roles
  – Common purpose

Group Rationales
3 conceptual reasons for why groups matter
• Cohesion
  – Because the nodes have the same kind of position – relations to same type of other nodes
  – Network region might contain cohesive subgroups
• Equivalence
  – Because the nodes have the same linkages = relationships to the same other nodes
• Distinction
  – Because the nodes are different from other nodes around them, anomalies
NOTE: A group may or may not be a component or a K-Core
Terminology: Components
• A subgraph S of a graph G is a component if S is maximal and connected
• If G is a digraph, then
  – S is a weak component if it is a component of the underlying (undirected) graph
  – S is a strong component if for all dyads u,v in S, there is a path from u to v
• Finding components is the first step in analysis of large graphs
  – Analyze each component separately, or discard very small components

Largest Component

Terminology: K-Cores
• K-CORE
  – A maximal subgraph S such that for all u in S, $\deg(u, S) \ge k$ (u has at least k ties to other nodes in S)
• S=A is 1-core & 2-core; B and C 3-core
• There is no 4-core or higher
  – Finds large regions within which cohesive subgroups may be found
  – Identifies fault lines across which cohesive subgroups do not span
[Figure: regions labeled A, B, C]

Groups and Equivalences
• Many grouping mechanisms are based on equivalences
• Common ones:
  – Structural
  – Regular
  – Automorphic
• These are subsets*
[Diagram: nested sets labeled Regular, Automorphic, Structural]
*At least as defined in JMS paper in 1994.

Structurally Similar Groups
• Classes of node connecting to the same other classes.
011000000
100111000
100000111
010111000
010111000
010111000
001000111
001000111
001000111
• Can be cohesive, or not.
• Good for detecting organizational roles.
• Early methods found strict regularity.
• Definition is circular!

Structural Equivalence
• Structurally indistinguishable
  – Same degree, centrality, belong to same number of cliques, etc.
  – Only the label on the node can distinguish it from those equivalent to it.
  – Perfectly substitutable: same contacts, resources
• Face the same social environment
  – Similar forces affecting them – same influencers
  – On average, hear things equally early, influenced similarly, have similar things to cope with

CONCOR
• Works by splitting groups
• Specify number of splits
• Recursively splits partitions, user selects n splits.
  – n splits → 2^n groups
• At each split, divides nodes based on maximum correlation in outgoing connections.
• Builds a hierarchical decomposition
• Calculates correlation between each pair of rows/columns
  – Then the correlation of the correlations …
  – Repeats until it reaches “stableness”
  – Then splits the nodes into two groups based on the correlation
[Figure: nodes labeled a, b, c, d, e]

CONCOR
• Finds ZERO blocks
• Issues
  – First correlation does most of the work
  – Heuristic approach
  – Located groups are “cliques” and often only regularly equivalent
• PRO: Only commonly used algorithm that detects relaxed structural equivalence (except, arguably, PCA)
• CON: Top down splitting of nodes imposes structure
• CON: Requires user to choose a power of 2 for the number of groups.

CONCOR Grouping

Girvan-Newman
• Detects groups – “community structure”
  – A community consists of a subset of nodes within which the node-node connections are dense, and the edges to nodes in other communities are less dense
• Procedure:
  – Calculate betweenness of all existing edges in the network
  – Remove the edge with the highest betweenness
  – Recalculate betweenness of all edges affected by the removal
  – Repeat until no edges remain
• Procedure to find optimal grouping
• Fast
• Groups sometimes difficult to interpret
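The Girvan-Newman procedure listed above can be sketched in a few dozen lines. The version below is a simplified illustration rather than the implementation used in ORA: it recomputes betweenness for all remaining edges after each removal (instead of only the affected ones) and stops as soon as the network first splits, rather than continuing until no edges remain. The function names and the two-triangle example graph are assumptions made for the sketch; the graph is undirected and has no self-loops.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness via BFS from every node (undirected, unweighted)."""
    score = {}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                flow = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + flow
                delta[v] += flow
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            group.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(group)
    return groups

def girvan_newman_first_split(adj):
    """Remove highest-betweenness edges until the component count increases."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    start_count = len(components(adj))
    while len(components(adj)) == start_count:
        scores = edge_betweenness(adj)
        if not scores:                            # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical example: two triangles joined by the bridge 3 - 4
barbell = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
           4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
groups = girvan_newman_first_split(barbell)
print(groups)
```

In the example every geodesic between the two triangles crosses the bridge, so that edge has the highest betweenness and is removed first, leaving the two triangles as communities.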
Newman’s Algorithms
• Maximize Modularity (difference between expected and observed ties in community)
• Newman & Girvan [2004]
  – Remove high-betweenness edges
  – What’s left are communities with redundant connections
  – Requires assumption of k groups.
• Newman [2006]:
  – Start inside community and search for boundary.
  – Relatively fast for large networks

Newman Grouping
(Jürgen Pfeffer, Kathleen M. Carley)

Networks & Trails
Networks:
• Can be 1 or 2 mode
• Links occur only once (but can be weighted)
• Time can be modeled by multiple networks or timed attributes
• Entities can maintain many relationships in single snapshot.
Trails:
• Always involves subjects and locations
• Subject may revisit location
• Time stream integral part of data.
• Subjects at one location at a time (but longer relationship may be implicit in repeated visits).

What is FOG?
• Fuzzy, Overlapping Groups
  – Multiple group memberships
  – Varying strength of membership
  – No arbitrary assignments on boundary spanners
• Reveals details of interstitial roles
• Designed for Link Data or Network Data
• Generative model (rather than pattern matching)

Sampling Link Data From Networks
• Random tree
• Models iterative interaction
  – Informal gathering
  – Spread of rumor or info
[Figure: Tree and Link panels; example links {A,B}, {A,B,C}, {A,B,C,E}]

FOG Groups: Monastery

Assess: How are they organized?
FOG (Fuzzy Group Clustering) shows suspicious entities organized into 5 groups w/shared members.
Interstitial members are likely to contain coordinators & leaders
FOG Algorithms
Algorithm | Based On                | Pros                                                                                              | Cons
H-FOG     | Hierarchical Clustering | Nested groups; run once, explore tree to determine # of groups                                    | Scales poorly, O(n^4)
k-FOG     | K-Means                 | Scales well                                                                                       | Must guess # of groups, k
α-FOG     | Dirichlet Process       | Fast; does not require guessing number of groups (α parameter is expected concentration)         | Data-hungry

Key Actors are Interstitial
[Figure: nodes labeled Jeff Skilling, Kenneth Lay, Tanya Jones, Veronica Espinoza, Jeff Dasovich]

Summary
• Why Group?
  – Reconstruct “real” groups
  – Find individuals who might be or act similarly
  – Find individuals who have unusual community ties
• CONCOR: Structural Similarity
  – Finds groups with similar roles in network, even if dispersed
• Newman: Cohesive Communities
  – Finds unusually dense clusters, even in large networks
• FOG: Fuzzy, Overlapping Groups
  – Gives better understanding of individuals spanning groups
  – Analyzes network data or raw link data

Motifs
• Specialized patterns

Two Types of Hubs
[Figure annotations: Iris Mack; Similar for 2 other; Other 5 are in a concentrated core]

Illustrative Problem: Gang Interaction
• Two gangs are in the regions
• Recent reports suggest they are working together
• Potential problems
  – Increased drug trafficking
  – Coordinated response to law enforcement activity
  – More dangerous
• What is the connection?
• Who is connecting them?

Are Two Critical Actors Linked?
Path Finder

Example

Two Gangs

ILA is a tightly structured cell
Good Boys is a cellular structure

Are the gangs connected? Yes!

Overall Gang Network

From Gang Membership to Key Bridge

Real-world Win! – The Context
• CASOS was training members of Tulsa Police Department Special Investigations Division and Oklahoma Bureau of Narcotics with support from the North Texas HIDTA
• Unknown to the CASOS team
  – The gang unit of a major-city police force had evidence of a drug supply chain that indicated some abnormal cooperation between two dissimilar street gangs
  – Law enforcement had no leads on who was behind the connection
• The CASOS team was demonstrating ORA
  – to a few officers
  – utilizing a small sample of data from the department’s live arrest and surveillance database

Real-world Win! Event!
• During the demonstration of ORA’s pathfinder feature
  – Officers asked the CASOS team to show the path between two gangs
• Voila!
  – Using ORA a human connection between the two gangs was quickly found
  – This human had not been readily apparent to the officers
  – But the information was buried in their database
• The investigating officer was called into the room to see the newly discovered finding
• This first link proved to be useful!
• In addition
  – Other links have been found in follow-up analyses
  – Gang investigators from his Special Investigations Division are following these additional leads

Network Comparison
• Are two networks similar?
• What is the difference of two networks?
• How to compare more than two networks?
• How to compare predicted networks to the actual future observed networks?
• Can we use standard statistics (e.g. correlations)?

Compare Networks: Basic Tribal Structure
[Figure panels: Sudan Tribal Network - Hereditary; Tribal Network in the News]
But, what does this look like on a map? Can we identify contested borders?

Levels of Comparison
• Node measures:
  – key entities
  – rank of entities
• Network measures:
  – distribution of node measure values
  – network centralization, density, …
• Network structure:
  – Hamming, Euclidean
  – Correlation of networks – QAP
• “Motifs”:
  – local effects (transitivity, reciprocity, …)

Comparison Report: Nodes

Comparison Report: Networks
(Jürgen Pfeffer, Kathleen M. Carley)

Simple Measures of Differences
• Hamming distance
  – Sum of differences between networks
  – For binary edges, how many differences/changes between two networks
  – Simple, intuitive, meaningful for binary data
• Correlation
  – Calculate the correlation between the edge values in two networks
  – Useful in standard statistics for independent identical samples
  – Doesn’t mean much with binary data

Hamming Distance
0111010000100010000111110
0001010000100010010011100

The Problem with Statistics on Networks
• There are row/column dependencies
• In other words – each entry is a dyad and dyads are not independent
• The basic assumptions of standard statistics are violated:
  – Independent
  – Identically distributed
• Why does this matter?
  – Statistical hypothesis tests require these pre-conditions
  – Statistical guarantees (p-values) don’t hold
• We need a better significance value!

QAP/MRQAP
• Problem: Did the network structure cause similarity or did the identity of the nodes?
• Solution: QAP permutations (quadratic assignment procedure)
• QAP tests an arbitrary graph-level statistic against a QAP null hypothesis, via Monte Carlo simulation of likelihood quantiles

QAP Algorithm
• Calculate the observed correlation between the dependent network and the independent network
• Randomly permute the node labels (rows and columns together) of the dependent network and recalculate the correlation
• Instead of all possible N! permutations, use just n random samples
• In what fraction of the runs was the permuted correlation >= the observed one?
• That’s the approximate p-value (significance)

Example in ORA
• Predicted/Observed Networks
• Two networks:
– Week 2: eAgent2eAgent D08
– Week 3: eAgent2eAgent D15
• The networks are different
• What happened?
• Hypotheses:
a) Caused by transitivity
b) Caused by reciprocity

Over Time
• Node level
• Group level
• Structural level
------------
• What-If – immediate impact
• Trends
• Change Detection
• Forecasting – Near term

QAP/MRQAP

Reports with Multiple Networks

Temporal Report

Trail Differences Due to Mode of Interaction
[Figure panels: EMail; Face-to-Face. Annotations: Coordination for every other week meetings; While I’m Away; Vacation; # of interactions is random; Weekend Effect]

What happens if a change occurs?
• What if
– You fire someone
– A group of people retire
– You arrest members of a cell
– You use up a resource
• There are two key questions
– What happens immediately?
– What will happen after the dust settles in the near term?
• The Immediate Impact Report helps answer what happens immediately
• Near-Term Analysis helps answer what happens after the dust settles

Purpose of Immediate Impact Report
• Supports what-if analysis of strategic interventions on organizational performance & the individuals within
– Intervention = remove one or more nodes / links
– Two types of analyses
• Impact of n specific node removals
• Impact of n random node removals averaged over r replications
– Report includes network- & node-level statistics for pre- & post-intervention organizations
• Specific node removals yield Reports that include network- & node-level measures related to individual agents, tasks, resources
• Random node removals yield Reports that include only network-level metrics
• Convenience relative to other Reports’ comparison modes

Illustrative Example Targeting
• Which individual or group to isolate to achieve maximal effect
• How to influence
• Are there important connections
• Who to target (vulnerabilities)
• What groups or individuals stand out
• What is the immediate effect of a COA
• On Diffusion, Performance, Leadership
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

What do these things tell us
Network-level Metric | Meaning
Number of Nodes | Will go down – anchors how big is the change
Overall Complexity | Impact beyond that node – remember this is a meta-network
Performance as Accuracy | Likelihood the group will make mistakes
Diffusion | How fast does information flow through the group
Clustering Coefficient | Tendency to groupiness
Social Density | Density in the social network
Communication Congruence | The higher the more effective the group
Average Communication Speed | Typical communication speed
Fragmentation: Number of Isolated Agents | Who’s alone
Fragmentation: Overall Fragmentation | Are there subgroups and level of subgroups
Copyright © 2011 CASOS, ISR, CMU -- Kathleen M. Carley - Director

What do these things tell us
Specific Node Removal Report also includes 3 node-level metrics, rankings & visualization
Metric | Meaning
Emergent Leader (Cognitive Demand) | Who will be calling the shots
Potentially Influential (Betweenness Centrality) | Who will work behind the scenes
Centrality (Total Degree Centrality) | Who will know what is going on
[Figure panels: Before; After]

Dynamic Network Analysis Locates Effective Single Strike Target
[Figure panels: Model Choice; Commander’s Choice (in the news)]

Change Detection - Objective
How can we quickly identify changes in social networks subject to a specified risk of false alarm?
[Data sets: Al Qaeda Data (Based on Sageman, 88-04); Graduate Students (Email over 24 weeks, 07)]

Al-Qaeda Application
The control chart signals a change in the network in 2001.
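The C+ and C- columns tabulated for this slide behave like a standard tabular CUSUM. The following is a minimal sketch, not ORA's implementation: it assumes a reference value k = 0.5 (not stated on the slide, but consistent with the tabulated values) and uses the control limit of 4 shown on the chart. The function names are hypothetical; the Z scores are copied from the slide's table.

```python
from typing import List, Optional, Tuple

# Z scores of network closeness, 1994-2004, copied from the slide's table.
YEARS = list(range(1994, 2005))
Z = [-0.8729, 1.0911, -0.2182, -0.2182, 1.7457, 1.0911,
     2.4004, 3.7097, -2.8368, -8.7287, -15.9299]


def cusum(z: List[float], k: float = 0.5) -> Tuple[List[float], List[float]]:
    """Return the upper (C+) and lower (C-) CUSUM statistics.

    k is the reference value (assumed to be 0.5 here).
    """
    c_plus, c_minus = [], []
    hi = lo = 0.0
    for value in z:
        hi = max(0.0, value - k + hi)    # accumulates upward shifts
        lo = max(0.0, -value - k + lo)   # accumulates downward shifts
        c_plus.append(hi)
        c_minus.append(lo)
    return c_plus, c_minus


def signal_and_change_point(
    years: List[int], stat: List[float], limit: float = 4.0
) -> Tuple[Optional[int], Optional[int]]:
    """First year the statistic exceeds the control limit, and the last
    year before that at which the statistic was still zero (the usual
    change-point estimate)."""
    for i, value in enumerate(stat):
        if value > limit:
            j = i
            while j >= 0 and stat[j] > 0:
                j -= 1
            return years[i], (years[j] if j >= 0 else None)
    return None, None


if __name__ == "__main__":
    c_plus, c_minus = cusum(Z)
    for year, p, m in zip(YEARS, c_plus, c_minus):
        print(f"{year}  C+={p:8.4f}  C-={m:8.4f}")
    print(signal_and_change_point(YEARS, c_plus))
```

Under these assumptions the upper statistic first exceeds the limit in 2001, and walking back to the last zero gives 1997, which agrees with the signal year and the change-point estimate stated on the slide.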
Year | Closeness | Z | C+ | C- | Signal
1994 | 0.0027 | -0.8729 | 0 | 0.3729 | 0
1995 | 0.003 | 1.0911 | 0.5911 | 0 | 0
1996 | 0.0028 | -0.2182 | 0 | 0 | 0
1997 | 0.0028 | -0.2182 | 0 | 0 | 0
1998 | 0.0031 | 1.7457 | 1.2457 | 0 | 0
1999 | 0.003 | 1.0911 | 1.8368 | 0 | 0
2000 | 0.0032 | 2.4004 | 3.7372 | 0 | 0
2001 | 0.0034 | 3.7097 | 6.9469 | 0 | 1
2002 | 0.0024 | -2.8368 | 3.6101 | 2.3368 |
2003 | 0.0015 | -8.7287 | 0 | 10.5655 |
2004 | 0.0004 | -15.9299 | 0 | 25.9955 |

Most Likely Estimate of the change point is 1997:
- Re-establish base in Afghanistan
- Bright Star ’97 cut short
- Feb ’98 Islamic Front
- Embassy bombings in ’98
1997 was a critical planning year for Al-Qaeda

[Figure: Closeness CUSUM Statistic for Al-Qaeda (1994-2004); C+ plotted by year against Control Limit = 4, with the Change Point and Signal marked]

Dynamics
 | People | Knowledge | Tasks
People Relation | Social Network: Who knows who | Knowledge Network: Who knows what | Assignment Network: Who does what
Knowledge Relation | | Information Network: What informs what | Needs Network: What knowledge is needed to do that task
Tasks Relation | | | Precedence Network: Which tasks must be done before which

Rates of Adaptation
[Figure: % Shared vs. Time for Information, Activity, Beliefs, ranging from Fast to Slow]

Running Construct
[Workflow diagram: Create a Virtual Population; Create Population; Input Deck; Construct; Output Statistics; Population Statistics; MySQL Database; CSV Files; Add Special Agents; Event Time Line; Statistically Analyze Results. The input deck defines a cell in a Virtual Experiment.]

Transactive Memory
• Task memory
– Knowledge of how to do task (the what)
– Who knows what
• Social memory
– Knowledge of who
– Who knows who
• Transactive memory
– Knowledge of who knows what, who has done what, who knows whom
– Who knows who knows who
– Who knows who knows what
– AK_{ijk} = i thinks j knows k, s.t. k = #agents + #task facts
[Figure label: EGO]
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

What Drives Interaction
Select an Initiating Agent… (10101001 represents an agent’s knowledge)
…Based on Relative Similarity or Relative Expertise, derive an Interaction Probability…
…Modulate the Interaction Probability by the Socio-Demographic Proximity, etc., …
…Select an Agent to Interact with… (00101011)
…and a Fact to exchange…
…and Communicate. (10101011, 10101011)

Interaction Is Driven By Transactive Memory
• Agents have transactive memory (perceptions) of others
– transactive memory may be inaccurate or incomplete
– agents will not necessarily behave in an “optimal” fashion
– transactive memory can increase or decrease the probability of interaction
[Figure: bit strings (10101110, 10101001) contrasting own knowledge, actual alter knowledge and perceived alter knowledge, and the resulting actual similarity vs. perceived similarity]

Ideas Going Viral
• 2 groups
• Population 30
• Group ratio: 1:1, 1:2, 1:3
• Professionalism
– Amount of knowledge shared by group members: 20%, 40%, 80%
– Amount of knowledge shared across groups: 10%, 20%, 30%
– Professionalism increases with higher in-group and lower out-group sharing
• Mode of communication: 1:1 or 1:N+DB
• Monte Carlo: 100 repetitions, 500 time periods
• Time to diffusion
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Impact of Moving to a Digital Economy
[Figure: two surface plots of time-to-diffuse the innovator’s idea to the targeted group, by professionalism of the originating group (1-5) and professionalism of the targeted group (1-5); panels: Just People; People + Web]

Hartford Sample Network – Notice Issues of Occupation and Location
Network Size: The number of links (relations) for a particular node (agent) depends probabilistically on:
Network Composition: Whether a link (relation) exists between two nodes (agents) depends probabilistically on how they match on:
Attributes listed: Living quarters, Work status, Gender, Age, Education, Occupation, Race

Intervention
Communication and Behavior in U.S. cities

Forecast - Construct – from Patterns and Identified COA to Near Term Impact
[Figure: organization charts (Gaza HQ, Gaza Operations, West Bank, Egypt, Damascus, Lebanon, other regions; regional organization, functional cells, operational cells) leading to a Near Term Impact Report: generic performance With Leader vs. Without Leader for Bin Laden, al-Zawahiri, Yassin, Rantisi. Axis label: Mean Strength of pro-US Belief]

Approach to Delivering the Message Matters
[Figure panels: Anti Messengers use all Media; Anti Messengers use a Media. Media: Opinion Leaders, News, Web]
substantial differences by cities
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Social Results Vary Due to These Differences
[Figure panels: Washington DC; Muncie Indiana; Pueblo Colorado]

Monetary Exchange Process
• Canonical example:
– specific dollar bill moving through the economy
• Single object in only one place at a time
• Can travel between same pair more than once
– A--B--C--B--C--D--E--B--C--B--C ...
• This is a walk
• Use link analysis on walks in multi-mode networks
– To find money launderers
– To find possible covert activity

Gossip Process
• Canonical Example:
– rumor moving through informal network
• Multiple copies exist simultaneously
• Person tells only one person (or a small number) at a time
• Information or good doesn’t travel between same pair twice
• Information or good can reach same person multiple times
• This is a trail
• Use link analysis on trails on uni-mode networks:
– to find rumor source
• Use link analysis on trails on uni-mode networks:
– To identify providers of specialty items

Infection Process
• Canonical Example:
– virus which activates effective immunological response
• Multiple copies may exist simultaneously
• Cannot revisit a node
– A--B--C--E--D--F...
• This is a path
• Use link analysis on paths in uni-mode network
– to identify key infectious individuals and points for blocking epidemic spreads
• Use link analysis on paths in multi-mode network
– to identify key locations that might be quarantined or closed to block epidemic spreads

ORA Micro-Simulation Overview
Item | I can give it to others | I keep it after sharing | I lose it after some time | I can get it back
Ideas | YES | YES | no | n/a
Disease | YES | YES | YES | no
Money | YES | no | no | YES
Tech | YES | YES | Sometimes | YES

Which Node is Critical?
[Figure panels: Physical Interaction; Cyber - Communication]
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Infection
[Figure panels: Physical Interaction; Cyber - Communication. Highlighted nodes: Ivan, Ben]

Diffusion
[Figure panels: Physical Interaction; Cyber - Communication. Highlighted nodes: Abe, Ben]

Gifting
[Figure panels: Physical Interaction; Cyber - Communication. Highlighted node: Ivan]

Geo-Spatial Representation of Networks
ArcGIS, Google Earth, NASA WWJ

Geo-Enabled Network Analysis
Networks In an Area; Location Analysis; Change Detection; Walks; Visualizing Networks in Space; Information Loss Tracking
[Figure: Closeness CUSUM Statistic for Al-Qaeda (1994-2004), with Control Limit = 4, Change Point and Signal marked]

Outline
• Trails
– Data Sources
– Network Comparison
– Key Questions
• Loom
– Opening through ORA
– Interface and Manual Analysis
• ORA / Loom Workflow
– Exported Networks
– Combining trails and reports

Trail Data
– Evolving 1:Many Relation
• Sensors
– Cameras
– GPS
– AIS
– Key Card
• Mechanisms
– Map queries
– Address registration
– …

The Loom Interface
• File exports annotated DyNetML
• Options customizes visualization
• Select Subjects (i.e., agents) to view
• Vertical bars correspond to Locations
• Locations are shown automatically
• Colored lines indicate each subject’s path over time (down = later)

Switch to 3-D map…

Explore Indy’s Trail…
Movie starts in Eyebrow of the Jungle…

Raven Saloon & Well of Souls are in this part of the world.
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Indy, Ark Knowledge & Ark Trails
[Figure trails: Indy; Ark Knowledge; Ark]

Global Reach of Highly Central Actors

Identify Critical Locations
• Finding Points Using Betweenness
• Extracting Out Spatial Constraints Using Density Based Clustering
[Figure labels: Fishing Community; Major Ports; Unknown]

Afghanistan
[Map; source: Wikipedia]

Ability of Khan and Dostum to act as an Emergent Leader Changes Over Time
[Figure: time series for Abdul Rashid Dostum and Mohammad Ismail Khan. Annotated events: Appointed Chief of Staff; Akbar Bai Incident; Minister of Water and Energy; Goes to Turkey; Returns]

Where are the people in Afghanistan
Kunduz and Kandahar are hot spots

The Problem
• Traditional network measures were only designed to look at distance with respect to a single relationship.
• Using Meta-Networks we can now model multiple relationships, including geo-spatial relationships.
• This requires us to re-evaluate the concept of “distance” within a network.

New Metrics
• Spatial betweenness centrality
– Identifies individuals who allow short paths across long distances
– Fast, efficient algorithm
• Spatial degree centrality
– Identifies individuals who are not only highly connected, but whose connections cover large geographic areas
• Spatial eigenvector centrality
– Identifies locations whose agent population as a whole has the greatest eigenvector centrality
• Location Relevance
– Identifies locations by the number of agents that are located at that location and also known to be connected to each other in the agent-to-agent graph

Critical Nodes as Anomalies
• Betweenness centrality
– Nodes that connect disparate groups
– Important in diffusion across groups (e.g.
knowledge, illicit drugs, etc.)
• Eigenvector centrality
– Social capital
– Related to PageRank
– Important in diffusion within groups: fast cascades from high eigenvector centrality nodes to rest of a group

Centrality Measure: Betweenness
• Captures the extent to which a node lies along shortest paths between other nodes
– Often interpreted as having power
– In simple random networks, these paths get long for large networks
– Length of connection irrelevant
• High betweenness centrality nodes facilitate the fast diffusion of information across the network
[Figure labels: Most critical; Most between]

Spatial Betweenness
• Propinquity predicts these short paths to exist between spatially proximate nodes.
• Want a new measure to discover the nodes in the network that facilitate short paths across large distances
[Figure: network laid out along a spatial dimension; labels: Most critical; Most between; Most spatially between]

Using Constraints to Set Strength of Links has Practical Consequences
Regular Betweenness
• Louisville
• Nashville
• Houston
• Albuquerque
Spatial
• Louisville
• Albuquerque
• Tucson
• Atlanta
Geo-Network Measures Improve focus

CMU CASOS used ORA to Provide Tactical Guidance …
From PenLink data to Course of Action Analysis
Dynamic Network Analysis
5 groups
334 are in multiple sub-groups
Agents 286, 652, 97 repeatedly rank in top three in the measures.
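The two measures behind these rankings, total degree centrality and betweenness centrality, can be sketched in a few lines. This is a minimal illustration, not ORA's implementation: it assumes an unweighted, undirected network stored as an adjacency dictionary, uses Brandes' algorithm for betweenness, and reports raw (unscaled) values. The toy graph and its node names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]


def degree(adj: Graph) -> Dict[str, int]:
    """Unscaled total degree: number of neighbours of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness(adj: Graph) -> Dict[str, float]:
    """Unscaled betweenness for an unweighted, undirected graph
    (Brandes' algorithm: one breadth-first search per source node)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in bc.items()}


# Hypothetical toy network: two triangles joined through node X.
TOY: Graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}

if __name__ == "__main__":
    deg, btw = degree(TOY), betweenness(TOY)
    for node in sorted(TOY, key=lambda n: -btw[n]):
        print(node, deg[node], btw[node])
```

In the toy network, X has the highest betweenness while having only two links, which is the kind of bridging node that a degree ranking alone would miss.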
In-the-Know (total degree centrality)
Rank | Value | Unscaled | Agent
1 | 0.181 | 181 | 652
2 | 0.176 | 176 | 286
3 | 0.175 | 175 | 97
4 | 0.165 | 165 | 412
5 | 0.158 | 158 | 502
6 | 0.154 | 154 | 273
7 | 0.139 | 139 | 246
8 | 0.115 | 115 | 615
9 | 0.111 | 111 | 829
10 | 0.109 | 109 | 8

Potentially Influential (betweenness centrality)
Rank | Value | Unscaled | Agent
1 | 0.0988664 | 24667.2 | 286
2 | 0.0705427 | 17600.4 | 97
3 | 0.0625256 | 15600.1 | 502
4 | 0.0609523 | 15207.6 | 829
5 | 0.0548627 | 13688.2 | 652
6 | 0.0542421 | 13533.4 | 615
7 | 0.0524498 | 13086.2 | 412
8 | 0.0410645 | 10245.6 | 501
9 | 0.0306578 | 7649.12 | 273
10 | 0.0305522 | 7622.78 | 552

Numbered callouts on the slide:
1 Core Component of Red Comms: Identified Bad-Guy Network
2 Fuzzy Groups in Red Comms: Characterized Organizational Structure
3 Identified Key Bad-Guys: waypoint 97, liaison 652
4 Time Period 3 Signals Change in Operations
5 Time 3 May Be Operation and Time 4 Initial Surveillance
6 Bad Guys by Region Shows Convergence on Adelphi at Time 3
7 By Time 6 / Time 4: All active; 652 running; never same place same time
Map annotations: 97 holed-up here with others; 652 alone, on the run; 286 in Adelphi w/many others; 286 alone, 8 little movement
COA 1: Go after Dispersed Bad Guys
COA 2: Scour Adelphi for IED/Bomb etc

Arab Spring

Changing Social Behavior: Terrorism and the Arab Spring
• Using rapid ethnographic assessment at multiple levels
– High Level
• Fast processing of LexisNexis tags
• Examined role of social media in context
• Fast processing of Twitter hashtags
– Medium
• Medium processing of content: LexisNexis content
• Twitter content
• Results
– 18 Arab Spring countries for 10 months
– Spread of revolution not geographically based
– Mixed effect of social media
– Conceptual complexity and human rights are dominant factors
– Media attention to terrorism drops as revolutions begin
– High power actors and ideas are “secondary”

Arab Spring – Rise and Fall of Revolution
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Attention to Issues During Arab Spring
[Figure annotations: Complexity Increases; Pattern for all Terror Groups Similar; Activity Increases; 32,000+ articles; 2 Days; As Revolutions Get Underway Focus on Terror Goes Down]

Actors of Interest
[Figure: number of actors of interest (0 to 450) by period for Egypt, Libya, Tunisia, Syria]

Familiar Secondary Actors
• Leaders
• Latent leaders – most likely to sway populace when leaders removed
• Gatekeepers
– Betweenness
– Even better: high betweenness, low degree
– Individuals with high structural holes
• Critical for impacting
– Who has access to what information
– Who gets what job
– Etc.

Key Actors - Egypt
Month | Leader | Latent Leader | Gatekeeper
July 10 | Hosni Mubarak | Baroness Ashton | Michael Hayden
Aug 10 | Barack Obama | George J Mitchell | Asif Ali Zardari
Sep 10 | Mahmoud Abbas | Hosni Mubarak | Dmitry Medvedev
Oct 10 | Hosni Mubarak | Mohamed Elbaradei | Dmitry Medvedev
Nov 10 | Hosni Mubarak | Silvio Berlusconi | Muammar Gaddafi
Dec 10 | Hosni Mubarak | George J Mitchell | John Kerry
Jan 11 | Hosni Mubarak | Nicolas Sarkozy | Thaddeus G McCotter
Feb 11 | Hosni Mubarak | George W Bush | Wolfgang Schaeuble
Mar 11 | Hosni Mubarak | George W Bush | Bill Nelson
Apr 11 | Hosni Mubarak | Bashar al Assad | Angela Merkel
May 11 | Barack Obama | George W Bush | Dick Cheney
Jun 11 | Barack Obama | Christine Lagarde | Conan O’Brien
Jul 11 | Hosni Mubarak | Bashar al Assad | Tzipora Livni
Aug 11 | Hosni Mubarak | David Cameron | Joe Biden
Sep 11 | Barack Obama | Hillary Rodham Clinton | Mark Zuckerberg
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Key Actors - Libya
Month | Leader | Latent Leader | Gatekeeper
July 10 | Barack Obama | Hillary Rodham Clinton | Prince Philip
Aug 10 | Alex Salmond | Charles Schumer | Peter Mandelson
Sep 10 | Alex Salmond | Kirsten E Gillibrand | Ben Cardin
Oct 10 | Mahmoud Abbas | George J Mitchell | Lee Myung-Bak
Nov 10 | Nicolas Sarkozy | Angela Merkel | Benjamin Netanyahu
Dec 10 | Muammar Gaddafi | Julian Assange | Saddam Hussein
Jan 11 | Muammar Gaddafi | Ellen Johnson-Sirleaf | Kim Jong Il
Feb 11 | Muammar Gaddafi | Gordon Brown | Francois Fillon
Mar 11 | Muammar Gaddafi | Robert M. Gates | Stephen Colbert
Apr 11 | Muammar Gaddafi | Liam Fox | Caroline Spelman
May 11 | Muammar Gaddafi | Dmitry Medvedev | Christiane Amanpour
Jun 11 | Muammar Gaddafi | Liam Fox | Kevin McCarthy
Jul 11 | Muammar Gaddafi | Nicolas Sarkozy | Prince William
Aug 11 | Muammar Gaddafi | Nick Clegg | Dalai Lama
Sep 11 | Muammar Gaddafi | Ban Ki-Moon | Al Gore

Key Issues - Egypt
Month | First | Second | Third
July 10 | Oil & gas | Internat Relations | Religion
Aug 10 | Religion | Internat Relations | Economic
Sep 10 | Peace Process | Internat Relations | Religion
Oct 10 | Religion | Internat Relations | Elections
Nov 10 | Religion | Elections | Politics
Dec 10 | Religion | Internat Relations | Elections
Jan 11 | Protests & Demons | Religion | Internat Relations
Feb 11 | Protests & Demons | Religion | Internat Relations
Mar 11 | Protests & Demons | Religion | Mubarak Resignation
Apr 11 | Protests & Demons | Religion | Mubarak Resignation
May 11 | Religion | Terrorism | Internat Relations
Jun 11 | Religion | Protests & Demons | Economic
Jul 11 | Protests & Demons | Religion | Mubarak Resignation
Aug 11 | Religion | Mubarak Resignation | Protests & Demons
Sep 11 | Internat Relations | Religion | Protests & Demons
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Key Issues - Libya
Month | First | Second | Third
July 10 | Oil & gas | Investigations | Internat Relations
Aug 10 | Anniversaries | Investigations | Terrorism
Sep 10 | Internat Relations | Investigations | Finance
Oct 10 | Internat Relations | Peace Process | Terrorism
Nov 10 | Internat Econ Org | Internat Relations | Economic Growth
Dec 10 | Internat Relations | WikiLeaks | Internat Econ Org
Jan 11 | Sports | Internat Relations | Internat Econ Org
Feb 11 | | |
Mar 11 | War & Conflict | Internat Relations | Rebellion Insurgent
Apr 11 | War & Conflict | Internat Relations | Rebellion Insurgent
May 11 | War & Conflict | Internat Relations | Rebellion Insurgent
Jun 11 | War & Conflict | Armed Forces | Internat Relations
Jul 11 | War & Conflict | Internat Relations | Rebellion Insurgent
Aug 11 | War & Conflict | Rebellion Insurgent | Armed Forces
Sep 11 | War & Conflict | Internat Relations | Rebellion Insurgent

Key Betweenness Issues - Egypt
Month | First | Second | Third
July 10 | Religion | Investigations | Construction
Aug 10 | Religion | Terrorism | Economics
Sep 10 | Religion | Deserts | Peace Process
Oct 10 | Religion | Penalties | Peace Process
Nov 10 | Religion | Economics | Terrorism
Dec 10 | Religion | Economics | Vehicles
Jan 11 | Peace Process | Religion | Terrorism
Feb 11 | | |
Mar 11 | Religion | Internat Relations | Economics
Apr 11 | Economics | Anniversaries | Internat Relations
May 11 | Religion | Economics | Internat Relations
Jun 11 | Economics | Religion | Investigations
Jul 11 | Religion | Economics | Internat Econ Org
Aug 11 | Religion | Economics | Peace Process
Sep 11 | Religion | Economics | Terrorism
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

Key Betweenness Issues - Libya
Month | First | Second | Third
July 10 | Religion | Internat Relations | Terrorism
Aug 10 | Religion | Economics | Investigations
Sep 10 | Religion | Terrorism | Oil & Gas
Oct 10 | Religion | Internat Relations | Investigations
Nov 10 | Economics | Internat Relations | Terrorism
Dec 10 | Internat Relations | Religion | Peace Process
Jan 11 | Religion | Internat Relations | Economics
Feb 11 | | |
Mar 11 | Economics | War & Conflict | Religion
Apr 11 | Economics | War & Conflict | Religion
May 11 | Religion | War & Conflict | Economics
Jun 11 | Religion | Economics | Internat Relations
Jul 11 | Religion | Internat Relations | Armed Forces
Aug 11 | Religion | Internat Relations | Armed Forces
Sep 11 | Religion | Economics | Terrorism

Communicative Reach: Egypt
[Figure: two charts for Egypt, one by Degree and one by Betweenness, covering Stereotypes, Symbols and Buzzwords across the issues Elections, Religion, Internat Relations, Protests, Sports, War, Economics, Peace, Terrorism]

Communicative Reach: Libya
[Figure: two charts for Libya, one by Degree and one by Betweenness, covering Stereotypes, Symbols and Buzzwords across the issues Sports, Economics, Protests, Internat Relations, War, Elections, Religion, Peace, Terrorism]

Steps in a Structural Analysis
1. Collect network data.
– Connections among people, knowledge, resources, events …
2. Enter data into ORA.
3. Visualize.
4. Generate Report.
5. If there are multiple networks, create combined measures.
6. If needed, look at some measures in more depth.
7. Possibly drop isolates and pendants.
8. Check interpretations.
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

STEP 1 – Data Collection
Socio-Cultural Data is Everywhere
• Unstructured
– Text, e.g. interviews, news articles, blogs, email
– Various on-line sources
• Semi-structured
– Blogs
– Emails
– Crowd-sourced
• Structured
– Government and corporate documents
– Proceedings

• Unstructured
– Sudan Tribune Review
• 2003 - 2932
• 2004 - 6943
• 2005 - 3828
• 2006 - 3828
• 2007 - 5815
• 2008 - 9266
– Archive of Lobban writing
• Semi-structured
– UN Reports
– IDA Study 1796 files
• Structured
– African gazetteer

Data Assessment Pyramid
[Pyramid diagram, levels from top: Crisis Response; Secondary response and cleanup; Planning and long term response; Resiliency planning; Reasoning & Learning]
Analysis: High Overview; Matches; Core actors; Detailed; Scientific
Method: Rapid Network Assessment; Network Analytics; Text Mining; Machine Learning; Network Analytics; Simulation; Multimethod Simulation
Time: A few hours; A few hours to a day; A few days to a month; Several months

STEP 2 – Cleaning the Corpus
• Cleaning of data
– Removal of navigation, headers and footers (information not pertinent to the article)
– Remove non texable files
• E.g.
remove maps/scans
– Converted PDF to .txt
• Most but not all can be converted with CASOS tool
– Convert RTF to .txt
• Currently semi-manual process
• Semi-automatic cleaning of the corpus, done manually (optional) and automatically
– Involves the corpus as a whole, not individual texts
– Removed word wrapping
– Run automated cleaning

STEP 3 - Deduplication
• Deduplication
– Removal of repeated articles
– Reduces the number of files and allows a more compact analysis
– Near Miss procedure is best
• Performed once
• Time depends on number of texts and their length
• No deduplication was done on SNARC data, as all files are unique

Illustration of Impact of typical Deduplication - Number of texts before and after deduplication, applied only to Sudan data
Sudan | Number before | Number after
text | 32613 | 18309
concepts | 88260 | 83150
Average frequency per concept | 197.417879 | 88.15785929

Illustration of Impact of Deduplication: Note Deduplication Can Impact Importance of Concepts
Top 10 Concepts Before and After Deduplication - Sudan
Rank | Before: Count | Before: % | After: Concept | After: Count | After: %
1 | 1141455 | 6.55 | Valencia | 853691 | 11.65
2 | 867688 | 4.98 | conflict | 585801 | 7.99
3 | 500411 | 2.87 | republic_of_the_sudan | 332082 | 4.53
4 | 448679 | 2.88 | conflict_task | 207738 | 2.83
5 | 385560 | 2.21 | wilayat_darfur | 153976 | 2.1
6 | 344036 | 1.97 | political | 113992 | 1.56
7 | 178629 | 1.03 | Sudanese | 94010 | 1.28
8 | 178059 | 1.02 | Khartoum | 72409 | 0.99
9 | 172782 | 0.99 | valence_task | 62440 | 0.85
10 | 152547 | 0.88 | environment | 60138 | 0.82
Before concepts, in source order: Valencia, conflict _task, nanuque, ampere, republic_of_the_sudan, wilayat_darfur, valence_task, ner_population, faouzi_ben_mohamed_be_ahmed_a badou

STEP 4 – Automated Text Cleaning
• Removed stand-alone numbers
• Removed extra space
• Fixed common typos
• Removed extra white space
• Converted common hyphenated and non-hyphenated forms to a common form, e.g.
Major-General and Major General
• Expanded contractions and abbreviations
• Removed individual letters not in names
• Removed noise words
• Converted British to American spelling
• Ran standard ngram conversion
• Pronoun resolution
• Generalization using standard plus the named entities
This can be done with AutoMap

Additional Aspects of Automated Text Cleaning
Text preparation: a completely automatic process
• Creation of thesauri of stemmed and non-stemmed versions of nouns and verbs
– Detensing: reduces all verbs to their present tense
– Depluralization: eliminates the plural form and reduces it to its base form
• Apply an n-gram thesaurus to convert multi-word expressions to single concepts
• Delete noise/stop words
– Prepositions
– helping verbs
– verbs of being
– remaining pronouns

Illustrative Results for Impact of Specialized Stemmers
Number of concepts after depluralization
Corpus | Original | After | Percentage
Sudan | 97492 | 85758 | 87.96%
Catnet | 24743 | 22091 | 89.28%
Singapore | 5073 | 4452 | 87.76%

Number of nouns and verbs in the texts before and after depluralization and detensing
Corpus | Nouns Before | Nouns After | Verbs Before | Verbs After
Sudan | 28488 | 23680 | 12006 | 6763
Catnet | 7693 | 6838 | 4754 | 3223
Singapore | 1677 | 1445 | 1213 | 816

Step 5 – Named Entity Identification
• Extraction and identification of named entities:
– Tag by part of speech
– Identify proper nouns and n-grams that are proper nouns
• Result: thesauri of named entities
• Prior study showed that human processing time is reduced by 80% to 99% by using this approach

SNARC: Number of Named Entities
Texts | People | Organizations | Locations | Events
265 | 33,210 | 5,523 | 2,807 | 120
April 2012 Copyright © Kathleen M.
Carley, CASOS, ISR, SCS, CMU

STEP 5 – Includes Ontological Cross Classification and Thesauri Construction
• Apply standard thesauri and ontological categories
• Applying the standard thesauri converts all multi-word concepts into a single word/concept that is already classified
• Ontological classes are suggested using
– Parts of speech and statistical regularities

Step 6: Named Entity Resolution
• At the same time that the named entities are extracted, a meta-network is extracted.
• Named Entity list
– Contains all concepts/ngrams guessed to be people, organizations, locations, events, with best guess
• Meta-network
– Contains all people, organizations, locations, knowledge, events, activities, resources, beliefs based on existing standard thesauri
– The specific people, organizations, locations and events in this are viewed as “vetted”
• The vetted list is removed from the named entity list and the vetted class is used
• Humans then go through the remaining named entities to classify those that make sense
• Steps 5 and 6 are repeated as needed
• End result is a vetted meta-network

Illustrative Results from Named Entity List Before Resolution
conceptFrom | conceptTo | metaOntology
up missions targeting taliban leaders | up_missions_targeting_taliban_leaders | agent
distribute new | distribute_new | agent
main office california office | main_office_california_office |
1899 l street | _1899_l_street | location
pay afghan | pay_afghan | agent
military commander | military | organization
once u | once_u | agent
peace press washington | peace_press_washington | agent
david katz | david_katz | agent
david kilcullen | david_kilcullen | agent
david lachapelle | david_lachapelle | agent
david lanz | david_lanz | agent
April 2012 Copyright © Kathleen M.
London Riots Hashtag Network
[Figure: London Riots hashtag network panels]

Socio-Cultural Modeling Conundrum!
[Diagram labels: CASOS, CT and DNA Data, IKE Net, USMA, Web Scrapers]
• There are hundreds (maybe thousands) of services and models in the human socio-cultural behavioral area
• Some are web services and some need to be on larger systems
• NEED:
– Increased model reuse
– Decreased time from data collection to model results
– Shared procedures for going from data to model to policy

SORASCS
What is SORASCS?
[Diagram labels: Reporting, Recording, Analysis, Modeling, Data Processing, Data Sources]
• Platform for building workflows using existing tools
– Key user: analyst
– Tools integrated: (CMU) WebScraper, ORA, AutoMap (Thick & Thin), PileSort, CMU front-end for NASA World Wind; (GMU) Pythia, Caesar III; (COTS) Word, PowerPoint, Excel
• Back-end system on which tools are built
– Key user: modelers
– VIBES (Alion/CMU) now has a SORASCS backend
Key Features
• Workflow management (scriptrunner tool and script2bpel), simple interface, integration guidelines, data provenance; coherent, flexible, extensible
Key Underlying Technologies
• BPEL, Apache (CXF, ODE, Tomcat), CMU Scriptrunner

SORASCS is the GLUE!
Example Applications
• Public Health – Reorganization
• Service Science – Designing new teams
• Science – Science networks
• Enron – bankruptcy – match.com
• Merchant marine vessels – piracy & finding hidden ports
• Counter-narcotics – marijuana traffic
• Counter-terrorism – early identification of leadership change
• Pandemic Influenza – intervention – school closure doesn’t work
• Literature – the structure of plays and movies
• MMOGs – many ways to succeed

[Figure: Effects of School Closure. Percentage (0.0% to 50.0%) by Simulation Day (1 to 361); series .5T_3D, .5T_7D, .5T_28D, 1T_3D, 1T_7D, 1T_28D, 2T_3D, 2T_7D, 2T_28D, BASE]
[Figure: Effects of Vaccination. Percentage (0.0% to 50.0%) by Simulation Day (1 to 361); series C25, C50, C75, C100 each combined with E25, E50, E75, E90, plus BASE]
[Figure: Top players’ weapon usage and Bottom players’ weapon usage. N by favoriteWeapon]
