<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Graph analysis on R Views</title>
    <link>https://rviews.rstudio.com/tags/graph-analysis/</link>
    <description>Recent content in Graph analysis on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 06 Mar 2019 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/graph-analysis/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Graph analysis using the tidyverse</title>
      <link>https://rviews.rstudio.com/2019/03/06/intro-to-graph-analysis/</link>
      <pubDate>Wed, 06 Mar 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/03/06/intro-to-graph-analysis/</guid>
      <description>
        


&lt;p&gt;It is because I am not a graph analysis expert that I thought it important to write this article. For someone who thinks in terms of single rectangular data sets, it is a bit of a mental leap to understand how to apply &lt;em&gt;tidy&lt;/em&gt; principles to a more robust object, such as a graph table. Thankfully, there are two packages that make this work much easier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/thomasp85/tidygraph&#34;&gt;&lt;code&gt;tidygraph&lt;/code&gt;&lt;/a&gt; - Provides a way for &lt;code&gt;dplyr&lt;/code&gt; to interact with graphs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/thomasp85/ggraph&#34;&gt;&lt;code&gt;ggraph&lt;/code&gt;&lt;/a&gt; - Extension to &lt;code&gt;ggplot2&lt;/code&gt; for graph analysis&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;quick-intro&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Quick intro&lt;/h3&gt;
&lt;p&gt;Simply put, graph theory studies relationships between objects in a group. Visually, we can think of a graph as a series of interconnected circles, each representing a member of a group, such as people in a social network. Lines drawn between the circles represent a relationship between the members, such as friendships in a social network. Graph analysis helps with figuring out things such as the influence of a certain member, or how many friends are in between two members. A more formal definition and detailed explanation of graph theory can be found on &lt;a href=&#34;https://en.wikipedia.org/wiki/Graph_theory&#34;&gt;Wikipedia&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;Using an example, this article introduces the basic concepts of graph analysis, and shows how &lt;code&gt;tidyverse&lt;/code&gt; and &lt;code&gt;tidyverse&lt;/code&gt;-adjacent tools can be used for such analysis.&lt;/p&gt;
&lt;div id=&#34;data-source&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Data source&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/rfordatascience/tidytuesday&#34;&gt;tidytuesday&lt;/a&gt; weekly project encourages new and experienced users to use the &lt;code&gt;tidyverse&lt;/code&gt; tools to analyze data sets that change every week. I have been using that opportunity to learn new tools and techniques. One of the most recent data sets relates to French trains; it contains aggregate daily total trips per connecting stations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(readr)

url &amp;lt;- &amp;quot;https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv&amp;quot;
small_trains &amp;lt;- read_csv(url)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(small_trains)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 13
##    year month service departure_stati… arrival_station journey_time_avg
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;            &amp;lt;chr&amp;gt;                      &amp;lt;dbl&amp;gt;
## 1  2017     9 Nation… PARIS EST        METZ                        85.1
## 2  2017     9 Nation… REIMS            PARIS EST                   47.1
## 3  2017     9 Nation… PARIS EST        STRASBOURG                 116. 
## 4  2017     9 Nation… PARIS LYON       AVIGNON TGV                161. 
## 5  2017     9 Nation… PARIS LYON       BELLEGARDE (AI…            164. 
## 6  2017     9 Nation… PARIS LYON       BESANCON FRANC…            129. 
## # … with 7 more variables: total_num_trips &amp;lt;int&amp;gt;,
## #   avg_delay_all_departing &amp;lt;dbl&amp;gt;, avg_delay_all_arriving &amp;lt;dbl&amp;gt;,
## #   num_late_at_departure &amp;lt;int&amp;gt;, num_arriving_late &amp;lt;int&amp;gt;,
## #   delay_cause &amp;lt;chr&amp;gt;, delayed_number &amp;lt;dbl&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;data-preparation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Data Preparation&lt;/h3&gt;
&lt;p&gt;Even though the data set was meant for analyzing delays, I thought it would be interesting to use it to understand how stations connect with each other. A new summarized data set is created, called &lt;em&gt;routes&lt;/em&gt;, which contains a single entry for each pair of departure and arrival stations. It also includes the average journey time between the two stations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

routes &amp;lt;- small_trains %&amp;gt;%
  group_by(departure_station, arrival_station) %&amp;gt;%
  summarise(journey_time = mean(journey_time_avg)) %&amp;gt;%
  ungroup() %&amp;gt;%
  mutate(from = departure_station, 
         to = arrival_station) %&amp;gt;%
  select(from, to, journey_time)

routes&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 130 x 3
##    from                       to                 journey_time
##    &amp;lt;chr&amp;gt;                      &amp;lt;chr&amp;gt;                     &amp;lt;dbl&amp;gt;
##  1 AIX EN PROVENCE TGV        PARIS LYON                186. 
##  2 ANGERS SAINT LAUD          PARIS MONTPARNASSE         97.5
##  3 ANGOULEME                  PARIS MONTPARNASSE        146. 
##  4 ANNECY                     PARIS LYON                225. 
##  5 ARRAS                      PARIS NORD                 52.8
##  6 AVIGNON TGV                PARIS LYON                161. 
##  7 BARCELONA                  PARIS LYON                358. 
##  8 BELLEGARDE (AIN)           PARIS LYON                163. 
##  9 BESANCON FRANCHE COMTE TGV PARIS LYON                131. 
## 10 BORDEAUX ST JEAN           PARIS MONTPARNASSE        186. 
## # … with 120 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next step is to transform the tidy data set into a graph table. In order to prepare &lt;em&gt;routes&lt;/em&gt; for this transformation, it has to contain two variables specifically named &lt;em&gt;from&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt;, which are the names that &lt;code&gt;tidygraph&lt;/code&gt; expects to see. Those variables should contain the name of each member (e.g., “AIX EN PROVENCE TGV”), and together they describe the relationship (“AIX EN PROVENCE TGV” -&amp;gt; “PARIS LYON”).&lt;/p&gt;
&lt;p&gt;In graph terminology, a member of the group is called a &lt;strong&gt;node&lt;/strong&gt; (or vertex) in the graph, and a relationship between nodes is called an &lt;strong&gt;edge&lt;/strong&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidygraph)

graph_routes &amp;lt;- as_tbl_graph(routes)

graph_routes&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 1 (active)
##   name               
##   &amp;lt;chr&amp;gt;              
## 1 AIX EN PROVENCE TGV
## 2 ANGERS SAINT LAUD  
## 3 ANGOULEME          
## 4 ANNECY             
## 5 ARRAS              
## 6 AVIGNON TGV        
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
##    from    to journey_time
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;        &amp;lt;dbl&amp;gt;
## 1     1    39        186. 
## 2     2    40         97.5
## 3     3    40        146. 
## # … with 127 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;as_tbl_graph()&lt;/code&gt; function splits the &lt;em&gt;routes&lt;/em&gt; table into two:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Node Data - Contains all of the unique values found in the &lt;em&gt;from&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt; variables. In this case, it is a table with a single column containing the names of all of the stations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Edge Data - Is a table of all relationships between &lt;em&gt;from&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt;. A peculiarity of &lt;code&gt;tidygraph&lt;/code&gt; is that it uses the row position of the node as the identifier for &lt;em&gt;from&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt;, instead of its original name.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
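&lt;p&gt;The split can be seen in isolation with a minimal sketch. The three-station table below is made up for illustration: &lt;code&gt;toy_routes&lt;/code&gt;, the stations A, B, and C, and the &lt;code&gt;minutes&lt;/code&gt; column are assumptions and are not part of the trains data. Each half of the resulting graph table is pulled out as a regular &lt;code&gt;tibble&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tibble)
library(tidygraph)

# Hypothetical edge list: three made-up stations, not from the trains data
toy_routes &amp;lt;- tribble(
  ~from, ~to, ~minutes,
  &amp;quot;A&amp;quot;,   &amp;quot;B&amp;quot;, 10,
  &amp;quot;B&amp;quot;,   &amp;quot;C&amp;quot;, 20,
  &amp;quot;A&amp;quot;,   &amp;quot;C&amp;quot;, 45
)

toy_graph &amp;lt;- as_tbl_graph(toy_routes)

# activate() picks which of the two tables to work with
# Node table: one row per unique station name
toy_nodes &amp;lt;- toy_graph %&amp;gt;%
  activate(nodes) %&amp;gt;%
  as_tibble()

# Edge table: from and to hold row positions in the node table
toy_edges &amp;lt;- toy_graph %&amp;gt;%
  activate(edges) %&amp;gt;%
  as_tibble()

toy_nodes
toy_edges&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Printing the two tibbles side by side makes it easier to check that the integers in &lt;em&gt;from&lt;/em&gt; and &lt;em&gt;to&lt;/em&gt; point at rows of the node table, and that the extra &lt;em&gt;minutes&lt;/em&gt; column travels with the edges.&lt;/p&gt;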
&lt;p&gt;Another interesting thing about &lt;code&gt;tidygraph&lt;/code&gt; is that it allows us to attach more information about the node or edge in an additional column. In this case, &lt;em&gt;journey_time&lt;/em&gt; is not really needed to create the graph table, but it may be needed for the analysis we plan to perform. The &lt;code&gt;as_tbl_graph()&lt;/code&gt; function automatically created the column for us.&lt;/p&gt;
&lt;p&gt;Thinking about &lt;em&gt;graph_routes&lt;/em&gt; as two &lt;code&gt;tibbles&lt;/code&gt; inside a larger graph table was one of the two major mental breakthroughs I had during this exercise. At that point, it became evident that &lt;code&gt;dplyr&lt;/code&gt; needs a way to know which of the two tables (nodes or edges) to perform the transformations on. In &lt;code&gt;tidygraph&lt;/code&gt;, this is done using the &lt;code&gt;activate()&lt;/code&gt; function. To showcase this, the nodes table will be “activated” in order to add two new string variables derived from &lt;em&gt;name&lt;/em&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(stringr)

graph_routes &amp;lt;- graph_routes %&amp;gt;%
  activate(nodes) %&amp;gt;%
  mutate(
    title = str_to_title(name),
    label = str_replace_all(title, &amp;quot; &amp;quot;, &amp;quot;\n&amp;quot;)
    )

graph_routes&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 3 (active)
##   name                title               label                   
##   &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;                   
## 1 AIX EN PROVENCE TGV Aix En Provence Tgv &amp;quot;Aix\nEn\nProvence\nTgv&amp;quot;
## 2 ANGERS SAINT LAUD   Angers Saint Laud   &amp;quot;Angers\nSaint\nLaud&amp;quot;   
## 3 ANGOULEME           Angouleme           Angouleme               
## 4 ANNECY              Annecy              Annecy                  
## 5 ARRAS               Arras               Arras                   
## 6 AVIGNON TGV         Avignon Tgv         &amp;quot;Avignon\nTgv&amp;quot;          
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
##    from    to journey_time
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;        &amp;lt;dbl&amp;gt;
## 1     1    39        186. 
## 2     2    40         97.5
## 3     3    40        146. 
## # … with 127 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It was really impressive how easy it was to manipulate the graph table, because once one of the two tables is activated, all of the changes can be made using &lt;code&gt;tidyverse&lt;/code&gt; tools. The same approach can be used to extract data from the graph table. In this case, a list of all the stations is pulled into a single character vector.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;stations &amp;lt;- graph_routes %&amp;gt;%
  activate(nodes) %&amp;gt;%
  pull(title)

stations&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;Aix En Provence Tgv&amp;quot;            &amp;quot;Angers Saint Laud&amp;quot;             
##  [3] &amp;quot;Angouleme&amp;quot;                      &amp;quot;Annecy&amp;quot;                        
##  [5] &amp;quot;Arras&amp;quot;                          &amp;quot;Avignon Tgv&amp;quot;                   
##  [7] &amp;quot;Barcelona&amp;quot;                      &amp;quot;Bellegarde (Ain)&amp;quot;              
##  [9] &amp;quot;Besancon Franche Comte Tgv&amp;quot;     &amp;quot;Bordeaux St Jean&amp;quot;              
## [11] &amp;quot;Brest&amp;quot;                          &amp;quot;Chambery Challes Les Eaux&amp;quot;     
## [13] &amp;quot;Dijon Ville&amp;quot;                    &amp;quot;Douai&amp;quot;                         
## [15] &amp;quot;Dunkerque&amp;quot;                      &amp;quot;Francfort&amp;quot;                     
## [17] &amp;quot;Geneve&amp;quot;                         &amp;quot;Grenoble&amp;quot;                      
## [19] &amp;quot;Italie&amp;quot;                         &amp;quot;La Rochelle Ville&amp;quot;             
## [21] &amp;quot;Lausanne&amp;quot;                       &amp;quot;Laval&amp;quot;                         
## [23] &amp;quot;Le Creusot Montceau Montchanin&amp;quot; &amp;quot;Le Mans&amp;quot;                       
## [25] &amp;quot;Lille&amp;quot;                          &amp;quot;Lyon Part Dieu&amp;quot;                
## [27] &amp;quot;Macon Loche&amp;quot;                    &amp;quot;Madrid&amp;quot;                        
## [29] &amp;quot;Marne La Vallee&amp;quot;                &amp;quot;Marseille St Charles&amp;quot;          
## [31] &amp;quot;Metz&amp;quot;                           &amp;quot;Montpellier&amp;quot;                   
## [33] &amp;quot;Mulhouse Ville&amp;quot;                 &amp;quot;Nancy&amp;quot;                         
## [35] &amp;quot;Nantes&amp;quot;                         &amp;quot;Nice Ville&amp;quot;                    
## [37] &amp;quot;Nimes&amp;quot;                          &amp;quot;Paris Est&amp;quot;                     
## [39] &amp;quot;Paris Lyon&amp;quot;                     &amp;quot;Paris Montparnasse&amp;quot;            
## [41] &amp;quot;Paris Nord&amp;quot;                     &amp;quot;Paris Vaugirard&amp;quot;               
## [43] &amp;quot;Perpignan&amp;quot;                      &amp;quot;Poitiers&amp;quot;                      
## [45] &amp;quot;Quimper&amp;quot;                        &amp;quot;Reims&amp;quot;                         
## [47] &amp;quot;Rennes&amp;quot;                         &amp;quot;Saint Etienne Chateaucreux&amp;quot;    
## [49] &amp;quot;St Malo&amp;quot;                        &amp;quot;St Pierre Des Corps&amp;quot;           
## [51] &amp;quot;Strasbourg&amp;quot;                     &amp;quot;Stuttgart&amp;quot;                     
## [53] &amp;quot;Toulon&amp;quot;                         &amp;quot;Toulouse Matabiau&amp;quot;             
## [55] &amp;quot;Tourcoing&amp;quot;                      &amp;quot;Tours&amp;quot;                         
## [57] &amp;quot;Valence Alixan Tgv&amp;quot;             &amp;quot;Vannes&amp;quot;                        
## [59] &amp;quot;Zurich&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;visualizing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Visualizing&lt;/h2&gt;
&lt;p&gt;In graphs, the absolute position of each node is not as relevant as it is with other kinds of visualizations. A very minimal &lt;code&gt;ggplot2&lt;/code&gt; theme is set to make it easier to view the plotted graph.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)

thm &amp;lt;- theme_minimal() +
  theme(
    legend.position = &amp;quot;none&amp;quot;,
    axis.title = element_blank(),
    axis.text = element_blank(),
    panel.grid = element_blank(),
    panel.grid.major = element_blank()
  )

theme_set(thm)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To create the plot, start with &lt;code&gt;ggraph()&lt;/code&gt; instead of &lt;code&gt;ggplot()&lt;/code&gt;. The &lt;code&gt;ggraph&lt;/code&gt; package contains &lt;code&gt;geoms&lt;/code&gt; that are unique to graph analysis: some specifically plot nodes, and others plot edges.&lt;/p&gt;
&lt;p&gt;As a first basic test, the &lt;em&gt;point&lt;/em&gt; &lt;code&gt;geom&lt;/code&gt; will be used, but instead of calling &lt;code&gt;geom_point()&lt;/code&gt;, we call &lt;code&gt;geom_node_point()&lt;/code&gt;. The edges are plotted using &lt;code&gt;geom_edge_diagonal()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggraph) 

graph_routes %&amp;gt;%
  ggraph(layout = &amp;quot;kk&amp;quot;) +
    geom_node_point() +
    geom_edge_diagonal() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;To make it easier to see where each station is placed in this plot, &lt;code&gt;geom_node_text()&lt;/code&gt; is used. Just as with regular &lt;code&gt;geoms&lt;/code&gt; in &lt;code&gt;ggplot2&lt;/code&gt;, other attributes such as &lt;code&gt;size&lt;/code&gt;, &lt;code&gt;color&lt;/code&gt;, and &lt;code&gt;alpha&lt;/code&gt; can be modified.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;graph_routes %&amp;gt;%
  ggraph(layout = &amp;quot;kk&amp;quot;) +
    geom_node_text(aes(label = label, color = name), size = 3) +
    geom_edge_diagonal(color = &amp;quot;gray&amp;quot;, alpha = 0.4) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;morphing-time&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Morphing time!&lt;/h2&gt;
&lt;p&gt;The second mental leap was understanding how a graph algorithm is applied. Typically, the output of a model function is a model object, not a data object. With &lt;code&gt;tidygraph&lt;/code&gt;, the process begins and ends with a graph table. The steps are these:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Start with a graph table&lt;/li&gt;
&lt;li&gt;Temporarily transform the graph to comply with the model that is requested (&lt;code&gt;morph()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add additional transformations to the morphed data using &lt;code&gt;dplyr&lt;/code&gt; (optional)&lt;/li&gt;
&lt;li&gt;Restore the original graph table, but modified to keep the changes made during the morph&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By default, the shortest path algorithm defines the “length” as the number of edges between two nodes. There may be multiple routes to get from point A to point B, but the algorithm chooses the one with the fewest “hops”. When &lt;code&gt;weights&lt;/code&gt; are supplied, it instead chooses the route with the smallest total weight. The algorithm is called inside the &lt;code&gt;morph()&lt;/code&gt; function. Even though &lt;code&gt;to_shortest_path()&lt;/code&gt; is a function in itself, and it is possible to run it without &lt;code&gt;morph()&lt;/code&gt;, it is not meant to be used that way. In the example, &lt;em&gt;journey_time&lt;/em&gt; is used as &lt;code&gt;weights&lt;/code&gt; so that the algorithm finds the route with the lowest total journey time between the &lt;em&gt;Arras&lt;/em&gt; and the &lt;em&gt;Nancy&lt;/em&gt; stations. The print output of the morphed graph will not be like the original graph table.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;from &amp;lt;- which(stations == &amp;quot;Arras&amp;quot;)
to &amp;lt;-  which(stations == &amp;quot;Nancy&amp;quot;)

shortest &amp;lt;- graph_routes %&amp;gt;%
  morph(to_shortest_path, from, to, weights = journey_time)

shortest&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tbl_graph temporarily morphed to a shortest path representation
## # 
## # Original graph is a directed simple graph with 1 component
## # consisting of 59 nodes and 130 edges&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is possible to make more transformations with the use of &lt;code&gt;activate()&lt;/code&gt; and &lt;code&gt;dplyr&lt;/code&gt; functions. The results can be previewed, or committed back to the original R variable using &lt;code&gt;unmorph()&lt;/code&gt;. By default, nodes are active in a morphed graph, so there is no need to set that explicitly.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest %&amp;gt;%
  mutate(selected_node = TRUE) %&amp;gt;%
  unmorph()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 4 (active)
##   name               title              label                 selected_node
##   &amp;lt;chr&amp;gt;              &amp;lt;chr&amp;gt;              &amp;lt;chr&amp;gt;                 &amp;lt;lgl&amp;gt;        
## 1 AIX EN PROVENCE T… Aix En Provence T… &amp;quot;Aix\nEn\nProvence\n… NA           
## 2 ANGERS SAINT LAUD  Angers Saint Laud  &amp;quot;Angers\nSaint\nLaud&amp;quot; NA           
## 3 ANGOULEME          Angouleme          Angouleme             NA           
## 4 ANNECY             Annecy             Annecy                NA           
## 5 ARRAS              Arras              Arras                 TRUE         
## 6 AVIGNON TGV        Avignon Tgv        &amp;quot;Avignon\nTgv&amp;quot;        NA           
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
##    from    to journey_time
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;        &amp;lt;dbl&amp;gt;
## 1     1    39        186. 
## 2     2    40         97.5
## 3     3    40        146. 
## # … with 127 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While the graph was morphed, only the few nodes that make up the connections between the Arras and Nancy stations were selected. A simple &lt;code&gt;mutate()&lt;/code&gt; adds a new variable called &lt;em&gt;selected_node&lt;/em&gt;, which tags those nodes with TRUE. The new variable and its values are retained once the rest of the nodes are restored via the &lt;code&gt;unmorph()&lt;/code&gt; command.&lt;/p&gt;
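&lt;p&gt;To see what the &lt;code&gt;weights&lt;/code&gt; argument changes, the following sketch compares the two behaviors on a made-up three-station graph. Everything in it is an assumption for illustration: the stations A, B, and C, the &lt;code&gt;minutes&lt;/code&gt; column, and the hypothetical helper &lt;code&gt;path_edges()&lt;/code&gt;, which tags the edges kept by the morph and returns only those edges as a &lt;code&gt;tibble&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tibble)
library(dplyr)
library(tidygraph)

# Hypothetical graph: A to C is one slow direct hop, or two fast hops via B
toy_graph &amp;lt;- tribble(
  ~from, ~to, ~minutes,
  &amp;quot;A&amp;quot;,   &amp;quot;B&amp;quot;, 10,
  &amp;quot;B&amp;quot;,   &amp;quot;C&amp;quot;, 20,
  &amp;quot;A&amp;quot;,   &amp;quot;C&amp;quot;, 45
) %&amp;gt;%
  as_tbl_graph()

# Look up node positions by name, as done with the stations vector
toy_names &amp;lt;- toy_graph %&amp;gt;%
  activate(nodes) %&amp;gt;%
  pull(name)

from &amp;lt;- which(toy_names == &amp;quot;A&amp;quot;)
to &amp;lt;- which(toy_names == &amp;quot;C&amp;quot;)

# Hypothetical helper: tag the edges kept by the morph, restore the full
# graph, and return only the tagged edges
path_edges &amp;lt;- function(morphed) {
  morphed %&amp;gt;%
    activate(edges) %&amp;gt;%
    mutate(on_path = TRUE) %&amp;gt;%
    unmorph() %&amp;gt;%
    activate(edges) %&amp;gt;%
    filter(!is.na(on_path)) %&amp;gt;%
    as_tibble()
}

# No weights: fewest hops, so the direct A -&amp;gt; C edge is expected
by_hops &amp;lt;- toy_graph %&amp;gt;%
  morph(to_shortest_path, from, to) %&amp;gt;%
  path_edges()

# Weighted by minutes: A -&amp;gt; B -&amp;gt; C (10 + 20) beats the direct edge (45)
by_time &amp;lt;- toy_graph %&amp;gt;%
  morph(to_shortest_path, from, to, weights = minutes) %&amp;gt;%
  path_edges()

by_hops
by_time&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The weight column is deliberately not named &lt;em&gt;weight&lt;/em&gt;, so that the unweighted call does not pick it up implicitly.&lt;/p&gt;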
&lt;p&gt;To keep the change, the &lt;em&gt;shortest&lt;/em&gt; variable is updated with the changes made to both edges and nodes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest &amp;lt;- shortest %&amp;gt;%
  mutate(selected_node = TRUE) %&amp;gt;%
  activate(edges) %&amp;gt;%
  mutate(selected_edge = TRUE) %&amp;gt;%
  unmorph() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next step is to coerce each NA into a 1, and the shortest route into a 2. This will allow us to easily re-arrange the order in which the edges are drawn in the plot, ensuring that the route will be drawn on top.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest &amp;lt;- shortest %&amp;gt;%
  activate(nodes) %&amp;gt;%
  mutate(selected_node = ifelse(is.na(selected_node), 1, 2)) %&amp;gt;%
  activate(edges) %&amp;gt;%
  mutate(selected_edge = ifelse(is.na(selected_edge), 1, 2)) %&amp;gt;%
  arrange(selected_edge)

shortest&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Edge Data: 130 x 4 (active)
##    from    to journey_time selected_edge
##   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;        &amp;lt;dbl&amp;gt;         &amp;lt;dbl&amp;gt;
## 1     1    39        186.              1
## 2     2    40         97.5             1
## 3     3    40        146.              1
## 4     4    39        225.              1
## 5     6    39        161.              1
## 6     7    39        358.              1
## # … with 124 more rows
## #
## # Node Data: 59 x 4
##   name               title              label                 selected_node
##   &amp;lt;chr&amp;gt;              &amp;lt;chr&amp;gt;              &amp;lt;chr&amp;gt;                         &amp;lt;dbl&amp;gt;
## 1 AIX EN PROVENCE T… Aix En Provence T… &amp;quot;Aix\nEn\nProvence\n…             1
## 2 ANGERS SAINT LAUD  Angers Saint Laud  &amp;quot;Angers\nSaint\nLaud&amp;quot;             1
## 3 ANGOULEME          Angouleme          Angouleme                         1
## # … with 56 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A simple way to plot the route is to use the &lt;em&gt;selected_&lt;/em&gt; variables to modify the &lt;code&gt;alpha&lt;/code&gt;. This will highlight the shortest path, without completely removing the other stations. This is a personal design choice, so experimenting with different ways of highlighting the results is always recommended.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest %&amp;gt;%
  ggraph(layout = &amp;quot;kk&amp;quot;) +
    geom_edge_diagonal(aes(alpha = selected_edge), color = &amp;quot;gray&amp;quot;) +
    geom_node_text(aes(label = label, color = name, alpha = selected_node), size = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;selected_&lt;/em&gt; fields can also be used in other &lt;code&gt;dplyr&lt;/code&gt; functions to analyze the results. For example, to know the aggregate information about the trip, &lt;em&gt;selected_edge&lt;/em&gt; is used to filter the edges, and then the totals can be calculated. There is no &lt;code&gt;summarise()&lt;/code&gt; function for graph tables; this makes sense because the graph table would become a summarized table with such a function. Since the end result we seek is a total rather than another graph table, a simple &lt;code&gt;as_tibble()&lt;/code&gt; command will coerce the edges into a regular table, which will then allow us to finish the calculation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest %&amp;gt;%
  activate(edges) %&amp;gt;%
  filter(selected_edge == 2) %&amp;gt;%
  as_tibble() %&amp;gt;%
  summarise(
    total_stops = n() - 1,
    total_time = round(sum(journey_time) / 60)
    )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 2
##   total_stops total_time
##         &amp;lt;dbl&amp;gt;      &amp;lt;dbl&amp;gt;
## 1           8         23&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;re-using-the-code&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Re-using the code&lt;/h2&gt;
&lt;p&gt;To compile most of the code in a single chunk, here is an example of how to re-run the shortest path for a different set of stations: the Laval and Montpellier stations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;from &amp;lt;- which(stations == &amp;quot;Montpellier&amp;quot;)
to &amp;lt;-  which(stations == &amp;quot;Laval&amp;quot;)

shortest &amp;lt;- graph_routes %&amp;gt;%
  morph(to_shortest_path, from, to, weights = journey_time) %&amp;gt;%
  mutate(selected_node = TRUE) %&amp;gt;%
  activate(edges) %&amp;gt;%
  mutate(selected_edge = TRUE) %&amp;gt;%
  unmorph() %&amp;gt;%
  activate(nodes) %&amp;gt;%
  mutate(selected_node = ifelse(is.na(selected_node), 1, 2)) %&amp;gt;%
  activate(edges) %&amp;gt;%
  mutate(selected_edge = ifelse(is.na(selected_edge), 1, 2)) %&amp;gt;%
  arrange(selected_edge)

shortest %&amp;gt;%
  ggraph(layout = &amp;quot;kk&amp;quot;) +
    geom_edge_diagonal(aes(alpha = selected_edge), color = &amp;quot;gray&amp;quot;) +
    geom_node_text(aes(label = label, color = name, alpha = selected_node), size = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Additionally, the same code can be recycled to obtain the summarized trip data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;shortest %&amp;gt;%
  activate(edges) %&amp;gt;%
  filter(selected_edge == 2) %&amp;gt;%
  as_tibble() %&amp;gt;%
  summarise(
    total_stops = n() - 1,
    total_time = round(sum(journey_time) / 60)
    )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 2
##   total_stops total_time
##         &amp;lt;dbl&amp;gt;      &amp;lt;dbl&amp;gt;
## 1           3         10&lt;/code&gt;&lt;/pre&gt;
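&lt;p&gt;One possible way to avoid copying the chunk for every new pair of stations is to wrap it in a function. The sketch below is only a suggestion: &lt;code&gt;shortest_route()&lt;/code&gt; is a hypothetical helper name, and it assumes that the graph has a &lt;em&gt;title&lt;/em&gt; column in its nodes and a &lt;em&gt;journey_time&lt;/em&gt; column in its edges, as &lt;em&gt;graph_routes&lt;/em&gt; does.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(tidygraph)

# Hypothetical helper: tags the shortest route between two station titles.
# Assumes node column `title` and edge column `journey_time`.
shortest_route &amp;lt;- function(graph, from_title, to_title) {
  titles &amp;lt;- graph %&amp;gt;%
    activate(nodes) %&amp;gt;%
    pull(title)

  from &amp;lt;- which(titles == from_title)
  to &amp;lt;- which(titles == to_title)

  # Stop early if either title is missing or ambiguous
  stopifnot(length(from) == 1, length(to) == 1)

  graph %&amp;gt;%
    morph(to_shortest_path, from, to, weights = journey_time) %&amp;gt;%
    mutate(selected_node = TRUE) %&amp;gt;%
    activate(edges) %&amp;gt;%
    mutate(selected_edge = TRUE) %&amp;gt;%
    unmorph() %&amp;gt;%
    activate(nodes) %&amp;gt;%
    mutate(selected_node = ifelse(is.na(selected_node), 1, 2)) %&amp;gt;%
    activate(edges) %&amp;gt;%
    mutate(selected_edge = ifelse(is.na(selected_edge), 1, 2)) %&amp;gt;%
    arrange(selected_edge)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this in place, a call such as &lt;code&gt;shortest_route(graph_routes, &amp;quot;Montpellier&amp;quot;, &amp;quot;Laval&amp;quot;)&lt;/code&gt; should return the same kind of graph table as the chunk above, ready to be plotted or summarized.&lt;/p&gt;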
&lt;/div&gt;
&lt;div id=&#34;shiny-app&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Shiny app&lt;/h2&gt;
&lt;p&gt;To see how to use this kind of analysis inside Shiny, please refer to &lt;a href=&#34;https://beta.rstudioconnect.com/content/4606/&#34;&gt;this application&lt;/a&gt;. It lets the user select two stations, and it returns the route, plus the summarized data. The source code is embedded in the app.&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
