Reproducible Finance with R: ETF Country Exposure

2017-02-08

A Reproducible Finance with R Post by Jonathan Regenstein

Today, we are going to tackle a project that has long been on my wishlist - a Shiny app to take a fund or portfolio, analyze its exposure to different countries and display those exposures on a world map. Now you know how exciting my wishlists are.

Before describing our data importing/wrangling work here in the Notebook, it mgight be helpful to look at where we’re headed. The final Shiny app is here. This is similar to a previous project because we are building a leaflet map, shading according to data that we add to the spatial dataframe, and including another HTML widget that is reponsive to the map. However, our current project differs in important ways and has a completely different use.

The previous project allowed user to click a country on the map and view the time series of returns. Our current project will allow the user to choose an ETF and view how that ETF is invested in different countries by how a world map is shaded.

From a substantive perspective, this app helps visualize country risks instead of returns - indeed, it’s the first in our series that does not import stock returns in any way. From an R perspective, in our current project the map is the responsive object according to user inputs, whereas before the dygraph was the responsive object according to user clicks on a map. They are related and require spatial dataframes but very different.

If you looked closely at the Shiny app, you noticed that we do have a data object that os responsive to a map click: we display a datatable of companies held by the ETF in whatever country is clicked. That is, if a user chooses an ETF and sees by the shading that the ETF is allocated X% to China, the user can click on the map to see which companis the ETF owns in China. That functionality is similar to the dygraphs functionality, except of course we have to wire up a datatable and do some filtering by country instead of passing an xts object to dygraphs. The fulcrum will still be the clicked map shape.

Alright, that app is what we’re ultimately building but, by way of what we’ll do in this Notebook, here’s the roadmap.

First, we are going to grab the data for one fund, the MSCI Emerging Markets ETF. Note that we are not going to get return data over time. Instead, we just want a snapshot of the ETF holdings: its constituents, their weights and their home countries. Our eventual app will include several ETFs but we are going to work with one ETF in this Notebook, with the foreknowledge that we want to reuse our steps when it’s time to build the Shiny app. In short, let’s get it right for this Emerging Markets ETF and then we can iterate over other ETFs when we move to building our Shiny app.

After we download the snapshot of the emerging markets fund, we will do some wrangling, some country weight aggregation, and then merge that data to our spatial dataframe. Adding that data will depend on the ETF using the same country naming convention as our spatial dataframe, so we’ll pay attention to that in the wrangling process.

Once we add the data to our spatial dataframe, we will recycle some old code, build a leaflet map and shade it according to the ETF’s country exposure. This is just a test to see how things will look in the Shiny app and we can even play around with different color palettes to get things just right.

Once we have the map aesthetics sorted, we’ll turn to part two: displaying the details of each country holding. Really this is just filtering our dataframe by country name - whatever country the user clicks - but we’ll go ahead and make sure things look how we want in this Notebook and then pass that object to our app eventually.

Let’s get to it!

First, let’s grab the fund data from MSCI’s homepage. We will use the read_csv() function from the readr package. We will title it emerging_markets_fund since we’ll be pulling in other funds later.

Note that we have to skip the first 10 rows, which is why the ‘skip = 10’ argument is included. That’s because this csv file is loaded with oddly formatted data in the first 10 rows. If we don’t skip those 11 rows, this import will be totally unhelpful. The ‘import dataset’ button in the IDE saved me minutes/hours of frustration here! (Author’s note: when this post was originally published, this argument was “skip = 11”. A few weeks later, ishares changed the structure of their spreadsheet. Now it’s 10 rows that need to be skipped!)

# Download the Emerging Markets ETF fund data from the ishares homepage.

emerging_market_fund <- read_csv("https://www.ishares.com/us/products/239637/ishares-msci-emerging-markets-etf/1467271812596.ajax?fileType=csv&fileName=EEM_holdings&dataType=fund", 
    skip = 10)

# Take a look at the fund snapshot we just imported. 
head(emerging_market_fund)

## # A tibble: 6 × 16
##   Ticker                               Name `Asset Class` `Weight (%)`
##    <chr>                              <chr>         <chr>        <dbl>
## 1 005930            SAMSUNG ELECTRONICS LTD        Equity       4.1165
## 2    700               TENCENT HOLDINGS LTD        Equity       3.6649
## 3   2330 TAIWAN SEMICONDUCTOR MANUFACTURING        Equity       3.5112
## 4   BABA ALIBABA GROUP HOLDING ADR REPRESEN        Equity       2.7304
## 5    NPN              NASPERS LIMITED N LTD        Equity       1.7600
## 6    939     CHINA CONSTRUCTION BANK CORP H        Equity       1.5572
## # ... with 12 more variables: Price <dbl>, Shares <dbl>, `Market
## #   Value` <chr>, `Notional Value` <chr>, Sector <chr>, SEDOL <chr>,
## #   ISIN <chr>, Exchange <chr>, Country <chr>, Currency <chr>, `Market
## #   Currency` <chr>, `FX Rate` <dbl>

Alright, we have our fund data and now the wrangling begins. We are actually going to use this initial object to create two other objects: one will be merged with the spatial dataframe and one will be a standalone object to be loaded in our Shiny app.

emerging_market_fund_country_weights <- emerging_market_fund %>% 
  select(Country, `Weight (%)`) %>%
  mutate(Country = replace(Country, Country == "Russian Federation", "Russia"), 
         Country = replace(Country, Country == "Korea (South)", "Korea"),
         Country = replace(Country, Country == "KO", "Korea"),
         Country = replace(Country, Country == "Czech Republic", "Czech Rep.")
         ) %>%
  group_by(Country) %>% 
  summarise(EEM = sum(`Weight (%)`, na.rm = TRUE)) %>% 
  rename(name = Country) %>% 
  filter(EEM > 0) %>% 
  filter(name != "-") %>% 
  arrange(desc(EEM))

# Let's just take a look at the data.

emerging_market_fund_country_weights

## # A tibble: 28 × 2
##            name     EEM
##           <chr>   <dbl>
## 1         China 26.2637
## 2         Korea 14.9220
## 3        Taiwan 12.1916
## 4         India  8.5008
## 5        Brazil  7.4449
## 6  South Africa  6.7359
## 7        Russia  3.7714
## 8        Mexico  3.6169
## 9     Indonesia  2.4890
## 10     Malaysia  2.4130
## # ... with 18 more rows

Those country weights are pretty striking. China + Korea + Taiwan equal 51% of this fund - quite concentrated in economies that are probably closely linked. Perhaps that’s by design? Perhaps the inter-economy correlation isn’t as high as I believe? A cross-border investment or trade Shiny app would be helpful here.

It’s worth a second to consider the definition of ‘emerging market’, a term that has become quite ubiquitous and has a know-it-when-we-see-it feel (if you’re not into political economy, feel free to skip this paragraph). The phrase was coined in 1981 by the World Bank’s Antoine Van Agtmael to help encourage investment in developing nations as he felt that ‘Third World’ country was both distasteful and stifling to investors. Learn more here. Today, the phrase connotes an economy that is growing and transitioning from developed to developing, though some commentators include a political transition as well. Since we are working with an MSCI fund, we should consider their definition. It wasn’t easy to track down, but according to the Financial Times, MSCI takes into account number of listed companies of a certain size (an economic measure) and openness to foreign capital (a political measure).

Back to our task at hand: we have downloaded the fund data and got it into shape to be added to our shapefile. That process is the exact same as in our previous post so before we do that, let’s use the original fund data to create one other object, to store country level detail on companies, weights and sectors. If that seems a bit confusing, head back to the Shiny app and click on a country. The datatable displays company names and details, and we need to create a dataframe to extract and hold that data.

# Wrangle for the datatable to show company names, sector, weights.

EEM <- emerging_market_fund %>% 
  select(Name, Country, Sector, `Weight (%)`, `Market Value`) %>% 
  mutate(Country = replace(Country, Country == "Russian Federation", "Russia"), 
         Country = replace(Country, Country == "Korea (South)", "Korea"),
         Country = replace(Country, Country == "Czech Republic", "Czech Rep.")
         ) %>% 
  filter(`Weight (%)` > 0) %>% 
  filter(Country != "-") 

# Let's test it on Brazil to make sure it works.

Brazil_companies <- EEM %>%
  filter(Country == "Brazil") %>% 
  ungroup() %>% 
  select(-Country)

Brazil_companies

## # A tibble: 59 × 4
##                                  Name           Sector `Weight (%)`
##                                 <chr>            <chr>        <dbl>
## 1       ITAU UNIBANCO HOLDING PREF SA       Financials       0.8775
## 2              BANCO BRADESCO PREF SA       Financials       0.6219
## 3                            AMBEV SA Consumer Staples       0.5881
## 4                        VALE PREF SA        Materials       0.3817
## 5         PETROLEO BRASILEIRO PREF SA           Energy       0.3808
## 6                           PETROBRAS           Energy       0.3028
## 7             CIA VALE DO RIO DOCE SH        Materials       0.2701
## 8   ITAUSA INVESTIMENTOS ITAU PREF SA       Financials       0.2668
## 9  BMF BOVESPA BOLSA DE VALORES MERCA       Financials       0.2312
## 10                BANCO DO BRASIL S/A       Financials       0.2039
## # ... with 49 more rows, and 1 more variables: `Market Value` <chr>

This is what a user of our Shiny app will see upon clicking on Brazil, it is the country level detail of how the fund is invested in Brazil. We will save that ‘EEM’ object in the .RDat file so it can be loaded into our Shiny app.

Okay, let’s go ahead and build that map of the world and add our fund country weights to it. This process is identical to how we did it here, but we’ll go through the steps again.

First, let’s download the spatial dataframe. We will also use the ms_simplify() function from rmapshaper to reduce the size of the dataframe. This function will reduce the number of longitude and latitude coordinates used to build each country. It will make loading faster in our Shiny app but won’t affect any of our logic.

library(rgdal)
library(rmapshaper)
library(httr)

# Load the spatial dataframe
tmp_zipped <- tempfile()
tmp_unzipped <- tempfile()

httr::GET("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip", write_disk(tmp_zipped))

## Response [http://naciscdn.org/naturalearth/50m/cultural/ne_50m_admin_0_countries.zip]
##   Date: 2017-03-24 17:31
##   Status: 200
##   Content-Type: application/x-zip-compressed
##   Size: 818 kB
## <ON DISK>  /var/folders/np/13n2s4g91vsgv7y432b5wf_00000gn/T//RtmplAXIku/file106cc57ce68bd

unzip(tmp_zipped, exdir = tmp_unzipped)

world <- readOGR(tmp_unzipped, 'ne_50m_admin_0_countries', verbose = FALSE)

# Use ms_simplify to reduce the size of the spatial dataframe.
world <- ms_simplify(world)

Now we will use the merge() function from the sp package to add our country weight data. Remember above where we made sure to use a consistent country naming convention when wrangling the ETF data? This is where it will come in handy - we use the ‘name’ column to perform the merge. After the merging, ETF exposures will be added for each country that has a match in the ‘name’ column. For those with no match, the EEM column will be filled with NA.

library(sp)
library(leaflet)

world_fund_country_weights <- merge(world, emerging_market_fund_country_weights, by = "name") 

# Let's take a look and make sure that merge accomplished what we want. There will be
# several NA values, because this fund is allocated only to emerging markets. The developed markets
# should be mostly NAs - Australia is immediately visible to us as an NA.
world_fund_country_weights@data[c('name', 'EEM')]

##                     name     EEM
## 1            Afghanistan      NA
## 4                 Angola      NA
## 2                Albania      NA
## 162 United Arab Emirates  0.7672
## 6              Argentina      NA
## 7                Armenia      NA
## 5             Antarctica      NA
## 8              Australia      NA
## 9                Austria      NA
## 10            Azerbaijan      NA
## 24               Burundi      NA
## 13               Belgium      NA
## 15                 Benin      NA
## 23          Burkina Faso      NA
## 11            Bangladesh      NA
## 22              Bulgaria      NA
## 18      Bosnia and Herz.      NA
## 12               Belarus      NA
## 14                Belize      NA
## 17               Bolivia      NA
## 20                Brazil  7.4449
## 21                Brunei      NA
## 16                Bhutan      NA
## 19              Botswana      NA
## 28  Central African Rep.      NA
## 27                Canada      NA
## 149          Switzerland  0.0814
## 30                 Chile  1.2210
## 31                 China 26.2637
## 35         Côte d'Ivoire      NA
## 26              Cameroon      NA
## 40       Dem. Rep. Congo      NA
## 33                 Congo      NA
## 32              Colombia  0.4224
## 34            Costa Rica      NA
## 37                  Cuba      NA
## 107            N. Cyprus      NA
## 38                Cyprus      NA
## 39            Czech Rep.  0.1802
## 58               Germany      NA
## 43              Djibouti      NA
## 42               Denmark      NA
## 44        Dominican Rep.      NA
## 3                Algeria      NA
## 45               Ecuador      NA
## 46                 Egypt  0.1191
## 49               Eritrea      NA
## 143                Spain      NA
## 50               Estonia      NA
## 51              Ethiopia      NA
## 54               Finland      NA
## 53                  Fiji      NA
## 52          Falkland Is.      NA
## 55                France      NA
## 56                 Gabon      NA
## 163       United Kingdom      NA
## 57               Georgia      NA
## 59                 Ghana      NA
## 63                Guinea      NA
## 64         Guinea-Bissau      NA
## 48            Eq. Guinea      NA
## 60                Greece  0.3025
## 61             Greenland      NA
## 62             Guatemala      NA
## 65                Guyana      NA
## 67              Honduras      NA
## 36               Croatia      NA
## 66                 Haiti      NA
## 68               Hungary  0.2818
## 71             Indonesia  2.4890
## 70                 India  8.5008
## 74               Ireland      NA
## 72                  Iran      NA
## 73                  Iraq      NA
## 69               Iceland      NA
## 75                Israel      NA
## 76                 Italy      NA
## 77               Jamaica      NA
## 79                Jordan      NA
## 78                 Japan      NA
## 136      Siachen Glacier      NA
## 80            Kazakhstan      NA
## 81                 Kenya      NA
## 85            Kyrgyzstan      NA
## 25              Cambodia      NA
## 82                 Korea 14.9220
## 83                Kosovo      NA
## 84                Kuwait      NA
## 86               Lao PDR      NA
## 88               Lebanon      NA
## 90               Liberia      NA
## 91                 Libya      NA
## 144            Sri Lanka      NA
## 89               Lesotho      NA
## 92             Lithuania      NA
## 93            Luxembourg      NA
## 87                Latvia      NA
## 104              Morocco      NA
## 101              Moldova      NA
## 95            Madagascar      NA
## 100               Mexico  3.6169
## 94             Macedonia      NA
## 98                  Mali      NA
## 106              Myanmar      NA
## 103           Montenegro      NA
## 102             Mongolia      NA
## 105           Mozambique      NA
## 99            Mauritania      NA
## 96                Malawi      NA
## 97              Malaysia  2.4130
## 108              Namibia      NA
## 111        New Caledonia      NA
## 114                Niger      NA
## 115              Nigeria      NA
## 113            Nicaragua      NA
## 110          Netherlands  0.3367
## 116               Norway      NA
## 109                Nepal      NA
## 112          New Zealand      NA
## 117                 Oman      NA
## 118             Pakistan      NA
## 120               Panama      NA
## 123                 Peru  0.3722
## 124          Philippines  1.1454
## 121     Papua New Guinea      NA
## 125               Poland  1.2310
## 127          Puerto Rico      NA
## 41       Dem. Rep. Korea      NA
## 126             Portugal      NA
## 122             Paraguay      NA
## 119            Palestine      NA
## 128                Qatar  0.7938
## 129              Romania      NA
## 130               Russia  3.7714
## 131               Rwanda      NA
## 169            W. Sahara      NA
## 133         Saudi Arabia      NA
## 145                Sudan      NA
## 132             S. Sudan      NA
## 134              Senegal      NA
## 137         Sierra Leone      NA
## 47           El Salvador      NA
## 141           Somaliland      NA
## 140              Somalia      NA
## 135               Serbia      NA
## 146             Suriname      NA
## 138             Slovakia      NA
## 139             Slovenia      NA
## 148               Sweden      NA
## 147            Swaziland      NA
## 150                Syria      NA
## 29                  Chad      NA
## 156                 Togo      NA
## 154             Thailand  2.2042
## 152           Tajikistan      NA
## 159         Turkmenistan      NA
## 155          Timor-Leste      NA
## 157              Tunisia      NA
## 158               Turkey  1.0363
## 151               Taiwan 12.1916
## 153             Tanzania      NA
## 160               Uganda      NA
## 161              Ukraine      NA
## 165              Uruguay      NA
## 164        United States  0.4507
## 166           Uzbekistan      NA
## 167            Venezuela      NA
## 168              Vietnam      NA
## 170                Yemen      NA
## 142         South Africa  6.7359
## 171               Zambia      NA
## 172             Zimbabwe      NA

We have our data added to the shapefile. Let’s go ahead and construct a map. First we’ll build a popup to show some detail, then we will create a green palette and a purple palette - for no other reason than to see which is more visually appealing.

# Create a popup to display the exact country weight.
EEMPopup <- paste0("<strong>Country: </strong>", 
                world_fund_country_weights$name,
                "<br><strong> Country Weight: </strong>", 
                world_fund_country_weights$EEM, "%")

# Let's create two palettes so we can look at two variants. This is pure aesthetics, but the Notebook is the
# place for aesthetics.

EEMPalGreens <- colorQuantile("Greens", world_fund_country_weights$EEM, n = 20)
EEMPalPurples <- colorQuantile("Purples", world_fund_country_weights$EEM, n = 20)

Let’s invoke leaflet! As before, we will use ‘layerId = ~name’. This is, again, massively important because when we create a Shiny app, we want to pass country names to our datatable and filter accoringly. The ‘layerId’ is how we’ll do that: when a user clicks on a country, we capture the ‘layerId’, which is a country name that can be used for filtering.

# Build a green shaded map.
leaf_world_emerging_greens <- leaflet(world_fund_country_weights) %>%
  addProviderTiles("CartoDB.Positron") %>% 
  setView(lng =  20, lat =  15, zoom = 2) %>%
      addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = .7,
                  
      # The next line of code is really important for creating the map we want to use later.
      
      color =~EEMPalGreens(EEM), layerId = ~name, popup = EEMPopup)

# Build a purple shaded map.
leaf_world_emerging_purples <- leaflet(world_fund_country_weights) %>%
  addProviderTiles("CartoDB.Positron") %>% 
  setView(lng =  20, lat =  15, zoom = 2) %>%
      addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = .7,
      color =~EEMPalPurples(EEM), layerId = ~name, popup = EEMPopup)

# Let's compare our green shaded map to our purple shaded map.
leaf_world_emerging_greens

leaf_world_emerging_purples

Both those maps look good to me but purple might be the way to go ultimately. That’s a decision for next time - see you then!