Dr. Maria Prokofieva is a member of the R / Business working group which is promoting the use of R in accounting, auditing, and actuarial work. She is also a professor at the Victoria University Business School in Australia and works with CPA Australia.
Let’s talk about cash flows!
We continue our series of actuarial and accounting posts where we strive to interest the data science community in using their R skills for the actuarial and accounting applications. We also hope to help some actuaries and accountants with their R skills.
Cash flow is a concept that is important to every business and household. It is difficult to over estimate the importance of disclosing and properly communicating the cash flow as an essential of running any enterprise. This is especially true in times of recession when cash flow is the elephant in the room. Nobody really wants to talk about it, but making cash flow analysis easier to understand and interpret would help almost everyone. To get the maximum impact from a cash flow analysis it is not only important to demonstrate that it has been truly and fairly reported (in accounting terminology), but also to help consumers of cash flow reports intuitively grasp the the meaning of what the numbers are showing.
So what is the problem with the way we normally communicate information about cash flow? Well, look at the image below and check your pulse. How excited did you get? Can you make any sense of it at all with just a flash look?
Actually, it’s incredibly insightful if you know what to look for, but for an non-accounting person (with powers to make decisions), it may not be.
Now look at this example\(^1\) of a personal cash flow!
These numbers start talking to me in one insightful flow showing in a second where all the money goes! We all know that a picture is worth a thousand words, but if its millions of dollars, it’s truly priceless.
This type of chart is called a sankey chart. Sankey charts are used to show material, energy and cost flows. You can view some awesome visualizations made with the
networkD3 package here. So, showing cash flow is a perfect example of what they were designed for. It is not all common in the actuarial or accounting worlds, but, honestly, it is not that common to see any visualizations in financial reports, - well, apart from happy customers in annual reports which, as we all know, is a marketing thing.
Let’s take a look at how the flow diagram above might be reproduced in R. There are several ways to do this and several packages (for example:
diagrammerR) available to help. We will use a very simple dataset, but it could be easily extended to include additional transactional data: different sources of income, types of income, etc. We could also look at inflows and outflows, classify them into types of business activities and automate this tedious, manual task to present a nice chart to our stakeholders.
But. let’s start with … this simple one! And let’s use the
tidyverse.quiet = TRUE networkD3.quiet = TRUE library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4 ## ✔ tibble 3.1.7 ✔ dplyr 1.0.9 ## ✔ tidyr 1.2.0 ✔ stringr 1.4.0 ## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag()
Sankey diagrams visualize flows as connections from one node to another. In our case, the flows are distinct types of cash income that will flow into expenses.
For our example, we will use a simple data set with data that has already been classified into
data <- read_csv("cf_up.csv", show_col_types = FALSE) head(data)
## # A tibble: 6 × 3 ## source target value ## <chr> <chr> <dbl> ## 1 Salary Earned Income 494 ## 2 Salary Earned Income 677 ## 3 Salary Earned Income 758 ## 4 Salary Earned Income 933 ## 5 Salary Earned Income 649 ## 6 Salary Earned Income 825
sources are nodes that initiate the flow and targets are the nodes that finish the flow.
In this particular example, the amount of cash receipts have been matched with a particular cash spending type. In real life, you would not expect this data matching to have already been done for you as most likely your cash flow records just show type of cash flow (cash receipt vs. cash payment) and the amount. We will save the data munging required to transform a typical cash flow dataset into something that can be used to build the Sankey diagram for a future post.
data is a links dataframe with source, target and value columns. For example, $494 of salary was used to pay tax: salary is the earned income and payment of tax is a deduction. In this case the source is the Salary item, the target is Earned income and the value is 494. Earned incomeis also a source node that has Income (i.e. a broader category) target. Income is then distributed to expenses, i.e target category Deduction which is the later becomes a source node for Income tax target.
Here are the unique sources in our data:
##  "Salary" "Credit Card Reward" "Dividends" ##  "Interest" "Earned Income" "Passive Income" ##  "Passive income" "Income" "Deduction" ##  "Core Expenses" "Financial Independence" "Disposable Income"
And, here are the unique targets.
##  "Earned Income" "Passive Income" "Passive income" ##  "Income" "Deduction" "Core Expenses" ##  "Financial Independence" "Disposable Income" "Income Tax" ##  "Social Justice" "Bill Expenses" "Food" ##  "Personal Care" "Transportation" "Pension" ##  "Investment" "Real Estate" "Emergency Fund" ##  "Leisure"
To create a cash flow diagram with the
sankeyNetwork() function from the
networkD3 package we also need an additional data frame,
nodes which contains the list of receipts, types of receipts, types of spending and cash payments.
nodes <- data.frame( name=c(as.character(data$source), as.character(data$target)) %>% unique() ) head(nodes)
## name ## 1 Salary ## 2 Credit Card Reward ## 3 Dividends ## 4 Interest ## 5 Earned Income ## 6 Passive Income
Also, because we cannot use nodes names in the function, we need to replace our connections with id, i.e.
data$IDsource <- match(data$source, nodes$name)-1 data$IDtarget <- match(data$target, nodes$name)-1
Now, we can draw our cash flow diagram.
p <- sankeyNetwork(Links = data, Nodes = nodes, Source = "IDsource", Target = "IDtarget", Value = "value", NodeID = "name", units = '$', sinksRight=TRUE, # LinkGroup="source" fontSize = 10, nodeWidth = 30 )
## Links is a tbl_df. Converting to a plain data frame.
There it is! In just a few lines of code you can turn a table that only an expert can read into a visualization that might capture the imagination of someone forced to read a financial document.
- PPCexpo. How to Create a Cash Flow Chart? Easy to Follow Steps. Available at https://ppcexpo.com/blog/cash-flow-chart , Aug 31, 2022