<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Neural Networks on R Views</title>
    <link>https://rviews.rstudio.com/tags/neural-networks/</link>
    <description>Recent content in Neural Networks on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 24 Jul 2020 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/neural-networks/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Building A Neural Net from Scratch Using R - Part 2</title>
      <link>https://rviews.rstudio.com/2020/07/24/building-a-neural-net-from-scratch-using-r-part-2/</link>
      <pubDate>Fri, 24 Jul 2020 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2020/07/24/building-a-neural-net-from-scratch-using-r-part-2/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Akshaj is a budding deep learning researcher who loves to work with R. He has worked as a Research Associate at the Indian Institute of Science and as a Data Scientist at KPMG India.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In the previous post, we went through the dataset, the pre-processing involved, the train-test split, and talked in detail about the architecture of the model. We started building our neural net chunk by chunk and wrote functions for initializing parameters and running forward propagation.&lt;/p&gt;
&lt;p&gt;In this post, we’ll implement backpropagation by writing functions to calculate gradients and update the weights. Finally, we’ll make predictions on the test data and see how accurate our model is using metrics such as &lt;code&gt;Accuracy&lt;/code&gt;, &lt;code&gt;Recall&lt;/code&gt;, &lt;code&gt;Precision&lt;/code&gt;, and &lt;code&gt;F1-score&lt;/code&gt;. We’ll compare our neural net with a logistic regression model and visualize the difference in the decision boundaries produced by these models.&lt;/p&gt;
&lt;p&gt;Let’s continue by implementing our cost function.&lt;/p&gt;
&lt;div id=&#34;compute-cost&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Compute Cost&lt;/h3&gt;
&lt;p&gt;We will use the binary cross-entropy loss function (also known as log loss). Here, &lt;span class=&#34;math inline&#34;&gt;\(y\)&lt;/span&gt; is the true label and &lt;span class=&#34;math inline&#34;&gt;\(\hat{y}\)&lt;/span&gt; is the predicted output.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ cost = - \frac{1}{N}\sum_{i=1}^{N} \left[ y_{i}\log(\hat{y}_{i}) + (1 - y_{i})\log(1 - \hat{y}_{i}) \right] \]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;computeCost()&lt;/code&gt; function takes as arguments the input matrix &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt;, the true labels &lt;span class=&#34;math inline&#34;&gt;\(y\)&lt;/span&gt; and a &lt;code&gt;cache&lt;/code&gt;. &lt;code&gt;cache&lt;/code&gt; is the output of the forward pass that we calculated above. To calculate the error, we will only use the final output &lt;code&gt;A2&lt;/code&gt; from the &lt;code&gt;cache&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;computeCost &amp;lt;- function(X, y, cache) {
    m &amp;lt;- dim(X)[2]
    A2 &amp;lt;- cache$A2
    logprobs &amp;lt;- (log(A2) * y) + (log(1-A2) * (1-y))
    cost &amp;lt;- -sum(logprobs/m)
    return (cost)
}&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cost &amp;lt;- computeCost(X_train, y_train, fwd_prop)
cost&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.693&lt;/code&gt;&lt;/pre&gt;
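&lt;p&gt;An initial cost of about 0.693 is itself a useful sanity check: with small random initial weights the network outputs probabilities near 0.5, and the binary cross-entropy of a constant 0.5 prediction is log(2) ≈ 0.693 for any labels. A minimal sketch with toy labels (not the post’s data):&lt;/p&gt;

```r
# Toy check: binary cross-entropy of a constant 0.5 prediction is log(2),
# regardless of the labels.
y_hat = 0.5
y = c(1, 0, 1, 1)
cost = -mean(y * log(y_hat) + (1 - y) * log(1 - y_hat))
all.equal(cost, log(2))  # TRUE
```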
&lt;/div&gt;
&lt;div id=&#34;backpropagation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Backpropagation&lt;/h3&gt;
&lt;p&gt;Now comes the best part of it all: backpropagation!&lt;/p&gt;
&lt;p&gt;We’ll write a function that will calculate the gradient of the loss function with respect to the parameters. Generally, in a deep network, we have something like the following.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;backprop_deep.png&#34; alt = &#34;Figure 3: Backpropagation with cache. Credits: deep learning.ai&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;
&lt;p&gt;The above figure has two hidden layers. During backpropagation (red boxes), we use the output cached during forward propagation (purple boxes). Our neural net has only one hidden layer. More specifically, we have the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;linear_backward.png&#34; alt = &#34;Figure 4: Backpropagation for a single layer. Credits: deep learning.ai&#34; height = &#34;200&#34; width=&#34;400&#34;&gt;&lt;/p&gt;
&lt;p&gt;To implement backpropagation, we write a function that takes as arguments an input matrix &lt;code&gt;X&lt;/code&gt;, the training labels &lt;code&gt;y&lt;/code&gt;, the output activations from the forward pass as &lt;code&gt;cache&lt;/code&gt;, and a list of &lt;code&gt;layer_sizes&lt;/code&gt;. The three outputs &lt;span class=&#34;math inline&#34;&gt;\((dW^{[l]}, db^{[l]}, dA^{[l-1]})\)&lt;/span&gt; are computed using the input &lt;span class=&#34;math inline&#34;&gt;\(dZ^{[l]}\)&lt;/span&gt;, where &lt;span class=&#34;math inline&#34;&gt;\(l\)&lt;/span&gt; is the layer number.&lt;/p&gt;
&lt;p&gt;We first differentiate the loss function with respect to the weight &lt;span class=&#34;math inline&#34;&gt;\(W\)&lt;/span&gt; of the current layer.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ dW^{[l]} = \frac{\partial \mathcal{L} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T} \tag{8}\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Then we differentiate the loss function with respect to the bias &lt;span class=&#34;math inline&#34;&gt;\(b\)&lt;/span&gt; of the current layer.
&lt;span class=&#34;math display&#34;&gt;\[ db^{[l]} = \frac{\partial \mathcal{L} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)}\tag{9}\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Once we have these, we calculate the derivative of the loss with respect to &lt;span class=&#34;math inline&#34;&gt;\(A^{[l-1]}\)&lt;/span&gt;, the activated output of the previous layer.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]} \tag{10}\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Because we only have a single hidden layer, we first calculate the gradients for the final (output) layer and then for the middle (hidden) layer. In other words, the gradients for the weights between the output and hidden layers are calculated first. Using these (and the chain rule), the gradients for the weights between the hidden and input layers are calculated next.&lt;/p&gt;
&lt;p&gt;Finally, we return a list of gradient matrices. These gradients tell us the small amounts by which we should increase or decrease our weights so that the loss decreases. Here are the equations for the gradients. I’ve derived them for you so you don’t have to differentiate anything; we’ll use these expressions directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(dZ^{[2]} = A^{[2]} - Y\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(dW^{[2]} = \frac{1}{m} dZ^{[2]}A^{[1]^T}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(db^{[2]} = \frac{1}{m}\sum dZ^{[2]}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]&amp;#39;}(Z^{[1]})\)&lt;/span&gt; where &lt;span class=&#34;math inline&#34;&gt;\(g\)&lt;/span&gt; is the activation function (here tanh, whose derivative gives the &lt;span class=&#34;math inline&#34;&gt;\(1 - (A^{[1]})^2\)&lt;/span&gt; term in the code below).&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(dW^{[1]} = \frac{1}{m}dZ^{[1]}X^{T}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(db^{[1]} = \frac{1}{m}\sum dZ^{[1]}\)&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
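&lt;p&gt;As a quick sanity check on the first equation, we can compare the analytic gradient &lt;span class=&#34;math inline&#34;&gt;\(dZ^{[2]} = A^{[2]} - Y\)&lt;/span&gt; against a finite-difference estimate. This sketch assumes a sigmoid output unit with the cross-entropy loss above; the names are illustrative, not from the post’s code:&lt;/p&gt;

```r
# Finite-difference check of dL/dz = sigmoid(z) - y for sigmoid + cross-entropy.
sigmoid = function(z) 1 / (1 + exp(-z))
bce = function(z, y) -(y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z)))

z = 0.7; y = 1
analytic = sigmoid(z) - y
eps = 1e-6
numeric = (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)
abs(analytic - numeric)  # very close to 0
```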
&lt;p&gt;If you would like to know more about the math involved in constructing these equations, please see the references below.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;backwardPropagation &amp;lt;- function(X, y, cache, params, list_layer_size){
    
    m &amp;lt;- dim(X)[2]
    
    n_x &amp;lt;- list_layer_size$n_x
    n_h &amp;lt;- list_layer_size$n_h
    n_y &amp;lt;- list_layer_size$n_y

    A2 &amp;lt;- cache$A2
    A1 &amp;lt;- cache$A1
    W2 &amp;lt;- params$W2

    dZ2 &amp;lt;- A2 - y
    dW2 &amp;lt;- 1/m * (dZ2 %*% t(A1)) 
    db2 &amp;lt;- matrix(1/m * sum(dZ2), nrow = n_y)
    
    dZ1 &amp;lt;- (t(W2) %*% dZ2) * (1 - A1^2)
    dW1 &amp;lt;- 1/m * (dZ1 %*% t(X))
    db1 &amp;lt;- matrix(1/m * sum(dZ1), nrow = n_h)
    
    grads &amp;lt;- list(&amp;quot;dW1&amp;quot; = dW1, 
                  &amp;quot;db1&amp;quot; = db1,
                  &amp;quot;dW2&amp;quot; = dW2,
                  &amp;quot;db2&amp;quot; = db2)
    
    return(grads)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see below, the shapes of the gradients are the same as their corresponding weights i.e. &lt;code&gt;W1&lt;/code&gt; has the same shape as &lt;code&gt;dW1&lt;/code&gt; and so on. This is important because we are going to use these gradients to update our actual weights.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;back_prop &amp;lt;- backwardPropagation(X_train, y_train, fwd_prop, init_params, layer_size)
lapply(back_prop, function(x) dim(x))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $dW1
## [1] 4 2
## 
## $db1
## [1] 4 1
## 
## $dW2
## [1] 1 4
## 
## $db2
## [1] 1 1&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;update-parameters&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Update Parameters&lt;/h3&gt;
&lt;p&gt;From the gradients calculated by the &lt;code&gt;backwardPropagation()&lt;/code&gt;, we update our weights using the &lt;code&gt;updateParameters()&lt;/code&gt; function. The &lt;code&gt;updateParameters()&lt;/code&gt; function takes as arguments the gradients, network parameters, and a learning rate.&lt;/p&gt;
&lt;p&gt;Why a learning rate? Because sometimes the weight updates (gradients) are too large, causing us to overshoot the minima completely. The learning rate is a hyper-parameter, set by the user, that controls the impact of the weight updates. Its value lies between &lt;span class=&#34;math inline&#34;&gt;\(0\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(1\)&lt;/span&gt;, and it is multiplied with the gradients before they are subtracted from the weights. The weights are updated as follows, where the learning rate is denoted by &lt;span class=&#34;math inline&#34;&gt;\(\alpha\)&lt;/span&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(W^{[2]} = W^{[2]} - \alpha * dW^{[2]}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(b^{[2]} = b^{[2]} - \alpha * db^{[2]}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(W^{[1]} = W^{[1]} - \alpha * dW^{[1]}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;math inline&#34;&gt;\(b^{[1]} = b^{[1]} - \alpha * db^{[1]}\)&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
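&lt;p&gt;To see why the size of &lt;span class=&#34;math inline&#34;&gt;\(\alpha\)&lt;/span&gt; matters, here is a toy illustration (not from the post’s code) of gradient descent on &lt;span class=&#34;math inline&#34;&gt;\(f(x) = x^2\)&lt;/span&gt;: a small learning rate converges toward the minimum at 0, while a too-large one overshoots it on every step and diverges.&lt;/p&gt;

```r
# Gradient descent on f(x) = x^2, whose gradient is 2x.
descend = function(lr, steps = 20) {
  x = 5
  for (i in 1:steps) x = x - lr * 2 * x
  x
}
descend(0.1)  # shrinks toward the minimum at 0
descend(1.1)  # each step overshoots; the magnitude of x blows up
```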
&lt;p&gt;The &lt;code&gt;updateParameters()&lt;/code&gt; function returns the updated parameters. &lt;code&gt;grads&lt;/code&gt; and &lt;code&gt;params&lt;/code&gt; were calculated above, while we choose the &lt;code&gt;learning_rate&lt;/code&gt; ourselves.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;updateParameters &amp;lt;- function(grads, params, learning_rate){

    W1 &amp;lt;- params$W1
    b1 &amp;lt;- params$b1
    W2 &amp;lt;- params$W2
    b2 &amp;lt;- params$b2
    
    dW1 &amp;lt;- grads$dW1
    db1 &amp;lt;- grads$db1
    dW2 &amp;lt;- grads$dW2
    db2 &amp;lt;- grads$db2
    
    
    W1 &amp;lt;- W1 - learning_rate * dW1
    b1 &amp;lt;- b1 - learning_rate * db1
    W2 &amp;lt;- W2 - learning_rate * dW2
    b2 &amp;lt;- b2 - learning_rate * db2
    
    updated_params &amp;lt;- list(&amp;quot;W1&amp;quot; = W1,
                           &amp;quot;b1&amp;quot; = b1,
                           &amp;quot;W2&amp;quot; = W2,
                           &amp;quot;b2&amp;quot; = b2)
    
    return (updated_params)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see, the weights still maintain their original shapes. This means we’ve done things correctly up to this point.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;update_params &amp;lt;- updateParameters(back_prop, init_params, learning_rate = 0.01)
lapply(update_params, function(x) dim(x))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $W1
## [1] 4 2
## 
## $b1
## [1] 4 1
## 
## $W2
## [1] 1 4
## 
## $b2
## [1] 1 1&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;train-the-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Train the Model&lt;/h2&gt;
&lt;p&gt;Now that we have all our components, let’s go ahead and write a function that will train our model.&lt;/p&gt;
&lt;p&gt;We will use all the functions we have written above in the following order.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Run forward propagation&lt;/li&gt;
&lt;li&gt;Calculate loss&lt;/li&gt;
&lt;li&gt;Calculate gradients&lt;/li&gt;
&lt;li&gt;Update parameters&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This &lt;code&gt;trainModel()&lt;/code&gt; function takes as arguments the input matrix &lt;code&gt;X&lt;/code&gt;, the true labels &lt;code&gt;y&lt;/code&gt;, the number of epochs, the number of hidden neurons, and the learning rate.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Get the sizes for layers and initialize random parameters.&lt;/li&gt;
&lt;li&gt;Initialize a vector called &lt;code&gt;cost_history&lt;/code&gt; which we’ll use to store the calculated loss value per epoch.&lt;/li&gt;
&lt;li&gt;Run a for-loop:
&lt;ul&gt;
&lt;li&gt;Run forward prop.&lt;/li&gt;
&lt;li&gt;Calculate loss.&lt;/li&gt;
&lt;li&gt;Calculate gradients via backprop.&lt;/li&gt;
&lt;li&gt;Update parameters.&lt;/li&gt;
&lt;li&gt;Replace the current parameters with the updated parameters.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This function returns the updated parameters, which we’ll use to run inference with our model. It also returns the &lt;code&gt;cost_history&lt;/code&gt; vector.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trainModel &amp;lt;- function(X, y, num_iteration, hidden_neurons, lr){
    
    layer_size &amp;lt;- getLayerSize(X, y, hidden_neurons)
    init_params &amp;lt;- initializeParameters(X, layer_size)
    cost_history &amp;lt;- c()
    for (i in 1:num_iteration) {
        fwd_prop &amp;lt;- forwardPropagation(X, init_params, layer_size)
        cost &amp;lt;- computeCost(X, y, fwd_prop)
        back_prop &amp;lt;- backwardPropagation(X, y, fwd_prop, init_params, layer_size)
        update_params &amp;lt;- updateParameters(back_prop, init_params, learning_rate = lr)
        init_params &amp;lt;- update_params
        cost_history &amp;lt;- c(cost_history, cost)
        
        if (i %% 10000 == 0) cat(&amp;quot;Iteration&amp;quot;, i, &amp;quot; | Cost: &amp;quot;, cost, &amp;quot;\n&amp;quot;)
    }
    
    model_out &amp;lt;- list(&amp;quot;updated_params&amp;quot; = update_params,
                      &amp;quot;cost_hist&amp;quot; = cost_history)
    return (model_out)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we’ve defined our function to train, let’s run it! We’re going to train our model, with 40 hidden neurons, for 60000 epochs with a learning rate of 0.9. We will print out the loss after every 10000 epochs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;EPOCHS = 60000
HIDDEN_NEURONS = 40
LEARNING_RATE = 0.9

train_model &amp;lt;- trainModel(X_train, y_train, hidden_neurons = HIDDEN_NEURONS, num_iteration = EPOCHS, lr = LEARNING_RATE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Iteration 10000  | Cost:  0.3724 
## Iteration 20000  | Cost:  0.4081 
## Iteration 30000  | Cost:  0.3273 
## Iteration 40000  | Cost:  0.4671 
## Iteration 50000  | Cost:  0.4479 
## Iteration 60000  | Cost:  0.3074&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2020-07-21-building-a-neural-net-from-scratch-using-r-part-2/index_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;logistic-regression&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Logistic Regression&lt;/h2&gt;
&lt;p&gt;Before we go ahead and test our neural net, let’s quickly train a simple logistic regression model so that we can compare its performance with our neural net. Since a logistic regression model can only learn linear boundaries, it will not fit the data well. A neural network, on the other hand, will.&lt;/p&gt;
&lt;p&gt;We’ll use the &lt;code&gt;glm()&lt;/code&gt; function in R to build this model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lr_model &amp;lt;- glm(y ~ x1 + x2, data = train)
lr_model&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:  glm(formula = y ~ x1 + x2, data = train)
## 
## Coefficients:
## (Intercept)           x1           x2  
##     0.51697      0.00889     -0.05207  
## 
## Degrees of Freedom: 319 Total (i.e. Null);  317 Residual
## Null Deviance:       80 
## Residual Deviance: 76.4  AIC: 458&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s now generate predictions from the logistic regression model on the test set.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lr_pred &amp;lt;- round(as.vector(predict(lr_model, test[, 1:2])))
lr_pred&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] 1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 1 0 1 1 1 1 1 1
## [39] 1 1 1 1 0 0 1 0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0
## [77] 1 1 1 0&lt;/code&gt;&lt;/pre&gt;
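&lt;p&gt;One caveat worth noting: with no &lt;code&gt;family&lt;/code&gt; argument, &lt;code&gt;glm()&lt;/code&gt; defaults to &lt;code&gt;gaussian&lt;/code&gt;, i.e. an ordinary linear model whose fitted values we then round. A conventional logistic regression would pass &lt;code&gt;family = binomial&lt;/code&gt; and threshold the predicted probabilities; either way, the decision boundary is linear. A sketch on synthetic data (not the post’s dataset):&lt;/p&gt;

```r
# An explicit logistic fit: family = binomial, probabilities via type = "response".
set.seed(1)
toy = data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y = as.numeric(toy$x1 + toy$x2 + rnorm(200) > 0)
fit = glm(y ~ x1 + x2, data = toy, family = binomial)
prob = predict(fit, toy, type = "response")
pred = round(prob)
mean(pred == toy$y)  # in-sample accuracy on this roughly linearly separable data
```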
&lt;/div&gt;
&lt;div id=&#34;test-the-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Test the Model&lt;/h2&gt;
&lt;p&gt;Finally, it’s time to make predictions. To do that -&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;First get the layer sizes.&lt;/li&gt;
&lt;li&gt;Run forward propagation.&lt;/li&gt;
&lt;li&gt;Return the prediction.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At inference time, we do not need to perform backpropagation, as you can see below. We only perform forward propagation and return the final output of our neural network. (Note that instead of randomly initialized parameters, we’re using the trained parameters here.)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;makePrediction &amp;lt;- function(X, y, hidden_neurons){
    layer_size &amp;lt;- getLayerSize(X, y, hidden_neurons)
    params &amp;lt;- train_model$updated_params
    fwd_prop &amp;lt;- forwardPropagation(X, params, layer_size)
    pred &amp;lt;- fwd_prop$A2
    
    return (pred)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After obtaining the output probabilities (from the sigmoid), we round them off to obtain the output labels.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;y_pred &amp;lt;- makePrediction(X_test, y_test, HIDDEN_NEURONS)
y_pred &amp;lt;- round(y_pred)&lt;/code&gt;&lt;/pre&gt;
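&lt;p&gt;For reference, the sigmoid used at the output layer (defined in Part 1) squashes scores into &lt;span class=&#34;math inline&#34;&gt;\((0, 1)\)&lt;/span&gt;, so rounding at 0.5 is equivalent to thresholding the raw score at 0. A minimal sketch:&lt;/p&gt;

```r
# sigmoid(0) = 0.5, so round(sigmoid(z)) labels z by its sign.
sigmoid = function(z) 1 / (1 + exp(-z))
sigmoid(0)                     # 0.5
round(sigmoid(c(-2, 0.1, 3)))  # 0 1 1
```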
&lt;p&gt;Here are the true labels and the predicted labels.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Neural Net: 
##  1 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0 1 1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Ground Truth: 
##  0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 1 1 0 1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Logistic Reg: 
##  1 1 0 1 0 1 1 1 0 1 0 1 0 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;decision-boundaries&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Decision Boundaries&lt;/h3&gt;
&lt;p&gt;In the following visualization, we’ve plotted our test-set predictions on top of the decision boundaries.&lt;/p&gt;
&lt;div id=&#34;neural-net&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Neural Net&lt;/h4&gt;
&lt;p&gt;As we can see, our neural net was able to learn the non-linear decision boundary and has produced accurate results.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2020-07-21-building-a-neural-net-from-scratch-using-r-part-2/index_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;logistic-regression-1&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Logistic Regression&lt;/h4&gt;
&lt;p&gt;On the other hand, logistic regression, with its linear decision boundary, could not fit the data very well.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2020-07-21-building-a-neural-net-from-scratch-using-r-part-2/index_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;confusion-matrix&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Confusion Matrix&lt;/h3&gt;
&lt;p&gt;A confusion matrix is often used to describe the performance of a classifier.
It is defined as:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\mathbf{Confusion\ Matrix} = \left[\begin{array}
{rr}
\text{True Negative} &amp;amp; \text{False Positive}  \\
\text{False Negative} &amp;amp; \text{True Positive}
\end{array}\right]
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Let’s go over the basic terms used in a confusion matrix through an example. Consider the case where we were trying to predict if an email was spam or not.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;True Positive&lt;/strong&gt;: Email was predicted to be spam and it actually was spam.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;True Negative&lt;/strong&gt;: Email was predicted as not-spam and it actually was not-spam.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Positive&lt;/strong&gt;: Email was predicted to be spam but it actually was not-spam.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;False Negative&lt;/strong&gt;: Email was predicted to be not-spam but it actually was spam.&lt;/li&gt;
&lt;/ul&gt;
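&lt;p&gt;In R, &lt;code&gt;table(truth, prediction)&lt;/code&gt; builds exactly this matrix. A toy version of the spam example (illustrative labels, where 1 = spam):&lt;/p&gt;

```r
# Rows are the true labels, columns are the predictions.
truth = c(1, 0, 1, 1, 0, 0, 1, 0)
pred  = c(1, 0, 0, 1, 0, 1, 1, 0)
table(truth, pred)
# truth = 0, pred = 1 is a false positive; truth = 1, pred = 0 is a false negative.
```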
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tb_nn &amp;lt;- table(y_test, y_pred)
tb_lr &amp;lt;- table(y_test, lr_pred)

cat(&amp;quot;NN Confusion Matrix: \n&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NN Confusion Matrix:&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tb_nn&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       y_pred
## y_test  0  1
##      0 34 10
##      1  7 29&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cat(&amp;quot;\nLR Confusion Matrix: \n&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## LR Confusion Matrix:&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tb_lr&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       lr_pred
## y_test  0  1
##      0 14 30
##      1 18 18&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;accuracy-metrics&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Accuracy Metrics&lt;/h3&gt;
&lt;p&gt;We’ll calculate Precision, Recall, F1-score, and Accuracy. These metrics, derived from the confusion matrix, are defined as follows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt; is defined as the number of true positives over the number of true positives plus the number of false positives.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\text{Precision} = \frac {\text{True Positive}}{\text{True Positive} + \text{False Positive}} \]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt; is defined as the number of true positives over the number of true positives plus the number of false negatives.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\text{Recall} = \frac {\text{True Positive}}{\text{True Positive} + \text{False Negative}} \]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;F1-score&lt;/strong&gt; is the harmonic mean of precision and recall.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\text{F1 Score} = 2 \times \frac {\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt; gives us the percentage of correct predictions out of all predictions made.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\text{Accuracy} = \frac {\text{True Positive} + \text{True Negative}} {\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}}  \]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;To better understand these terms, let’s continue the example of “email-spam” we used above.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If our model had a precision of 0.6, that would mean when it predicts an email as spam, then it is correct 60% of the time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If our model had a recall of 0.8, then it would mean our model correctly classifies 80% of all spam.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The F1-score is the way we combine precision and recall into a single number. A perfect F1-score is 1, and the worst is 0.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
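&lt;p&gt;Plugging the toy numbers above into the F1 formula makes the harmonic-mean behaviour concrete: a precision of 0.6 and a recall of 0.8 combine to an F1-score of about 0.686, pulled toward the lower of the two values.&lt;/p&gt;

```r
# F1 as the harmonic mean of precision and recall.
precision = 0.6
recall = 0.8
f1 = 2 * (precision * recall) / (precision + recall)
f1  # 0.6857143
```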
&lt;p&gt;Now that we have an understanding of the accuracy metrics, let’s actually calculate them. We’ll define a function that takes as input the confusion matrix. Then based on the above formulas, we’ll calculate the metrics.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;calculate_stats &amp;lt;- function(tb, model_name) {
  # With table(truth, prediction), the cells (in column-major order) are:
  # tb[1] = TN, tb[2] = FN, tb[3] = FP, tb[4] = TP
  acc &amp;lt;- (tb[1] + tb[4])/(tb[1] + tb[2] + tb[3] + tb[4])
  precision &amp;lt;- tb[4]/(tb[4] + tb[3])
  recall &amp;lt;- tb[4]/(tb[4] + tb[2])
  f1 &amp;lt;- 2 * ((precision * recall) / (precision + recall))
  
  cat(model_name, &amp;quot;: \n&amp;quot;)
  cat(&amp;quot;\tAccuracy = &amp;quot;, acc*100, &amp;quot;%.&amp;quot;)
  cat(&amp;quot;\n\tPrecision = &amp;quot;, precision*100, &amp;quot;%.&amp;quot;)
  cat(&amp;quot;\n\tRecall = &amp;quot;, recall*100, &amp;quot;%.&amp;quot;)
  cat(&amp;quot;\n\tF1 Score = &amp;quot;, f1*100, &amp;quot;%.\n\n&amp;quot;)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are the metrics for our neural net and logistic regression.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## Neural Network : 
##  Accuracy =  78.75 %.
##  Precision =  74.36 %.
##  Recall =  80.56 %.
##  F1 Score =  77.33 %.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Logistic Regression : 
##  Accuracy =  40 %.
##  Precision =  37.5 %.
##  Recall =  50 %.
##  F1 Score =  42.86 %.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see, the logistic regression performed poorly because it cannot learn non-linear boundaries. Neural nets, on the other hand, are able to learn non-linear boundaries and, as a result, fit our complex data very well.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this two-part series, we built a neural net from scratch with a vectorized implementation of backpropagation. We went through the entire life cycle of training a model: right from data pre-processing to model evaluation. Along the way, we learned about the mathematics that makes up a neural network. We went over basic concepts of linear algebra and calculus and implemented them as functions. We saw how to initialize weights and how to perform forward propagation, gradient descent, and backpropagation.&lt;/p&gt;
&lt;p&gt;We learned about the ability of a neural net to fit non-linear data and understood the important role activation functions play in it. We trained a neural net and compared its performance to a logistic regression model. We visualized the decision boundaries of both models and saw how a neural net was able to fit the data better than logistic regression. We learned about metrics like Precision, Recall, F1-score, and Accuracy by evaluating our models against them.&lt;/p&gt;
&lt;p&gt;You should now have a pretty solid understanding of how neural-networks are built.&lt;/p&gt;
&lt;p&gt;I hope you had as much fun reading as I had while writing this! If I’ve made a mistake somewhere, I’d love to hear about it so I can correct it. Suggestions and constructive criticism are welcome. :)&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;Here is a short list of two intermediate level and two beginner level references for the mathematics underlying neural networks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Intermediate&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Matrix Calculus You Need for Deep Learning&lt;/em&gt; - &lt;a href=&#34;https://arxiv.org/abs/1802.01528&#34;&gt;Parr and Howard (2018)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Deep Learning: An Introduction for Applied Mathematicians&lt;/em&gt; - &lt;a href=&#34;https://arxiv.org/abs/1801.05894&#34;&gt;Higham and Higham (2018)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Beginner&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning&#34;&gt;Deep Learning&lt;/a&gt; course by Andrew NG on Coursera. It can be audited for free.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Grant Sanderson’s YouTube channel. Here are the 4 relevant playlists. &lt;a href=&#34;https://www.youtube.com/playlist?list=PLZHQObOWTQDNPOjrT6KVlfJuKtYTftqH6&#34;&gt;diff eq&lt;/a&gt;, &lt;a href=&#34;https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&#34;&gt;linear algebra&lt;/a&gt;, &lt;a href=&#34;https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&#34;&gt;calculus&lt;/a&gt;, &lt;a href=&#34;https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&#34;&gt;neural nets&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2020/07/24/building-a-neural-net-from-scratch-using-r-part-2/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Downtime Reading</title>
      <link>https://rviews.rstudio.com/2017/12/29/down-time-reading/</link>
      <pubDate>Fri, 29 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/12/29/down-time-reading/</guid>
      <description>
&lt;p&gt;Not everyone has the luxury of taking some downtime at the end of the year, but if you do have some free time, you may enjoy something on my short list of downtime reading. The books and articles here are not exactly &amp;ldquo;light reading&amp;rdquo;, nor are they literature for cuddling by the fire. Nevertheless, you may find something that catches your eye.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&#34;https://www.syncfusion.com/resources/techportal/ebooks&#34;&gt;Syncfusion series&lt;/a&gt; of free eBooks contains more than a few gems on a variety of programming subjects, including James McCaffrey&amp;rsquo;s &lt;a href=&#34;https://www.syncfusion.com/resources/techportal/details/ebooks/R-Programming_Succinctly&#34;&gt;R Programming Succinctly&lt;/a&gt; and Barton Poulson&amp;rsquo;s &lt;a href=&#34;https://www.syncfusion.com/resources/techportal/details/ebooks/rsuccinctly&#34;&gt;R Succinctly&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-12-28-Rickert-Reading_files/succinctly.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;For a more ambitious read, mine the rich vein of &lt;a href=&#34;https://textbooks.opensuny.org/open-source-textbooks/&#34;&gt;SUNY Open Textbooks&lt;/a&gt;. My pick is Hiroki Sayama&amp;rsquo;s &lt;a href=&#34;https://textbooks.opensuny.org/introduction-to-the-modeling-and-analysis-of-complex-systems/&#34;&gt;Introduction to the Modeling and Analysis of Complex Systems&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-12-28-Rickert-Reading_files/complex.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;If you just can&amp;rsquo;t get enough of data science, then a few articles that caught my attention are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Christopher Olah&amp;rsquo;s brief but mind-stretching post on &lt;a href=&#34;http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/&#34;&gt;Neural Networks, Manifolds, and Topology&lt;/a&gt;, which is good preparation for the Fujitsu Laboratories paper on &lt;a href=&#34;https://www.jstage.jst.go.jp/article/tjsai/32/3/32_D-G72/_pdf&#34;&gt;Time Series Classification via Topological Data Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The paper by Nguyen and Holmes on their &lt;a href=&#34;https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1790-x&#34;&gt;Bayesian Unidimensional Scaling (BUDS)&lt;/a&gt; method for detecting patterns in high-dimensional data&lt;/li&gt;
&lt;li&gt;Bou-Hamad et al.&amp;rsquo;s &lt;a href=&#34;https://projecteuclid.org/download/pdfview_1/euclid.ssu/1315833185&#34;&gt;A review of survival trees&lt;/a&gt;, a valuable introduction to the literature on the subject&lt;/li&gt;
&lt;li&gt;Rob Hyndman&amp;rsquo;s recent post on &lt;a href=&#34;https://robjhyndman.com/hyndsight/tspackages/&#34;&gt;Some new time series packages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mike Bostock&amp;rsquo;s beautiful and mind-altering post on &lt;a href=&#34;https://bost.ocks.org/mike/algorithms/?t=1&amp;amp;cn=ZmxleGlibGVfcmVjcw%3D%3D&amp;amp;refsrc=email&amp;amp;iid=90e204098ee84319b825887ae4c1f757&amp;amp;uid=765311247189291008&amp;amp;nid=244+281088008&#34;&gt;Visualizing Algorithms&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-12-28-Rickert-Reading_files/starry.png&#34; alt=&#34;Starry Night through 6,667 uniform random samples&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Finally, if you really have some time on your hands, try searching through the 318M+ papers on &lt;a href=&#34;https://www.pdfdrive.net/&#34;&gt;PDFDRIVE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy reading, and have a &lt;em&gt;Happy and Prosperous New Year&lt;/em&gt; from all of us at RStudio!!&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/12/29/down-time-reading/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Connecting R to Keras and TensorFlow</title>
      <link>https://rviews.rstudio.com/2017/12/11/r-and-tensorflow/</link>
      <pubDate>Mon, 11 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/12/11/r-and-tensorflow/</guid>
      <description>
        


&lt;p&gt;It has always been the mission of R developers to connect R to the “good stuff”. As John Chambers puts it in his book &lt;em&gt;&lt;a href=&#34;http://amzn.to/2A2U1RG&#34;&gt;Extending R&lt;/a&gt;&lt;/em&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;One of the attractions of R has always been the ability to compute an interesting result quickly. A key motivation for the original S remains as important now: to give easy access to the best computations for understanding data.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From the day it was announced a little over two years ago, it was clear that Google’s &lt;a href=&#34;https://www.tensorflow.org/&#34;&gt;TensorFlow&lt;/a&gt; platform for &lt;a href=&#34;https://en.wikipedia.org/wiki/Deep_learning#cite_note-dechter1986-22&#34;&gt;Deep Learning&lt;/a&gt; is good stuff. This September (see the &lt;a href=&#34;https://blog.rstudio.com/2017/09/05/keras-for-r/&#34;&gt;announcement&lt;/a&gt;), J.J. Allaire, François Chollet, and the other authors of the &lt;a href=&#34;https://cran.r-project.org/package=keras&#34;&gt;keras package&lt;/a&gt; delivered on R’s “easy access to the best” mission in a big way. Data scientists can now build very sophisticated Deep Learning models from an R session while maintaining the &lt;em&gt;flow&lt;/em&gt; that R users expect. The strategy that made this happen seems to have been straightforward. But the smooth experience of using the &lt;code&gt;Keras&lt;/code&gt; API indicates inspired programming all the way along the chain from TensorFlow to R.&lt;/p&gt;
&lt;div id=&#34;the-keras-strategy&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The Keras Strategy&lt;/h3&gt;
&lt;p&gt;TensorFlow itself is implemented as a &lt;a href=&#34;https://en.wikipedia.org/wiki/Dataflow_programming&#34;&gt;Data Flow Language&lt;/a&gt; on a directed graph. Operations are implemented as nodes on the graph, and the data, multi-dimensional arrays called “tensors”, flow over the graph as directed by control signals. An overview and some of the details of how this all happens are lucidly described in a &lt;a href=&#34;http://delivery.acm.org/10.1145/3090000/3088527/pldiws17mapl-maplmainid2.pdf?ip=73.71.144.79&amp;amp;id=3088527&amp;amp;acc=OA&amp;amp;key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E5945DC2EABF3343C&amp;amp;CFID=831811081&amp;amp;CFTOKEN=34450892&amp;amp;__acm__=1512687001_5cc6d6628bb281a58e545884cba347f9&#34;&gt;paper by Abadi, Isard, and Murray&lt;/a&gt; of the Google Brain Team,&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-12-7-Rickert-TensorFlow_files/TF_graph.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;and even more details and some fascinating history are contained in Peter Goldsborough’s paper, &lt;a href=&#34;https://arxiv.org/pdf/1610.01178v1.pdf&#34;&gt;A Tour of TensorFlow&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This kind of programming will probably strike most R users as exotic and obscure. My guess, though, is that because of the &lt;a href=&#34;https://pdfs.semanticscholar.org/6869/4d0a776b55459392a1fdead1bad5266f4b38.pdf&#34;&gt;long history&lt;/a&gt; of dataflow programming and parallel computing, it was an obvious choice for the Google computer scientists tasked with developing a platform flexible enough to implement arbitrary algorithms, work with extremely large data sets, and be easily implementable on any kind of distributed hardware, including GPUs, CPUs, and mobile devices.&lt;/p&gt;
&lt;p&gt;The TensorFlow operations are written in C++, &lt;a href=&#34;https://developer.nvidia.com/cuda-downloads&#34;&gt;CUDA&lt;/a&gt;, &lt;a href=&#34;http://eigen.tuxfamily.org/index.php?title=Main_Page&#34;&gt;Eigen&lt;/a&gt;, and other low-level languages optimized for different operations. Users don’t directly program TensorFlow at this level. Instead, they assemble flow graphs or algorithms using a higher-level language, most commonly Python, that accesses the elementary building blocks through an &lt;a href=&#34;https://www.tensorflow.org/api_docs/&#34;&gt;API&lt;/a&gt;.&lt;/p&gt;
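&lt;p&gt;To make the build-a-graph-then-run-it idea concrete, here is a minimal sketch of what dataflow programming looks like from R with the &lt;code&gt;tensorflow&lt;/code&gt; package (this assumes a working TensorFlow installation, and uses the session-based API of the current 1.x releases):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tensorflow)

# Build the graph: each call adds a node; edges carry tensors
a &amp;lt;- tf$constant(2, name = &amp;quot;a&amp;quot;)
b &amp;lt;- tf$constant(3, name = &amp;quot;b&amp;quot;)
total &amp;lt;- tf$add(a, b, name = &amp;quot;total&amp;quot;)

# Nothing has been computed yet; a session executes the graph
sess &amp;lt;- tf$Session()
sess$run(total)  # evaluates only the nodes that &amp;#39;total&amp;#39; depends on
sess$close()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The separation between constructing the graph and running it is exactly what lets TensorFlow optimize and distribute the computation before any data flows.&lt;/p&gt;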
&lt;p&gt;The &lt;code&gt;keras&lt;/code&gt; R package wraps the &lt;a href=&#34;https://www.tensorflow.org/api_docs/&#34;&gt;Keras Python Library&lt;/a&gt; that was expressly built for developing Deep Learning Models. It supports convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both, as well as arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, etc. (It should be pretty clear that the Python code that makes this all happen counts as good stuff too.)&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-started-with-keras-and-tensorflow&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Getting Started with Keras and TensorFlow&lt;/h3&gt;
&lt;p&gt;Setting up the whole shebang on your local machine couldn’t be simpler; it takes just three lines of code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;keras&amp;quot;)
library(keras)
install_keras()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just install and load the &lt;code&gt;keras&lt;/code&gt; R package and then run the &lt;code&gt;keras::install_keras()&lt;/code&gt; function, which installs TensorFlow, Python, and everything else you need, including a &lt;a href=&#34;https://virtualenv.pypa.io/en/stable/&#34;&gt;Virtualenv&lt;/a&gt; or &lt;a href=&#34;https://conda.io/docs/&#34;&gt;Conda&lt;/a&gt; environment. It just works! For instructions on installing Keras and TensorFlow on GPUs, look &lt;a href=&#34;https://tensorflow.rstudio.com/installation_gpu.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That’s it; in just a few minutes you are ready to start a hands-on exploration of the extensive documentation on RStudio’s TensorFlow website, &lt;a href=&#34;https://tensorflow.rstudio.com/&#34;&gt;tensorflow.rstudio.com&lt;/a&gt;, or jump right in and build a &lt;a href=&#34;https://tensorflow.rstudio.com/keras/&#34;&gt;Deep Learning model&lt;/a&gt; to classify the hand-written numerals using&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-12-7-Rickert-TensorFlow_files/MNIST.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;the MNIST data set, which comes with the &lt;code&gt;keras&lt;/code&gt; package, or any one of the other twenty-five pre-built examples.&lt;/p&gt;
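&lt;p&gt;To give a taste of what that looks like, here is a hedged sketch of a small MNIST classifier built with the &lt;code&gt;keras&lt;/code&gt; package; the layer sizes, optimizer, and epoch count are illustrative choices, not the only reasonable ones:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(keras)

# Load MNIST, flatten the 28x28 images, and scale pixels to [0, 1]
mnist &amp;lt;- dataset_mnist()
x_train &amp;lt;- array_reshape(mnist$train$x, c(60000, 784)) / 255
y_train &amp;lt;- to_categorical(mnist$train$y, 10)

# A small fully-connected network
model &amp;lt;- keras_model_sequential() %&amp;gt;%
  layer_dense(units = 128, activation = &amp;quot;relu&amp;quot;, input_shape = c(784)) %&amp;gt;%
  layer_dense(units = 10, activation = &amp;quot;softmax&amp;quot;)

model %&amp;gt;% compile(
  optimizer = &amp;quot;rmsprop&amp;quot;,
  loss = &amp;quot;categorical_crossentropy&amp;quot;,
  metrics = &amp;quot;accuracy&amp;quot;
)

model %&amp;gt;% fit(x_train, y_train, epochs = 5, batch_size = 128)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note how the &lt;code&gt;%&amp;gt;%&lt;/code&gt; pipe keeps the model definition reading top-to-bottom, which is much of what preserves the R user’s &lt;em&gt;flow&lt;/em&gt; mentioned above.&lt;/p&gt;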
&lt;/div&gt;
&lt;div id=&#34;beyond-deep-learning&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Beyond Deep Learning&lt;/h3&gt;
&lt;p&gt;Being able to build production-level Deep Learning applications from R is important, but Deep Learning is not the answer to everything, and TensorFlow is bigger than Deep Learning. The really big ideas around TensorFlow are: (1) TensorFlow is a general-purpose platform for building large, distributed applications on a wide range of cluster architectures, and (2) while data flow programming takes some getting used to, TensorFlow was designed for algorithm development with big data.&lt;/p&gt;
&lt;p&gt;Two additional R packages make general modeling and algorithm development in TensorFlow accessible to R users.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://github.com/rstudio/tfestimators&#34;&gt;&lt;code&gt;tfestimators&lt;/code&gt;&lt;/a&gt; package, currently on GitHub, provides an interface to Google’s &lt;a href=&#34;https://www.tensorflow.org/programmers_guide/estimators&#34;&gt;Estimators&lt;/a&gt; API, which provides access to pre-built TensorFlow models including SVMs, Random Forests, and KMeans. The architecture of the API looks something like this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-12-7-Rickert-TensorFlow_files/tfestimators.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;There are several layers in the stack, but execution on the small models I am running locally goes quickly. Look &lt;a href=&#34;https://tensorflow.rstudio.com/tfestimators/&#34;&gt;here&lt;/a&gt; for documentation and sample models that you can run yourself.&lt;/p&gt;
&lt;p&gt;At the deepest level, the &lt;a href=&#34;https://CRAN.R-project.org/package=tensorflow&#34;&gt;&lt;code&gt;tensorflow&lt;/code&gt;&lt;/a&gt; package provides an interface to the core &lt;a href=&#34;https://www.tensorflow.org/api_docs/python/&#34;&gt;TensorFlow API&lt;/a&gt;, which comprises a set of Python modules that enable constructing and executing TensorFlow graphs. The documentation on the package’s &lt;a href=&#34;https://tensorflow.rstudio.com/tensorflow/articles/tutorial_mnist_pros.html&#34;&gt;webpage&lt;/a&gt; is impressive, containing tutorials for different levels of expertise, several examples, and references for further reading. The &lt;a href=&#34;https://tensorflow.rstudio.com/tensorflow/articles/tutorial_mnist_beginners.html&#34;&gt;MNIST for ML Beginners&lt;/a&gt; tutorial revisits the classification problem described above at a level below the Keras interface, working through the details of a softmax regression.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-12-7-Rickert-TensorFlow_files/softmax.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;While Deep Learning is sure to capture most of the R to TensorFlow attention in the near term, I think having easy access to a big league computational platform will turn out to be the most important benefit to R users in the long run.&lt;/p&gt;
&lt;p&gt;As a final thought, I am very much enjoying reading the &lt;a href=&#34;https://www.manning.com/books/deep-learning-with-r&#34;&gt;MEAP&lt;/a&gt; from the forthcoming Manning Book, &lt;em&gt;Deep Learning with R&lt;/em&gt; by François Chollet, the creator of Keras, and J.J. Allaire. It is a really good read, masterfully balancing theory and hands-on practice, that ought to be helpful to anyone interested in Deep Learning and TensorFlow.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/12/11/r-and-tensorflow/&#39;;&lt;/script&gt;
      </description>
    </item>
    
  </channel>
</rss>
