Sometimes…it’s just because we can: Deep Dreaming in R

A couple of years ago I saw a TED talk on AI and the first pieces of art created with a neural network, and I thought to myself: “Wow! This is so cool!”

Figure: An untrained dream of Pudong (浦东), Shanghai

My first passion growing up was drawing, and I had a bizarre taste in pictures. They looked a lot like the picture above (yes, yes, I know… I probably did the arts a favor by studying math!), and I spent a lot of time drawing them in classes that bored me. Today, now that I have grown up and have responsibilities, I spend my time working and learning new techniques that are applicable in different parts of society. But from time to time I pick one up just because it is simple fun.

As most of the posts on this site have focused on practical issues, I thought I could indulge, for once, in just playing with tools. But this kind of play is also a good way to learn and spread knowledge about advanced analytics methods and tools that can be useful in other areas. So, even though the code and results below are written purely for the sake of Deep Dreaming, I will give some basic background on what is being used, namely TensorFlow and Inception V3. As for what a neural network is and how convolutional networks are built and operate, I refer to a previous post, A gentle Introduction to Convolutional Networks.

TensorFlow…what is it?

Well, as the name suggests, it’s a flow of tensors. What, then, is meant by “flow”, and what is a “tensor”?

Let’s begin with tensors. Given a reference basis of vectors, a tensor can be represented as an organized multidimensional array of numerical values. The order (also degree or rank) of a tensor is the dimensionality of the array needed to represent it or, equivalently, the number of indices needed to label a component of that array. For example, a linear transformation of an $n$-dimensional space is represented by an $n \times n$ matrix and is therefore a tensor of order 2. A vector is represented as a 1-dimensional array in a basis and is thus a 1st-order tensor. Scalars are single numbers and are thus 0th-order tensors. The collection of tensors on a vector space forms a tensor algebra.

What, then, would an order-3 tensor be? That’s right: a three-dimensional array of numbers. And a tensor of order $n$ is an $n$-dimensional array. What these mean in the physical world is another story, and I will not go into it.

How are they formed? The simplest example is this: suppose one second-order tensor $A_{ik}$ is a linear function of another second-order tensor $B_{lm}$, such that

$$A_{ik} = \lambda_{iklm} B_{lm},$$

with summation over the repeated indices $l$ and $m$. Then $\lambda_{iklm}$ is a fourth-order tensor.

For the purists, I suggest the exact definition of tensors found in Bourbaki, Nicolas (1989), Elements of Mathematics, Algebra I, Springer-Verlag. (A word of caution, though: this book is not for the faint-hearted and requires at least a master’s degree in mathematics.)
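To make the idea of order concrete, here is a minimal sketch in base R (no TensorFlow needed). The values are made up, and tensor_order is a hypothetical helper of mine, not part of any library. The last part writes out the contraction of a fourth-order tensor with a second-order one, index by index.

```r
# Tensors of order 0 to 3 as plain R arrays (illustrative values only)
scalar = 5                              # order 0: a single number
vec    = c(1, 2, 3)                     # order 1: one index
mat    = matrix(1:4, nrow = 2)          # order 2: two indices
cube   = array(1:24, dim = c(2, 3, 4))  # order 3: three indices

# Hypothetical helper: the order is the number of indices needed
# to address one component (plain vectors carry no dim attribute in R)
tensor_order = function(x) {
  if (is.null(dim(x))) as.integer(length(x) > 1) else length(dim(x))
}

# Fourth-order tensor lambda acting on a second-order tensor B:
# A[i, k] = sum over l and m of lambda[i, k, l, m] * B[l, m]
n      = 2
lambda = array(seq_len(n^4), dim = c(n, n, n, n))
B      = diag(n)                        # identity, chosen for easy checking
A      = matrix(0, n, n)
for (i in 1:n) for (k in 1:n) {
  A[i, k] = sum(lambda[i, k, , ] * B)
}
print(A)
```

With the identity as B, each component of A is simply the sum of lambda[i, k, 1, 1] and lambda[i, k, 2, 2], which makes the result easy to check by hand.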

Now, the idea behind the architecture of TensorFlow is the creation of a graph in which the nodes are operations to be performed and the edges between them are tensors. Note the analogy with how our brains are thought to be wired.

Figure: A simple graph representing two tensors and a node (the operation to be executed with the tensors as data). The mathematical operators can be simple arithmetic operations or multivariate functions; the result is a new tensor that can be used further down the flow.

One might see a contradiction between what I wrote above about tensors (as mathematical functions) and the picture I have just drawn. Indeed, the picture treats a tensor simply as an array of data on which operations are performed. But if you think about it, a matrix, until you let it act on a vector, is nothing more than a collection (a grid) of numerical values. So, basically, there is no contradiction.

This is the essence of TensorFlow: Create a flow of tensors and nodes to achieve a particular result.
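A toy version of such a flow can be written in plain R. This is only a sketch of the idea: the node functions are hypothetical names of mine, and everything runs immediately, whereas TensorFlow first describes the graph and executes it afterwards.

```r
# Two input tensors (the edges feeding the graph)
a = matrix(c(1, 2, 3, 4), nrow = 2)
b = matrix(c(10, 20, 30, 40), nrow = 2)

# Nodes are operations: each takes tensors in and hands a new tensor on
node_add   = function(x, y) x + y
node_scale = function(x, factor) x * factor

# The "flow": the output of one node becomes the input of the next
sum_tensor    = node_add(a, b)
result_tensor = node_scale(sum_tensor, 0.5)
print(result_tensor)
```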

TensorFlow was originally developed by the Google Brain team for internal use at Google, but it was released as open source in 2015. Now, for people like me who have a preference for R, this was a bit of a problem, since its interface is written in Python (and the computational core in C++). But this issue was quickly resolved by the development of the keras package for R. So, as long as you have a working installation of Anaconda (or equivalent) on your system, you’re good to go.

The first step is to load keras, as usual with the command

library(keras)

Now, the dreaming does not appear from nowhere: we need a pre-trained model. There are many choices: either a model that you have trained yourself (to recognize some object) or one of the many models already available in the keras package. They are all convolutional networks (convnets) developed for different purposes, for instance VGG16, VGG19, Xception, ResNet50, MobileNet and the one I am going to use, Inception v3 (a v4 release is available). Because these models were built and trained differently, the choice has a large impact on the visual effect, and in my personal opinion Inception produces the most appealing results.

We need to define and compute the loss that will be maximized during the gradient ascent process. Specifically, we will maximize a weighted sum of the squared L2 norms of the activations of a set of layers, each divided by the number of elements in that layer. Lower layers produce geometric effects, while higher layers produce visuals in which you can recognize some classes from ImageNet. Inception v3 has a depth of 48 layers!! So you definitely want to go deeper than the 4 layers in the movie with the same name! By the way, just three to four years ago, going 2 layers down was the maximum you could achieve.

I will do two things in my example. First, I’ll run the model with the predetermined ImageNet weights and the learning phase switched off; this is what I call the untrained setting. In a second step, I’ll use the same script with the learning phase switched on, so that layers which behave differently during training (batch normalization, for instance) run in training mode. The weights themselves are the same in both cases. The simplest way to switch between the two is the command:

k_set_learning_phase(value)

where value is either 0 (no training) or 1 (training). Now, as I mentioned above, there are 48 layers available, and it’s up to the user to decide which layers to include and how much each contributes to the loss being maximized. However, if you’re doing this on a laptop, beware that you might not have the capacity to use all possible layers. I’ve chosen a few lower layers to affect the geometry and a few from the top to influence the object recognition. There are other ways to get some pretty good results, as I will show below.

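# Weight of each layer in the loss. Check these names against the layer names of
# your model: the stock Keras Inception V3 names its concatenation layers mixed0
# to mixed10, so entries that are not found must be renamed or dropped.
LContrib = list(
  mixed2  = 0.2,
  mixed3  = 0.8,
  mixed4  = 2,
  mixed5  = 4,
  mixed6  = 5,
  mixed11 = 4,
  mixed27 = 9,
  mixed37 = 10,
  mixed47 = 15
)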

Since every layer has a unique name, we create a sort of dictionary of layer names to be used in our code:

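# Assumed (not shown in the post): load Inception V3 with the ImageNet weights
model                  = application_inception_v3(weights = "imagenet", include_top = FALSE)
layerDictionary        = model$layers
names(layerDictionary) = lapply(layerDictionary, function(layer) layer$name)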

and define a loss function. I mentioned above the weighted sum of the L2 norm of the activations of the layers I’ve chosen. Each layer’s contribution is added to the loss once its activation has been retrieved.

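# Start from zero and add each chosen layer's contribution
loss = k_variable(0)

for (layer_name in names(LContrib)) {
  Coefficient = LContrib[[layer_name]]
  activation  = layerDictionary[[layer_name]]$output
  # Number of elements in the activation tensor, used to normalise the sum
  scaling     = k_prod(k_cast(k_shape(activation), "float32"))
  loss        = loss + (Coefficient * k_sum(k_square(activation)) / scaling)
}

# The dream is the model's input image, which gradient ascent will modify
dream = model$input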
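The same arithmetic can be tried without a network. The sketch below uses random arrays as stand-ins for layer activations (the layer names, shapes and weights are made up) and computes the weighted sum of squared activations, each divided by the number of elements, exactly as the loop above does symbolically.

```r
set.seed(1)

# Hypothetical activations of two layers, shaped (batch, height, width, channels)
activations = list(
  layer_a = array(rnorm(1 * 4 * 4 * 3), dim = c(1, 4, 4, 3)),
  layer_b = array(rnorm(1 * 2 * 2 * 5), dim = c(1, 2, 2, 5))
)
weights = list(layer_a = 0.2, layer_b = 0.8)   # made-up contributions

toy_loss = 0
for (name in names(weights)) {
  act      = activations[[name]]
  scaling  = prod(dim(act))                    # number of elements in the layer
  toy_loss = toy_loss + weights[[name]] * sum(act^2) / scaling
}
print(toy_loss)
```

Dividing by the number of elements turns each term into a mean of squares, so a large layer does not dominate the loss simply because it has more units.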

Now, as a good student or analyst, you probably remember your first course in analysis (one-dimensional or multi-dimensional). If you want to minimize or maximize some objective function, what ought you to do? Well, one of the easiest ways is to move in small increments, downhill or uphill, until you reach your minimum or maximum. This process is called gradient descent or gradient ascent, depending on your goal:

Gradient descent: To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient ascent: If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
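As a quick illustration of the ascent variant, here is a one-dimensional sketch in base R. The objective function, starting point and step size are arbitrary choices of mine; the only point is that repeatedly stepping along the positive gradient walks towards the maximum.

```r
f      = function(x) -(x - 3)^2 + 5   # toy objective with its maximum at x = 3
f_grad = function(x) -2 * (x - 3)     # its derivative

x    = 0      # starting point
step = 0.1    # step size
for (i in 1:100) {
  x = x + step * f_grad(x)            # ascent: move along the positive gradient
}
cat("x =", x, " f(x) =", f(x), "\n")
```

Flipping the sign of the update (x - step * f_grad(x)) on the function -f would give gradient descent towards the same point.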

Since our goal is to maximize the loss, I’ll use gradient ascent:

# Gradient of the loss with respect to the dream (the input image)
Gradients                = k_gradients(loss, dream)[[1]]
# Normalise by the mean absolute gradient (1e-7 guards against division by zero)
Gradients                = Gradients / k_maximum(k_mean(k_abs(Gradients)), 1e-7)
outputs                  = list(loss, Gradients)
fetch_loss_and_Gradients = k_function(list(dream), outputs)

# Return the loss and the gradients for a given input image
Evaluate_loss_and_Gradients = function(x) {
  outs            = fetch_loss_and_Gradients(list(x))
  loss_value      = outs[[1]]
  gradient.values = outs[[2]]
  list(loss_value, gradient.values)
}

# Run up to `iterations` ascent steps, stopping early once the loss exceeds max_loss
gradient_ascent = function(x, iterations, step, max_loss = NULL) {
  for (i in 1:iterations) {
    c(loss_value, gradient.values) %<-% Evaluate_loss_and_Gradients(x)
    if (!is.null(max_loss) && loss_value > max_loss)
      break
    cat("...Loss value at", i, ":", loss_value, "\n")
    x = x + (step * gradient.values)
  }
  x
}
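To see what the normalisation and the max_loss check do, the loop can be mimicked in base R with a stand-in loss. This is a sketch only: toy_loss and toy_gradients are hypothetical replacements for the network, chosen so that the gradient is known in closed form.

```r
toy_loss      = function(x) sum(x^2)   # stand-in for the network loss
toy_gradients = function(x) 2 * x      # its gradient

toy_ascent = function(x, iterations, step, max_loss = NULL) {
  for (i in 1:iterations) {
    loss_value = toy_loss(x)
    if (!is.null(max_loss) && loss_value > max_loss) break
    g = toy_gradients(x)
    g = g / max(mean(abs(g)), 1e-7)    # normalise: mean absolute gradient becomes 1
    x = x + step * g
  }
  x
}

x0 = c(0.5, -1, 2)
x1 = toy_ascent(x0, iterations = 20, step = 0.025, max_loss = 10)
cat("loss before:", toy_loss(x0), " loss after:", toy_loss(x1), "\n")
```

Because the gradient is rescaled at every iteration, the step parameter directly controls how far the image moves, regardless of how large the raw gradients happen to be.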

All of the above is just preparation for the actual dreaming. Before I continue, a few things that are essential to the algorithm need to be defined. I have already touched on the gradient step, that is, how far we move along the gradient at each iteration. This is not done just once but repeated. Now, there should obviously be 1) a bound on how many times the step is repeated, so one sets a limit on the number of iterations, and 2) a maximum loss, to avoid completely destroying the original image. Also, to avoid changing the image beyond recognition, we work at several image sizes and reinject detail from the original image at each size, in order to 1) keep as much information from it as possible and 2) avoid blurry dreams. We’ll do this a predetermined number of times; each of these sizes is called a scale or, in some cases, an octave.

So, to recap the whole process: for each octave I’ll perform n iterations of gradient ascent with a predetermined step size. After each octave, the resulting image is upscaled by a given factor (40% with the settings used below).
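The sizes involved are plain arithmetic. As a small sketch, assuming a hypothetical image of 600 by 800 pixels, a scale factor of 1.4 and five octaves, the successive shapes can be computed like this:

```r
original_shape = c(600, 800)   # assumed image height and width
octave_scale   = 1.4
num_octave     = 5

# Divide the original size by octave_scale^i and truncate to whole pixels
shapes = lapply(0:num_octave, function(i) as.integer(original_shape / octave_scale^i))
shapes = rev(shapes)           # smallest first, original size last

for (s in shapes) cat(paste(s, collapse = "x"), "\n")
```

The list starts at 111x148 and ends at the original 600x800, so the dream is built up from a coarse image and refined at every step.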

# Resize an image tensor to the height and width given in `size`
resize_img = function(img, size) {
  image_array_resize(img, size[[1]], size[[2]])
}

# Convert a tensor back to an image and write it to disk
save_img = function(img, fname) {
  img = deprocess_image(img)
  image_array_save(img, fname)
}

# Open-resize-format pictures into appropriate tensors
preprocess_image = function(image_path) {
  image_load(image_path) %>%
    image_to_array() %>%
    array_reshape(dim = c(1, dim(.))) %>%
    inception_v3_preprocess_input()
}

# Conversion of a tensor into a valid image
deprocess_image = function(img) {
  # Drop the batch dimension
  img = array_reshape(img, dim = c(dim(img)[[2]], dim(img)[[3]], 3))
  # Undo the Inception preprocessing: map [-1, 1] back to [0, 255]
  img = img / 2
  img = img + 0.5
  img = img * 255

  # Clip to the valid pixel range (dimensions are saved and restored around the clip)
  dims     = dim(img)
  img      = pmax(0, pmin(img, 255))
  dim(img) = dims
  img
}
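The scaling in deprocess_image is easiest to understand as the inverse of the preprocessing. The sketch below assumes that the Inception preprocessing maps pixel values from [0, 255] to [-1, 1] via x / 127.5 - 1; both helper names are hypothetical and operate on plain vectors rather than image tensors.

```r
# Assumption: Inception-style preprocessing maps [0, 255] to [-1, 1]
to_model_range = function(pixels) pixels / 127.5 - 1

# Same arithmetic as deprocess_image, without the reshape
to_pixel_range = function(x) pmax(0, pmin((x / 2 + 0.5) * 255, 255))

pixels = c(0, 64, 127.5, 255)
scaled = to_model_range(pixels)     # values between -1 and 1
back   = to_pixel_range(scaled)     # round trip back to pixel values
print(rbind(pixels, scaled, back))

# Values pushed outside [-1, 1] by the dreaming are clipped to valid pixels
print(to_pixel_range(c(-3, 3)))
```

The clipping matters because gradient ascent is free to push values beyond the range the network was fed, and those would otherwise produce invalid pixel values.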

# HERE IS WHERE YOU CAN MAKE MAGIC HAPPEN!!!! PLAY AROUND
step         = 0.025   # gradient ascent step size
num_octave   = 5       # number of scales at which to run gradient ascent
octave_scale = 1.4     # size ratio between successive scales
iterations   = 20      # ascent steps per scale

# Gradient ascent is interrupted once the loss exceeds this value
max_loss = 10

base_image_path = "C:PATH/jelly.jpg"
img             = preprocess_image(base_image_path)

# Shapes of the successive scales, from the original size downwards
original_shape    = dim(img)[-1]
successive_shapes = list(original_shape)
for (i in 1:num_octave) {
  shape = as.integer(original_shape / (octave_scale ^ i))
  successive_shapes[[length(successive_shapes) + 1]] = shape
}
# Process the smallest scale first so that each octave upscales the image
successive_shapes = rev(successive_shapes)

original_img        = img
shrunk_original_img = resize_img(img, successive_shapes[[1]])

# The per-scale images go to the folder "dream"; create it beforehand
for (shape in successive_shapes) {
  cat("Processing image shape", shape, "\n")
  img = resize_img(img, shape)
  img = gradient_ascent(img,
                        iterations = iterations,
                        step       = step,
                        max_loss   = max_loss)
  # Reinject the detail that was lost when the original was shrunk
  upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
  same_size_original           = resize_img(original_img, shape)
  lost_detail                  = same_size_original - upscaled_shrunk_original_img
  img                          = img + lost_detail
  shrunk_original_img          = resize_img(original_img, shape)
  save_img(img, fname = sprintf("dream/at_scale_%s.png",
                                paste(shape, collapse = "x")))
}

save_img(img, fname = "C:PATH/JELLYdream40.jpg")
plot(as.raster(deprocess_image(img) / 255))
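The reinjection of lost detail is the least obvious step of the loop, so here is a small sketch of it on a plain matrix. The nn_resize function is a hypothetical nearest-neighbour resize of mine, standing in for image_array_resize, and the 8 by 8 random matrix plays the role of the original image.

```r
# Hypothetical nearest-neighbour resize for a matrix
nn_resize = function(m, h, w) {
  rows = pmin(nrow(m), ceiling(seq_len(h) * nrow(m) / h))
  cols = pmin(ncol(m), ceiling(seq_len(w) * ncol(m) / w))
  m[rows, cols, drop = FALSE]
}

set.seed(42)
original = matrix(runif(64), nrow = 8)   # "full resolution" image
shrunk   = nn_resize(original, 4, 4)     # what the previous octave worked on
upscaled = nn_resize(shrunk, 8, 8)       # blocky after upscaling

lost_detail = original - upscaled        # what the shrink/upscale round trip destroyed
restored    = upscaled + lost_detail     # adding it back recovers the original
print(max(abs(restored - original)))
```

In the real loop the dreamed image takes the place of upscaled, so the fine structure of the original photo is added back on top of the patterns produced by gradient ascent.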

 

Result

As I mentioned at the beginning of the code section, there are two settings available: the untrained or the trained model. I will start with the untrained one (value in k_set_learning_phase(value) set to 0). As I also mentioned above, the results you get depend on the settings you choose for the number of scales, the number of iterations and the gradient step (among others). You’ll have to try for yourselves and play around with the different parameters. I’ll simply give two examples here: one of my favorite cities in the world, Shanghai, and one of jellyfish.

One drawback of using the ImageNet-trained model is that it has been trained on many different objects, which in turn makes it hard to get really nice pictures. Still, the results are pretty impressive.

This technique can be applied to things other than images. My plan for a future post is to teach a network to recognize the works of a given author and create new material, hopefully so well trained that it could be mistaken for works of that same author.

Until then, play around… maybe even use models that you have trained on one particular object and create your own works of art!