# Baby twins deep learning classification with Inception-ResNetV1

A photo of my two girls with annotations. It will be used for building the face recognition dataset. In the blog post, their faces have been blurred for anonymization.

Deep learning techniques have proven to be the most efficient AI tools for computer vision. In this blog post we use a deep convolutional neural network to build a classifier for pictures of my baby twins.

When it comes to practical machine learning experiments, the first thing anybody needs is data. When experimenting as a hobby, we often rely on open and popular datasets such as MNIST or the IMDB reviews. However, it is a useful way to improve to confront challenges on fresh, unworked data.

Since July 2019 (9 months at the time of writing), I have been the happy father of two lovely twin baby girls: L and J. If there is one kind of private data I have at scale, it is definitely photos of my kids. Indeed, all our families have been taking pictures of them and, thanks to Whatsapp and other communication means, I have been able to collect a great part of them.

Deep learning and convolutional neural networks (convnets) are now the state-of-the-art methods for computer vision. Here are convnet visualizations on a photo of L.

In this post, we will use a state-of-the-art deep learning architecture, InceptionResNetV1, to build a classifier for photo portraits of my girls. We also benefit from pretrained FaceNet weights. Before that, we will make a detour by tricking the problem a little: this will allow us to check our code snippets and review some nice visualization techniques. Then, the InceptionResNetV1-based model will allow us to achieve some interesting accuracy results. We will experiment using Keras backed by Tensorflow. We conducted the computing-intensive tasks on a GPU machine hosted on Microsoft Azure.

The source code is available here: on Github. The dataset is obviously composed of personal pictures of my family that I do not want to leave openly accessible. In this blog post, the faces of my babies have been intentionally blurred to preserve their privacy. Of course, the algorithms whose results are presented here were run on non-obfuscated data.

In this post and in the associated source code we reuse some of the models and snippets from the (excellent) book Deep Learning with Python by François Chollet. We also leverage the InceptionResNetv1 implementation from Sefik Ilkin Serengil.

Let us start this post by explaining the problem and why it is not as easy as it may seem.

## A more complex topic than it seems

While my baby girls L and J are not identical twins, they are definitely sisters: they look alike a lot! In addition, the photos were taken from their first days in this world up to their 8 months. It is no secret that babies change a lot during their first months. They were born weighing around 2.5 kg; now they are close to 9 kg. Even family members who saw them in their first days have a hard time telling them apart now.

In addition, following their grandmother’s home-country tradition, L and J were shaved at the age of 4 months. Their haircut is thus not a way to distinguish them: we have photos from before the shaving, during hair regrowth and now with medium-length to long hair. Also, many photos were taken with a woolen hat or some other headwear. Consequently, we will be pushing the limits of face recognition a little.

Our objective in this series of experiments is to build a photo classifier. Precisely, given a photo we would like to make a 2-class prediction: “Is this a photo of L or of J?”. We therefore assume that each picture contains one and only one portrait of one of my two baby girls.

I have collected 1000 raw photos that will be used to create the dataset.

## Building the dataset

### Photo tagging

In the raw dataset, some photos contain only L, others only J, some both and some neither. We exploit these photos to extract face pictures for each girl. To do so, we first need to locate the faces precisely in the photos.

For efficient annotation, I used an open-source tagging tool supported by Microsoft: VoTT. The tagging is pretty straightforward, and you can quickly annotate the photos with a very intuitive interface. It took me between one and two hours to tag the full dataset.

Efficient photo annotation with the VoTT software. Note also the FC Nantes outfits…

One of the worries of twin parents is the fear of favoring one child over the other. Well, I am not concerned, and the data speaks for me. Here are the results: after tagging, we have a very well balanced distribution, with a little more than 600 tags for each of the girls.

The tag distribution for L and J.

Now we build the picture dataset, where each picture contains the portrait of one of the kids. VoTT exports the tag locations within each picture as JSON. Therefore, it is easy to crop all files to produce a dataset where each image contains only one kid’s face; see this code snippet.
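
The crop step itself is only a few lines. As a rough sketch (the helper name `square_crop_box` and the exact geometry are my own illustration, not the snippet from the repository), the tagged bounding box can be expanded into a square and clamped to the image borders:

```python
def square_crop_box(left, top, width, height, img_w, img_h):
    """Expand a tagged bounding box into a square crop box clamped to the image."""
    size = min(max(width, height), img_w, img_h)  # square side, never larger than the image
    cx, cy = left + width / 2.0, top + height / 2.0  # center of the tagged face
    x0 = min(max(cx - size / 2.0, 0), img_w - size)  # clamp horizontally
    y0 = min(max(cy - size / 2.0, 0), img_h - size)  # clamp vertically
    return (int(x0), int(y0), int(x0 + size), int(y0 + size))

# With Pillow, the actual crop would then look like:
#   Image.open(photo_path).crop(square_crop_box(...)).save(face_path)
```

Using a square box directly matches the square cropped images produced in the extraction step.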

The extraction process from tagged photos to square cropped images.

### Splitting the dataset: train, validation, test

As always with a machine learning training procedure, one must split the original dataset into: 1) training data that will be used to fit the model, and 2) validation data that will be used to measure the performance of the tuned algorithms. Here we go further by also keeping a third, untouched test dataset.

It is always easier to work with an equally balanced dataset. Luckily, this is almost the case with the original data. Consequently, after processing (mainly shuffling and splitting) we obtain the following repartition in our file system:

├── train
│   ├── J => 396 JPG files
│   └── L => 396 JPG files
├── validation
│   ├── J => 150 JPG files
│   └── L => 150 JPG files
└── test
    ├── J => 100 JPG files
    └── L => 100 JPG files


The code snippets for splitting and shuffling the dataset are available here.
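
The splitting logic itself is simple. Here is a minimal sketch of the shuffle-and-split step (the function name is mine; the per-split counts are the ones from the listing above):

```python
import random

def shuffle_and_split(filenames, n_train, n_val, n_test, seed=42):
    """Shuffle a class's file list, then slice it into train/validation/test."""
    files = list(filenames)
    random.Random(seed).shuffle(files)  # fixed seed => reproducible split
    assert n_train + n_val + n_test <= len(files)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# For each class ('J', 'L'), the resulting lists are then copied into
# train/<class>, validation/<class> and test/<class> directories.
```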

## Machine setup – GPU Powered VM

The experiments are conducted with Python 3.6 and Tensorflow GPU 1.15. We use the high-level deep-learning library Keras 2.2.4.

We have set up an Ubuntu 18.04 NC6 GPU VM on Microsoft Azure. The NC-series VMs use Nvidia Tesla K80 graphics cards and Intel Xeon E5-2690 CPUs. With the NC6 size we benefit from 1 GPU and 6 vCPUs; this setup made the following computations perfectly acceptable: all experiments lasted less than a few minutes.

This is the second time I have set up an Ubuntu machine with Cuda/cuDNN and Tensorflow and, as before, it was a real pain. The official documentation from Tensorflow is totally incorrect and guides you in the wrong direction. Finally, I managed to get a working setup with Tensorflow-gpu 1.15.0, Keras 2.2.4 and Cuda 10.0, thanks to this StackOverflow post.

For efficient development, I use VSCode with the new SSH Remote extension, which makes remote development completely seamless. The experiments are also conducted in IPython Jupyter notebooks. Once again, VSCode provides out-of-the-box SSH tunneling to simplify everything.

Tensorflow outputs confirm that its primary computing device is our Tesla K80 GPU.

The nvidia-smi command shows the load put on the GPU by the Python process.

## First, a simplified problem with tricked data to get started

The experiments provided in this section can be found in this notebook.

Here we will make a small detour by tricking the challenge into a simpler one: we will add easily detectable geometrical shapes to the images.

### Drawing obvious shapes on image classes

When tackling a data science project, I always think it is great to start really simple. Here, I wanted to make sure that my code snippets were OK, so I decided to trick the problem (temporarily). Indeed, I drew geometrical shapes on the images of each class. Precisely, on every J photo a rectangle is drawn, and on every L photo an ellipse is inserted. The size, the aspect ratio and the fill color are left random. The two following examples show what this looks like:

An ellipse is drawn on all L photos, for train, validation and test sets.

A rectangle with random filling color on all J images.

Of course, this completely works around and tricks the face recognition problem. Anyway, it is a good starting point to test our setup and experimentation code snippets.
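
As an illustration of the trick (the helper below is my own sketch, not the notebook’s exact code), each image receives a randomly sized, randomly colored shape: an ellipse bounding box for L, a rectangle for J:

```python
import random

def random_shape(img_w, img_h, rng):
    """Pick a random bounding box (random size and aspect ratio) and fill color."""
    w = rng.randint(img_w // 8, img_w // 3)
    h = rng.randint(img_h // 8, img_h // 3)
    x = rng.randint(0, img_w - w)
    y = rng.randint(0, img_h - h)
    color = (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255))
    return (x, y, x + w, y + h), color

# With Pillow, drawing on a photo would look like:
#   draw = ImageDraw.Draw(img)
#   draw.ellipse(bbox, fill=color)    # for an L photo
#   draw.rectangle(bbox, fill=color)  # for a J photo
```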

### A simple Convnet trained from scratch

For this simplified task we use the basic convnet architecture introduced by François Chollet in his book (see the beginning of the post). Basically, it consists of four 2D-convolutional layers, each followed by a MaxPooling layer. This constitutes the convolutional base of the model. Then the tensors are flattened, and a series of Dense and Dropout layers is added to perform the final classification part.

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 198, 198, 32) 896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 99, 99, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 97, 97, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 48, 48, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 46, 46, 128) 73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 23, 23, 128) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 21, 21, 128) 147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 10, 10, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 12800) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12800) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 6554112
_________________________________________________________________
dense_2 (Dense) (None, 1) 513
=================================================================
Total params: 6,795,457
Trainable params: 6,795,457
Non-trainable params: 0
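
As a sanity check, the parameter counts of the summary above can be reproduced with the standard formulas for Conv2D and Dense layers (a small sketch; the 3x3 kernels and 200x200 RGB input are inferred from the output shapes):

```python
def conv2d_params(kernel, channels_in, filters):
    """Weights (kernel*kernel*channels_in per filter) plus one bias per filter."""
    return (kernel * kernel * channels_in + 1) * filters

def dense_params(units_in, units_out):
    """Weights plus one bias per output unit."""
    return (units_in + 1) * units_out

counts = [
    conv2d_params(3, 3, 32),           # conv2d_1:   896
    conv2d_params(3, 32, 64),          # conv2d_2:   18496
    conv2d_params(3, 64, 128),         # conv2d_3:   73856
    conv2d_params(3, 128, 128),        # conv2d_4:   147584
    dense_params(10 * 10 * 128, 512),  # dense_1 after flatten: 6554112
    dense_params(512, 1),              # dense_2:    513
]
total = sum(counts)  # 6795457, matching "Total params" above
```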


### 97%+ accuracy

Actually, it is no surprise that we are able to achieve great classification performance. The algorithm performs without difficulty. To help it, we used standard data-augmentation techniques. Note that for this task we skipped the rotations.

## Define TRAINING SET
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=0,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True, fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
                                                    target_size=(TARGET_SIZE, TARGET_SIZE),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='binary')

## Define VALIDATION SET
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_directory(VAL_DIR,
                                                              target_size=(TARGET_SIZE, TARGET_SIZE),
                                                              batch_size=BATCH_SIZE,
                                                              class_mode='binary')


After running the training on 30 epochs we observe the following learning curves.

model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-3), metrics=['acc'])
history = model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=30, validation_data=validation_generator, validation_steps=50)


Training and validation accuracy on the tricked data. Unsurprisingly, we achieve strong accuracy.

Now that the results seem satisfactory, without signs of overfitting (the validation accuracy grows and then stalls), it is time to measure performance on the test set composed of the 200 pictures left aside.

Accuracy is a key indicator, but even for a 2-class classification problem it is a common error to ignore more subtle information such as the confusion matrix. Using the following snippet and the scikit-learn library, we are able to collect a full classification report along with a confusion matrix. Again, here all signals are green, and we see the results of a very efficient classifier. Yet, let us not forget that the game has been tricked!

from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

TEST_SIZE = 200
target_names = ['J', 'L']

# Y_pred holds the sigmoid outputs of the model on the (non-shuffled) test generator
Y_pred = model.predict_generator(test_generator, steps=TEST_SIZE // BATCH_SIZE)
Y_pred = Y_pred.flatten()
y_pred_class = np.where(Y_pred > 0.5, 1, 0)

print('Classification Report')
print(classification_report(test_generator.classes, y_pred_class, target_names=target_names))

print('Confusion Matrix')
plot_confusion_matrix(confusion_matrix(test_generator.classes, y_pred_class), target_names)


The confusion matrix with tricked data. We see excellent accuracy, precision and recall.

### Going beyond: observing what the Conv2D layers learn with an activation model

Again, we use a technique well explained in François Chollet’s book.

The idea is to build a multi-output model based on the successive outputs of the base convolutional layers. Thanks to the successive ReLU activations, we can plot the activation maps from the outputs of these layers. The following visuals illustrate well that our convnet base model has successfully learned the geometrical shapes: the curved stroke of the ellipses and the square edges of the rectangles.

layer_outputs = [layer.output for layer in model.layers[:depth]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
# one activation map per selected layer
activations = activation_model.predict(input_img_tensor)


One of the extracted features from a bottom layer, for the L picture with an ellipse. The activated regions appear in green. The ellipse is strongly activated, but so is the pacifier.

In the bottom layers of our neural network model, the ellipses are obviously the most activated regions of the input picture.

In the upper layers, it is visible that the pattern captured for the classification is the curve of the ellipse’s stroke path. Similarly, the square corners of the rectangles are captured.

## Back to the real face recognition problem

Now it is time to get back to our original problem: classifying my baby girls without relying on any trick, just with face recognition on the original images. The simple convnet from the previous section will not be sufficient to build a classifier with significant accuracy. We will need bigger artillery.

The experiments provided in this section can be found in this notebook.

### Using the InceptionResNetV1 as a base model

InceptionResNetV1 is a deep learning model that has proven to be one of the state-of-the-art very deep architectures for convolutional networks. It also uses the concept of residual neural networks.

We use the implementation originally provided by Sefik Ilkin Serengil, who was himself reusing parts of the implementation by David Sandberg.

For our classification problem, we use InceptionResNetV1 as the (very deep) base network. On top of its output embedding we add a series of Dense and Dropout layers to serve as the classifier.

Layer (type) Output Shape Param #
=================================================================
inception_resnet_v1 (Model) (None, 128) 22808144
_________________________________________________________________
dense_1 (Dense) (None, 256) 33024
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_2 (Dense) (None, 64) 16448
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 16) 1040
_________________________________________________________________
dense_4 (Dense) (None, 1) 17
=================================================================
Total params: 22,858,673
Trainable params: 22,829,841
Non-trainable params: 28,832


### Achieving nearly 0.80 accuracy

We first conducted some experiments trying to train InceptionResNetV1 without any pretrained weights, i.e. without transfer learning. Without any surprise, the model could not reach significant validation accuracy (i.e. significantly above 0.6). Our dataset, even with data augmentation, is too small to let a deep architecture “learn” the key elements that constitute the characteristics of a human face.

Therefore, we reuse FaceNet pretrained weights, provided by David Sandberg and by Sefik Ilkin Serengil in his blog post.

from keras import layers, models
from inception_resnet_v1 import *

def model_with_inception_resnet_base(pretrained_weights):
    base_model = InceptionResNetV1()

    if pretrained_weights:
        # load the FaceNet weights (the filename depends on your download)
        base_model.load_weights('facenet_weights.h5')

    # top layers matching the summary above
    new_model = models.Sequential()
    new_model.add(base_model)
    new_model.add(layers.Dense(256, activation='relu'))
    new_model.add(layers.Dropout(0.5))
    new_model.add(layers.Dense(64, activation='relu'))
    new_model.add(layers.Dropout(0.5))
    new_model.add(layers.Dense(16, activation='relu'))
    new_model.add(layers.Dense(1, activation='sigmoid'))

    return new_model


Thanks to our GPU, we were able to retrain the full model composed of the InceptionResNetV1 base and our top classification layers. We did not even have to resort to fine-tuning techniques, where the first layers’ weights are frozen.

After a dozen minutes of training, I was happy to see the following training and validation accuracy curves.

The training and validation accuracy over the epochs. The validation accuracy reaches 0.80.

This shows all the positive signs of a successfully trained ML algorithm. Therefore, let us examine the performance on the test dataset, i.e. the one that has not been fed to the algorithm before.

The final classification report and confusion matrix. We achieve nearly 0.80 accuracy.

The classification report and confusion matrix on the test dataset confirm the measure on the validation set. We achieve nearly 80% accuracy. One interesting thing: from the report, J. looks a little more difficult for our model to classify than L. Honestly, I have no hypothesis on what could cause this. A deeper analysis examining the layers, in the spirit of what has been presented above, could be conducted.

## Conclusion

I did not spend much time trying to tune the InceptionResNetV1 hyperparameters. I also tweaked the top layers, but only a little. No doubt there is room for great improvement here. This could constitute a follow-up blog post.

Also, I did not compare against other algorithms and deep-learning architectures. I quickly gave a try to the Deepface Keras implementation, but without significant results. I did not spend time investigating why it was not working. Once again, this could be an interesting follow-up. Ideally, I would also benchmark this DLib implementation.

By conducting these experiments, I confirmed that it is nearly impossible to perform “real” face recognition without some kind of pretrained model when you only have such a small amount of face data at hand.

Finally, I learnt a lot. It is always a good thing to try things on your own data. This is how you learn to tackle real-life data science problems.

Our classifier works: it is now able to recognize my two baby girls with 80% accuracy.

# An unexpected quadratic cost in Excel Interop

My current task at work is the development of an Excel add-in. This add-in is developed in C# .NET and is based on the Excel Interop assemblies; we also use Excel-DNA for the packaging and the user-defined functions. While developing a new feature, I stumbled upon a technical oddity of Excel Interop, which I will describe in this post.

Let me start this post by reminding you that a range in Excel is not necessarily a block of contiguous cells. Try it yourself by starting Excel right now: select a range, keep the Ctrl key of your keyboard pressed and select another block of contiguous cells. You now have several cells selected, which you can name (as shown in the screenshot). You have just created a named range with non-contiguous cells.

Having said that, let us assume that for our add-in we need to reference programmatically the range of all rows of the form Ax:Cx (with the usual Excel notation), where x ranges over a set of row indices.

Then we need the method Application.Union of Microsoft.Office.Interop.Excel, and we finally produce a few lines of code that look like this.

In the chart above we have monitored the time execution of the BuildUnionAll method for different values of the parameter count.

Remark: in the case of BuildUnionAll there is no need for a loop and the Union method; we could have just asked for the range “A1:Ccount”. Note also that for small unions you may use a syntax with semicolons to separate the contiguous cell blocks, e.g. worksheet.Range[“A1:C3;A8:C12”]. However, this does not work for large ranges made of many non-adjacent cell blocks.

So far so good: we see an almost linear curve, which is natural given what we were expecting.

Now, let us change our expectation a little, to something quite different and more realistic where we would truly need the Application.Union method. Let us say that we would like the union of the Ax:Cx ranges mentioned above, but for odd indices x only. We want the IEnumerable&lt;int&gt; to have the same size as before, so let us use the method in the code below.

Similarly, we monitor the performance curve.

Argghhh!!! We get an awful quadratic cost, which makes our approach completely broken for large unions. This quadratic cost was unexpected. Actually, I did not find documentation on this (the MSDN doc on Excel.Interop is really minimalist). However, it seems that the rule is that the Union method has a cost linear in the number of Areas of the input ranges. The Areas member of a Range is the collection containing the blocks of contiguous cells of the range.

This experimental rule leads us to review our algorithm. If the cost of a Union is important (linear) when there are many areas in a range, then we will try to minimize this situation: we will let our algorithm perform unions preferably on ranges having few areas. Once again, basic techniques from school bring a good solution: we design a very simple recursive divide-and-conquer algorithm, inspired by the merge sort algorithm.
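
The divide-and-conquer idea is language-agnostic. Here is a Python sketch of it (the add-in itself is C#; the generic `union` callable stands in for Application.Union): by pairing ranges merge-sort style, each call to `union` receives arguments with as few areas as possible.

```python
def union_all(ranges, union):
    """Union a list of ranges pairwise, merge-sort style, so that each call to
    `union` receives arguments built from roughly equal halves of the input."""
    if len(ranges) == 1:
        return ranges[0]
    mid = len(ranges) // 2
    return union(union_all(ranges[:mid], union),
                 union_all(ranges[mid:], union))
```

With sets as a stand-in for ranges, `union_all([{1}, {2}, {3}, {4}], set.union)` builds the union of all four singletons in a balanced tree of calls rather than a left-leaning chain.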

In the end, we recover an almost linear curve. The asymptotic complexity here (if the rows to pick are not adjacent to each other) equals that of merge sort, which is O(n·ln(n)).

# F#-ception, wtf ?!?

Hi, it’s the middle of summer, so let us relax with a “coding for fun” post. The subject presented here came from a kind of “WTF contest” that we had with a colleague of mine on our free time. The objective was to write a unit test covering an existing code base using the weirdest approach. My contestant Gabriel is a brilliant developer, so in order to compete with him I had to bring something really absurd rather than technically sophisticated. So I came up with the idea of executing F# code inlined in a PowerShell script invoked from F# code. Obviously, this is absolutely useless. However, it fitted the theme of the contest well. In this post we will skip the “existing code base” part and focus on how to achieve this with a very simple example.

The first part of the job is to write some F# code that builds the PowerShell script with inlined F#. We will have to reference the F# CodeDom provider, which can be downloaded here. Note that, for the sake of simplicity of this post, we did not add any extra assemblies (see referencedAssemblies in the script below).

Then let us try the FsToPs function on a very simple case.

Thus, the PowerShell code in simpleExample.ps1 looks like this:

You should be able to run this script in a PowerShell session. However, we are not done: we have to invoke this script from F# and retrieve the returned value. The following snippet does the job for us.

Then executed within MsTest framework….

…it works!

I’d like to thank Gabriel for his fair play in helping me debug the first version of F#-ception.

# Rewriting a tree algorithm with purely functional data structures: the “zippers”.

This is my first post; it follows a question that I posted a few weeks ago here. A really interesting link provided by a Stack Overflow user made me review my question and led me to write this post. The objective here is to describe the modification of an existing algorithm implementation, from a first try that uses mutability of the underlying data structure intensively, to a purely functional immutable implementation. We will show that this can be achieved quite easily when using the appropriate data structure, which in our case is the so-called “zipper”.

I discovered the F# programming language recently. While implementing some things to discover the language, I was confronted with the following problem: given a sequence of branches coming from a tree, accessible one by one in a random order, how do we (efficiently) reconstruct a labeled tree? For example, imagine retrieving your disk paths in a random order, e.g. “C:\Foo”, “C:\”, “C:\Foobar”, “C:\Foo1\bar2”, etc., and wanting to rebuild your filesystem.

The branches are represented by linked lists of string labels. We also assume that the sequence does not contain the same branch twice and that it is valid, in the sense that a labeled n-ary tree can be created from it. Many trees could be built, but we will accept any of them; in other words, we do not care about the order of the children of a given node. The n-ary tree data structure is the usual one, where the children of a node are represented by a linked list.

I have no doubt that better solutions may exist, but let us focus on the first implementation that came to my mind as a seasoned imperative and inexperienced functional programmer. The algorithm is recursive and works as follows.

The branch is inserted in the tree using the following recursive procedure:

- If the current tree is empty, it is replaced by the branch (as a tree). Apart from this corner case, the loop invariant states that “the label of the current tree node is equal to the head of the current branch”.
- Having said that, we explore one step below:
  - If the tail of the current branch is the empty list, there is nothing to do. Indeed, by the loop invariant, the current branch is actually a sub-branch of the existing tree.
  - Else, we search the children of the given node and try to find a match with the next value of the current branch.
    - If such a match exists, we are not done yet: we have to go deeper and apply our procedure to the subtree of the matched child, with the tail of the current branch.
    - If there is no match, we have reached an end, so we append the current branch (transformed into a tree) to the children list.
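
The procedure above can be sketched with a mutable children list (a Python stand-in of my own for the F# implementation; the names are illustrative, and the empty-tree corner case is left to the caller):

```python
class Node:
    """A labeled n-ary tree node with a mutable children list."""
    def __init__(self, label):
        self.label = label
        self.children = []

def branch_to_tree(labels):
    """Turn a branch like ["C:", "Foo", "bar"] into a single-path tree."""
    root = node = Node(labels[0])
    for label in labels[1:]:
        child = Node(label)
        node.children.append(child)
        node = child
    return root

def merge_into(node, branch):
    """Insert `branch` into the tree rooted at `node`.
    Loop invariant: node.label == branch[0]."""
    tail = branch[1:]
    if not tail:
        return  # the branch is a sub-branch of the existing tree
    for child in node.children:
        if child.label == tail[0]:
            merge_into(child, tail)  # match found: go deeper
            return
    node.children.append(branch_to_tree(tail))  # no match: append as a new subtree
```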

Implementing this in F# leads to the following code (with some help from Leaf Garland). The function that builds the tree is mergeInto, which performs side effects on its first parameter.

Now let us apply this to a simple example.

However, this implementation is quite frustrating, because we have hacked our functional data structure with F# ref cells to enable mutability just for this particular construction. We needed such mutability to perform our “append” operation on the tree: focused on a subtree, we could not return a new tree with the branch attached correctly. Because of the recursion, we did not know where we were exactly.

This is where the zippers come into play. Zippers were introduced by Gérard Huet in this paper. The author says that they have probably been “invented at numerous occasions by creative programmers” but never published. I recommend having a look at the first four pages of this paper; it is very accessible for an academic publication and may help you with what follows.

A zipper is a practical implementation of a given data structure; this is why we can say that what follows is a labeled n-ary tree implemented using the zipper pattern. To sum up, the zipper lets you focus on a subtree while the structure keeps the path that led you there, allowing you to return a new immutable instance of the complete tree.

Translating the zipper implementation of Huet’s paper from OCaml to F# is straightforward. In the implementation below I reused a good idea from this post, where the author changed the order of the function arguments to enable F# pipe-forward. The go functions are used for tree traversal, while insert returns a new data structure with the input tree inserted. Most zipper implementations available on the web are designed for leaf-labeled trees, so for our case we need to extend the types to support labeled trees. In the code below, we have removed go_left and insert_left because they were of no use for what follows.

The power of the zippers comes from the Path and Location discriminated unions: even if the focus is on a subtree, we know where we are because we keep track of what is on our left, above us and on our right. If you want to visualize some zippers, I suggest reading this post.
You can see that the difference from Huet’s implementation on leaf-labeled trees is that the Path type contains the label of the current position (which can be retrieved with the getLabel method). I had difficulty implementing the go_up method without it (to build the parent node), though I am not completely satisfied with this…

Now it is time to review our algorithm, and we will see that it becomes more elegant and less imperative:

- The main difference is that we do not have to look for a “matched child” while keeping the focus on the current node. We can go down directly and move right.
- Precisely, the first corner case with the Empty tree stays similar. The loop invariant states that “the parent label of the current node equals the head of the current branch”.
- Then, if the label of the node matches the head of the branch, we go deeper. If there is no right sibling, this is our destination and we append the branch here. Finally, if there is a younger sibling (a node on the right), we reapply the procedure with the focus on it.

In this new implementation, we can append the branch at the right location because we have an insert_right operation. In the previous implementation we could not do this (without modifying the tree structure once again), because we would have gone too deep to append the tree to the list kept by the parent of our node.

Remark that we have not spoken the word “zipper” in this description: the zippers are only wrappers that provide efficient and useful operations on our immutable data structure.
So the core algorithm now looks like this:
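
As a loose illustration only (this is my own Python transliteration, not the post’s F# code: immutable tuples play the role of the F# types, the names roughly follow Huet’s, and all branches are assumed to share the same root label), the zipper and the new algorithm can be sketched like this:

```python
from collections import namedtuple

# Immutable labeled n-ary tree and a Huet-style zipper over it.
Tree = namedtuple("Tree", "label children")            # children: tuple of Trees
Path = namedtuple("Path", "left plabel parent right")  # left siblings reversed; parent: Path or None
Loc = namedtuple("Loc", "tree path")                   # a focused subtree plus how we got there

def go_down(loc):
    t = loc.tree
    if not t.children:
        return None
    return Loc(t.children[0], Path((), t.label, loc.path, t.children[1:]))

def go_right(loc):
    p = loc.path
    if p is None or not p.right:
        return None
    return Loc(p.right[0], Path((loc.tree,) + p.left, p.plabel, p.parent, p.right[1:]))

def go_up(loc):
    p = loc.path
    if p is None:
        return None
    children = tuple(reversed(p.left)) + (loc.tree,) + p.right
    return Loc(Tree(p.plabel, children), p.parent)

def insert_right(loc, t):
    p = loc.path
    return Loc(t, Path((loc.tree,) + p.left, p.plabel, p.parent, p.right))

def insert_down(loc, t):
    # Insert t as the first child of the focused node and move the focus onto it.
    return Loc(t, Path((), loc.tree.label, loc.path, loc.tree.children))

def root(loc):
    # Zip all the way back up and return the complete (new) tree.
    while loc.path is not None:
        loc = go_up(loc)
    return loc.tree

def branch_to_tree(labels):
    t = Tree(labels[-1], ())
    for label in reversed(labels[:-1]):
        t = Tree(label, (t,))
    return t

def append_to_tree(tree, branch):
    """Merge `branch` into `tree`; branches are assumed to share the root label."""
    return _merge(Loc(tree, None), branch)

def _merge(loc, branch):
    if loc.tree.label == branch[0]:
        if len(branch) == 1:
            return root(loc)                  # sub-branch of the existing tree: nothing to do
        child = go_down(loc)
        if child is None:                     # no children: graft the tail here
            return root(insert_down(loc, branch_to_tree(branch[1:])))
        return _merge(child, branch[1:])      # go deeper
    sibling = go_right(loc)
    if sibling is None:                       # no right sibling: our destination
        return root(insert_right(loc, branch_to_tree(branch)))
    return _merge(sibling, branch)            # refocus on the younger sibling
```

Every insertion returns a brand-new root via the `root` function, without any mutation of the previous tree.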

We introduce the getZipper function, which creates a zipper around a tree, and appendToTree, which is the method that we will effectively call (abstracting the zipper away). Here is how it works with the same branch sequence.

So now we have a competitor to our mergeInto method, one which uses only purely functional and immutable data structures and which does not require any modification of the exposed tree data structure. Remark that, given the same input sequence of branches, the two algorithms do not build the same tree. However, I would not say that the zipper is without drawbacks for our case. The zipper structure is quite complex, which is non-negligible. We have also added an extra complexity cost, not in the appendToTreeZipper function but in appendToTree: it comes from the root function. After a quick examination of the go_up method, an upper bound for the root function’s complexity can be found: it is the height of the tree multiplied by the number of individuals in the largest sibship.