Quantum programming: getting started with Q# and Quantum Katas

Professor Richard Feynman, Nobel Prize laureate in physics and one of the forefathers of quantum computing.

Quantum computing has received a lot of attention in recent years, and in fall 2019 Google claimed to have achieved so-called quantum supremacy. Quantum computing has carried great promises since its early days, when Richard Feynman and others imagined that leveraging the quantum properties of subatomic particles could lead to devices with computing power incommensurable with anything a classical computer could ever achieve.

I am no quantum researcher or expert, and I cannot reasonably predict who is going to win this “race” for quantum supremacy, or even when. However, these recent claims make the race even more exciting to watch. Even if a usable quantum computing chip does not seem to be something we will have at hand in the short term, there is still a lot that can be done now, with simulations for instance.

Among the big players, Microsoft made an interesting move. Indeed, they started building an “open quantum programming community” by releasing their Q# programming language at the end of 2017.

Microsoft describes Q# as:

Q# (Q-sharp) is a domain-specific programming language used for expressing quantum algorithms. It is to be used for writing subroutines that execute on an adjunct quantum processor, under the control of a classical host program and computer. Until quantum processors are widely available, Q# subroutines execute on a simulator.

A sphere is often used to represent the concept of a qubit: the basic unit of quantum information. Image source: https://medium.com/@kareldumon/the-computational-power-of-quantum-computers-an-intuitive-guide-9f788d1492b6

Actually, Microsoft released a full Quantum Development Kit on top of the Q# compiler. It is completely free and open source. True to Microsoft's habit of doing things properly, it was really well polished even in its early days: the documentation was neat and complete, and some samples were even shipped to help you start. They did a pretty good job making all this super attractive; even the name Q# sounds cool, doesn't it?

I started playing around with Q# in spring 2018. I managed to install and run the provided samples (on my Mac) without any trouble. However, after all this I felt a little stuck. So what now? Crafting quantum algorithms on my own seemed to be a distant dream, so I stopped there for a couple of months. It seems that the Microsoft QuArC team noticed this lack of training programs and guidance. To overcome it, they launched the Quantum Katas at the end of 2018: a series of exercises and tutorials so that “newbies” like me could ramp up at their own pace. It is really well done: very didactic and complete. I definitely recommend it as the #1 source for anyone who would like to get started with quantum computing and programming.

In this blog post, I share some feedback, tips and resources to help you get started. As discussed here, to start quantum computing for real you need higher (post-secondary) education in mathematics. However, you can definitely get the substance of what quantum computing is with no maths at all. In all cases, I recommend this 3-minute video for a quick overview of what quantum computing is.



A very well done 3-minute intro to quantum computing by IBM. If you do not want to jump into what follows and just want an overview of what quantum computing is, that's a good video to watch.

An overview of Quantum Katas

The name kata comes from martial arts and karate: a discipline originally aimed at practicing the execution of a movement until perfection. Image source: Wikipedia.

The Quantum Katas are a series of tutorials and exercises provided by Microsoft. They come in the form of small challenges that you need to solve using the Q# programming language. They guide you progressively from the very beginning to advanced and hard problems in quantum computing.

Most of the katas are presented as a Q# operation whose implementation is missing, and your job is to complete it. Similarly to a unit test, the quantum simulator validates your submission on a series of test inputs (which are left unknown to the programmer). Usually, completing a kata does not take more than 20 lines of Q#.

When executed within a Jupyter notebook, you benefit from well-formatted explanations with beautiful maths equations right next to your Q# code inputs.

A Q# kata example. It takes the form of an exercise: fill the operation implementation with code and see if it passes a series of tests. If you do not understand anything yet, do not worry, the tutorials will lead you there.

What are the real prerequisites to get started with Q# and Quantum Katas?

On the Katas' webpage, Microsoft provides the prerequisites to get started. I thought it would be interesting to give my point of view here.

Mathematics

Linear algebra is almost everywhere in quantum computing. Precisely, linear algebra over the field \(\mathbb{C}\) of complex numbers. As you will learn, an \(N\)-qubit system is described by a \(2^{N}\)-dimensional vector space over \(\mathbb{C}\).

I would say that to really get started with quantum computing, one needs a good knowledge of complex numbers and linear algebra, corresponding more or less to a BSc in Sciences. In the French/European university system this corresponds to Licence L2/L3 or Maths Spé (for preparatory classes). I was a maths teaching assistant at University Paris VI for three years, where I taught linear algebra; the materials and exercises are still available here (in French).

The Quantum Katas provide two tutorials about complex numbers and linear algebra. Yet, in my humble opinion, these are a good refresher on concepts and notations, but I doubt one can reasonably learn the material from scratch there and be ready for the rest of the Quantum Katas.

Programming

At first glance, the Q# syntax looks like C# syntax. Moreover, Q# is functional by design, and one can definitely see inspiration from F#, C#'s functional cousin, also running on top of the .NET CLI. For example, variables are immutable by default. But do not worry too much: your interaction with .NET will be limited to the sole installation of .NET Core on your machine (maybe not at all if you go with a Docker install).

This is what a piece of Q# looks like. Similarly to its great cousins F# and C#, it lets you switch easily between functional and imperative styles.

// Prepares an equal superposition of the two basis states described by bits1 and bits2.
operation TwoBitstringSuperposition (qs : Qubit[], bits1 : Bool[], bits2 : Bool[]) : Unit {
    // Index of the first qubit where the two bit strings differ (sentinel: out of range).
    mutable i0 = Length(qs);
    for (i in 0..Length(qs)-1){
        if(bits1[i] == bits2[i]){
            // The two bit strings agree here: set the qubit to the common value.
            if(bits1[i]){
                X(qs[i]);
            }
        }else{
            if(i > i0){
                // A divergence was already seen: entangle this qubit with qs[i0].
                if(bits1[i] == bits1[i0]){
                    CNOT(qs[i0], qs[i]);
                }else{
                    X(qs[i0]);
                    CNOT(qs[i0], qs[i]);
                    X(qs[i0]);
                }
            }else{
                // First divergence: put this qubit in superposition and remember its index.
                H(qs[i]);
                set i0 = i;
            }
        }
    }
}

When completing the Quantum Katas, you will not be required to go really deep into the Q# language. Most of the exercises are solved with a couple of lines. To do your katas, you will just have to learn the fundamentals of the language: basic syntax, types, loops, conditional statements and, of course, the core of the Q# libraries for qubit manipulation: gates, measurements, simulations, etc.

Contrary to the maths prerequisites, which are underestimated in the Quantum Katas introduction, you absolutely do not need to be a hardcore programmer to get started with Q# and the Quantum Katas.

Let us just point out that the Q# QDK is much more than a mere Domain Specific Language (DSL) on top of the .NET CLI. It implements a rich type system and comes with an extensive library suite adapted to quantum programming. For example, operations are distinguished from functions. In addition, operations have advanced functor support, such as adjoint and controlled versions, which makes this very impressive from a language design perspective. No doubt there are great engineers and researchers at work in the QuArC team. This also proves that this is a real investment for Microsoft, not just a toy to generate some buzz.

Physics

Good news here: you do not necessarily need any previous knowledge of quantum mechanics to get started. Contrary to maths and computer science, I never studied quantum mechanics; I only recently learned the basics.

Nevertheless, I would recommend a quick onboarding tour of the fundamental concepts of quantum mechanics (see the recommended resources below). Good news: it is fascinating. I hope you will enjoy it as much as I did.

Some tips and a compilation of good resources available on the web

While the Quantum Katas, and more generally the Microsoft documentation, point to a lot of great resources, I thought it would be interesting to share what helped me the most.

Accept (quickly) the principles of quantum mechanics

Before you can start with quantum programming, you just need to accept some of the fundamentals of quantum mechanics: superposition and entanglement. I think you do not need much more than this. I also chose the wording accept on purpose. Professor Richard Feynman even said to his students:

If you think you understand quantum mechanics, you don’t understand quantum mechanics.

So my advice here is to accept these principles, and quickly, otherwise you will be stuck there for a very long time. You will have plenty of time to meditate on this, but if you want to start quantum computing it will slow you down. And, before I forget, let me also warn you about a common mistake we beginners make to reassure ourselves when confronted with the shocking reality of quantum mechanics. It is tempting to think that a qubit in superposition is still zero or (exclusively) one, but that because we cannot observe it for some obscure reason, we just model the possible outcomes by probabilities. That is incorrect: with superposition, it is both \(|0\rangle\) and \(|1\rangle\) at the same time. Yes, this is crazy! But there is another world down there at subatomic scale, with different laws…
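One way to convince yourself that superposition is more than a hidden coin flip is interference. Here is a minimal numpy sketch (plain linear algebra, no quantum library needed): applying the Hadamard gate once yields 50/50 measurement probabilities, but applying it twice brings the qubit back to \(|0\rangle\) with certainty, something no classical coin-flip model can reproduce.

import numpy as np

# Hadamard gate and the |0> state as a complex vector.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ket0 = np.array([1, 0], dtype=complex)

once = H @ ket0     # equal superposition: amplitudes (1/sqrt(2), 1/sqrt(2))
twice = H @ once    # amplitudes interfere: back to (1, 0)

print(np.abs(once) ** 2)   # [0.5 0.5] -> measuring gives 0 or 1, 50/50
print(np.abs(twice) ** 2)  # [1. 0.]   -> measuring gives 0 with certainty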



I personally liked this video on superposition and decoherence. The visualizations are great.

I also saw some amazing animations on this website (in French): toutestquantique.fr.
From the same authors, this video illustrates the wave-particle duality well.



The French are not only good at soccer (I mean football), they also create great videos on wave-particle duality.

Do not spend too much time on the Bloch Sphere and quantum circuit visualizations

The Quantum Katas take a “by code” approach with Q#, which really suited me. But in other didactic approaches I followed when I started, the Bloch sphere is often used to represent and visualize 1-qubit states.

Indeed, a qubit in an arbitrary (superposed) state \(|\psi\rangle\) can be written \(\alpha |0\rangle + \beta |1\rangle\) with \(\alpha, \beta \in \mathbb{C}\), but also \( \cos(\frac{\theta}{2}) |0\rangle + e^{i\phi} \sin(\frac{\theta}{2})|1\rangle\) with \(\theta, \phi \in \mathbb{R}\). This leads to a 3D visualization on the Bloch sphere. It allows you to see the actions of the most commonly used 1-qubit gates, such as the Hadamard and Pauli gates. However, again in my humble opinion and with my little experience, this has not brought me much in understanding quantum computing and its concepts. In addition, keep in mind that this visualization is limited to 1-qubit states. You will quickly manipulate multi-qubit systems, and the Bloch sphere will not help you much there.

When starting to learn quantum computing, a reference often cited is Quirk. While it is an amazing project, building visual quantum circuits has not really helped me “get” the fundamentals of quantum computing and why it can achieve, for example, these exponential speed-ups.

The Bloch sphere and its 3D representation. This is very popular, but it has not helped me much while trying to get into quantum computing.

Adopt the bra-ket notations quickly and get used to the tensor product

If you come from a mathematical background, you have probably not worked with bra-ket notations before. If everybody uses them in quantum mechanics and computing, it is for a reason. As you will learn in the Quantum Katas tutorials, the canonical basis for a 3-qubit system is composed of the 8 vectors \(|000\rangle\), \(|001\rangle\), \(|010\rangle\)…, which is much handier to write than an 8-dimensional complex vector for each. Remember, an \(N\)-qubit state is represented by a \(2^{N}\)-dimensional vector. Convince yourself that the bra-ket notations are consistent with the matrix representation you are more familiar with, and after that stick with and embrace bra-ket notations. I also advise you to review the properties of the tensor product, which is used intensively in quantum computing. In my mathematical studies I had encountered it occasionally (mostly as an exercise); here it plays a central role.
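To see the consistency in practice, here is a minimal numpy sketch: the Kronecker product np.kron is the matrix counterpart of the tensor product of states.

import numpy as np

# Basis states |0> and |1> as complex vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# |01> = |0> (x) |1>: the tensor product in matrix form.
ket01 = np.kron(ket0, ket1)
print(ket01)         # one-hot vector with a 1 at index 1 (binary 01)

# An N-qubit basis state lives in a 2^N dimensional space: |010> below.
ket010 = np.kron(np.kron(ket0, ket1), ket0)
print(ket010.shape)  # (8,)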

Watch this PhD comics video – 6min

This one is more advanced than the video mentioned in the introduction. Yet, it gives you an amazing tour of what quantum computing is in only 6 minutes.



From the YouTube description: theoretical physicists John Preskill and Spiros Michalakis describe how things are different in the quantum world and how that can lead to powerful quantum computers.

Watch this conference talk by Andrew Helwer – 1h30

The talk is led by a young, talented researcher. With humility and great clarity, he onboards an audience of computer scientists into quantum computing. The attendees' questions are also relevant.



Andrew Helwer's talk: Quantum computing for computer scientists.

Use Jupyter notebooks for your Quantum Katas

When completing the Quantum Katas, you will have the choice to run them as Q# projects or as Jupyter notebooks. I strongly encourage you to use the notebooks: you will benefit from great, well-formatted tutorials with \( \LaTeX \) formulas.

Briefly, Jupyter is a Python-based open-source project which allows you to write code in cells embedded within a web page called a notebook. The code in the cells is executed remotely by a kernel behind a web server. Here, the Q# team distributes a Jupyter kernel, IQSharp, which performs the Q# execution for the notebooks. This may sound a little complicated, but just follow the instructions to get this set up; you will not regret it.

By the way, I recommend installing Jupyter with pip and a virtual environment. With Python, I am not a big fan of Conda and global installs, and I have always been a promoter of virtual environments.

Understanding Deutsch–Jozsa is a good first objective

After following the first tutorials and katas on qubits, superposition, measurements, etc., I strongly advise you to tackle the Deutsch-Jozsa algorithm. Even if the problem it solves does not sound sexy, it is definitely the simplest example where quantum computing achieves an exponential speedup.

In short, it allows you to determine, “with one oracle evaluation”, whether a function \(f: \left\{ 0, 1 \right\}^{N} \rightarrow \left\{ 0, 1 \right\}\) is either constant or balanced (outputs the same number of zeros and ones). With classical computing you would need \(2^{N-1}+1\) evaluations in the worst case, but with the Deutsch-Jozsa algorithm and a quantum computer you need just one evaluation of the oracle. There are a lot of great explanations of how this works, including the Quantum Katas tutorials. I really think this should be the first one you try to really understand. It is definitely simpler than Shor's or Grover's algorithms.
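If you want to demystify the magic a little before opening the katas, here is a minimal numpy simulation of the algorithm in its phase-oracle formulation (a sketch where f is an ordinary Python function; on a real quantum device the oracle would be evaluated only once):

import numpy as np

def deutsch_jozsa_is_constant(f, n):
    """Simulate Deutsch-Jozsa for f: {0, ..., 2^n - 1} -> {0, 1}, constant or balanced."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = H
    for _ in range(n - 1):
        Hn = np.kron(Hn, H)                    # Hadamard on each of the n qubits
    state = Hn @ np.eye(2 ** n)[0]             # H|0...0>: uniform superposition
    phases = np.array([(-1) ** f(x) for x in range(2 ** n)])
    state = Hn @ (phases * state)              # phase oracle, then Hadamards again
    # The amplitude of |0...0> is +-1 if f is constant and 0 if f is balanced.
    return abs(state[0]) > 0.5

print(deutsch_jozsa_is_constant(lambda x: 0, 3))      # True: constant
print(deutsch_jozsa_is_constant(lambda x: x & 1, 3))  # False: balanced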

Quantum teleportation is often depicted as the Hello World of quantum programming. However, Deutsch-Jozsa is the algorithm that will really give you the intuition for why quantum computing provides these incredible algorithmic speedups. If you are interested in a well-explained blog post about quantum teleportation, I would recommend this one.

That's all for this onboarding tour and tips compilation. I hope you are now really excited to give the Quantum Katas a try. Happy quantum coding!

Baby twins deep learning classification with Inception-ResNetV1

A photo of my two girls with the annotations used for building the face recognition dataset. In this blog post, their faces have been blurred for anonymization.

Deep learning techniques have proven to be the most efficient AI tools for computer vision. In this blog post we use a deep convolutional neural network to build a classifier on pictures of my baby twins.

When it comes to practical machine learning experiments, the first thing anybody needs is some data. When experimenting as a hobby, we often rely on some open and popular dataset such as MNIST or the IMDB reviews. However, to improve, it is useful to be confronted with challenges on fresh, unworked data.

Since July 2019 (9 months at the time of writing), I am the happy father of two lovely twin baby girls: L and J. If I have free private data at scale, it is definitely photos of my kids. Indeed, all our families have been taking pictures of them and, thanks to WhatsApp and other communication means, I have been able to collect a great part of them.

Deep learning and convolutional neural networks are now the state-of-the-art methods for computer vision. Here are Convnet visualizations on a photo of L.

In this post, we will use a state-of-the-art deep learning architecture, InceptionResNetV1, to build a classifier for photo portraits of my girls. We also take advantage of pretrained FaceNet weights. Before that, we will make a detour by tricking the problem a little: this will allow us to check our code snippets and review some nice visualization techniques. Then, the InceptionResNetV1-based model will allow us to achieve some interesting accuracy results. We will experiment using Keras backed by Tensorflow. We conducted the computing-intensive tasks on a GPU machine hosted on Microsoft Azure.

The source code is available on Github. The dataset is obviously composed of personal pictures of my family that I do not want to leave openly accessible. In this blog post, the faces of my babies have been intentionally blurred to preserve their privacy. Of course, the algorithms whose results are presented here were run on non-obfuscated data.

In this post and in the associated source code, we reuse some of the models and snippets from the (excellent) book Deep Learning with Python by François Chollet. We also leverage the InceptionResNetV1 implementation from Sefik Ilkin Serengil.

Let us start this post by explaining the problem and why it is not as easy as it may seem.

A more complex topic than it seems

While my baby girls L and J are not identical twins, they definitely are sisters, and they look a lot alike! In addition, the photos were taken from their first days in this world up to their 8 months. It is no secret that babies change a lot during their first months. They were born weighing around 2.5 kg; now they are close to 9 kg. Even family members who saw them in their first days have a hard time distinguishing them now.

In addition, following a tradition from their grandmother's home country, L and J were shaved at the age of 4 months. Their haircut is therefore not a way to distinguish them: we have photos before the shaving, during hair regrowth, and now with medium to long hair. Also, many photos were taken with a woolen hat or other headwear. Consequently, we will be pushing the limits of face recognition a little.

Our objective in this series of experiments is to build a photo classifier. Precisely, given a photo we would like to make a 2-class prediction: “Is this a photo of L or of J?”. We therefore assume that each picture contains one and only one portrait of one of my two baby girls.

I have collected 1000 raw photos that will be used to create the dataset.

Building the dataset

Photo tagging

In the raw dataset, some photos contain only L, others only J, some both and some neither. We exploit these photos to extract face pictures for each girl. To do so, we first need to locate the faces precisely in the photos.

For efficient annotation, I used an open-source tagging tool: VoTT, which is supported by Microsoft. The tagging is pretty straightforward, and you can quickly annotate the photos with a very intuitive interface. It took me between one and two hours to tag the full dataset.

Efficient photo annotation with the VoTT software. Note also the FC Nantes outfits…

One of the worries of twin parents is the fear of favoring one child over the other. Well, I am not concerned, and the data speak for me. Here are the results: after the tagging we have a very well balanced tag distribution, with a little more than 600 tags for each girl.

The tag distributions of L and J.

Now we will build the picture dataset, where each picture contains the portrait of one of the kids. VoTT provides the tag locations within the picture in JSON format. Therefore, it is easy to crop all files to produce a dataset where each image contains only one kid's face; see this code snippet.
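For illustration, here is a minimal sketch of what that cropping step can look like (the VoTT export field names below — assets, regions, boundingBox, tags — are assumptions based on a typical VoTT project export; the actual snippet is the one linked above):

import json
from pathlib import Path
from PIL import Image

def crop_faces(vott_export, photos_dir, out_dir):
    """Crop every tagged face region and save it under out_dir/<tag>/."""
    data = json.loads(Path(vott_export).read_text())
    for asset in data["assets"].values():
        name = asset["asset"]["name"]
        image = Image.open(Path(photos_dir) / name)
        for i, region in enumerate(asset["regions"]):
            tag = region["tags"][0]                 # 'J' or 'L'
            box = region["boundingBox"]
            left, top = box["left"], box["top"]
            crop = image.crop((left, top, left + box["width"], top + box["height"]))
            dest = Path(out_dir) / tag
            dest.mkdir(parents=True, exist_ok=True)
            crop.save(dest / f"{Path(name).stem}_{i}.jpg")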

The extraction process from tagged photos to square cropped images.

Splitting the dataset: train, validation, test

As always with any machine learning training procedure, one must separate the original dataset into: 1) training data, used to fit the model, and 2) validation data, used to measure the performance of the tuned algorithm. Here we go further by also keeping a third, untouched test dataset.

It is always easier to work with an evenly balanced dataset. Luckily, this is almost the case with the original data. Consequently, after processing (mainly shuffling and splitting), we obtain the following distribution in our file system:

├── train
│ ├── J => 396 JPG files
│ └── L => 396 JPG files
├── validation
│ ├── J => 150 JPG files
│ └── L => 150 JPG files
└── test
│ ├── J => 100 JPG files
│ └── L => 100 JPG files

The code snippets for splitting and shuffling the dataset are available here.
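For illustration, a minimal sketch of such a shuffle-and-split step (the directory names, counts and seed are illustrative; the real snippets are the ones linked):

import os
import random
import shutil

def split_dataset(src_dir, dst_root, tag, n_train=396, n_val=150, n_test=100):
    """Shuffle the cropped images of one tag and copy them into train/validation/test."""
    files = sorted(os.listdir(src_dir))
    random.seed(42)                       # reproducible shuffling
    random.shuffle(files)
    start = 0
    for split, count in [("train", n_train), ("validation", n_val), ("test", n_test)]:
        dst = os.path.join(dst_root, split, tag)
        os.makedirs(dst, exist_ok=True)
        for f in files[start:start + count]:
            shutil.copy(os.path.join(src_dir, f), dst)
        start += count

split_dataset("cropped/J", "dataset", "J")
split_dataset("cropped/L", "dataset", "L")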

Machine setup – GPU Powered VM

The experiments are conducted with Python 3.6 and Tensorflow GPU 1.15. We use the high-level deep learning library Keras 2.2.4.

We set up an Ubuntu 18.04 NC6 GPU VM on Microsoft Azure. The NC-series VMs use an Nvidia Tesla K80 graphics card and an Intel Xeon E5-2690 CPU. With the NC6 version we benefit from 1 GPU and 6 vCPUs; this setup made the following computations perfectly acceptable: all experiments lasted less than a few minutes.

This is the second time I have set up an Ubuntu machine with CUDA/cuDNN and Tensorflow and, same as before, it was a real pain. The official Tensorflow documentation is totally incorrect and guides you in the wrong direction. Finally, I managed to get a successful setup with Tensorflow-gpu 1.15.0, Keras 2.2.4 and CUDA 10.0, thanks to this StackOverflow post.

For efficient development, I use VSCode with the new SSH Remote extension, which makes remote development completely seamless. The experiments are also conducted with IPython Jupyter notebooks. And once again, VSCode provides off-the-shelf SSH tunneling to simplify everything.

Tensorflow's output confirms that its primary computing device is our Tesla K80 GPU.

The nvidia-smi command shows the load on the GPU from the Python process.

First, a simplified problem with tricked data to get started

The experiments provided in this section can be found in this notebook.

Here we will make a small detour by simplifying and tricking the challenge: we will add easily detectable geometrical shapes to the images.

Drawing obvious shapes on image classes

When tackling a data science project, I always think it is great to start really simple. Here, I wanted to make sure that my code snippets were OK, so I decided to trick the problem (temporarily). Indeed, I drew geometrical shapes on the image classes. Precisely, for any J photo a rectangle is drawn and, for any L photo, an ellipse is inserted. The size, the aspect ratio and the filling color are left random. You can see with the two following examples what this looks like:

An ellipse is drawn on all L photos, for the train, validation and test sets.

 

A rectangle with a random filling color is drawn on all J images.
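For reference, a minimal sketch of how such random shapes can be drawn with Pillow (the function name and size ranges are illustrative):

import random
from PIL import Image, ImageDraw

def draw_random_shape(image_path, shape):
    """Draw a random ellipse (L photos) or rectangle (J photos) on an image, in place."""
    image = Image.open(image_path)
    w, h = image.size
    x0, y0 = random.randint(0, w // 2), random.randint(0, h // 2)
    x1, y1 = x0 + random.randint(w // 8, w // 3), y0 + random.randint(h // 8, h // 3)
    color = tuple(random.randint(0, 255) for _ in range(3))
    draw = ImageDraw.Draw(image)
    if shape == "ellipse":
        draw.ellipse((x0, y0, x1, y1), fill=color)
    else:
        draw.rectangle((x0, y0, x1, y1), fill=color)
    image.save(image_path)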

Of course, this completely works around and tricks the face recognition problem. Anyway, it is a good starting point to test our setup and experimentation code snippets.

A simple Convnet trained from scratch

For this simplified task we use the basic Convnet architecture introduced by François Chollet in his book (see the beginning of the post). Basically, it consists of four 2D-convolutional layers, each followed by a MaxPooling layer. This constitutes the convolutional base of the model. Then the tensors are flattened, and a series of Dense and Dropout layers is added to perform the final classification part.

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 198, 198, 32) 896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 99, 99, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 97, 97, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 48, 48, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 46, 46, 128) 73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 23, 23, 128) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 21, 21, 128) 147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 10, 10, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 12800) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 12800) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 6554112
_________________________________________________________________
dense_2 (Dense) (None, 1) 513
=================================================================
Total params: 6,795,457
Trainable params: 6,795,457
Non-trainable params: 0

97%+ accuracy

Actually, it is no surprise that we are able to achieve great classification performance: the algorithm performs without difficulty. We used the standard data augmentation techniques, except that for this task we skipped the rotations.

from keras.preprocessing.image import ImageDataGenerator

## Define TRAINING SET
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=0,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
                                                    target_size=(TARGET_SIZE, TARGET_SIZE),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='binary')

## Define VALIDATION SET
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_directory(VAL_DIR,
                                                              target_size=(TARGET_SIZE, TARGET_SIZE),
                                                              batch_size=BATCH_SIZE,
                                                              class_mode='binary')

After training for 30 epochs, we observe the following learning curves.

model.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-3), metrics=['acc'])
history = model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=30, validation_data=validation_generator, validation_steps=50)

Training and validation accuracy on the tricked data. Unsurprisingly, we achieve strong accuracy.
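For completeness, a minimal sketch of how such curves can be plotted from the history object returned by fit_generator above (assuming the 'acc'/'val_acc' metric keys of Keras 2.2.4):

import matplotlib.pyplot as plt

acc, val_acc = history.history['acc'], history.history['val_acc']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo-', label='Training accuracy')
plt.plot(epochs, val_acc, 'ro-', label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()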

Now that the results seem satisfactory, with no sign of overfitting (the validation accuracy grows and then stalls), it is time to measure performance on the test set composed of the 200 pictures left aside.

Accuracy is a key indicator but, even for a 2-class classification problem, it is a common error to ignore more subtle information such as the confusion matrix. Using the following snippet and the scikit-learn library, we are able to collect a full classification report along with a confusion matrix. Again, all signals are green here, and we see the results of a very efficient classifier. Yet, let us not forget that the game has been tricked!

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

TEST_SIZE = 200
target_names = ['J', 'L']
Y_pred = model_convenet_ad_hoc.predict_generator(test_generator, TEST_SIZE)
Y_pred = Y_pred.flatten()
y_pred_class = np.where(Y_pred > 0.5, 1, 0)  # threshold the sigmoid output at 0.5

print('Classification Report')
print(classification_report(test_generator.classes, y_pred_class, target_names=target_names))

print('Confusion Matrix')
# plot_confusion_matrix is a small custom plotting helper from the notebook.
plot_confusion_matrix(confusion_matrix(test_generator.classes, y_pred_class), target_names)

The confusion matrix with tricked data. We see excellent accuracy, precision and recall.

Going beyond: observing what the Conv2D layers learn with an activation model

Again, we use a technique well explained in François Chollet's book.

The idea is to build a multi-output model based on the successive outputs of the base convolutional layers. Thanks to the successive ReLU activations, we can plot the activation maps from the outputs of these layers. The following visuals illustrate well that our Convnet base model has successfully learned the geometrical shapes: the curved stroke of the ellipses and the squared edges of the rectangles.

from keras import models

layer_outputs = [layer.output for layer in model.layers[:depth]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(input_img_tensor)  # one output per selected layer

One of the extracted features on a bottom layer, from the L picture with an ellipse. The activated regions appear in green. The ellipse is strongly activated, but so is the pacifier.

In the bottom layers of our neural network model, the ellipses are obviously the most activated regions of the input picture.

In the upper layers, it is visible that the pattern captured for the classification is the curve of the ellipse's stroke path. Similarly, the square corners of the rectangles are captured.

Back to the real face recognition problem

Now it is time to get back to our original problem: classifying my baby girls without relying on any trick, just with face recognition on the original images. The simple Convnet from the previous section will not be sufficient to build a classifier with significant accuracy; we will need bigger artillery.

The experiments provided in this section can be found in this notebook.

Using the InceptionResNetV1 as a base model

InceptionResNetV1 is a deep learning model that has proven to be one of the state-of-the-art very deep architectures for convolutional networks. It also uses the concept of residual neural networks.
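As a reminder of the idea, here is a minimal sketch of a residual connection in the Keras functional API (illustrative only, not the actual InceptionResNetV1 blocks): the block's input is added back to its output, which eases the training of very deep networks.

from keras import layers

def residual_block(x, filters):
    """A toy residual block: x must already have `filters` channels for the addition."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.add([y, shortcut])   # the skip connection
    return layers.Activation('relu')(y)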

We use the implementation originally provided by Sefik Ilkin Serengil, who was himself reusing parts of the implementation provided by David Sandberg.

For our classification problem, we use InceptionResNetV1 as the (very deep) base network. On top of it, we flatten the tensors and add Dense and Dropout layers to perform the classification.

Layer (type) Output Shape Param #
=================================================================
inception_resnet_v1 (Model) (None, 128) 22808144
_________________________________________________________________
dense_1 (Dense) (None, 256) 33024
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_2 (Dense) (None, 64) 16448
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 16) 1040
_________________________________________________________________
dense_4 (Dense) (None, 1) 17
=================================================================
Total params: 22,858,673
Trainable params: 22,829,841
Non-trainable params: 28,832

Achieving nearly 0.80 accuracy

We conducted some experiments trying to train InceptionResNetV1 without any prior weight tuning, that is, without pretrained weights (sometimes called transfer learning). Without any surprise, the model could not reach significant validation accuracy (i.e., significantly above 0.6). Our dataset, even with data augmentation, is too small to let a deep architecture “learn” the key elements that constitute the characteristics of a human face.

Therefore, we reuse pretrained FaceNet weights, provided by David Sandberg and by Sefik Ilkin Serengil in his blog post.

from keras import layers, models
from inception_resnet_v1 import *

def model_with_inception_resnet_base(pretrained_weights):
  model = InceptionResNetV1()

  if pretrained_weights:
    # pre-trained weights https://drive.google.com/file/d/1971Xk5RwedbudGgTIrGAL4F7Aifu7id1/view?usp=sharing
    model.load_weights('facenet_weights.h5')

  new_model = models.Sequential()
  new_model.add(model)
  new_model.add(layers.Dense(256, activation='relu'))
  new_model.add(layers.Dropout(0.5))
  new_model.add(layers.Dense(64, activation='relu'))
  new_model.add(layers.Dropout(0.5))
  new_model.add(layers.Dense(16, activation='relu'))
  new_model.add(layers.Dense(1, activation='sigmoid'))

  return new_model

Thanks to our GPU, we were able to retrain the full model composed of an InceptionResNetV1 base plus our top classification layers. We did not even have to resort to fine-tuning techniques, where the first layers' weights are frozen.
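For reference only: had the full retraining overfitted, freezing the pretrained base in Keras takes a couple of lines (a sketch reusing the model_with_inception_resnet_base function above; the learning rate is illustrative):

from keras import optimizers

# Freeze the InceptionResNetV1 base so only the top classification layers train.
new_model = model_with_inception_resnet_base(pretrained_weights=True)
new_model.layers[0].trainable = False
new_model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-4),
                  metrics=['acc'])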

After a dozen minutes of training, I was happy to see the following training and validation accuracy curves.

The training and validation accuracy over the epochs. The validation accuracy reaches 0.80.

This shows all the positive signs of a successfully trained ML algorithm. Therefore, let us examine the performance on the test dataset, i.e., the one that has not been fed to the algorithm before.

The final classification report and confusion matrix. We achieve nearly 0.80 accuracy.

The classification report and confusion matrix on the test dataset confirm the measures on the validation set: we achieve nearly 80% accuracy. One interesting thing from the report is that J looks a little more difficult for our model to classify than L. Honestly, I have no hypothesis about what could cause this. A deeper analysis, examining the layers in the spirit of what has been presented above, could be conducted.

Conclusion

I did not spend much time trying to tune the InceptionResNetV1 hyperparameters. I also tweaked the top layers, but only a little. No doubt there is room for great improvements here; this could constitute a follow-up blog post.

Also, I did not compare against other algorithms and deep learning architectures. I quickly gave the Deepface Keras implementation a try, but without significant results, and I did not spend time investigating why it was not working. Once again, this could be part of an interesting follow-up. Ideally, I would also benchmark this DLib implementation.

By conducting these experiments, I confirmed that it is nearly impossible to perform “real” face recognition without some kind of pretrained model when you have such a small amount of face data at hand.

Finally, I learnt a lot. It is always a good thing to try things on your own data: this is how you learn to tackle real-life problems in data science.

Our classifier works: it is now able to distinguish between my two baby girls with 80% accuracy.

Windows with WSL2, a good configuration for a dev team?

In the Talentoday tech team, we have regular discussions about the best Operating System (OS) workstation configuration for the team. While experienced engineers are free to choose what they prefer, it is reasonable to recommend and maintain a kind of preferred configuration for interns and juniors, to facilitate setup and onboarding. The Talentoday tech stack being *nix based, for long most of the developers were using macOS on MacBooks. This is definitely an acceptable solution, but some of us (me included) started to use Linux distributions instead. However, I have not yet found an ideal setup. I will share here some criticisms of various OS configurations, discuss the new Microsoft WSL2, and see how it could be the best of all worlds, though this remains to be confirmed.

Usual OS configurations, main flaws

First, I present here my main criticisms. As you will notice, most are not about the technologies themselves but more about their daily usage within a team and a potentially large fleet of devices.

Windows OS. Unless you solely develop .NET applications (the regular .NET Framework, not the new .NET Core) that can only run on Windows, Windows is probably not the ideal dev OS. While Python interpreters work well on Windows, some other technologies, like Ruby for instance, are really a pain there. It is a fact that most developer tools and technologies are not designed for Windows (or at least not as first-class citizens). On the positive side, administering the team machines would be easier than with the others: Windows is really suited to maintaining a large fleet of machines, and most everyday applications run perfectly on it. Unfortunately, this latter argument does not counterbalance the poor dev experience with many technologies.

Linux standalone OS setup. In this case, you have a real Linux OS that will probably be the closest thing to what hosts and runs your applications in production. Using Linux, you can use a great package manager such as apt, and good code editors like VSCode or Sublime work perfectly. Yet… in the real world, other problems arise. First, those due to Linux desktops and window managers themselves. For example, having different resolutions and scalings for your screens is not well supported unless you go with Wayland; but with Wayland, you will stumble into trouble sharing your screens. In our team, we work remotely; we need to share things when huddling. We have to present to clients and teammates around the world every day. You need a comfortable setup with multiple monitors and the ability to jump in when clients invite you on GoToMeeting, Zoom.us or whatever videoconferencing system, and this needs to work. Similarly, it happens frequently that we receive *.xlsx or *.docx files with macros that only the Office desktop suite can process. Therefore, even if you manage to have everything working on your Linux desktop workstation, there are things, independent of your will, that can sometimes be problematic.

Dual boot. Honestly, I was never really convinced by dual boots: it is quite cumbersome. In addition, most of the time you need to disable secure boot which, for a company setup, is probably not a good recommendation.

Leveraging virtualized OSes. That is an option, and there are two possibilities. First, the host OS is Windows/macOS and you develop in a guest Linux. Well, if you are a developer, your primary work will probably be development; then, even though you can make your VM as seamless as possible, it is always preferable to do your everyday work in the host OS. The other way is to set up a guest Windows OS on a Linux host (you cannot run macOS as a guest). Again, some problems, like sharing your Linux screen to show your local dev work, will not go away. Let me also point out that you depend a lot on the hypervisor and its ability to handle displays well; my experience has shown me that this is often buggy, and fixing this kind of bug does not seem to be a top priority for hypervisor vendors. Take for example Oracle and VirtualBox, which has been ignoring this bug on HighDPI screens for years.

macOS. It seems that this has become a very popular solution for developers in the past decade. This is the configuration I have had for almost two years, and it works! Nevertheless, there are some criticisms that cannot be ignored. Even if macOS is Unix based and POSIX compliant, it is no Linux. This makes a bit of a difference: you cannot use apt-get and have to rely on Homebrew instead, small stuff that makes your local config different from the production systems. One cannot ignore that Docker for Mac also has some serious flaws. In addition, it is clear that maintaining a fleet of MacBook Pros is quite a pain: there are no native group policies, and for device maintenance you have to go with Apple support, etc. Finally, the cost of purchasing a MacBook Pro compared to its Dell equivalent is higher (even if not by as much as is often claimed in discussions).

The best of all worlds? Windows 10 with WSL2?

Ubuntu on Windows

WSL stands for Windows Subsystem for Linux. At the time of writing, its first version, WSL1, is shipped on all up-to-date Windows 10 builds; to get access to WSL2, you need to register for the Windows Insider Program and retrieve a preview build.

With WSL, Microsoft brought a Linux environment inside Windows 10 that can be used directly from the Windows host with a bash shell. In its first version, it was really useful for many small tasks, such as sshing easily from Windows (without relying on PuTTY). Yet, some limitations on networking and the filesystem did not make it a viable solution for an everyday developer workstation. WSL2 is a much more complete solution that makes it usable for development (involving a full VM on top of Hyper-V, which some may even see as a regression).

In addition, some really important news came along. First, the popular code editor VSCode fully supports WSL2 and brings dedicated extensions. Second, and probably more important, the latest tech preview edition of Docker Desktop lets you use WSL2 to run the containers.

If you are working on existing projects, there is a strong possibility that some (if not all) of your services leverage Docker. Therefore, you must have a way to run and access Docker containers from your new WSL2 environment. This was our situation at Talentoday, where we rely a lot on Docker.

I tried to install our *nix based Talentoday stack on Windows 10 with WSL2, and it worked like a charm.

As a high-level overview: at Talentoday, the application server services are built on top of Ruby on Rails (with jobs, queues, etc.) accessing SolR indexers. We also have a cache mechanism on top of Redis, a PostgreSQL database and a Flask-based Python web API. For the developer experience, we rely a lot on Docker containers (including for the database) for these various components. With the Docker tech preview, I had no problem building and starting my images within WSL2 and accessing them from both WSL2 code and the host Windows.

We will need to take a step back and see whether WSL2 is the right way to go in the long run. We will have to pursue our investigation and check that there are no other problems (performance, networking, etc.) that I could not spot during this one-day trial. In any case, this early attempt looks extremely promising and could be a good solution for developers in teams who want to keep the benefits of a commonly distributed OS and a real developer experience leveraging *nix based technologies.

The Talentoday stack running on WSL2

Remote debugging Python with VSCode

I truly think that, no matter what your platform is, you must have access to a comfortable development environment, and a working debugger is one of its most important parts. Remote debugging can be very helpful: it is possible to execute code on a remote machine and benefit from a nice debugging experience locally in your favorite code editor.

In this blog post, I propose to review the setup of Python remote debugging with the portable and popular code editor VSCode. The VSCode documentation actually provides some very short instructions; here we will provide more explanations.

Use the remote debugging capabilities of VSCode with Python

Prerequisite

We will assume that we do not have any security constraints: precisely, we do not care about MITM interception between our client and the remote server. We will discuss in the appendix how we could solve this using SSH port forwarding.

We assume that the reader is familiar with the usage of a debugger in VSCode. In addition, we assume that the reader knows how to log on to a remote machine using SSH.

Our example

In this blog post we used an Ubuntu Azure Virtual Machine. Its configuration (RAM, GPU, etc.) is irrelevant here, so you can basically choose anything.

We assume now that the reader has an Azure Ubuntu server running and is able to log on through SSH. Note that the VSCode documentation mentions SSH port forwarding, but we will ignore it for now.

Let us present precisely what remote debugging is. In this post, the name remote stands for our Ubuntu VM on Azure, while the client is our local computer, e.g. a macOS machine. With remote debugging, a single Python process is executed on the remote VM; then, on the client computer, VSCode “attaches itself” to this remote process so you can match the remote code execution with your local files. Therefore, it is important to keep exactly the same .py files on the client and on the remote, so that the debugging process is able to match the two versions line by line.

The magic lies in a library called ptvsd, which builds the bridge for attaching the local VSCode to the remotely executed process. The remotely executed Python waits until the client debugging agent is attached.

Obviously, network communication is involved here, and that is actually the major pitfall when configuring remote debugging. The VSCode documentation is fuzzy about whether to use an IP or localhost, which port to set, etc. We will try to simplify things so the debugging experience becomes crystal clear.

Networking

To make things simpler, we will show an example where the Python process is executed on a remote machine whose IP address is 234.56.45.89 (I chose this address randomly). We use the good old port 80 for the communication (the usual port for HTTP).

Before doing anything else, we need to make sure that our remote VM network configuration is OK: we will make sure that machine 234.56.45.89 can be contacted from the outside world on port 80.

Firstly, in an SSH session on the remote machine, start a web server with the following Python 3 command. You may need elevated privileges to listen on port 80 (for real production usage, grant this privilege to the current user; do not sudo the process).

sudo python3 -m http.server 80

Secondly, from a client terminal, you should be able to request your machine using wget (spider mode to avoid downloading the file). In this command, the target machine is accessed as IP:PORT.

wget --spider 234.56.45.89:80

You should get a response from the server. If you see errors, you may need to open port 80 in the firewall configuration; see the instructions here for Azure.

Make sure you can contact your machine on port 80 by running a one-line Python server.

At this stage, your network configuration is OK. You can stop the Python command that runs the web server.

Configuring VSCode

Make sure that you have the VSCode Python extension installed. Follow the instructions here to add a new debug configuration to your launch.json, containing the following JSON.

{
    "name": "Attach (Remote Debug)",
    "type": "python",
    "request": "attach",
    "localRoot": "${workspaceRoot}",
    "remoteRoot": "/home/benoitpatra",
    "port": 80,
    "secret": "my_secret",
    "host":"234.56.45.89"
}

It is important to understand that this configuration is only for VSCode. The host corresponds to the machine where the remote Python process runs. The port corresponds to the port that will be used by the remote process to communicate with the client debugging agent; in our case, it is 80.

You must specify the root folders on both the local environment and the remote machine.

That’s it for VSCode configuration.

The code under debugging

Let us debug the following Python script

import os
import ptvsd
import socket
ptvsd.enable_attach("my_secret", address = ('0.0.0.0', 80))

# Enable the line of source code below only if you want the application to wait until the debugger has attached to it
#ptvsd.wait_for_attach()

ptvsd.break_into_debugger()

cwd = os.getcwd()

print("Hello world you are here %s" % cwd )
print("On machine %s" % socket.gethostname())

As explained in the introduction, the Python file must be the same on the client and on the remote machine. There is one exception, though: the line ptvsd.wait_for_attach() must be executed by the remote Python process only. Indeed, it tells the Python process to pause and wait until the client is attached before continuing.

Of course, in order to execute it on the remote machine, you may need to install dependencies there (for example using pip).

REMARK: it looks like, at the time of writing, versions of ptvsd above 3.0.0 suffer from some problems. I suggest you force the install of version 3.0.0; see this issue.

It is important to understand that enable_attach, wait_for_attach and break_into_debugger are instructions for the remote Python process. The first line, ptvsd.enable_attach("my_secret", address = ('0.0.0.0', 80)), basically instructs the remote Python process to listen on all network interfaces, on port 80, for any client debugger that would like to attach. The client agent must provide the right secret (here my_secret).

The line ptvsd.break_into_debugger() is important: it is the line that allows you to break and navigate the code in the client VSCode.

Putting things together

Now you are almost ready. Make sure your Python file is duplicated on both the local and the remote machine at the configured root locations. Make sure ptvsd.wait_for_attach is uncommented and executed in the remote environment.

Now, in an SSH session on the remote machine, start the Python process with elevated privileges:
sudo python3 your_file_here.py

This should not return anything right now: the process hangs, waiting for your VSCode to attach.

Set a VSCode breakpoint just after ptvsd.break_into_debugger(), and make sure that the selected debugging configuration in VSCode is Attach (Remote Debug). Hit F5: you should be attached and breaking in code!

What a relief; efficient working ahead!

Breaking in VSCode

Going further

The debugging procedure described above is simplified and suffers from some flaws.

Security constraints

Here, anybody can intercept your traffic: it is plain unencrypted HTTP traffic. A recommended and yet simple option to secure the communication is to use SSH port forwarding, which basically creates an encrypted network tunnel between your localhost client and the remote machine. When an SSH tunnel is set up, you talk to your local machine on a given port and the remote receives the calls on another port (magic, isn't it?). Therefore, the launch.json configuration should be modified so that the host value is localhost. Note also that the port in the Python code and the port in launch.json may no longer be the same: you now have two different ports.
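For illustration, assuming a tunnel opened with ssh -L 8080:localhost:80 your_user@234.56.45.89 (the local port 8080 is an arbitrary choice; the Python code keeps listening on the remote port 80), the configuration could become:

{
    "name": "Attach (Remote Debug via SSH tunnel)",
    "type": "python",
    "request": "attach",
    "localRoot": "${workspaceRoot}",
    "remoteRoot": "/home/benoitpatra",
    "port": 8080,
    "secret": "my_secret",
    "host": "localhost"
}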

Copying files

We pointed out that the files must be the same between the local environment and the remote. We advise grouping, in a shell script, the file mirroring logic (using scp) and the execution of the Python process on the remote machine.

Handling differences between local and remote files

We said that the files must be the same between the local environment and the remote, but we need at least one difference: ptvsd.wait_for_attach must be executed on the remote only. This is definitely something that can be handled in an elegant manner using an environment variable.

if "REMOTE" in os.environ:
    ptvsd.wait_for_attach()

Of course, you now need to pass the environment variable to your remote process with SSH; see this StackExchange post to find out how.

Using Analytics in Application Insights to monitor CosmosDB Requests

According to Wikipedia, DocumentDB (now CosmosDB) is

Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.

The throughput of the database is billed and measured in request units per second (RUs). Therefore, when creating an application on top of DocumentDB, this is a very important dimension that you should pay attention to and monitor carefully.

Unfortunately, at the time of writing, the Azure portal tools for measuring your RU usage are very poor and not really usable: you only have access to tiny charts whose granularity cannot really be changed.

These are the only DocumentDB monitoring charts available in the Azure portal.

In this blog post, I show how Application Insights Analytics can be used to monitor the RU consumption efficiently. This is how we monitor our collections at Keluro now.

Let us start by presenting Application Insights, which defines itself here as

an extensible Application Performance Management (APM) service for web developers on multiple platforms. Use it to monitor your live web application. It will automatically detect performance anomalies. It includes powerful analytics tools to help you diagnose issues and to understand what users actually do with your app.

Let us show how to use it in a C# application that is using the DocumentDB .NET SDK.

First, you need to install the Application Insights NuGet package. Then, you need to track the queries using a TelemetryClient object; see the sample code below.

public static async Task<FeedResponse<T>> LoggedFeedResponseAsync<T>(this IQueryable<T> queryable, string infoLog, string operationId)
{
	var docQuery = queryable.AsDocumentQuery();
	var now = DateTimeOffset.UtcNow;
	var watch = Stopwatch.StartNew();
	var feedResponse = await docQuery.ExecuteNextAsync<T>();
	watch.Stop();
	TrackQuery(now, watch.Elapsed, feedResponse.RequestCharge, "read", new TelemetryClient(), infoLog, operationId, feedResponse.ContentLocation);
	return feedResponse;
}

public static void TrackQuery(DateTimeOffset start, TimeSpan duration, double requestCharge, string kind, TelemetryClient tc, string infolog, string operationId, string contentLocation)
{
	var dependency = new DependencyTelemetry(
			"DOCDB",
			"",
			"DOCDB",
			"",
			start,
			duration,
			"0", // Result code : we can't capture 429 here anyway
			true // We assume this call is successful, otherwise an exception would be thrown before.
			);
	dependency.Metrics["request-charge"] = requestCharge;
	dependency.Properties["kind"] = kind;
	dependency.Properties["infolog"] = infolog;
	dependency.Properties["contentLocation"] = contentLocation ?? "";
	if (operationId != null)
	{
		dependency.Context.Operation.Id = operationId;
	}
	tc.TrackDependency(dependency);
}

The good news is that you can now effectively keep records of all requests made to DocumentDB. Thanks to a great component of Application Insights named Analytics, you can browse the queries and see their precise request charges (the amount of RUs consumed).

You can also add identifiers (with variables such as kind and infolog in the sample above) from your calling code for better identification of the requests. Keep in mind that the request payload is not saved by Application Insights.

In the screenshot below, you can see how to list and filter the DocumentDB requests tracked in Application Insights Analytics, thanks to its amazing query language.

Getting all requests to DocumentDB in a timeframe using Application Insights Analytics

One problem with this approach is that, for now, using this technique and the DocumentDB .NET SDK, we do not have access to the number of retries (the 429 requests). This is an open issue on Github.

Finally, Analytics allows us to create a very important chart: the accumulated RUs per second for a specific time range. The query looks like the following one.

dependencies
| where timestamp > ago(10h)
| where type == "DOCDB"
| extend requestCharge = todouble(customMeasurements["request-charge"])
| extend docdbkind = customDimensions["kind"]
| extend infolog = customDimensions["infolog"]
| order by timestamp desc
| project  timestamp, target, data, resultCode , duration, customDimensions, requestCharge, infolog, docdbkind , operation_Id 
| summarize sum(requestCharge) by bin(timestamp, 1s)
| render timechart 

And the rendered chart is as follows.

Accumulated request charge per second (RUs)