Remote debugging Python with VSCode

With the popularity of new deep learning approaches, running Python code on high performance machines has become commonplace. Many cloud providers offer virtual machines tailored to this need. When it comes to developing data science experiments, a common approach is to use IPython notebooks.

Although I strongly support the usage of notebooks for sharing ideas and for presentation purposes, I also believe they are not powerful enough to develop complex programs for real production software. The major flaw is the very limited debugging capabilities of IPython notebooks. Moreover, I consider printf debugging harmful. No matter what your platform is, you must have access to a comfortable development environment, and a working debugger is one of its most important parts. This is where remote debugging comes into play: you can execute code on a remote machine and still benefit from a nice debugging experience locally, in your favorite code editor.

In this blog post I review the setup of Python remote debugging with the portable and popular code editor VSCode. The VSCode documentation provides only very short instructions; here we give more detailed explanations.

Use remote debugging capabilities of VSCode with Python

Prerequisites

We will assume that we do not have any security constraints: we do not care about man-in-the-middle interceptions between our client and the remote server. We will discuss at the end of this post how we could solve this using SSH port forwarding.

We assume that the reader is familiar with the VSCode debugger and knows how to log on to a remote machine using SSH.

Our example

In this blog post we use an Ubuntu Azure virtual machine. Its configuration (RAM, GPU, etc.) does not matter here, so you can basically choose anything.

We now assume that the reader has an Azure Ubuntu server running and is able to log on through SSH. Note that the VSCode documentation mentions SSH port forwarding, but we will ignore it for now.

Let us state precisely what remote debugging is.
In this post, the remote is our Ubuntu VM on Azure, while the client is our local computer (e.g. a macOS machine). With remote debugging, a single Python process is executed on the remote VM; then, on the client computer, VSCode "attaches itself" to this remote process so you can match the remote code execution with your local files. Therefore, it is important to keep exactly the same .py files on the client and on the host so that the debugger can match the two versions line by line.

The magic lies in a library called ptvsd, which bridges the local VSCode instance and the remotely executed process. The remote Python process waits until the client debugging agent is attached.

Obviously, network communication is involved here, and that is actually the major pitfall when configuring remote debugging. The VSCode documentation is fuzzy about whether to use the IP address or localhost, which port to set, etc. We will try to simplify things so the debugging experience becomes crystal clear.

Networking

To make things simpler, we show an example where the Python process is executed on a remote machine whose IP address is 234.56.45.89 (I chose this address randomly). We use the good old port 80 for the communication (the usual port for HTTP).

Before doing anything else, we need to make sure that our remote VM network configuration is correct, i.e. that machine 234.56.45.89 can be reached from the outside world on port 80.

First, in an SSH session on the remote machine, start a web server with the following Python 3 command. You may need elevated privileges to listen on port 80 (for real production usage, grant this privilege to the current user rather than running the process with sudo).

sudo python3 -m http.server 80

Second, from a client terminal, you should be able to request your machine using wget (spider mode avoids downloading the file). In this command, the target machine is accessed as IP:PORT.

wget --spider 234.56.45.89:80

You should get a response from the server. If you see errors, you may need to open port 80 in the firewall configuration; see the instructions here for Azure.

Make sure you can contact your machine on port 80 by running a one line Python server

At this stage, your network configuration is fine. You can stop the Python command that runs the web server.

Configuring VSCode

Make sure that you have the VSCode Python extension installed. Follow the instructions here to add a new debug configuration to your launch.json containing the following JSON:

{
    "name": "Attach (Remote Debug)",
    "type": "python",
    "request": "attach",
    "localRoot": "${workspaceRoot}",
    "remoteRoot": "/home/benoitpatra",
    "port": 80,
    "secret": "my_secret",
    "host":"234.56.45.89"
}

It is important to understand that this configuration is only for VSCode. The host corresponds to the machine where the remote Python process runs. The port is the one used by the remote process to communicate with the client debugging agent; in our case it is 80.

You must specify the root folders on both the local environment and the remote machine.

That’s it for VSCode configuration.

The code under debugging

Let us debug the following Python script

import os
import ptvsd
import socket
ptvsd.enable_attach("my_secret", address = ('0.0.0.0', 80))

# Enable the line of source code below only if you want the application to wait until the debugger has attached to it
#ptvsd.wait_for_attach()

ptvsd.break_into_debugger()

cwd = os.getcwd()

print("Hello world you are here %s" % cwd )
print("On machine %s" % socket.gethostname())

As explained in the introduction, the Python file must be the same on the client and on the remote machine. There is one exception though: the line ptvsd.wait_for_attach() must be executed by the remote Python process only. Indeed, it tells the Python process to pause and wait until the client is attached before continuing.

Of course, you may need to install dependencies (for example with pip) so that the script executes on the remote machine.

REMARK: it looks like, at the time of writing, versions of ptvsd newer than 3.0.0 suffer from some problems. I suggest that you force the installation of version 3.0.0; see this issue.
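For example, on the remote machine, something along these lines should pin the version (depending on your setup you may need sudo or the --user flag):

python3 -m pip install ptvsd==3.0.0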

It is important to understand that enable_attach, wait_for_attach and break_into_debugger are instructions for the remote Python process. The line ptvsd.enable_attach("my_secret", address = ('0.0.0.0', 80)) basically instructs the remote Python process to listen on all network interfaces, on port 80, for any client debugger that would like to attach. The client agent must provide the right secret (here my_secret).

The line ptvsd.break_into_debugger() is important: it is what allows you to break and navigate through the code in the client VSCode.

Putting things together

Now you are almost ready. Make sure your Python file is duplicated at the root location on both the local and remote machines. Make sure ptvsd.wait_for_attach() is uncommented and executes in the remote environment.

Now, in an SSH session on the remote machine, start the Python process with elevated privileges:
sudo python3 your_file_here.py

This should not return anything right now; it hangs, waiting for your VSCode to attach to the process.

Set a VSCode breakpoint just after ptvsd.break_into_debugger() and make sure that the selected debugging configuration in VSCode is Attach (Remote Debug). Hit F5: you should be attached and breaking in the code!

What a relief, efficient work ahead!

Breaking in VSCode

Going further

The debugging procedure described above is simplified and suffers from some flaws.

Security constraints

Here anybody can intercept your traffic; it is plain unencrypted HTTP traffic. A recommended and yet simple option to secure the communication is to use SSH port forwarding (tunnelling). It basically creates an encrypted network communication between your localhost client and the remote machine. When an SSH tunnel is set up, you talk to your local machine on a given port and the remote machine receives the calls on another port (magic, isn't it?). Therefore, the launch.json configuration should be modified so that the host value is localhost. Note also that the port in the Python code and in launch.json may no longer be the same: you now have two different ports.
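For instance, assuming your SSH user is called azureuser (a placeholder, adapt it to your own setup), the following command forwards local port 5678 to port 80 on the remote machine. launch.json would then use host localhost and port 5678, while the Python code keeps listening on port 80.

ssh -L 5678:localhost:80 azureuser@234.56.45.89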

Copying files

We pointed out that the files must be the same between the local environment and the remote machine. We advise grouping, in a single shell script, the file mirroring logic (using scp) and the execution of the Python process on the remote machine.
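For example, still with the placeholder user azureuser and the remote root declared in launch.json, the mirroring part could be as simple as:

scp your_file_here.py azureuser@234.56.45.89:/home/benoitpatra/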

Handling differences between local and remote files

We said that the files must be the same between the local environment and the remote machine, but we need at least one difference: ptvsd.wait_for_attach must execute only on the remote.
This is definitely something that can be handled in an elegant manner using an environment variable.

if os.environ.has_key("REMOTE"):
    ptvsd.break_into_debugger()
end

Of course, you now need to pass the environment variable to your remote process through SSH; see this stackexchange post to learn how to do that.
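If you do not need elevated privileges (for example if you pick a port above 1024 instead of 80), a minimal way to do it could look like the line below; passing the variable through sudo is precisely what the linked post discusses.

ssh azureuser@234.56.45.89 "REMOTE=1 python3 your_file_here.py"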

Using Analytics in Application Insights to monitor CosmosDB Requests

According to Wikipedia, DocumentDB (now CosmosDB) is

Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.

The throughput of the database is billed and measured in request units per second (RUs). Therefore, when creating an application on top of DocumentDB, this is a very important dimension that you should pay attention to and monitor carefully.

Unfortunately, at the time of writing, the Azure portal tools to measure your RU usage are very poor and not really usable. You only have access to tiny charts whose granularity cannot really be changed.

These are the only monitoring charts available in the Azure Portal

In this blog post, I show how Application Insights Analytics can be used to monitor RU consumption efficiently. This is how we now monitor our collections at Keluro.

Let us start by presenting Application Insights, which defines itself here as

an extensible Application Performance Management (APM) service for web developers on multiple platforms. Use it to monitor your live web application. It will automatically detect performance anomalies. It includes powerful analytics tools to help you diagnose issues and to understand what users actually do with your app.

Let us show how to use it in a C# application that is using the DocumentDB .NET SDK.

First, you need to install the Application Insights NuGet package. Then, you need to track the queries using a TelemetryClient object; see the sample code below.

public static async Task<FeedResponse<T>> LoggedFeedResponseAsync<T>(this IQueryable<T> queryable, string infoLog, string operationId)
{
	var docQuery = queryable.AsDocumentQuery();
	var now = DateTimeOffset.UtcNow;
	var watch = Stopwatch.StartNew();
	var feedResponse = await docQuery.ExecuteNextAsync<T>();
	watch.Stop();
	TrackQuery(now, watch.Elapsed, feedResponse.RequestCharge, "read", new TelemetryClient(), infoLog, operationId, feedResponse.ContentLocation);
	return feedResponse;
}

public static void TrackQuery(DateTimeOffset start, TimeSpan duration, double requestCharge, string kind, TelemetryClient tc, string infolog, string operationId, string contentLocation)
{
	var dependency = new DependencyTelemetry(
			"DOCDB",
			"",
			"DOCDB",
			"",
			start,
			duration,
			"0", // Result code : we can't capture 429 here anyway
			true // We assume this call is successful, otherwise an exception would be thrown before.
			);
	dependency.Metrics["request-charge"] = requestCharge;
	dependency.Properties["kind"] = kind;
	dependency.Properties["infolog"] = infolog;
	dependency.Properties["contentLocation"] = contentLocation ?? "";
	if (operationId != null)
	{
		dependency.Context.Operation.Id = operationId;
	}
	tc.TrackDependency(dependency);
}

The good news is that you can now effectively keep records of all requests made to DocumentDB. Thanks to a great component of Application Insights named Analytics, you can browse the queries and see their precise request-charges (the amount of RUs consumed).

You can also add identifiers from your calling code (with variables such as kind and infolog in the sample above) to better identify the requests. Keep in mind that the request payload is not saved by Application Insights.

In the screenshot below, you can see how to list and filter the requests tracked for DocumentDB in Application Insights Analytics, thanks to its powerful query language.

Getting all requests to DocumentDB in a timeframe using Application Insights Analytics

One problem with this approach is that, for now, using this technique with the DocumentDB .NET SDK, we do not have access to the number of retries (the 429 requests). This is an open issue on GitHub.

Finally, Analytics allows us to create a very important chart: the accumulated RUs per second over a specific time range.
The query looks like the following.

dependencies
| where timestamp > ago(10h)
| where type == "DOCDB"
| extend requestCharge = todouble(customMeasurements["request-charge"])
| extend docdbkind = customDimensions["kind"]
| extend infolog = customDimensions["infolog"]
| order by timestamp desc
| project  timestamp, target, data, resultCode , duration, customDimensions, requestCharge, infolog, docdbkind , operation_Id 
| summarize sum(requestCharge) by bin(timestamp, 1s)
| render timechart 

And the rendered chart is as follows:

Accumulated Request-Charge per second (RUs)

PowerShell srcset image generator

If you have a website and SEO matters to you, then you have probably had to optimize images. To this end, you may want to have responsive images. As explained here,

a responsive image is an image which is displayed in its best form on a web page, depending on the device your website is being viewed from.

One of the modern ways to serve responsive images quickly is to use the srcset HTML attribute. In short, depending on its parameters and your viewport (i.e. browser window), the srcset attribute tells the browser to download the most appropriate image for the current display.

For example, if you put the following HTML element

<img src="images/fcnantes-champions-95.jpg"
srcset="images/fcnantes-champions-95.jpg 200w, images/fcnantes-champions-95-400.jpg 400w,
images/fcnantes-champions-95-600.jpg 600w,
images/fcnantes-champions-95-800.jpg 800w">

Your server can serve up to four different images representing the same picture.

You may guess that creating all these different resized pictures manually can be painful. In this blog post we propose the following PowerShell script to help you automate this task.

Param ( [Parameter(Mandatory=$True)] [ValidateNotNull()] $imageSource, [Parameter(Mandatory=$true)][ValidateNotNull()] $quality )

if (!(Test-Path $imageSource)){throw( "Cannot find the source image")}
if ($quality -lt 0 -or $quality -gt 100){throw( "quality must be between 0 and 100.")}

[void][System.Reflection.Assembly]::LoadWithPartialName("System.Drawing")
$resolvedPath = Join-Path $PWD -ChildPath $imageSource
$bmp = [System.Drawing.Image]::FromFile($resolvedPath)

#hardcoded canvas size...
$canvasWidths = @(200, 400, 600, 800)

foreach($canvasWidth in $canvasWidths){
    #Encoder parameter for image quality
    $myEncoder = [System.Drawing.Imaging.Encoder]::Quality
    $encoderParams = New-Object System.Drawing.Imaging.EncoderParameters(1)
    $encoderParams.Param[0] = New-Object System.Drawing.Imaging.EncoderParameter($myEncoder, $quality)
    # get codec
    $myImageCodecInfo = [System.Drawing.Imaging.ImageCodecInfo]::GetImageEncoders()|where {$_.MimeType -eq 'image/jpeg'}

    #compute the final ratio to use
    $ratioX = $canvasWidth / $bmp.Width;
    $ratioY = $canvasWidth / $bmp.Height;
    $ratio = $ratioY
    if($ratioX -le $ratioY){
        $ratio = $ratioX
    }

    #create resized bitmap
    $newWidth = [int] ($bmp.Width*$ratio)
    $newHeight = [int] ($bmp.Height*$ratio)
    $bmpResized = New-Object System.Drawing.Bitmap($newWidth, $newHeight)
    $graph = [System.Drawing.Graphics]::FromImage($bmpResized)

    $graph.Clear([System.Drawing.Color]::White)
    $graph.DrawImage($bmp,0,0 , $newWidth, $newHeight)

    $targetFileName = [System.IO.Path]::GetFileNameWithoutExtension($imageSource) + "-" + $canvasWidth + ".jpg"
    $dir = [System.IO.Path]::GetDirectoryName($resolvedPath)
    $targetFilePath = Join-Path $dir -ChildPath $targetFileName
    Write-Host "Saving file" $targetFilePath
    #save to file
    $bmpResized.Save($targetFilePath,$myImageCodecInfo, $($encoderParams))
    $graph.Dispose()
    $bmpResized.Dispose()
}
$bmp.Dispose()

Now you can simply invoke the script like this: .\SrcsetBuilder.ps1 "..\images\MyImage.jpg" 85. All the generated images (MyImage-200.jpg, MyImage-400.jpg, MyImage-600.jpg, MyImage-800.jpg) are then located next to MyImage.jpg.
You can modify the widths of the generated images by changing the values in the array $canvasWidths (line 11).

Debugging locally REST API webhooks with Visual Studio

Modern REST APIs such as the Outlook REST API, Microsoft Graph or the Facebook Graph API expose very powerful capabilities called webhooks, which enable push notifications. After subscribing, when something changes, these APIs send notifications to your service by calling the URL you provided. For example, with the Outlook REST API, the push notification service sends a request when something has been modified in the user mailbox, such as a mail being received or an email being marked as read.

I am not going to explain how you register subscriptions to a particular webhook. In this blog post, we provide a solution for "breaking" with your Visual Studio debugger in the callback of a webhook you subscribed to. The approach is not Windows/.NET specific; the mechanism exposed here is actually generic, but these are the tools I am using at the moment, so they will serve as the example in this post.

Problem: when you subscribe to a webhook, you specify your notification URL (see the Outlook REST API example). This URL must use HTTPS and be visible from the 'outside' internet. Therefore, you cannot set a URL such as https://localhost:44301/api/MyNotificationCallBack, where https://localhost:44301 is the URL of your local development website. However, that would be convenient in order to 'break' directly in the server-side code responsible for handling the request. In addition, if you are using Visual Studio and IIS Express for development, you cannot simply expose a website with a custom domain and SSL to the outside internet.

Solution: take a (sub)domain name you own (e.g. superdebug.keluro.com) and create an A record pointing to your public IP. If you are on a home network, this IP is the one of your ISP box. Configure this box to redirect incoming traffic for superdebug.keluro.com on port 443 to your personal developer machine (still on port 443). On your machine, configure an IIS web server with a binding for https://superdebug.keluro.com on port 443 that acts as a reverse proxy and redirects incoming traffic to your IIS Express local development server (e.g. https://localhost:44301). Finally, set a valid SSL certificate on the reverse proxy IIS server for superdebug.keluro.com. Now you can use https://superdebug.keluro.com/api/MyNotificationCallBack as the notification URL, and the routing logic will redirect incoming push notification requests to https://localhost:44301/api/MyNotificationCallBack, where you can debug locally.

Debug locally IIS Express website visible from the outside internet

Pitfalls:

  • Unfortunately, in the case of a home network, I cannot give precise instructions on how to configure your ISP box to reroute incoming traffic. Also make sure that the box IP is static and does not change.
  • Take care of your own firewall rules: make sure that port 443 is open for both inbound and outbound traffic.
  • In IIS Application Request Routing (ARR), the module that may be used to create the reverse proxy, an option is enabled by default that rewrites the 'Location' response headers. It may break your application, which probably uses an OAuth flow. See this Stack Overflow answer.
  • If you have never set up IIS to work as a reverse proxy, that is quite simple now with the ARR or URL Rewrite modules. In this previous blog post we explained how to set up a reverse proxy with IIS.

Programming well structured Javascript stored procedures for DocumentDB with Typescript and SystemJs

EDIT: code sample on github https://github.com/bpatra/StoredProcedureGeneration

If you are using DocumentDB, you may have had to write your own stored procedures. A stored procedure is a function written in JavaScript that runs on the DocumentDB cloud infrastructure. It may reduce performance problems or let you execute queries that are not yet supported through the REST API, such as aggregate functions.

A stored procedure should be registered as a single function of the form:

function myStoredProcedure(arg1, arg2){ /* you can place all the arguments you want here */
    //The body of the function here

    //you set the response like this
    var context = getContext();
    context.getResponse().setBody(myResult);
}

You can create functions inside the myStoredProcedure function body. However, you cannot create other functions in the file, otherwise DocumentDB will complain. This is quite annoying because you probably want to create independent and reusable pieces of code, for testability or simply for the sake of readability. The problem comes from the fact that you cannot really use out-of-the-box third-party module management libraries such as RequireJS, CommonJS or SystemJS. This sounds bad. Does that mean we are forced to inline all our code in one big, non-modular and untestable JavaScript file?!

The answer is no. In this blog post I will show you the solution we implemented at Keluro to overcome this problem. This solution is based on SystemJs and SystemJs-Builder: it creates a standalone function where all module/class dependencies are embedded in the single stored procedure function, which acts as our entry point. The code snippets presented below are extracted from the git repository linked above.

In this article, we will use Visual Studio as the IDE, but it is not mandatory. It is only there to simplify the option settings for compiling Typescript files; you can invoke the compiler manually with exactly the same set of options.

In the following, we take an example where the stored procedure computes the sum of its input arguments. Therefore, we create a Typescript class called Utilitary for the core of our stored procedure, containing a method called sumValues.

A typescript class used in DocumentDB stored procedure
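The exact implementation is in the repository linked above; a minimal sketch of what such a class could look like (the static modifier and the exact signature are assumptions of mine) is:

// Utilitary.ts -- sketch only, the real class lives in the linked repository
export class Utilitary {
    // Sums the numeric values received by the stored procedure
    public static sumValues(values: number[]): number {
        let total = 0;
        for (const value of values) {
            total += value;
        }
        return total;
    }
}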

Then we create the entry point of the stored procedure in a Typescript file that uses the current DocumentDB context. We benefit from the typings in documentdb-server.d.ts, which defines the IContext interface.

MyStoredProcedure typescript file executed by DocumentDB
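Again, the real file is in the repository; assuming getContext and IContext are declared globally by documentdb-server.d.ts, the idea is roughly:

// myStoredProcedure.ts -- sketch only
import {Utilitary} from "./Utilitary";

function myStoredProcedure(...values: number[]) {
    // getContext() is provided by the DocumentDB server-side runtime
    const context: IContext = getContext();
    const result = Utilitary.sumValues(values);
    // return the computed sum to the caller of the stored procedure
    context.getResponse().setBody(result);
}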

We compile the Typescript files as SystemJs modules and redirect all generated JavaScript to the out-js directory. As I said, there is no need for Visual Studio to do this; you can achieve the same result by invoking the Typescript compiler with similar options.

Compiling options set in Visual Studio

If you look at the generated JavaScript, you will get something that looks like

System.register(["Utilitary"], function(exports_1, context_1) {
    "use strict";
    var __moduleName = context_1 && context_1.id;
    
     /* ETC */
});
//# sourceMappingURL=myStoredProcedure.js.map

If you try to run such JavaScript source code directly with DocumentDB, you will get an error. These modules are meant to run with System.js, and in the case of a stored procedure you cannot reference any third-party library such as System.js. This is where SystemJs-Builder comes into play: it packages everything so that your modules are independent of System.js and can be used as a standalone file.

To invoke SystemJs-Builder you need Node.js. In the sample git repository, run 'npm install' to restore the SystemJs-Builder Node package. Once the Typescript files are compiled, invoke the Node script "documentdb_storedprocedure_builder.js" at the root of the repository. This script basically performs two tasks. First, it generates a standalone JavaScript file with SystemJs-Builder. Second, because DocumentDB explicitly needs a file with one function and not an executable JavaScript script, it wraps this code inside a function that acts as our entry point for the stored procedure.

We also retrieve the arguments passed to the stored procedure function in a variable called storedProcedureArgs that is kept in the global namespace. The resulting file is generated and put in the generated-procedure directory. Finally, this file contains what we needed: a standalone JavaScript function executable by DocumentDB. With this approach, all Typescript classes can be reused for other stored procedures, for unit tests, or anywhere in your Typescript code base.
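To illustrate the idea only (the actual template is the one produced by documentdb_storedprocedure_builder.js in the repository), the wrapped result is conceptually of the form:

// illustrative sketch, not the exact output of the builder script
function myStoredProcedure() {
    // deliberately left global so the bundled modules can read the original arguments
    storedProcedureArgs = arguments;

    // ... the standalone, self-executing bundle produced by SystemJs-Builder is inlined here;
    // it registers the modules above (Utilitary, myStoredProcedure) and runs the entry point ...
}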

There are probably thousands of alternatives for creating testable and decoupled stored procedures. We liked this approach because it reuses the same tools that we were already using for our single-page applications: Typescript and SystemJs. To conclude, let me thank Olivier Guimbal, who showed us some months ago how well Typescript, SystemJs and SystemJs-Builder work together.