Debugging REST API webhooks locally with Visual Studio

Modern REST APIs such as the Outlook REST API, Microsoft Graph or the Facebook Graph API expose a very powerful capability called webhooks, which enables push notifications. Once you have subscribed, when something changes, these APIs notify your service by calling the URL you provided. For example, with the Outlook REST API the push notification service sends a request whenever something has been modified in the user's mailbox, such as a mail being received or an email being marked as read.

I am not going to explain how you register subscriptions to a particular webhook. In this blog post, we provide a solution that lets you “break” with the Visual Studio debugger inside the callback of a webhook you subscribed to. The approach is not Windows/.NET specific; the mechanism exposed here is actually generic, but these are the tools I am using at the moment, so they will serve as the example in this post.

Problem: when you subscribe to a webhook you specify your notification URL (see the Outlook REST API example). This URL must use HTTPS and be reachable from the ‘outside’ internet. Therefore, you cannot set a URL such as https://localhost:44301/api/MyNotificationCallBack, where https://localhost:44301 is the URL of your local development website, even though that would be convenient for ‘breaking’ directly in the server-side code responsible for handling the request. In addition, if you are using Visual Studio and IIS Express for development, you cannot simply expose a website with a custom domain and SSL to the outside internet.

Solution: take a (sub)domain name you own (e.g. superdebug.keluro.com) and create an A record pointing to your public IP. If you are on a home network, this is the IP of your ISP box. Configure this box to redirect incoming traffic for superdebug.keluro.com on port 443 to your development machine (still on port 443). On this machine, configure an IIS web server with a binding for https://superdebug.keluro.com on port 443 that acts as a reverse proxy and redirects incoming traffic to your IIS Express local development server (e.g. https://localhost:44301). Finally, install a valid SSL certificate for superdebug.keluro.com on the reverse proxy IIS server. You can now use https://superdebug.keluro.com/api/MyNotificationCallBack as the notification URL, and the routing logic will redirect incoming push notification requests to https://localhost:44301/api/MyNotificationCallBack, where you can debug locally.

Debug locally an IIS Express website visible from the outside internet
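
The notification callback itself is then just a regular Web API action in which you can set a breakpoint. Below is a minimal, hypothetical sketch: the controller name, the route and the ‘validationtoken’ handshake parameter are assumptions based on the Outlook REST API example, so check the documentation of the API you subscribe to for the exact contract.

using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Web.Http;

public class MyNotificationCallBackController : ApiController
{
    // POST https://superdebug.keluro.com/api/MyNotificationCallBack, proxied to https://localhost:44301/...
    [HttpPost]
    [Route("api/MyNotificationCallBack")]
    public async Task<HttpResponseMessage> Post([FromUri(Name = "validationtoken")] string validationToken = null)
    {
        // Subscription validation handshake: the token must be echoed back as plain text.
        if (!string.IsNullOrEmpty(validationToken))
        {
            var validationResponse = Request.CreateResponse(HttpStatusCode.OK);
            validationResponse.Content = new StringContent(validationToken, Encoding.UTF8, "text/plain");
            return validationResponse;
        }

        // Actual push notification: put a breakpoint here and inspect the JSON payload.
        string payload = await Request.Content.ReadAsStringAsync();
        Trace.TraceInformation("Push notification received: {0}", payload);
        return Request.CreateResponse(HttpStatusCode.OK);
    }
}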

Pitfalls:

  • Unfortunately, in the case of a home network, I cannot give precise instructions on how to configure your ISP box to reroute incoming traffic. Also make sure that the box's public IP is static and does not change.
  • Take care of your own firewall rules: make sure that port 443 is open for both inbound and outbound traffic.
  • In IIS Application Request Routing (ARR), the module you may use to create the reverse proxy, an option enabled by default rewrites the ‘Location’ header of responses. It may break your application, which probably relies on an OAuth flow. See this stackoverflow response.
  • If you have never set up IIS to work as a reverse proxy, it is quite simple now with the ARR and URL Rewrite modules. In this previous blog post we explained how to set up a reverse proxy with IIS.

Programming well-structured JavaScript stored procedures for DocumentDB with TypeScript and SystemJS

EDIT: a code sample is available on GitHub: https://github.com/bpatra/StoredProcedureGeneration

If you are using DocumentDB you may have had to write your own stored procedures. A stored procedure is a function written in JavaScript that runs on the DocumentDB cloud infrastructure. It may reduce performance problems or let you execute queries that are not yet supported through the REST API, such as aggregate functions.

A stored procedure should be registered as a single function of the form:

function myStoredProcedure(arg1, arg2){ /* you can place all the arguments you want here */
    // The body of the function goes here.

    // You set the response like this:
    var context = getContext();
    context.getResponse().setBody(myResult);
}

You can create functions inside the myStoredProcedure function body. However, you cannot create other functions at the top level of the file, otherwise DocumentDB will complain. This is quite annoying, because you probably want to create independent and reusable pieces of code for testability, or simply for the sake of readability. The problem comes from the fact that you cannot really use out-of-the-box third-party module management libraries such as RequireJS, CommonJS or SystemJS. This does not sound good. Does that mean we are forced to inline all our code in one big, non-modular and untestable JavaScript file?!

The answer is no. In this blog post I will show you the solution we implemented at Keluro to overcome this problem. It is based on SystemJS and SystemJS Builder: all module/class dependencies are embedded in the single stored procedure function, which acts as our entry point. The code snippets presented in the following are extracted from the git repository mentioned in the edit above.

In this article we will use Visual Studio as the IDE, but it is not mandatory: it only simplifies setting the options for compiling the TypeScript files, and you can invoke the compiler manually with exactly the same set of options.

In the following, we will take an example where the stored procedure computes the sum of its input arguments. Therefore, we create a TypeScript class called Utilitary for the core of our stored procedure, which contains a method called sumValues.

A TypeScript class used in the DocumentDB stored procedure

Then we create the entry point of the stored procedure in a TypeScript file that uses the current DocumentDB context. We benefit from the typings in documentdb-server.d.ts, which defines the IContext interface.

The MyStoredProcedure TypeScript file executed by DocumentDB

We compile the TypeScript files as SystemJS modules and redirect all the generated JavaScript to the out-js directory. As I said, there is no need for Visual Studio to do this: you can achieve the same result by invoking the TypeScript compiler with similar options.

Compiling options set in Visual Studio

If you look at the generated JavaScript you will get something that looks like:

System.register(["Utilitary"], function(exports_1, context_1) {
    "use strict";
    var __moduleName = context_1 && context_1.id;
    
     /* ETC */
});
//# sourceMappingURL=myStoredProcedure.js.map

If you try to run such JavaScript source code directly in DocumentDB you will get an error: these modules are meant to be run with SystemJS, and in a stored procedure you cannot reference any third-party library such as SystemJS. This is where SystemJS Builder comes into play: it packages everything so that your modules no longer depend on SystemJS and can be used as a standalone file.

To invoke SystemJS Builder you need Node.js. In the sample git repository, run ‘npm install’ to restore the SystemJS Builder Node package. Once the TypeScript files are compiled, invoke the node script “documentdb_storedprocedure_builder.js” at the root of the repository. This script basically performs two tasks. First, it generates a standalone JavaScript file with SystemJS Builder. Secondly, because DocumentDB explicitly needs a file containing a single function and not an executable JavaScript script, it wraps this code inside a function that acts as our entry point for the stored procedure.

We also retrieve the arguments passed to the stored procedure function in a variable called storedProcedureArgs that is kept in the global namespace. The resulting file is generated and put in the generated-procedure directory. Finally, this file contains what we needed: a standalone JavaScript function that DocumentDB can execute. With this approach all TypeScript classes can be reused for other stored procedures, for unit tests or anywhere else in your TypeScript code base.
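
As an illustration of how the generated file can then be used, here is a sketch that registers and executes it with the DocumentDB .NET SDK; the database, collection and file names are made up and this workflow is not part of the sample repository.

using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class StoredProcedureDeployer
{
    public static async Task<int> DeployAndRunAsync(DocumentClient client)
    {
        // Register the generated standalone function as a stored procedure.
        var collectionUri = UriFactory.CreateDocumentCollectionUri("MyDatabase", "MyCollection");
        var storedProcedure = new StoredProcedure
        {
            Id = "myStoredProcedure",
            Body = File.ReadAllText(@"generated-procedure\myStoredProcedure.js")
        };
        await client.CreateStoredProcedureAsync(collectionUri, storedProcedure);

        // Execute it: the arguments below end up in the storedProcedureArgs variable mentioned above.
        var sprocUri = UriFactory.CreateStoredProcedureUri("MyDatabase", "MyCollection", "myStoredProcedure");
        StoredProcedureResponse<int> response = await client.ExecuteStoredProcedureAsync<int>(sprocUri, 1, 2, 3);
        return response.Response; // e.g. 6 for the sumValues example
    }
}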

There are probably thousands of alternatives for creating testable and decoupled stored procedures. We liked this approach because it reuses the same tools that we were already using for our single page applications: TypeScript and SystemJS. To conclude, let me thank Olivier Guimbal, who showed us some months ago how well TypeScript, SystemJS and SystemJS Builder work together.

My TransientFaultHandling utility classes for DocumentDB

Keluro uses DocumentDB extensively for data persistence. However, its impressive scaling capabilities come at a price. Indeed, with your queries or your commands you may exceed the amount of request units you are granted. In that case you will receive a 429 error “Request rate too large”, or a DocumentClientException if you use the .NET SDK. It is then your responsibility to implement retry policies that avoid such failures by waiting the proper amount of time before retrying.

Edit: see the comment below. Release v1.8.0 of the .NET SDK offers some settings for these retry policies.

Microsoft provides some samples on how to handle this 429 “Request rate too large” error, but they only concern commands, such as inserting or deleting a document; there is no sample showing how to implement retry policies for common queries. A NuGet package is also available, “Microsoft.Azure.Documents.Client.TransientFaultHandling”, but even if integrating it is as quick as an eye blink, it has no logging capabilities. In my case it did not really solve my exceeded-RU problem, I even doubt that I got it to work, and the code is not open source. So I decided to integrate the ideas from the samples into my own utility classes on top of the DocumentDB .NET SDK.

The idea is similar to the “TransientFaultHandling” package: wrap the DocumentClient inside another class exposed only through an interface. By all accounts, it is a good thing to abstract the DocumentClient behind an interface for testability purposes. In our case this interface is named IDocumentClientWrapped.

public interface IDocumentClientWrapped : IDisposable
{
    /* Some command like methods*/
    Task DeleteDocumentAsync(Uri uri);

    Task CreateDocumentAsync(Uri uri, object obj);

    /* Some query like methods*/
    IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentUri, FeedOptions feedOptions = null);

    IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentCollectionUri, SqlQuerySpec sqlSpec);

    /* Copy here all other public method signature from DocumentClient but return IRetryQueryable instead of IOrderedQueryable or IQueryable */
}

Instead of returning an IQueryable<T> instance as DocumentClient would do, we return an IRetryQueryable<T>. This latter type, whose definition follows, is also a wrapper over the IQueryable<T> instance returned by the DocumentDB client. However, this wrapper explicitly retries when the enumeration fails because of a 429 “Request rate too large” exception raised by the database engine, DocumentDB in our case.

public interface IRetryQueryable<T>
{
    IRetryQueryable<T> Where(Expression<Func<T, bool>> predicate);

    IDocumentQuery<T> AsDocumentQuery();

    IEnumerable<T> AsRetryEnumerable();

    IRetryQueryable<TResult> SelectMany<TResult>(Expression<Func<T, IEnumerable<TResult>>> predicate);

    IRetryQueryable<TResult> Select<TResult>(Expression<Func<T, TResult>> predicate);
}

In this interface we only expose the extension methods that are actually supported by the “real” IQueryable<T> instance returned by DocumentDB: Select, SelectMany, Where, etc. For example, at the time of writing GroupBy is not supported: you would get a runtime exception if you used it directly on the IQueryable<T> instance returned by DocumentClient.

Now look at how we use these interfaces in the calling code.

IDocumentClientWrapped client = /* instantiation */;
IRetryQueryable<MyType> query = client.CreateDocumentQuery<MyType>(/* get the usual uri using the UriFactory */);
IEnumerable<string> results = query.Where(c => c.Member1 == "SomeValue").Select(c => c.StringMember2).AsRetryEnumerable();
/* manipulate your in-memory enumeration as you wish */

Independently of the retry policy classes and the “Request rate too large” errors, let me emphasize that LINQ can be tricky here. Indeed, the piece of code above is completely different from this one:

IDocumentClientWrapped client = /* instantiation */;
IRetryQueryable<MyType> query = client.CreateDocumentQuery<MyType>(/* get the usual uri using the UriFactory */);
IEnumerable<string> results = query.AsRetryEnumerable().Where(c => c.Member1 == "SomeValue").Select(c => c.StringMember2);
/* manipulate your in-memory enumeration as you wish */

In this case the Where constraint is performed “in memory” by your .NET application server, which means that you have fetched all the data from DocumentDB to your app server. If MyType contains a lot of data, everything will have been transferred from DocumentDB to your application server, and if the Where constraint filters out many documents you will probably have a bottleneck.

Let us get back to our problem. Now that we have seen that getting a retry policy for a query only means calling AsRetryEnumerable() instead of AsEnumerable(), let us jump to the implementation of those classes.

The idea is to use an IEnumerator that “retries”, together with two utility methods: ExecuteWithRetry and ExecuteWithRetriesAsync. The former is for basic synchronous calls while the latter is for the async/await context. Most of this code is verbose because it is only a wrapping implementation; I hope it will be helpful for others.

public class DocumentClientWrapped : IDocumentClientWrapped
{
    private class RetriableEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _t;
        public RetriableEnumerable(IEnumerable<T> t)
        {
            _t = t;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new RetriableEnumerator<T>(ExecuteWithRetry(() => _t.GetEnumerator()));
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }

    private class RetriableEnumerator<T> : IEnumerator<T>
    {
        private IEnumerator<T> _t;
        public RetriableEnumerator(IEnumerator<T> t)
        {
            _t = t;
        }

        public T Current
        {
            get
            {
                return ExecuteWithRetry(()=> _t.Current);
            }
        }

        object IEnumerator.Current
        {
            get
            {
                return ExecuteWithRetry(() => _t.Current);
            }
        }

        public void Dispose()
        {
            _t.Dispose();
        }

        public bool MoveNext()
        {
            return ExecuteWithRetry(() => _t.MoveNext());
        }

        public void Reset()
        {
            _t.Reset();
        }
    }

    private class RetryQueryable<T> : IRetryQueryable<T>
    {
        private readonly IQueryable<T> _queryable;
        public RetryQueryable(IQueryable<T> queryable)
        {
            _queryable = queryable;
        }

        public IDocumentQuery<T> AsDocumentQuery()
        {
            return this._queryable.AsDocumentQuery();
        }

        public IEnumerable<T> AsRetryEnumerable()
        {
            return new RetriableEnumerable<T>(this._queryable.AsEnumerable());
        }

        public IRetryQueryable<T> Where(Expression<Func<T, bool>> predicate)
        {
            var queryable = this._queryable.Where(predicate);
            return new RetryQueryable<T>(queryable);
        }

        public IRetryQueryable<TResult> SelectMany<TResult>(Expression<Func<T, IEnumerable<TResult>>> predicate)
        {
            var queryable = this._queryable.SelectMany(predicate);
            return new RetryQueryable<TResult>(queryable);
        }

        public IRetryQueryable<TResult> Select<TResult>(Expression<Func<T, TResult>> predicate)
        {
            var queryable = this._queryable.Select(predicate);
            return new RetryQueryable<TResult>(queryable);
        }
    }


    private const int MaxRetryCount = 10;

    private static async Task<V> ExecuteWithRetriesAsync<V>(Func<Task<V>> function)
    {
        TimeSpan sleepTime = TimeSpan.FromSeconds(1.0);
        int count = 0;
        while (true)
        {

            try
            {
                return await function();
            }
            catch (DocumentClientException de)
            {
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                Trace.TraceInformation("DocumentDB async retry count: {0} - retry in: {1} - RUs: {2}", ++count, sleepTime.TotalMilliseconds, de.RequestCharge);
                sleepTime = de.RetryAfter;
            }
            catch (AggregateException ae)
            {
                if (!(ae.InnerException is DocumentClientException))
                {
                    throw;
                }

                DocumentClientException de = (DocumentClientException)ae.InnerException;
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                Trace.TraceInformation("DocumentDB async retry count: {0} - retry in: {1} - RUs: {2}", ++count, sleepTime.TotalMilliseconds, de.RequestCharge);
                sleepTime = de.RetryAfter;
            }
            await Task.Delay(sleepTime);
        }
    }

    private static V ExecuteWithRetry<V>(Func<V> function)
    {
        TimeSpan sleepTime = TimeSpan.FromSeconds(1.0);
        int count = 0;
        while (true)
        {
            try
            {
                return function();
            }
            catch (DocumentClientException de)
            {
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                Trace.TraceInformation("DocumentDB sync retry count: {0} - retry in: {1} - RUs: {2}", ++count, sleepTime.TotalMilliseconds, de.RequestCharge);
                sleepTime = de.RetryAfter;
            }
            catch (AggregateException ae)
            {
                if (!(ae.InnerException is DocumentClientException))
                {
                    throw;
                }

                DocumentClientException de = (DocumentClientException)ae.InnerException;
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                Trace.TraceInformation("DocumentDB sync retry count: {0} - retry in: {1} - RUs: {2}", ++count, sleepTime.TotalMilliseconds, de.RequestCharge);
                sleepTime = de.RetryAfter;
            }
            Thread.Sleep(sleepTime);
        }
    }

    private readonly DocumentClient _client;
    public DocumentClientWrapped(string endpointUrl, string authorizationKey)
    {
        _client = new DocumentClient(new Uri(endpointUrl), authorizationKey);
    }

    public async Task CreateDocumentAsync(Uri documentCollectionUri, object obj)
    {
        ResourceResponse<Document> response = await ExecuteWithRetriesAsync(()=> _client.CreateDocumentAsync(documentCollectionUri, obj));
        LogResponse<Document>("CreateDocumentAsync", response);/* Do something with the response if you want to use it */
    }

    public async Task DeleteDocumentAsync(Uri documentUri)
    {
        ResourceResponse<Document> response = await ExecuteWithRetriesAsync( () => _client.DeleteDocumentAsync(documentUri));
        LogResponse<Document>("DeleteDocumentAsync", response); /* Do something with the response if you want to use it */
    }

    public async Task ReplaceDocumentAsync(Uri documentCollectionUri, object document)
    {
        var response = await ExecuteWithRetriesAsync(()=>  _client.ReplaceDocumentAsync(documentCollectionUri, document));
        LogResponse<Document>("ReplaceDocumentAsync", response);
    }

    public IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentCollectionUri, SqlQuerySpec sqlSpec)
    {
        var queryable = _client.CreateDocumentQuery<T>(documentCollectionUri, sqlSpec);
        return new RetryQueryable<T>(queryable);
    }
}
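
The MaxRetryException and LogResponse helpers referenced above are not part of the DocumentDB SDK, they are small pieces of our own code. Here is a minimal sketch of what they can look like; the signatures are simply the ones assumed by the wrapper above, so adapt them to your needs.

using System;
using System.Diagnostics;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Thrown when a request is still throttled after MaxRetryCount attempts.
public class MaxRetryException : Exception
{
    public MaxRetryException(DocumentClientException inner, int retryCount)
        : base(string.Format("Request still throttled after {0} retries.", retryCount), inner)
    {
        RetryCount = retryCount;
    }

    public int RetryCount { get; private set; }
}

// Inside DocumentClientWrapped: trace the request charge of each command.
private static void LogResponse<T>(string operation, ResourceResponse<T> response) where T : Resource, new()
{
    Trace.TraceInformation("{0} - RUs consumed: {1}", operation, response.RequestCharge);
}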

Hosting a Jekyll website on an Azure Web App using TeamCity automated deployment

The public website of my company Keluro is built with Jekyll. If you are a Jekyll user you are probably aware of GitHub Pages. To me, the best feature of GitHub Pages is the automated deployment of your website when pushing to a specific branch (namely ‘gh-pages’). We hosted the website on GitHub Pages for almost a year. However, we suffered many inconveniences with it:

  • HTTPS is not supported by GitHub Pages. As Google announced, HTTP over SSL is now a bonus in terms of ranking. At Keluro we do have a wildcard SSL certificate; it is a shame that we could not use it for our public corporate website!
  • We could not tune the server caching configuration (ETag, Cache-Control, etc.), resulting in poor results from Google PageSpeed Insights.
  • With GitHub Pages you cannot use custom or advanced gems. Gems are Ruby extensions, Ruby being the technology Jekyll is based on. I have already blogged about our solution to support a multi-language website. Even if it works, I am more and more convinced that a Jekyll gem would do a better job…
  • We had problems with Facebook scraping our Open Graph meta tags and there was nothing we could do about it: see the issue here.
  • You do not control the version of Jekyll run by GitHub Pages. We found out that some breaking changes were introduced when migrating from Jekyll 2 to Jekyll 3, and we do not want our website to go down because of a silent release of a new Jekyll revision.
  • At Keluro, due to our business model, we are Windows users. However, the servers running GitHub Pages are Linux servers, so you face the technicalities that come with switching between Linux and Windows: CRLF vs LF, etc. Even if there are solutions such as the .gitattributes file, these are extra technicalities that our non-tech teammates working on the website are not aware of, and we do not want them to spend time on this.
  • We use a project page rather than a personal page, and it was complicated to configure DNS names, etc. There are always exceptions when it comes to project pages with GitHub Pages.

For all these reasons I wanted to quit GitHub Pages. As the CTO of Keluro, I did not have a lot of time to investigate all the alternatives and wanted a simple solution. We are already a BizSpark member, our web apps, APIs, VMs, etc. are hosted on Azure, and we are familiar with configuring IIS servers. Consequently, hosting the website on Azure alongside all our other resources was a reasonable solution. On top of that, we already had an automated deployment solution: our continuous integration server, TeamCity, which is also hosted on Azure.

The solution proposed here is therefore very simple. Similarly to the automated deployment provided by GitHub Pages, TeamCity watches for changes on a given branch of the Git repository. Then, the website is built by Jekyll on the integration server virtual machine. Finally, it is synchronized with the Azure Web App over FTP using the WinSCP library. In the end, our static HTML pages are served by the Azure Web App.

Once the TeamCity build configuration is created, you just have to write two PowerShell build steps (see the code below). You can also use two Psake build targets and chain them, as I wrote here.

The prerequisite is to install Jekyll on the Windows Server VM, with jekyll.exe available on the $PATH environment variable. You can add WinSCP.exe and its .dll in a folder within your source code under the ‘ignored\build’ location. Make sure that ‘ignored’ is indeed a folder ignored by Jekyll (you do not want Jekyll to output these files to your deployed _site folder).

In the TeamCity build configuration you can set up environment variables that will be consumed by the PowerShell scripts ($env:VARNAME). It is an acceptable solution to avoid hardcoding passwords, location paths, etc. in the sources. For example, the variable $env:RepoDir is set to %system.teamcity.build.checkoutDir%. You use such environment variables to store the FTP settings. To recover the FTP settings of an Azure Web App, see this stackoverflow question.

REMARK: we did not manage to redirect the WinSCP sync log output in real time to TeamCity; we log the results once the syncing is completed. If someone has a solution we would be glad to hear it. We tried the WinSCP PowerShell cmdlets, but they seemed heavily bugged at the time of writing.

Build the website with Jekyll

$repoDir = $env:RepoDir
cd $repoDir # change location and go to the repository
# Exec comes from the Psake module: it throws if the invoked command exits with a non-zero code.
Exec { Invoke-Expression "& jekyll build" } # invoke Jekyll from the PowerShell command line

Sync the website with WinSCP

$hostName = $env:FtpHostName
$repoDir = $env:RepoDir
$userName = $env:FtpUserName
$pwd = $env:FtpUserPwd

$sitePath = Join-Path $repoDir -ChildPath "_site"
$windscpPath = Join-Path $repoDir -ChildPath "ignored\build\WinSCP"

$dllPath = Join-Path $windscpPath -ChildPath "WinSCPnet.dll"
$exePath = Join-Path $windscpPath -ChildPath "WinSCP.exe"
if(!(Test-Path $dllPath)){
    Write-Error "No dll path found! $dllPath"
}
if(!(Test-Path $exePath)){
    Write-Error "No exe path found! $exePath"
}

Add-Type -Path $dllPath

$sessionOptions = New-Object WinSCP.SessionOptions
$sessionOptions.Protocol = [WinSCP.Protocol]::Ftp
$sessionOptions.HostName = $hostName
$sessionOptions.UserName = $userName
$sessionOptions.Password = $pwd
$sessionOptions.FtpSecure = [WinSCP.FtpSecure]::Explicit

$session = New-Object WinSCP.Session
$session.ExecutablePath = $exePath
$logPath = Join-Path $repoDir -ChildPath "winscp.log" # session log location, adjust or remove as needed
$session.SessionLogPath = $logPath

try
{
    $session.Open($sessionOptions)

    Write-Host "Start syncing..."

    $transferResult = $session.SynchronizeDirectories([WinSCP.SynchronizationMode]::Remote, $sitePath, "/site/wwwroot", $True, $False,[WinSCP.SynchronizationCriteria]::Size)

    $transferResult.Check()

    foreach ($transfer in $transferResult.Downloads)
    {
        Write-Host ("Download of {0} succeeded" -f $transfer.FileName)
    }
    foreach ($transfer in $transferResult.Uploads)
    {
        Write-Host ("Upload of {0} succeeded" -f $transfer.FileName)
    }
    foreach ($transfer in $transferResult.Removals)
    {
        Write-Host ("Removal of {0} succeeded" -f $transfer.FileName)
    }
    Write-Host "... finish syncing."
}
finally
{
    # Disconnect, clean up
    $session.Dispose()
}

Rigorous management of Entity Framework migrations adapted to multiple app deployments

This post follows a question that I asked on stackoverflow several months ago. I did not receive a satisfactory answer at that time, so I will present the solution that we found at Keluro for this problem.

At Keluro, our client app products (for example the VSTO add-ins KMailAssistant and KBilling, and the SPA web apps) communicate with a REST web API. This web API uses Entity Framework 6.0 on top of SQL Server for persistence. However, our web API deployments are not necessarily multitenant. Indeed, we have multiple clients who do not want to share their infrastructure: they demand an isolated deployment, mainly for security and confidentiality reasons. To keep things simple, it was important for us to make sure that all of our client deployments share the same database schema, even if they went into production at different stages of the products’ development. Consequently, we have N deployments of our web API, with as many database catalogs. We also want all our web API deployments to be up to date with respect to a stable revision of the source code. To this end, a continuous build is in charge of updating all these web APIs, and the associated databases necessarily need to be updated automatically as well. Entity Framework is able, when the web app starts, to handle the migration process if needed; the topic of this post is to propose a rigorous methodology to manage these migrations.

Entity Framework Code First supports migrations (see the documentation here). However, when reading the documentation it is not clear how the migrations should be used to answer the problem explained above. Indeed, we are told to call Update-Database from Visual Studio or to enable automatic migrations. Let us explain how to use the set of features proposed by Entity Framework to handle multiple deployments in a clean and rigorous way. Note also that this approach works well for local databases that are deployed with your ‘rich client’ application, for example with SQL Server CE, which we use for our VSTO add-ins KMailAssistant and KBilling.

TL;DR: Determine a “stable production schema”, which is the database schema corresponding to the web app code on a stable branch/tag in your source control. Avoid so-called automatic migrations and always create new code based migrations using Add-Migration against an empty database that has been brought to the “stable production schema” by applying all existing migrations. Do not use the Update-Database command to update your databases in production; let the framework do it for you at startup using the MigrateDatabaseToLatestVersion initializer. Then, when you release a new environment, start the application with an empty database. You will also have to take care with version control when working with feature branches.

Generating clean code based migrations

In the following, I will assume that you have read the Entity Framework documentation. It is very important that you decide what the “stable database schema” is. It corresponds to the schema determined by the source code (remember that we use EF Code First) for the selected revision of your stable branch or tag. We advise you to avoid automatic migrations. Actually, automatic migrations do not mean that the created migrations will be applied automatically (we will discuss how to do that later); they mean that the needed migration, the piece of SQL that brings the database from its actual state to what it should be, is generated and applied on the fly. This is dangerous in our situation: think of our multiple deployments, which have not all been started at the same time. Indeed, with automatic migrations some migrations could already have been generated and applied for older client environments while you have to push a new environment right now. Consequently, the history of automatically generated migrations could differ even for the same revision of the source code.
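
Concretely, disabling automatic migrations is done in the migrations Configuration class, the same Configuration type that is passed to the database initializer later in this post. A minimal sketch, where MyDbContext stands for your own context class:

using System.Data.Entity.Migrations;

internal sealed class Configuration : DbMigrationsConfiguration<MyDbContext>
{
    public Configuration()
    {
        // Only explicit, code based migrations created with Add-Migration:
        // nothing is generated or applied on the fly.
        AutomaticMigrationsEnabled = false;
    }
}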

The best solution to avoid this situation is that all deployments share the same series of code based migrations. On a stable source code revision, the succession of existing code based migrations applied to an empty database produces a database with the “stable database schema” introduced above. Then, when a new client is deployed, an empty database (no tables, no data) is created and all existing migrations are applied automatically by Entity Framework when the web app starts for the first time. For example, suppose that we have the following list of migrations: 201401050000000_MigrationA (January 5th 2014), 201503000000000_MigrationB, 201504120000000_MigrationC, 201511150000000_MigrationD. This means that if a client web app and its database are put in production in March 2015, all the migrations existing at that time will be applied (namely MigrationA and MigrationB), and the later ones will be applied when the deployment is updated.

When using code based migrations, it is important to keep in mind that the migrations are C# files that represent the SQL instructions to be applied (e.g. dropping a table, adding a column, etc.). In addition, a given database also has a table, __MigrationHistory, which keeps track of all the migrations that have been applied to it. Then, if all your web apps are up to date with respect to the same web app source code, all your databases will contain exactly the same rows in the __MigrationHistory table.

To automatically migrate the associated database to the latest migration when the application starts, you have to run this code at startup (e.g. in Global.asax.cs for an ASP.NET web app):

Database.SetInitializer(new MigrateDatabaseToLatestVersion<MyDbContext, Configuration>());
Code based list of migrations in Visual Studio
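
In an ASP.NET web app this typically ends up in Application_Start; note that the pending migrations are applied lazily, the first time the context is actually used, not on this very line. A sketch, where MyDbContext and Configuration are your own types:

using System.Data.Entity;

public class MvcApplication : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        // Pending code based migrations are applied the first time MyDbContext is used.
        Database.SetInitializer(new MigrateDatabaseToLatestVersion<MyDbContext, Configuration>());

        // ... the rest of your usual startup code (routes, filters, bundles, etc.)
    }
}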

Keep a clean list of code based migrations

Let us explain how to keep a clean list of code based migrations. I suggest creating a clean code based migration any time a database schema change is required. To do so, you will need to use the Add-Migration command in the Package Manager Console in Visual Studio. Remember that if you do not specify a database connection string in the PowerShell command, the connection string used will be the first one found in your app.config or web.config file, and this selected database may not have the proper “production schema”, which is error prone. My advice is to create, only for the generation of this new code based migration, a database with no data but with the actual “production schema”. This is extremely simple and it is also a sanity check of your existing migrations: create a new empty database, in one click in Visual Studio with SQL Server Express (see the picture below).

Create an empty database from Visual Studio

To update this database schema, take its connection string (right click on your database > Properties), then update the database by targeting the last migration, e.g. 201511150000000_MigrationD, with the following command:

Update-Database -TargetMigration MigrationD -ConnectionString "<yourConnectionString>" -ConnectionProviderName "System.Data.SqlClient" -Verbose

Now this local database is “up to date” and you can generate your new migration, named MigrationE (choose something more meaningful in your case), with the command:

Add-Migration MigrationE -ConnectionString "<yourConnectionString>" -ConnectionProviderName "System.Data.SqlClient" -Verbose

The migration files are then generated; it is recommended to read them and make sure they correspond to the changes you intended to introduce. Now they are ready to be committed in a single, clean commit. As we have seen, the migration is prefixed with a number which corresponds to its generation date (e.g. 20150501000000_MigrationE). This number is effectively used by the MigrateDatabaseToLatestVersion database initializer, and it can cause problems when not handled carefully with version control.
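
For reference, a code based migration generated by Add-Migration is just a class deriving from DbMigration with an Up and a Down method. Its content depends on your schema change; the table and column below are a purely hypothetical example:

using System.Data.Entity.Migrations;

public partial class MigrationE : DbMigration
{
    public override void Up()
    {
        // Hypothetical schema change: review it before committing.
        AddColumn("dbo.Invoices", "DueDate", c => c.DateTime(nullable: false));
    }

    public override void Down()
    {
        // The inverse operation, used when downgrading.
        DropColumn("dbo.Invoices", "DueDate");
    }
}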

Migrations and version control

There may be trouble when branching; to see this, let us explain how Entity Framework applies the code based migrations. Have a look at the __MigrationHistory table: the rows are the migrations, and the date when each migration was generated is also there, because it is included in the migration’s name. Entity Framework takes the date of the most recent migration applied in the __MigrationHistory table and applies all the migrations found in the web app code that were generated later.

The __MigrationHistory table generated by Entity Framework

Do you see the potential problem? Say that you have created two feature branches, X and Y, and that you generated a migration in each of them, first in X then in Y. If, for some reason, you merge Y into your stable branch before X, then once X is merged its migration will not be applied, because it is older than the last migration already recorded in the deployed databases!

To avoid this as much as possible, I suggest that you generate a minimum of migrations and , for each newly generated migration, put it in a dedicated commit with nothing but the code of the migration and with a clean indicator in the commit log (e.g. put a “[MIGRATION]” tag). Remark that the git rebase interactive command of git can be useful (take care when rebasing pushed commits!). For example you can remove all intermediate [MIGRATION] commits and regenerate a single one. or if you decide that a migration (not deployed!) is no longer needed you can drop the commit etc. I think it is wise thing to name a “database” master in your dev team. This person should be the one responsible for merging branches involving database migrations. He will be aware of the potential problem with migrations date generation and will know how to fix it.