Tag Archives: BigData

Using Analytics in Application Insights to monitor DocumentDB Requests

According to Wikipedia, DocumentDB is

Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.

Database throughput is billed and measured in request units per second (RUs). Therefore, when building an application on top of DocumentDB, this is a very important dimension that you should pay attention to and monitor carefully.

Unfortunately, at the time of writing, the Azure portal tools for measuring your RUs usage are quite limited: you only get tiny charts whose granularity cannot really be changed.

DocumentDB monitoring charts in Azure Portal

These are the only monitoring charts available in the Azure Portal

In this blog post, I show how Application Insights Analytics can be used to monitor the RUs consumption efficiently. This is how we monitor our collections now at Keluro.

Let us start by presenting Application Insights, which defines itself here as

an extensible Application Performance Management (APM) service for web developers on multiple platforms. Use it to monitor your live web application. It will automatically detect performance anomalies. It includes powerful analytics tools to help you diagnose issues and to understand what users actually do with your app.

Let us show how to use it in a C# application that is using the DocumentDB .NET SDK.

First you need to install the Application Insights NuGet package. Then, you need to track the queries using a TelemetryClient object; see the sample code below.

public static async Task<FeedResponse<T>> LoggedFeedResponseAsync<T>(this IQueryable<T> queryable, string infoLog, string operationId)
{
	var docQuery = queryable.AsDocumentQuery();
	var now = DateTimeOffset.UtcNow;
	var watch = Stopwatch.StartNew();
	var feedResponse = await docQuery.ExecuteNextAsync<T>();
	watch.Stop();
	TrackQuery(now, watch.Elapsed, feedResponse.RequestCharge, "read", new TelemetryClient(), infoLog, operationId, feedResponse.ContentLocation);
	return feedResponse;
}

public static void TrackQuery(DateTimeOffset start, TimeSpan duration, double requestCharge, string kind, TelemetryClient tc, string infolog, string operationId, string contentLocation)
{
	var dependency = new DependencyTelemetry(
			"DOCDB",
			"",
			"DOCDB",
			"",
			start,
			duration,
			"0", // Result code : we can't capture 429 here anyway
			true // We assume this call is successful, otherwise an exception would be thrown before.
			);
	dependency.Metrics["request-charge"] = requestCharge;
	dependency.Properties["kind"] = kind;
	dependency.Properties["infolog"] = infolog;
	dependency.Properties["contentLocation"] = contentLocation ?? "";
	if (operationId != null)
	{
		dependency.Context.Operation.Id = operationId;
	}
	tc.TrackDependency(dependency);
}

The good news is that you can now effectively keep records of all requests made to DocumentDB. Thanks to a great component of Application Insights named Analytics, you can browse the queries and see their precise request-charges (the amount of RUs consumed).

You can also add identifiers (with variables such as kind and infolog in the sample above) from your calling code to better identify the requests. Keep in mind that the request payload is not saved by Application Insights.
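
For illustration, a call site could look like the following sketch. The Email type, client, collectionUri, userId and operationId are hypothetical placeholders and are not part of the snippets above.

// Hypothetical call site: Email, client, collectionUri, userId and operationId are placeholders.
var queryable = client.CreateDocumentQuery<Email>(collectionUri, new FeedOptions { MaxItemCount = 100 })
    .Where(e => e.UserId == userId);

// The infolog string ("read-emails-by-user") and the operation id end up as custom dimensions in Analytics.
FeedResponse<Email> page = await queryable.LoggedFeedResponseAsync("read-emails-by-user", operationId);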

In the screenshot below, you can see how to list and filter the requests tracked for DocumentDB in Application Insights Analytics, thanks to its powerful query language.

Getting all requests to DocumentDB in a timeframe using Application Insights Analytics

One problem with this approach is that, for now, using this technique with the DocumentDB .NET SDK we do not have access to the number of retries (the 429 responses). This is an open issue on GitHub.

Finally, Analytics allows us to create a very important chart: the accumulated RUs per second over a given time range. The query looks like the following.

dependencies
| where timestamp > ago(10h)
| where type == "DOCDB"
| extend requestCharge = todouble(customMeasurements["request-charge"])
| extend docdbkind = customDimensions["kind"]
| extend infolog = customDimensions["infolog"]
| order by timestamp desc
| project  timestamp, target, data, resultCode , duration, customDimensions, requestCharge, infolog, docdbkind , operation_Id 
| summarize sum(requestCharge) by bin(timestamp, 1s)
| render timechart 

And the rendered chart is as follows

Accumulated Request-Charge per second (RUs)

Programming well structured Javascript stored procedures for DocumentDB with Typescript and SystemJs

EDIT: code sample on github https://github.com/bpatra/StoredProcedureGeneration

If you are using DocumentDB you may have had to write your own stored procedures. A stored procedure is a function written in JavaScript that runs on the DocumentDB cloud infrastructure. It may reduce performance problems or let you execute queries that are not yet supported through the REST API, such as aggregate functions.

A stored procedure should be registered as a single function of the form:

function myStoredProcedure(arg1, arg2){ /* you can place all the arguments you want here*/
    //The body of the function here

    //you set the response like this
    var context = getContext();
    context.setBody(myResult);
}

You can create functions inside the myStoredProcedure function body. However, you cannot declare other functions in the file, otherwise DocumentDB will complain. This is quite annoying because you probably want to create independent and reusable pieces of code, for testability or simply for the sake of readability. The problem comes from the fact that you cannot really use out-of-the-box third-party module management libraries such as RequireJS, CommonJS or SystemJS. This does not sound good. Does that mean we are forced to inline all our code in one big, non-modular and untestable JavaScript file?

The answer is no. In this blog post I will show you the solution we implemented at Keluro to overcome this problem. This solution is based on SystemJS and SystemJS-Builder: we create a standalone function where all module/class dependencies are embedded in the single stored procedure function, which acts as our entry point. The code snippets presented in the following are extracted from the git repository linked above.

In this article, we will use Visual Studio as an IDE, but it is not mandatory. It only simplifies setting the options for compiling TypeScript files; you can invoke the compiler manually with exactly the same set of options.

In the following, we will take an example where the stored procedure computes the sum of input arguments. Therefore, we create a Typescript class for the core of our stored procedure called Utilitary that contains a method called sumValues.

A typescript class used in DocumentDB stored procedure

Then we create the entry point of the stored procedure in a Typescript file that uses the DocumentDB current context. We benefit from the typings from documentdb-server.d.ts that defines the IContext interface.

MyStoredProcedure typescript file executed by DocumentDB

We compile the TypeScript files as SystemJS modules and redirect all the generated JavaScript to the out-js directory. As I said, there is no need for Visual Studio to do this; you can achieve the same result by invoking the TypeScript compiler with similar options.

Compiling options set in Visual Studio

If you look at the generated JavaScript you will get something that looks like

System.register(["Utilitary"], function(exports_1, context_1) {
    "use strict";
    var __moduleName = context_1 && context_1.id;
    
     /* ETC */
});
//# sourceMappingURL=myStoredProcedure.js.map

If you try to run such JavaScript source code directly with DocumentDB you will get an error. These modules are meant to be run with SystemJS, and in the case of a stored procedure you cannot reference any third-party library such as SystemJS. This is where SystemJS-Builder comes into play: it packages everything so that your modules are independent of SystemJS and can be used as a standalone file.

To invoke SystemJS-Builder you need to use Node.js. In the sample git repository, you will have to run ‘npm install’ to restore the SystemJS-Builder Node package. Once the TypeScript files are compiled, invoke the node script “documentdb_storedprocedure_builder.js” at the root of the repository. This script basically executes two tasks. First, it generates a standalone JavaScript file with SystemJS-Builder. Secondly, because DocumentDB explicitly needs a file with one function and not an executable JavaScript script, we wrap this code inside a function that will act as our entry point for the stored procedure.

We also retrieve the arguments passed to the stored procedure function in a variable called storedProcedureArgs that is kept in the global namespace. The resulting file is generated and put in the generated-procedure directory. Finally, this file contains what we needed: a standalone JavaScript function executable by DocumentDB. With this approach all TypeScript classes can be reused for other stored procedures, for unit tests, or anywhere in your TypeScript code base.

There are probably thousands of alternatives to create testable and decoupled stored procedures. We liked this approach because it reuses the same tools that we were already using for our single page applications: Typescript and SystemJs. To conclude let me thank Olivier Guimbal who showed us some months ago how Typescript, SystemJs and SystemJs-Builder worked well together.

My TransientFaultHandling utility classes for DocumentDB

Keluro uses DocumentDB extensively for data persistence. However, its extensive scaling capabilities come with a price. Indeed, with your queries or your commands you may exceed the amount of request units you are granted. In that case you will receive a 429 error “Request rate too large”, or a DocumentClientException if you use the .NET SDK. It is then your responsibility to implement retry policies to avoid such a failure and to wait the proper amount of time before retrying.

Edit: see the comment below. Release v1.8.0 of the .NET SDK offers configuration options for these retry policies.
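
For reference, here is a minimal sketch of how these options can be set when instantiating the client with SDK v1.8.0 or later; endpointUrl and authorizationKey are placeholders.

var connectionPolicy = new ConnectionPolicy();
// Maximum number of retries performed by the SDK itself on 429 responses before surfacing the exception.
connectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 9;
// Maximum cumulative wait time, in seconds, spent across those retries.
connectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 30;
var client = new DocumentClient(new Uri(endpointUrl), authorizationKey, connectionPolicy);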

Microsoft provides some samples on how to handle this 429 “Request rate too large” error, but they only concern commands, such as inserting or deleting a document; there is no sample on how to implement retry policies for regular queries. A NuGet package is also available, “Microsoft.Azure.Documents.Client.TransientFaultHandling”, but even though integrating it is as quick as an eye blink, it has no logging capabilities. In my case it did not really solve my RU-exceeding problem, I even doubt that I got it working, and the code is not open source. So I decided to integrate the ideas from the samples into my own utility classes on top of the DocumentDB .NET SDK.

The idea is similar to the “TransientFaultHandling” package: wrap the DocumentClient inside another class exposed only through an interface. In any case, it is a good thing to abstract the DocumentClient behind an interface for testability purposes. In our case this interface is named IDocumentClientWrapped.

public interface IDocumentClientWrapped : IDisposable
{
    /* Some command like methods*/
    Task DeleteDocumentAsync(Uri uri);

    Task CreateDocumentAsync(Uri uri, object obj);

    /* Some query like methods*/
    IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentUri, FeedOptions feedOptions = null);

    IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentCollectionUri, SqlQuerySpec sqlSpec);

    /* Copy here all other public method signature from DocumentClient but return IRetryQueryable instead of IOrderedQueryable or IQueryable */
}

Instead of returning an IQueryable<T> instance as DocumentClient would do, we return an IRetryQueryable<T>. This latter type, whose definition follows, is also a wrapper around the IQueryable<T> instance returned by the DocumentDB client. However, this implementation explicitly retries when the enumeration fails because of a 429 “request rate too large” exception raised by the database engine, DocumentDB in our case.

public interface IRetryQueryable<T>
{
    IRetryQueryable<T> Where(Expression<Func<T, bool>> predicate);

    IDocumentQuery<T> AsDocumentQuery();

    IEnumerable<T> AsRetryEnumerable();

    IRetryQueryable<TResult> SelectMany<TResult>(Expression<Func<T, IEnumerable<TResult>>> predicate);

    IRetryQueryable<TResult> Select<TResult>(Expression<Func<T, TResult>> predicate);
}

In this interface we only expose the extension methods that are actually supported by the “real” IQueryable<T> instance returned by DocumentDB: Select, SelectMany, Where, etc. For example, at the time of writing, GroupBy is not supported: you would get a runtime exception if you used it directly on the IQueryable<T> instance returned by DocumentClient.

Now look at how we use these interfaces in the calling code.

IDocumentClientWrapped client = /* instanciation */;
IRetryQueryable<MyType> query = client.CreateDocumentQuery<MyType>(/* get usual uri using the UriFactory*/);
IEnumerable<string> results = query.Where(c => c.Member1 == "SomeValue").Select(c => c.StringMember2).AsRetryEnumerable();
/* manipulate your in memory enumeration as you wish*/

Independently of the retry policy classes and the “request rate too large” errors, let me emphasize that LINQ can be tricky here. Indeed, the piece of code above is completely different from this one:

IDocumentClientWrapped client = /* instanciation */;
IRetryQueryable<MyType> query = client.CreateDocumentQuery<MyType>(/* get usual uri using the UriFactory*/);
IEnumerable<string> results = query.AsRetryEnumerable().Where(c => c.Member1 == "SomeValue").Select(c => c.StringMember2);
/* manipulate your in memory enumeration as you wish*/

In this case the Where constraint is performed “in memory” by your .NET application server, which means that you have fetched all the data from DocumentDB to your app server. If MyType contains a lot of data, or if the Where constraint filters out a lot of documents, everything is still transferred from DocumentDB to your application server and you will probably have a bottleneck.

Let us get back to our problem. Now that we have seen that adding a retry policy to a query only means calling AsRetryEnumerable() instead of AsEnumerable(), let us jump to the implementation of those classes.

The idea is to use an IEnumerator that “retries”, together with two utility methods: ExecuteWithRetry and ExecuteWithRetriesAsync. The former is for plain synchronous calls while the latter is for the async/await context. Most of this code is verbose because it is only a wrapping implementation. I hope it will be helpful for others.

public class DocumentClientWrapped : IDocumentClientWrapped
{
    private class RetriableEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _t;
        public RetriableEnumerable(IEnumerable<T> t)
        {
            _t = t;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new RetriableEnumerator<T>(ExecuteWithRetry(() => _t.GetEnumerator()));
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }

    private class RetriableEnumerator<T> : IEnumerator<T>
    {
        private IEnumerator<T> _t;
        public RetriableEnumerator(IEnumerator<T> t)
        {
            _t = t;
        }

        public T Current
        {
            get
            {
                return ExecuteWithRetry(()=> _t.Current);
            }
        }

        object IEnumerator.Current
        {
            get
            {
                return ExecuteWithRetry(() => _t.Current);
            }
        }

        public void Dispose()
        {
            _t.Dispose();
        }

        public bool MoveNext()
        {
            return ExecuteWithRetry(() => _t.MoveNext());
        }

        public void Reset()
        {
            _t.Reset();
        }
    }

    private class RetryQueryable<T> : IRetryQueryable<T>
    {
        private readonly IQueryable<T> _queryable;
        public RetryQueryable(IQueryable<T> queryable)
        {
            _queryable = queryable;
        }

        public IDocumentQuery<T> AsDocumentQuery()
        {
            return this._queryable.AsDocumentQuery();
        }

        public IEnumerable<T> AsRetryEnumerable()
        {
            return new RetriableEnumerable<T>(this._queryable.AsEnumerable());
        }

        public IRetryQueryable<T> Where(Expression<Func<T, bool>> predicate)
        {
            var queryable = this._queryable.Where(predicate);
            return new RetryQueryable<T>(queryable);
        }

        public IRetryQueryable<TResult> SelectMany<TResult>(Expression<Func<T, IEnumerable<TResult>>> predicate)
        {
            var queryable = this._queryable.SelectMany(predicate);
            return new RetryQueryable<TResult>(queryable);
        }

        public IRetryQueryable<TResult> Select<TResult>(Expression<Func<T, TResult>> predicate)
        {
            var queryable = this._queryable.Select(predicate);
            return new RetryQueryable<TResult>(queryable);
        }
    }


    private const int MaxRetryCount = 10;

    private static async Task<V> ExecuteWithRetriesAsync<V>(Func<Task<V>> function)
    {
        TimeSpan sleepTime = TimeSpan.FromSeconds(1.0);
        int count = 0;
        while (true)
        {

            try
            {
                return await function();
            }
            catch (DocumentClientException de)
            {
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                sleepTime = de.RetryAfter;
                Trace.TraceInformation("DocumentDB async retry count: {0} - retry in: {1} - RUs: {2}", count, sleepTime.TotalMilliseconds, de.RequestCharge);
            }
            catch (AggregateException ae)
            {
                if (!(ae.InnerException is DocumentClientException))
                {
                    throw;
                }

                DocumentClientException de = (DocumentClientException)ae.InnerException;
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                sleepTime = de.RetryAfter;
                Trace.TraceInformation("DocumentDB async retry count: {0} - retry in: {1} - RUs: {2}", count, sleepTime.TotalMilliseconds, de.RequestCharge);
            }
            await Task.Delay(sleepTime);
        }
    }

    private static V ExecuteWithRetry<V>(Func<V> function)
    {
        TimeSpan sleepTime = TimeSpan.FromSeconds(1.0);
        int count = 0;
        while (true)
        {
            try
            {
                return function();
            }
            catch (DocumentClientException de)
            {
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                sleepTime = de.RetryAfter;
                Trace.TraceInformation("DocumentDB sync retry count: {0} - retry in: {1} - RUs: {2}", count, sleepTime.TotalMilliseconds, de.RequestCharge);
            }
            catch (AggregateException ae)
            {
                if (!(ae.InnerException is DocumentClientException))
                {
                    throw;
                }

                DocumentClientException de = (DocumentClientException)ae.InnerException;
                if ((int)de.StatusCode != 429)
                {
                    throw;
                }
                if (++count > MaxRetryCount)
                {
                    throw new MaxRetryException(de, count);
                }
                sleepTime = de.RetryAfter;
                Trace.TraceInformation("DocumentDB sync retry count: {0} - retry in: {1} - RUs: {2}", count, sleepTime.TotalMilliseconds, de.RequestCharge);
            }
            Thread.Sleep(sleepTime);
        }
    }

    private readonly DocumentClient _client;
    public DocumentClientWrapped(string endpointUrl, string authorizationKey)
    {
        _client = new DocumentClient(new Uri(endpointUrl), authorizationKey);
    }

    public async Task CreateDocumentAsync(Uri documentCollectionUri, object obj)
    {
        ResourceResponse<Document> response = await ExecuteWithRetriesAsync(()=> _client.CreateDocumentAsync(documentCollectionUri, obj));
        LogResponse<Document>("CreateDocumentAsync", response);/* Do something with the response if you want to use it */
    }

    public async Task DeleteDocumentAsync(Uri documentUri)
    {
        ResourceResponse<Document> response = await ExecuteWithRetriesAsync( () => _client.DeleteDocumentAsync(documentUri));
        LogResponse<Document>("DeleteDocumentAsync", response); /* Do something with the response if you want to use it */
    }

    public async Task ReplaceDocumentAsync(Uri documentCollectionUri, object document)
    {
        var response = await ExecuteWithRetriesAsync(()=>  _client.ReplaceDocumentAsync(documentCollectionUri, document));
        LogResponse<Document>("ReplaceDocumentAsync", response);
    }

    public IRetryQueryable<T> CreateDocumentQuery<T>(Uri documentCollectionUri, SqlQuerySpec sqlSpec)
    {
        var queryable = _client.CreateDocumentQuery<T>(documentCollectionUri, sqlSpec);
        return new RetryQueryable<T>(queryable);
    }
}
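
The MaxRetryException used above is not part of the SDK; it is a small custom exception, along the lines of this sketch:

public class MaxRetryException : Exception
{
    public MaxRetryException(DocumentClientException lastException, int retryCount)
        : base(string.Format("Request still throttled (429) after {0} retries.", retryCount), lastException)
    {
        RetryCount = retryCount;
    }

    public int RetryCount { get; private set; }
}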

An unexpected quadratic cost in Excel Interop

My current task at work is the development of an Excel add-in. This add-in is developed in C# .NET and is based on the Excel Interop assemblies; we also use Excel-DNA for the packaging and the user-defined functions. While developing a new feature I stumbled upon a technical oddity of Excel Interop which I will describe in this post.

Let me start this post by reminding you that a range in Excel is not necessarily a block of contiguous cells. Try it yourself by starting Excel right now: select a range, keep the Ctrl key pressed and select another block of contiguous cells. You now have several cells selected that you can name (as shown in the screenshot). You have just created a named range with non-contiguous cells.

ranges

Having said that, let us assume that for our add-in we need to programmatically reference the range of all rows of the form Ax:Cx (with the usual Excel notation), where x ranges over a set of row indices.

Then we need to use the Application.Union method of Microsoft.Office.Interop.Excel and end up with a few lines of code that look like the following.
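
The original snippet was shown as an image; here is a sketch of what BuildUnionAll could look like (a reconstruction, not the original code; app is the Excel Application object and worksheet the target Worksheet).

using Excel = Microsoft.Office.Interop.Excel;

// Reconstructed sketch: unions the rows A1:C1, A2:C2, ..., Acount:Ccount one by one.
static Excel.Range BuildUnionAll(Excel.Application app, Excel.Worksheet worksheet, int count)
{
    Excel.Range union = worksheet.Range["A1:C1"];
    for (int row = 2; row <= count; row++)
    {
        // Append the contiguous block Arow:Crow to the union built so far.
        union = app.Union(union, worksheet.Range["A" + row + ":C" + row]);
    }
    return union;
}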

In the chart below we have monitored the execution time of the BuildUnionAll method for different values of the parameter count.

linear

Remark: in the case of BuildUnionAll there is no need for a loop and the Union method, we could simply have asked for the range “A1:Ccount”. Note also that for small unions you may use a semicolon syntax to separate blocks of contiguous cells, e.g. worksheet.Range[“A1:C3;A8:C12”]. However, this does not work for large ranges made of many non-adjacent cell blocks.

So far so good: we see an almost linear curve, which is in line with what we expected.

Now, let us change our requirement a little to something quite different and more realistic, where we would truly need the Application.Union method. Say we would like the union of the ranges Ax:Cx mentioned above, but for odd indices x only. We want the IEnumerable<int> of row indices to have the same size as before, so let us use the method sketched below.
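
Here is a sketch of that method (again a reconstruction, the original snippet was an image); it takes the row indices as an IEnumerable<int> and, for the experiment, is fed with odd indices only.

// Reconstructed sketch: unions the blocks Ax:Cx for the given row indices.
static Excel.Range BuildUnion(Excel.Application app, Excel.Worksheet worksheet, IEnumerable<int> rows)
{
    Excel.Range union = null;
    foreach (int row in rows)
    {
        Excel.Range block = worksheet.Range["A" + row + ":C" + row];
        union = union == null ? block : app.Union(union, block);
    }
    return union;
}

// For the experiment: odd row indices only, same number of rows as before.
// BuildUnion(app, worksheet, Enumerable.Range(0, count).Select(i => 2 * i + 1));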

Similarly, we monitor the performance curve.

quadratic

Argghhh! We get an awful quadratic cost which makes our approach completely broken for large unions. This quadratic cost was unexpected; I actually did not find any documentation on it (the MSDN documentation on Excel Interop is really minimalist). However, the rule seems to be that the Union method has a cost that is linear in the number of Areas of the input ranges. The Areas member of a Range is the collection containing the blocks of contiguous cells of the range.

This experimental rule leads us to rethink our algorithm. If the cost of a Union is significant (linear) when a range has many areas, then we should try to minimize this situation: we let our algorithm perform unions preferably on ranges having few areas. Once again, basic techniques from school provide a good solution, and we design a very simple recursive algorithm based on the divide-and-conquer approach, inspired by the merge sort algorithm.
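
A sketch of this divide-and-conquer union follows; it is a reconstruction of the idea rather than the original code (System.Linq is assumed for Take/Skip).

// Reconstructed sketch: builds the union pairwise, merge-sort style, so that Application.Union
// is mostly invoked on ranges that contain few areas.
static Excel.Range UnionDivideAndConquer(Excel.Application app, Excel.Worksheet worksheet, IList<int> rows)
{
    if (rows.Count == 0) return null;
    if (rows.Count == 1) return worksheet.Range["A" + rows[0] + ":C" + rows[0]];

    int half = rows.Count / 2;
    Excel.Range left = UnionDivideAndConquer(app, worksheet, rows.Take(half).ToList());
    Excel.Range right = UnionDivideAndConquer(app, worksheet, rows.Skip(half).ToList());
    // Both halves contain roughly the same number of areas, so no single call unions a huge range with a tiny one.
    return app.Union(left, right);
}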

In the end, we recover an almost linear curve. The asymptotic complexity here (if the rows to pick are not adjacent to each other) is the same as merge sort, which is O(n log n).

divideAndConquer

Rewriting a tree algorithm with purely functional data structures: the “zippers”.

This is my first post; it follows a question that I posted a few weeks ago here. A really interesting link provided by a Stack Overflow user made me review my question and led me to write this post. The objective here is to describe the evolution of an existing algorithm implementation, from a first try that relies heavily on mutability of the underlying data structure to a purely immutable, functional implementation. We will show that this can be achieved quite easily when using the appropriate data structures, which in our case are the so-called “zippers”.

I discovered the F# programming language recently. While implementing some things to explore the language I was confronted with the following problem: given a sequence of branches coming from a tree but accessible one by one in a random order, how do you (efficiently) reconstruct a labeled tree? For example, imagine retrieving your disk paths in a random order, e.g. “C:\Foo”, “C:\”, “C:\Foobar”, “C:\Foo1\bar2” etc., and you would like to rebuild your filesystem.

The branches are represented by a linked list of string labels. We also assume that the sequence does not contain the same branch twice and that it is valid, in the sense that a labeled n-ary tree can be built from it. There may be many trees that can be built, but we will accept any of them; in other words, we do not care about the order of the children of a given node. The n-ary tree data structure is the usual one, where the children of a node are represented by a linked list.

I have no doubt that better solutions may exist, but let us focus on the first implementation that came to my mind as a seasoned imperative and inexperienced functional programmer. The algorithm is recursive and works as follows.

The branch is inserted into the tree using the following recursive procedure:
If the current tree is empty then it is replaced by the branch (viewed as a tree). Apart from this corner case, the loop invariant states that “the label of the current tree node is equal to the head of the current branch”.
Having said that, we explore one step below:
If the tail of the current branch is the empty list then there is nothing to do. Indeed, by the loop invariant, the current branch is actually a sub-branch of the existing tree.
Otherwise, we search the children of the current node and try to find one that matches the next value of the current branch.
If such a match exists we are not done yet: we have to go deeper and apply our procedure to the subtree of the matched child with the tail of the current branch.
If there is no match, we have reached a dead end, so we append the current branch (transformed into a tree) to the children list.

Implementing this in F# leads to the following code (with some help from Leaf Garland). The function that builds the tree is mergeInto, which performs side effects on its first parameter.

Now let us apply this to a simple example.

However, this implementation is quite frustrating because we have hacked our functional data structure with F# ref cells to enable mutability just for this particular construction. We needed such mutability to perform our “append” operation on the tree: while focused on a subtree, we could not return a new tree with the branch attached correctly because, due to the recursion, we did not know exactly where we were.

This is where zippers come into play. Zippers were introduced by Gerard Huet in this paper. The author says that they have probably been “invented at numerous occasions by creative programmers” but never published. I recommend having a look at the first four pages of this paper; it is very accessible for an academic publication and will help you with what follows.

A zipper is a practical implementation of a given data structure, which is why we can say that what follows is a labeled n-ary tree implemented using the zipper pattern. To sum up, the zipper lets you focus on a subtree while the structure keeps the path that led you there, allowing you to return a new immutable instance of the complete tree.

Translating the zipper implementation of Huet’s paper from OCaml to F# is straightforward. In the implementation below I reused a good idea from this post, where the author changed the order of the function arguments to enable F# pipe-forward. The go functions are used for tree traversal while the insert functions return a new data structure with the input tree inserted. Most zipper implementations available on the web are designed for leaf-labeled trees, so for our case we need to extend the types to support labeled trees. In the code below, we have removed go_left and insert_left because they were of no use for what follows.

The power of zippers comes from the Path and Location discriminated unions: even when the focus is on a subtree we know where we are, because we keep track of what is on our left, above and on our right. If you want to visualize zippers I suggest you read this post.
You can see that the difference with Huet’s implementation for leaf-labeled trees is that the Path type contains the label of the currentPosition (which can be retrieved with the getLabel method). I had difficulty implementing the go_up method without it (to build the parent node), though I am not completely satisfied with this…

Now it is time to revisit our algorithm, and we will see that it becomes more elegant and less imperative:
The main difference comes from the fact that we do not have to look for a “matching child” while keeping the focus on the current node: we can go down directly and then move right.
More precisely, the first corner case with the Empty tree stays similar. The loop invariant states that “the parent label of the current node equals the head of the current branch”.
Then, if the label of the node matches the head of the branch, we have to go deeper. If there is no right sibling then this is our destination and we append the branch here. Finally, if there is a younger sibling (a node to the right), we reapply the procedure but put the focus on it.

In this new implementation, we can append the branch at the correct location because we have an insert_right operation. In the previous implementation we could not do this (without once again modifying the tree structure) because we would have gone too deep to append the tree to the list kept by the parent of our node.

Note that we have not used the word “zipper” in this description: zippers are only wrappers that provide efficient and useful operations on our immutable data structure.
So the core algorithm now looks like this.

We introduce the getZipper function, which creates a zipper around a tree, and appendToTree, which is the function that we actually call (abstracting the zipper away). Here is how it works with the same branch sequence.

So now we have a competitor to our mergeInto method which uses only purely functional and immutable data structures and which does not require any modification of the exposed tree data structure. Note that, given the same input sequence of branches, the two algorithms do not build the same tree. However, I would not say that the zipper approach is without drawbacks in our case. The zipper structure is quite complex, which is not negligible. We have also added an extra complexity cost, not in the appendToTreeZipper function but in appendToTree; it comes from the root function. After a quick examination of the go_up method, an upper bound on the root function complexity can be found: it is the height of the tree multiplied by the size of the largest sibling group.