
An unexpected quadratic cost in Excel Interop

My current task at work is the development of an Excel add-in. This add-in is developed in C#/.NET and is based on the Excel Interop assemblies; we also use Excel-DNA for the packaging and the user-defined functions. While developing a new feature I stumbled upon a technical oddity of Excel Interop, which I will describe in this post.

Let me start this post by reminding you that a range in Excel is not necessarily a block of contiguous cells. Try it yourself: start Excel right now, select a range, hold down the Ctrl key of your keyboard and select another block of contiguous cells. You now have several cells selected, which you can name (as shown in the screenshot). You have just created a named range with non-contiguous cells.


Having said that, let us assume that for our add-in we need to reference programmatically the range made of all rows of the form, in the usual Excel notation, Ax:Cx, where x ranges over a set of row indices.

Then we need the method Application.Union of Microsoft.Office.Interop.Excel, and we finally produce a few lines of code that look like the following.
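The original C# snippet is not reproduced in this extract. As a rough illustration of its shape, here is a Python sketch (Python rather than C#, so it can run standalone) where a toy union function stands in for Application.Union, and a range is modelled as a list of contiguous row blocks; all names are hypothetical:

```python
def union(r1, r2):
    """Toy stand-in for Application.Union: a range is a sorted list of
    (first_row, last_row) areas; adjacent or overlapping areas are merged."""
    areas = sorted(r1 + r2)
    merged = [areas[0]]
    for first, last in areas[1:]:
        if first <= merged[-1][1] + 1:  # contiguous with the previous area
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

def build_union_all(count):
    """Like the add-in's BuildUnionAll: union of A1:C1, A2:C2, ..., row by row."""
    result = []
    for row in range(1, count + 1):
        result = union(result, [(row, row)])
    return result
```

Since every unioned block is adjacent to the previous one, the accumulated range always has a single area here.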

In the chart above we have monitored the execution time of the BuildUnionAll method for different values of the count parameter.


Remark: in the case of BuildUnionAll there is no need for a loop and the Union method; we could simply have asked for the range “A1:Ccount”. Note also that for small unions you may use a syntax with semicolons to separate the blocks of contiguous cells, e.g. worksheet.Range[“A1:C3;A8:C12”]. However, this does not work for large ranges made of many non-adjacent blocks of cells.

So far so good: we see an almost linear curve, which is what we expected.

Now, let us change our expectation slightly, to something quite different and more realistic where we truly need the Application.Union method. Say we would like the union of the ranges Ax:Cx mentioned above, but for odd row indices x only. We want the IEnumerable<int> to have the same size as before, so let us use the method in the code below.
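Again the original C# method is elided from this extract; here is a hypothetical Python sketch of the same left fold over odd rows, with a toy union function modelling Application.Union as linear in the number of areas:

```python
def union(r1, r2):
    """Toy stand-in for Application.Union: a range is a sorted list of
    (first_row, last_row) areas; adjacent areas are merged."""
    areas = sorted(r1 + r2)
    merged = [areas[0]]
    for first, last in areas[1:]:
        if first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

def build_union_odd(count):
    """Union of A1:C1, A3:C3, A5:C5, ...: count pairwise non-adjacent blocks."""
    result = []
    for row in range(1, 2 * count, 2):
        # 'result' already holds all previously accumulated areas, so if Union
        # is linear in the number of areas, the whole loop is quadratic in count.
        result = union(result, [(row, row)])
    return result
```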

Similarly, we monitor the performance curve.


Argghhh!!! We get an awful quadratic cost, which makes our approach completely broken for large unions. This quadratic cost was unexpected. Actually, I did not find any documentation on this (the MSDN documentation on Excel Interop is really minimalist). However, the rule seems to be that the Union method has a cost linear in the number of Areas of the input ranges. The Areas member of a Range is the collection of the blocks of contiguous cells that make up the range.

This experimental rule leads us to revise our algorithm. If a union is expensive (linear) when one of its operands has many areas, then we should try to avoid that situation: we will let our algorithm perform unions preferably on ranges having few areas. Once again, basic techniques from school provide a good solution, and we design a very simple recursive algorithm based on the divide-and-conquer approach, inspired by the merge sort algorithm.
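The divide-and-conquer idea can be sketched in Python (a hypothetical rendering, with a toy union function standing in for Application.Union): each range is unioned with a range of comparable size, exactly as in the merge-sort recursion.

```python
def union(r1, r2):
    """Toy stand-in for Application.Union: linear in the number of areas."""
    areas = sorted(r1 + r2)
    merged = [areas[0]]
    for first, last in areas[1:]:
        if first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

def union_rows(rows):
    """Divide and conquer: union the two halves first, so that most calls to
    'union' involve ranges with few areas (the merge-sort recursion scheme)."""
    if len(rows) == 1:
        return [(rows[0], rows[0])]
    mid = len(rows) // 2
    return union(union_rows(rows[:mid]), union_rows(rows[mid:]))
```

With n pairwise non-adjacent rows this performs O(n log n) area work in the toy model, instead of the quadratic cost of the naive left fold.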

In the end, we recover an almost linear curve. The asymptotic complexity here (if the rows to pick are not adjacent to each other) is the same as merge sort's, which is O(n.ln(n)).


Rewriting a tree algorithm with purely functional data structures: the “zippers”.

This is my first post; it follows a question that I posted a few weeks ago here. A really interesting link provided by a Stack Overflow user made me review my question and led me to write this post. The objective here is to describe the transformation of an existing algorithm implementation, from a first try that relies heavily on mutability of the underlying data structure to a purely immutable, functional implementation. We will show that this can be achieved quite easily using the appropriate data structures, which in our case are the so-called “zippers”.

I discovered the F# programming language recently. While implementing some things to explore the language, I was confronted with the following problem: given a sequence of branches coming from a tree, but accessible one by one in a random order, how do we (efficiently) reconstruct a labeled tree? For example, imagine retrieving your disk paths in a random order, e.g. “C:\Foo”, “C:\”, “C:\Foobar”, “C:\Foo1\bar2” etc., and you would like to rebuild your filesystem.

The branches are represented by linked lists of string labels. We also assume that the sequence does not contain the same branch twice and that it is valid, in the sense that a labeled n-ary tree can be built from it. There may be many such trees, but we will accept any of them; in other words, we do not care about the order of the children of a given node. The n-ary tree data structure is the usual one, where the children of a node are represented by a linked list.
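The post's own F# types are not shown in this extract. As a hypothetical illustration, the representation just described can be sketched in Python with plain tuples (a tree is a (label, children) pair):

```python
# A branch is a list of labels, e.g. ["C:", "Foo", "bar"].
# A labeled n-ary tree is a (label, children) pair, children being a list of trees.
def branch_to_tree(branch):
    """Turn the branch ["a", "b", "c"] into the single-path tree a -> b -> c."""
    label, *rest = branch
    return (label, [branch_to_tree(rest)] if rest else [])
```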

I have no doubt that better solutions may exist, but let us focus on the first implementation that came to my mind as a seasoned imperative programmer and an inexperienced functional one. The algorithm is recursive and works as follows.

Each branch is inserted in the tree using the recursive procedure:
If the current tree is empty then it is replaced by the branch (seen as a tree). Apart from this corner case, the loop invariant states that “the label of the current tree node is equal to the head of the current branch”.
Having said that, we explore one step below:
If the tail of the current branch is the empty list then there is nothing to do. Indeed, by the loop invariant, the current branch is actually a sub-branch of the existing tree.
Otherwise, we search the children of the given node, trying to find a match with the next value of the current branch.
If such a match exists we are not done yet: we have to go deeper and apply our procedure to the subtree of the matched child with the tail of the current branch.
If there is no match, we have reached a dead end, so we append the current branch (transformed into a tree) to the children list.

Implementing this in F# leads to the following code (with some help from Leaf Garland). The function that builds the tree is mergeInto, which performs side effects on its first parameter.
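The F# code itself (ref cells included) is not reproduced in this extract. As a stand-in, here is a Python sketch of the same mutable procedure, using lists so the tree can be modified in place (merge_into and build_tree are hypothetical names):

```python
def branch_to_tree(branch):
    """A tree is a mutable [label, children] pair; children is a list of trees."""
    label, *rest = branch
    return [label, [branch_to_tree(rest)] if rest else []]

def merge_into(tree, branch):
    """Mutably insert 'branch'; loop invariant: tree[0] == branch[0]."""
    tail = branch[1:]
    if not tail:
        return                               # the branch is a prefix of an existing path
    for child in tree[1]:
        if child[0] == tail[0]:              # a matching child: go deeper
            merge_into(child, tail)
            return
    tree[1].append(branch_to_tree(tail))     # no match: hang the rest of the branch here

def build_tree(branches):
    """Handle the empty-tree corner case with the first branch, then merge the
    rest (all branches are assumed to share the same root label)."""
    first, *rest = branches
    tree = branch_to_tree(first)
    for b in rest:
        merge_into(tree, b)
    return tree
```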

Now let us apply this to a simple example.

However, this implementation is quite frustrating, because we have hacked our functional data structure with F# ref cells to enable mutability just for this particular construction. We needed such mutability to perform our “append” operation on the tree: while focused on a subtree, we could not return a new tree with the branch attached correctly, because with the recursion we no longer knew exactly where we were.

This is where the zippers come into play. Zippers were introduced by Gérard Huet in this paper. The author says that they have probably been “invented at numerous occasions by creative programmers” but never published. I recommend having a look at the first four pages of this paper; it is very accessible for an academic publication and may help you with what follows.

A zipper is a practical implementation technique for a given data structure, which is why we can say that what follows is a labeled n-ary tree implemented using the zipper pattern. To sum up, the zipper lets you focus on a subtree while the structure keeps the path that led you there, allowing you to return a new immutable instance of the complete tree.

Translating the zipper implementation of Huet’s paper from OCaml to F# is straightforward. In the implementation below I reused a good idea from this post, where the author changed the order of the function arguments to enable F# pipe-forward. The go functions are used for tree traversal, while the insert functions return a new data structure with the input tree inserted. Most of the zipper implementations available on the web are designed for leaf-labeled trees, so for our case we need to extend the types to support labeled trees. In the code below, we have removed go_left and insert_left because they are of no use for what follows.
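The F# translation itself is elided from this extract, but the shape of the structure can be sketched in Python (an assumed rendering in the spirit of Huet's paper, not the author's code): a location pairs the focused subtree with a path recording the left siblings, the parent label (the labeled-tree extension) and the right siblings.

```python
from collections import namedtuple

Tree = namedtuple("Tree", "label children")              # labeled n-ary tree
Node = namedtuple("Node", "left parent_label up right")  # one path step; None means Top
Loc = namedtuple("Loc", "tree path")                     # focus + how we got there

def go_down(loc):
    first, *rest = loc.tree.children
    return Loc(first, Node([], loc.tree.label, loc.path, rest))

def go_right(loc):
    nxt, *rest = loc.path.right
    return Loc(nxt, Node([loc.tree] + list(loc.path.left),
                         loc.path.parent_label, loc.path.up, rest))

def go_up(loc):
    p = loc.path  # rebuild the parent from its stored label and the sibling lists
    return Loc(Tree(p.parent_label,
                    list(reversed(p.left)) + [loc.tree] + list(p.right)), p.up)

def insert_right(loc, t):
    p = loc.path
    return Loc(loc.tree, Node(p.left, p.parent_label, p.up, [t] + list(p.right)))

def root(loc):
    while loc.path is not None:                          # zip all the way back up
        loc = go_up(loc)
    return loc.tree
```

For instance, going down from Tree("a", [Tree("b", []), Tree("c", [])]), inserting Tree("x", []) on the right and returning to the root yields a new tree with children b, x, c, while the original tree is untouched.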

The power of the zippers comes from the Path and Location discriminated unions: even when the focus is on a subtree, we know where we are because we keep track of what is on our left, above us and on our right. If you want to visualize some zippers, I suggest you read this post.
You can see that the difference with Huet’s implementation on leaf-labeled trees is that the Path type contains the label of the currentPosition (which can be retrieved with the getLabel method). I had difficulty implementing the go_up method without it (to rebuild the parent node), though I am not completely satisfied with this…

Now it is time to review our algorithm, and we will see that it becomes more elegant and less imperative:
The main difference is that we no longer have to look for a “matched child” while keeping the focus on the current node. We can go down directly and then move right.
Precisely, the first corner case with the Empty tree stays similar. The loop invariant now states that “the parent label of the current node equals the head of the current branch”.
Then, if the label of the node matches the head of the branch, we go deeper. If there is no right sibling, then this is our destination and we append the branch here. Finally, if there is a younger sibling (a node on the right), we reapply the procedure with the focus on it.

In this new implementation, we can append the branch at the right location because we have an insert_right operation. In the previous implementation we could not do this (without modifying the tree structure once again), because we would have gone too deep to append the tree to the children list kept by the parent of our node.

Remark that we have not said the word “zipper” in this description: the zippers are only wrappers that provide efficient and useful operations on our immutable data structure.
So the core algorithm now looks like this.
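The appendToTreeZipper/appendToTree pair is not reproduced in this extract; here is a hypothetical Python rendering of the recursion just described, built on zipper operations in the style of Huet's paper (redefined here so the snippet is self-contained; all names are illustrative):

```python
from collections import namedtuple

Tree = namedtuple("Tree", "label children")              # labeled n-ary tree
Node = namedtuple("Node", "left parent_label up right")  # one path step; None means Top
Loc = namedtuple("Loc", "tree path")

def go_down(loc):
    first, *rest = loc.tree.children
    return Loc(first, Node([], loc.tree.label, loc.path, rest))

def go_right(loc):
    nxt, *rest = loc.path.right
    return Loc(nxt, Node([loc.tree] + list(loc.path.left),
                         loc.path.parent_label, loc.path.up, rest))

def go_up(loc):
    p = loc.path
    return Loc(Tree(p.parent_label,
                    list(reversed(p.left)) + [loc.tree] + list(p.right)), p.up)

def insert_right(loc, t):
    p = loc.path
    return Loc(loc.tree, Node(p.left, p.parent_label, p.up, [t] + list(p.right)))

def insert_down(loc, t):
    """Make t the first child of the focused node (Huet's insert_down)."""
    return Loc(t, Node([], loc.tree.label, loc.path, list(loc.tree.children)))

def root(loc):
    while loc.path is not None:
        loc = go_up(loc)
    return loc.tree

def branch_to_tree(branch):
    label, *rest = branch
    return Tree(label, [branch_to_tree(rest)] if rest else [])

def append_branch(loc, branch):
    """The recursion described above: on a label match go deeper, otherwise try
    the right sibling, and with no sibling left insert the branch on the right."""
    if loc.tree.label == branch[0]:
        tail = branch[1:]
        if not tail:
            return root(loc)                     # the branch is already in the tree
        if loc.tree.children:
            return append_branch(go_down(loc), tail)
        return root(insert_down(loc, branch_to_tree(tail)))
    if loc.path.right:                           # try the next (younger) sibling
        return append_branch(go_right(loc), branch)
    return root(insert_right(loc, branch_to_tree(branch)))

def build_tree(branches):
    """Seed the tree with the first branch, then append the others
    (all branches are assumed to share the same root label)."""
    first, *rest = branches
    tree = branch_to_tree(first)
    for b in rest:
        tree = append_branch(Loc(tree, None), b)
    return tree
```

Every call returns a brand-new immutable tree via root; no ref cells are needed.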

We introduce the getZipper function, which creates a zipper around a tree, and appendToTree, which is the method that we will actually call (abstracting the zipper away). Here is how it works with the same branch sequence.

So now we have a competitor to our mergeInto method, one which uses only purely functional, immutable data structures and which does not require any modification of the exposed tree data structure. Remark that, given the same input sequence of branches, the two algorithms do not build the same tree. However, I would not say that the zipper is without drawbacks for our case. The zipper structure is quite complex, which is not negligible. We have also added an extra complexity cost, not in the appendToTreeZipper function but in appendToTree: it comes from the root function. After a quick examination of the go_up method, an upper bound for the complexity of the root function can be found: the height of the tree multiplied by the size of the largest set of siblings.