GSoC – Kate Code Folding (some more details about folding)
Hi!
Yesterday I received a comment on my previous blog post that it would be nice to give some more technical details about the folding.
Initially, this was a comment response, but because it proved to be pretty interesting and more of you read the articles than the comments, I decided to post it as an article :)
Here are some details that you could find interesting:
Kate’s folding is based on a real / virtual line system. All the folded block’s lines have the same virtual line (they are literally folded under the first block’s line).
But how are these blocks defined?
Kate uses the file extension to determine what defines a block (special characters like “{“ or “}”, text alignment and so on). Kate’s highlight sends some info about these to Kate’s code folding. Let’s call them “document’s anchors”. This info is rudimentary. It is mentioned only the type of anchor (encoded as a number greater than 0 if it is a start anchor and lower than 0 if it is an end anchor) and its position in the document (real position). I use this info to build the “folding nodes” (I will detail them later) and to keep their position updated as well. So, here we have the first data structure: the “line mapping”. Line mapping is a QMap <line_number, QVector > and contains an entry for each document line that has at least one anchor I was talking about earlier. If this structure is updated correctly, then I am sure that all the folding nodes have their position set correctly and I don’t need to worry about that.
Now, I know where the document anchors are. But that isn’t enough – I need to form folding blocks with them.
I found two solutions for this “matching” problem:
- Use a stack and rebuild (using the line mapping) the folding tree from scratch every time something is changing. The algorithm was very simple (parenthesis matching problem), but I found it pretty ugly (and slow) to rebuild the entire tree so often. I didn’t discard the idea either. The algorithm was perfect to be used as a term of comparison for the other algorithm.
- Use a tree algorithm that changes itself dynamically. This was the solution I chose. I developed an entire project (see my previous posts) that helped me improve this algorithm and I believe it works pretty well.
The folding tree is made of folding nodes. These nodes contain info about their position (actually the document’s anchor that they represent), parent, type, matching node – all the info needed to define the folding blocks. If this tree is correct, then the blocks are defined correctly.
If you have the folding blocks defined, then you can fold them. To do this I use the third (and the last) data structures: a QList with the folded nodes. You might ask why we needed another(!) data structure. There are 3 methods that are called very, very often: getRealLine (virtualLine), getVirtualLine(realLine) and getHiddenLines(). Search through all the folding nodes (using the folding tree or the line mapping) is slow. Imagine if we do this search for every line of the document (actually there are about 2 or 3 calls per line every time something changes in the document!). I have also made one more improvement here: if a folded block contains another folded block, then the inner folded block is removed from this structure (it won’t help us discover the virtual or real lines).
Basically, these 3 methods are the code folding – the real / virtual line system I was talking about in the beginning. But.. there is a long way and lot of work to be able to answer these 3 questions fast and correctly.