Category Archives: Developers

Kate Internals: Text Buffer

Right now, in Kate’s gitorious repository we, or rather mainly Christoph, is rewriting the heart of Kate: The text buffer. In this blog I’ll explain how the text buffer works in detail. After reading, you’ll be able to understand the code. And be warned: Probably you will be desperately amazed about its awesomeness! So this is a must read :^)

Storage. Kate Part internally stores the content of a text document as a QVector of text lines. Each text line holds the text in a QString object as well as additional data like color attributes for highlighting. Inserting and removing text lines implies inserting or removing items in the QVector of the text lines. This is can get slow when there are thousands of text lines in a document, because QVector has to move the memory every time. Thus, to keep text operations fast, Kate splits all text lines into several text blocks: Each text block contains a certain amount of lines (e.g. 256). The expense of memory movement is then always limited. When a text block grows too much, it is automatically split. When the amount of lines in a text block shrinks, it is merged with the previous text block. In other words, Kate’s text buffer automatically balances the text blocks. The basic idea of Kate’s text buffer, the text blocks and text lines looks like this:

Text Cursors and Ranges. Text manipulation always takes place at certain positions in a text. Such a position is called a text cursor. Each text cursor is defined by a line and a column. Further, to specify e.g. text selections, we need text ranges. A text range has a start cursor and an end cursor. For instance, if you have two views of the same document, you want the cursor in the 2nd view to keep its correct position in the text if you insert text in the 1st view. Thus, text cursors have to be kind of intelligent. Whenever text changes, each text cursor needs to be moved. As a text range just consists of two text cursors, we will focus only on text cursors for now. The question is how to implement this efficiently? If we store all text cursors in a list in the text buffer, we have to iterate over all text cursors on every editing operation. This obviously does not scale for thousands of text cursors (KDevelop for instance uses thousands of text cursors for the highlighting). The solution is the same as with the text content itself: Let’s just put a list of all text cursors in a text block to the text block itself. During text editing, we only have to adapt all text cursors of a single text block instead of all text cursors in a document. This looks as follows:

Editing Operations. When editing text, you usually have only four different types of text manipulation:

  1. insert text at a text cursor inside one line
  2. remove text at a text cursor inside one line
  3. wrap line at a text cursor (e.g. by hitting <return/enter>)
  4. unwrap line (e.g. by hitting <backspace> at the beginning of a line)

Those editing primitives are implemented in the Kate::TextBuffer and Kate::TextBlock and take care of balancing the text blocks. For each one the set of text cursors in a text block has to be adapted. Of course, corner cases need to be taken into account: For instance, unwrapping the first line in a text block means that cursors in the first line need to be moved to the previous text block. All editing primitives emit a signal, such that additional layers later can track what happens. For instance, the undo/redo system needs to know all editing operations. Or on top of that we could implement vim like swap file support, i.e., track all changes from the beginning, and if Kate crashes, replay all the editing primitives on the original files.

Transactions. A transaction consists of several of the four editing operations. For instance, if you select text and then move it to another location in the document with drag & drop you want this to be grouped together to a single text operation on undo/redo. (The text operations in this case are unwrap lines and remove text, then insert text and wrap line). To be able to specify which parts of text operations belong together, the text buffer provides two functions: startEditing() starts a transaction and finishEditing() closes it. Those functions use reference counting, so you can call startEditing() multiple times and only the last finishEditing() completes a transaction. Again, signals are emitted e.g. in finishEditing() such that other layers (undo/redo system) are notified about this.

Revisions. As an easy way to check whether a document changed you can get the current revision of the text buffer. The revision is simply an int64 starting with 0 after loading a document. The revision is incremented in every of the 4 editing primitives. This way you don’t have to listen to multiple signals like textInserted() and textRemoved(). This could be useful for e.g. KDevelop’s background parser: Parse a file. When done, check whether the revision is the same. If yes, parsing was successful. If not, parse again. This is how QtCreator does it. Easy and straight forward.

Unit Tests. The text buffer, text block, text line, text cursors are all designed such that unit tests can be written. Hence, each of the above implementation details will be covered by unit tests.

Further ideas. As mentioned previously, the design of the text buffer leaves room for further features. For now, let’s face two ideas:

  • Vim like swap files: If Kate or the application using Kate Part crashes, the unsaved data in the text document is lost. Vim has a backup mechanism called ‘swap files’ to recover from crashes to avoid data loss since years, and it works as follows: Each of the 4 editing primitives explained above are streamed into a separate file. When you save a file successfully, the swap file is cleared. If a crash happens, vim loads the original text document and then replays all editing operations in the swap file. This way the current unsaved editing state is recovered; no data is lost. Since the text buffer emits all necessary signals, implementing this feature is kind of trivial. Any takers?
  • Right now, the highlighting works in an extra class, accessing the text buffer’s text lines and setting the text attributes such as the color. As idea, we could also derive a class Kate::HighlightingBuffer from Kate::TextBuffer and do the additional modifications in the derived class instead of an extra class. We’ll have to see whether that makes sense.

Please Contribute.

Design Ideas of the Kate::TextBuffer

Storage

  • Basic idea: text stored as lines
  • Lines hold text and additional data (like for highlighting)
  • Advanced Concept: Stores lines in blocks of xxx lines, to have better performance

Cursor Support

  • They will move on editing
  • Cursors can be combined to ranges
  • KateTextCursor will be stored in the buffer blocks

Transactions

  • Each edit action must be encapsuled in a transaction
  • startEdit/endEdit will do this
  • Signals for starting and ending this transactions

Revisions

  • After loading, buffer has revision 0
  • Each edit action which is no nop will lead to increment in revision number
  • Successful saving will reset revision back to 0

Editing Operations (only 4 different editing primitives)

  • Insert Text (inside one line)
  • Remove Text (inside one line)
  • Wrap Line (wrap a line at given position, create a new line with content behind wrap position)
  • Unwrap Line (unites the line with its predecessor)
  • Signals for all of these actions, containing the change (to allow to layer undo/swap file support/…) on top of the buffer

Load/Save & Encodings

  • Loading will try to use given codec, if that doesn’t work, try to detect the codec by the BOM at start of file or use fallback codec, if that again doesn’t work, given codec will be used again, and errors reported
  • Saving will just use the given codec

Unit Tests

  • Each of the above implementation details will be covered by unit tests
  • KateTextBuffer (+KateTextLine and Cursor) must therefor be usable without other components

Kate Partly Moving to Gitorious

We are about to move the applications Kate and KWrite as well as the libraries KTextEditor and Kate Part to gitorious. Christoph is working on the migration right now in order to keep the development history. Things look good so far, so the migration is soon finished.
We have discussed a bit about the migration to gitorious on the Kate Developer Meeting and Christoph came up with this mainly because building only KTextEditor, Kate Part, KWrite and Kate is much faster and easier compared to building the KDE modules kdesupport, kdelibs, kdepimlibs, kdebase, kdesdk.
I myself remember the time where I started KDE development, and it took more than two weeks to have a first successful build of KDE. You have to learn so many things at once, like revision control, lots of so far unknown software, and what not. Talking to other developers verifies this. In other words: Getting into KDE development is not easy and straight forward.
Moving to gitorious removes this barrier for Kate development: You just checkout the Kate repository and that’s all you need. It would be nice if you join Kate development and contribute patches :)
What does that mean for Kate in KDE? Nothing changes. We will merge the changes in Gitorious back to the main KDE development line and vice versa.

Developer Meeting: More on scripting Kate

We are 10 people here at the developer meeting in Berlin. Kate, KDevelop as well as Okteta see a lot of commits. I’ll mainly talk about what’s happening in the holy Kate land, though :-)
Yesterday I closed a bug requesting an “unwrap” feature in Kate that works like “Tools > Join Lines” but maintains paragraph separation, i.e., empty lines are not removed. This feature is implemented now in javascript. Further infos:

To run the script simply switch to the command line (F7) and write “unwrap”. If you have further ideas about useful scripts, don’t hesitate to start hacking right away, see also

Fixes with regard to the scripting support in the last days are

Those fixes will be in KDE 4.4.1. More to come in other blog entries :-)