Encoding Detection Revised

In KDE releases up to version 4.4, Kate unfortunately very often selected the wrong encoding. As a result, characters such as the German umlauts (öäü) show up as cryptic signs in the text editor. What I’ve seen lots of times is that people then start to fix those characters manually throughout the entire document. In other words, they do not realize that the text document was simply opened with the wrong encoding. In fact, users usually do not even know what an encoding is. While this is of course kind of sad, it certainly won’t change…

Given this fact, the only correct “fix” is a very good automatic encoding detection, such that the encoding is usually chosen correctly. In the rewrite of Kate’s text buffer for KDE 4.5, Christoph also rewrote the file loader including the encoding detection. The detection now works as follows:

  1. try the encoding selected by the user (through the open-file-dialog or the console)
  2. try encoding detection (some intelligent trial & error method)
  3. use fallback encoding

In step 1, Kate tries to use the encoding specified in the open-file-dialog or the one given when launching Kate from the console. On success, we are done.

The encoding detection in step 2 first tries Unicode encodings by looking for a Byte Order Mark (BOM). If one is found, it is certain that the text document is Unicode encoded. If there is no BOM, Kate next uses a tool from KDElibs (KEncodingProber) to detect the correct encoding. This is basically trial & error: try encoding A; if the document contains byte sequences that are not valid in that encoding, try encoding B, then C, and so on… Unfortunately, this doesn’t always work either, because a byte sequence might be valid in several encodings and represent different characters in each. This is why it’s more or less impossible to always get the encoding right. There is simply no way…
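To make the two ideas in step 2 concrete, here is a minimal Python sketch. It is not Kate's code and does not use KEncodingProber; the names `detect_bom` and `first_valid_encoding` and the candidate list are made up for illustration. It checks for a BOM first, then falls back to plain trial & error, and finally shows why that cannot be fully reliable.

```python
import codecs

# BOM signatures. UTF-32 LE must be checked before UTF-16 LE, because the
# UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def first_valid_encoding(data: bytes, candidates):
    """Trial & error: return the first candidate that decodes without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# The same three bytes are valid in more than one single-byte encoding,
# but they stand for different characters in each of them.
raw = "öäü".encode("latin-1")

print(detect_bom(codecs.BOM_UTF8 + b"abc"))
print(first_valid_encoding(raw, ["utf-8", "latin-1", "cp1251"]))
print(raw.decode("latin-1"), "vs.", raw.decode("cp1251"))
```

The trial & error loop stops at the first encoding that raises no error, so the order of the candidates decides the result whenever several of them accept the bytes. That is exactly the ambiguity described above.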

If the encoding detection fails, Kate uses a fallback encoding. You can configure this fallback encoding in the editor component settings in the “Open/Save” category. If the fallback encoding fails as well, the document is marked as read-only and a warning is shown.
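The whole three-step chain, including the read-only case, could be sketched roughly like this in Python. This is only an illustration under assumptions: `LoadResult`, `load`, and the `detected` parameter (standing in for whatever the detection step proposes) are hypothetical names, and showing undecodable bytes as replacement characters is my assumption, not necessarily what Kate does.

```python
from dataclasses import dataclass


@dataclass
class LoadResult:
    text: str
    encoding: str
    read_only: bool  # True if no encoding decoded the data cleanly


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def load(data: bytes, user_encoding=None, detected=(), fallback="utf-8"):
    # Step 1: the encoding chosen by the user, if any.
    # Step 2: encodings proposed by the detection step (hypothetical input).
    # Step 3: the configured fallback encoding.
    candidates = []
    if user_encoding:
        candidates.append(user_encoding)
    candidates.extend(detected)
    candidates.append(fallback)

    for encoding in candidates:
        text = try_decode(data, encoding)
        if text is not None:
            return LoadResult(text, encoding, read_only=False)

    # Everything failed: still show something, but mark it read-only so
    # that saving cannot destroy the bytes that could not be decoded.
    # (Assumption: undecodable bytes become U+FFFD replacement characters.)
    text = data.decode(fallback, errors="replace")
    return LoadResult(text, fallback, read_only=True)


latin1_bytes = "öäü".encode("latin-1")
print(load(latin1_bytes, user_encoding="utf-8", detected=["latin-1"]))
print(load(latin1_bytes, user_encoding="utf-8", fallback="ascii"))
```

The first call succeeds in step 2, the second one fails in all steps and ends up read-only.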

What about Kile and KDevelop?

One of the applications that suffered heavily from wrong encoding detection in the past was the LaTeX editor Kile. The same probably holds for KDevelop (although it’s usually less critical with source code). The good news is that with KDE >= 4.5 the problems with wrong encoding detection should be gone. So it’s certainly worth updating if you are affected by this issue.

Kate – GSoC Summary

Hello planet,

As Google Summer of Code is now finished and I have successfully passed the final evaluation, I would like to give a brief description of my project.

Kate is now able to recover (most of) what was written after the last save in case of a crash or power failure. A swap file is created after the first editing action on a document that was successfully saved. If the user closes the document normally or saves its content, the swap file is deleted; otherwise, if Kate crashes, it remains on the disk. On load, Kate searches for the swap file, and if it exists, a warning bar pops up at the top and provides the user with three possibilities: recover the lost data, discard the swap file, or view the differences between the original data and the recovered data. If the user chooses to restore the lost data, the editing actions from the swap file are replayed over the current content of the document. If the swap file is somehow not valid, for example because a finishEditing statement is missing, the recovery is still done, but the user is warned that it might be incomplete.
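The replay idea can be sketched in a few lines of Python. This is only an illustration, not the actual swap file format: the tuple layout and the operation names other than finishEditing (startEditing, insertText, removeText, insertLine) are assumptions made up for this example.

```python
def replay(lines, operations):
    """Replay recorded editing actions over the saved document content.

    Returns (recovered_lines, complete). complete is False if an edit
    group was started but never finished, e.g. because of a crash.
    """
    doc = list(lines)
    open_groups = 0
    for op in operations:
        kind = op[0]
        if kind == "startEditing":
            open_groups += 1
        elif kind == "finishEditing":
            open_groups -= 1
        elif kind == "insertText":
            _, line, column, text = op
            doc[line] = doc[line][:column] + text + doc[line][column:]
        elif kind == "removeText":
            _, line, column, length = op
            doc[line] = doc[line][:column] + doc[line][column + length:]
        elif kind == "insertLine":
            _, line, text = op
            doc.insert(line, text)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return doc, open_groups == 0


saved = ["Hello world"]
swap = [
    ("startEditing",),
    ("insertText", 0, 5, ","),
    ("finishEditing",),
    ("startEditing",),
    ("insertLine", 1, "second line"),
    # crash: the closing finishEditing was never written
]

recovered, complete = replay(saved, swap)
print(recovered)
if not complete:
    print("Warning: the recovered data might be incomplete.")
```

Here the last edit group is never closed, so the data is still recovered, but the caller gets told that it may be incomplete.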

Only the core feature of the swap file is implemented at the moment. I know I could have done more, but things went slowly at the beginning, as I was new to Qt and KDE development and also had a demanding exam period. But this has a positive aspect, too, as it will motivate me to continue my work on this project.

This has been a great summer for me, as I was accepted into the GSoC program and got a chance to do what I like and get paid for it. I want to thank Christoph, my mentor, for having patience with me and helping me with all the problems I encountered. I also want to thank the whole Kate team, the KDE community and Google :).

Kate Night-Make

Each night, Kate (part/app) is now built from git and all unit tests are run. Yeah, we even got some tests :P
Until now, that just reminds us of our failures ;) But I hope the daily mail will perhaps motivate me and others more to fix them ;)

Besides, thanks to all the people working on these tests, like Bernhard, Dominik and Milian.

Kate Nightmake Tests
From: cullmann@kate-editor.org
To: kwrite-devel@kde.org
Date: Today 05:06:46

Test project /home/www/kate-editor.org/build/build

Start  1: katetextbuffertest
1/34 Test  #1: katetextbuffertest ...............   Passed    0.02 sec
Start  2: range_test
2/34 Test  #2: range_test .......................   Passed    1.09 sec
Start  3: testkateregression
3/34 Test  #3: testkateregression ...............***Failed    0.03 sec
Start  4: undomanager_test
4/34 Test  #4: undomanager_test .................   Passed    1.34 sec
Start  5: plaintextsearch_test
5/34 Test  #5: plaintextsearch_test .............   Passed    7.56 sec
Start  6: regexpsearch_test
6/34 Test  #6: regexpsearch_test ................   Passed    8.45 sec
Start  7: scriptdocument_test
7/34 Test  #7: scriptdocument_test ..............   Passed    1.36 sec
Start  8: completion_test
8/34 Test  #8: completion_test ..................***Failed    0.95 sec
Start  9: searchbar_test
9/34 Test  #9: searchbar_test ...................***Failed   11.24 sec
Start 10: movingcursor_test
10/34 Test #10: movingcursor_test ................   Passed    0.94 sec
Start 11: movingrange_test
11/34 Test #11: movingrange_test .................   Passed    1.79 sec
Start 12: katedocument_test
12/34 Test #12: katedocument_test ................   Passed    1.09 sec
Start 13: revision_test
13/34 Test #13: revision_test ....................   Passed    0.63 sec
Start 14: templatehandler_test
14/34 Test #14: templatehandler_test .............   Passed    0.44 sec
Start 15: indenttest
15/34 Test #15: indenttest .......................***Failed   25.87 sec
Start 16: bug213964_test
16/34 Test #16: bug213964_test ...................   Passed    0.73 sec
Start 17: utf8.txt_create
17/34 Test #17: utf8.txt_create ..................   Passed    0.17 sec
Start 18: utf8.txt_diff
18/34 Test #18: utf8.txt_diff ....................   Passed    0.01 sec
Start 19: latin15.txt_create
19/34 Test #19: latin15.txt_create ...............***Exception: SegFault  0.04 sec
Start 20: latin15.txt_diff
20/34 Test #20: latin15.txt_diff .................***Failed    0.01 sec
Start 21: utf32.txt_create
21/34 Test #21: utf32.txt_create .................   Passed    0.13 sec
Start 22: utf32.txt_diff
22/34 Test #22: utf32.txt_diff ...................   Passed    0.01 sec
Start 23: utf16.txt_create
23/34 Test #23: utf16.txt_create .................   Passed    0.10 sec
Start 24: utf16.txt_diff
24/34 Test #24: utf16.txt_diff ...................   Passed    0.01 sec
Start 25: utf32be.txt_create
25/34 Test #25: utf32be.txt_create ...............   Passed    0.10 sec
Start 26: utf32be.txt_diff
26/34 Test #26: utf32be.txt_diff .................   Passed    0.01 sec
Start 27: utf16be.txt_create
27/34 Test #27: utf16be.txt_create ...............   Passed    0.10 sec
Start 28: utf16be.txt_diff
28/34 Test #28: utf16be.txt_diff .................   Passed    0.01 sec
Start 29: cyrillic_utf8.txt_create
29/34 Test #29: cyrillic_utf8.txt_create .........   Passed    0.10 sec
Start 30: cyrillic_utf8.txt_diff
30/34 Test #30: cyrillic_utf8.txt_diff ...........   Passed    0.01 sec
Start 31: cp1251.txt_create
31/34 Test #31: cp1251.txt_create ................   Passed    0.10 sec
Start 32: cp1251.txt_diff
32/34 Test #32: cp1251.txt_diff ..................   Passed    0.01 sec
Start 33: koi8-r.txt_create
33/34 Test #33: koi8-r.txt_create ................   Passed    0.11 sec
Start 34: koi8-r.txt_diff
34/34 Test #34: koi8-r.txt_diff ..................   Passed    0.01 sec

82% tests passed, 6 tests failed out of 34

Total Test time (real) =  64.64 sec

The following tests FAILED:
3 - testkateregression (Failed)
8 - completion_test (Failed)
9 - searchbar_test (Failed)
15 - indenttest (Failed)
19 - latin15.txt_create (SEGFAULT)
20 - latin15.txt_diff (Failed)
Errors while running CTest

Kate History ;)

While setting up the new server for the Kate homepage, I actually found some old stuff again ;)
Amazing that mails nearly ten years old can still be somewhere on the filesystem.
Perhaps that is a little hint to post about the beginnings of what today is Kate/KatePart/KWrite and KTextEditor.

Ten years ago, I asked the original author of KWrite whether he was interested in an MDI version of it (the original mail was in German, translated here):

From: Cullmann Christoph <crossfire@babylon2k.de>
To: digisnap@cs.tu-berlin.de
Subject: KWrite - suggestions for improvement
Date: Thu, 14 Dec 2000 18:38:42 +0100

I regularly use KWrite to edit source code, and the
syntax highlighting is very handy.
However, it would be nice if KWrite had an MDI interface.
I am currently building one, and if anyone is interested, feel free to get in touch.

Thanks and bye
Christoph Cullmann

I actually never got any reaction from the author Jochen Wilhelmy. Guess the mail address was already abandoned at that time.
Later I tried my luck with kde-devel:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: kde-devel@max.tat.physik.uni-tuebingen.de
Subject: Need help - KWrite
Date: Thu, 4 Jan 2001 00:21:35 +0100

i am building a mdi texteditor using the kwrite-widget.
I want to use most of the extended features of the kwrite class, like search
dialog, kspell, ....

Is there any way to do this using the KParts system or must i use the kwrite
include files and compile the kwrite widget into my program ?

cu and thanks for any answer
C. Cullmann

Not many reactions, though, but I stayed persistent ;)

From: Cullmann Christoph <crossfire@babylon2k.de>
To: kde-devel@max.tat.physik.uni-tuebingen.de
Subject: Re: Looking for kwrite developers.
Date: Thu, 11 Jan 2001 17:30:19 +0100

Hi all,
I have build up a editor using the KWrite Widget and a QTabWidget to provide
a multidocument interface :-)
It has some bugs at the moment (I think QTabWidget is the problem) but works
real nice.

Anybody interested in this ?

C. cullmann

Shortly after this mail, Anders Lund joined, one of the developers who stayed around for years (more at the team page).
I named the starting project “KCEdit” and put it up on sourceforge.net:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: kde-devel@max.tat.physik.uni-tuebingen.de
Subject: MDI TextEditor - KCEdit
Date: Sat, 13 Jan 2001 12:04:33 +0100

Hi all,
I have build up a small mdi texteditor using the kwrite widget :-)
If someone is interested in helping to improve it or only wants to
test it a bit, i have set a sourceforge.net project up.

url : http://sourceforge.net/projects/kcedit

It would be nice if someone wants to take part in the development.


After that, the next nice guy joined: Michael Bartl.
We searched for a new name for the editor, as KCEdit was not that nice: it was very similar to KEdit, and the editor was no longer only Cullmann’s pet project.
What did we choose? See here:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: Michael Bartl <michael.bartl1@chello.at>
Subject: Kant is born ;-)
Date: Fri, 19 Jan 2001 20:55:59 +0100

Here it is ;-)

As sourceforge.net turned out not to be a nice host, we moved the project to http://www.openave.net (and later back again, lol).
Because of family problems, Michael dropped out of the team after some time; still, a BIG THANK YOU.

Later, I tried to get my changes back in KDE, as I didn’t want to do a permanent fork:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: kde-devel@max.tat.physik.uni-tuebingen.de
Subject: How can I participate in the KWrite project ?
Date: Tue, 20 Feb 2001 20:25:15 +0100
Cc: kde-core-devel@max.tat.physik.uni-tuebingen.de

I want to help as a developer in the kwrite project. I and some other people
are working on Kant (http://www.sourceforge.net/projects/kant), a MDI
texteditor for kde >=2.0 and we often find bugs in the kwrite code or missing
features we would need. It would be great if I could help to develop kwrite
because only sending bug reports and hoping that new features in kwrite will
come up sometime is really annoying.
How can I join the KWrite team and get CVS read/write access (perhaps ;-) ?
To have an overview about my skills please look at the Kant sourcecode or
simply download and test Kant out of the CVS at sourceforge.net.

cu and thx for you interest
Christoph Cullmann

Sorry for the bad English ;-)

Without many problems, I got a CVS account on the KDE server and was allowed to add my code and the code of the others to the KWrite codebase.
All other contributors who were still active got accounts later, too. It all worked like a charm, thanks to Waldo Bastian.
Still, Kant itself was not in KDE, therefore the next try:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: kde-devel@max.tat.physik.uni-tuebingen.de
Subject: Could Kant replace or extend KWrite in KDE ?
Date: Sat, 24 Feb 2001 17:28:58 +0100

I am the projectmanager of Kant, a MDI texteditor which uses the KWrite
widget for displaying text (no MDI like you know it from windows, MDI like
you know it from Emacs or Konqueror :).

Kant has come to a level of stability which would it allow to put it into the
kde cvs i hope. I have talked with Carsten Pfeiffer (he likes Kant :) and he
told me to send a message to this list to start a discussion if and where
Kant could be integrated.

I just released a new Kant version (kant-0.2.0-prerelease) on sourceforge.net
for testing the app that you have an overview about its features.

Kant Homepage:

newest Kant version to download:

nice screenshot of Kant:

If you want to look at the unstable development code just look into the Kant
CVS at sourceforge.net, you find the exact description to checkout at the
Kant Homepage under "CVS" (cvs-modulename: kant).

Kant links dynamic to kwritepart and konsolepart. This must be considered if
you want to put Kant into kde cvs.

I hope you all like Kant, I think it would be a nice replacement for KWrite.

cu and thx

Sorry for my poor English and the big tar.gz file (something isn't right with
make dist in kant, must fix it :)

After some changes to the code, we were allowed to move the development completely to KDE CVS.
Joseph Wenninger joined the development, too.

Btw., my nice old e-mail footer:

| |  / /   - get an edge in editing -
| | / /    »»»» GET KANT ««««
| |/ /     a fast and capable multiple document,
|    \     multiple view text editor for KDE
| |\  \
| | \  \   http://devel-home.kde.org/~kant

Whereas Kant was just a fine name for us, its pronunciation in English was not that politically correct ;)
Therefore we searched for a new name:

From: Cullmann Christoph <crossfire@babylon2k.de>
To: David Faure <david@mandrakesoft.com>
Subject: Hi, is Kate a good name ?
Date: Sat, 31 Mar 2001 15:15:47 +0200

Hi David,
would be Kate a political correct name for Kant ?

Kate - KDE Advanced Text Editor


We even asked the developers of Katy, another text editor, if they would have problems with that name change ;)
Kate was born ;)
