Encoding Detection Revised

In KDE releases up to version 4.4, Kate unfortunately very often selected the wrong encoding. The result is that, for example, German umlauts (öäü) show up as cryptic characters in the text editor. What I’ve seen many times is that people then start to fix those characters manually throughout the entire document. In other words, they do not realize that the text document was simply opened with the wrong encoding. In fact, users usually do not even know what an encoding is. While this is of course kind of sad, it certainly won’t change…

Given this fact, the only correct “fix” is very good automatic encoding detection, so that the encoding is usually chosen correctly. In the rewrite of Kate’s text buffer for KDE 4.5, Christoph also rewrote the file loader, including the encoding detection. The detection now works as follows:

  1. try the encoding selected by the user (through the open-file-dialog or the console)
  2. try encoding detection (some intelligent trial & error method)
  3. use the fallback encoding

In step 1, Kate tries to use the encoding specified in the open-file-dialog or the one given when launching Kate from the console. On success, we are done.

The encoding detection in step 2 first checks for a Unicode encoding by looking for a Byte Order Mark (BOM). If one is found, it is certain that the text document is Unicode encoded. If there is no BOM, Kate next uses a tool from KDElibs (KEncodingProber) to detect the correct encoding. This is basically trial & error: try encoding A; if the document contains byte sequences that are not valid in that encoding, try encoding B, then C, and so on… Unfortunately, this doesn’t always work either, because a byte sequence might be valid in several encodings and represent different characters in each. This is why it’s more or less impossible to always get the encoding right. There is simply no way…
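To make the BOM check and the ambiguity problem concrete, here is a minimal Python sketch. It only illustrates the trial & error idea described above; it is not how KEncodingProber or Kate's loader is actually implemented, and the function names `sniff_bom` and `valid_encodings` are made up for this example.

```python
import codecs

# BOMs ordered so that the 4-byte UTF-32 marks are checked before the
# 2-byte UTF-16 marks they begin with.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def valid_encodings(data: bytes, candidates):
    """Return every candidate encoding that decodes data without an error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# The single byte 0xE4 is not valid UTF-8, but it is valid in both
# single-byte encodings, where it stands for two different characters.
sample = b"\xe4"
print(sniff_bom(codecs.BOM_UTF8 + b"abc"))
print(valid_encodings(sample, ["utf-8", "latin-1", "cp1251"]))
print(sample.decode("latin-1"), sample.decode("cp1251"))
```

The last line prints two different letters for the same byte, which is exactly why "decodes without error" is not enough to pick the right encoding.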

If the encoding detection fails, Kate uses a fallback encoding. You can configure this fallback encoding in the editor component settings in the “Open/Save” category. If the fallback encoding fails as well, the document is marked as read-only and a warning is shown.
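The three steps, including the read-only case, can be sketched roughly as follows. This is a simplified Python model, not Kate's actual loader: `load_text`, `try_decode` and the `detect` callback are hypothetical names, and what Kate displays when everything fails is an assumption here (the sketch shows a lossy UTF-8 decode).

```python
from typing import Callable, Optional, Tuple


def try_decode(data: bytes, encoding: Optional[str]) -> Optional[str]:
    """Decode strictly; return None if no encoding is given or decoding fails."""
    if not encoding:
        return None
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return None


def load_text(
    data: bytes,
    user_encoding: Optional[str],
    detect: Callable[[bytes], Optional[str]],
    fallback_encoding: str,
) -> Tuple[str, str, bool]:
    """Return (text, encoding_used, read_only)."""
    # Step 1: the encoding the user selected, if any.
    text = try_decode(data, user_encoding)
    if text is not None:
        return text, user_encoding, False

    # Step 2: automatic detection (hypothetical detector callback).
    guessed = detect(data)
    text = try_decode(data, guessed)
    if text is not None:
        return text, guessed, False

    # Step 3: the configured fallback encoding.
    text = try_decode(data, fallback_encoding)
    if text is not None:
        return text, fallback_encoding, False

    # Everything failed: mark the document read-only. Showing a lossy
    # UTF-8 decode here is an assumption made for this sketch.
    return data.decode("utf-8", errors="replace"), "utf-8", True


latin1_bytes = "Grüße".encode("latin-1")
# The user picked UTF-8, detection finds nothing, the fallback is Latin-1.
print(load_text(latin1_bytes, "utf-8", lambda d: None, "latin-1"))
```

A caller would show the warning whenever the third element of the result is `True`.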

What about Kile and KDevelop?

One of the applications that suffered heavily from wrong encoding detection in the past was the LaTeX editor Kile. The same probably holds for KDevelop (although it’s usually less critical with source code). The good news is that with KDE >= 4.5 the problems with wrong encoding detection should be gone. So it’s certainly worth updating if you are affected by this issue.

Kate – GSoC Summary

Hello planet,

As Google Summer of Code is now finished and I have successfully passed the final evaluation, I would like to give a brief description of my project.

Kate is now able to recover (most of) what was written after the last save in case of a crash or power failure. A swap file is created after the first editing action on a document that was successfully saved. If the user closes the document normally or saves its content, the swap file is deleted; otherwise, if Kate crashes, it remains on the disk. On load, Kate searches for the swap file, and if it exists, a warning bar pops up at the top and offers the user three options: recover the lost data, discard the swap file, or view the differences between the original data and the recovered data. If the user chooses to recover the lost data, the editing actions from the swap file are replayed over the current content of the document. If the swap file is somehow not valid, for example because a finishEditing statement is missing, the recovery is still done, but the user is warned that it might be incomplete.
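The replay idea can be illustrated with a small Python sketch. The operation names and the in-memory log format below are invented for illustration; Kate's real swap file format and API are different. The point is only that logged edits are applied in order on top of the saved content, and that an unmatched start of an editing transaction marks the recovery as possibly incomplete.

```python
from typing import List, Tuple


def replay(lines: List[str], log: List[tuple]) -> Tuple[List[str], bool]:
    """Replay logged edits over the saved lines.

    Returns (recovered_lines, complete). complete is False when an editing
    transaction was started but never finished, e.g. after a crash.
    """
    doc = list(lines)
    open_edits = 0  # nesting depth of start/finish pairs
    for op, *args in log:
        if op == "start_editing":
            open_edits += 1
        elif op == "finish_editing":
            open_edits -= 1
        elif op == "insert_text":
            line, col, text = args
            doc[line] = doc[line][:col] + text + doc[line][col:]
        elif op == "remove_text":
            line, col, length = args
            doc[line] = doc[line][:col] + doc[line][col + length:]
        elif op == "wrap_line":
            line, col = args
            doc[line:line + 1] = [doc[line][:col], doc[line][col:]]
    return doc, open_edits == 0


saved = ["Hello world"]
swap_log = [
    ("start_editing",),
    ("insert_text", 0, 5, ","),
    ("finish_editing",),
    ("start_editing",),
    ("wrap_line", 0, 6),
    # crash: the matching finish_editing was never written
]
recovered, complete = replay(saved, swap_log)
print(recovered, complete)
```

Here the edits are still applied, but `complete` is `False`, which corresponds to the case where the user is warned that the recovery might be incomplete.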

Only the core swap file feature is implemented at the moment. I know I could have done more, but things went slowly at the beginning, as I was new to Qt and KDE development and also had a demanding exam period. But this has a positive aspect, too, as it will motivate me to continue my work on this project.

This has been a great summer for me, as I was accepted into the GSoC program and got a chance to do what I like and get paid for it. I want to thank Christoph, my mentor, for having patience with me and helping me with all the problems I encountered. I also want to thank the whole Kate team, the KDE community and Google :).

Kate Night-Make

Each night, Kate (part/app) is now built from git and all unit tests are run. Yeah, we even got some tests :P
Until now, that just reminds us of our failures ;) But I hope the daily mail will motivate me and others to fix them ;)

Besides, thanks to all the people working on these tests, like Bernhard, Dominik and Milian.

Kate Nightmake Tests
From: cullmann@kate-editor.org
To: kwrite-devel@kde.org
Date: Today 05:06:46

[HANDLER_OUTPUT]
Test project /home/www/kate-editor.org/build/build

Start  1: katetextbuffertest
1/34 Test  #1: katetextbuffertest ...............   Passed    0.02 sec
Start  2: range_test
2/34 Test  #2: range_test .......................   Passed    1.09 sec
Start  3: testkateregression
3/34 Test  #3: testkateregression ...............***Failed    0.03 sec
Start  4: undomanager_test
4/34 Test  #4: undomanager_test .................   Passed    1.34 sec
Start  5: plaintextsearch_test
5/34 Test  #5: plaintextsearch_test .............   Passed    7.56 sec
Start  6: regexpsearch_test
6/34 Test  #6: regexpsearch_test ................   Passed    8.45 sec
Start  7: scriptdocument_test
7/34 Test  #7: scriptdocument_test ..............   Passed    1.36 sec
Start  8: completion_test
8/34 Test  #8: completion_test ..................***Failed    0.95 sec
Start  9: searchbar_test
9/34 Test  #9: searchbar_test ...................***Failed   11.24 sec
Start 10: movingcursor_test
10/34 Test #10: movingcursor_test ................   Passed    0.94 sec
Start 11: movingrange_test
11/34 Test #11: movingrange_test .................   Passed    1.79 sec
Start 12: katedocument_test
12/34 Test #12: katedocument_test ................   Passed    1.09 sec
Start 13: revision_test
13/34 Test #13: revision_test ....................   Passed    0.63 sec
Start 14: templatehandler_test
14/34 Test #14: templatehandler_test .............   Passed    0.44 sec
Start 15: indenttest
15/34 Test #15: indenttest .......................***Failed   25.87 sec
Start 16: bug213964_test
16/34 Test #16: bug213964_test ...................   Passed    0.73 sec
Start 17: utf8.txt_create
17/34 Test #17: utf8.txt_create ..................   Passed    0.17 sec
Start 18: utf8.txt_diff
18/34 Test #18: utf8.txt_diff ....................   Passed    0.01 sec
Start 19: latin15.txt_create
19/34 Test #19: latin15.txt_create ...............***Exception: SegFault  0.04 sec
Start 20: latin15.txt_diff
20/34 Test #20: latin15.txt_diff .................***Failed    0.01 sec
Start 21: utf32.txt_create
21/34 Test #21: utf32.txt_create .................   Passed    0.13 sec
Start 22: utf32.txt_diff
22/34 Test #22: utf32.txt_diff ...................   Passed    0.01 sec
Start 23: utf16.txt_create
23/34 Test #23: utf16.txt_create .................   Passed    0.10 sec
Start 24: utf16.txt_diff
24/34 Test #24: utf16.txt_diff ...................   Passed    0.01 sec
Start 25: utf32be.txt_create
25/34 Test #25: utf32be.txt_create ...............   Passed    0.10 sec
Start 26: utf32be.txt_diff
26/34 Test #26: utf32be.txt_diff .................   Passed    0.01 sec
Start 27: utf16be.txt_create
27/34 Test #27: utf16be.txt_create ...............   Passed    0.10 sec
Start 28: utf16be.txt_diff
28/34 Test #28: utf16be.txt_diff .................   Passed    0.01 sec
Start 29: cyrillic_utf8.txt_create
29/34 Test #29: cyrillic_utf8.txt_create .........   Passed    0.10 sec
Start 30: cyrillic_utf8.txt_diff
30/34 Test #30: cyrillic_utf8.txt_diff ...........   Passed    0.01 sec
Start 31: cp1251.txt_create
31/34 Test #31: cp1251.txt_create ................   Passed    0.10 sec
Start 32: cp1251.txt_diff
32/34 Test #32: cp1251.txt_diff ..................   Passed    0.01 sec
Start 33: koi8-r.txt_create
33/34 Test #33: koi8-r.txt_create ................   Passed    0.11 sec
Start 34: koi8-r.txt_diff
34/34 Test #34: koi8-r.txt_diff ..................   Passed    0.01 sec

82% tests passed, 6 tests failed out of 34

Total Test time (real) =  64.64 sec

The following tests FAILED:
3 - testkateregression (Failed)
8 - completion_test (Failed)
9 - searchbar_test (Failed)
15 - indenttest (Failed)
19 - latin15.txt_create (SEGFAULT)
20 - latin15.txt_diff (Failed)
[ERROR_MESSAGE]
Errors while running CTest