Encoding Detection Revised

In recent KDE releases up to version 4.4 Kate unfortunately very often selected the wrong encoding. The result is that e.g. german umlauts (öäü) show up as cryptic signs in the text editor. What I’ve seen lots of times is that in this case people start to fix those characters manually for the entire document. In other words: They totally do not get at all that the text document simply was opened with the wrong encoding. In fact, the users usually do not even know what encoding is at all. While this is of course kind of sad, this certainly won’t change…

Given this fact, the only correct “fix” is a very good automatic encoding detection, such that the encoding is usually chosen correctly. In the rewrite of Kate’s text buffer for KDE 4.5, Christoph also rewrote the file loader including the encoding detection. The detection now works as follows:

  1. try selected encoding by the user (through the open-file-dialog or the console)
  2. try encoding detection (some intelligent trial & error method)
  3. use fallback encoding

In step 1, Kate tries to use the encoding specified in the open-file-dialog or the one given when launching Kate from the console. On success, we are done.

The encoding detection in step 2 first tries unicode encoding by looking for a Byte Order Mark (BOM). If found, it is certain that the text document is unicode encoded.  If there is no BOM, Kate next uses a tool from KDElibs (KEncodingProber) to detect the correct encoding. This is basically trial & error: Try encoding A, if there are characters in the document the encoding is not able to represent, try encoding B. Then C and so on… Unfortunately, this also doesn’t always work, because a byte sequence might be valid in several encodings and represent different characters. This is why it’s more or less impossible to get the encoding always right. There is simply no way…

If the encoding detection fails, Kate uses a fallback encoding. You can configure this fallback encoding in the editor component settings in the “Open/Save” category. If the fallback encoding fails as well, the document is marked as read-only and a warning is shown.

What about Kile and KDevelop?

One of the applications that heavily suffered of the wrong encoding detection in the past was the LaTeX editor Kile. The same holds probably for KDevelop (although it’s usually less critical with source code). The good news is, that with KDE >= 4.5 the problems with respect to wrong encoding should be gone. So it’s certainly worth to update if you are affected by this issue.

Kate – GSoC Summary

Hello planet,

As Google Summer of Code is now finished and I have successfully passed the final evaluation, I would like to give a brief description of my project.

Kate is now able to recover (most of) what was written after last save in case of a crash or power failure. A swap file is created after the first editing action on a document that was successfully saved. If the user closes the document normally or saves its content, the swap file is deleted, otherwise, if Kate crashes, it remains on the disk. On load, Kate searches for the swap file, and if it exists, a warning bar pops from the top and provides the user with three possibilities: recover the lost data, discard the swap file or view differences between the original data and the recovered one. If the user chooses to restore the lost data, the editing actions from the swap file are replayed over the current content of the document. If somehow the swap file is not valid, for example a finishEditing statement is missing, the recovery is done, but the user is warned that it might be incomplete.

Only the core feature for swap file is implemented at the moment. I know I could have done more, but things went slow at the beginning, as I was new to Qt and KDE development and also had a demanding exam period. But this has a positive aspect, too, as will motivate me to continue my work at this project.

This has been a great summer for me as I was accepted into GSoC program and got a chance to do what I like and get paid for it. I want to thank Christoph, my mentor, for having patience with me and helping me with all the problems I have encountered. I also want to thank the whole Kate team, KDE community and Google :).