Syntax Highlighting Checker

The KTextEditor Framework has used the syntax highlighting files provided by the KSyntaxHighlighting Framework since KDE Frameworks 5.28.

The KSyntaxHighlighting Framework implements Kate’s highlighting system and is by now used in quite a few applications (e.g. LabPlot, KDE PIM). A nice property of the KSyntaxHighlighting Framework is that it is well covered by unit tests. And while we do not have tests for all highlighting files, we still provide some quality assurance through a compile-time checker.

How does it work? Well, in former times, Kate loaded all highlighting .xml files from disk (through the KTextEditor framework). Over time this led to a slow startup, since there are more than 250 .xml files, each of which needed a stat system call at startup.

With the KSyntaxHighlighting Framework, all these XML files are compiled into a Qt resource (qrc file), which is then included in the KSyntaxHighlighting library.

In order to create the Qt resource file, we need to iterate over all available XML files anyway. So we take this opportunity to also scan the highlighting files for common mistakes.
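To make the iteration step more concrete, here is a minimal Python sketch that walks a directory of highlighting files and produces the text of a .qrc file. It is not the tool KSyntaxHighlighting actually uses; the function name build_qrc and the "/syntax" resource prefix are assumptions made up for this illustration. Only the general .qrc layout (RCC, qresource, file elements) is taken from the Qt resource system.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def build_qrc(xml_dir, prefix="/syntax"):
    """Return the text of a .qrc file listing every .xml file in xml_dir.

    Both the function name and the default prefix are hypothetical.
    """
    lines = [
        "<!DOCTYPE RCC>",
        '<RCC version="1.0">',
        f'<qresource prefix="{escape(prefix)}">',
    ]
    # Sort the files so that the generated resource file is stable between runs.
    for path in sorted(Path(xml_dir).glob("*.xml")):
        # This loop is the natural place to also run checks on each file.
        lines.append(f"    <file>{escape(path.name)}</file>")
    lines += ["</qresource>", "</RCC>"]
    return "\n".join(lines) + "\n"
```

Since every file is visited once in the loop anyway, running extra checks at this point adds little overhead to the build.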

As of today, we are checking the following:

  1. RegExpr: A warning is raised, if a regular expression has syntax errors.
  2. DetectChar: A warning is raised, if the char="x" attribute contains more or less than one character, e.g. when char="xyz", or char="\\" (no escaping required), or similar.
  3. Detect2Chars: Same as DetectChar, but for both char="x" and char1="y".
  4. Keyword lists: A warning is raised, if a keyword entry contains leading or trailing spaces. Additional trimming just takes time.
  5. Keyword lists: A warning is raised if a keyword list is unused.
  6. Keyword lists: A warning is raised if multiple keyword lists use the same name (identifier clash).
  7. Keyword lists: A warning is raised if a non-existing keyword list is used.
  8. Contexts: A warning is raised, if a non-existing context is referenced.
  9. Contexts: A warning is raised, if a context is unused.
  10. Contexts: A warning is raised, if multiple contexts have the same name (identifier clash).
  11. Attributes: A warning is raised, if non-existing itemData is used.
  12. Attributes: A warning is raised, if multiple itemDatas use the same name (identifier clash).
  13. Attributes: A warning is raised, if an itemData is unused.

These checks help us catch many mistakes at compile time, even before running unit tests.
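As an illustration of what such checks can look like, here is a small Python sketch that covers only a few of the items above: keyword entries with surrounding spaces, duplicate, unused and non-existing keyword lists, and DetectChar rules whose char attribute is not exactly one character. It is not the actual checker. The function name check_definition and the warning texts are invented for this example, and the sketch assumes the usual layout of a highlighting file (list elements below highlighting, keyword rules referring to a list through their String attribute).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_definition(xml_text):
    """Return a list of warning strings for one highlighting definition."""
    warnings = []
    root = ET.fromstring(xml_text)

    # Keyword lists: names used more than once, entries with stray spaces.
    lists = root.findall("./highlighting/list")
    names = Counter(lst.get("name", "") for lst in lists)
    for name, count in names.items():
        if count > 1:
            warnings.append(f"keyword list '{name}' is defined {count} times")
    for lst in lists:
        for item in lst.findall("item"):
            text = item.text or ""
            if text != text.strip():
                warnings.append(f"keyword '{text}' has leading or trailing spaces")

    # Keyword lists: referenced but not defined, or defined but never used.
    used = {r.get("String") for r in root.iter("keyword") if r.get("String")}
    for name in sorted(used - set(names)):
        warnings.append(f"keyword list '{name}' is used but does not exist")
    for name in sorted(set(names) - used):
        warnings.append(f"keyword list '{name}' is unused")

    # DetectChar: the char attribute must hold exactly one character.
    for rule in root.iter("DetectChar"):
        char = rule.get("char", "")
        if len(char) != 1:
            warnings.append(f"DetectChar has char='{char}', expected one character")

    return warnings
```

The checks for contexts and itemData would follow the same pattern: collect the defined names, collect the referenced names, and compare the two sets.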

Update (2017-12-17): All above issues are fixed for all highlighting files starting with the KSyntaxHighlighting 5.42 framework, to be released in January 2018.

4 thoughts on “Syntax Highlighting Checker”

  1. So how do you handle updates to those files?
    (Or maybe you don’t and that would explain why the “install” operation of updated syntax files never seems to complete for me 😉 )

    1. Just like before, KSyntaxHighlighting still looks on disk for xml files. And if a file on disk has a higher version than the bundled one, then we prefer the file on disk. So there is absolutely no change compared to the behavior before, except that it is much faster.

      Related: since the frameworks are released every month, updates also reach users faster (provided the distros ship them). So there is no real need anymore for local xml files via the Download dialog or similar.

  2. Several years ago, I wrote a syntax highlighting file for asciidoc. I used this file for doc:

    https://kate-editor.org/2005/03/24/writing-a-syntax-highlighting-file/

    All was good under KDE4.
    But under Plasma I see a Kate error in journald,
    “Rule: Unknown format” and it shows the attribute name & context.

    1. Where is the current doc for writing syntax highlighting files?
    2. Are there any tools for debugging?

    Also missing: there used to be a common Alerts file –
    /usr/share/kde4/apps/katepart/syntax/alert.xml

    1. To come back to my own remarks, there’s something else I’ve come to realise. With everything being built in, there’s no longer an easy way for users to get rid of formats they’d never use. Not so much to save space, but to avoid clutter in the selection menu when you have to pick a format manually, as well as mis-categorisation (Matlab, Octave, and a Magma something I never heard of all use a .m file extension, which clashes with the .m extension of ObjC files, the only ones relevant to me).

      Ideally users should be able to control which formats are available per application, and I think that a good UI for that would probably provide a central utility to select the formats relevant to the user, in addition to the current UI, which should allow enabling/disabling individual formats (using a checkbox next to the entries in a hierarchical list?).

      Probably a topic that requires a bit of careful thought and discussion, because it applies across all platforms, and sadly there is still no generic settings utility that is NOT part of Plasma.
