Syntax Highlighting Checker

The KTextEditor Framework has used the syntax highlighting files provided by the KSyntaxHighlighting Framework since the KDE Frameworks 5.28 release.

The KSyntaxHighlighting Framework implements Kate’s highlighting system and is by now used in quite a few applications (e.g. LabPlot, KDE PIM). A nice property of the KSyntaxHighlighting framework is that it is well unit tested. And while we do not have tests for all highlighting files, we still provide some quality assurance through a compile-time checker.

How does it work? Well, in former times, Kate loaded all highlighting .xml files from disk (through the KTextEditor framework). Over time this led to a slow startup, since there are more than 250 .xml files, each of which needed a stat system call at startup.

With the KSyntaxHighlighting Framework, all these xml files are compiled into a Qt resource (qrc file), which is then included in the KSyntaxHighlighting library.

To create the Qt resource file, we need to iterate over all available xml files anyway. We take this opportunity to also scan the highlighting files for common mistakes.
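As a rough illustration of the resource step, the following Python sketch builds the text of a Qt resource collection from a list of xml files. It is not the framework's actual build tool: the function name `build_qrc` and the `/syntax` prefix are assumptions made for this example, and only the general `.qrc` layout (`RCC`, `qresource`, `file` with an `alias`) comes from Qt.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def build_qrc(xml_paths, prefix="/syntax"):
    """Return the text of a Qt resource collection listing the given files.

    The prefix is a placeholder chosen for this sketch, not the one the
    framework really uses.
    """
    lines = ["<RCC>", f"  <qresource prefix={quoteattr(prefix)}>"]
    for path in sorted(Path(p) for p in xml_paths):
        # The alias keeps only the file name, so lookups inside the
        # resource do not depend on the source tree layout.
        alias = quoteattr(path.name)
        lines.append(f"    <file alias={alias}>{escape(path.as_posix())}</file>")
    lines += ["  </qresource>", "</RCC>"]
    return "\n".join(lines) + "\n"
```

Since this loop already visits every highlighting file, it is a natural place to hook in per-file checks before the file is added to the resource.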

As of today, we are checking the following:

  1. RegExpr: A warning is raised if a regular expression has syntax errors.
  2. DetectChar: A warning is raised if the char="x" attribute does not contain exactly one character, e.g. char="xyz", or char="\\" (no escaping required), or similar.
  3. Detect2Chars: Same as DetectChar, just for char="x" and char1="y".
  4. Keyword lists: A warning is raised if a keyword entry contains leading or trailing spaces. Additional trimming just takes time.
  5. Keyword lists: A warning is raised if a keyword list is unused.
  6. Keyword lists: A warning is raised if multiple keyword lists use the same name (identifier clash).
  7. Keyword lists: A warning is raised if a non-existing keyword list is used.
  8. Contexts: A warning is raised if a non-existing context is referenced.
  9. Contexts: A warning is raised if a context is unused.
  10. Contexts: A warning is raised if multiple contexts have the same name (identifier clash).
  11. Attributes: A warning is raised if non-existing itemData is used.
  12. Attributes: A warning is raised if multiple itemDatas use the same name (identifier clash).
  13. Attributes: A warning is raised if an itemData is unused.

These checks help us catch many mistakes at compile time, even before running unit tests.
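To give an idea of what such checks look like, here is a minimal Python sketch covering the single-character rules and the keyword list checks from the list above. It is not the checker shipped with the framework: the function name `check_highlighting` and the warning texts are made up for this example, and it assumes the usual layout of a highlighting file, with `list`/`item` elements for keyword lists and `keyword` rules that refer to a list through their `String` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_highlighting(xml_text):
    """Return a list of warning strings for one highlighting definition."""
    root = ET.fromstring(xml_text)
    warnings = []

    # Single-character rules: each listed attribute must hold exactly one character.
    for tag, attrs in (("DetectChar", ("char",)), ("Detect2Chars", ("char", "char1"))):
        for rule in root.iter(tag):
            for attr in attrs:
                value = rule.get(attr, "")
                if len(value) != 1:
                    warnings.append(f"{tag}: {attr}={value!r} is not a single character")

    # Keyword lists: entries with surrounding whitespace.
    lists = list(root.iter("list"))
    for lst in lists:
        list_name = lst.get("name", "")
        for item in lst.iter("item"):
            text = item.text or ""
            if text != text.strip():
                warnings.append(
                    f"list {list_name!r}: item {text!r} has leading or trailing spaces"
                )

    # Keyword lists: the same name defined more than once.
    defined = Counter(lst.get("name", "") for lst in lists)
    for name, count in defined.items():
        if count > 1:
            warnings.append(f"list {name!r} is defined {count} times")

    # Keyword lists: compare the defined names with the names that rules use.
    used = {rule.get("String", "") for rule in root.iter("keyword")}
    for name in sorted(set(defined) - used):
        warnings.append(f"list {name!r} is unused")
    for name in sorted(used - set(defined)):
        warnings.append(f"list {name!r} is used but not defined")

    return warnings
```

Calling `check_highlighting` on the text of one xml file returns an empty list when none of these checks trigger. The context and itemData checks follow the same pattern: collect the defined names, collect the referenced names, and compare the two sets.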

Update (2017-12-17): All above issues are fixed for all highlighting files starting with the KSyntaxHighlighting 5.42 framework, to be released in January 2018.

9 thoughts on “Syntax Highlighting Checker”

  1. So how do you handle updates to those files?
    (Or maybe you don’t and that would explain why the “install” operation of updated syntax files never seems to complete for me ;) )

    1. Just like before, KSyntaxHighlighting still looks on disk for xml files. If a file on disk has a higher version than the bundled one, we prefer the file on disk. So there is absolutely no change compared to the previous behavior, except that it is much faster.

      Related to this, since the frameworks are released every month, updates also reach users faster (provided the distros ship them). So there is not really a need anymore for local xml files via the Download dialog or similar.

  2. Several years ago, I wrote a syntax highlighting file for asciidoc. I used this file for doc:

    All was good under KDE4.
    But under Plasma I see a Kate error in journald,
    “Rule: Unknown format” and it shows the attribute name & context.

    1. Where is the current doc for writing syntax highlighting files?
    2. Are there any tools for debugging?

    Also missing: there used to be a common Alerts file –

    1. To come back to my own remarks, there’s something else I’ve come to realise. With everything being built in there’s no longer an easy way for users to get rid of formats they’d never use. Not so much to make room, but to avoid clutter in the selection menu when you have to pick a format manually, as well as mis-categorisation (Matlab, Octave, and a Magma something I never heard of all use a .m file extension which clashes with the .m extension of ObjC files, which are the only ones relevant to me).

      Ideally users should be able to control what formats are available per application, and I think that a good UI for that would probably provide a central utility to select the formats relevant to the user, in addition to the current UI which should allow enabling/disabling individual formats (using a checkbox next to the entries in a hierarchical list?).

      Probably a topic that requires a bit of careful thought and discussion, because it applies across all platforms, and sadly there is still no generic settings utility that is NOT part of Plasma.

  3. IIRC, Kate has never correctly identified my .m files as Octave code. For a while they were Objective C, now they’re Magma. We already have a menu where we can view and modify syntax highlighters in the settings, but we cannot choose which ones to enable/disable.

    The whole idea of identifying the file type based on the file extensions alone was broken from the start, but works well enough for most people to not notice the issue anyway. It’s simply not a reliable system when everyone keeps using single letter acronyms for their languages.

    We really should have some basic analysis of the file, or at the very least the previously mentioned menu override.

    1. Sorry, but what you write is simply not true: you go to the Open/Save config page, go to the Modes&Filetypes tab, choose Octave, and raise the priority, say, to 10. And your problem will be fixed.

      1. There could be a tooltip on that priority widget because it’s never been clear to me what it’s supposed to do. And I fully agree: there could and should be a way to disable types that aren’t relevant. That would also unclutter the pop-up selection (which is in dire need of that).

          1. Well, the default typically also works out of the box for me in apps where I have a say in the default, or when I use the entire feature set (= edit only files that represent a small subset of the supported list of filetypes).

            In this case I have ideas but not really the experience (nor the time required to acquire that experience) to provide proper patches, the tooltip aside (but adding that should be a trivial effort for one of the regular maintainers of the framework in question).
