MIT licensed KSyntaxHighlighting usage

With the KDE Frameworks 5.50 release, the KSyntaxHighlighting framework was re-licensed to the MIT license.

This re-licensing only covers the actual code in the library and the bundled themes but not all of the syntax highlighting definition data files.

One of the main motivations was to let QtCreator use this framework, if possible, instead of its own implementation of the Kate highlighting. That implementation had to be written in the past because of the incompatible licensing of KatePart at that time (and because a quick split and re-licensing of the parts in question was not possible).

We talked about that possibility at Akademy this year, and it seems that, if time permits, the QtCreator team will pick this up.

In its current state, this tier 1 framework can be used by projects like Qt(Creator) that require non-copyleft licenses for bundled 3rd-party source components, and also by commercial applications that link statically against a commercial Qt version.

While the QtCreator integration has not started yet (at least I am not aware of any work on it), a first commercial consumer already exists.

AbsInt, the company I work at, develops both binary and source level analysis tools. Our GUI is Qt based and statically linked under a commercial Qt license.

Before the current release, our GUI used a handcrafted highlighter for our own annotation languages and the programming languages we support (e.g. C and C++). After the release of the 5.50 MIT licensed KSyntaxHighlighting, this was changed to use the framework through its QSyntaxHighlighter implementation.

The framework was easy to integrate into our static build process. To use it without violating the licenses of the bundled highlighting definitions that are not MIT, and to ensure that no other installed instance of the framework interferes with the shipped highlighting definitions, the following two changes were contributed upstream.

A CMake switch to disable the bundling of the syntax definition data files into the library. This avoids mixing non-MIT files into the created static library, which then only contains MIT licensed code and data. One can then either let people download the definitions or ship some of them as extra data files under their own license.

cmake -DQRC_SYNTAX=OFF

A CMake switch to disable the lookup for syntax and theme definitions in the normal locations via QStandardPaths. This allows the user of the library to load definitions only from manually specified search paths. Definitions that users have installed for Kate or other applications using the framework will not interfere with your lookup, which is really important if you rely on exactly your files being used.

cmake -DNO_STANDARD_PATHS=ON
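The two switches can be combined in a single configure step. The following is only a sketch under a few assumptions: the directory names are hypothetical, the `-S`/`-B` options need a reasonably recent CMake, and `BUILD_SHARED_LIBS=OFF` is the generic CMake variable for requesting static libraries, shown here as one plausible way to get a static build rather than as the exact setup described above.

```shell
# Hypothetical directory names; adjust them to your checkout.
SRC_DIR=syntax-highlighting
BUILD_DIR=build-syntax-highlighting

# Configure without bundled definitions and without the QStandardPaths lookup.
# BUILD_SHARED_LIBS=OFF is an assumption: the generic CMake way to ask for static libraries.
cmake -S "$SRC_DIR" -B "$BUILD_DIR" \
    -DQRC_SYNTAX=OFF \
    -DNO_STANDARD_PATHS=ON \
    -DBUILD_SHARED_LIBS=OFF

# Build the library with the options chosen above.
cmake --build "$BUILD_DIR"
```

With both switches set, the library finds no definitions on its own, so the application has to ship its definition files and point the framework at them through manually specified search paths.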

These two options might be interesting for the QtCreator people, too. If they need additional configurability, I am sure we can find ways to integrate that.

After the transition, my colleagues compared the speed of the old implementation with that of the new generic highlighting engine. At first, they were not that impressed, which led to several performance improvements being implemented and upstreamed.

All directly visible bottlenecks got perf’d away. Most of the CPU consumption now more or less boils down to the cost of the regular expressions, which are evaluated via QRegularExpression. The same goes for allocations: we reduced them by looking at the heaptrack profiles for the KSyntaxHighlighting benchmark suite.

But as always, performance work is never done. If you have time, you can take a look by profiling the “highlighter_benchmark” autotest, which applies the shipped highlightings to the test files we have in the repository.

There is no divergence in the local git clone at AbsInt at the moment, nor is there any plan to have that in the future. Both sides profit from up-streaming the changes. Other consumers of the framework get improvements and AbsInt doesn’t need to maintain a patched version.

Starting with the 18.10 release of our tools, all highlighting is handled by the framework: no more error-prone handcrafting of QSyntaxHighlighter implementations.

Thanks to everyone who helped make this possible ;=)

I hope more projects and companies will pick up this tier 1 framework, which depends only on Qt, and upstream their improvements. You are very welcome to do so.

7 thoughts on “MIT licensed KSyntaxHighlighting usage”

  1. The option not to embed the definition files is interesting and probably welcome even for distro packagers. (See my other remarks about how many there are and the probable SNR for most average users.)

    I wonder though about the removed definition download/update feature. I can see how it makes little sense when definitions are embedded AND you install each and every update of the framework they’re embedded in. But isn’t its absence going to be missed when you don’t embed the definitions, whatever the reason?

  2. Distros shouldn’t use that debundling. The bundled XML stuff makes everything a lot faster, as hundreds of files no longer need to be stat’ed in the normal use case where you ship all highlightings.

    The normal use-case for non-embedding is:

    You want to use just the files you hand-picked and/or you want to control the license stuff. Then there is no need to update/download. You normally want to ensure that exactly the right version is used; syntax updates can break indenters very badly, for example.

    1. I’d say it’s up to distros to decide whether or not they’re going to use the feature. It’s not at all uncommon to put the non-binary/non-arch-dependent resources in a separate package for instance.

      You pinpoint the issue I see: HUNDREDS of files. Yes, loading those takes a bit of time – but let’s be realistic and consider the fact that in many scenarios where someone complains about load times the reaction will be that current-day SSDs are more than fast enough to make that a non-issue. Just like I’d get the argument “disk space is cheap” if I were to complain that all those files take space.
      And when we’re being realistic: just how many profiles does the average user need … and what remains of that speed gain you get from embedding when you ONLY have the files installed you really need?

      What you cannot compensate for with faster hardware is UI clutter. With embedding, users have no way to prune the definitions they have online and among which they have to choose if automatic selection doesn’t work (this goes for the pop-up widgets but even more so for the “flat” filetype list in the kate mode config page).

      I can see why you want to avoid embedding files when this leads to multiple, conflicting licenses being applied to the bundled product. But that consideration doesn’t apply to downloading content. Providers concerned about controlling what definition files are available could 1) deactivate the download feature (via a simple cmake option) or 2) provide their own profile server and let the download feature use that source (which should be relatively trivial to implement too).

      More options/hooks to provide a (power)user-friendly experience can only make your product more appealing to a wider audience – that’s what’s been driving the efforts reported here, no?

      1. You are wrong: It’s not up to distros how to bundle this, since we decide how the product is shipped. The feature described above is solely for standalone deployment e.g. in QtCreator. And the hundreds of hl files are not a problem, they are a feature. We did not get a bug report about having too many files in the last 15+ years. There are enough solutions to choosing one hl over another: modelines, kateconfig, modes&filetypes ui, the command set-highlight, …

        1. I guess we’ll have to agree to disagree then, because you’re not going to be able to convince me that anyone but distro maintainers (or their bosses) can decide how they ship things. Idem for my MacPorts port: I will definitely be looking into alternatives to embedding the whole s*load of definition files.

          And if you want a bug report about the number of “hl” files and the resulting SNR … let me find some time to file one that explains the needle in haystack proverb once more O:^)

          1. Ok, but there are two issues with reducing hl files: i) many hl files use IncludeRules from other hl files, meaning that we have dependencies. ii) the download dialog did not take into account dependency management, so it was inherently broken. I am still not convinced that our many hl files are a problem, I rather consider it a highly successful feature.

        2. The fact that there are many hl files is (I think) firstly a measure of how widely kate-based editors are used, and secondly a measure of how easy it is to write hl files. Where of course that 2nd point may have contributed to the first.

          But I have a strong hunch that there would be far fewer hl files now if they had been embedded from the beginning, if only for the simple fact that simple users wouldn’t have had easy access to examples to tweak.

          I didn’t intend to suggest that the existing download manager was perfect – it can’t be because IIRC the KF5 version never actually did something for me other than showing which definitions were outdated. The download manager could also be a standalone application, and the hl files could be installed as a binary collection to keep the initial load time low (an LMDB db with lz4 compressed values, for instance).
