Cosmic Text: Pure Rust multi-line text handling

eitland · on March 3, 2023

> The UDHR (Universal Declaration of Human Rights) test involves taking the entire set of UDHR translations (almost 500 languages), concatenating them as one file (which ends up being 8 megabytes!), then via the editor-test example, automatically simulating the entry of that file into cosmic-text per-character, with the use of backspace and delete tested per character and per line.

Now that is testing.
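
To make the idea in the quote concrete, here is a rough sketch of that kind of per-character simulation. `ToyBuffer` and `simulate_entry` are hypothetical names invented for this sketch; this is not cosmic-text's API or its editor-test example, and it only covers the per-character part, not the per-line part. Python also iterates by code point rather than by grapheme, which a real text engine would need to care about.

```python
class ToyBuffer:
    """Hypothetical stand-in for an editor buffer; not cosmic-text's API."""

    def __init__(self):
        self.chars = []
        self.cursor = 0

    def insert(self, ch):
        self.chars.insert(self.cursor, ch)
        self.cursor += 1

    def backspace(self):
        # Remove the character before the cursor, if any.
        if self.cursor > 0:
            self.cursor -= 1
            del self.chars[self.cursor]

    def delete(self):
        # Remove the character under the cursor, if any.
        if self.cursor < len(self.chars):
            del self.chars[self.cursor]

    def move_left(self):
        self.cursor = max(0, self.cursor - 1)

    def text(self):
        return "".join(self.chars)


def simulate_entry(sample):
    buf = ToyBuffer()
    for ch in sample:
        before = buf.text()
        # Type the character, then undo it with backspace.
        buf.insert(ch)
        buf.backspace()
        assert buf.text() == before
        # Type it again, step left, and undo it with delete.
        buf.insert(ch)
        buf.move_left()
        buf.delete()
        assert buf.text() == before
        # Finally type it for real and move on.
        buf.insert(ch)
    return buf.text()
```

Feeding a multilingual sample through `simulate_entry` and comparing the result with the input gives a small-scale version of the same check.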

I recently got a tip from someone who had worked on a project I was on: he suspected that some validation code didn't cover all cases.

There were maybe three tests, which were all green.

So I generated 50 random, but known valid cases and, voila, a few of them failed.

From there it was easy to fix the issue.
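
Roughly what that looked like, as a minimal sketch. The actual validation code isn't shown here, so `validate` is a made-up stand-in with a deliberate bug (its octet pattern stops at 199), and `random_valid_address` is a hypothetical generator whose output is valid by construction.

```python
import random
import re

# Hypothetical validator with a deliberate bug: the octet pattern
# only covers 0-199, so any address containing 200-255 is rejected.
_OCTET = r"[01]?\d?\d"
_IPV4 = re.compile(rf"{_OCTET}(?:\.{_OCTET}){{3}}")


def validate(address: str) -> bool:
    return _IPV4.fullmatch(address) is not None


def random_valid_address(rng: random.Random) -> str:
    # Valid by construction: four octets, each in 0..255.
    return ".".join(str(rng.randint(0, 255)) for _ in range(4))


rng = random.Random(2023)  # fixed seed so the run is repeatable
cases = [random_valid_address(rng) for _ in range(50)]
failures = [c for c in cases if not validate(c)]
print(f"{len(failures)} of {len(cases)} valid cases were rejected")
```

Every entry in `failures` is a known-valid input that the validator wrongly rejects, which is usually enough to point straight at the bug.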

MrJohz · on March 3, 2023

I've been really enjoying using Quickcheck & friends recently for test cases, which is essentially that, but doing the test case generation while the tests are running (and if it finds an error, trying to reduce that error to its simplest form). It's been very useful for catching little things that I'd forgotten about, like handling a particular case, or unusually shaped data, or things like that.

The biggest difficulty is trying to create good test cases. Unlike normal tests where you're asserting that some specific case is true, with quickcheck-style tests, you're instead trying to find an invariant in your code that will always be true. So it's quite easy to do "this algorithm never throws an error" but harder to do "this algorithm returns true in these cases and false in these cases". That said, a lot of tools in this space give you a lot of flexibility for generating cases, like saying "generate strings matching this regex" or "generate these arbitrary primitives and map them into the correct structure with this function".

The randomness involved isn't ideal, because you can't necessarily guarantee that the same errors will always show up every time, but it's a good habit to copy the failing generated case into its own test as a kind of regression test. And obviously still write a lot of the standard test cases for more obvious problem points.
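
A hand-rolled sketch of the idea, using only the standard library rather than Quickcheck itself. `remove_duplicates` is a hypothetical function with a deliberate bug, `holds` is the invariant, and `shrink` is a very naive reducer; real tools are far smarter about both generation and shrinking.

```python
import random


def remove_duplicates(items):
    # Deliberately buggy: only drops *adjacent* repeats.
    out = []
    for item in items:
        if not out or out[-1] != item:
            out.append(item)
    return out


def holds(items):
    # Invariant: the result never contains the same value twice.
    result = remove_duplicates(items)
    return len(result) == len(set(result))


def shrink(items):
    # Greedy shrinking: keep deleting single elements while the
    # invariant still fails, until no deletion keeps it failing.
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            candidate = items[:i] + items[i + 1:]
            if not holds(candidate):
                items, changed = candidate, True
                break
    return items


def find_counterexample(seed, runs=200):
    rng = random.Random(seed)
    for _ in range(runs):
        case = [rng.randint(0, 3) for _ in range(rng.randint(0, 10))]
        if not holds(case):
            return shrink(case)
    return None


counterexample = find_counterexample(seed=0)
print("smallest failing input found:", counterexample)
```

The shrunk list is the thing worth pasting into an ordinary test as a regression case, since it no longer depends on the random generator.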

boxed · on March 3, 2023

You might want to check out mutation testing. It's obviously not the same thing at all, but it has some advantages: it's a finite process (you know when you're done), and it's often fairly easy to write a test once you get a surviving mutant. It's a much less cognitively demanding process while at the same time finding a lot of problems with your test suite.
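
For anyone who hasn't seen it, a toy illustration of the principle. Real mutation tools generate the mutants automatically from your source; here the three mutants are written by hand, and `make_is_adult`, `weak_suite`, and `better_suite` are hypothetical names for this sketch only.

```python
import operator


def make_is_adult(compare, threshold):
    # Builds the check from a comparison and a threshold so that
    # mutants can be produced by swapping either one.
    return lambda age: compare(age, threshold)


original = make_is_adult(operator.ge, 18)

mutants = {
    "ge -> gt": make_is_adult(operator.gt, 18),
    "ge -> le": make_is_adult(operator.le, 18),
    "18 -> 19": make_is_adult(operator.ge, 19),
}


def weak_suite(is_adult):
    # Only checks values far away from the boundary.
    return is_adult(30) and not is_adult(5)


def better_suite(is_adult):
    # Adds the boundary cases that the surviving mutants point at.
    return weak_suite(is_adult) and is_adult(18) and not is_adult(17)


def survivors(suite):
    # A mutant "survives" if the suite still passes with it swapped in.
    return [name for name, mutant in mutants.items() if suite(mutant)]


print(survivors(weak_suite))    # ['ge -> gt', '18 -> 19']
print(survivors(better_suite))  # []
```

Each surviving mutant names a specific gap, which is why writing the missing test tends to be straightforward.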

MrJohz · on March 4, 2023

I really like the idea of mutation testing, but I've never found a situation where I've got it to work well. Either the project just didn't have the test support needed to make it work, or the mutation toolkit wasn't mature enough, or it didn't work with the specific tools I was already using. When you suggested it, I had a go again with a project I'm working on using Vitest and Stryker, but unfortunately Stryker just doesn't work with Vitest yet.

I'd definitely love to use it more though, it's one of those things I come back to every so often, unfortunately thus far without much success.

CGamesPlay · on March 3, 2023

Quickcheck tests are especially easy to use in cases where you wrote code that operates in two directions. For example, saving then loading a file should result in the same data, or a round-trip through a conversion process shouldn't modify the data. Alternatively, if there are two paths to do something, you can verify they are identical, for example saving a file and loading it compared to online syncing a file to a server, should result in the same file on the other end.
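
A minimal sketch of both patterns with a toy run-length encoder, using plain `random` instead of a Quickcheck library. `encode`, `encode_loop`, and `decode` are hypothetical functions written for this example; `encode_loop` plays the role of the "second path" that should agree with the first.

```python
import random
from itertools import groupby


def encode(text):
    # Run-length encode: "aab" -> [("a", 2), ("b", 1)]
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def encode_loop(text):
    # A second, independent implementation of the same encoding.
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def decode(pairs):
    return "".join(ch * count for ch, count in pairs)


def check_properties(seed, runs=500):
    rng = random.Random(seed)
    for _ in range(runs):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab \n") for _ in range(length))
        # Round trip: decoding the encoding gives back the input.
        assert decode(encode(text)) == text, repr(text)
        # Two paths: both encoders must agree on every input.
        assert encode(text) == encode_loop(text), repr(text)
    return runs
```

Neither assertion needs to know what the "right" encoding of any particular string is, which is what makes these properties cheap to write.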

dan-robertson · on March 3, 2023

There are two improvements that I think can be made to quick-check style testing. One is replacing the random source with bytes from a buffer and having a way to go from test cases to bytes. Hypothesis (python) does this. This means that you can connect a fuzzer like afl instead of generating purely random inputs. Another is setting up test-runners. If you have automated tests as part of CI, you want a fixed seed and deterministic ‘random’ inputs and a small number of runs so tests are fast and reliable. You can still catch obvious bugs but less likely bugs are harder to find. But if you can easily find all the tests and run them all the time with many more different inputs, hopefully you’ll have a much higher chance of finding rare bugs.
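
A rough sketch of the first idea, and of the fixed-seed CI setup. This is not Hypothesis's actual API or internals; `ByteSource`, `draw_int_list`, and `case_from_bytes` are hypothetical names, and the encoding (one length byte, then one byte per element) is just an assumption to keep the example small.

```python
import random


class ByteSource:
    """Hands out values decoded from a fixed byte buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def draw_byte(self) -> int:
        # Past the end of the buffer, fall back to zeros so that
        # every buffer, however short, still decodes to a test case.
        if self.pos >= len(self.data):
            return 0
        value = self.data[self.pos]
        self.pos += 1
        return value


def draw_int_list(source):
    length = source.draw_byte() % 8
    return [source.draw_byte() for _ in range(length)]


def case_from_bytes(data: bytes):
    # Same bytes in, same test case out: a fuzzer can drive this
    # directly by mutating `data`.
    return draw_int_list(ByteSource(data))


def random_buffer(rng, size=16):
    return bytes(rng.randrange(256) for _ in range(size))


# CI-style run: fixed seed, small number of deterministic cases.
rng = random.Random(1234)
ci_cases = [case_from_bytes(random_buffer(rng)) for _ in range(5)]
```

Because the test case is a pure function of the buffer, the same entry point serves the seeded CI run, a long-running job with fresh seeds, and a coverage-guided fuzzer supplying the bytes.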

One problem though is that being better at finding bugs isn’t always great: if you find bugs you might feel the desire to fix them, but for many software teams having rare bugs is acceptable, even if they aren’t rare in absolute terms (i.e. a 1-in-10,000,000 bug when you have a billion chances for it to happen a day).
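
To put a number on that parenthetical, assuming each chance is independent and has the same failure probability:

```python
from fractions import Fraction

failure_rate = Fraction(1, 10_000_000)  # the 1-in-10,000,000 bug
chances_per_day = 1_000_000_000         # a billion chances a day

expected_failures_per_day = failure_rate * chances_per_day
print(expected_failures_per_day)  # 100
```

So under those assumptions the "rare" bug would be expected to show up about a hundred times a day.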

jitl · on March 3, 2023

This guy is a legend. He started & built Redox OS, which has a very cool design.

https://www.redox-os.org/

lewisjoe · on March 3, 2023

Woah! That's some impressive piece of work.

I've been thinking about how browsers are eventually proving to be bad at complex text rendering, like for a word processor. You either have to make do with mimicking a text layout engine on top of the DOM or use the browser canvas APIs; both of them lock your codebase into a single target (the browser).

The right way feels like using a cross platform text rendering engine like Skia (or this one, if it could be compiled to WASM and somehow made to write to a browser canvas)

These projects are important for that future. The one where word-processor like apps can be written cross platform by using libraries like these.

tayistay · on March 3, 2023

What’s cool about the Redox design?

exDM69 · on March 3, 2023

This is awesome, thanks to the authors of this, as well as all the authors involved in the Rust text rendering ecosystem.

While browsing this, I found the Rustybuzz project which implements the Harfbuzz shaping algorithm in pure Rust. Harfbuzz is yet another example of a mission critical piece of infrastructure that is maintained with meager resources. It is excellent to have an alternative.

Text rendering is truly a daunting engineering problem.

The safety aspects of Rust are well suited to this problem space, as the fonts and the text we render are often coming from untrusted sources. TrueType font parser bugs have been exploited in the past.

Thanks to everyone who has put their time and effort into this. Truly important work.

nicoburns · on March 3, 2023

Amazingly, Rust actually has three high quality alternatives to HarfBuzz: RustyBuzz, Allsorts, and Swash.

- https://github.com/RazrFalcon/rustybuzz

- https://github.com/yeslogic/allsorts

- https://github.com/dfrg/swash

brundolf · on March 3, 2023

Very cool! And this is under the pop-os github org, which I assume means it's being developed for Pop_OS and shared with the community

wging · on March 3, 2023

The 'Cosmic' name also indicates that; see https://github.com/pop-os/cosmic.

erlend_sh · on March 3, 2023

This has already been merged into iced, which is now a default UI framework for PopOS: https://github.com/iced-rs/iced/pull/1697

nicoburns · on March 3, 2023

I don't think it's actually merged into Iced yet. It is merged into https://github.com/vizia/vizia though.

Aeolun · on March 3, 2023

That’s not merged to master though. But rather a new branch related to text rendering.

lewantmontreal · on March 3, 2023

From what I understand, in order to use CJK languages on Linux you currently have to install not a keyboard layout but a separate input device, which makes getting started quite difficult. I wonder if the pop-os devs have anything up their sleeve for that.

TT-392 · on March 3, 2023

Not sure what the experience is like when just using the GUI on a normal distro. But on my arch setup I just installed fcitx5, fcitx5-mozc, set up the environment variables to actually use fcitx, and make it autostart, and I had japanese input working just fine.

I am guessing that most normal people desktop distros like popos just have something like fcitx set up. In which case you'd just have to install the plugin, or, I wouldn't be surprised if it comes preinstalled on some of them.

lewantmontreal · on March 3, 2023

Yes, sounds right. When trying out Fedora I installed the fcitx5 tools, added some env vars and virtual keyboard settings and rebooted. Then I had CJK working on a virtual keyboard device.

It’s just very different from MacOS and Windows where you just add a keyboard layout and you’re done, that’s why I was curious. Admittedly I love the amount of customisation options on fcitx.

Karliss · on March 3, 2023

What about input method editors? Don't you need to install those on Windows and macOS as well (unless you are happy with the method provided by the OS)? Or are the third-party input method editors less popular now? (I don't use any of the languages requiring them, so I don't know how widely they are used at the moment.)

TT-392 · on March 3, 2023

or am I not interpreting your comment correctly here?

t3rra · on March 3, 2023

IDK about Chinese or Japanese, but at least for Korean, there is no need for a separate input device at all.

adastra22 · on March 3, 2023

Why would you need a separate input device?