Perceptual Color

Compiler character sets

Compilers have three different character sets:

  • Input character set (the character set of the source code)
  • Narrow execution character set (for char and for string literals without prefix)
  • Wide execution character set (for wchar_t and for string literals with L prefix)
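The difference between the two execution character sets shows up in the types of literals. The following is a minimal sketch, not taken from the library; the names narrowChar and wideChar are made up for illustration.

```cpp
#include <type_traits>

// A string literal without prefix is an array of const char. Its bytes are
// encoded in the narrow execution character set.
static_assert(std::is_same<decltype("a"), const char (&)[2]>::value,
              "Unprefixed string literals are arrays of const char.");

// A string literal with L prefix is an array of const wchar_t. Its elements
// are encoded in the wide execution character set.
static_assert(std::is_same<decltype(L"a"), const wchar_t (&)[2]>::value,
              "L-prefixed string literals are arrays of const wchar_t.");

// The same distinction applies to character literals.
// (Hypothetical names, for illustration only.)
constexpr char narrowChar = 'a';
constexpr wchar_t wideChar = L'a';
```

Both arrays have two elements: the character itself and the terminating null character.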

Input character set

The source code of this library is encoded in UTF-8. Therefore, your compiler must also treat it as UTF-8.

Why are we using UTF-8 instead of ASCII?

  • UTF-8 is more complete than ASCII. ASCII does not even provide basic typographic symbols such as the en dash, the em dash, the non-breaking space or typographic quotation marks.
  • Unicode has existed since 1991, UTF-8 since 1993. It’s time to get rid of the insufficient ASCII character set. It’s time to use Unicode.
  • We use non-ASCII characters for (typographically correct) Doxygen documentation and partially also for non-Doxygen source code comments. It would be quite annoying to use HTML entities for each non-ASCII character in the Doxygen documentation; and it would be pointless to do it for non-Doxygen source code comments.
  • i18n(), ki18n() and tr() require both the source file and char* to be encoded in UTF-8; no other encodings are supported. (Only ASCII would be compatible with UTF-8, but in practice this encoding is not supported; only the ISO 8859 (Latin) encodings are. These allow code points higher than 127, which risks introducing incompatibilities. Therefore, this would not be a good option.)
  • The C++ identifiers of library symbols are, however, (currently) ASCII-only.

So we use a static_assert statement to check this.
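Such a check could look like the following sketch. It is not the library's actual code: the helper hasBytes() and the constant expectedUtf8 are hypothetical names, and C++14 or later is assumed. The idea is to type a non-ASCII character directly into a string literal and compare the resulting bytes with its known UTF-8 encoding.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the library): compares all bytes of a
// narrow string literal, including its terminating null character, with an
// array of expected bytes.
template<std::size_t N, std::size_t M>
constexpr bool hasBytes(const char (&literal)[N],
                        const unsigned char (&expected)[M])
{
    if (N != M) {
        return false;
    }
    for (std::size_t i = 0; i < N; ++i) {
        if (static_cast<unsigned char>(literal[i]) != expected[i]) {
            return false;
        }
    }
    return true;
}

// UTF-8 encoding of U+00E4 (ä), followed by the terminating null character.
constexpr unsigned char expectedUtf8[] = {0xC3, 0xA4, 0x00};

// The character is typed directly, so this file itself must be saved as
// UTF-8 for the check to be meaningful.
static_assert(hasBytes("ä", expectedUtf8),
              "The compiler does not seem to read the source code as UTF-8.");
```

A check of this kind only sees the bytes that end up in the literal, so it depends on the narrow execution character set as well. If the compiler reads the file as Latin-1 but also uses Latin-1 as the narrow execution character set, the bytes pass through unchanged and the assertion does not fire. It is therefore most useful in combination with a separate check of the narrow execution character set.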

Narrow execution character set

Why are we using UTF-8 as the narrow execution character set?

  • i18n(), ki18n() and tr() require both the source file and char* to be encoded in UTF-8; no other encodings are supported.
  • Implicit conversion from char* to QString assumes that char* is UTF-8 encoded. Although we disable this implicit conversion in CMakeLists.txt, it’s wise to stay compatible.

Therefore, a static assert verifies that UTF-8 is actually used as the narrow execution character set.
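One possible way to write such an assertion is sketched below. It is not the library's actual code, and the function name narrowExecutionCharsetIsUtf8() is made up for illustration. The universal character name \u00E4 designates U+00E4 (ä) independently of the encoding of the source file. The compiler converts it to the narrow execution character set, and in UTF-8 the result must be the two bytes 0xC3 0xA4.

```cpp
// Hypothetical check (not part of the library): returns true if the
// literal "\u00E4" was encoded as UTF-8, that is, as the bytes 0xC3 0xA4
// followed by the terminating null character.
constexpr bool narrowExecutionCharsetIsUtf8()
{
    // The size test comes first. Because && short-circuits, the indexing
    // below never reads past the end of a shorter literal.
    return sizeof("\u00E4") == 3
        && static_cast<unsigned char>("\u00E4"[0]) == 0xC3
        && static_cast<unsigned char>("\u00E4"[1]) == 0xA4
        && "\u00E4"[2] == '\0';
}

static_assert(narrowExecutionCharsetIsUtf8(),
              "The narrow execution character set is not UTF-8.");
```

With a Latin-1 narrow execution character set, for example, the literal would consist of the single byte 0xE4 plus the terminating null character, so the size test alone would already fail.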

Wide execution character set

We do not actively use the wide execution character set. It is used when communicating with LittleCMS, but there we depend on LittleCMS anyway. Therefore, currently, no static assert enforces a specific wide execution character set.

This file is part of the KDE documentation.
Documentation copyright © 1996-2024 The KDE developers.
Generated on Mon Nov 18 2024 12:18:38 by doxygen 1.12.0 written by Dimitri van Heesch, © 1997-2006