Module tfm::ligkern

Lig/kern programming.

TFM files can include information about ligatures and kerns. A ligature is a special character that can replace two or more adjacent characters. For example, the pair of characters ae can be replaced by the æ ligature, which is a single character. A kern is a special space inserted between two adjacent characters to align them better. For example, a kern can be inserted between A and V to compensate for the large amount of space created by the specific combination of these two characters.

The lig/kern programming language

TFM provides ligature and kern data in the form of “instructions in a simple programming language that explains what to do for special letter pairs” (quoting TFtoPL.2014.13). This lig/kern programming language can be used to specify instructions like “replace the pair (a,e) by æ” and “insert a kern of width -0.1pt between the pair (A,V)”. But it can also specify more complex behaviors. For example, a lig/kern program can specify “replace the pair (x,y) by the pair (z,y)”.

In general, for any pair of characters (x,y), the program specifies zero or one lig/kern instructions. After this instruction is executed, there may be a new pair of characters remaining, as in the (x,y) to (z,y) instruction. The lig/kern instruction for this new pair is then executed, if it exists. This process continues until there are no more instructions left to run.
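
The process described above can be sketched as a small rewriting loop. This is a minimal illustration under simplifying assumptions, not this module's implementation: the `Instr` and `Outcome` types and the `run` function are hypothetical names, only three kinds of instruction are modelled, and the real lang::Instruction type is richer. The sketch also assumes that the program terminates.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified instruction (not the real `lang::Instruction`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    /// Insert a kern of the given width (in points) between the pair.
    Kern(f64),
    /// Replace the pair (x, y) with a single ligature character.
    Lig(char),
    /// Replace the pair (x, y) with (z, y), keeping the right character.
    ReplaceLeft(char),
}

/// What is left once no instruction applies to the current pair.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    /// A pair of characters with nothing inserted between them.
    Pair(char, char),
    /// A pair of characters with a kern between them.
    Kerned(char, f64, char),
    /// A single ligature character.
    Ligature(char),
}

/// Runs the program on the pair (x, y): look up at most one instruction for
/// the current pair, apply it, and repeat while a new pair remains.
/// Assumption: the program terminates; a looping program would spin forever.
fn run(program: &HashMap<(char, char), Instr>, mut x: char, y: char) -> Outcome {
    loop {
        match program.get(&(x, y)) {
            None => return Outcome::Pair(x, y),
            Some(&Instr::Kern(w)) => return Outcome::Kerned(x, w, y),
            Some(&Instr::Lig(c)) => return Outcome::Ligature(c),
            // A new pair (z, y) remains, so its instruction is looked up next.
            Some(&Instr::ReplaceLeft(z)) => x = z,
        }
    }
}
```

In this model, a program containing the three example instructions from the text maps (a,e) to the ligature æ, (A,V) to a kerned pair, and (x,y) to the pair (z,y).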

Lig/kern instructions are represented in this module by the lang::Instruction type.

The TFtoPL and PLtoTF programs don’t contain any code for running lig/kern programs. They only contain logic for translating between the .tfm and .pl formats for lig/kern programs, and for doing some validation as described below. Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.

One of the challenges with lig/kern programs is that they can contain infinite loops. Here is a simple example of a lig/kern program with two instructions and an infinite loop:

  • Replace (x,y) with (z,y) (in property list format, (LABEL C x)(LIG/ C y C z))
  • Replace (z,y) with (x,y) (in property list format, (LABEL C z)(LIG/ C y C x))

When this program runs, (x,y) will be swapped with (z,y) ad infinitum. See TFtoPL.2014.88 for more examples.
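
To make the loop concrete, the following sketch models the two-instruction program as a map from a pair (x,y) to the replacement left character z, and follows the rewrites under a step budget. The `Program` alias and the `run_bounded` and `looping_program` functions are hypothetical names chosen for illustration, and the step budget is an assumption of this sketch rather than something TFtoPL, PLtoTF or TeX does.

```rust
use std::collections::HashMap;

/// Hypothetical model: each entry rewrites the pair (x, y) to (z, y).
type Program = HashMap<(char, char), char>;

/// Follows rewrites starting from (x, y), performing at most `budget` lookups.
/// Returns the final pair, or `None` if the budget ran out first.
fn run_bounded(program: &Program, mut x: char, y: char, budget: usize) -> Option<(char, char)> {
    for _ in 0..budget {
        match program.get(&(x, y)) {
            Some(&z) => x = z,
            None => return Some((x, y)),
        }
    }
    None
}

/// Builds the two-instruction looping program from the text:
/// (x, y) becomes (z, y), and (z, y) becomes (x, y).
fn looping_program() -> Program {
    HashMap::from([(('x', 'y'), 'z'), (('z', 'y'), 'x')])
}
```

However large the budget is, `run_bounded(&looping_program(), 'x', 'y', budget)` returns `None`, because the pair alternates between (x,y) and (z,y) and never reaches a pair without an instruction.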

Both TFtoPL and PLtoTF contain code that checks that a lig/kern program does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125). The algorithm for detecting infinite loops is a topological sorting algorithm over a graph where each node is a pair of characters. However, the algorithm is complicated by the fact that the full graph cannot be constructed without running the lig/kern program.

TeX does not check for infinite loops, presumably under the assumption that any .tfm file will have been generated by PLtoTF and thus already validated. However, TeX does check for interrupts when executing lig/kern programs, so that a user can at least terminate TeX if an infinite loop is hit. (See the check_interrupt line in KnuthTeX.2021.1040.)

Functionality in this module

This module handles lig/kern programs in a different way, inspired by the “parse don’t validate” philosophy. This module is able to represent raw lig/kern programs as a vector of lang::Instruction values. It can also compile lig/kern programs into a CompiledProgram. This compilation process essentially executes the lig/kern program for every possible character pair. The result is a map from each character pair to the full list of replacement characters and kerns for that pair. If there is an infinite loop in the program, this compilation will naturally fail. The compiled program is thus a “parsed” version of the lig/kern program, and it is impossible for infinite loops to appear in it.
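
The compilation idea can be sketched as follows. This is an illustration under strong simplifying assumptions and does not reflect the actual CompiledProgram type or its API: `Instr`, `Item`, `InfiniteLoop` and `compile` are hypothetical names, only three kinds of instruction are modelled, and only pairs that have an instruction are stored, since every other pair is left unchanged.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified instruction (not the real `lang::Instruction`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    /// Insert a kern of the given width between the pair.
    Kern(f64),
    /// Replace the pair with a single ligature character.
    Lig(char),
    /// Replace the pair (x, y) with (z, y).
    ReplaceLeft(char),
}

/// One element of the fully expanded output for a pair.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Item {
    Char(char),
    Kern(f64),
}

/// Error returned when running the program from `pair` never terminates.
#[derive(Debug, PartialEq)]
struct InfiniteLoop {
    pair: (char, char),
}

/// Runs the program once for every pair that has an instruction and records
/// the final output. Revisiting a left character means the program loops.
fn compile(
    program: &HashMap<(char, char), Instr>,
) -> Result<HashMap<(char, char), Vec<Item>>, InfiniteLoop> {
    let mut compiled = HashMap::new();
    for &(start_x, y) in program.keys() {
        let mut x = start_x;
        // In this model only the left character changes, so tracking the
        // left characters seen so far is enough to detect a cycle.
        let mut seen = HashSet::new();
        let items = loop {
            if !seen.insert(x) {
                return Err(InfiniteLoop { pair: (start_x, y) });
            }
            match program.get(&(x, y)) {
                None => break vec![Item::Char(x), Item::Char(y)],
                Some(&Instr::Kern(w)) => {
                    break vec![Item::Char(x), Item::Kern(w), Item::Char(y)]
                }
                Some(&Instr::Lig(c)) => break vec![Item::Char(c)],
                Some(&Instr::ReplaceLeft(z)) => x = z,
            }
        };
        compiled.insert((start_x, y), items);
    }
    Ok(compiled)
}
```

Because `compile` returns an error for any looping program, a successfully built map can be consulted with a single lookup per pair and no further execution, which is the property the paragraph above describes.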

An advantage of this model is that the lig/kern program does not need to be repeatedly executed in the main hot loop of TeX. This may make TeX faster. However, the compiled lig/kern program does have a larger memory footprint than the raw program, and so it may be slower if TeX is memory bound.

Modules

  • Types corresponding to the “lig/kern programming language”.

Structs

Enums