Lig/kern programming.
TFM files can include information about ligatures and kerns. A ligature is a special character that can replace two or more adjacent characters. For example, the pair of characters ae can be replaced by the æ ligature, which is a single character. A kern is a special space inserted between two adjacent characters to align them better. For example, a kern can be inserted between A and V to compensate for the large amount of space created by the specific combination of these two characters.
The lig/kern programming language
TFM provides ligature and kern data in the form of “instructions in a simple programming language that explains what to do for special letter pairs” (quoting TFtoPL.2014.13). This lig/kern programming language can be used to specify instructions like “replace the pair (a,e) by æ” and “insert a kern of width -0.1pt between the pair (A,V)”. But it can also specify more complex behaviors. For example, a lig/kern program can specify “replace the pair (x,y) by the pair (z,y)”.
In general for any pair of characters (x,y) the program specifies zero or one lig/kern instructions. After this instruction is executed, there may be a new pair of characters remaining, as in the (x,y) to (z,y) instruction. The lig/kern instruction for this pair is then executed, if it exists. This process continues until there are no more instructions left to run.
Lig/kern instructions are represented in this module by the lang::Instruction type.
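The execution model described above can be sketched in a few lines. This is a simplified illustration, not this module's implementation: the `Op` enum, the `Outcome` struct, the `run_pair` function and the `max_steps` budget are all hypothetical names, and `Op` models only three kinds of instruction rather than the full set that lang::Instruction covers. Kern widths are plain integers in arbitrary units.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified instruction set (not the real `lang::Instruction`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    /// Insert a kern of the given width between the pair.
    Kern(i32),
    /// Replace the pair (x, y) by the single character z.
    Ligature(char),
    /// Replace the pair (x, y) by the pair (z, y).
    ReplaceLeft(char),
}

/// At most one instruction per character pair.
type Program = HashMap<(char, char), Op>;

/// What is left once no more instructions apply to the pair.
#[derive(Debug, PartialEq)]
struct Outcome {
    chars: Vec<char>,
    kern: Option<i32>,
}

/// Runs the program starting from the pair (x, y).
/// Returns `None` if the (assumed) step budget is exhausted.
fn run_pair(program: &Program, mut x: char, y: char, max_steps: usize) -> Option<Outcome> {
    for _ in 0..max_steps {
        match program.get(&(x, y)) {
            // No instruction for this pair: execution is finished.
            None => return Some(Outcome { chars: vec![x, y], kern: None }),
            // In this simplified model a kern ends the processing of the pair.
            Some(Op::Kern(w)) => return Some(Outcome { chars: vec![x, y], kern: Some(*w) }),
            // The pair collapses into one character, so no pair remains.
            Some(Op::Ligature(z)) => return Some(Outcome { chars: vec![*z], kern: None }),
            // A new pair (z, y) remains; look up its instruction next.
            Some(Op::ReplaceLeft(z)) => x = *z,
        }
    }
    None
}
```

With a program that maps (x,y) to `ReplaceLeft('z')` and (z,y) to `Kern(2)`, calling `run_pair` on (x,y) first rewrites the pair to (z,y) and then stops with a kern of 2 between z and y.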
Related code by Knuth
The TFtoPL and PLtoTF programs don’t contain any code for running lig/kern programs.
They only contain logic for translating between the .tfm and .pl formats for lig/kern programs, and for doing some validation as described below.
Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
One of the challenges with lig/kern programs is that they can contain infinite loops. Here is a simple example of a lig/kern program with two instructions and an infinite loop:
- Replace (x,y) with (z,y) (in property list format, (LABEL C x)(LIG/ C y C z))
- Replace (z,y) with (x,y) (in property list format, (LABEL C z)(LIG/ C y C x))
When this program runs (x,y) will be swapped with (z,y) ad infinitum. See TFtoPL.2014.88 for more examples.
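The two-instruction loop can be reproduced with a small sketch. It assumes a deliberately reduced model in which every instruction has the form "replace (x,y) with (z,y)", so a program is just a map from a pair to the new left character. The `Program` alias and the `follow` function are hypothetical names for illustration. This is neither Knuth's algorithm nor this module's code; it only remembers the pairs already visited and stops at the first repeat.

```rust
use std::collections::{HashMap, HashSet};

/// Reduced model: (x, y) -> z mirrors `(LABEL C x)(LIG/ C y C z)`.
type Program = HashMap<(char, char), char>;

/// Follows replacements starting from `start`.
/// Returns `Ok(final_pair)` when no instruction applies any more, or
/// `Err(path)` with the pairs visited before a repeat was found.
fn follow(program: &Program, start: (char, char)) -> Result<(char, char), Vec<(char, char)>> {
    let mut seen = HashSet::new();
    let mut path = Vec::new();
    let mut pair = start;
    loop {
        // `insert` returns false when the pair was already visited.
        if !seen.insert(pair) {
            return Err(path);
        }
        path.push(pair);
        match program.get(&pair) {
            // Replace the left character and keep the right one.
            Some(&z) => pair = (z, pair.1),
            None => return Ok(pair),
        }
    }
}
```

For the program above, `follow` started at (x,y) visits (x,y), then (z,y), then reaches (x,y) again and returns an error carrying those two pairs.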
Both TFtoPL and PLtoTF contain code that checks that a lig/kern program does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125). The algorithm for detecting infinite loops is a topological sorting algorithm over a graph in which each node is a pair of characters. However, it is complicated by the fact that the full graph cannot be constructed without running the lig/kern program.
TeX does not check for infinite loops, presumably under the assumption that any .tfm file will have been generated by PLtoTF and thus already validated.
However TeX does check for interrupts when executing lig/kern programs so that at least a user can terminate TeX if an infinite loop is hit.
(See the check_interrupt line in KnuthTeX.2021.1040.)
Functionality in this module
This module handles lig/kern programs in a different way, inspired by the “parse don’t validate” philosophy.
This module is able to represent raw lig/kern programs as a vector of lang::Instruction values.
It can also compile lig/kern programs into a CompiledProgram.
This compilation process essentially executes the lig/kern program for every possible character pair.
The result is a map from each character pair to the full list of
replacement characters and kerns for that pair.
If there is an infinite loop in the program this compilation will naturally fail.
The compiled program is thus a “parsed” version of the lig/kern program
and it is impossible for infinite loops to appear in it.
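The idea behind compilation can be illustrated with a minimal sketch under strong simplifying assumptions. The names `Op`, `Item`, `InfiniteLoop` and `compile` are hypothetical and do not reflect the actual API of CompiledProgram or of the error type in this module. Only three kinds of instruction are modelled, and the sketch only visits pairs that have an instruction, since all other pairs are left unchanged.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified instruction set.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Kern(i32),         // insert a kern between the pair
    Ligature(char),    // (x, y) -> z
    ReplaceLeft(char), // (x, y) -> (z, y)
}

/// One element of the fully expanded replacement for a pair.
#[derive(Debug, PartialEq)]
enum Item {
    Char(char),
    Kern(i32),
}

/// Hypothetical error: running the program from this pair never terminates.
#[derive(Debug, PartialEq)]
struct InfiniteLoop {
    starting_pair: (char, char),
}

/// Runs the raw program once for every pair that has an instruction and
/// records the final replacement, so that later lookups execute nothing.
fn compile(
    raw: &HashMap<(char, char), Op>,
) -> Result<HashMap<(char, char), Vec<Item>>, InfiniteLoop> {
    let mut compiled = HashMap::new();
    for &start in raw.keys() {
        let (mut x, y) = start;
        // Left characters already seen for this starting pair; y never changes
        // in this model, so a repeated x means the same pair came up twice.
        let mut seen = HashSet::new();
        let replacement = loop {
            if !seen.insert(x) {
                return Err(InfiniteLoop { starting_pair: start });
            }
            match raw.get(&(x, y)) {
                None => break vec![Item::Char(x), Item::Char(y)],
                Some(&Op::Kern(w)) => break vec![Item::Char(x), Item::Kern(w), Item::Char(y)],
                Some(&Op::Ligature(z)) => break vec![Item::Char(z)],
                Some(&Op::ReplaceLeft(z)) => x = z,
            }
        };
        compiled.insert(start, replacement);
    }
    Ok(compiled)
}
```

In this sketch a successful result is a plain map, so looking up a pair is a single hash lookup, and a raw program containing the (x,y)/(z,y) swap from earlier makes `compile` return an error instead of a map.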
An advantage of this model is that the lig/kern program does not need to be repeatedly executed in the main hot loop of TeX. This may make TeX faster. However the compiled lig/kern program does have a larger memory footprint than the raw program, and so it may be slower if TeX is memory bound.
Modules
- Types corresponding to the “lig/kern programming language”.
Structs
- A compiled lig/kern program.
- An error returned from lig/kern compilation.
- One step in a lig/kern infinite loop.
- Data structure describing the replacement of a character pair in a lig/kern program.
- Iterator over the replacement of a character pair in a lig/kern program.
Enums
- Operation to perform on the left character of a lig/kern pair.