tfm: TeX font metric data
This is a crate for working with TeX font metric data. It includes:
- Functions to read and write TeX font metric (.tfm) files to and from a value of type format::File (deserialize, serialize).
- Functions to read and write property list (.pl) files to and from a value of type pl::File (from_pl_source_code, display).
- Converters from .tfm to .pl files and vice-versa (using Rust’s From trait to go between format::File and pl::File).
- A type Font that represents a fully validated and compiled TeX font and that can be used to efficiently query data about the font (e.g., “what is the width of the character A?”). This type and its methods are performance optimized and designed for use in the hot main loops of typesetting software such as TeX.
Background
Probably the most famous part of the implementation of TeX is the Knuth-Plass line breaking algorithm. This algorithm determines the “optimal” places to add line breaks when typesetting a paragraph of text. In order to run the algorithm one needs to provide the dimensions of all characters in the current font. These dimensions are used to size the boxes in the Knuth-Plass box and glue model.
In TeX, character dimensions are provided using TeX font metric files. These are binary files. By convention they have a .tfm file extension. Unlike more modern file formats like TrueType, .tfm files only contain the font dimensions; they don’t contain the glyphs. In general, .tfm files are produced by other software like Metafont, placed in some well-known directory in the TeX distribution, and then read into memory when TeX is running.
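To make the binary layout a little more concrete, here is a minimal sketch of reading the start of a .tfm file. It relies on the documented TFM layout, in which a file begins with twelve big-endian 16-bit lengths (lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np) that must add up in a fixed way. The names Lengths, read_lengths and is_consistent are hypothetical and are not part of this crate's API; the crate's own deserializer does considerably more checking.

```rust
/// The twelve 16-bit lengths at the start of a .tfm file, in file order.
/// All lengths are measured in 4-byte words.
/// (Hypothetical type for illustration; not this crate's API.)
#[derive(Debug, PartialEq)]
struct Lengths {
    lf: u16, // length of the entire file
    lh: u16, // length of the header data
    bc: u16, // smallest character code in the font
    ec: u16, // largest character code in the font
    nw: u16, // number of entries in the width table
    nh: u16, // number of entries in the height table
    nd: u16, // number of entries in the depth table
    ni: u16, // number of entries in the italic correction table
    nl: u16, // number of words in the lig/kern table
    nk: u16, // number of words in the kern table
    ne: u16, // number of words in the extensible character table
    np: u16, // number of font parameter words
}

/// Reads the twelve lengths, or returns None if the data is too short.
fn read_lengths(bytes: &[u8]) -> Option<Lengths> {
    if bytes.len() < 24 {
        return None;
    }
    // Every length is a big-endian 16-bit integer.
    let word = |i: usize| u16::from_be_bytes([bytes[2 * i], bytes[2 * i + 1]]);
    Some(Lengths {
        lf: word(0), lh: word(1), bc: word(2), ec: word(3),
        nw: word(4), nh: word(5), nd: word(6), ni: word(7),
        nl: word(8), nk: word(9), ne: word(10), np: word(11),
    })
}

impl Lengths {
    /// Checks lf = 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np.
    fn is_consistent(&self) -> bool {
        let num_chars = self.ec as i64 - self.bc as i64 + 1;
        if num_chars < 0 {
            return false;
        }
        let tables = [self.lh, self.nw, self.nh, self.nd, self.ni, self.nl, self.nk, self.ne, self.np];
        let total: i64 = 6 + num_chars + tables.iter().map(|&n| n as i64).sum::<i64>();
        total == self.lf as i64
    }
}

fn main() {
    // A synthetic 24-byte prefix describing a font with the single character 65.
    let bytes: [u8; 24] = [
        0, 13, 0, 2, 0, 65, 0, 65, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
    ];
    let lengths = read_lengths(&bytes).expect("at least 24 bytes");
    assert!(lengths.is_consistent());
    println!("{lengths:?}");
}
```

The prefix in main is synthetic rather than taken from a real font, so it only demonstrates the arithmetic of the length check.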
Because .tfm files are binary files, it’s hard to debug or tweak them. To remedy this, Knuth and his team developed another file format called a property list file (extension .pl or .plst) that contains the same information but in a modifiable text format. They then wrote two programs: tftopl to convert a .tfm file to a .pl file, and pltotf to convert a .pl file to a .tfm file.
The general goal of this crate is to fully re-implement all of the TeX font metric code written by Knuth and others. This includes tftopl, pltotf, and also the parts of TeX itself that contain logic for reading and interpreting .tfm files. However, unlike these monolithic software programs, this re-implementation is in the form of a modular library in which individual pieces of logic can be used and re-used.
Basic example
// Include the .tfm file for Computer Modern in size 10pt.
let tfm_bytes = include_bytes!["../corpus/computer-modern/cmr10.tfm"];
// Deserialize the .tfm file.
let (tfm_file_or_error, deserialization_warnings) = tfm::format::File::deserialize(tfm_bytes);
let mut tfm_file = tfm_file_or_error.expect("cmr10.tfm is a valid .tfm file");
assert_eq![deserialization_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no deserialization warnings"];
// TODO assert_eq![tfm_file.header.design_size, tfm::Number::UNITY * 10]; make it 11 to be more interesting
// TODO query some data
// Validate the .tfm file.
let validation_warnings = tfm_file.validate_and_fix();
assert_eq![validation_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no validation warnings"];
// Convert the .tfm file to a .pl file and print it.
let pl_file: tfm::pl::File = tfm_file.clone().into();
// TODO query some data
println!["cmr10.pl:\n{}", pl_file.display(/*indent=*/2, tfm::pl::CharDisplayFormat::Default)];
// TODO Convert the .tfm file to the crate's Font type.
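The example above refers to tfm::Number::UNITY when discussing the design size. As background, dimensions in a .tfm file are stored as "fix_word" values: 32-bit signed big-endian integers with 20 fractional bits. The following is a minimal sketch of that encoding only; the names FRAC_BITS, UNITY, fix_word_from_bytes and fix_word_to_f64 are hypothetical, and the assumption that the crate's Number::UNITY corresponds to the raw value 2^20 is ours, not something stated above.

```rust
/// Number of fractional bits in a TFM fix_word.
const FRAC_BITS: u32 = 20;

/// Raw fix_word value representing 1.0.
/// (Assumed, not confirmed, to match the crate's Number::UNITY.)
const UNITY: i32 = 1 << FRAC_BITS;

/// Decodes the four big-endian bytes of a fix_word into its raw integer value.
fn fix_word_from_bytes(bytes: [u8; 4]) -> i32 {
    i32::from_be_bytes(bytes)
}

/// Converts a raw fix_word to a floating point number, for display purposes.
fn fix_word_to_f64(raw: i32) -> f64 {
    raw as f64 / UNITY as f64
}

fn main() {
    // A design size of 10pt is stored as 10 * 2^20 = 0x00A0_0000.
    let design_size = fix_word_from_bytes([0x00, 0xA0, 0x00, 0x00]);
    assert_eq!(design_size, 10 * UNITY);
    println!("design size: {}pt", fix_word_to_f64(design_size));

    // Negative values use two's complement: 0xFFF8_0000 is -0.5.
    let negative = fix_word_from_bytes([0xFF, 0xF8, 0x00, 0x00]);
    assert_eq!(negative, -(UNITY / 2));
}
```

Keeping the raw integer and converting to floating point only for display avoids rounding differences, which matters when output has to match another implementation exactly.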
Advanced functionality
In addition to supporting the basic use cases of querying font metric data and converting between different formats, this crate has advanced functionality for performing additional tasks on font metric data. The full set of functionality can be understood by navigating through the crate documentation. But here are 3 highlights we think are interesting:
- Language analysis of .pl files: In pltotf, Knuth parses .pl files in a single pass. This crate takes the now-common approach of parsing in multiple passes: first constructing a concrete syntax tree (or parse tree), next constructing a fully typed and checked abstract syntax tree, and finally building the pl::File itself. Each of the passes is exposed, so you can e.g. just build the AST for the .pl file and do some analysis on it.
- Debug output for .tfm files:
- Compilation of lig/kern programs:
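The multi-pass parsing described in the first highlight can be illustrated with a toy example. This sketch is not the crate's parser and does not use its types: Cst, Ast, parse_cst and check are hypothetical names, and only two property list entries (FAMILY and DESIGNSIZE with a real value) are recognized. The first pass only understands parentheses and words; the second pass turns that tree into typed, checked values.

```rust
/// Pass 1 output: a concrete syntax tree that only knows about parentheses and words.
#[derive(Debug, PartialEq)]
enum Cst {
    Word(String),
    List(Vec<Cst>),
}

/// Pass 2 output: typed and checked properties (only two kinds in this sketch).
#[derive(Debug, PartialEq)]
enum Ast {
    Family(String),
    DesignSize(f64),
}

/// Pass 1: group words by balanced parentheses.
fn parse_cst(source: &str) -> Result<Vec<Cst>, String> {
    let spaced = source.replace('(', " ( ").replace(')', " ) ");
    // The bottom of the stack collects the top-level nodes.
    let mut stack: Vec<Vec<Cst>> = vec![Vec::new()];
    for token in spaced.split_whitespace() {
        match token {
            "(" => stack.push(Vec::new()),
            ")" => {
                let list = stack.pop().expect("stack is never empty here");
                match stack.last_mut() {
                    Some(parent) => parent.push(Cst::List(list)),
                    None => return Err("unbalanced closing parenthesis".to_string()),
                }
            }
            word => stack
                .last_mut()
                .expect("stack is never empty here")
                .push(Cst::Word(word.to_string())),
        }
    }
    if stack.len() != 1 {
        return Err("unclosed opening parenthesis".to_string());
    }
    Ok(stack.pop().expect("exactly one element"))
}

/// Pass 2: check each top-level list and convert it to a typed property.
fn check(cst: &[Cst]) -> Result<Vec<Ast>, String> {
    let mut properties = Vec::new();
    for node in cst {
        let Cst::List(items) = node else {
            return Err(format!("expected a list, found {node:?}"));
        };
        let mut words = Vec::new();
        for item in items {
            match item {
                Cst::Word(w) => words.push(w.as_str()),
                Cst::List(_) => return Err("nested lists are not handled in this sketch".to_string()),
            }
        }
        let property = match words.as_slice() {
            ["FAMILY", name] => Ast::Family(name.to_string()),
            ["DESIGNSIZE", "R", value] => Ast::DesignSize(
                value
                    .parse::<f64>()
                    .map_err(|e| format!("bad real number {value}: {e}"))?,
            ),
            other => return Err(format!("unsupported property: {other:?}")),
        };
        properties.push(property);
    }
    Ok(properties)
}

fn main() {
    let source = "(FAMILY CMR)\n(DESIGNSIZE R 10.0)";
    let cst = parse_cst(source).expect("balanced parentheses");
    let ast = check(&cst).expect("only supported properties");
    println!("{ast:?}");
}
```

Because each pass has its own output type, a tool that only needs syntax (for example a formatter) can stop after the first pass, while semantic checks live entirely in the second.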
Binaries
The Texcraft project produces 3 binaries based on this crate:
- tftopl and pltotf: re-implementations of Knuth’s programs.
- tfmtools: a new binary that has a bunch of different tools for working with TeX font metric data. Run tfmtools help to list all of the available tools.
In the root of the Texcraft repository these tools can be run with cargo run --bin $NAME and built with cargo build --bin $NAME.
Correctness
As part of the development of this crate, significant effort has been spent ensuring it exactly replicates the work of Knuth. This correctness checking is largely based around diff testing the binaries tftopl and pltotf. We verify that the Texcraft and Knuth implementations have the same output and generate the same error messages.
This diff testing has been performed in a few different ways:
- We have run diff tests over all ~100,000 .tfm files in CTAN. These tests verify that tftopl gives the correct result, and that running pltotf on the output .pl file gives the correct result too. Unfortunately running pltotf on the .pl files in CTAN is infeasible because most of these files are Perl scripts, not property list files.
- We have developed a fuzz testing harness (so far just for tftopl) that generates highly heterogeneous .tfm files and verifies that tftopl gives the correct result. This fuzz testing has uncovered many issues in the Texcraft implementation, and has even identified a 30-year-old bug in Knuth’s implementation of tftopl.
Any .tfm or .pl file that exposes a bug in this library is added to our automated testing corpus. Running cargo t validates that Texcraft’s binaries give the same result as Knuth’s binaries (the output of Knuth’s binaries is in source control). This ensures there are no regressions.
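The core of a check like this is comparing freshly produced output against a stored expected output and reporting where the two diverge. The sketch below shows one way such a comparison could look. It is not the Texcraft test harness, and first_divergence is a hypothetical helper; in a real test the two strings would come from a binary's output and from a file in source control.

```rust
/// Returns the 1-based line number and the two differing lines at the first
/// point where `expected` and `actual` diverge, or None if they are identical.
/// A missing line (one text is shorter than the other) is reported as None.
fn first_divergence<'a>(
    expected: &'a str,
    actual: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let mut expected_lines = expected.lines();
    let mut actual_lines = actual.lines();
    let mut line_number = 1;
    loop {
        match (expected_lines.next(), actual_lines.next()) {
            // Both texts ended at the same time without a difference.
            (None, None) => return None,
            // Same line in both texts: keep going.
            (e, a) if e == a => line_number += 1,
            // Different content, or one text ended early.
            (e, a) => return Some((line_number, e, a)),
        }
    }
}

fn main() {
    let expected = "(FAMILY CMR)\n(DESIGNSIZE R 10.0)\n";
    let actual = "(FAMILY CMR)\n(DESIGNSIZE R 11.0)\n";
    match first_divergence(expected, actual) {
        None => println!("outputs match"),
        Some((line, e, a)) => {
            println!("outputs diverge at line {line}: expected {e:?}, got {a:?}")
        }
    }
}
```

Reporting the first diverging line, rather than only a pass/fail result, makes it easier to see which property a regression affected.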
If you discover a .tfm or .pl file such that the Texcraft and Knuth implementations diverge, this indicates there is a bug in this library. Please create an issue on the Texcraft GitHub repo. We will fix the bug and add your files to the testing corpus.
Modules
- The tftopl and pltotf algorithms.
- The TeX font metric (.tfm) file format.
- Lig/kern programming.
- The property list (.pl) file format.
Structs
- A character in a TFM file.
- The TFM header, which contains metadata about the file.
- Compiled program of “next larger character” instructions.
- Fixed-width numeric type used in TFM files.
Enums
- A named TeX font metric parameter.
- Warning from the compilation of “next larger character” instructions.