tfm/
lib.rs

//! # tfm: TeX font metric data
//!
//! This is a crate for working with TeX font metric data.
//! It includes:
//!
//! - Functions to read and write TeX font metric (.tfm) files
//!   to and from a value of type [`File`]
//!   ([`deserialize`](File::deserialize), [`serialize`](File::serialize)).
//!
//! - Functions to read and write property list (.pl) files
//!   to and from a value of type [`pl::File`]
//!   ([`from_pl_source_code`](pl::File::from_pl_source_code), [`display`](pl::File::display)).
//!
//! - Converters from .tfm to .pl files and vice versa
//!   (using Rust's [`From`] trait to go between [`File`] and [`pl::File`]).
//!
//! ## Background
//!
//! Probably the most famous part of the implementation of TeX is the Knuth-Plass line breaking algorithm.
//! This algorithm determines the "optimal" places to add line breaks when typesetting a paragraph of text.
//! In order to run the algorithm one needs to provide the dimensions of all characters in the current font.
//! These dimensions are used to size the boxes in the Knuth-Plass box and glue model.
//!
//! In TeX, character dimensions are provided using TeX font metric files.
//! These are binary files.
//! By convention they have a .tfm file extension.
//! Unlike more modern file formats like TrueType, .tfm files only contain the font dimensions;
//!     they don't contain the glyphs.
//! In general,
//!     .tfm files are produced by other software like Metafont,
//!     placed in some well-known directory in the TeX distribution,
//!     and then read into memory when TeX is running.
//!
//! Because .tfm files are binary files, it's hard to debug or tweak them.
//! To remedy this, Knuth and his team developed another file format called a property list file
//!     (extension .pl or .plst)
//!     that contains the same information but in a modifiable text format.
//! They then wrote two programs:
//!     `tftopl` to convert a .tfm file to a .pl file,
//!     and `pltotf` to convert a .pl file to a .tfm file.
//!
//! The general goal of this crate is to fully re-implement all of the TeX font metric
//!     code written by Knuth and others.
//! This includes `tftopl`, `pltotf`, and also the parts of TeX itself that contain logic
//!     for reading and interpreting .tfm files.
//! However, unlike these monolithic software programs,
//!     this re-implementation is in the form of a modular library in which
//!     individual pieces of logic can be used and re-used.
//!
//! ## Basic example
//!
//! ```
//! // Include the .tfm file for Computer Modern in size 10pt.
//! let tfm_bytes = include_bytes!["../corpus/computer-modern/cmr10.tfm"];
//!
//! // Deserialize the .tfm file.
//! let (tfm_file_or_error, deserialization_warnings) = tfm::File::deserialize(tfm_bytes);
//! let mut tfm_file = tfm_file_or_error.expect("cmr10.tfm is a valid .tfm file");
//! assert_eq![deserialization_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no deserialization warnings"];
//! // TODO assert_eq![tfm_file.header.design_size, tfm::FixWord::UNITY * 10]; make it 11 to be more interesting
//! // TODO query some data
//!
//! // Validate the .tfm file.
//! let validation_warnings = tfm_file.validate_and_fix();
//! assert_eq![validation_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no validation warnings"];
//!
//! // Convert the .tfm file to a .pl file and print it.
//! let pl_file: tfm::pl::File = tfm_file.clone().into();
//! // TODO query some data
//! println!["cmr10.pl:\n{}", pl_file.display(/*indent=*/2, tfm::pl::CharDisplayFormat::Default)];
//! ```
//!
//! ## Advanced functionality
//!
//! In addition to supporting the basic use cases of querying font metric data
//!     and converting between different formats,
//!     this crate has advanced functionality for performing additional tasks on font metric data.
//! The full set of functionality can be understood by navigating through the crate documentation.
//! But here are three highlights we think are interesting:
//!
//! - **Language analysis of .pl files**:
//!   In `pltotf`, Knuth parses .pl files in a single pass.
//!   This crate takes the now-common approach of parsing in multiple passes:
//!   first constructing a [concrete syntax tree](pl::cst::Cst) (or parse tree),
//!   next constructing a [fully typed and checked abstract syntax tree](pl::ast::Ast),
//!   and finally building the [`pl::File`] itself.
//!   Each of the passes is exposed, so you can e.g. just build the AST for a .pl file and
//!   do some analysis on it.
//!
//! - **Debug output for .tfm files**:
//!   The [`debug`] function renders the binary content of a .tfm file
//!   section by section (see [`Section`]) in a human-readable format.
//!
//! - **Compilation of lig/kern programs**:
//!   The [`ligkern`] module contains types for working with lig/kern programs,
//!   which specify the ligature and kerning data of a font.
//!
//! ## Binaries
//!
//! The Texcraft project produces three binaries based on this crate:
//!
//! - `tftopl` and `pltotf`: re-implementations of Knuth's programs.
//! - `tfmtools`: a new binary that has a bunch of different tools
//!   for working with TeX font metric data.
//!   Run `tfmtools help` to list all of the available tools.
//!
//! In the root of [the Texcraft repository](https://github.com/jamespfennell/texcraft)
//!     these tools can be run with `cargo run --bin $NAME`
//!     and built with `cargo build --bin $NAME`.
//!
//! ## Correctness
//!
//! As part of the development of this crate, significant effort has been spent
//!     ensuring it exactly replicates the work of Knuth.
//! This correctness checking is largely based around diff testing the binaries
//!     `tftopl` and `pltotf`.
//! We verify that the Texcraft and Knuth implementations have the same output
//!     and generate the same error messages.
//!
//! This diff testing has been performed in a few different ways:
//!
//! - We have run diff tests over all ~100,000 .tfm files in CTAN.
//!   These tests verify that `tftopl` gives the correct result,
//!   and that running `pltotf` on the output .pl file gives the correct result too.
//!   Unfortunately, running `pltotf` on the .pl files in CTAN is infeasible
//!   because most of these files are Perl scripts, not property list files.
//!
//! - We have developed a fuzz testing harness (so far just for `tftopl`)
//!   that generates highly heterogeneous .tfm files and verifies that `tftopl` gives the correct result.
//!   This fuzz testing has uncovered many issues in the Texcraft implementation,
//!   and has even identified [a 30-year-old bug](https://tug.org/pipermail/tex-k/2024-March/004031.html)
//!   in Knuth's implementation of `tftopl`.
//!
//! Any .tfm or .pl file that exposes a bug in this library is added to
//!     [our automated testing corpus](https://github.com/jamespfennell/texcraft/tree/main/crates/tfm/bin/tests/data).
//! Running `cargo t` validates that Texcraft's binaries give the same result as Knuth's binaries
//!     (the output of Knuth's binaries is in source control).
//! This ensures there are no regressions.
//!
//! If you discover a .tfm or .pl file such that the Texcraft and Knuth implementations
//!     diverge, this indicates there is a bug in this library.
//! Please create an issue on the Texcraft GitHub repo.
//! We will fix the bug and add your files to the testing corpus.

pub mod algorithms;
pub mod ligkern;
pub mod pl;

use std::{
    collections::{BTreeMap, BTreeSet, HashMap, HashSet},
    num::NonZeroU8,
};

mod debug;
mod deserialize;
mod serialize;
mod validate;

pub use deserialize::DeserializationError;
pub use deserialize::DeserializationWarning;
pub use deserialize::RawFile;
pub use deserialize::SubFileSizes;
pub use validate::ValidationWarning;

/// Complete contents of a TeX font metric (.tfm) file.
///
/// The struct contains multiple vectors.
/// In TeX and TFtoPL there is an optimization in which all of the data in these vectors
/// is stored in one large vector of 32-bit integers.
/// The conversion from [u32] to specific types like [FixWord] is then done when the
/// data is needed.
/// This would make the memory footprint of this type much more compact,
///     and such a change may be considered in the future.
///
/// In fact, in TeX the font data for all fonts is stored in one contiguous piece of memory
///     (`font_info`, defined in TeX82.2021.549).
/// This is a little too unsafe to pull off though.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct File {
    /// Header.
    pub header: Header,

    /// Smallest character in the .tfm file.
    pub smallest_char: Char,

    /// Character dimensions.
    pub char_dimens: BTreeMap<Char, CharDimensions>,

    /// Character tags.
    ///
    /// Note there is no correlation between a character having
    ///     a tag and a character having a dimension.
    /// All four combinations of (has or hasn't dimensions) and (has or hasn't a tag)
    ///     are possible.
    pub char_tags: BTreeMap<Char, CharTag>,

    /// Tags that have been unset, but whose discriminant is still written to a .tfm file by PLtoTF.
    pub unset_char_tags: BTreeMap<Char, u8>,

    /// Character widths.
    pub widths: Vec<FixWord>,

    /// Character heights.
    pub heights: Vec<FixWord>,

    /// Character depths.
    pub depths: Vec<FixWord>,

    /// Character italic corrections.
    pub italic_corrections: Vec<FixWord>,

    /// Lig/kern program.
    pub lig_kern_program: ligkern::lang::Program,

    /// Kerns. These are referenced from inside the lig/kern commands.
    pub kerns: Vec<FixWord>,

    /// Extensible characters.
    pub extensible_chars: Vec<ExtensibleRecipe>,

    /// Font parameters.
    pub params: Vec<FixWord>,
}

/// Data about one character in a .tfm file.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CharDimensions {
    /// Index of the width of this character in the widths array.
    ///
    /// In valid TFM files, this index will always be non-zero.
    /// This is because if the width index is zero it means there is no data for the character in the file.
    /// In this case the other indices (height, etc.) are necessarily zero.
    /// See TFtoPL.2014.? (where blocks of data with a zero width index are skipped)
    ///     and PLtoTF.2014.? (where missing characters are written with all indices 0).
    ///
    /// There is one edge case where this index can be zero:
    /// if the width index in the file is invalid,
    /// tftopl essentially sets the width index to 0.
    ///
    /// Note that even if a character doesn't have dimensions, it can still have a tag.
    pub width_index: WidthIndex,
    /// Index of the height of this character in the heights array.
    pub height_index: u8,
    pub depth_index: u8,
    pub italic_index: u8,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum WidthIndex {
    Invalid,
    Valid(NonZeroU8),
}

impl WidthIndex {
    pub fn get(&self) -> u8 {
        match self {
            WidthIndex::Invalid => 0,
            WidthIndex::Valid(n) => n.get(),
        }
    }
}

/// Tag of a character in a .tfm file.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CharTag {
    Ligature(u8),
    List(Char),
    Extension(u8),
}

impl CharTag {
    pub fn ligature(&self) -> Option<u8> {
        match self {
            CharTag::Ligature(l) => Some(*l),
            CharTag::List(_) | CharTag::Extension(_) => None,
        }
    }
    pub fn list(&self) -> Option<Char> {
        match self {
            CharTag::List(c) => Some(*c),
            CharTag::Ligature(_) | CharTag::Extension(_) => None,
        }
    }
    pub fn extension(&self) -> Option<u8> {
        match self {
            CharTag::Extension(u) => Some(*u),
            CharTag::Ligature(_) | CharTag::List(_) => None,
        }
    }
}

/// Extensible recipe instruction in a .tfm file.
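///
/// For example, a recipe that builds a tall character by repeating a single
/// glyph (`Char(b'|')` is just an illustrative choice of repeated character):
///
/// ```
/// let recipe = tfm::ExtensibleRecipe {
///     top: None,
///     middle: None,
///     bottom: None,
///     rep: tfm::Char(b'|'),
/// };
/// // Only the repeated character is present in the recipe.
/// assert_eq!(recipe.chars().count(), 1);
/// assert!(recipe.is_seven_bit());
/// ```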
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ExtensibleRecipe {
    pub top: Option<Char>,
    pub middle: Option<Char>,
    pub bottom: Option<Char>,
    pub rep: Char,
}

impl ExtensibleRecipe {
    pub fn is_seven_bit(&self) -> bool {
        self.chars().all(|c| c.is_seven_bit())
    }

    pub fn chars(&self) -> impl Iterator<Item = Char> {
        [self.top, self.middle, self.bottom, Some(self.rep)]
            .into_iter()
            .flatten()
    }
}

impl Default for File {
    fn default() -> Self {
        Self {
            header: Default::default(),
            smallest_char: Char(1),
            char_dimens: Default::default(),
            char_tags: Default::default(),
            unset_char_tags: Default::default(),
            widths: vec![FixWord::ZERO],
            heights: vec![FixWord::ZERO],
            depths: vec![FixWord::ZERO],
            italic_corrections: vec![FixWord::ZERO],
            lig_kern_program: Default::default(),
            kerns: vec![],
            extensible_chars: vec![],
            params: Default::default(),
        }
    }
}

/// Print debugging information about a .tfm file.
pub fn debug<'a>(
    path: Option<&'a str>,
    sub_file_sizes: SubFileSizes,
    tfm_file: Option<&'a File>,
    raw_file: Option<&'a RawFile<'a>>,
    sections: Vec<Section>,
) -> impl std::fmt::Display + 'a {
    debug::Debug {
        sub_file_sizes,
        path,
        raw_file,
        tfm_file,
        sections,
    }
}

impl File {
    pub fn from_raw_file(raw_file: &RawFile) -> Self {
        deserialize::from_raw_file(raw_file)
    }

    pub fn validate_and_fix(&mut self) -> Vec<ValidationWarning> {
        validate::validate_and_fix(self)
    }

    pub fn deserialize(
        b: &[u8],
    ) -> (
        Result<File, DeserializationError>,
        Vec<DeserializationWarning>,
    ) {
        deserialize::deserialize(b)
    }

    pub fn serialize(&self) -> Vec<u8> {
        serialize::serialize(self)
    }

    /// Return a map from characters to the lig/kern entrypoint for that character.
    /// TODO: can probably return impl Iterator<Item=(Char, u8)>
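    ///
    /// A minimal sketch, using only the public fields and types defined in this file:
    ///
    /// ```
    /// let mut file = tfm::File::default();
    /// file.char_tags.insert(tfm::Char(b'f'), tfm::CharTag::Ligature(0));
    /// let entrypoints = file.lig_kern_entrypoints();
    /// assert_eq!(entrypoints.get(&tfm::Char(b'f')), Some(&0));
    /// ```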
    pub fn lig_kern_entrypoints(&self) -> HashMap<Char, u8> {
        self.char_tags
            .iter()
            .filter_map(|(c, d)| match *d {
                CharTag::Ligature(l) => Some((*c, l)),
                _ => None,
            })
            .collect()
    }

    fn char_info_bounds(&self) -> Option<(Char, Char)> {
        let mut r: Option<(Char, Char)> = None;
        for c in self.char_dimens.keys().copied() {
            r = Some(match r {
                None => (c, c),
                Some((lower, upper)) => (
                    if c.0 < lower.0 { c } else { lower },
                    if c.0 < upper.0 { upper } else { c },
                ),
            })
        }
        r
    }

    /// Calculate the checksum of this .tfm file.
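    ///
    /// For example, for a file with no character data the checksum bytes are
    /// `[bc, ec, bc, ec] = [1, 0, 1, 0]`, per the algorithm in the function body:
    ///
    /// ```
    /// let file = tfm::File::default();
    /// assert_eq!(file.checksum(), u32::from_be_bytes([1, 0, 1, 0]));
    /// ```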
    pub fn checksum(&self) -> u32 {
        // This checksum algorithm is in PLtoTF.2014.134.
        let (bc, ec) = self.char_info_bounds().unwrap_or((Char(1), Char(0)));
        let mut b = [bc.0, ec.0, bc.0, ec.0];
        for c in bc.0..=ec.0 {
            let char_dimens = match self.char_dimens.get(&Char(c)) {
                None => continue,
                Some(char_dimens) => char_dimens,
            };
            let width = self.widths[char_dimens.width_index.get() as usize].0;
            // TODO: adjust based on the design units
            let width = width + (c as i32 + 4) * 0o20_000_000;
            let add = |b: u8, m: u8| -> u8 {
                (((b as i32) + (b as i32) + width) % (m as i32))
                    .try_into()
                    .expect("(i32 % u8) is always a u8")
            };
            b = [
                add(b[0], 255),
                add(b[1], 253),
                add(b[2], 251),
                add(b[3], 247),
            ]
        }
        u32::from_be_bytes(b)
    }

    pub fn named_param(&self, param: NamedParameter) -> Option<FixWord> {
        self.params.get((param.number() - 1) as usize).copied()
    }

    pub fn named_param_scaled(&self, param: NamedParameter) -> Option<core::Scaled> {
        self.named_param(param)
            .map(|p| p.to_scaled(self.header.design_size))
    }
}

impl From<crate::pl::File> for File {
    fn from(pl_file: crate::pl::File) -> Self {
        let mut char_bounds: Option<(Char, Char)> = None;

        let lig_kern_entrypoints = pl_file.lig_kern_entrypoints(true);
        let mut lig_kern_program = pl_file.lig_kern_program;
        let kerns = lig_kern_program.unpack_kerns();
        let lig_kern_entrypoints = lig_kern_program.pack_entrypoints(lig_kern_entrypoints);

        let mut widths = pl_file.additional_widths;
        let mut heights = pl_file.additional_heights;
        let mut depths = pl_file.additional_depths;
        let mut italic_corrections = pl_file.additional_italics;
        for (char, char_dimens) in &pl_file.char_dimens {
            widths.push(char_dimens.width.unwrap_or_default());
            match char_dimens.height {
                None | Some(FixWord::ZERO) => {}
                Some(height) => heights.push(height),
            }
            match char_dimens.depth {
                None | Some(FixWord::ZERO) => {}
                Some(depth) => depths.push(depth),
            }
            match char_dimens.italic_correction {
                None | Some(FixWord::ZERO) => {}
                Some(italic_correction) => italic_corrections.push(italic_correction),
            }
            char_bounds = Some(match char_bounds {
                None => (*char, *char),
                Some((lower, upper)) => (
                    if *char < lower { *char } else { lower },
                    if *char > upper { *char } else { upper },
                ),
            })
        }
        let (widths, width_to_index) = compress(&widths, 255);
        let (heights, height_to_index) = compress(&heights, 15);
        let (depths, depth_to_index) = compress(&depths, 15);
        let (italic_corrections, italic_correction_to_index) = compress(&italic_corrections, 63);
        let mut extensible_chars = vec![];
        let char_dimens = match char_bounds {
            None => Default::default(),
            Some((lower, upper)) => {
                let mut m: BTreeMap<Char, CharDimensions> = Default::default();
                for c in lower.0..=upper.0 {
                    let pl_data = match pl_file.char_dimens.get(&Char(c)) {
                        Some(pl_data) => pl_data,
                        None => continue,
                    };
                    let width = pl_data.width.unwrap_or_default();
                    let width_index = *width_to_index.get(&width).expect(
                        "the map returned from compress(_,_) contains every input as a key",
                    );
                    m.insert(
                        Char(c),
                        CharDimensions {
                            width_index: WidthIndex::Valid(width_index),
                            height_index: match pl_data.height {
                                None => 0,
                                Some(height) => {
                                    // If the height is missing from the height_to_index map, it's
                                    // because the height is 0. Similarly for depths and italic
                                    // corrections.
                                    height_to_index
                                        .get(&height)
                                        .copied()
                                        .map(NonZeroU8::get)
                                        .unwrap_or(0)
                                }
                            },
                            depth_index: match pl_data.depth {
                                None => 0,
                                Some(depth) => depth_to_index
                                    .get(&depth)
                                    .copied()
                                    .map(NonZeroU8::get)
                                    .unwrap_or(0),
                            },
                            italic_index: match pl_data.italic_correction {
                                None => 0,
                                Some(italic_correction) => italic_correction_to_index
                                    .get(&italic_correction)
                                    .copied()
                                    .map(NonZeroU8::get)
                                    .unwrap_or(0),
                            },
                        },
                    );
                }
                m
            }
        };
        let ordered_chars = {
            let mut m: Vec<Char> = pl_file.char_tags.keys().copied().collect();
            m.sort();
            m
        };
        let char_tags = ordered_chars
            .into_iter()
            .map(|c| {
                (
                    c,
                    match pl_file.char_tags.get(&c).unwrap() {
                        pl::CharTag::Ligature(_) => {
                            let entrypoint = *lig_kern_entrypoints.get(&c).expect("the map returned by crate::ligkern::lang::compress_entrypoints has a key for all chars with a lig tag");
                            CharTag::Ligature(entrypoint)
                        }
                        pl::CharTag::List(c) => CharTag::List(*c),
                        pl::CharTag::Extension(e) => {
                            let index: u8 = extensible_chars.len().try_into().unwrap();
                            extensible_chars.push(e.clone());
                            CharTag::Extension(index)
                        }
                    },
                )
            })
            .collect();

        let mut file = Self {
            header: pl_file.header,
            smallest_char: char_bounds.map(|t| t.0).unwrap_or(Char(1)),
            char_dimens,
            char_tags,
            unset_char_tags: pl_file.unset_char_tags.clone(),
            widths,
            heights,
            depths,
            italic_corrections,
            lig_kern_program,
            kerns,
            extensible_chars,
            params: pl_file.params,
        };
        if file.header.checksum.is_none() {
            file.header.checksum = Some(file.checksum());
        }
        file
    }
}

/// Lossy compression of numbers for TFM files.
///
/// The TFM file format can only store up to 15 heights, 15 depths, and 63 italic corrections.
/// If a property list file contains, e.g., more than 15 distinct heights, something has to give.
/// PLtoTF contains a lossy compression algorithm that takes a list of values and returns another,
/// bounded list of values that approximates the original list.
/// This is implemented in PLtoTF.2014.75-80 and re-implemented here.
///
/// ## How the algorithm works
///
/// For a given delta, the algorithm partitions the ordered list of values such that within
/// each partition the maximum distance between two elements is delta. All of the values within
/// the partition are then approximated by `(interval_max+interval_min)/2`.
/// As such, each value may be increased or decreased by up to `delta/2`.
/// After this procedure, the list of values is replaced by the list of approximations.
/// This compresses the list of values into a list whose size is the number of intervals.
///
/// Given this subroutine, the algorithm finds the smallest delta such that the number of intervals
/// is less than the required maximum (e.g., 15 for heights).
///
/// There are some important features of this algorithm to note:
///
/// - For a given delta value there may be multiple possible partitions.
///   The algorithm uses a greedy approach in which it maximizes the size of the first partition,
///   then the size of the second partition, and so on.
///
/// - Distinct delta values can yield the same partition.
///   For example, if the initial values are `[1, 4, 5]` then any delta in the range `[1, 3)`
///   gives the same result (`[[1], [4, 5]]`).
///   Whenever we check a delta, we are really checking the _interval of deltas_ that gives the same result.
///   Both Knuth's and our implementations use this fact to speed up the search by reducing the search space.
///   E.g., in the example above, after checking `delta=1` we _don't_ check `delta=2`.
///
/// ## This re-implementation
///
/// The re-implementation here follows PLtoTF closely, but with one modification.
/// To find the optimal delta, Knuth first calculates the minimal possible delta.
/// This is the minimum distance between adjacent elements in the ordered list of values.
/// In general this will not be a valid solution because the number of intervals
///     it generates will be large.
/// So, he next finds the smallest `k` such that `2^k * min_delta` is a valid solution.
/// He then does an upward linear search within the interval `[min_delta * 2^{k-1}, min_delta * 2^k]` to find
///     the optimal delta.
///
/// Checking a particular delta is `O(n)`.
/// The worst-case running time of Knuth's algorithm is then `O(n^3)` because the interval
///     `[2^{k-1}, 2^k]` can contain `O(n^2)` distinct deltas to check in the worst case [Note 1].
///
/// In the re-implementation here we observe that the problem of finding the smallest valid delta
///     is a classic binary search problem.
/// This is because if delta is a valid solution, any larger delta also is;
///     and if delta is not a valid solution, any smaller delta is also not a valid solution.
/// The re-implementation using binary search is `O(n log n)`.
/// Moreover, the constant factors are about the same.
///
/// [Note 1] In the worst case there are `O(n^2)` distinct deltas because each pair of elements yields a delta.
/// Let m be the smallest delta and M the largest delta.
/// In the initial `2^k`-based ranging scheme, the largest `K` satisfies `m 2^{K-1} < M <= m 2^K`,
/// or `K-1 <= log_2(M/m)`. Thus there are `K=O(1)` ranges in this initial scheme.
/// By the pigeon-hole principle, there exists a `k` such that the range `[m * 2^{k-1}, m * 2^k]`
///     contains `O(n^2)` elements.
/// In the worst case, the solution is the maximum element of this range.
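///
/// ## Example
///
/// A minimal sketch of calling this function (using [`FixWord::ZERO`] and
/// [`FixWord::UNITY`]; with only two distinct values and `max_size=15`,
/// no lossy compression is needed):
///
/// ```
/// let values = vec![tfm::FixWord::ZERO, tfm::FixWord::UNITY, tfm::FixWord::UNITY];
/// let (compressed, value_to_index) = tfm::compress(&values, 15);
/// // Each distinct input value gets an index into the compressed array.
/// assert_eq!(value_to_index.len(), 2);
/// ```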
633pub fn compress(values: &[FixWord], max_size: u8) -> (Vec<FixWord>, HashMap<FixWord, NonZeroU8>) {
634    let max_size = max_size as usize;
635    let dedup_values = {
636        let s: HashSet<FixWord> = values.iter().copied().collect();
637        // remove the zero value for non-widths
638        // and then add it back in at the start
639        let mut v: Vec<FixWord> = s.into_iter().collect();
640        v.sort();
641        v
642    };
643    // After deduplication, it is possible we don't need to compress at all so we can exit early.
644    // This also handles the case when the values slice is empty.
645    if dedup_values.len() <= max_size {
646        let m: HashMap<FixWord, NonZeroU8> = dedup_values
647            .iter()
648            .enumerate()
649            .map(|(i, &w)| {
650                let i: u8 = i.try_into().expect("`dedup_values` has at most `max_size` elements, so the index is at most `max_size-1`");
651                let i: NonZeroU8 = (i+1).try_into().expect("`i<=max_size-1<=u8::MAX`, so `i+1<=u8::MAX`");
652                (w, i) })
653            .collect();
654        let mut dedup_values = dedup_values;
655        dedup_values.push(FixWord::ZERO);
656        dedup_values.rotate_right(1);
657        return (dedup_values, m);
658    }
659
660    // For the binary search we maintain lower and upper indices as usual.
661    // The optimal delta is in the interval [lower, upper].
662    //
663    // Invariant: delta<lower is never a solution.
664    // Because delta must be non-negative, we initialize it to zero.
665    let mut lower = FixWord::ZERO;
666    // Invariant: delta=upper is always solution.
667    // To initialize upper and begin the search we construct a solution that always works: a single
668    // interval encompassing the entire slice and the largest delta possible.
669    let max_delta = *dedup_values.last().unwrap() - *dedup_values.first().unwrap();
670    let mut upper = max_delta;
671    let mut solution = vec![dedup_values.len()];
672
673    let mut buffer = vec![];
674    while lower < upper {
675        // After the following line delta is potentially equal to lower. This is what we want as
676        // we know upper is a solution so to advance the search when upper=lower+1
677        // we need to check lower+1.
678        let delta = lower + (upper - lower) / 2;
679
680        let mut interval_start = *dedup_values.first().unwrap();
681        // The smallest delta such that the candidate solution will be the same.
682        // This is the maximum of all gaps that don't start a new interval.
683        let mut delta_lower = FixWord::ZERO;
684        // The largest delta such that the candidate solution will be different.
685        // This is the minimum of all gaps that start a new interval.
686        let mut delta_upper = max_delta;
687        for (i, &v) in dedup_values.iter().enumerate() {
688            let gap = v - interval_start;
689            if gap > delta {
690                // We need to start a new interval
691                if gap < delta_upper {
692                    delta_upper = gap;
693                }
694                buffer.push(i);
695                // If the candidate solution is already too big, we can exit early.
696                if buffer.len() >= max_size {
697                    break;
698                }
699                interval_start = v;
700            } else {
701                // We need to extend the current interval
702                // For any delta in the range [gap, delta] we would have made the same choice here.
703                if gap > delta_lower {
704                    delta_lower = gap;
705                }
706            }
707        }
708        buffer.push(dedup_values.len());
709
710        if buffer.len() <= max_size {
711            // solution
712            std::mem::swap(&mut buffer, &mut solution);
713            upper = delta_lower;
714        } else {
715            // not a solution
716            lower = delta_upper;
717        }
718        buffer.clear();
719    }
720
721    let mut value_to_index = HashMap::<FixWord, NonZeroU8>::new();
722    let mut result = vec![FixWord::ZERO];
723    let mut previous = 0_usize;
724    for i in solution {
725        let interval = &dedup_values[previous..i];
726        previous = i;
727        for &v in interval {
728            let index: u8 = result.len().try_into().expect("the `result` array contains at most `1+max_size` elements, so the index it at most `max_size` which is a u8");
729            let index: NonZeroU8 = index
730                .try_into()
731                .expect("the `result` array contains at least 1 element so this is never 0");
732            value_to_index.insert(v, index);
733        }
734        let replacement = (*interval.last().unwrap() + *interval.first().unwrap()) / 2;
735        result.push(replacement);
736    }
737
738    (result, value_to_index)
739}
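The rounding step above replaces every value in an interval with the midpoint of the interval's endpoints. A standalone sketch of that partition-and-replace idea on plain `i32`s (the `round_to_intervals` helper is hypothetical, not part of this crate's API):

```rust
// Partition sorted values into intervals whose width is at most `delta`,
// then replace every value in an interval with the midpoint of the
// interval's first and last elements.
fn round_to_intervals(sorted: &[i32], delta: i32) -> Vec<i32> {
    let mut result = Vec::new();
    let mut start_idx = 0;
    for i in 0..sorted.len() {
        if sorted[i] - sorted[start_idx] > delta {
            // Close the current interval and emit its midpoint.
            result.push((sorted[start_idx] + sorted[i - 1]) / 2);
            start_idx = i;
        }
    }
    if !sorted.is_empty() {
        // Close the final interval.
        result.push((sorted[start_idx] + sorted[sorted.len() - 1]) / 2);
    }
    result
}

fn main() {
    // 1, 2, 3 fit in one interval of width <= 2; 10 starts a new one.
    assert_eq!(round_to_intervals(&[1, 2, 3, 10], 2), vec![2, 10]);
    println!("ok");
}
```

The real function additionally binary-searches over `delta` to find the smallest tolerance whose partition fits in `max_size` intervals.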
740
741#[derive(Debug, Hash, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
742pub enum Section {
743    /// Sub-file sizes
744    SubFileSizes,
745    /// Header
746    Header,
747    /// Character data
748    CharInfos,
749    /// Widths array
750    Widths,
751    /// Heights array
752    Heights,
753    /// Depths array
754    Depths,
755    /// Italic corrections array
756    ItalicCorrections,
757    /// Lig/kern instructions
758    LigKern,
759    /// Kerns array
760    Kerns,
761    /// Extensible recipes
762    ExtensibleRecipes,
763    /// Params array
764    Params,
765}
766
767impl Section {
768    pub const NAMES: [&'static str; 11] = [
769        "sub-file-sizes",
770        "header",
771        "char-infos",
772        "widths",
773        "heights",
774        "depths",
775        "italic-corrections",
776        "lig-kern",
777        "kerns",
778        "extensible-recipes",
779        "params",
780    ];
781    pub const ALL_SECTIONS: [Section; 11] = [
782        Section::SubFileSizes,
783        Section::Header,
784        Section::CharInfos,
785        Section::Widths,
786        Section::Heights,
787        Section::Depths,
788        Section::ItalicCorrections,
789        Section::LigKern,
790        Section::Kerns,
791        Section::ExtensibleRecipes,
792        Section::Params,
793    ];
794}
795
796impl TryFrom<&str> for Section {
797    type Error = ();
798
799    fn try_from(value: &str) -> Result<Self, Self::Error> {
800        for i in 0..Section::NAMES.len() {
801            if Section::NAMES[i] == value {
802                return Ok(Section::ALL_SECTIONS[i]);
803            }
804        }
805        Err(())
806    }
807}
808
809impl std::fmt::Display for Section {
810    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
811        write!(f, "{}", Section::NAMES[*self as usize])
812    }
813}
814
815/// The TFM header, which contains metadata about the file.
816///
817/// This is defined in TFtoPL.2014.10.
818#[derive(Clone, Debug, Default, PartialEq, Eq)]
819pub struct Header {
820    /// The font checksum, if specified.
821    ///
822    /// In .tfm files checksums are always specified because the format has no
823    ///     way to omit a checksum.
824    ///
825    /// In .pl files checksums are specified if the `CHECKSUM` node appears.
826    /// If no checksum is specified in a .pl file, pltotf calculates the
827    ///     correct value and writes that.
828    ///
829    /// In TeX82, this is stored in the `font_check` array (TeX82.2021.549).
830    pub checksum: Option<u32>,
831    /// In TeX82, this is stored in the `font_dsize` array (TeX82.2021.549).
832    pub design_size: FixWord,
833    pub design_size_valid: bool,
834    pub character_coding_scheme: Option<String>,
835    pub font_family: Option<String>,
836    pub seven_bit_safe: Option<bool>,
837    pub face: Option<Face>,
838    /// The TFM format allows the header to contain arbitrary additional data.
839    pub additional_data: Vec<u32>,
840}
841
842impl Header {
843    /// Returns the default header when parsing property list files.
844    ///
845    /// This is defined in PLtoTF.2014.70.
846    pub fn pl_default() -> Header {
847        Header {
848            checksum: None,
849            design_size: FixWord::ONE * 10,
850            design_size_valid: true,
851            character_coding_scheme: Some("UNSPECIFIED".into()),
852            font_family: Some("UNSPECIFIED".into()),
853            seven_bit_safe: None,
854            face: Some(0.into()),
855            additional_data: vec![],
856        }
857    }
858
859    /// Returns the default header when parsing .tfm files.
860    ///
861    /// This is defined in PLtoTF.2014.70.
862    pub fn tfm_default() -> Header {
863        Header {
864            checksum: Some(0),
865            design_size: FixWord::ZERO,
866            design_size_valid: true,
867            character_coding_scheme: None,
868            font_family: None,
869            seven_bit_safe: None,
870            face: None,
871            additional_data: vec![],
872        }
873    }
874}
875
876/// A character in a TFM file.
877///
878/// TFM and PL files only support 1-byte characters.
879#[derive(Debug, Default, PartialEq, Eq, Hash, Clone, Copy, PartialOrd, Ord)]
880#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
881pub struct Char(pub u8);
882
883impl std::fmt::Display for Char {
884    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
885        if (self.0 as char).is_ascii_graphic() {
886            write!(f, "{}", self.0 as char)
887        } else {
888            write!(f, "0x{:02x}", self.0)
889        }
890    }
891}
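The `Display` implementation above prints graphic ASCII bytes as themselves and everything else as a two-digit hex escape. A standalone sketch of the same rule (the `display_byte` helper is hypothetical, not part of the crate):

```rust
// Graphic ASCII bytes print as themselves; all other bytes print as 0xNN.
fn display_byte(b: u8) -> String {
    if (b as char).is_ascii_graphic() {
        (b as char).to_string()
    } else {
        format!("0x{:02x}", b)
    }
}

fn main() {
    assert_eq!(display_byte(b'A'), "A");
    // Newline is not graphic, so it is escaped.
    assert_eq!(display_byte(0x0a), "0x0a");
    // Non-ASCII bytes are escaped too.
    assert_eq!(display_byte(0xff), "0xff");
    println!("ok");
}
```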
892
893impl From<u8> for Char {
894    fn from(value: u8) -> Self {
895        Char(value)
896    }
897}
898
899impl TryFrom<char> for Char {
900    type Error = std::char::TryFromCharError;
901
902    fn try_from(value: char) -> Result<Self, Self::Error> {
903        let u: u8 = value.try_into()?;
904        Ok(Char(u))
905    }
906}
907
908impl From<Char> for char {
909    fn from(value: Char) -> Self {
910        value.0 as char
911    }
912}
913
914macro_rules! const_chars {
915    ( $( ($name: ident, $value: expr), )+ ) => {
916        $(
917            pub const $name: Char = Char($value);
918        )+
919    };
920}
921
922impl Char {
923    const_chars![
924        (A, b'A'),
925        (B, b'B'),
926        (C, b'C'),
927        (D, b'D'),
928        (V, b'V'),
929        (W, b'W'),
930        (X, b'X'),
931        (Y, b'Y'),
932        (Z, b'Z'),
933    ];
934
935    pub fn is_seven_bit(&self) -> bool {
936        self.0 <= 127
937    }
938}
939
940#[derive(Debug, PartialEq, Eq, Copy, Clone)]
941#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
942pub enum FaceWeight {
943    Light,
944    Medium,
945    Bold,
946}
947
948#[derive(Debug, PartialEq, Eq, Copy, Clone)]
949#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
950pub enum FaceSlope {
951    Roman,
952    Italic,
953}
954
955#[derive(Debug, PartialEq, Eq, Copy, Clone)]
956#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
957pub enum FaceExpansion {
958    Regular,
959    Condensed,
960    Extended,
961}
962
963#[derive(Debug, PartialEq, Eq, Copy, Clone)]
964#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
965pub enum Face {
966    Valid(FaceWeight, FaceSlope, FaceExpansion),
967    Other(u8),
968}
969
970impl From<u8> for Face {
971    fn from(value: u8) -> Self {
972        if value >= 18 {
973            return Face::Other(value);
974        }
975        let a = match (value % 6) / 2 {
976            0 => FaceWeight::Medium,
977            1 => FaceWeight::Bold,
978            2 => FaceWeight::Light,
979            _ => unreachable!(),
980        };
981        let b = match value % 2 {
982            0 => FaceSlope::Roman,
983            1 => FaceSlope::Italic,
984            _ => unreachable!(),
985        };
986        let c = match value / 6 {
987            0 => FaceExpansion::Regular,
988            1 => FaceExpansion::Condensed,
989            2 => FaceExpansion::Extended,
990            _ => unreachable!(),
991        };
992        Face::Valid(a, b, c)
993    }
994}
995
996impl From<Face> for u8 {
997    fn from(value: Face) -> Self {
998        match value {
999            Face::Valid(w, s, c) => {
1000                let a: u8 = match w {
1001                    FaceWeight::Medium => 0,
1002                    FaceWeight::Bold => 1,
1003                    FaceWeight::Light => 2,
1004                };
1005                let b: u8 = match s {
1006                    FaceSlope::Roman => 0,
1007                    FaceSlope::Italic => 1,
1008                };
1009                let c: u8 = match c {
1010                    FaceExpansion::Regular => 0,
1011                    FaceExpansion::Condensed => 1,
1012                    FaceExpansion::Extended => 2,
1013                };
1014                c * 6 + a * 2 + b
1015            }
1016            Face::Other(b) => b,
1017        }
1018    }
1019}
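The two conversions above encode a valid face byte as `expansion * 6 + weight * 2 + slope`. A standalone check of that arithmetic (the `decode`/`encode` helpers are hypothetical, mirroring the match arms above on raw indices):

```rust
// Split a face byte below 18 into (weight, slope, expansion) indices.
fn decode(b: u8) -> (u8, u8, u8) {
    ((b % 6) / 2, b % 2, b / 6)
}

// The inverse of `decode`.
fn encode(weight: u8, slope: u8, expansion: u8) -> u8 {
    expansion * 6 + weight * 2 + slope
}

fn main() {
    // The encoding round-trips for all 18 valid face bytes.
    for b in 0u8..18 {
        let (w, s, e) = decode(b);
        assert_eq!(encode(w, s, e), b);
    }
    // Face code 9 decodes to indices (1, 1, 1): Bold, Italic, Condensed.
    assert_eq!(decode(9), (1, 1, 1));
    println!("ok");
}
```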
1020
1021/// A named TeX font metric parameter.
1022#[derive(PartialEq, Eq, Debug, Copy, Clone)]
1023#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1024pub enum NamedParameter {
1025    Slant,
1026    Space,
1027    Stretch,
1028    Shrink,
1029    XHeight,
1030    Quad,
1031    ExtraSpace,
1032    Num1,
1033    Num2,
1034    Num3,
1035    Denom1,
1036    Denom2,
1037    Sup1,
1038    Sup2,
1039    Sup3,
1040    Sub1,
1041    Sub2,
1042    SupDrop,
1043    SubDrop,
1044    Delim1,
1045    Delim2,
1046    AxisHeight,
1047    DefaultRuleThickness,
1048    BigOpSpacing1,
1049    BigOpSpacing2,
1050    BigOpSpacing3,
1051    BigOpSpacing4,
1052    BigOpSpacing5,
1053}
1054
1055impl NamedParameter {
1056    pub fn number(&self) -> u8 {
1057        match self {
1058            NamedParameter::Slant => 1,
1059            NamedParameter::Space => 2,
1060            NamedParameter::Stretch => 3,
1061            NamedParameter::Shrink => 4,
1062            NamedParameter::XHeight => 5,
1063            NamedParameter::Quad => 6,
1064            NamedParameter::ExtraSpace => 7,
1065            NamedParameter::Num1 => 8,
1066            NamedParameter::Num2 => 9,
1067            NamedParameter::Num3 => 10,
1068            NamedParameter::Denom1 => 11,
1069            NamedParameter::Denom2 => 12,
1070            NamedParameter::Sup1 => 13,
1071            NamedParameter::Sup2 => 14,
1072            NamedParameter::Sup3 => 15,
1073            NamedParameter::Sub1 => 16,
1074            NamedParameter::Sub2 => 17,
1075            NamedParameter::SupDrop => 18,
1076            NamedParameter::SubDrop => 19,
1077            NamedParameter::Delim1 => 20,
1078            NamedParameter::Delim2 => 21,
1079            NamedParameter::AxisHeight => 22,
1080            NamedParameter::DefaultRuleThickness => 8,
1081            NamedParameter::BigOpSpacing1 => 9,
1082            NamedParameter::BigOpSpacing2 => 10,
1083            NamedParameter::BigOpSpacing3 => 11,
1084            NamedParameter::BigOpSpacing4 => 12,
1085            NamedParameter::BigOpSpacing5 => 13,
1086        }
1087    }
1088}
1089
1090/// Warning from the compilation of "next larger character" instructions.
1091#[derive(PartialEq, Eq, Debug, Clone)]
1092pub enum NextLargerProgramWarning {
1093    NonExistentCharacter { original: Char, next_larger: Char },
1094    InfiniteLoop { original: Char, next_larger: Char },
1095}
1096
1097impl NextLargerProgramWarning {
1098    pub fn bad_char(&self) -> Char {
1099        match self {
1100            NextLargerProgramWarning::NonExistentCharacter { original, .. }
1101            | NextLargerProgramWarning::InfiniteLoop { original, .. } => *original,
1102        }
1106    }
1107
1108    /// Returns the warning message the TFtoPL program prints for this kind of error.
1109    pub fn tftopl_message(&self) -> String {
1110        match self {
1111            NextLargerProgramWarning::NonExistentCharacter {
1112                original: _,
1113                next_larger,
1114            } => {
1115                format![
1116                    "Bad TFM file: Character list link to nonexistent character '{:03o}.",
1117                    next_larger.0
1118                ]
1119            }
1120            NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1121                format!["Bad TFM file: Cycle in a character list!\nCharacter '{:03o} now ends the list.", original.0]
1122            }
1123        }
1124    }
1125
1126    /// Returns the section in Knuth's TFtoPL (version 2014) in which this warning occurs.
1127    pub fn tftopl_section(&self) -> u8 {
1128        84
1129    }
1130
1131    /// Returns the section in Knuth's PLtoTF (version 2014) in which this warning occurs.
1132    pub fn pltotf_section(&self) -> u8 {
1133        match self {
1134            NextLargerProgramWarning::NonExistentCharacter { .. } => 111,
1135            NextLargerProgramWarning::InfiniteLoop { .. } => 113,
1136        }
1137    }
1138
1139    /// Returns the warning message the PLtoTF program prints for this kind of error.
1140    pub fn pltotf_message(&self) -> String {
1141        match self {
1142            NextLargerProgramWarning::NonExistentCharacter {
1143                original,
1144                next_larger: _,
1145            } => {
1146                format![
1147                    "The character NEXTLARGER than '{:03o} had no CHARACTER spec.",
1148                    original.0
1149                ]
1150            }
1151            NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1152                format![
1153                    "A cycle of NEXTLARGER characters has been broken at '{:03o}.",
1154                    original.0
1155                ]
1156            }
1157        }
1158    }
1159}
1160/// Compiled program of "next larger character" instructions
1161///
1162/// The .tfm file format can associate a "next larger" character to any character in a font.
1163/// Next larger characters form sequences: i.e. B can be the next larger character for A,
1164///     and C can be the next larger character for B,
1165///     leading to the sequence A-B-C.
1166/// These next larger characters are used at certain points in TeX.
1167/// TeX occasionally traverses the entire sequence for a given starting character (e.g. A).
1168///
1169/// As with ligatures, next larger specifications can contain infinite loops -
1170///     e.g., if X is the next larger character for Y
1171///      and Y is the next larger character for X.
1172/// These loops are invalid and removed by TFtoPL and PLtoTF.
1173///
1174/// Motivated by the idea of "parse don't validate", this type represents
1175///     a compiled version of the next larger specifications in which infinite loops
1176///     are statically guaranteed not to exist.
1177///
1178/// The basic use of a valid program looks like this:
1179///
1180/// ```
1181/// # use tfm::*;
1182/// let edges = vec![
1183///     (Char::A, Char::B),
1184///     (Char::B, Char::C),
1185/// ];
1186/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1187///
1188/// assert_eq![warnings, vec![]];
1189///
1190/// let sequence_A: Vec<Char> = next_larger_program.get(Char::A).collect();
1191/// assert_eq!(sequence_A, vec![Char::B, Char::C]);
1192///
1193/// let sequence_B: Vec<Char> = next_larger_program.get(Char::B).collect();
1194/// assert_eq!(sequence_B, vec![Char::C]);
1195///
1196/// let sequence_C: Vec<Char> = next_larger_program.get(Char::C).collect();
1197/// assert_eq!(sequence_C, vec![]);
1198///
1199/// // Character that is not in the program.
1200/// let sequence_D: Vec<Char> = next_larger_program.get(Char::D).collect();
1201/// assert_eq!(sequence_D, vec![]);
1202/// ```
1203///
1204/// ## Warnings
1205///
1206/// There are two types of error that can occur when constructing the next
1207///     larger program.
1208/// Both of these errors are handled gracefully, so we officially refer to them as warnings.
1209/// The constructor returns them as values of type [`NextLargerProgramWarning`].
1210///
1211/// ### Infinite loops
1212///
1213/// The first error is that next larger programs can contain infinite loops -
1214///     e.g., if X is the next larger character for Y
1215///      and Y is the next larger character for X.
1216/// In this case the loop is broken by removing the next larger program for the
1217///     character with the largest 8-bit code, in this case Y.
1218/// A [`NextLargerProgramWarning::InfiniteLoop`] warning is returned
1219///     from the program constructor.
1220///
1221/// ```
1222/// # use tfm::*;
1223/// let edges = vec![
1224///     (Char::X, Char::Y),
1225///     (Char::Y, Char::X),
1226/// ];
1227/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1228///
1229/// assert_eq!(warnings, vec![NextLargerProgramWarning::InfiniteLoop{
1230///     original: Char::Y,
1231///     next_larger: Char::X,
1232/// }]);
1233///
1234/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1235/// assert_eq!(sequence_X, vec![Char::Y]);
1236///
1237/// let sequence_Y: Vec<Char> = next_larger_program.get(Char::Y).collect();
1238/// assert_eq!(sequence_Y, vec![]);
1239/// ```
1240///
1241/// ### Non-existent characters
1242///
1243/// The second error is that characters referred to in the next larger program
1244///     may not be defined in the .tfm or .pl file.
1245/// For example, a .pl file may contain the snippet `(CHARACTER C X (NEXTLARGER C Y))`
1246///     without defining the character Y.
1247/// The constructor [`NextLargerProgram::new`] accepts a function for checking if a
1248///     character exists.
1249/// In all cases a [`NextLargerProgramWarning::NonExistentCharacter`] warning is returned
1250///     if a non-existent character is encountered.
1251///
1252/// The behavior of the resulting next larger program is configured using the
1253///     `drop_non_existent_characters` argument.
1254/// If this is false, then the behavior is the same as PLtoTF and the program still
1255///     contains the character.
1256///
1257/// ```
1258/// # use tfm::*;
1259/// let edges = vec![
1260///     (Char::X, Char::Y),
1261/// ];
1262/// let character_exists = |c| c != Char::Y;
1269/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, false);
1270///
1271/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1272///     original: Char::X,
1273///     next_larger: Char::Y,
1274/// }]);
1275///
1276/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1277/// assert_eq!(sequence_X, vec![Char::Y]);
1278/// ```
1279///
1280/// If `drop_non_existent_characters` is true, next larger instructions pointing at non-existent
1281///     characters are dropped.
1282/// This is how TFtoPL behaves.
1283///
1285/// ```
1286/// # use tfm::*;
1287/// let edges = vec![
1288///     (Char::X, Char::Y),
1289/// ];
1290/// let character_exists = |c| c != Char::Y;
1297/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, true);
1298///
1299/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1300///     original: Char::X,
1301///     next_larger: Char::Y,
1302/// }]);
1303///
1304/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1305/// assert_eq!(sequence_X, vec![]);
1306/// ```
1307///
1308#[derive(Clone, Debug)]
1309pub struct NextLargerProgram {
1310    entrypoints: HashMap<Char, u8>,
1311    next_larger: Vec<(Char, NonZeroU8)>,
1312}
1313
1314impl NextLargerProgram {
1315    /// Build a new next larger program from an iterator over edges.
1316    pub fn new<I: Iterator<Item = (Char, Char)>, F: Fn(Char) -> bool>(
1317        edges: I,
1318        character_exists: F,
1319        drop_non_existent_characters: bool,
1320    ) -> (Self, Vec<NextLargerProgramWarning>) {
1321        // This function implements functionality in TFtoPL.2014.84 and PLtoTF.2014.{110,111,113}.
1322        let mut warnings: Vec<NextLargerProgramWarning> = vec![];
1323
1324        let mut node_to_larger = HashMap::<Char, Char>::new();
1325        let mut node_to_num_smaller = HashMap::<Char, usize>::new();
1326        for (smaller, larger) in edges {
1327            if !character_exists(larger) {
1328                warnings.push(NextLargerProgramWarning::NonExistentCharacter {
1329                    original: smaller,
1330                    next_larger: larger,
1331                });
1332                if drop_non_existent_characters {
1333                    continue;
1334                }
1335            }
1336            node_to_larger.insert(smaller, larger);
1337            node_to_num_smaller.entry(smaller).or_default();
1338            *node_to_num_smaller.entry(larger).or_default() += 1;
1339        }
1340
1341        let mut leaves: Vec<Char> = vec![];
1342        let mut non_leaves: BTreeSet<Char> = Default::default();
1343        for (node, num_smaller) in &node_to_num_smaller {
1344            if *num_smaller == 0 {
1345                leaves.push(*node);
1346            } else {
1347                non_leaves.insert(*node);
1348            }
1349        }
1350
1351        let mut sorted_chars = vec![];
1352        let mut infinite_loop_warnings = vec![];
1353        loop {
1354            while let Some(smaller) = leaves.pop() {
1355                if let Some(larger) = node_to_larger.get(&smaller).copied() {
1356                    let num_smaller = node_to_num_smaller
1357                        .get_mut(&larger)
1358                        .expect("`node_to_num_smaller` contains all nodes");
1359                    *num_smaller = num_smaller
1360                        .checked_sub(1)
1361                        .expect("the larger of a smaller must have at least one smaller");
1362                    if *num_smaller == 0 {
1363                        leaves.push(larger);
1364                        non_leaves.remove(&larger);
1365                    }
1366                }
1367                sorted_chars.push(smaller);
1368            }
1369
1370            // There could be pending nodes left because of a cycle.
1371            // We break the cycle at the largest node.
1372            let smaller = match non_leaves.last() {
1373                None => break,
1374                Some(child) => *child,
1375            };
1376            let larger = node_to_larger.remove(&smaller).expect(
1377                "General graph fact: sum_(node)#in_edges(node)=sum_(node)#out_edges(node).
1378                Fact about the next larger graph: #out_edges(node)<=1, because each char has at most one next larger char.
1379                If #out_edge(child)=0 then sum_(node)#in_edges(node)=sum_(node)#out_edges(node) < #nodes.
1380                Thus there exists another node with #in_edges(node)=0 and that node is a leaf.
1381                But `leaves.len()=0` at this line of code",
1382            );
1383            infinite_loop_warnings.push(NextLargerProgramWarning::InfiniteLoop {
1384                original: smaller,
1385                next_larger: larger,
1386            });
1387            leaves.push(larger);
1388            non_leaves.remove(&larger);
1389        }
1390        warnings.extend(infinite_loop_warnings.into_iter().rev());
1391
1392        let next_larger = {
1393            let parents: HashSet<Char> = node_to_larger.values().copied().collect();
1394            let mut node_to_position = HashMap::<Char, u8>::new();
1395            let mut next_larger: Vec<(Char, NonZeroU8)> = vec![];
1396
1397            for c in sorted_chars.iter().rev() {
1398                if !parents.contains(c) {
1399                    continue;
1400                }
1401                // The present character is the parent of at least one child, i.e., it is the next larger
1402                // character for another character. So it needs to be in the next_larger array.
1403                let child_position: u8 = next_larger
1404                    .len()
1405                    .try_into()
1406                    .expect("there are at most u8::MAX chars in the `next_larger` array");
1407                let offset = match node_to_larger.get(c) {
1408                    // The next_larger array contains at most 256 elements: one for each char.
1409                    // (Actually it contains at most 255 because one character necessarily does not
1410                    // have a child node and this character does not appear in the array.)
1411                    // Anyway, an offset of 255 sends the index outside the array bounds, and so
1412                    // subsequent calls to the iterator return None.
1413                    None => NonZeroU8::MAX,
1414                    Some(parent) => {
1415                        let parent_position = *node_to_position
1416                            .get(parent)
1417                            .expect("parent has already been inserted");
1418                        child_position
1419                            .checked_sub(parent_position)
1420                            .expect("parent inserted before so its position is strictly smaller")
1421                            .try_into()
1422                            .expect("parent inserted before so its position is strictly smaller")
1423                    }
1424                };
1425                next_larger.push((*c, offset));
1426                node_to_position.insert(*c, child_position);
1427            }
1428            next_larger.reverse();
1429            next_larger
1430        };
1431
1432        let entrypoints = {
1433            let node_to_position: HashMap<Char, u8> = next_larger
1434                .iter()
1435                .enumerate()
1436                .map(|(i, (c, _))| {
1437                    let u: u8 = i
1438                        .try_into()
1439                        .expect("there are at most u8::MAX chars in the `next_larger` array");
1440                    (*c, u)
1441                })
1442                .collect();
1443            let mut entrypoints = HashMap::<Char, u8>::new();
1444            for c in sorted_chars.iter().rev() {
1445                if let Some(parent) = node_to_larger.get(c) {
1446                    entrypoints.insert(
1447                        *c,
1448                        *node_to_position
1449                            .get(parent)
1450                            .expect("parent has already been inserted"),
1451                    );
1452                }
1453            }
1454            entrypoints
1455        };
1456
1457        (
1458            Self {
1459                entrypoints,
1460                next_larger,
1461            },
1462            warnings,
1463        )
1464    }
1465
1466    /// Get the next larger sequence for a character
1467    pub fn get(&self, c: Char) -> impl Iterator<Item = Char> + '_ {
1468        NextLargerProgramIter {
1469            current: self.entrypoints.get(&c).copied(),
1470            program: self,
1471        }
1472    }
1473
1474    /// Returns whether this program is seven-bit safe.
1475    ///
1476    /// A next larger program is seven-bit safe if the next larger sequences for
1477    ///     seven-bit characters only contain seven-bit characters.
1478    /// Conversely a program is seven-bit unsafe if there is a seven-bit
1479    ///     character whose next larger sequence contains a non-seven-bit character.
1480    ///
1481    /// ```
1482    /// # use tfm::*;
1483    /// let edges = vec![
1484    ///     (Char(250), Char(125)),
1485    ///     (Char(125), Char(126)),
1486    /// ];
1487    /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1488    /// assert_eq!(true, next_larger_program.is_seven_bit_safe());
1489    ///
1490    /// let edges = vec![
1491    ///     (Char(125), Char(250)),
1492    /// ];
1493    /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1494    /// assert_eq!(false, next_larger_program.is_seven_bit_safe());
1495    /// ```
1496    pub fn is_seven_bit_safe(&self) -> bool {
1497        // For each c, we only need to check the first element in c's next larger sequence.
1498        // If there is a subsequent element d of the sequence that is seven-bit unsafe,
1499        // we will find it when considering one of d's children.
1500        // This optimization makes this function O(n), rather than worst case O(n^2).
1501        self.entrypoints
1502            .keys()
1503            .copied()
1504            .filter(Char::is_seven_bit)
1505            .filter_map(|c| self.get(c).next())
1506            .all(|c| c.is_seven_bit())
1507    }
1508}
1509
1510#[derive(Clone, Debug)]
1511struct NextLargerProgramIter<'a> {
1512    current: Option<u8>,
1513    program: &'a NextLargerProgram,
1514}
1515
1516impl<'a> Iterator for NextLargerProgramIter<'a> {
1517    type Item = Char;
1518
1519    fn next(&mut self) -> Option<Self::Item> {
1520        // Note that the iterator is statically guaranteed to terminate!
1521        // If it returns None, it has already terminated.
1522        // If it returns Some, then self.current will be incremented by a strictly
1523        //  positive number.
1524        // Incrementing like this can only happen a finite number of times before overflow
1525        //  occurs, and then self.current is None and the iterator is terminated.
1526        match self.current {
1527            None => None,
1528            Some(current) => match self.program.next_larger.get(current as usize) {
1529                None => None,
1530                Some((c, inc)) => {
1531                    self.current = current.checked_add(inc.get());
1532                    Some(*c)
1533                }
1534            },
1535        }
1536    }
1537}
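The iterator above walks an offset-encoded linked list: each array entry stores a character and a strictly positive jump to the next entry, so the index strictly increases on every step and iteration must terminate. A standalone sketch of the same technique on plain tuples (the `collect_chain` helper is hypothetical, not part of the crate):

```rust
// Follow a chain through an array of (value, jump) entries, starting at an
// optional index. A jump of at least 1 guarantees the index strictly
// increases, and `checked_add` turns overflow into termination.
fn collect_chain(arr: &[(char, u8)], mut current: Option<u8>) -> Vec<char> {
    let mut out = Vec::new();
    while let Some(i) = current {
        match arr.get(i as usize) {
            // Jumping past the end of the array ends the chain.
            None => break,
            Some(&(c, jump)) => {
                out.push(c);
                current = i.checked_add(jump);
            }
        }
    }
    out
}

fn main() {
    // 'a' jumps 1 to 'b'; 'b' jumps 255, which overflows and ends the chain.
    let arr = [('a', 1u8), ('b', 255u8)];
    assert_eq!(collect_chain(&arr, Some(0)), vec!['a', 'b']);
    // Starting past the end yields an empty chain.
    assert_eq!(collect_chain(&arr, Some(9)), Vec::<char>::new());
    println!("ok");
}
```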
1538
1539impl core::FontFormat for File {
1540    const DEFAULT_FILE_EXTENSION: &'static str = "tfm";
1541    type Error = DeserializationError;
1542
1543    fn parse(b: &[u8]) -> Result<Self, Self::Error> {
1544        File::deserialize(b).0
1545    }
1546}
1547
1548/// Fixed-width numeric type used in TFM files.
1549///
1550/// This numeric type has 11 bits for the integer part,
1551/// 20 bits for the fractional part, and a single sign bit.
1552/// The inner value is the number multiplied by 2^20.
1553/// It is called a `fix_word` in TFtoPL.
1554///
1555/// In property list files, this type is represented as a decimal number
1556///   with up to 6 digits after the decimal point.
1557/// This is a non-lossy representation
1558///   because 10^(-6) is larger than 2^(-20).
1559#[derive(Default, PartialEq, Eq, Debug, Copy, Clone, PartialOrd, Ord, Hash)]
1560#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1561pub struct FixWord(pub i32);
1562
1563impl FixWord {
1564    /// Representation of the number 0 as a [FixWord].
1565    pub const ZERO: FixWord = FixWord(0);
1566
1567    /// Representation of the number 1 as a [FixWord].
1568    pub const ONE: FixWord = FixWord(1 << 20);
1569
1570    /// Returns true if the number is in the half-open interval [-16.0, 16.0).
1571    ///
1572    /// Note that -16.0 is allowed but +16.0 is not.
1573    /// This is covered in the E2E tests.
1574    /// See `check_fix` in TFtoPL.2014.60.
1575    pub fn is_abs_less_than_16(&self) -> bool {
1576        *self >= FixWord::ONE * -16 && *self < FixWord::ONE * 16
1577    }
1578
1579    /// Calculates the scaled value of this number at the specified design size.
1580    ///
1581    /// All dimensions in a TFM file are specified as multiples of the design size
1582    /// (with the exception of the design size itself).
1583    /// The scaled value (i.e., the value in scaled points) is the TFM value multiplied
1584    /// by the design size.
1585    ///
1586    /// The raw value and the design size both have 20 binary fractional bits,
1587    /// so the product has 40 fractional bits.
1588    /// A scaled value has only 16 fractional bits, and how the truncation is performed
1589    /// can effect the result.
1590    /// The specific multiplication algorithm Knuth wrote (TeX.2021.571-572)
1591    /// truncates the design size first and then performs the multiplication
1592    /// and then truncates again.
1593    /// Additionally, the integer division in that algorithm rounds towards zero
1594    /// even for negative numbers (e.g. -1.25 rounds to -1 instead of -2).
1595    ///
1596    /// In order to ensure the result here matched Knuth's algorithm, we reimplement
1597    /// Knuth's algorithm.
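    ///
    /// For example (assuming the crate is imported as `tfm`), scaling 0.5 at a
    /// design size of 10.0 yields 5.0, i.e. 5 * 2^16 scaled points:
    ///
    /// ```
    /// # use tfm::FixWord;
    /// let half = FixWord::ONE / 2;
    /// let design_size = FixWord::ONE * 10;
    /// assert_eq!(half.to_scaled(design_size).0, 5 << 16);
    /// ```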
    pub fn to_scaled(self, design_size: FixWord) -> core::Scaled {
        // TeX.2021.568
        let mut z = core::Scaled(design_size.0 / 16);
        // TeX.2021.572
        let mut alpha = 16;
        // Ensure that z < 2^23 sp
        while z.0 >= 0o40000000 {
            z /= 2;
            alpha *= 2;
        }
        let beta = 256 / alpha; // beta = 16 * z / design_size
        let alpha = z * alpha; // alpha = 16 * design_size

        // TeX.2021.571 (store_scaled)
        let [a, b, c, d] = self.0.to_be_bytes();
        assert!(a == 0 || a == 255);
        let sw = (((z * (d as i32)) / 0o400 + (z * (c as i32))) / 0o400 + z * (b as i32)) / beta;
        if a == 255 {
            // In this case self < 0.
            // We have calculated sw using only the 3 least significant bytes of self,
            // which is equivalent to setting self=16+self at the start. We then undo
            // this by setting sw = sw - 16 * design_size.
            sw - alpha
        } else {
            sw
        }
    }
}

impl std::fmt::Display for FixWord {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        // TFtoPL.2014.40-43
        if self.0 < 0 {
            write!(f, "-")?;
        }
        let integer_part = (self.0 / FixWord::ONE.0).abs();
        write!(f, "{integer_part}.")?;
        let mut fp = (self.0 % FixWord::ONE.0).abs();
        fp = 10 * fp + 5;
        let mut delta = 10;
        loop {
            if delta > 0o4_000_000 {
                fp = fp + 0o2_000_000 - delta / 2;
            }
            write!(f, "{}", fp / 0o4_000_000)?;
            fp = 10 * (fp % 0o4_000_000);
            delta *= 10;
            if fp <= delta {
                break;
            }
        }
        Ok(())
    }
}

impl std::ops::Add<FixWord> for FixWord {
    type Output = FixWord;
    fn add(self, rhs: FixWord) -> Self::Output {
        FixWord(self.0 + rhs.0)
    }
}
impl std::ops::Sub<FixWord> for FixWord {
    type Output = FixWord;
    fn sub(self, rhs: FixWord) -> Self::Output {
        FixWord(self.0 - rhs.0)
    }
}

impl std::ops::Mul<i32> for FixWord {
    type Output = FixWord;

    fn mul(self, rhs: i32) -> Self::Output {
        FixWord(self.0 * rhs)
    }
}

impl std::ops::Div<i32> for FixWord {
    type Output = FixWord;

    fn div(self, rhs: i32) -> Self::Output {
        FixWord(self.0 / rhs)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn to_scaled_test() {
        assert_eq!(core::Scaled::ZERO, FixWord::ZERO.to_scaled(FixWord::ONE));
        assert_eq!(core::Scaled::ONE, FixWord::ONE.to_scaled(FixWord::ONE));
    }
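
    // Additional spot-checks (not part of the original suite) for the
    // TFtoPL-style decimal printing in the `Display` impl above.
    #[test]
    fn display_test() {
        assert_eq!((FixWord::ONE / 2).to_string(), "0.5");
        assert_eq!(FixWord::ZERO.to_string(), "0.0");
        assert_eq!((FixWord::ONE * -16).to_string(), "-16.0");
    }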

    fn run_compress_test(
        values: Vec<FixWord>,
        max_size: u8,
        want: Vec<FixWord>,
        want_map: Vec<u8>,
    ) {
        let (got, got_map) = compress(&values, max_size);
        assert_eq!(got, want);
        let want_map: HashMap<FixWord, NonZeroU8> = want_map
            .into_iter()
            .enumerate()
            .map(|(i, t)| (values[i], t.try_into().unwrap()))
            .collect();
        assert_eq!(got_map, want_map);
    }

    macro_rules! compress_tests {
        ( $( ($name: ident, $values: expr, $max_size: expr, $want: expr, $want_map: expr, ), )+ ) => {
            $(
                #[test]
                fn $name () {
                    let values = $values;
                    let max_size = $max_size;
                    let want = $want;
                    let want_map = $want_map;
                    run_compress_test(values, max_size, want, want_map);
                }
            )+
        };
    }

    compress_tests!(
        (no_op_0, vec![], 1, vec![FixWord(0)], vec![],),
        (
            no_op_2,
            vec![FixWord::ONE * 2, FixWord::ONE],
            2,
            vec![FixWord(0), FixWord::ONE, FixWord::ONE * 2],
            vec![2, 1],
        ),
        (
            just_deduplication,
            vec![FixWord::ONE, FixWord::ONE],
            1,
            vec![FixWord(0), FixWord::ONE],
            vec![1, 1],
        ),
        (
            simple_compression_case,
            vec![FixWord::ONE, FixWord::ONE * 2],
            1,
            vec![FixWord(0), FixWord::ONE * 3 / 2],
            vec![1, 1],
        ),
        (
            simple_compression_case_2,
            vec![
                FixWord::ONE,
                FixWord::ONE * 2,
                FixWord::ONE * 200,
                FixWord::ONE * 201
            ],
            2,
            vec![FixWord(0), FixWord::ONE * 3 / 2, FixWord::ONE * 401 / 2],
            vec![1, 1, 2, 2],
        ),
        (
            lower_upper_close_edge_case_1,
            vec![FixWord(1), FixWord(3)],
            1,
            vec![FixWord(0), FixWord(2)],
            vec![1, 1],
        ),
        (
            lower_upper_close_edge_case_2,
            vec![FixWord(0), FixWord(2)],
            1,
            vec![FixWord(0), FixWord(1)],
            vec![1, 1],
        ),
        (
            lower_upper_close_edge_case_3,
            vec![FixWord(1), FixWord(4)],
            1,
            vec![FixWord(0), FixWord(2)],
            vec![1, 1],
        ),
        (
            lower_upper_close_edge_case_4,
            vec![FixWord(1), FixWord(2)],
            1,
            vec![FixWord(0), FixWord(1)],
            vec![1, 1],
        ),
    );

    mod next_larger_tests {
        use super::*;

        fn run(
            edges: Vec<(Char, Char)>,
            want_sequences: HashMap<Char, Vec<Char>>,
            want_warnings: Vec<NextLargerProgramWarning>,
        ) {
            let (program, got_warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
            assert_eq!(got_warnings, want_warnings);
            for u in 0..=u8::MAX {
                let want_sequence = want_sequences
                    .get(&Char(u))
                    .map(Vec::as_slice)
                    .unwrap_or_default();
                let got_sequence: Vec<Char> = program.get(Char(u)).collect();
                assert_eq!(
                    got_sequence, want_sequence,
                    "got/want sequences for {u} do not match"
                );
            }
        }

        fn big_infinite_loop_edges() -> Vec<(Char, Char)> {
            (0..=u8::MAX)
                .into_iter()
                .map(|u| (Char(u), Char(u.wrapping_add(1))))
                .collect()
        }

        fn big_infinite_loop_sequences() -> HashMap<Char, Vec<Char>> {
            (0..=u8::MAX)
                .into_iter()
                .map(|u| {
                    let v: Vec<Char> = match u.checked_add(1) {
                        None => vec![],
                        Some(w) => (w..=u8::MAX).into_iter().map(Char).collect(),
                    };
                    (Char(u), v)
                })
                .collect()
        }

        macro_rules! next_larger_tests {
            ( $( ($name: ident, $edges: expr, $want_sequences: expr, $want_warnings: expr, ), )+ ) => {
                $(
                    #[test]
                    fn $name () {
                        run($edges, $want_sequences, $want_warnings);
                    }
                )+
            };
        }

        next_larger_tests!(
            (
                same_node_loop,
                vec![(Char::A, Char::A)],
                HashMap::from([(Char::A, vec![])]),
                vec![NextLargerProgramWarning::InfiniteLoop {
                    original: Char::A,
                    next_larger: Char::A,
                }],
            ),
            (
                two_loops,
                vec![
                    (Char::A, Char::B),
                    (Char::B, Char::C),
                    (Char::C, Char::B),
                    (Char::X, Char::Y),
                    (Char::Y, Char::Z),
                    (Char::Z, Char::X),
                ],
                HashMap::from([
                    (Char::A, vec![Char::B, Char::C]),
                    (Char::B, vec![Char::C]),
                    (Char::C, vec![]),
                    (Char::X, vec![Char::Y, Char::Z]),
                    (Char::Y, vec![Char::Z]),
                    (Char::Z, vec![]),
                ]),
                vec![
                    NextLargerProgramWarning::InfiniteLoop {
                        original: Char::C,
                        next_larger: Char::B,
                    },
                    NextLargerProgramWarning::InfiniteLoop {
                        original: Char::Z,
                        next_larger: Char::X,
                    },
                ],
            ),
            (
                path_leading_to_loop,
                vec![(Char::A, Char::B), (Char::B, Char::C), (Char::C, Char::B),],
                HashMap::from([
                    (Char::A, vec![Char::B, Char::C]),
                    (Char::B, vec![Char::C]),
                    (Char::C, vec![]),
                ]),
                vec![NextLargerProgramWarning::InfiniteLoop {
                    original: Char::C,
                    next_larger: Char::B,
                }],
            ),
            (
                big_infinite_loop,
                big_infinite_loop_edges(),
                big_infinite_loop_sequences(),
                vec![NextLargerProgramWarning::InfiniteLoop {
                    original: Char(u8::MAX),
                    next_larger: Char(0),
                }],
            ),
        );
    }
}