tfm/
lib.rs

1//! # tfm: TeX font metric data
2//!
3//! This is a crate for working with TeX font metric data.
4//! It includes:
5//!
6//! - Functions to read and write TeX font metric (.tfm) files
7//!   to and from a value of type [`File`]
8//!   ([`deserialize`](File::deserialize), [`serialize`](File::serialize)).
9//!
10//! - Functions to read and write property list (.pl) files
11//!   to and from a value of type [`pl::File`]
12//!   ([`from_pl_source_code`](pl::File::from_pl_source_code), [`display`](pl::File::display)).
13//!
14//! - Converters from .tfm to .pl files and vice-versa
15//!   (using Rust's [`From`] trait to go between [`File`] and [`pl::File`]).
16//!
17//! ## Background
18//!
19//! Probably the most famous part of the implementation of TeX is the Knuth-Plass line breaking algorithm.
20//! This algorithm determines the "optimal" places to add line breaks when typesetting a paragraph of text.
21//! In order to run the algorithm one needs to provide the dimensions of all characters in the current font.
22//! These dimensions are used to size the boxes in the Knuth-Plass box and glue model.
23//!
24//! In TeX, character dimensions are provided using TeX font metric files.
25//! These are binary files.
26//! By convention they have a .tfm file extension.
27//! Unlike more modern file formats like TrueType, .tfm files only contain the font dimensions;
28//!     they don't contains the glyphs.
29//! In general,
30//!     .tfm files are produced by other software like Metafont,
31//!     placed in some well-known directory in the TeX distribution,
32//!     and then read into memory when TeX is running.
33//!
34//! Because .tfm files are binary files, it's hard to debug or tweak them.
35//! To remedy this, Knuth and his team developed another file format called a property list file
36//!     (extension .pl or .plst)
37//!     that contains the same information but in a modifiable text format.
38//! They then wrote two programs:
39//!     `tftopl` to convert a .tfm file to a .pl file,
40//!     and `pltotf` to convert a .pl file to a .tfm file.
41//!
42//! The general goal of this crate to fully re-implement all of the TeX font metric
43//!     code written by Knuth and others.
44//! This includes `tftopl`, `pltotf`, and also the parts of TeX itself that contain logic
45//!     for reading and interpreting .tfm files.
46//! However, unlike these monolithic software programs,
47//!     this re-implementation is in the form of a modular library in which
48//!     individual pieces of logic and be used and re-used.
49//!
50//! ## Basic example
51//!
52//! ```
53//! // Include the .tfm file for Computer Modern in size 10pt.
54//! let tfm_bytes = include_bytes!["../corpus/computer-modern/cmr10.tfm"];
55//!
56//! // Deserialize the .tfm file.
57//! let (tfm_file_or_error, deserialization_warnings) = tfm::File::deserialize(tfm_bytes);
58//! let mut tfm_file = tfm_file_or_error.expect("cmr10.tfm is a valid .tfm file");
59//! assert_eq![deserialization_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no deserialization warnings"];
60//! // TODO assert_eq![tfm_file.header.design_size, tfm::FixWord::UNITY * 10]; make it 11 to be more interesting
61//! // TODO query some data
62//!
63//! // Validate the .tfm file.
64//! let validation_warnings = tfm_file.validate_and_fix();
65//! assert_eq![validation_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no validation warnings"];
66//!
67//! // Convert the .tfm file to a .pl file and print it.
68//! let pl_file: tfm::pl::File = tfm_file.clone().into();
69//! // TODO query some data
70//! println!["cmr10.pl:\n{}", pl_file.display(/*indent=*/2, tfm::pl::CharDisplayFormat::Default)];
71//! ```
72//!
73//!
74//! ## Advanced functionality
75//!
76//! In addition to supporting the basic use cases of querying font metric data
77//!     and converting between different formats,
78//!     this crate has advanced functionality for performing additional tasks on font metric data.
79//! The full set of functionality can be understood by navigating through the crate documentation.
80//! But here are 3 highlights we think are interesting:
81//!
82//! - **Language analysis of .pl files**:
83//!   In `pltotf`, Knuth parses .pl files in a single pass.
84//!   This crate takes a common approach nowadays of parsing in multiple passes:
85//!   first constructing a [concrete syntax tree](pl::cst::Cst) (or parse tree),
86//!   next constructing a [fully typed and checked abstract syntax tree](pl::ast::Ast),
87//!   and finally building the [`pl::File`] itself.
88//!   Each of the passes is exposed, so you can e.g. just build the AST for the .pl file and
89//!   do some analysis on it.
90//!
91//! - **Debug output for .tfm files**:
92//!     
93//! - **Compilation of lig/kern programs**:
94//!
95//!
96//! ## Binaries
97//!
98//! The Texcraft project produces 3 binaries based on this crate:
99//!
100//! - `tftopl` and `pltotf`: re-implementations of Knuth's programs.
101//! - `tfmtools`: a new binary that has a bunch of different tools
102//!   for working with TeX font metric data.
103//!   Run `tfmtools help` to list all of the available tools.
104//!
105//! In the root of [the Texcraft repository](https://github.com/jamespfennell/texcraft)
106//!     these tools can be run with `cargo run --bin $NAME`
107//!     and built with `cargo build --bin $NAME`.
108//!
109//!
110//! ## Correctness
111//!
112//! As part of the development of this crate significant effort has been spent
113//!     ensuring it exactly replicates the work of Knuth.
114//! This correctness checking is largely based around diff testing the binaries
115//!     `tftopl` and `pltotf`.
116//! We verify that the Texcraft and Knuth implementations have the same output
117//!     and generate the same error messages.
118//!
119//! This diff testing has been performed in a few different ways:
120//!
121//! - We have run diff tests over all ~100,000 .tfm files in CTAN.
122//!   These tests verify that `tftopl` gives the correct result,
123//!   and that running `pltotf` on the output .pl file gives the correct result too.
124//!   Unfortunately running `pltotf` on the .pl files in CTAN is infeasible
125//!   because most of these files are Perl scripts, not property list files.
126//!
127//! - We have developed a fuzz testing harness (so far just for `tftopl`)
128//!   that generates highly heterogenous .tfm files and verifies that `tftopl` gives the correct result.
129//!   This fuzz testing has uncovered many issues in the Texcraft implementation,
130//!   and has even identified [a 30-year old bug](https://tug.org/pipermail/tex-k/2024-March/004031.html)
131//!   in Knuth's implementation of `tftopl`.
132//!
133//! Any .tfm or .pl file that exposes a bug in this library is added to
134//!     [our automated testing corpus](https://github.com/jamespfennell/texcraft/tree/main/crates/tfm/bin/tests/data).
135//! Running `cargo t` validates that Texcraft's binaries give the same result as Knuth's binaries
136//!     (the output of Knuth's binaries is in source control).
137//! This ensures there are no regressions.
138//!
139//! If you discover a .tfm or .pl file such that the Texcraft and Knuth implementations
140//!     diverge, this indicates there is a bug in this library.
141//! Please create an issue on the Texcraft GitHub repo.
142//! We will fix the bug and add your files to the testing corpus.
143
144pub mod algorithms;
145use std::{
146    collections::{BTreeSet, HashMap, HashSet},
147    num::NonZeroU8,
148};
149pub mod ligkern;
150pub mod pl;
151
152use std::collections::BTreeMap;
153
154mod debug;
155mod deserialize;
156mod serialize;
157mod validate;
158
159pub use deserialize::DeserializationError;
160pub use deserialize::DeserializationWarning;
161pub use deserialize::RawFile;
162pub use deserialize::SubFileSizes;
163pub use validate::ValidationWarning;
164
165/// Complete contents of a TeX font metric (.tfm) file.
166///
167/// The struct contain multiple vectors.
168/// In TeX and TFtoPL there is an optimization in which all of data in the vectors
169/// is stored in one large vector of 32-bit integers.
170/// The conversion from [u32] to the specific types like [FixWord] are then done when the
171/// data is needed.
172/// This makes the memory footprint of this type much more compact,
173///     and such a change may be considered in the future.
174///
175/// In fact in TeX the font data for all fonts is stored in one contiguous piece of memory
176///     (`font_info`, defined in TeX82.2021.549).
177/// This is a little too unsafe to pull off though.
178#[derive(Clone, Debug, PartialEq, Eq)]
179pub struct File {
180    /// Header.
181    pub header: Header,
182
183    /// Smallest character in the .tfm.
184    pub smallest_char: Char,
185
186    /// Character dimensions.
187    pub char_dimens: BTreeMap<Char, CharDimensions>,
188
189    /// Character tags.
190    ///
191    /// Note there is no correlation between a character having
192    ///     a tag and a character having a dimension.
193    /// All four combinations of (has or hasn't dimensions) and (has or hasn't a tag)
194    ///     are possible.
195    pub char_tags: BTreeMap<Char, CharTag>,
196
197    /// Tags that have been unset, but whose discriminant is still written to a .tfm file by PLtoTF.
198    pub unset_char_tags: BTreeMap<Char, u8>,
199
200    /// Character widths
201    pub widths: Vec<FixWord>,
202
203    /// Character heights
204    pub heights: Vec<FixWord>,
205
206    /// Character depths
207    pub depths: Vec<FixWord>,
208
209    /// Character italic corrections
210    pub italic_corrections: Vec<FixWord>,
211
212    /// Lig kern program.
213    pub lig_kern_program: ligkern::lang::Program,
214
215    /// Kerns. These are referenced from inside the lig kern commands.
216    pub kerns: Vec<FixWord>,
217
218    /// Extensible characters.
219    pub extensible_chars: Vec<ExtensibleRecipe>,
220
221    /// Font parameters.
222    pub params: Vec<FixWord>,
223}
224
225/// Data about one character in a .tfm file.
226#[derive(Clone, Debug, PartialEq, Eq)]
227pub struct CharDimensions {
228    /// Index of the width of this character in the widths array.
229    ///
230    /// In valid TFM files, this index will always be non-zero.
231    /// This is because if the width index is zero it means there is no data for the character in the file.
232    /// In this case the other indices (height, etc.) are necessarily zero.
233    /// See TFtoPL.2014.? (where blocks of data with a zero width index are skipped)
234    ///     and PLtoTF.2014.? (where missing characters are written with all indices 0).
235    ///
236    /// There is one edge case where this index can be zero.
237    /// This is if the width index is invalid.
238    /// In this case tftopl essentially sets the width index to 0.
239    ///
240    /// Note that even if a character doesn't have dimensions, it can still have a tag.
241    pub width_index: WidthIndex,
242    /// Index of the height of this character in the height array.
243    pub height_index: u8,
244    pub depth_index: u8,
245    pub italic_index: u8,
246}
247
248#[derive(Copy, Clone, Debug, PartialEq, Eq)]
249pub enum WidthIndex {
250    Invalid,
251    Valid(NonZeroU8),
252}
253
254impl WidthIndex {
255    pub fn valid(&self) -> Option<NonZeroU8> {
256        match self {
257            WidthIndex::Invalid => None,
258            WidthIndex::Valid(n) => Some(*n),
259        }
260    }
261    pub fn get(&self) -> u8 {
262        match self {
263            WidthIndex::Invalid => 0,
264            WidthIndex::Valid(n) => n.get(),
265        }
266    }
267}
268
269/// Tag of a character in a .tfm file.
270#[derive(Clone, Debug, PartialEq, Eq)]
271pub enum CharTag {
272    Ligature(u8),
273    List(Char),
274    Extension(u8),
275}
276
277impl CharTag {
278    pub fn ligature(&self) -> Option<u8> {
279        match self {
280            CharTag::Ligature(l) => Some(*l),
281            CharTag::List(_) | CharTag::Extension(_) => None,
282        }
283    }
284    pub fn list(&self) -> Option<Char> {
285        match self {
286            CharTag::List(c) => Some(*c),
287            CharTag::Ligature(_) | CharTag::Extension(_) => None,
288        }
289    }
290    pub fn extension(&self) -> Option<u8> {
291        match self {
292            CharTag::Extension(u) => Some(*u),
293            CharTag::Ligature(_) | CharTag::List(_) => None,
294        }
295    }
296}
297
298/// Extensible recipe instruction in a .tfm file.
299#[derive(Debug, Default, Clone, PartialEq, Eq)]
300pub struct ExtensibleRecipe {
301    pub top: Option<Char>,
302    pub middle: Option<Char>,
303    pub bottom: Option<Char>,
304    pub rep: Char,
305}
306
307impl ExtensibleRecipe {
308    pub fn is_seven_bit(&self) -> bool {
309        self.chars().all(|c| c.is_seven_bit())
310    }
311
312    pub fn chars(&self) -> impl Iterator<Item = Char> {
313        [self.top, self.middle, self.bottom, Some(self.rep)]
314            .into_iter()
315            .flatten()
316    }
317}
318
319impl Default for File {
320    fn default() -> Self {
321        Self {
322            header: Default::default(),
323            smallest_char: Char(1),
324            char_dimens: Default::default(),
325            char_tags: Default::default(),
326            unset_char_tags: Default::default(),
327            widths: vec![FixWord::ZERO],
328            heights: vec![FixWord::ZERO],
329            depths: vec![FixWord::ZERO],
330            italic_corrections: vec![FixWord::ZERO],
331            lig_kern_program: Default::default(),
332            kerns: vec![],
333            extensible_chars: vec![],
334            params: Default::default(),
335        }
336    }
337}
338
339/// Print debugging information about a .tfm file.
340pub fn debug<'a>(
341    path: Option<&'a str>,
342    sub_file_sizes: SubFileSizes,
343    tfm_file: Option<&'a File>,
344    raw_file: Option<&'a RawFile<'a>>,
345    sections: Vec<Section>,
346) -> impl std::fmt::Display + 'a {
347    debug::Debug {
348        sub_file_sizes,
349        path,
350        raw_file,
351        tfm_file,
352        sections,
353    }
354}
355
356impl File {
357    pub fn from_raw_file(raw_file: &RawFile) -> Self {
358        deserialize::from_raw_file(raw_file)
359    }
360
361    pub fn validate_and_fix(&mut self) -> Vec<ValidationWarning> {
362        validate::validate_and_fix(self)
363    }
364
365    pub fn deserialize(
366        b: &[u8],
367    ) -> (
368        Result<File, DeserializationError>,
369        Vec<DeserializationWarning>,
370    ) {
371        deserialize::deserialize(b)
372    }
373
374    pub fn serialize(&self) -> Vec<u8> {
375        serialize::serialize(self)
376    }
377
378    pub fn replace_lig_kern_program(
379        &mut self,
380        mut program: ligkern::lang::Program,
381        entrypoints: HashMap<Char, u16>,
382    ) {
383        let entrypoints = program.pack_entrypoints(entrypoints);
384        self.kerns = program.unpack_kerns();
385        self.lig_kern_program = program;
386        self.char_tags
387            .retain(|_, tag| !matches!(tag, CharTag::Ligature(_)));
388        for (c, ep) in entrypoints {
389            self.char_tags.insert(c, CharTag::Ligature(ep));
390        }
391    }
392
393    /// Return a map from characters to the lig/kern entrypoint for that character.
394    /// TODO: can probably return impl Iterator<Item=(Char, u8)>
395    pub fn lig_kern_entrypoints(&self) -> HashMap<Char, u8> {
396        self.char_tags
397            .iter()
398            .filter_map(|(c, d)| match *d {
399                CharTag::Ligature(l) => Some((*c, l)),
400                _ => None,
401            })
402            .collect()
403    }
404
405    fn char_info_bounds(&self) -> Option<(Char, Char)> {
406        let mut r: Option<(Char, Char)> = None;
407        for c in self.char_dimens.keys().copied() {
408            r = Some(match r {
409                None => (c, c),
410                Some((lower, upper)) => (
411                    if c.0 < lower.0 { c } else { lower },
412                    if c.0 < upper.0 { upper } else { c },
413                ),
414            })
415        }
416        r
417    }
418
419    /// Calculate the checksum of this .tfm file.
420    pub fn checksum(&self) -> u32 {
421        // This checksum algorithm is in PLtoTF.2014.134.
422        let (bc, ec) = self.char_info_bounds().unwrap_or((Char(1), Char(0)));
423        let mut b = [bc.0, ec.0, bc.0, ec.0];
424        for c in bc.0..=ec.0 {
425            let char_dimens = match self.char_dimens.get(&Char(c)) {
426                None => continue,
427                Some(char_dimens) => char_dimens,
428            };
429            let width = self.widths[char_dimens.width_index.get() as usize].0;
430            // TODO: adjust based on the design units
431            let width = width + (c as i32 + 4) * 0o20_000_000;
432            let add = |b: u8, m: u8| -> u8 {
433                (((b as i32) + (b as i32) + width) % (m as i32))
434                    .try_into()
435                    .expect("(i32 % u8) is always a u8")
436            };
437            b = [
438                add(b[0], 255),
439                add(b[1], 253),
440                add(b[2], 251),
441                add(b[3], 247),
442            ]
443        }
444        u32::from_be_bytes(b)
445    }
446
447    pub fn named_param(&self, param: NamedParameter) -> Option<FixWord> {
448        self.params.get((param.number() - 1) as usize).copied()
449    }
450
451    pub fn named_param_scaled(&self, param: NamedParameter) -> Option<common::Scaled> {
452        self.named_param(param)
453            .map(|p| p.to_scaled(self.header.design_size))
454    }
455
456    pub fn width_utf8(&self, char: char) -> Option<common::Scaled> {
457        let char: Char = char.try_into().ok()?;
458        let dimens = self.char_dimens.get(&char)?;
459        let i = dimens.width_index.valid()?.get();
460        Some(
461            self.widths
462                .get(i as usize)?
463                .to_scaled(self.header.design_size),
464        )
465    }
466    pub fn height_utf8(&self, char: char) -> Option<common::Scaled> {
467        let char: Char = char.try_into().ok()?;
468        let dimens = self.char_dimens.get(&char)?;
469        let i = dimens.height_index;
470        Some(
471            self.heights
472                .get(i as usize)?
473                .to_scaled(self.header.design_size),
474        )
475    }
476    pub fn depth_utf8(&self, char: char) -> Option<common::Scaled> {
477        let char: Char = char.try_into().ok()?;
478        let dimens = self.char_dimens.get(&char)?;
479        let i = dimens.depth_index;
480        Some(
481            self.depths
482                .get(i as usize)?
483                .to_scaled(self.header.design_size),
484        )
485    }
486}
487
488impl From<crate::pl::File> for File {
489    fn from(pl_file: crate::pl::File) -> Self {
490        let mut char_bounds: Option<(Char, Char)> = None;
491
492        let lig_kern_entrypoints = pl_file.lig_kern_entrypoints(true);
493        let mut lig_kern_program = pl_file.lig_kern_program;
494        let kerns = lig_kern_program.unpack_kerns();
495        let lig_kern_entrypoints = lig_kern_program.pack_entrypoints(lig_kern_entrypoints);
496
497        let mut widths = pl_file.additional_widths;
498        let mut heights = pl_file.additional_heights;
499        let mut depths = pl_file.additional_depths;
500        let mut italic_corrections = pl_file.additional_italics;
501        for (char, char_dimens) in &pl_file.char_dimens {
502            widths.push(char_dimens.width.unwrap_or_default());
503            match char_dimens.height {
504                None | Some(FixWord::ZERO) => {}
505                Some(height) => heights.push(height),
506            }
507            match char_dimens.depth {
508                None | Some(FixWord::ZERO) => {}
509                Some(depth) => depths.push(depth),
510            }
511            match char_dimens.italic_correction {
512                None | Some(FixWord::ZERO) => {}
513                Some(italic_correction) => italic_corrections.push(italic_correction),
514            }
515            char_bounds = Some(match char_bounds {
516                None => (*char, *char),
517                Some((lower, upper)) => (
518                    if *char < lower { *char } else { lower },
519                    if *char > upper { *char } else { upper },
520                ),
521            })
522        }
523        let (widths, width_to_index) = compress(&widths, 255);
524        let (heights, height_to_index) = compress(&heights, 15);
525        let (depths, depth_to_index) = compress(&depths, 15);
526        let (italic_corrections, italic_correction_to_index) = compress(&italic_corrections, 63);
527        let mut extensible_chars = vec![];
528        let char_dimens = match char_bounds {
529            None => Default::default(),
530            Some((lower, upper)) => {
531                let mut m: BTreeMap<Char, CharDimensions> = Default::default();
532                for c in lower.0..=upper.0 {
533                    let pl_data = match pl_file.char_dimens.get(&Char(c)) {
534                        Some(pl_data) => pl_data,
535                        None => continue,
536                    };
537                    let width = pl_data.width.unwrap_or_default();
538                    let width_index = *width_to_index.get(&width).expect(
539                        "the map returned from compress(_,_) contains every input as a key",
540                    );
541                    m.insert(
542                        Char(c),
543                        CharDimensions {
544                            width_index: WidthIndex::Valid(width_index),
545                            height_index: match pl_data.height {
546                                None => 0,
547                                Some(height) => {
548                                    // If the height data is missing from the height_to_index map, it's because
549                                    // the height is 0. Similar with depths and italic corrections.
550                                    height_to_index
551                                        .get(&height)
552                                        .copied()
553                                        .map(NonZeroU8::get)
554                                        .unwrap_or(0)
555                                }
556                            },
557                            depth_index: match pl_data.depth {
558                                None => 0,
559                                Some(depth) => depth_to_index
560                                    .get(&depth)
561                                    .copied()
562                                    .map(NonZeroU8::get)
563                                    .unwrap_or(0),
564                            },
565                            italic_index: match pl_data.italic_correction {
566                                None => 0,
567                                Some(italic_correction) => italic_correction_to_index
568                                    .get(&italic_correction)
569                                    .copied()
570                                    .map(NonZeroU8::get)
571                                    .unwrap_or(0),
572                            },
573                        },
574                    );
575                }
576                m
577            }
578        };
579        let ordered_chars = {
580            let mut m: Vec<Char> = pl_file.char_tags.keys().copied().collect();
581            m.sort();
582            m
583        };
584        let char_tags = ordered_chars.into_iter().map(|c| {
585            (c,
586                match pl_file.char_tags.get(&c).unwrap() {
587                    pl::CharTag::Ligature(_) => {
588                        let entrypoint = *lig_kern_entrypoints.get(&c).expect("the map returned by crate::ligkern::lang::compress_entrypoints has a key for all chars with a lig tag");
589                        CharTag::Ligature(entrypoint)
590                    }
591                    pl::CharTag::List(c) => CharTag::List(*c),
592                    pl::CharTag::Extension(e) => {
593                        let index: u8 = extensible_chars.len().try_into().unwrap();
594                        extensible_chars.push(e.clone());
595                        CharTag::Extension(index)
596                    }
597                },
598            )
599        }).collect();
600
601        let mut file = Self {
602            header: pl_file.header,
603            smallest_char: char_bounds.map(|t| t.0).unwrap_or(Char(1)),
604            char_dimens,
605            char_tags,
606            unset_char_tags: pl_file.unset_char_tags.clone(),
607            widths,
608            heights,
609            depths,
610            italic_corrections,
611            lig_kern_program,
612            kerns,
613            extensible_chars,
614            params: pl_file.params,
615        };
616        if file.header.checksum.is_none() {
617            file.header.checksum = Some(file.checksum());
618        }
619        file
620    }
621}
622
623/// Lossy compression of numbers for TFM files.
624///
625/// The TFM file format can only store up to 15 heights, 15 depths and 63 italic corrections.
626/// If a property list file contains, e.g., more than 15 distinct heights, something has to give.
627/// PLtoTF contains a lossy compression algorithm that takes a list of values and returns another
628/// bounded list of values which approximates the original list.
629/// This is implemented in PLtoTF.2014.75-80 and re-implemented here.
630///
631/// ## How the algorithm works.
632///
633/// For a given delta, the algorithm partitions the ordered list of values such that within
634/// each partition the maximum distance between two elements is delta. All of the values within
635/// the partition are then approximated by `(interval_max+interval_min)/2`.
636/// As such, each value may be increased or decreased by up to `delta/2`.
637/// After this procedure, the list of values is replaced by the list of approximations.
638/// This compresses the list of values into a list whose size is the number of intervals.
639///
640/// Given this subroutine, the algorithm finds the smallest delta such that the number of intervals
641/// is less than the required maximum (e.g., 15 for heights).
642///
643/// There are some important features of this algorithm to note:
644///
645/// - For a given delta value there may be multiple possible partitions.
646///   The algorithm uses a greedy approach in which it maximizes the size of the first partition,
647///   then the size of the second partition, and so on.
648///
649/// - Distinct delta values can yield the same partition.
650///   For example, if the initial values are `[1, 4, 5]` then any delta in the range `[1, 3)`
651///   gives the same result (`[[1], [4, 5]]`)
652///   Whenever we check a delta, we are really checking the _interval of deltas_  that gives the same result.
653///   Both Knuth's and our implementations use this fact to speed up the search by reducing the search space.
654///   E.g. in the example above, after checking `delta=1` we _don't_ check `delta=2`.
655///
656/// ## This re-implementation
657///
658/// The re-implementation here follows PLtoTF closely enough, but with one modification.
659/// To find the optimal delta, Knuth first calculates the minimal possible delta.
660/// This is the minimum distance between adjacent elements in the ordered list of values.
661/// In general this will not be a valid solution because the number of intervals
662///     it generates will be large.
663/// So, he next finds the smallest `k` such that `2^k * min_delta` is a valid solution.
664/// Te then does a upwards linear search within the interval `[min_delta * 2^{k-1}, min_delta * 2^k]` to find
665///     the optimal delta.
666///
667/// Checking a particular delta is `O(n)`.
668/// The worst-case running time of Knuth's algorithm is then `O(n^3)` because the interval
669///     `[2^{k-1}, 2^k]` can contain `O(n^2)` distinct deltas to check in the worst-case [Note 1].
670///
671/// In the re-implementation here we realize that the problem of finding the smallest possible delta
672///     is a classic binary search problem.
673/// This is because if delta is a valid solution, any larger delta also is;
674///     and if delta is not a valid solution, any smaller delta is also not a valid solution.
675/// The re-implementation using binary search is `O(n log n)`.
676/// Moreover, the constant factors are about the same.
677///
678/// [Note 1] In the worst-case there are O(n^2) distinct deltas because each pair of elements yields a delta.
679/// Let m be the smallest delta and M the largest delta.
680/// In the initial `2^k`-based ranging scheme, the largest `K` satisfies `m 2^{K-1} < M <= m 2^K`,
681/// or `K-1 <= log_2(M/m)`. Thus there are `K=O(1)` ranges in this initial scheme.
682/// By the pigeon-hole principle, there exists a `k` such that the range `[m * 2^{k-1}, m * 2^k]`
683///     contains O(n^2) elements.
684/// In the worst-case, the solution is the maximum element of this range.
685pub fn compress(values: &[FixWord], max_size: u8) -> (Vec<FixWord>, HashMap<FixWord, NonZeroU8>) {
686    let max_size = max_size as usize;
687    let dedup_values = {
688        let s: HashSet<FixWord> = values.iter().copied().collect();
689        // remove the zero value for non-widths
690        // and then add it back in at the start
691        let mut v: Vec<FixWord> = s.into_iter().collect();
692        v.sort();
693        v
694    };
695    // After deduplication, it is possible we don't need to compress at all so we can exit early.
696    // This also handles the case when the values slice is empty.
697    if dedup_values.len() <= max_size {
698        let m: HashMap<FixWord, NonZeroU8> = dedup_values
699            .iter()
700            .enumerate()
701            .map(|(i, &w)| {
702                let i: u8 = i.try_into().expect("`dedup_values` has at most `max_size` elements, so the index is at most `max_size-1`");
703                let i: NonZeroU8 = (i+1).try_into().expect("`i<=max_size-1<=u8::MAX`, so `i+1<=u8::MAX`");
704                (w, i) })
705            .collect();
706        let mut dedup_values = dedup_values;
707        dedup_values.push(FixWord::ZERO);
708        dedup_values.rotate_right(1);
709        return (dedup_values, m);
710    }
711
712    // For the binary search we maintain lower and upper indices as usual.
713    // The optimal delta is in the interval [lower, upper].
714    //
715    // Invariant: delta<lower is never a solution.
716    // Because delta must be non-negative, we initialize it to zero.
717    let mut lower = FixWord::ZERO;
718    // Invariant: delta=upper is always solution.
719    // To initialize upper and begin the search we construct a solution that always works: a single
720    // interval encompassing the entire slice and the largest delta possible.
721    let max_delta = *dedup_values.last().unwrap() - *dedup_values.first().unwrap();
722    let mut upper = max_delta;
723    let mut solution = vec![dedup_values.len()];
724
725    let mut buffer = vec![];
726    while lower < upper {
727        // After the following line delta is potentially equal to lower. This is what we want as
728        // we know upper is a solution so to advance the search when upper=lower+1
729        // we need to check lower+1.
730        let delta = lower + (upper - lower) / 2;
731
732        let mut interval_start = *dedup_values.first().unwrap();
733        // The smallest delta such that the candidate solution will be the same.
734        // This is the maximum of all gaps that don't start a new interval.
735        let mut delta_lower = FixWord::ZERO;
736        // The largest delta such that the candidate solution will be different.
737        // This is the minimum of all gaps that start a new interval.
738        let mut delta_upper = max_delta;
739        for (i, &v) in dedup_values.iter().enumerate() {
740            let gap = v - interval_start;
741            if gap > delta {
742                // We need to start a new interval
743                if gap < delta_upper {
744                    delta_upper = gap;
745                }
746                buffer.push(i);
747                // If the candidate solution is already too big, we can exit early.
748                if buffer.len() >= max_size {
749                    break;
750                }
751                interval_start = v;
752            } else {
753                // We need to extend the current interval
754                // For any delta in the range [gap, delta] we would have made the same choice here.
755                if gap > delta_lower {
756                    delta_lower = gap;
757                }
758            }
759        }
760        buffer.push(dedup_values.len());
761
762        if buffer.len() <= max_size {
763            // solution
764            std::mem::swap(&mut buffer, &mut solution);
765            upper = delta_lower;
766        } else {
767            // not a solution
768            lower = delta_upper;
769        }
770        buffer.clear();
771    }
772
773    let mut value_to_index = HashMap::<FixWord, NonZeroU8>::new();
774    let mut result = vec![FixWord::ZERO];
775    let mut previous = 0_usize;
776    for i in solution {
777        let interval = &dedup_values[previous..i];
778        previous = i;
779        for &v in interval {
780            let index: u8 = result.len().try_into().expect("the `result` array contains at most `1+max_size` elements, so the index it at most `max_size` which is a u8");
781            let index: NonZeroU8 = index
782                .try_into()
783                .expect("the `result` array contains at least 1 element so this is never 0");
784            value_to_index.insert(v, index);
785        }
786        let replacement = (*interval.last().unwrap() + *interval.first().unwrap()) / 2;
787        result.push(replacement);
788    }
789
790    (result, value_to_index)
791}
792
793#[derive(Debug, Hash, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
794pub enum Section {
795    /// Sub-file sizes
796    SubFileSizes,
797    /// Header
798    Header,
799    /// Character data
800    CharInfos,
801    /// Widths array
802    Widths,
803    /// Heights array
804    Heights,
805    /// Depths array
806    Depths,
807    /// Italic corrections array
808    ItalicCorrections,
809    /// Lig/kern instructions
810    LigKern,
811    /// Kerns array
812    Kerns,
813    /// Extensible recipes
814    ExtensibleRecipes,
815    /// Params array
816    Params,
817}
818
819impl Section {
820    pub const NAMES: [&'static str; 11] = [
821        "sub-file-sizes",
822        "header",
823        "char-infos",
824        "widths",
825        "heights",
826        "depths",
827        "italic-corrections",
828        "lig-kern",
829        "kerns",
830        "extensible-recipes",
831        "params",
832    ];
833    pub const ALL_SECTIONS: [Section; 11] = [
834        Section::SubFileSizes,
835        Section::Header,
836        Section::CharInfos,
837        Section::Widths,
838        Section::Heights,
839        Section::Depths,
840        Section::ItalicCorrections,
841        Section::LigKern,
842        Section::Kerns,
843        Section::ExtensibleRecipes,
844        Section::Params,
845    ];
846}
847
848impl TryFrom<&str> for Section {
849    type Error = ();
850
851    fn try_from(value: &str) -> Result<Self, Self::Error> {
852        for i in 0..Section::NAMES.len() {
853            if Section::NAMES[i] == value {
854                return Ok(Section::ALL_SECTIONS[i]);
855            }
856        }
857        Err(())
858    }
859}
860
861impl std::fmt::Display for Section {
862    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
863        write!(f, "{}", Section::NAMES[*self as usize])
864    }
865}
866
867/// The TFM header, which contains metadata about the file.
868///
869/// This is defined in TFtoPL.2014.10.
870#[derive(Clone, Debug, Default, PartialEq, Eq)]
871pub struct Header {
872    /// The font checksum, if specified.
873    ///
874    /// In .tfm files checksums are always specified because the format has no
875    ///     way to omit a checksum.
876    ///
877    /// In .pl files checksums are specified if the `CHECKSUM` node appears.
878    /// If no checksum is specified in a .pl file, pltotf calculates the
879    ///     correct value and writes that.
880    ///
881    /// In TeX82, this is stored in the `font_check` array (TeX82.2021.549).
882    pub checksum: Option<u32>,
883    /// In TeX82, this is stored in the `font_dsize` array (TeX82.2021.549).
884    pub design_size: FixWord,
885    pub design_size_valid: bool,
886    pub character_coding_scheme: Option<String>,
887    pub font_family: Option<String>,
888    pub seven_bit_safe: Option<bool>,
889    pub face: Option<Face>,
890    /// The TFM format allows the header to contain arbitrary additional data.
891    pub additional_data: Vec<u32>,
892}
893
894impl Header {
895    /// Returns the default header when parsing property list files.
896    ///
897    /// This is defined in PLtoTF.2014.70.
898    pub fn pl_default() -> Header {
899        Header {
900            checksum: None,
901            design_size: FixWord::ONE * 10,
902            design_size_valid: true,
903            character_coding_scheme: Some("UNSPECIFIED".into()),
904            font_family: Some("UNSPECIFIED".into()),
905            seven_bit_safe: None,
906            face: Some(0.into()),
907            additional_data: vec![],
908        }
909    }
910
911    /// Returns the default header when parsing .tfm files.
912    ///
913    /// This is defined in PLtoTF.2014.70.
914    pub fn tfm_default() -> Header {
915        Header {
916            checksum: Some(0),
917            design_size: FixWord::ZERO,
918            design_size_valid: true,
919            character_coding_scheme: None,
920            font_family: None,
921            seven_bit_safe: None,
922            face: None,
923            additional_data: vec![],
924        }
925    }
926}
927
928/// A character in a TFM file.
929///
930/// TFM and PL files only support 1-byte characters.
931#[derive(Debug, Default, PartialEq, Eq, Hash, Clone, Copy, PartialOrd, Ord)]
932#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
933pub struct Char(pub u8);
934
935impl std::fmt::Display for Char {
936    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
937        if (self.0 as char).is_ascii_graphic() {
938            write!(f, "{}", self.0 as char)
939        } else {
940            write!(f, "\\u{{{:02x}}}", self.0)
941        }
942    }
943}
944
945impl From<u8> for Char {
946    fn from(value: u8) -> Self {
947        Char(value)
948    }
949}
950
951impl TryFrom<char> for Char {
952    type Error = std::char::TryFromCharError;
953
954    fn try_from(value: char) -> Result<Self, Self::Error> {
955        let u: u8 = value.try_into()?;
956        Ok(Char(u))
957    }
958}
959
960impl From<Char> for char {
961    fn from(value: Char) -> Self {
962        value.0 as char
963    }
964}
965
966macro_rules! const_chars {
967    ( $( ($name: ident, $value: expr), )+ ) => {
968        $(
969            pub const $name: Char = Char($value);
970        )+
971    };
972}
973
974impl Char {
975    const_chars![
976        (A, b'A'),
977        (B, b'B'),
978        (C, b'C'),
979        (D, b'D'),
980        (V, b'V'),
981        (W, b'W'),
982        (X, b'X'),
983        (Y, b'Y'),
984        (Z, b'Z'),
985    ];
986
987    pub fn is_seven_bit(&self) -> bool {
988        self.0 <= 127
989    }
990}
991
992#[derive(Debug, PartialEq, Eq, Copy, Clone)]
993#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
994pub enum FaceWeight {
995    Light,
996    Medium,
997    Bold,
998}
999
1000#[derive(Debug, PartialEq, Eq, Copy, Clone)]
1001#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1002pub enum FaceSlope {
1003    Roman,
1004    Italic,
1005}
1006
1007#[derive(Debug, PartialEq, Eq, Copy, Clone)]
1008#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1009pub enum FaceExpansion {
1010    Regular,
1011    Condensed,
1012    Extended,
1013}
1014
1015#[derive(Debug, PartialEq, Eq, Copy, Clone)]
1016#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1017pub enum Face {
1018    Valid(FaceWeight, FaceSlope, FaceExpansion),
1019    Other(u8),
1020}
1021
1022impl From<u8> for Face {
1023    fn from(value: u8) -> Self {
1024        if value >= 18 {
1025            return Face::Other(value);
1026        }
1027        let a = match (value % 6) / 2 {
1028            0 => FaceWeight::Medium,
1029            1 => FaceWeight::Bold,
1030            2 => FaceWeight::Light,
1031            _ => unreachable!(),
1032        };
1033        let b = match value % 2 {
1034            0 => FaceSlope::Roman,
1035            1 => FaceSlope::Italic,
1036            _ => unreachable!(),
1037        };
1038        let c = match value / 6 {
1039            0 => FaceExpansion::Regular,
1040            1 => FaceExpansion::Condensed,
1041            2 => FaceExpansion::Extended,
1042            _ => unreachable!(),
1043        };
1044        Face::Valid(a, b, c)
1045    }
1046}
1047
1048impl From<Face> for u8 {
1049    fn from(value: Face) -> Self {
1050        match value {
1051            Face::Valid(w, s, c) => {
1052                let a: u8 = match w {
1053                    FaceWeight::Medium => 0,
1054                    FaceWeight::Bold => 1,
1055                    FaceWeight::Light => 2,
1056                };
1057                let b: u8 = match s {
1058                    FaceSlope::Roman => 0,
1059                    FaceSlope::Italic => 1,
1060                };
1061                let c: u8 = match c {
1062                    FaceExpansion::Regular => 0,
1063                    FaceExpansion::Condensed => 1,
1064                    FaceExpansion::Extended => 2,
1065                };
1066                c * 6 + a * 2 + b
1067            }
1068            Face::Other(b) => b,
1069        }
1070    }
1071}
1072
1073/// A named TeX font metric parameter.
1074#[derive(PartialEq, Eq, Debug, Copy, Clone)]
1075#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1076pub enum NamedParameter {
1077    Slant,
1078    Space,
1079    Stretch,
1080    Shrink,
1081    XHeight,
1082    Quad,
1083    ExtraSpace,
1084    Num1,
1085    Num2,
1086    Num3,
1087    Denom1,
1088    Denom2,
1089    Sup1,
1090    Sup2,
1091    Sup3,
1092    Sub1,
1093    Sub2,
1094    SupDrop,
1095    SubDrop,
1096    Delim1,
1097    Delim2,
1098    AxisHeight,
1099    DefaultRuleThickness,
1100    BigOpSpacing1,
1101    BigOpSpacing2,
1102    BigOpSpacing3,
1103    BigOpSpacing4,
1104    BigOpSpacing5,
1105}
1106
1107impl NamedParameter {
1108    pub fn number(&self) -> u8 {
1109        match self {
1110            NamedParameter::Slant => 1,
1111            NamedParameter::Space => 2,
1112            NamedParameter::Stretch => 3,
1113            NamedParameter::Shrink => 4,
1114            NamedParameter::XHeight => 5,
1115            NamedParameter::Quad => 6,
1116            NamedParameter::ExtraSpace => 7,
1117            NamedParameter::Num1 => 8,
1118            NamedParameter::Num2 => 9,
1119            NamedParameter::Num3 => 10,
1120            NamedParameter::Denom1 => 11,
1121            NamedParameter::Denom2 => 12,
1122            NamedParameter::Sup1 => 13,
1123            NamedParameter::Sup2 => 14,
1124            NamedParameter::Sup3 => 15,
1125            NamedParameter::Sub1 => 16,
1126            NamedParameter::Sub2 => 17,
1127            NamedParameter::SupDrop => 18,
1128            NamedParameter::SubDrop => 19,
1129            NamedParameter::Delim1 => 20,
1130            NamedParameter::Delim2 => 21,
1131            NamedParameter::AxisHeight => 22,
1132            NamedParameter::DefaultRuleThickness => 8,
1133            NamedParameter::BigOpSpacing1 => 9,
1134            NamedParameter::BigOpSpacing2 => 10,
1135            NamedParameter::BigOpSpacing3 => 11,
1136            NamedParameter::BigOpSpacing4 => 12,
1137            NamedParameter::BigOpSpacing5 => 13,
1138        }
1139    }
1140}
1141
1142/// Warning from the compilation of "next larger character" instructions.
1143#[derive(PartialEq, Eq, Debug, Clone)]
1144pub enum NextLargerProgramWarning {
1145    NonExistentCharacter { original: Char, next_larger: Char },
1146    InfiniteLoop { original: Char, next_larger: Char },
1147}
1148
1149impl NextLargerProgramWarning {
1150    pub fn bad_char(&self) -> Char {
1151        match self {
1152            NextLargerProgramWarning::NonExistentCharacter {
1153                original,
1154                next_larger: _,
1155            } => *original,
1156            NextLargerProgramWarning::InfiniteLoop { original, .. } => *original,
1157        }
1158    }
1159
1160    /// Returns the warning message the TFtoPL program prints for this kind of error.
1161    pub fn tftopl_message(&self) -> String {
1162        match self {
1163            NextLargerProgramWarning::NonExistentCharacter {
1164                original: _,
1165                next_larger,
1166            } => {
1167                format![
1168                    "Bad TFM file: Character list link to nonexistent character '{:03o}.",
1169                    next_larger.0
1170                ]
1171            }
1172            NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1173                format!["Bad TFM file: Cycle in a character list!\nCharacter '{:03o} now ends the list.", original.0]
1174            }
1175        }
1176    }
1177
1178    /// Returns the section in Knuth's TFtoPL (version 2014) in which this warning occurs.
1179    pub fn tftopl_section(&self) -> u8 {
1180        84
1181    }
1182
1183    /// Returns the section in Knuth's PLtoTF (version 2014) in which this warning occurs.
1184    pub fn pltotf_section(&self) -> u8 {
1185        match self {
1186            NextLargerProgramWarning::NonExistentCharacter { .. } => 111,
1187            NextLargerProgramWarning::InfiniteLoop { .. } => 113,
1188        }
1189    }
1190
1191    /// Returns the warning message the PLtoTF program prints for this kind of error.
1192    pub fn pltotf_message(&self) -> String {
1193        match self {
1194            NextLargerProgramWarning::NonExistentCharacter {
1195                original,
1196                next_larger: _,
1197            } => {
1198                format![
1199                    "The character NEXTLARGER than '{:03o} had no CHARACTER spec.",
1200                    original.0
1201                ]
1202            }
1203            NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1204                format![
1205                    "A cycle of NEXTLARGER characters has been broken at '{:03o}.",
1206                    original.0
1207                ]
1208            }
1209        }
1210    }
1211}
1212/// Compiled program of "next larger character" instructions
1213///
1214/// The .tfm file format can associate a "next larger" character to any character in a font.
1215/// Next larger characters form sequences: i.e. B can be the next larger character for A,
1216///     and C can be the next larger character for B,
1217///     leading to the sequences A-B-C.
1218/// These next larger characters are used at certain points in TeX.
1219/// TeX occasionally traverses the entire sequence for a given starting character (e.g. A).
1220///
1221/// As with ligatures, next larger specifications can contain infinite loops -
1222///     e.g, if X is the next larger character for Y
1223///      and Y is the next larger character for X.
1224/// These loops are invalid and removed by TFtoPL and PLtoTF.
1225///
1226/// Motivated by the idea of "parse don't validate", this type represents
1227///     a compiled version of the next larger specifications in which infinite loops
1228///     are statically guaranteed not to exist.
1229///
1230/// The basic use of a valid program looks like this:
1231///
1232/// ```
1233/// # use tfm::*;
1234/// let edges = vec![
1235///     (Char::A, Char::B),
1236///     (Char::B, Char::C),
1237/// ];
1238/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1239///
1240/// assert_eq![warnings, vec![]];
1241///
1242/// let sequence_A: Vec<Char> = next_larger_program.get(Char::A).collect();
1243/// assert_eq!(sequence_A, vec![Char::B, Char::C]);
1244///
1245/// let sequence_B: Vec<Char> = next_larger_program.get(Char::B).collect();
1246/// assert_eq!(sequence_B, vec![Char::C]);
1247///
1248/// let sequence_C: Vec<Char> = next_larger_program.get(Char::C).collect();
1249/// assert_eq!(sequence_C, vec![]);
1250///
1251/// // Character that is not in the program.
1252/// let sequence_D: Vec<Char> = next_larger_program.get(Char::D).collect();
1253/// assert_eq!(sequence_D, vec![]);
1254/// ```
1255///
1256/// ## Warnings
1257///
1258/// There are two types of error that can occur when constructing the next
1259///     larger program.
1260/// Both of these errors are handled gracefully, so we officially refer to them as warnings.
1261/// The constructor returns them as values of type [`NextLargerProgramWarning`].
1262///
1263/// ### Infinite loops
1264///
1265/// The first error is that next larger programs can contain infinite loops -
1266///     e.g, if X is the next larger character for Y
1267///      and Y is the next larger character for X.
1268/// In this case the loop is broken by removing the next larger program for the
1269///     character with the largest 8-bit code, in this case Y.
1270/// A [`NextLargerProgramWarning::InfiniteLoop`] warning is returned
1271///     from the program constructor.
1272///
1273/// ```
1274/// # use tfm::*;
1275/// let edges = vec![
1276///     (Char::X, Char::Y),
1277///     (Char::Y, Char::X),
1278/// ];
1279/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1280///
1281/// assert_eq!(warnings, vec![NextLargerProgramWarning::InfiniteLoop{
1282///     original: Char::Y,
1283///     next_larger: Char::X,
1284/// }]);
1285///
1286/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1287/// assert_eq!(sequence_X, vec![Char::Y]);
1288///
1289/// let sequence_Y: Vec<Char> = next_larger_program.get(Char::Y).collect();
1290/// assert_eq!(sequence_Y, vec![]);
1291/// ```
1292///
1293/// ### Non-existent characters
1294///
1295/// The second error is that characters referred to in the next larger program
1296///     may not be defined in the .tfm or .pl file.
1297/// For example, a .pl file may contain the snippet `(CHARACTER C X (NEXTLARGER C Y))`
1298///     without defining the character Y.
1299/// The constructor [`NextLargerProgram::new`] accepts a function for checking if a
1300///     character exists.
1301/// In all cases a [`NextLargerProgramWarning::NonExistentCharacter`] warning is returned
1302///     if a non-existent character is encountered.
1303///
1304/// The behavior of the resulting next larger program is configured using the
1305///     `drop_non_existent_characters` argument.
1306/// If this is false, then the behavior is the same as PLtoTF and the program still
1307///     contains the character.
1308///
1309/// ```
1310/// # use tfm::*;
1311/// let edges = vec![
1312///     (Char::X, Char::Y),
1313/// ];
1314/// let character_exists = |c| {
1315///     if c == Char::Y {
1316///         false
1317///     } else {
1318///         true
1319///     }
1320/// };
1321/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, false);
1322///
1323/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1324///     original: Char::X,
1325///     next_larger: Char::Y,
1326/// }]);
1327///
1328/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1329/// assert_eq!(sequence_X, vec![Char::Y]);
1330/// ```
1331///
1332/// If `drop_non_existent_characters` is true, next larger instructions pointing at non-existent
1333///     characters are dropped.
1334/// This is how TFtoPL behaves.
1335///
1336///
1337/// ```
1338/// # use tfm::*;
1339/// let edges = vec![
1340///     (Char::X, Char::Y),
1341/// ];
1342/// let character_exists = |c| {
1343///     if c == Char::Y {
1344///         false
1345///     } else {
1346///         true
1347///     }
1348/// };
1349/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, true);
1350///
1351/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1352///     original: Char::X,
1353///     next_larger: Char::Y,
1354/// }]);
1355///
1356/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1357/// assert_eq!(sequence_X, vec![]);
1358/// ```
1359///
1360#[derive(Clone, Debug)]
1361pub struct NextLargerProgram {
1362    entrypoints: HashMap<Char, u8>,
1363    next_larger: Vec<(Char, NonZeroU8)>,
1364}
1365
1366impl NextLargerProgram {
1367    /// Build a new next larger program from an iterator over edges.
1368    pub fn new<I: Iterator<Item = (Char, Char)>, F: Fn(Char) -> bool>(
1369        edges: I,
1370        character_exists: F,
1371        drop_non_existent_characters: bool,
1372    ) -> (Self, Vec<NextLargerProgramWarning>) {
1373        // This function implements functionality in TFtoPL.2014.84 and PLtoTF.2014.{110,111,113}.
1374        let mut warnings: Vec<NextLargerProgramWarning> = vec![];
1375
1376        let mut node_to_larger = HashMap::<Char, Char>::new();
1377        let mut node_to_num_smaller = HashMap::<Char, usize>::new();
1378        for (smaller, larger) in edges {
1379            if !character_exists(larger) {
1380                warnings.push(NextLargerProgramWarning::NonExistentCharacter {
1381                    original: smaller,
1382                    next_larger: larger,
1383                });
1384                if drop_non_existent_characters {
1385                    continue;
1386                }
1387            }
1388            node_to_larger.insert(smaller, larger);
1389            node_to_num_smaller.entry(smaller).or_default();
1390            *node_to_num_smaller.entry(larger).or_default() += 1;
1391        }
1392
1393        let mut leaves: Vec<Char> = vec![];
1394        let mut non_leaves: BTreeSet<Char> = Default::default();
1395        for (node, num_smaller) in &node_to_num_smaller {
1396            if *num_smaller == 0 {
1397                leaves.push(*node);
1398            } else {
1399                non_leaves.insert(*node);
1400            }
1401        }
1402
1403        let mut sorted_chars = vec![];
1404        let mut infinite_loop_warnings = vec![];
1405        loop {
1406            while let Some(smaller) = leaves.pop() {
1407                if let Some(larger) = node_to_larger.get(&smaller).copied() {
1408                    let num_smaller = node_to_num_smaller
1409                        .get_mut(&larger)
1410                        .expect("`node_to_num_smaller` contains all nodes");
1411                    *num_smaller = num_smaller
1412                        .checked_sub(1)
1413                        .expect("the larger of a smaller must have at least one smaller");
1414                    if *num_smaller == 0 {
1415                        leaves.push(larger);
1416                        non_leaves.remove(&larger);
1417                    }
1418                }
1419                sorted_chars.push(smaller);
1420            }
1421
1422            // There could be pending nodes left because of a cycle.
1423            // We break the cycle at the largest node.
1424            let smaller = match non_leaves.last() {
1425                None => break,
1426                Some(child) => *child,
1427            };
1428            let larger = node_to_larger.remove(&smaller).expect(
1429                "General graph fact: sum_(node)#in_edges(node)=sum_(node)#out_edges(node).
1430                Fact about the next larger graph: #out_edges(node)<=1, because each char has at most one next larger char.
1431                If #out_edge(child)=0 then sum_(node)#in_edges(node)=sum_(node)#out_edges(node) < #nodes.
1432                Thus there exists another node with #in_edges(node)=0 and that node is a leaf.
1433                But `leaves.len()=0` at this line of code",
1434            );
1435            infinite_loop_warnings.push(NextLargerProgramWarning::InfiniteLoop {
1436                original: smaller,
1437                next_larger: larger,
1438            });
1439            leaves.push(larger);
1440            non_leaves.remove(&larger);
1441        }
1442        warnings.extend(infinite_loop_warnings.into_iter().rev());
1443
1444        let next_larger = {
1445            let parents: HashSet<Char> = node_to_larger.values().copied().collect();
1446            let mut node_to_position = HashMap::<Char, u8>::new();
1447            let mut next_larger: Vec<(Char, NonZeroU8)> = vec![];
1448
1449            for c in sorted_chars.iter().rev() {
1450                if !parents.contains(c) {
1451                    continue;
1452                }
1453                // The present character is the parent of at least one child, aka it is the next larger
1454                // character for another character. So it needs to be in the next_larger array.
1455                let child_position: u8 = next_larger
1456                    .len()
1457                    .try_into()
1458                    .expect("there are at most u8::MAX chars in the `next_larger` array");
1459                let offset = match node_to_larger.get(c) {
1460                    // The next_larger array contains at most 256 elements: one for each char.
1461                    // (Actually it contains at most 255 because one character necessarily does not
1462                    // have a child node and this character does not appear in the array.)
1463                    // Anyway, an offset of 256 sends the index outside the array bound, and so
1464                    // subsequent calls to iterator return None.
1465                    None => NonZeroU8::MAX,
1466                    Some(parent) => {
1467                        let parent_position = *node_to_position
1468                            .get(parent)
1469                            .expect("parent has already been inserted");
1470                        child_position
1471                            .checked_sub(parent_position)
1472                            .expect("parent inserted before so its position it is strictly smaller")
1473                            .try_into()
1474                            .expect("parent inserted before so its position it is strictly smaller")
1475                    }
1476                };
1477                next_larger.push((*c, offset));
1478                node_to_position.insert(*c, child_position);
1479            }
1480            next_larger.reverse();
1481            next_larger
1482        };
1483
1484        let entrypoints = {
1485            let node_to_position: HashMap<Char, u8> = next_larger
1486                .iter()
1487                .enumerate()
1488                .map(|(i, (c, _))| {
1489                    let u: u8 = i
1490                        .try_into()
1491                        .expect("there are at most u8::MAX chars in the `next_larger` array");
1492                    (*c, u)
1493                })
1494                .collect();
1495            let mut entrypoints = HashMap::<Char, u8>::new();
1496            for c in sorted_chars.iter().rev() {
1497                if let Some(parent) = node_to_larger.get(c) {
1498                    entrypoints.insert(
1499                        *c,
1500                        *node_to_position
1501                            .get(parent)
1502                            .expect("parent has already been inserted"),
1503                    );
1504                }
1505            }
1506            entrypoints
1507        };
1508
1509        (
1510            Self {
1511                entrypoints,
1512                next_larger,
1513            },
1514            warnings,
1515        )
1516    }
1517
1518    /// Get the next larger sequence for a character
1519    pub fn get(&self, c: Char) -> impl Iterator<Item = Char> + '_ {
1520        NextLargerProgramIter {
1521            current: self.entrypoints.get(&c).copied(),
1522            program: self,
1523        }
1524    }
1525
1526    /// Returns whether this program is seven-bit safe.
1527    ///
1528    /// A next larger program is seven-bit safe if the next larger sequences for
1529    ///     seven-bit characters only contain seven-bit characters.
1530    /// Conversely a program is seven-bit unsafe if there is a seven-bit
1531    ///     character whose next larger sequence contains a non-seven-bit character.
1532    ///
1533    /// ```
1534    /// # use tfm::*;
1535    /// let edges = vec![
1536    ///     (Char(250), Char(125)),
1537    ///     (Char(125), Char(126)),
1538    /// ];
1539    /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1540    /// assert_eq!(true, next_larger_program.is_seven_bit_safe());
1541    ///
1542    /// let edges = vec![
1543    ///     (Char(125), Char(250)),
1544    /// ];
1545    /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1546    /// assert_eq!(false, next_larger_program.is_seven_bit_safe());
1547    /// ```
1548    pub fn is_seven_bit_safe(&self) -> bool {
1549        // For each c, we only need to check the first element in c's next larger sequence.
1550        // If there is a subsequent element d of the sequence that is seven-bit unsafe,
1551        // we will find it when considering one of d's children.
1552        // This optimization makes this function O(n), rather than worst case O(n^2).
1553        self.entrypoints
1554            .keys()
1555            .copied()
1556            .filter(Char::is_seven_bit)
1557            .filter_map(|c| self.get(c).next())
1558            .all(|c| c.is_seven_bit())
1559    }
1560}
1561
1562#[derive(Clone, Debug)]
1563struct NextLargerProgramIter<'a> {
1564    current: Option<u8>,
1565    program: &'a NextLargerProgram,
1566}
1567
1568impl<'a> Iterator for NextLargerProgramIter<'a> {
1569    type Item = Char;
1570
1571    fn next(&mut self) -> Option<Self::Item> {
1572        // Note that the iterator is statically guaranteed to terminate!
1573        // If it returns None, it has already terminated.
1574        // If it returns Some, then self.current will be incremented by a strictly
1575        //  positive number.
1576        // Incrementing like this can only happen a finite number of times before overflow
1577        //  occurs, and then self.current is None and the iterator is terminated.
1578        match self.current {
1579            None => None,
1580            Some(current) => match self.program.next_larger.get(current as usize) {
1581                None => None,
1582                Some((c, inc)) => {
1583                    self.current = current.checked_add(inc.get());
1584                    Some(*c)
1585                }
1586            },
1587        }
1588    }
1589}
1590
1591impl common::FontFormat for File {
1592    const DEFAULT_FILE_EXTENSION: &'static str = "tfm";
1593    type Error = DeserializationError;
1594
1595    fn parse(b: &[u8]) -> Result<Self, Self::Error> {
1596        File::deserialize(b).0
1597    }
1598}
1599
1600/// Fixed-width numeric type used in TFM files.
1601///
1602/// This numeric type has 11 bits for the integer part,
1603/// 20 bits for the fractional part, and a single signed bit.
1604/// The inner value is the number multiplied by 2^20.
1605/// It is called a `fix_word` in TFtoPL.
1606///
1607/// In property list files, this type is represented as a decimal number
1608///   with up to 6 digits after the decimal point.
1609/// This is a non-lossy representation
1610///   because 10^(-6) is larger than 2^(-20).
1611#[derive(Default, PartialEq, Eq, Debug, Copy, Clone, PartialOrd, Ord, Hash)]
1612#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1613pub struct FixWord(pub i32);
1614
1615impl FixWord {
1616    /// Representation of the number 0 as a [FixWord].
1617    pub const ZERO: FixWord = FixWord(0);
1618
1619    /// Representation of the number 1 as a [FixWord].
1620    pub const ONE: FixWord = FixWord(1 << 20);
1621
1622    /// Returns true if the number is less than 16.0 in magnitude according to Knuth.
1623    ///
1624    /// The number +16.0 is not allowed.
1625    /// This is covered in the E2E tests.
1626    /// See `check_fix` in TFtoPL.2014.60.
1627    pub fn is_abs_less_than_16(&self) -> bool {
1628        *self >= FixWord::ONE * -16 && *self < FixWord::ONE * 16
1629    }
1630
1631    /// Calculates the scaled value of this number at the specified design size.
1632    ///
1633    /// All dimensions in a TFM file are specified as multiples of the design size
1634    /// (with the exception of the design size itself).
1635    /// The scaled value (i.e. the value is scaled points) is the TFM value multiplied
1636    /// by the design size.
1637    ///
1638    /// The raw value and the design size both have 20 binary fractional bits,
1639    /// so the product has 40 fractional bits.
1640    /// A scaled value has only 16 fractional bits, and how the truncation is performed
1641    /// can effect the result.
1642    /// The specific multiplication algorithm Knuth wrote (TeX.2021.571-572)
1643    /// truncates the design size first and then performs the multiplication
1644    /// and then truncates again.
1645    /// Additionally, the integer division in that algorithm rounds towards zero
1646    /// even for negative numbers (e.g. -1.25 rounds to -1 instead of -2).
1647    ///
1648    /// In order to ensure the result here matched Knuth's algorithm, we reimplement
1649    /// Knuth's algorithm.
1650    pub fn to_scaled(self, design_size: FixWord) -> common::Scaled {
1651        // TeX.2021.568
1652        let mut z = common::Scaled(design_size.0 / 16);
1653        // TeX.2021.572
1654        let mut alpha = 16;
1655        // Ensure that z < 2^23 sp
1656        while z.0 >= 0o40000000 {
1657            z /= 2;
1658            alpha *= 2;
1659        }
1660        let beta = 256 / alpha; // beta = 16 * z / design_size
1661        let alpha = z * alpha; // alpha = 16 * design_size
1662
1663        // TeX.2021.571 (store_scaled)
1664        let [a, b, c, d] = self.0.to_be_bytes();
1665        assert!(a == 0 || a == 255);
1666        let sw = (((z * (d as i32)) / 0o400 + (z * (c as i32))) / 0o400 + z * (b as i32)) / beta;
1667        if a == 255 {
1668            // In this case self < 0.
1669            // We have calculated sw using only the 3 least significant bytes of self,
1670            // which is equivalent to setting self=16+self at the start. We then undo
1671            // this by setting sw = sw - 16 * design_size.
1672            sw - alpha
1673        } else {
1674            sw
1675        }
1676    }
1677}
1678
1679impl std::fmt::Display for FixWord {
1680    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
1681        // TFtoPL.2014.40-43
1682        if self.0 < 0 {
1683            write!(f, "-")?;
1684        }
1685        let integer_part = (self.0 / FixWord::ONE.0).abs();
1686        write!(f, "{integer_part}.")?;
1687        let mut fp = (self.0 % FixWord::ONE.0).abs();
1688        fp = 10 * fp + 5;
1689        let mut delta = 10;
1690        loop {
1691            if delta > 0o4_000_000 {
1692                fp = fp + 0o2_000_000 - delta / 2;
1693            }
1694            write!(f, "{}", fp / 0o4_000_000)?;
1695            fp = 10 * (fp % 0o4_000_000);
1696            delta *= 10;
1697            if fp <= delta {
1698                break;
1699            }
1700        }
1701        Ok(())
1702    }
1703}
1704
1705impl std::ops::Add<FixWord> for FixWord {
1706    type Output = FixWord;
1707    fn add(self, rhs: FixWord) -> Self::Output {
1708        FixWord(self.0 + rhs.0)
1709    }
1710}
1711impl std::ops::Sub<FixWord> for FixWord {
1712    type Output = FixWord;
1713    fn sub(self, rhs: FixWord) -> Self::Output {
1714        FixWord(self.0 - rhs.0)
1715    }
1716}
1717
1718impl std::ops::Mul<i32> for FixWord {
1719    type Output = FixWord;
1720
1721    fn mul(self, rhs: i32) -> Self::Output {
1722        FixWord(self.0 * rhs)
1723    }
1724}
1725
1726impl std::ops::Div<i32> for FixWord {
1727    type Output = FixWord;
1728
1729    fn div(self, rhs: i32) -> Self::Output {
1730        FixWord(self.0 / rhs)
1731    }
1732}
1733
1734#[cfg(test)]
1735mod tests {
1736    use super::*;
1737
1738    #[test]
1739    fn to_scaled_test() {
1740        assert_eq!(common::Scaled::ZERO, FixWord::ZERO.to_scaled(FixWord::ONE));
1741        assert_eq!(common::Scaled::ONE, FixWord::ONE.to_scaled(FixWord::ONE));
1742    }
1743
1744    fn run_compress_test(
1745        values: Vec<FixWord>,
1746        max_size: u8,
1747        want: Vec<FixWord>,
1748        want_map: Vec<u8>,
1749    ) {
1750        let (got, got_map) = compress(&values, max_size);
1751        assert_eq!(got, want);
1752        let want_map: HashMap<FixWord, NonZeroU8> = want_map
1753            .into_iter()
1754            .enumerate()
1755            .map(|(i, t)| (values[i], t.try_into().unwrap()))
1756            .collect();
1757        assert_eq!(got_map, want_map);
1758    }
1759
1760    macro_rules! compress_tests {
1761        ( $( ($name: ident, $values: expr, $max_size: expr, $want: expr, $want_map: expr, ), )+ ) => {
1762            $(
1763                #[test]
1764                fn $name () {
1765                    let values = $values;
1766                    let max_size = $max_size;
1767                    let want = $want;
1768                    let want_map = $want_map;
1769                    run_compress_test(values, max_size, want, want_map);
1770                }
1771            )+
1772        };
1773    }
1774
1775    compress_tests!(
1776        (no_op_0, vec![], 1, vec![FixWord(0)], vec![],),
1777        (
1778            no_op_2,
1779            vec![FixWord::ONE * 2, FixWord::ONE],
1780            2,
1781            vec![FixWord(0), FixWord::ONE, FixWord::ONE * 2],
1782            vec![2, 1],
1783        ),
1784        (
1785            just_deduplication,
1786            vec![FixWord::ONE, FixWord::ONE],
1787            1,
1788            vec![FixWord(0), FixWord::ONE],
1789            vec![1, 1],
1790        ),
1791        (
1792            simple_compression_case,
1793            vec![FixWord::ONE, FixWord::ONE * 2],
1794            1,
1795            vec![FixWord(0), FixWord::ONE * 3 / 2],
1796            vec![1, 1],
1797        ),
1798        (
1799            simple_compression_case_2,
1800            vec![
1801                FixWord::ONE,
1802                FixWord::ONE * 2,
1803                FixWord::ONE * 200,
1804                FixWord::ONE * 201
1805            ],
1806            2,
1807            vec![FixWord(0), FixWord::ONE * 3 / 2, FixWord::ONE * 401 / 2],
1808            vec![1, 1, 2, 2],
1809        ),
1810        (
1811            lower_upper_close_edge_case_1,
1812            vec![FixWord(1), FixWord(3)],
1813            1,
1814            vec![FixWord(0), FixWord(2)],
1815            vec![1, 1],
1816        ),
1817        (
1818            lower_upper_close_edge_case_2,
1819            vec![FixWord(0), FixWord(2)],
1820            1,
1821            vec![FixWord(0), FixWord(1)],
1822            vec![1, 1],
1823        ),
1824        (
1825            lower_upper_close_edge_case_3,
1826            vec![FixWord(1), FixWord(4)],
1827            1,
1828            vec![FixWord(0), FixWord(2)],
1829            vec![1, 1],
1830        ),
1831        (
1832            lower_upper_close_edge_case_4,
1833            vec![FixWord(1), FixWord(2)],
1834            1,
1835            vec![FixWord(0), FixWord(1)],
1836            vec![1, 1],
1837        ),
1838    );
1839
1840    mod next_larger_tests {
1841        use super::*;
1842
1843        fn run(
1844            edges: Vec<(Char, Char)>,
1845            want_sequences: HashMap<Char, Vec<Char>>,
1846            want_warnings: Vec<NextLargerProgramWarning>,
1847        ) {
1848            let (program, got_warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1849            assert_eq!(got_warnings, want_warnings);
1850            for u in 0..=u8::MAX {
1851                let want_sequence = want_sequences
1852                    .get(&Char(u))
1853                    .map(Vec::as_slice)
1854                    .unwrap_or_default();
1855                let got_sequence: Vec<Char> = program.get(Char(u)).collect();
1856                assert_eq!(
1857                    got_sequence, want_sequence,
1858                    "got/want sequences for {u} do not match"
1859                );
1860            }
1861        }
1862
1863        fn big_infinite_loop_edges() -> Vec<(Char, Char)> {
1864            (0..=u8::MAX)
1865                .into_iter()
1866                .map(|u| (Char(u), Char(u.wrapping_add(1))))
1867                .collect()
1868        }
1869
1870        fn big_infinite_loop_sequences() -> HashMap<Char, Vec<Char>> {
1871            (0..=u8::MAX)
1872                .into_iter()
1873                .map(|u| {
1874                    let v: Vec<Char> = match u.checked_add(1) {
1875                        None => vec![],
1876                        Some(w) => (w..=u8::MAX).into_iter().map(Char).collect(),
1877                    };
1878                    (Char(u), v)
1879                })
1880                .collect()
1881        }
1882
1883        macro_rules! next_larger_tests {
1884            ( $( ($name: ident, $edges: expr, $want_sequences: expr, $want_warnings: expr, ), )+ ) => {
1885                $(
1886                    #[test]
1887                    fn $name () {
1888                        run($edges, $want_sequences, $want_warnings);
1889                    }
1890                )+
1891            };
1892        }
1893
1894        next_larger_tests!(
1895            (
1896                same_node_loop,
1897                vec![(Char::A, Char::A)],
1898                HashMap::from([(Char::A, vec![])]),
1899                vec![NextLargerProgramWarning::InfiniteLoop {
1900                    original: Char::A,
1901                    next_larger: Char::A,
1902                }],
1903            ),
1904            (
1905                two_loops,
1906                vec![
1907                    (Char::A, Char::B),
1908                    (Char::B, Char::C),
1909                    (Char::C, Char::B),
1910                    (Char::X, Char::Y),
1911                    (Char::Y, Char::Z),
1912                    (Char::Z, Char::X),
1913                ],
1914                HashMap::from([
1915                    (Char::A, vec![Char::B, Char::C]),
1916                    (Char::B, vec![Char::C]),
1917                    (Char::C, vec![]),
1918                    (Char::X, vec![Char::Y, Char::Z]),
1919                    (Char::Y, vec![Char::Z]),
1920                    (Char::Z, vec![]),
1921                ]),
1922                vec![
1923                    NextLargerProgramWarning::InfiniteLoop {
1924                        original: Char::C,
1925                        next_larger: Char::B,
1926                    },
1927                    NextLargerProgramWarning::InfiniteLoop {
1928                        original: Char::Z,
1929                        next_larger: Char::X,
1930                    },
1931                ],
1932            ),
1933            (
1934                path_leading_to_loop,
1935                vec![(Char::A, Char::B), (Char::B, Char::C), (Char::C, Char::B),],
1936                HashMap::from([
1937                    (Char::A, vec![Char::B, Char::C]),
1938                    (Char::B, vec![Char::C]),
1939                    (Char::C, vec![]),
1940                ]),
1941                vec![NextLargerProgramWarning::InfiniteLoop {
1942                    original: Char::C,
1943                    next_larger: Char::B,
1944                }],
1945            ),
1946            (
1947                big_infinite_loop,
1948                big_infinite_loop_edges(),
1949                big_infinite_loop_sequences(),
1950                vec![NextLargerProgramWarning::InfiniteLoop {
1951                    original: Char(u8::MAX),
1952                    next_larger: Char(0),
1953                }],
1954            ),
1955        );
1956    }
1957}