tfm/lib.rs
//! # tfm: TeX font metric data
//!
//! This is a crate for working with TeX font metric data.
//! It includes:
//!
//! - Functions to read and write TeX font metric (.tfm) files
//! to and from a value of type [`File`]
//! ([`deserialize`](File::deserialize), [`serialize`](File::serialize)).
//!
//! - Functions to read and write property list (.pl) files
//! to and from a value of type [`pl::File`]
//! ([`from_pl_source_code`](pl::File::from_pl_source_code), [`display`](pl::File::display)).
//!
//! - Converters from .tfm to .pl files and vice-versa
//! (using Rust's [`From`] trait to go between [`File`] and [`pl::File`]).
//!
//! ## Background
//!
//! Probably the most famous part of the implementation of TeX is the Knuth-Plass line breaking algorithm.
//! This algorithm determines the "optimal" places to add line breaks when typesetting a paragraph of text.
//! In order to run the algorithm one needs to provide the dimensions of all characters in the current font.
//! These dimensions are used to size the boxes in the Knuth-Plass box and glue model.
//!
//! In TeX, character dimensions are provided using TeX font metric files.
//! These are binary files.
//! By convention they have a .tfm file extension.
//! Unlike more modern file formats like TrueType, .tfm files only contain the font dimensions;
//! they don't contain the glyphs.
//! In general,
//! .tfm files are produced by other software like Metafont,
//! placed in some well-known directory in the TeX distribution,
//! and then read into memory when TeX is running.
//!
//! Because .tfm files are binary files, it's hard to debug or tweak them.
//! To remedy this, Knuth and his team developed another file format called a property list file
//! (extension .pl or .plst)
//! that contains the same information but in a modifiable text format.
//! They then wrote two programs:
//! `tftopl` to convert a .tfm file to a .pl file,
//! and `pltotf` to convert a .pl file to a .tfm file.
//!
//! The general goal of this crate is to fully re-implement all of the TeX font metric
//! code written by Knuth and others.
//! This includes `tftopl`, `pltotf`, and also the parts of TeX itself that contain logic
//! for reading and interpreting .tfm files.
//! However, unlike these monolithic software programs,
//! this re-implementation is in the form of a modular library in which
//! individual pieces of logic can be used and re-used.
//!
//! ## Basic example
//!
//! ```
//! // Include the .tfm file for Computer Modern in size 10pt.
//! let tfm_bytes = include_bytes!["../corpus/computer-modern/cmr10.tfm"];
//!
//! // Deserialize the .tfm file.
//! let (tfm_file_or_error, deserialization_warnings) = tfm::File::deserialize(tfm_bytes);
//! let mut tfm_file = tfm_file_or_error.expect("cmr10.tfm is a valid .tfm file");
//! assert_eq![deserialization_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no deserialization warnings"];
//! // TODO assert_eq![tfm_file.header.design_size, tfm::FixWord::UNITY * 10]; make it 11 to be more interesting
//! // TODO query some data
//!
//! // Validate the .tfm file.
//! let validation_warnings = tfm_file.validate_and_fix();
//! assert_eq![validation_warnings, vec![], "the data in cmr10.tfm is 100% valid, so there are no validation warnings"];
//!
//! // Convert the .tfm file to a .pl file and print it.
//! let pl_file: tfm::pl::File = tfm_file.clone().into();
//! // TODO query some data
//! println!["cmr10.pl:\n{}", pl_file.display(/*indent=*/2, tfm::pl::CharDisplayFormat::Default)];
//! ```
//!
//! ## Advanced functionality
//!
//! In addition to supporting the basic use cases of querying font metric data
//! and converting between different formats,
//! this crate has advanced functionality for performing additional tasks on font metric data.
//! The full set of functionality can be understood by navigating through the crate documentation.
//! But here are 3 highlights we think are interesting:
//!
//! - **Language analysis of .pl files**:
//! In `pltotf`, Knuth parses .pl files in a single pass.
//! This crate takes the approach, common nowadays, of parsing in multiple passes:
//! first constructing a [concrete syntax tree](pl::cst::Cst) (or parse tree),
//! next constructing a [fully typed and checked abstract syntax tree](pl::ast::Ast),
//! and finally building the [`pl::File`] itself.
//! Each of the passes is exposed, so you can e.g. just build the AST for a .pl file and
//! do some analysis on it.
//!
//! - **Debug output for .tfm files**:
//!
//! - **Compilation of lig/kern programs**:
//!
//! ## Binaries
//!
//! The Texcraft project produces 3 binaries based on this crate:
//!
//! - `tftopl` and `pltotf`: re-implementations of Knuth's programs.
//! - `tfmtools`: a new binary that has a bunch of different tools
//! for working with TeX font metric data.
//! Run `tfmtools help` to list all of the available tools.
//!
//! In the root of [the Texcraft repository](https://github.com/jamespfennell/texcraft)
//! these tools can be run with `cargo run --bin $NAME`
//! and built with `cargo build --bin $NAME`.
//!
//! ## Correctness
//!
//! As part of the development of this crate, significant effort has been spent
//! ensuring it exactly replicates the work of Knuth.
//! This correctness checking is largely based on diff testing the binaries
//! `tftopl` and `pltotf`:
//! we verify that the Texcraft and Knuth implementations produce the same output
//! and generate the same error messages.
//!
//! This diff testing has been performed in a few different ways:
//!
//! - We have run diff tests over all ~100,000 .tfm files in CTAN.
//! These tests verify that `tftopl` gives the correct result,
//! and that running `pltotf` on the output .pl file gives the correct result too.
//! Unfortunately, running `pltotf` on the .pl files in CTAN is infeasible
//! because most of these files are Perl scripts, not property list files.
//!
//! - We have developed a fuzz testing harness (so far just for `tftopl`)
//! that generates highly heterogeneous .tfm files and verifies that `tftopl` gives the correct result.
//! This fuzz testing has uncovered many issues in the Texcraft implementation,
//! and has even identified [a 30-year-old bug](https://tug.org/pipermail/tex-k/2024-March/004031.html)
//! in Knuth's implementation of `tftopl`.
//!
//! Any .tfm or .pl file that exposes a bug in this library is added to
//! [our automated testing corpus](https://github.com/jamespfennell/texcraft/tree/main/crates/tfm/bin/tests/data).
//! Running `cargo t` validates that Texcraft's binaries give the same results as Knuth's binaries
//! (the output of Knuth's binaries is in source control).
//! This ensures there are no regressions.
//!
//! If you discover a .tfm or .pl file on which the Texcraft and Knuth implementations
//! diverge, this indicates there is a bug in this library.
//! Please create an issue on the Texcraft GitHub repo.
//! We will fix the bug and add your files to the testing corpus.

pub mod algorithms;
pub mod ligkern;
pub mod pl;

use std::{
    collections::{BTreeMap, BTreeSet, HashMap, HashSet},
    num::NonZeroU8,
};

mod debug;
mod deserialize;
mod serialize;
mod validate;

pub use deserialize::DeserializationError;
pub use deserialize::DeserializationWarning;
pub use deserialize::RawFile;
pub use deserialize::SubFileSizes;
pub use validate::ValidationWarning;

/// Complete contents of a TeX font metric (.tfm) file.
///
/// The struct contains multiple vectors.
/// In TeX and TFtoPL there is an optimization in which all of the data in these vectors
/// is stored in one large vector of 32-bit integers.
/// The conversion from [u32] to specific types like [FixWord] is then done when the
/// data is needed.
/// This would make the memory footprint of this type much more compact,
/// and such a change may be considered in the future.
///
/// In fact, in TeX the font data for all fonts is stored in one contiguous piece of memory
/// (`font_info`, defined in TeX82.2021.549).
/// That is a little too unsafe to pull off here, though.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct File {
    /// Header.
    pub header: Header,

    /// Smallest character in the .tfm file.
    pub smallest_char: Char,

    /// Character dimensions.
    pub char_dimens: BTreeMap<Char, CharDimensions>,

    /// Character tags.
    ///
    /// Note there is no correlation between a character having
    /// a tag and a character having a dimension.
    /// All four combinations of (has or hasn't dimensions) and (has or hasn't a tag)
    /// are possible.
    pub char_tags: BTreeMap<Char, CharTag>,

    /// Tags that have been unset, but whose discriminant is still written to a .tfm file by PLtoTF.
    pub unset_char_tags: BTreeMap<Char, u8>,

    /// Character widths.
    pub widths: Vec<FixWord>,

    /// Character heights.
    pub heights: Vec<FixWord>,

    /// Character depths.
    pub depths: Vec<FixWord>,

    /// Character italic corrections.
    pub italic_corrections: Vec<FixWord>,

    /// Lig/kern program.
    pub lig_kern_program: ligkern::lang::Program,

    /// Kerns. These are referenced from inside the lig/kern commands.
    pub kerns: Vec<FixWord>,

    /// Extensible characters.
    pub extensible_chars: Vec<ExtensibleRecipe>,

    /// Font parameters.
    pub params: Vec<FixWord>,
}

/// Data about one character in a .tfm file.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CharDimensions {
    /// Index of the width of this character in the widths array.
    ///
    /// In valid TFM files, this index is always non-zero,
    /// because a zero width index means there is no data for the character in the file.
    /// In that case the other indices (height, etc.) are necessarily zero.
    /// See TFtoPL.2014.? (where blocks of data with a zero width index are skipped)
    /// and PLtoTF.2014.? (where missing characters are written with all indices 0).
    ///
    /// There is one edge case where this index can be zero:
    /// if the width index is invalid, tftopl essentially sets it to 0.
    ///
    /// Note that even if a character doesn't have dimensions, it can still have a tag.
    pub width_index: WidthIndex,
    /// Index of the height of this character in the heights array.
    pub height_index: u8,
    pub depth_index: u8,
    pub italic_index: u8,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum WidthIndex {
    Invalid,
    Valid(NonZeroU8),
}

impl WidthIndex {
    pub fn get(&self) -> u8 {
        match self {
            WidthIndex::Invalid => 0,
            WidthIndex::Valid(n) => n.get(),
        }
    }
}

/// Tag of a character in a .tfm file.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CharTag {
    Ligature(u8),
    List(Char),
    Extension(u8),
}

impl CharTag {
    pub fn ligature(&self) -> Option<u8> {
        match self {
            CharTag::Ligature(l) => Some(*l),
            CharTag::List(_) | CharTag::Extension(_) => None,
        }
    }
    pub fn list(&self) -> Option<Char> {
        match self {
            CharTag::List(c) => Some(*c),
            CharTag::Ligature(_) | CharTag::Extension(_) => None,
        }
    }
    pub fn extension(&self) -> Option<u8> {
        match self {
            CharTag::Extension(u) => Some(*u),
            CharTag::Ligature(_) | CharTag::List(_) => None,
        }
    }
}

/// Extensible recipe instruction in a .tfm file.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ExtensibleRecipe {
    pub top: Option<Char>,
    pub middle: Option<Char>,
    pub bottom: Option<Char>,
    pub rep: Char,
}

impl ExtensibleRecipe {
    pub fn is_seven_bit(&self) -> bool {
        self.chars().all(|c| c.is_seven_bit())
    }

    pub fn chars(&self) -> impl Iterator<Item = Char> {
        [self.top, self.middle, self.bottom, Some(self.rep)]
            .into_iter()
            .flatten()
    }
}

impl Default for File {
    fn default() -> Self {
        Self {
            header: Default::default(),
            smallest_char: Char(1),
            char_dimens: Default::default(),
            char_tags: Default::default(),
            unset_char_tags: Default::default(),
            widths: vec![FixWord::ZERO],
            heights: vec![FixWord::ZERO],
            depths: vec![FixWord::ZERO],
            italic_corrections: vec![FixWord::ZERO],
            lig_kern_program: Default::default(),
            kerns: vec![],
            extensible_chars: vec![],
            params: Default::default(),
        }
    }
}

/// Print debugging information about a .tfm file.
pub fn debug<'a>(
    path: Option<&'a str>,
    sub_file_sizes: SubFileSizes,
    tfm_file: Option<&'a File>,
    raw_file: Option<&'a RawFile<'a>>,
    sections: Vec<Section>,
) -> impl std::fmt::Display + 'a {
    debug::Debug {
        sub_file_sizes,
        path,
        raw_file,
        tfm_file,
        sections,
    }
}

impl File {
    pub fn from_raw_file(raw_file: &RawFile) -> Self {
        deserialize::from_raw_file(raw_file)
    }

    pub fn validate_and_fix(&mut self) -> Vec<ValidationWarning> {
        validate::validate_and_fix(self)
    }

    pub fn deserialize(
        b: &[u8],
    ) -> (
        Result<File, DeserializationError>,
        Vec<DeserializationWarning>,
    ) {
        deserialize::deserialize(b)
    }

    pub fn serialize(&self) -> Vec<u8> {
        serialize::serialize(self)
    }

    /// Return a map from characters to the lig/kern entrypoint for that character.
    // TODO: can probably return `impl Iterator<Item = (Char, u8)>`.
    pub fn lig_kern_entrypoints(&self) -> HashMap<Char, u8> {
        self.char_tags
            .iter()
            .filter_map(|(c, d)| match *d {
                CharTag::Ligature(l) => Some((*c, l)),
                _ => None,
            })
            .collect()
    }

    fn char_info_bounds(&self) -> Option<(Char, Char)> {
        let mut r: Option<(Char, Char)> = None;
        for c in self.char_dimens.keys().copied() {
            r = Some(match r {
                None => (c, c),
                Some((lower, upper)) => (
                    if c.0 < lower.0 { c } else { lower },
                    if c.0 < upper.0 { upper } else { c },
                ),
            })
        }
        r
    }

    /// Calculate the checksum of this .tfm file.
    pub fn checksum(&self) -> u32 {
        // This checksum algorithm is in PLtoTF.2014.134.
        let (bc, ec) = self.char_info_bounds().unwrap_or((Char(1), Char(0)));
        let mut b = [bc.0, ec.0, bc.0, ec.0];
        for c in bc.0..=ec.0 {
            let char_dimens = match self.char_dimens.get(&Char(c)) {
                None => continue,
                Some(char_dimens) => char_dimens,
            };
            let width = self.widths[char_dimens.width_index.get() as usize].0;
            // TODO: adjust based on the design units
            let width = width + (c as i32 + 4) * 0o20_000_000;
            let add = |b: u8, m: u8| -> u8 {
                (((b as i32) + (b as i32) + width) % (m as i32))
                    .try_into()
                    .expect("(i32 % u8) is always a u8")
            };
            b = [
                add(b[0], 255),
                add(b[1], 253),
                add(b[2], 251),
                add(b[3], 247),
            ];
        }
        u32::from_be_bytes(b)
    }

    pub fn named_param(&self, param: NamedParameter) -> Option<FixWord> {
        self.params.get((param.number() - 1) as usize).copied()
    }

    pub fn named_param_scaled(&self, param: NamedParameter) -> Option<core::Scaled> {
        self.named_param(param)
            .map(|p| p.to_scaled(self.header.design_size))
    }
}

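The checksum loop above mixes each character's width into four accumulator bytes via the recurrence `b = (2*b + w) mod m`, with a different modulus per byte. Below is a minimal standalone sketch of just that mixing step, for illustration only: it uses `rem_euclid` on plain integers rather than the crate's `FixWord` widths, and `mix` is a hypothetical helper, not part of this crate's API.

```rust
/// Update four accumulator bytes with one width value, in the style of the
/// PLtoTF.2014.134 checksum: b[i] <- (2*b[i] + w) mod m[i].
/// The four moduli (255, 253, 251, 247) differ, so the bytes diverge even
/// when fed identical inputs.
fn mix(mut b: [u8; 4], w: i32) -> [u8; 4] {
    let moduli = [255_i32, 253, 251, 247];
    for (byte, m) in b.iter_mut().zip(moduli) {
        let v = *byte as i32;
        // rem_euclid keeps the result non-negative even for negative widths.
        *byte = (v + v + w).rem_euclid(m) as u8;
    }
    b
}

fn main() {
    // One mixing step starting from equal bytes already separates the
    // accumulators because the moduli differ: 302 mod 255 = 47,
    // 302 mod 253 = 49, 302 mod 251 = 51, 302 mod 247 = 55.
    assert_eq!(mix([1, 1, 1, 1], 300), [47, 49, 51, 55]);
    println!("ok");
}
```

The distinct moduli near 256 are what make the final `u32` sensitive to the order and position of the widths, not just their values.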
impl From<crate::pl::File> for File {
    fn from(pl_file: crate::pl::File) -> Self {
        let mut char_bounds: Option<(Char, Char)> = None;

        let lig_kern_entrypoints = pl_file.lig_kern_entrypoints(true);
        let mut lig_kern_program = pl_file.lig_kern_program;
        let kerns = lig_kern_program.unpack_kerns();
        let lig_kern_entrypoints = lig_kern_program.pack_entrypoints(lig_kern_entrypoints);

        let mut widths = pl_file.additional_widths;
        let mut heights = pl_file.additional_heights;
        let mut depths = pl_file.additional_depths;
        let mut italic_corrections = pl_file.additional_italics;
        for (char, char_dimens) in &pl_file.char_dimens {
            widths.push(char_dimens.width.unwrap_or_default());
            match char_dimens.height {
                None | Some(FixWord::ZERO) => {}
                Some(height) => heights.push(height),
            }
            match char_dimens.depth {
                None | Some(FixWord::ZERO) => {}
                Some(depth) => depths.push(depth),
            }
            match char_dimens.italic_correction {
                None | Some(FixWord::ZERO) => {}
                Some(italic_correction) => italic_corrections.push(italic_correction),
            }
            char_bounds = Some(match char_bounds {
                None => (*char, *char),
                Some((lower, upper)) => (
                    if *char < lower { *char } else { lower },
                    if *char > upper { *char } else { upper },
                ),
            })
        }
        let (widths, width_to_index) = compress(&widths, 255);
        let (heights, height_to_index) = compress(&heights, 15);
        let (depths, depth_to_index) = compress(&depths, 15);
        let (italic_corrections, italic_correction_to_index) = compress(&italic_corrections, 63);
        let mut extensible_chars = vec![];
        let char_dimens = match char_bounds {
            None => Default::default(),
            Some((lower, upper)) => {
                let mut m: BTreeMap<Char, CharDimensions> = Default::default();
                for c in lower.0..=upper.0 {
                    let pl_data = match pl_file.char_dimens.get(&Char(c)) {
                        Some(pl_data) => pl_data,
                        None => continue,
                    };
                    let width = pl_data.width.unwrap_or_default();
                    let width_index = *width_to_index
                        .get(&width)
                        .expect("the map returned from compress(_,_) contains every input as a key");
                    m.insert(
                        Char(c),
                        CharDimensions {
                            width_index: WidthIndex::Valid(width_index),
                            height_index: match pl_data.height {
                                None => 0,
                                Some(height) => {
                                    // If the height is missing from the height_to_index map, it's
                                    // because the height is 0. Similarly for depths and italic
                                    // corrections.
                                    height_to_index
                                        .get(&height)
                                        .copied()
                                        .map(NonZeroU8::get)
                                        .unwrap_or(0)
                                }
                            },
                            depth_index: match pl_data.depth {
                                None => 0,
                                Some(depth) => depth_to_index
                                    .get(&depth)
                                    .copied()
                                    .map(NonZeroU8::get)
                                    .unwrap_or(0),
                            },
                            italic_index: match pl_data.italic_correction {
                                None => 0,
                                Some(italic_correction) => italic_correction_to_index
                                    .get(&italic_correction)
                                    .copied()
                                    .map(NonZeroU8::get)
                                    .unwrap_or(0),
                            },
                        },
                    );
                }
                m
            }
        };
        let ordered_chars = {
            let mut m: Vec<Char> = pl_file.char_tags.keys().copied().collect();
            m.sort();
            m
        };
        let char_tags = ordered_chars
            .into_iter()
            .map(|c| {
                (
                    c,
                    match pl_file.char_tags.get(&c).unwrap() {
                        pl::CharTag::Ligature(_) => {
                            let entrypoint = *lig_kern_entrypoints.get(&c).expect(
                                "the map returned by crate::ligkern::lang::compress_entrypoints has a key for all chars with a lig tag",
                            );
                            CharTag::Ligature(entrypoint)
                        }
                        pl::CharTag::List(c) => CharTag::List(*c),
                        pl::CharTag::Extension(e) => {
                            let index: u8 = extensible_chars.len().try_into().unwrap();
                            extensible_chars.push(e.clone());
                            CharTag::Extension(index)
                        }
                    },
                )
            })
            .collect();

        let mut file = Self {
            header: pl_file.header,
            smallest_char: char_bounds.map(|t| t.0).unwrap_or(Char(1)),
            char_dimens,
            char_tags,
            unset_char_tags: pl_file.unset_char_tags.clone(),
            widths,
            heights,
            depths,
            italic_corrections,
            lig_kern_program,
            kerns,
            extensible_chars,
            params: pl_file.params,
        };
        if file.header.checksum.is_none() {
            file.header.checksum = Some(file.checksum());
        }
        file
    }
}

/// Lossy compression of numbers for TFM files.
///
/// The TFM file format can only store up to 15 heights, 15 depths, and 63 italic corrections.
/// If a property list file contains, e.g., more than 15 distinct heights, something has to give.
/// PLtoTF contains a lossy compression algorithm that takes a list of values and returns another,
/// bounded list of values which approximates the original list.
/// This is implemented in PLtoTF.2014.75-80 and re-implemented here.
///
/// ## How the algorithm works
///
/// For a given delta, the algorithm partitions the ordered list of values such that within
/// each partition the maximum distance between two elements is delta. All of the values within
/// the partition are then approximated by `(interval_max+interval_min)/2`.
/// As such, each value may be increased or decreased by up to `delta/2`.
/// After this procedure, the list of values is replaced by the list of approximations.
/// This compresses the list of values into a list whose size is the number of intervals.
///
/// Given this subroutine, the algorithm finds the smallest delta such that the number of intervals
/// is at most the required maximum (e.g., 15 for heights).
///
/// There are some important features of this algorithm to note:
///
/// - For a given delta value there may be multiple possible partitions.
/// The algorithm uses a greedy approach in which it maximizes the size of the first partition,
/// then the size of the second partition, and so on.
///
/// - Distinct delta values can yield the same partition.
/// For example, if the initial values are `[1, 4, 5]`, then any delta in the range `[1, 3)`
/// gives the same result (`[[1], [4, 5]]`).
/// Whenever we check a delta, we are really checking the _interval of deltas_ that gives the same result.
/// Both Knuth's and our implementations use this fact to speed up the search by reducing the search space.
/// E.g., in the example above, after checking `delta=1` we _don't_ check `delta=2`.
///
/// ## This re-implementation
///
/// The re-implementation here follows PLtoTF closely enough, but with one modification.
/// To find the optimal delta, Knuth first calculates the minimal possible delta.
/// This is the minimum distance between adjacent elements in the ordered list of values.
/// In general this will not be a valid solution because the number of intervals
/// it generates will be large.
/// So, he next finds the smallest `k` such that `2^k * min_delta` is a valid solution.
/// He then does an upwards linear search within the interval `[min_delta * 2^{k-1}, min_delta * 2^k]` to find
/// the optimal delta.
///
/// Checking a particular delta is `O(n)`.
/// The worst-case running time of Knuth's algorithm is then `O(n^3)` because the interval
/// `[2^{k-1}, 2^k]` can contain `O(n^2)` distinct deltas to check in the worst case [Note 1].
///
/// In the re-implementation here we observe that the problem of finding the smallest possible delta
/// is a classic binary search problem.
/// This is because if a delta is a valid solution, any larger delta is too;
/// and if a delta is not a valid solution, any smaller delta is not either.
/// The re-implementation using binary search is `O(n log n)`.
/// Moreover, the constant factors are about the same.
///
/// [Note 1] In the worst case there are `O(n^2)` distinct deltas because each pair of elements yields a delta.
/// Let m be the smallest delta and M the largest delta.
/// In the initial `2^k`-based ranging scheme, the largest `K` satisfies `m 2^{K-1} < M <= m 2^K`,
/// or `K-1 <= log_2(M/m)`. Thus there are `K=O(1)` ranges in this initial scheme.
/// By the pigeonhole principle, there exists a `k` such that the range `[m * 2^{k-1}, m * 2^k]`
/// contains `O(n^2)` elements.
/// In the worst case, the solution is the maximum element of this range.
pub fn compress(values: &[FixWord], max_size: u8) -> (Vec<FixWord>, HashMap<FixWord, NonZeroU8>) {
    let max_size = max_size as usize;
    let dedup_values = {
        let s: HashSet<FixWord> = values.iter().copied().collect();
        // remove the zero value for non-widths
        // and then add it back in at the start
        let mut v: Vec<FixWord> = s.into_iter().collect();
        v.sort();
        v
    };
    // After deduplication, it is possible we don't need to compress at all, so we can exit early.
    // This also handles the case when the values slice is empty.
    if dedup_values.len() <= max_size {
        let m: HashMap<FixWord, NonZeroU8> = dedup_values
            .iter()
            .enumerate()
            .map(|(i, &w)| {
                let i: u8 = i.try_into().expect(
                    "`dedup_values` has at most `max_size` elements, so the index is at most `max_size-1`",
                );
                let i: NonZeroU8 = (i + 1)
                    .try_into()
                    .expect("`i <= max_size-1 <= u8::MAX-1`, so `i+1 <= u8::MAX`");
                (w, i)
            })
            .collect();
        let mut dedup_values = dedup_values;
        dedup_values.push(FixWord::ZERO);
        dedup_values.rotate_right(1);
        return (dedup_values, m);
    }

    // For the binary search we maintain lower and upper bounds as usual.
    // The optimal delta is in the interval [lower, upper].
    //
    // Invariant: delta<lower is never a solution.
    // Because delta must be non-negative, we initialize it to zero.
    let mut lower = FixWord::ZERO;
    // Invariant: delta=upper is always a solution.
    // To initialize upper and begin the search we construct a solution that always works: a single
    // interval encompassing the entire slice and the largest delta possible.
    let max_delta = *dedup_values.last().unwrap() - *dedup_values.first().unwrap();
    let mut upper = max_delta;
    let mut solution = vec![dedup_values.len()];

    let mut buffer = vec![];
    while lower < upper {
        // After the following line delta is potentially equal to lower. This is what we want, as
        // we know upper is a solution, so to advance the search when upper=lower+1
        // we need to check lower+1.
        let delta = lower + (upper - lower) / 2;

        let mut interval_start = *dedup_values.first().unwrap();
        // The smallest delta such that the candidate solution will be the same.
        // This is the maximum of all gaps that don't start a new interval.
        let mut delta_lower = FixWord::ZERO;
        // The largest delta such that the candidate solution will be different.
        // This is the minimum of all gaps that start a new interval.
        let mut delta_upper = max_delta;
        for (i, &v) in dedup_values.iter().enumerate() {
            let gap = v - interval_start;
            if gap > delta {
                // We need to start a new interval.
                if gap < delta_upper {
                    delta_upper = gap;
                }
                buffer.push(i);
                // If the candidate solution is already too big, we can exit early.
                if buffer.len() >= max_size {
                    break;
                }
                interval_start = v;
            } else {
                // We need to extend the current interval.
                // For any delta in the range [gap, delta] we would have made the same choice here.
                if gap > delta_lower {
                    delta_lower = gap;
                }
            }
        }
        buffer.push(dedup_values.len());

        if buffer.len() <= max_size {
            // solution
            std::mem::swap(&mut buffer, &mut solution);
            upper = delta_lower;
        } else {
            // not a solution
            lower = delta_upper;
        }
        buffer.clear();
    }

    let mut value_to_index = HashMap::<FixWord, NonZeroU8>::new();
    let mut result = vec![FixWord::ZERO];
    let mut previous = 0_usize;
    for i in solution {
        let interval = &dedup_values[previous..i];
        previous = i;
        for &v in interval {
            let index: u8 = result.len().try_into().expect(
                "the `result` array contains at most `1+max_size` elements, so the index is at most `max_size`, which fits in a u8",
            );
            let index: NonZeroU8 = index
                .try_into()
                .expect("the `result` array contains at least 1 element so this is never 0");
            value_to_index.insert(v, index);
        }
        let replacement = (*interval.last().unwrap() + *interval.first().unwrap()) / 2;
        result.push(replacement);
    }

    (result, value_to_index)
}

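The greedy partition subroutine at the heart of `compress` can be illustrated standalone. Below is a minimal sketch under stated assumptions: plain `i64` values stand in for the crate's `FixWord` type, and `partition` is a hypothetical helper written for this example, not part of the crate's API.

```rust
/// Greedily partition a sorted, deduplicated list so that within each
/// interval the distance between the smallest and largest element is at
/// most `delta`. Returns the exclusive end index of each interval.
/// Simplified sketch of the inner loop of `compress`; `i64` stands in
/// for `FixWord`.
fn partition(sorted_values: &[i64], delta: i64) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut interval_start = match sorted_values.first() {
        Some(&v) => v,
        None => return ends,
    };
    for (i, &v) in sorted_values.iter().enumerate() {
        if v - interval_start > delta {
            // The gap is too large: close the current interval before `i`
            // and start a new one at `v`.
            ends.push(i);
            interval_start = v;
        }
    }
    ends.push(sorted_values.len());
    ends
}

fn main() {
    // With values [1, 4, 5], any delta in [1, 3) yields the same
    // partition [[1], [4, 5]], i.e. interval ends [1, 3].
    assert_eq!(partition(&[1, 4, 5], 1), vec![1, 3]);
    assert_eq!(partition(&[1, 4, 5], 2), vec![1, 3]);
    // delta = 4 covers the whole range in one interval.
    assert_eq!(partition(&[1, 4, 5], 4), vec![3]);
    println!("ok");
}
```

The binary search in `compress` simply looks for the smallest delta at which this partition has no more than `max_size` intervals.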
#[derive(Debug, Hash, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Section {
    /// Sub-file sizes.
    SubFileSizes,
    /// Header.
    Header,
    /// Character data.
    CharInfos,
    /// Widths array.
    Widths,
    /// Heights array.
    Heights,
    /// Depths array.
    Depths,
    /// Italic corrections array.
    ItalicCorrections,
    /// Lig/kern instructions.
    LigKern,
    /// Kerns array.
    Kerns,
    /// Extensible recipes.
    ExtensibleRecipes,
    /// Params array.
    Params,
}

impl Section {
    pub const NAMES: [&'static str; 11] = [
        "sub-file-sizes",
        "header",
        "char-infos",
        "widths",
        "heights",
        "depths",
        "italic-corrections",
        "lig-kern",
        "kerns",
        "extensible-recipes",
        "params",
    ];
    pub const ALL_SECTIONS: [Section; 11] = [
        Section::SubFileSizes,
        Section::Header,
        Section::CharInfos,
        Section::Widths,
        Section::Heights,
        Section::Depths,
        Section::ItalicCorrections,
        Section::LigKern,
        Section::Kerns,
        Section::ExtensibleRecipes,
        Section::Params,
    ];
}

impl TryFrom<&str> for Section {
    type Error = ();

    fn try_from(value: &str) -> Result<Self, Self::Error> {
        for i in 0..Section::NAMES.len() {
            if Section::NAMES[i] == value {
                return Ok(Section::ALL_SECTIONS[i]);
            }
        }
        Err(())
    }
}

impl std::fmt::Display for Section {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", Section::NAMES[*self as usize])
    }
}

/// The TFM header, which contains metadata about the file.
///
/// This is defined in TFtoPL.2014.10.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Header {
    /// The font checksum, if specified.
    ///
    /// In .tfm files checksums are always specified because the format has no
    /// way to omit a checksum.
    ///
    /// In .pl files checksums are specified if the `CHECKSUM` node appears.
    /// If no checksum is specified in a .pl file, pltotf calculates the
    /// correct value and writes that.
    ///
    /// In TeX82, this is stored in the `font_check` array (TeX82.2021.549).
    pub checksum: Option<u32>,
    /// In TeX82, this is stored in the `font_dsize` array (TeX82.2021.549).
    pub design_size: FixWord,
    pub design_size_valid: bool,
    pub character_coding_scheme: Option<String>,
    pub font_family: Option<String>,
    pub seven_bit_safe: Option<bool>,
    pub face: Option<Face>,
    /// The TFM format allows the header to contain arbitrary additional data.
    pub additional_data: Vec<u32>,
}

impl Header {
    /// Returns the default header when parsing property list files.
    ///
    /// This is defined in PLtoTF.2014.70.
    pub fn pl_default() -> Header {
        Header {
            checksum: None,
            design_size: FixWord::ONE * 10,
            design_size_valid: true,
            character_coding_scheme: Some("UNSPECIFIED".into()),
            font_family: Some("UNSPECIFIED".into()),
            seven_bit_safe: None,
            face: Some(0.into()),
            additional_data: vec![],
        }
    }

    /// Returns the default header when parsing .tfm files.
    ///
    /// This is defined in PLtoTF.2014.70.
    pub fn tfm_default() -> Header {
        Header {
            checksum: Some(0),
            design_size: FixWord::ZERO,
            design_size_valid: true,
            character_coding_scheme: None,
            font_family: None,
            seven_bit_safe: None,
            face: None,
            additional_data: vec![],
        }
    }
}

876/// A character in a TFM file.
877///
878/// TFM and PL files only support 1-byte characters.
879#[derive(Debug, Default, PartialEq, Eq, Hash, Clone, Copy, PartialOrd, Ord)]
880#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
881pub struct Char(pub u8);
882
883impl std::fmt::Display for Char {
884 fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
885 if (self.0 as char).is_ascii_graphic() {
886 write!(f, "{}", self.0 as char)
887 } else {
888 write!(f, "0x{:02x}", self.0)
889 }
890 }
891}
892
893impl From<u8> for Char {
894 fn from(value: u8) -> Self {
895 Char(value)
896 }
897}
898
899impl TryFrom<char> for Char {
900 type Error = std::char::TryFromCharError;
901
902 fn try_from(value: char) -> Result<Self, Self::Error> {
903 let u: u8 = value.try_into()?;
904 Ok(Char(u))
905 }
906}
907
908impl From<Char> for char {
909 fn from(value: Char) -> Self {
910 value.0 as char
911 }
912}
913
914macro_rules! const_chars {
915 ( $( ($name: ident, $value: expr), )+ ) => {
916 $(
917 pub const $name: Char = Char($value);
918 )+
919 };
920}
921
922impl Char {
923 const_chars![
924 (A, b'A'),
925 (B, b'B'),
926 (C, b'C'),
927 (D, b'D'),
928 (V, b'V'),
929 (W, b'W'),
930 (X, b'X'),
931 (Y, b'Y'),
932 (Z, b'Z'),
933 ];
934
    /// Returns whether this character's code is in the seven-bit range `0..=127`.
935    pub fn is_seven_bit(&self) -> bool {
936 self.0 <= 127
937 }
938}
939
/// The weight of a font face.
940#[derive(Debug, PartialEq, Eq, Copy, Clone)]
941#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
942pub enum FaceWeight {
943 Light,
944 Medium,
945 Bold,
946}
947
/// The slope of a font face.
948#[derive(Debug, PartialEq, Eq, Copy, Clone)]
949#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
950pub enum FaceSlope {
951 Roman,
952 Italic,
953}
954
/// The expansion of a font face.
955#[derive(Debug, PartialEq, Eq, Copy, Clone)]
956#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
957pub enum FaceExpansion {
958 Regular,
959 Condensed,
960 Extended,
961}
962
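/// The face of a font, as specified in the `FACE` node of a .pl file.
///
/// Values 0 through 17 encode a valid (weight, slope, expansion) combination;
/// all other values are preserved as-is in [`Face::Other`].
/// A small illustration of the encoding:
///
/// ```
/// # use tfm::*;
/// assert_eq!(
///     Face::from(0),
///     Face::Valid(FaceWeight::Medium, FaceSlope::Roman, FaceExpansion::Regular),
/// );
/// assert_eq!(
///     Face::from(3),
///     Face::Valid(FaceWeight::Bold, FaceSlope::Italic, FaceExpansion::Regular),
/// );
/// assert_eq!(Face::from(18), Face::Other(18));
/// ```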
963#[derive(Debug, PartialEq, Eq, Copy, Clone)]
964#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
965pub enum Face {
966 Valid(FaceWeight, FaceSlope, FaceExpansion),
967 Other(u8),
968}
969
970impl From<u8> for Face {
971 fn from(value: u8) -> Self {
972 if value >= 18 {
973 return Face::Other(value);
974 }
975 let a = match (value % 6) / 2 {
976 0 => FaceWeight::Medium,
977 1 => FaceWeight::Bold,
978 2 => FaceWeight::Light,
979 _ => unreachable!(),
980 };
981 let b = match value % 2 {
982 0 => FaceSlope::Roman,
983 1 => FaceSlope::Italic,
984 _ => unreachable!(),
985 };
986 let c = match value / 6 {
987 0 => FaceExpansion::Regular,
988 1 => FaceExpansion::Condensed,
989 2 => FaceExpansion::Extended,
990 _ => unreachable!(),
991 };
992 Face::Valid(a, b, c)
993 }
994}
995
996impl From<Face> for u8 {
997 fn from(value: Face) -> Self {
998 match value {
999 Face::Valid(w, s, c) => {
1000 let a: u8 = match w {
1001 FaceWeight::Medium => 0,
1002 FaceWeight::Bold => 1,
1003 FaceWeight::Light => 2,
1004 };
1005 let b: u8 = match s {
1006 FaceSlope::Roman => 0,
1007 FaceSlope::Italic => 1,
1008 };
1009 let c: u8 = match c {
1010 FaceExpansion::Regular => 0,
1011 FaceExpansion::Condensed => 1,
1012 FaceExpansion::Extended => 2,
1013 };
1014 c * 6 + a * 2 + b
1015 }
1016 Face::Other(b) => b,
1017 }
1018 }
1019}
1020
1021/// A named TeX font metric parameter.
1022#[derive(PartialEq, Eq, Debug, Copy, Clone)]
1023#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1024pub enum NamedParameter {
1025 Slant,
1026 Space,
1027 Stretch,
1028 Shrink,
1029 XHeight,
1030 Quad,
1031 ExtraSpace,
1032 Num1,
1033 Num2,
1034 Num3,
1035 Denom1,
1036 Denom2,
1037 Sup1,
1038 Sup2,
1039 Sup3,
1040 Sub1,
1041 Sub2,
1042 SupDrop,
1043 SubDrop,
1044 Delim1,
1045 Delim2,
1046 AxisHeight,
1047 DefaultRuleThickness,
1048 BigOpSpacing1,
1049 BigOpSpacing2,
1050 BigOpSpacing3,
1051 BigOpSpacing4,
1052 BigOpSpacing5,
1053}
1054
1055impl NamedParameter {
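    /// Returns the number of this parameter in the TFM parameter list.
    ///
    /// Note that parameter numbers 8 through 13 are assigned twice:
    /// math symbol fonts use them for `Num1` through `Sup1`,
    /// while math extension fonts use them for `DefaultRuleThickness`
    /// through `BigOpSpacing5`.
    ///
    /// ```
    /// # use tfm::*;
    /// assert_eq!(NamedParameter::Slant.number(), 1);
    /// assert_eq!(NamedParameter::Num1.number(), 8);
    /// assert_eq!(NamedParameter::DefaultRuleThickness.number(), 8);
    /// ```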
1056 pub fn number(&self) -> u8 {
1057 match self {
1058 NamedParameter::Slant => 1,
1059 NamedParameter::Space => 2,
1060 NamedParameter::Stretch => 3,
1061 NamedParameter::Shrink => 4,
1062 NamedParameter::XHeight => 5,
1063 NamedParameter::Quad => 6,
1064 NamedParameter::ExtraSpace => 7,
1065 NamedParameter::Num1 => 8,
1066 NamedParameter::Num2 => 9,
1067 NamedParameter::Num3 => 10,
1068 NamedParameter::Denom1 => 11,
1069 NamedParameter::Denom2 => 12,
1070 NamedParameter::Sup1 => 13,
1071 NamedParameter::Sup2 => 14,
1072 NamedParameter::Sup3 => 15,
1073 NamedParameter::Sub1 => 16,
1074 NamedParameter::Sub2 => 17,
1075 NamedParameter::SupDrop => 18,
1076 NamedParameter::SubDrop => 19,
1077 NamedParameter::Delim1 => 20,
1078 NamedParameter::Delim2 => 21,
1079 NamedParameter::AxisHeight => 22,
1080 NamedParameter::DefaultRuleThickness => 8,
1081 NamedParameter::BigOpSpacing1 => 9,
1082 NamedParameter::BigOpSpacing2 => 10,
1083 NamedParameter::BigOpSpacing3 => 11,
1084 NamedParameter::BigOpSpacing4 => 12,
1085 NamedParameter::BigOpSpacing5 => 13,
1086 }
1087 }
1088}
1089
1090/// Warning from the compilation of "next larger character" instructions.
1091#[derive(PartialEq, Eq, Debug, Clone)]
1092pub enum NextLargerProgramWarning {
    /// A character's "next larger" character does not exist in the font.
1093    NonExistentCharacter { original: Char, next_larger: Char },
    /// The "next larger" instructions contain a cycle, which was broken at `original`.
1094    InfiniteLoop { original: Char, next_larger: Char },
1095}
1096
1097impl NextLargerProgramWarning {
    /// Returns the character whose "next larger" instruction caused the warning.
1098    pub fn bad_char(&self) -> Char {
1099 match self {
1100 NextLargerProgramWarning::NonExistentCharacter {
1101 original,
1102 next_larger: _,
1103 } => *original,
1104 NextLargerProgramWarning::InfiniteLoop { original, .. } => *original,
1105 }
1106 }
1107
1108 /// Returns the warning message the TFtoPL program prints for this kind of error.
1109 pub fn tftopl_message(&self) -> String {
1110 match self {
1111 NextLargerProgramWarning::NonExistentCharacter {
1112 original: _,
1113 next_larger,
1114 } => {
1115 format![
1116 "Bad TFM file: Character list link to nonexistent character '{:03o}.",
1117 next_larger.0
1118 ]
1119 }
1120 NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1121 format!["Bad TFM file: Cycle in a character list!\nCharacter '{:03o} now ends the list.", original.0]
1122 }
1123 }
1124 }
1125
1126 /// Returns the section in Knuth's TFtoPL (version 2014) in which this warning occurs.
1127 pub fn tftopl_section(&self) -> u8 {
1128 84
1129 }
1130
1131 /// Returns the section in Knuth's PLtoTF (version 2014) in which this warning occurs.
1132 pub fn pltotf_section(&self) -> u8 {
1133 match self {
1134 NextLargerProgramWarning::NonExistentCharacter { .. } => 111,
1135 NextLargerProgramWarning::InfiniteLoop { .. } => 113,
1136 }
1137 }
1138
1139 /// Returns the warning message the PLtoTF program prints for this kind of error.
1140 pub fn pltotf_message(&self) -> String {
1141 match self {
1142 NextLargerProgramWarning::NonExistentCharacter {
1143 original,
1144 next_larger: _,
1145 } => {
1146 format![
1147 "The character NEXTLARGER than '{:03o} had no CHARACTER spec.",
1148 original.0
1149 ]
1150 }
1151 NextLargerProgramWarning::InfiniteLoop { original, .. } => {
1152 format![
1153 "A cycle of NEXTLARGER characters has been broken at '{:03o}.",
1154 original.0
1155 ]
1156 }
1157 }
1158 }
1159}

1160/// Compiled program of "next larger character" instructions.
1161///
1162/// The .tfm file format can associate a "next larger" character to any character in a font.
1163/// Next larger characters form sequences: i.e. B can be the next larger character for A,
1164/// and C can be the next larger character for B,
1165/// leading to the sequence A-B-C.
1166/// These next larger characters are used at certain points in TeX.
1167/// TeX occasionally traverses the entire sequence for a given starting character (e.g. A).
1168///
1169/// As with ligatures, next larger specifications can contain infinite loops -
1170/// e.g., if X is the next larger character for Y
1171/// and Y is the next larger character for X.
1172/// These loops are invalid and removed by TFtoPL and PLtoTF.
1173///
1174/// Motivated by the idea of "parse don't validate", this type represents
1175/// a compiled version of the next larger specifications in which infinite loops
1176/// are statically guaranteed not to exist.
1177///
1178/// The basic use of a valid program looks like this:
1179///
1180/// ```
1181/// # use tfm::*;
1182/// let edges = vec![
1183/// (Char::A, Char::B),
1184/// (Char::B, Char::C),
1185/// ];
1186/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1187///
1188/// assert_eq![warnings, vec![]];
1189///
1190/// let sequence_A: Vec<Char> = next_larger_program.get(Char::A).collect();
1191/// assert_eq!(sequence_A, vec![Char::B, Char::C]);
1192///
1193/// let sequence_B: Vec<Char> = next_larger_program.get(Char::B).collect();
1194/// assert_eq!(sequence_B, vec![Char::C]);
1195///
1196/// let sequence_C: Vec<Char> = next_larger_program.get(Char::C).collect();
1197/// assert_eq!(sequence_C, vec![]);
1198///
1199/// // Character that is not in the program.
1200/// let sequence_D: Vec<Char> = next_larger_program.get(Char::D).collect();
1201/// assert_eq!(sequence_D, vec![]);
1202/// ```
1203///
1204/// ## Warnings
1205///
1206/// There are two types of error that can occur when constructing the next
1207/// larger program.
1208/// Both of these errors are handled gracefully, so we officially refer to them as warnings.
1209/// The constructor returns them as values of type [`NextLargerProgramWarning`].
1210///
1211/// ### Infinite loops
1212///
1213/// The first error is that next larger programs can contain infinite loops -
1214/// e.g., if X is the next larger character for Y
1215/// and Y is the next larger character for X.
1216/// In this case the loop is broken by removing the next larger program for the
1217/// character with the largest 8-bit code, in this case Y.
1218/// A [`NextLargerProgramWarning::InfiniteLoop`] warning is returned
1219/// from the program constructor.
1220///
1221/// ```
1222/// # use tfm::*;
1223/// let edges = vec![
1224/// (Char::X, Char::Y),
1225/// (Char::Y, Char::X),
1226/// ];
1227/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1228///
1229/// assert_eq!(warnings, vec![NextLargerProgramWarning::InfiniteLoop{
1230/// original: Char::Y,
1231/// next_larger: Char::X,
1232/// }]);
1233///
1234/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1235/// assert_eq!(sequence_X, vec![Char::Y]);
1236///
1237/// let sequence_Y: Vec<Char> = next_larger_program.get(Char::Y).collect();
1238/// assert_eq!(sequence_Y, vec![]);
1239/// ```
1240///
1241/// ### Non-existent characters
1242///
1243/// The second error is that characters referred to in the next larger program
1244/// may not be defined in the .tfm or .pl file.
1245/// For example, a .pl file may contain the snippet `(CHARACTER C X (NEXTLARGER C Y))`
1246/// without defining the character Y.
1247/// The constructor [`NextLargerProgram::new`] accepts a function for checking if a
1248/// character exists.
1249/// In all cases a [`NextLargerProgramWarning::NonExistentCharacter`] warning is returned
1250/// if a non-existent character is encountered.
1251///
1252/// The behavior of the resulting next larger program is configured using the
1253/// `drop_non_existent_characters` argument.
1254/// If this is false, then the behavior is the same as PLtoTF and the program still
1255/// contains the character.
1256///
1257/// ```
1258/// # use tfm::*;
1259/// let edges = vec![
1260/// (Char::X, Char::Y),
1261/// ];
1262/// let character_exists = |c| c != Char::Y;
1269/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, false);
1270///
1271/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1272/// original: Char::X,
1273/// next_larger: Char::Y,
1274/// }]);
1275///
1276/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1277/// assert_eq!(sequence_X, vec![Char::Y]);
1278/// ```
1279///
1280/// If `drop_non_existent_characters` is true, next larger instructions pointing at non-existent
1281/// characters are dropped.
1282/// This is how TFtoPL behaves.
1283///
1285/// ```
1286/// # use tfm::*;
1287/// let edges = vec![
1288/// (Char::X, Char::Y),
1289/// ];
1290/// let character_exists = |c| c != Char::Y;
1297/// let (next_larger_program, warnings) = NextLargerProgram::new(edges.into_iter(), character_exists, true);
1298///
1299/// assert_eq!(warnings, vec![NextLargerProgramWarning::NonExistentCharacter{
1300/// original: Char::X,
1301/// next_larger: Char::Y,
1302/// }]);
1303///
1304/// let sequence_X: Vec<Char> = next_larger_program.get(Char::X).collect();
1305/// assert_eq!(sequence_X, vec![]);
1306/// ```
1308#[derive(Clone, Debug)]
1309pub struct NextLargerProgram {
1310 entrypoints: HashMap<Char, u8>,
1311 next_larger: Vec<(Char, NonZeroU8)>,
1312}
1313
1314impl NextLargerProgram {
1315 /// Build a new next larger program from an iterator over edges.
1316 pub fn new<I: Iterator<Item = (Char, Char)>, F: Fn(Char) -> bool>(
1317 edges: I,
1318 character_exists: F,
1319 drop_non_existent_characters: bool,
1320 ) -> (Self, Vec<NextLargerProgramWarning>) {
1321 // This function implements functionality in TFtoPL.2014.84 and PLtoTF.2014.{110,111,113}.
1322 let mut warnings: Vec<NextLargerProgramWarning> = vec![];
1323
1324 let mut node_to_larger = HashMap::<Char, Char>::new();
1325 let mut node_to_num_smaller = HashMap::<Char, usize>::new();
1326 for (smaller, larger) in edges {
1327 if !character_exists(larger) {
1328 warnings.push(NextLargerProgramWarning::NonExistentCharacter {
1329 original: smaller,
1330 next_larger: larger,
1331 });
1332 if drop_non_existent_characters {
1333 continue;
1334 }
1335 }
1336 node_to_larger.insert(smaller, larger);
1337 node_to_num_smaller.entry(smaller).or_default();
1338 *node_to_num_smaller.entry(larger).or_default() += 1;
1339 }
1340
1341 let mut leaves: Vec<Char> = vec![];
1342 let mut non_leaves: BTreeSet<Char> = Default::default();
1343 for (node, num_smaller) in &node_to_num_smaller {
1344 if *num_smaller == 0 {
1345 leaves.push(*node);
1346 } else {
1347 non_leaves.insert(*node);
1348 }
1349 }
1350
1351 let mut sorted_chars = vec![];
1352 let mut infinite_loop_warnings = vec![];
1353 loop {
1354 while let Some(smaller) = leaves.pop() {
1355 if let Some(larger) = node_to_larger.get(&smaller).copied() {
1356 let num_smaller = node_to_num_smaller
1357 .get_mut(&larger)
1358 .expect("`node_to_num_smaller` contains all nodes");
1359 *num_smaller = num_smaller
1360 .checked_sub(1)
1361 .expect("the larger of a smaller must have at least one smaller");
1362 if *num_smaller == 0 {
1363 leaves.push(larger);
1364 non_leaves.remove(&larger);
1365 }
1366 }
1367 sorted_chars.push(smaller);
1368 }
1369
1370 // There could be pending nodes left because of a cycle.
1371 // We break the cycle at the largest node.
1372 let smaller = match non_leaves.last() {
1373 None => break,
1374 Some(child) => *child,
1375 };
1376 let larger = node_to_larger.remove(&smaller).expect(
1377 "General graph fact: sum_(node)#in_edges(node)=sum_(node)#out_edges(node).
1378 Fact about the next larger graph: #out_edges(node)<=1, because each char has at most one next larger char.
1379 If #out_edge(child)=0 then sum_(node)#in_edges(node)=sum_(node)#out_edges(node) < #nodes.
1380 Thus there exists another node with #in_edges(node)=0 and that node is a leaf.
1381 But `leaves.len()=0` at this line of code",
1382 );
1383 infinite_loop_warnings.push(NextLargerProgramWarning::InfiniteLoop {
1384 original: smaller,
1385 next_larger: larger,
1386 });
1387 leaves.push(larger);
1388 non_leaves.remove(&larger);
1389 }
1390 warnings.extend(infinite_loop_warnings.into_iter().rev());
1391
1392 let next_larger = {
1393 let parents: HashSet<Char> = node_to_larger.values().copied().collect();
1394 let mut node_to_position = HashMap::<Char, u8>::new();
1395 let mut next_larger: Vec<(Char, NonZeroU8)> = vec![];
1396
1397 for c in sorted_chars.iter().rev() {
1398 if !parents.contains(c) {
1399 continue;
1400 }
1401 // The present character is the parent of at least one child, aka it is the next larger
1402 // character for another character. So it needs to be in the next_larger array.
1403 let child_position: u8 = next_larger
1404 .len()
1405 .try_into()
1406 .expect("there are at most u8::MAX chars in the `next_larger` array");
1407 let offset = match node_to_larger.get(c) {
1408 // The next_larger array contains at most 256 elements: one for each char.
1409 // (Actually it contains at most 255 because one character necessarily does not
1410 // have a child node and this character does not appear in the array.)
1411 // Anyway, an offset of 256 sends the index outside the array bound, and so
1412        // subsequent calls to the iterator return None.
1413 None => NonZeroU8::MAX,
1414 Some(parent) => {
1415 let parent_position = *node_to_position
1416 .get(parent)
1417 .expect("parent has already been inserted");
1418 child_position
1419 .checked_sub(parent_position)
1420                            .expect("parent inserted before so its position is strictly smaller")
1421                            .try_into()
1422                            .expect("parent inserted before so its position is strictly smaller")
1423 }
1424 };
1425 next_larger.push((*c, offset));
1426 node_to_position.insert(*c, child_position);
1427 }
1428 next_larger.reverse();
1429 next_larger
1430 };
1431
1432 let entrypoints = {
1433 let node_to_position: HashMap<Char, u8> = next_larger
1434 .iter()
1435 .enumerate()
1436 .map(|(i, (c, _))| {
1437 let u: u8 = i
1438 .try_into()
1439 .expect("there are at most u8::MAX chars in the `next_larger` array");
1440 (*c, u)
1441 })
1442 .collect();
1443 let mut entrypoints = HashMap::<Char, u8>::new();
1444 for c in sorted_chars.iter().rev() {
1445 if let Some(parent) = node_to_larger.get(c) {
1446 entrypoints.insert(
1447 *c,
1448 *node_to_position
1449 .get(parent)
1450 .expect("parent has already been inserted"),
1451 );
1452 }
1453 }
1454 entrypoints
1455 };
1456
1457 (
1458 Self {
1459 entrypoints,
1460 next_larger,
1461 },
1462 warnings,
1463 )
1464 }
1465
1466    /// Returns the "next larger" sequence for a character.
1467 pub fn get(&self, c: Char) -> impl Iterator<Item = Char> + '_ {
1468 NextLargerProgramIter {
1469 current: self.entrypoints.get(&c).copied(),
1470 program: self,
1471 }
1472 }
1473
1474 /// Returns whether this program is seven-bit safe.
1475 ///
1476 /// A next larger program is seven-bit safe if the next larger sequences for
1477 /// seven-bit characters only contain seven-bit characters.
1478 /// Conversely a program is seven-bit unsafe if there is a seven-bit
1479 /// character whose next larger sequence contains a non-seven-bit character.
1480 ///
1481 /// ```
1482 /// # use tfm::*;
1483 /// let edges = vec![
1484 /// (Char(250), Char(125)),
1485 /// (Char(125), Char(126)),
1486 /// ];
1487 /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1488 /// assert_eq!(true, next_larger_program.is_seven_bit_safe());
1489 ///
1490 /// let edges = vec![
1491 /// (Char(125), Char(250)),
1492 /// ];
1493 /// let (next_larger_program, _) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1494 /// assert_eq!(false, next_larger_program.is_seven_bit_safe());
1495 /// ```
1496 pub fn is_seven_bit_safe(&self) -> bool {
1497 // For each c, we only need to check the first element in c's next larger sequence.
1498 // If there is a subsequent element d of the sequence that is seven-bit unsafe,
1499 // we will find it when considering one of d's children.
1500 // This optimization makes this function O(n), rather than worst case O(n^2).
1501 self.entrypoints
1502 .keys()
1503 .copied()
1504 .filter(Char::is_seven_bit)
1505 .filter_map(|c| self.get(c).next())
1506 .all(|c| c.is_seven_bit())
1507 }
1508}
1509
1510#[derive(Clone, Debug)]
1511struct NextLargerProgramIter<'a> {
1512 current: Option<u8>,
1513 program: &'a NextLargerProgram,
1514}
1515
1516impl<'a> Iterator for NextLargerProgramIter<'a> {
1517 type Item = Char;
1518
1519 fn next(&mut self) -> Option<Self::Item> {
1520 // Note that the iterator is statically guaranteed to terminate!
1521 // If it returns None, it has already terminated.
1522 // If it returns Some, then self.current will be incremented by a strictly
1523 // positive number.
1524 // Incrementing like this can only happen a finite number of times before overflow
1525 // occurs, and then self.current is None and the iterator is terminated.
1526 match self.current {
1527 None => None,
1528 Some(current) => match self.program.next_larger.get(current as usize) {
1529 None => None,
1530 Some((c, inc)) => {
1531 self.current = current.checked_add(inc.get());
1532 Some(*c)
1533 }
1534 },
1535 }
1536 }
1537}
1538
1539impl core::FontFormat for File {
1540 const DEFAULT_FILE_EXTENSION: &'static str = "tfm";
1541 type Error = DeserializationError;
1542
1543 fn parse(b: &[u8]) -> Result<Self, Self::Error> {
1544 File::deserialize(b).0
1545 }
1546}
1547
1548/// Fixed-width numeric type used in TFM files.
1549///
1550/// This numeric type has 11 bits for the integer part,
1551    /// 20 bits for the fractional part, and a single sign bit.
1552/// The inner value is the number multiplied by 2^20.
1553/// It is called a `fix_word` in TFtoPL.
1554///
1555/// In property list files, this type is represented as a decimal number
1556/// with up to 6 digits after the decimal point.
1557/// This is a non-lossy representation
1558/// because 10^(-6) is larger than 2^(-20).
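///
/// A small illustration of the decimal representation:
///
/// ```
/// # use tfm::*;
/// assert_eq!(FixWord::ONE.to_string(), "1.0");
/// assert_eq!((FixWord::ONE / 2).to_string(), "0.5");
/// assert_eq!((FixWord::ONE * -16).to_string(), "-16.0");
/// ```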
1559#[derive(Default, PartialEq, Eq, Debug, Copy, Clone, PartialOrd, Ord, Hash)]
1560#[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))]
1561pub struct FixWord(pub i32);
1562
1563impl FixWord {
1564 /// Representation of the number 0 as a [FixWord].
1565 pub const ZERO: FixWord = FixWord(0);
1566
1567 /// Representation of the number 1 as a [FixWord].
1568 pub const ONE: FixWord = FixWord(1 << 20);
1569
1570    /// Returns true if the number lies in the half-open interval [-16.0, 16.0).
1571    ///
1572    /// Note the asymmetry: -16.0 is allowed but +16.0 is not.
1573    /// This is covered in the E2E tests.
1574 /// See `check_fix` in TFtoPL.2014.60.
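    ///
    /// A small illustration of the boundary behavior:
    ///
    /// ```
    /// # use tfm::*;
    /// assert!((FixWord::ONE * 15).is_abs_less_than_16());
    /// assert!((FixWord::ONE * -16).is_abs_less_than_16());
    /// assert!(!(FixWord::ONE * 16).is_abs_less_than_16());
    /// ```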
1575 pub fn is_abs_less_than_16(&self) -> bool {
1576 *self >= FixWord::ONE * -16 && *self < FixWord::ONE * 16
1577 }
1578
1579 /// Calculates the scaled value of this number at the specified design size.
1580 ///
1581 /// All dimensions in a TFM file are specified as multiples of the design size
1582 /// (with the exception of the design size itself).
1583 /// The scaled value (i.e. the value is scaled points) is the TFM value multiplied
1584 /// by the design size.
1585 ///
1586 /// The raw value and the design size both have 20 binary fractional bits,
1587 /// so the product has 40 fractional bits.
1588 /// A scaled value has only 16 fractional bits, and how the truncation is performed
1589    /// can affect the result.
1590 /// The specific multiplication algorithm Knuth wrote (TeX.2021.571-572)
1591 /// truncates the design size first and then performs the multiplication
1592 /// and then truncates again.
1593 /// Additionally, the integer division in that algorithm rounds towards zero
1594 /// even for negative numbers (e.g. -1.25 rounds to -1 instead of -2).
1595 ///
1596    /// To ensure the result here matches TeX exactly, this method reimplements
1597    /// Knuth's algorithm.
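    ///
    /// For example, 1.0 at design size 1.0 and 0.0625 at design size 16.0
    /// scale to the same value:
    ///
    /// ```
    /// # use tfm::*;
    /// assert_eq!(
    ///     FixWord::ONE.to_scaled(FixWord::ONE),
    ///     (FixWord::ONE / 16).to_scaled(FixWord::ONE * 16),
    /// );
    /// ```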
1598 pub fn to_scaled(self, design_size: FixWord) -> core::Scaled {
1599 // TeX.2021.568
1600 let mut z = core::Scaled(design_size.0 / 16);
1601 // TeX.2021.572
1602 let mut alpha = 16;
1603 // Ensure that z < 2^23 sp
1604 while z.0 >= 0o40000000 {
1605 z /= 2;
1606 alpha *= 2;
1607 }
1608 let beta = 256 / alpha; // beta = 16 * z / design_size
1609 let alpha = z * alpha; // alpha = 16 * design_size
1610
1611 // TeX.2021.571 (store_scaled)
1612 let [a, b, c, d] = self.0.to_be_bytes();
1613 assert!(a == 0 || a == 255);
1614 let sw = (((z * (d as i32)) / 0o400 + (z * (c as i32))) / 0o400 + z * (b as i32)) / beta;
1615 if a == 255 {
1616 // In this case self < 0.
1617 // We have calculated sw using only the 3 least significant bytes of self,
1618 // which is equivalent to setting self=16+self at the start. We then undo
1619 // this by setting sw = sw - 16 * design_size.
1620 sw - alpha
1621 } else {
1622 sw
1623 }
1624 }
1625}
1626
1627impl std::fmt::Display for FixWord {
1628 fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
1629 // TFtoPL.2014.40-43
1630 if self.0 < 0 {
1631 write!(f, "-")?;
1632 }
1633 let integer_part = (self.0 / FixWord::ONE.0).abs();
1634 write!(f, "{integer_part}.")?;
1635 let mut fp = (self.0 % FixWord::ONE.0).abs();
1636 fp = 10 * fp + 5;
1637 let mut delta = 10;
1638 loop {
1639 if delta > 0o4_000_000 {
1640 fp = fp + 0o2_000_000 - delta / 2;
1641 }
1642 write!(f, "{}", fp / 0o4_000_000)?;
1643 fp = 10 * (fp % 0o4_000_000);
1644 delta *= 10;
1645 if fp <= delta {
1646 break;
1647 }
1648 }
1649 Ok(())
1650 }
1651}
1652
1653impl std::ops::Add<FixWord> for FixWord {
1654 type Output = FixWord;
1655 fn add(self, rhs: FixWord) -> Self::Output {
1656 FixWord(self.0 + rhs.0)
1657 }
1658}
1659impl std::ops::Sub<FixWord> for FixWord {
1660 type Output = FixWord;
1661 fn sub(self, rhs: FixWord) -> Self::Output {
1662 FixWord(self.0 - rhs.0)
1663 }
1664}
1665
1666impl std::ops::Mul<i32> for FixWord {
1667 type Output = FixWord;
1668
1669 fn mul(self, rhs: i32) -> Self::Output {
1670 FixWord(self.0 * rhs)
1671 }
1672}
1673
1674impl std::ops::Div<i32> for FixWord {
1675 type Output = FixWord;
1676
1677 fn div(self, rhs: i32) -> Self::Output {
1678 FixWord(self.0 / rhs)
1679 }
1680}
1681
1682#[cfg(test)]
1683mod tests {
1684 use super::*;
1685
1686 #[test]
1687 fn to_scaled_test() {
1688 assert_eq!(core::Scaled::ZERO, FixWord::ZERO.to_scaled(FixWord::ONE));
1689 assert_eq!(core::Scaled::ONE, FixWord::ONE.to_scaled(FixWord::ONE));
1690 }
1691
1692 fn run_compress_test(
1693 values: Vec<FixWord>,
1694 max_size: u8,
1695 want: Vec<FixWord>,
1696 want_map: Vec<u8>,
1697 ) {
1698 let (got, got_map) = compress(&values, max_size);
1699 assert_eq!(got, want);
1700 let want_map: HashMap<FixWord, NonZeroU8> = want_map
1701 .into_iter()
1702 .enumerate()
1703 .map(|(i, t)| (values[i], t.try_into().unwrap()))
1704 .collect();
1705 assert_eq!(got_map, want_map);
1706 }
1707
1708 macro_rules! compress_tests {
1709 ( $( ($name: ident, $values: expr, $max_size: expr, $want: expr, $want_map: expr, ), )+ ) => {
1710 $(
1711 #[test]
1712 fn $name () {
1713 let values = $values;
1714 let max_size = $max_size;
1715 let want = $want;
1716 let want_map = $want_map;
1717 run_compress_test(values, max_size, want, want_map);
1718 }
1719 )+
1720 };
1721 }
1722
1723 compress_tests!(
1724 (no_op_0, vec![], 1, vec![FixWord(0)], vec![],),
1725 (
1726 no_op_2,
1727 vec![FixWord::ONE * 2, FixWord::ONE],
1728 2,
1729 vec![FixWord(0), FixWord::ONE, FixWord::ONE * 2],
1730 vec![2, 1],
1731 ),
1732 (
1733 just_deduplication,
1734 vec![FixWord::ONE, FixWord::ONE],
1735 1,
1736 vec![FixWord(0), FixWord::ONE],
1737 vec![1, 1],
1738 ),
1739 (
1740 simple_compression_case,
1741 vec![FixWord::ONE, FixWord::ONE * 2],
1742 1,
1743 vec![FixWord(0), FixWord::ONE * 3 / 2],
1744 vec![1, 1],
1745 ),
1746 (
1747 simple_compression_case_2,
1748 vec![
1749 FixWord::ONE,
1750 FixWord::ONE * 2,
1751 FixWord::ONE * 200,
1752 FixWord::ONE * 201
1753 ],
1754 2,
1755 vec![FixWord(0), FixWord::ONE * 3 / 2, FixWord::ONE * 401 / 2],
1756 vec![1, 1, 2, 2],
1757 ),
1758 (
1759 lower_upper_close_edge_case_1,
1760 vec![FixWord(1), FixWord(3)],
1761 1,
1762 vec![FixWord(0), FixWord(2)],
1763 vec![1, 1],
1764 ),
1765 (
1766 lower_upper_close_edge_case_2,
1767 vec![FixWord(0), FixWord(2)],
1768 1,
1769 vec![FixWord(0), FixWord(1)],
1770 vec![1, 1],
1771 ),
1772 (
1773 lower_upper_close_edge_case_3,
1774 vec![FixWord(1), FixWord(4)],
1775 1,
1776 vec![FixWord(0), FixWord(2)],
1777 vec![1, 1],
1778 ),
1779 (
1780 lower_upper_close_edge_case_4,
1781 vec![FixWord(1), FixWord(2)],
1782 1,
1783 vec![FixWord(0), FixWord(1)],
1784 vec![1, 1],
1785 ),
1786 );
1787
1788 mod next_larger_tests {
1789 use super::*;
1790
1791 fn run(
1792 edges: Vec<(Char, Char)>,
1793 want_sequences: HashMap<Char, Vec<Char>>,
1794 want_warnings: Vec<NextLargerProgramWarning>,
1795 ) {
1796 let (program, got_warnings) = NextLargerProgram::new(edges.into_iter(), |_| true, true);
1797 assert_eq!(got_warnings, want_warnings);
1798 for u in 0..=u8::MAX {
1799 let want_sequence = want_sequences
1800 .get(&Char(u))
1801 .map(Vec::as_slice)
1802 .unwrap_or_default();
1803 let got_sequence: Vec<Char> = program.get(Char(u)).collect();
1804 assert_eq!(
1805 got_sequence, want_sequence,
1806 "got/want sequences for {u} do not match"
1807 );
1808 }
1809 }
1810
1811 fn big_infinite_loop_edges() -> Vec<(Char, Char)> {
1812 (0..=u8::MAX)
1814            .map(|u| (Char(u), Char(u.wrapping_add(1))))
1815 .collect()
1816 }
1817
1818 fn big_infinite_loop_sequences() -> HashMap<Char, Vec<Char>> {
1819 (0..=u8::MAX)
1821 .map(|u| {
1822 let v: Vec<Char> = match u.checked_add(1) {
1823 None => vec![],
1824                    Some(w) => (w..=u8::MAX).map(Char).collect(),
1825 };
1826 (Char(u), v)
1827 })
1828 .collect()
1829 }
1830
1831 macro_rules! next_larger_tests {
1832 ( $( ($name: ident, $edges: expr, $want_sequences: expr, $want_warnings: expr, ), )+ ) => {
1833 $(
1834 #[test]
1835 fn $name () {
1836 run($edges, $want_sequences, $want_warnings);
1837 }
1838 )+
1839 };
1840 }
1841
1842 next_larger_tests!(
1843 (
1844 same_node_loop,
1845 vec![(Char::A, Char::A)],
1846 HashMap::from([(Char::A, vec![])]),
1847 vec![NextLargerProgramWarning::InfiniteLoop {
1848 original: Char::A,
1849 next_larger: Char::A,
1850 }],
1851 ),
1852 (
1853 two_loops,
1854 vec![
1855 (Char::A, Char::B),
1856 (Char::B, Char::C),
1857 (Char::C, Char::B),
1858 (Char::X, Char::Y),
1859 (Char::Y, Char::Z),
1860 (Char::Z, Char::X),
1861 ],
1862 HashMap::from([
1863 (Char::A, vec![Char::B, Char::C]),
1864 (Char::B, vec![Char::C]),
1865 (Char::C, vec![]),
1866 (Char::X, vec![Char::Y, Char::Z]),
1867 (Char::Y, vec![Char::Z]),
1868 (Char::Z, vec![]),
1869 ]),
1870 vec![
1871 NextLargerProgramWarning::InfiniteLoop {
1872 original: Char::C,
1873 next_larger: Char::B,
1874 },
1875 NextLargerProgramWarning::InfiniteLoop {
1876 original: Char::Z,
1877 next_larger: Char::X,
1878 },
1879 ],
1880 ),
1881 (
1882 path_leading_to_loop,
1883 vec![(Char::A, Char::B), (Char::B, Char::C), (Char::C, Char::B),],
1884 HashMap::from([
1885 (Char::A, vec![Char::B, Char::C]),
1886 (Char::B, vec![Char::C]),
1887 (Char::C, vec![]),
1888 ]),
1889 vec![NextLargerProgramWarning::InfiniteLoop {
1890 original: Char::C,
1891 next_larger: Char::B,
1892 }],
1893 ),
1894 (
1895 big_infinite_loop,
1896 big_infinite_loop_edges(),
1897 big_infinite_loop_sequences(),
1898 vec![NextLargerProgramWarning::InfiniteLoop {
1899 original: Char(u8::MAX),
1900 next_larger: Char(0),
1901 }],
1902 ),
1903 );
1904 }
1905}