tfm/ligkern/
mod.rs

1//! Lig/kern programming.
2//!
3//! TFM files can include information about ligatures and kerns.
4//! A [ligature](https://en.wikipedia.org/wiki/Ligature_(writing))
5//!     is a special character that can replace two or more adjacent characters.
6//! For example, the pair of characters ae can be replaced by the æ ligature which is a single character.
7//! A [kern](https://en.wikipedia.org/wiki/Kerning) is special space inserted between
8//!     two adjacent characters to align them better.
9//! For example, a kern can be inserted between A and V to compensate for the large
10//!     amount of space created by the specific combination of these two characters.
11//!
12//! ## The lig/kern programming language
13//!
14//! TFM provides ligature and kern data in the form of
15//!     "instructions in a simple programming language that explains what to do for special letter pairs"
16//!     (quoting TFtoPL.2014.13).
17//! This lig/kern programming language can be used to specify instructions like
18//!     "replace the pair (a,e) by æ" and
19//!     "insert a kern of width -0.1pt between the pair (A,V)".
20//! But it can also specify more complex behaviors.
21//! For example, a lig/kern program can specify "replace the pair (x,y) by the pair (z,y)".
22//!
23//! In general for any pair of characters (x,y) the program specifies zero or one lig/kern instructions.
24//! After this instruction is executed, there may be a new
25//!     pair of characters remaining, as in the (x,y) to (z,y) instruction.
26//! The lig/kern instruction for this pair is then executed, if it exists.
27//! This process continues until there are no more instructions left to run.
28//!
29//! Lig/kern instructions are represented in this module by the [`lang::Instruction`] type.
30//!
31//! ## Related code by Knuth
32//!
33//! The TFtoPL and PLtoTF programs don't contain any code for running lig/kern programs.
34//! They only contain logic for translating between the `.tfm` and `.pl`
35//!     formats for lig/kern programs, and for doing some validation as described below.
36//! Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
37//!
38//! One of the challenges with lig/kern programs is that they can contain infinite loops.
39//! Here is a simple example of a lig/kern program with two instruction and an infinite loop:
40//!
41//! - Replace (x,y) with (z,y) (in property list format, `(LABEL C x)(LIG/ C y C z)`)
42//! - Replace (z,y) with (x,y) (in property list format, `(LABEL C z)(LIG/ C y C x)`)
43//!
44//! When this program runs (x,y) will be swapped with (z,y) ad infinitum.
45//! See TFtoPL.2014.88 for more examples.
46//!
47//! Both TFtoPL and PLtoTF contain code that checks that a lig/kern program
48//!     does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125).
49//! The algorithm for detecting infinite loops is a topological sorting algorithm
50//!     over a graph where each node is a pair of characters.
51//! However it's a bit complicated because the full graph cannot be constructed without
52//!     running the lig/kern program.
53//!
54//! TeX does not check for infinite loops, presumably under the assumption that any `.tfm` file will have
55//!     been generated by PLtoTF and thus already validated.
56//! However TeX does check for interrupts when executing lig/kern programs so that
57//!     at least a user can terminate TeX if an infinite loop is hit.
58//! (See the `check_interrupt` line in KnuthTeX.2021.1040.)
59//!
60//! ## Functionality in this module
61//!
62//! This module handles lig/kern programs in a different way,
63//!     inspired by the ["parse don't validate"](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
64//!     philosophy.
65//! This module is able to represent raw lig/kern programs as a vector of [`lang::Instruction`] values.
66//! But can also _compile_ lig/kern programs (into a [`CompiledProgram`]).
67//! This compilation process essentially executes the lig/kern program for every possible character pair.
68//! The result is a map from each character pair to the full list of
69//!     replacement characters and kerns for that pair.
70//! If there is an infinite loop in the program this compilation will naturally fail.
71//! The compiled program is thus a "parsed" version of the lig/kern program
72//!     and it is impossible for infinite loops to appear in it.
73//!
74//! An advantage of this model is that the lig/kern program does not need to be repeatedly
75//!     executed in the main hot loop of TeX.
76//! This may make TeX faster.
77//! However the compiled lig/kern program does have a larger memory footprint than the raw program,
78//!     and so it may be slower if TeX is memory bound.
79
80mod compiler;
81use crate::ligkern::compiler::Replacement;
82use crate::Char;
83use crate::FixWord;
84use std::collections::HashMap;
85use std::rc::Rc;
86pub mod lang;
87
88/// A compiled lig/kern program.
89///
90/// The default value is an empty program with no kerns or ligatures.
91#[derive(Clone, Debug, Default)]
92pub struct CompiledProgram {
93    right_boundary_char: Option<Char>,
94    replacements: HashMap<(Option<Char>, Char), compiler::Replacement>,
95}
96
97#[derive(Clone, Debug, PartialEq, Eq)]
98enum IntermediateOp {
99    // Emit the kern.
100    Kern(core::Scaled),
101    // Emit the char in the payload.
102    C(compiler::C),
103}
104
105impl CompiledProgram {
106    /// Compile a lig/kern program.
107    pub fn compile(
108        program: &lang::Program,
109        design_size: FixWord,
110        kerns: &[FixWord],
111        entrypoints: HashMap<Char, u16>,
112    ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
113        compiler::compile(program, design_size, kerns, &entrypoints)
114    }
115
116    /// Compile a lig/kern program from a PL file.
117    pub fn compile_from_pl_file(
118        pl_file: &super::pl::File,
119    ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
120        let entrypoints = pl_file.lig_kern_entrypoints(false);
121        // The kerns array, which is used for KernAtIndex operations, is empty
122        // because PL files do not have such operations.
123        CompiledProgram::compile(
124            &pl_file.lig_kern_program,
125            pl_file.header.design_size,
126            &[],
127            entrypoints,
128        )
129    }
130
131    /// Compile a lig/kern program from a TFM file.
132    pub fn compile_from_tfm_file(
133        tfm_file: &mut super::File,
134    ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
135        let entrypoints: HashMap<Char, u16> = tfm_file
136            .lig_kern_entrypoints()
137            .into_iter()
138            .filter_map(|(c, e)| {
139                tfm_file
140                    .lig_kern_program
141                    .unpack_entrypoint(e)
142                    .ok()
143                    .map(|e| (c, e))
144            })
145            .collect();
146        CompiledProgram::compile(
147            &tfm_file.lig_kern_program,
148            tfm_file.header.design_size,
149            &tfm_file.kerns,
150            entrypoints,
151        )
152    }
153
154    fn get_replacement(
155        &self,
156        left_char: Option<Char>,
157        right_char: Option<Char>,
158    ) -> Option<&Replacement> {
159        let right_char = match right_char {
160            None => self.right_boundary_char?,
161            Some(c) => c,
162        };
163        self.replacements.get(&(left_char, right_char))
164    }
165
166    fn get_replacement_utf8(
167        &self,
168        left_char: Option<char>,
169        right_char: Option<char>,
170    ) -> Option<&Replacement> {
171        let left_char = match left_char {
172            None => None,
173            Some(c) => {
174                let Ok(c) = c.try_into() else {
175                    return None;
176                };
177                Some(c)
178            }
179        };
180        let right_char = match right_char {
181            None => None,
182            Some(c) => {
183                let Ok(c) = c.try_into() else {
184                    return None;
185                };
186                Some(c)
187            }
188        };
189        self.get_replacement(left_char, right_char)
190    }
191
192    /// Returns an iterator over all pairs `(char,char)` that have a replacement
193    ///     specified in the lig/kern program.
194    pub fn all_pairs_with_replacements(&self) -> Vec<(Option<Char>, Char)> {
195        let mut v: Vec<(Option<Char>, Char)> = self.replacements.keys().copied().collect();
196        v.sort();
197        v
198    }
199
200    /// Returns whether this program is seven-bit safe.
201    ///
202    /// A lig/kern program is seven-bit safe if the replacement for any
203    ///     pair of seven-bit safe characters
204    ///     consists only of seven-bit characters.
205    /// Conversely a program is seven-bit unsafe if there is a
206    ///     pair of seven-bit characters whose replacement
207    ///     contains a non-seven-bit character.
208    pub fn is_seven_bit_safe(&self) -> bool {
209        self.all_pairs_with_replacements()
210            .into_iter()
211            .filter(|(l, r)| l.map(|c| c.is_seven_bit()).unwrap_or(true) && r.is_seven_bit())
212            .flat_map(|(l, r)| self.get_replacement(l, Some(r)))
213            .all(|rep| {
214                rep.0.iter().all(|op| match op {
215                    IntermediateOp::Kern(_) => true,
216                    IntermediateOp::C(c) => c.c.is_seven_bit(),
217                }) && rep.1.c.is_seven_bit()
218            })
219    }
220
221    /// Run this lig/kern program.
222    pub fn run<T: Emitter>(&self, word: &str, emitter: &mut T) {
223        let mut left: Option<char> = None;
224        let mut left_in_original = true;
225        let mut lig_pieces: Option<String> = None;
226        let mut iter = word.chars();
227        loop {
228            let right = iter.next();
229            if left.is_none() && right.is_none() {
230                break;
231            }
232
233            match self.get_replacement_utf8(left, right) {
234                Some(replacement) => {
235                    for op in &replacement.0 {
236                        match op {
237                            IntermediateOp::Kern(kern) => emitter.emit_kern(*kern),
238                            IntermediateOp::C(compiler::C {
239                                c,
240                                is_lig: false,
241                                consumes_left,
242                                consumes_right,
243                            }) => {
244                                debug_assert!(consumes_left);
245                                debug_assert!(!consumes_right);
246                                match lig_pieces.take() {
247                                    Some(l) => {
248                                        // This happens when left is a ligature.
249                                        emitter.emit_ligature((*c).into(), l.into());
250                                    }
251                                    None => {
252                                        emitter.emit_character((*c).into());
253                                    }
254                                }
255                            }
256                            IntermediateOp::C(compiler::C {
257                                c,
258                                is_lig: true,
259                                consumes_left,
260                                consumes_right,
261                            }) => {
262                                let mut s = lig_pieces.take().unwrap_or_default();
263                                if *consumes_left && left_in_original {
264                                    // TODO: figure out where in TeX the '|' comes from!
265                                    s.push(left.unwrap_or('|'));
266                                }
267                                if *consumes_right {
268                                    s.push(right.unwrap_or('|'));
269                                }
270                                emitter.emit_ligature((*c).into(), s.into());
271                            }
272                        }
273                    }
274                    match replacement.1 {
275                        compiler::C {
276                            c,
277                            is_lig: false,
278                            consumes_left: _,
279                            consumes_right,
280                        } => {
281                            debug_assert!(consumes_right);
282                            if right.is_none() {
283                                left = None;
284                            } else {
285                                left = Some(c.into());
286                                left_in_original = true; // = consumes_right;
287                            }
288                        }
289                        compiler::C {
290                            c,
291                            is_lig: true,
292                            consumes_left,
293                            consumes_right,
294                        } => {
295                            let s = lig_pieces.get_or_insert(Default::default());
296                            if consumes_left && left_in_original {
297                                s.push(left.unwrap_or('|'));
298                            }
299                            if consumes_right {
300                                s.push(right.unwrap_or('|'));
301                            }
302                            left = Some(c.into());
303                            left_in_original = false;
304                        }
305                    }
306                }
307                None => {
308                    if let Some(left) = left {
309                        match lig_pieces.take() {
310                            Some(l) => {
311                                emitter.emit_ligature(left, l.into());
312                            }
313                            None => {
314                                emitter.emit_character(left);
315                            }
316                        }
317                    }
318                    left = right;
319                    left_in_original = true;
320                }
321            }
322        }
323    }
324}
325
326/// Implementations of this trait determine how characters, kerns and ligatures
327/// are handled when running a lig/kern program.
328pub trait Emitter {
329    fn emit_character(&mut self, c: char);
330    fn emit_kern(&mut self, kern: core::Scaled);
331    fn emit_ligature(&mut self, c: char, original: Rc<str>);
332}
333
334/// An error returned from lig/kern compilation.
335///
336/// TODO: rename Cycle everywhere including the docs
337#[derive(Clone, Debug, PartialEq, Eq)]
338pub struct InfiniteLoopError {
339    /// The pair of characters the starts the infinite loop.
340    pub starting_pair: (Option<Char>, Char),
341}
342
343impl InfiniteLoopError {
344    pub fn pltotf_message(&self) -> String {
345        let left = match self.starting_pair.0 {
346            Some(c) => format!["'{:03o}", c.0],
347            None => "boundary".to_string(),
348        };
349        format!(
350            "Infinite ligature loop starting with {} and '{:03o}!",
351            left, self.starting_pair.1 .0
352        )
353    }
354    pub fn pltotf_section(&self) -> u8 {
355        125
356    }
357}
358
359/// One step in a lig/kern infinite loop.
360///
361/// A vector of these steps is returned in a [`InfiniteLoopError`].
362#[derive(Clone, Debug, PartialEq, Eq)]
363pub struct InfiniteLoopStep {
364    /// The index of the instruction to apply in this step.
365    pub instruction_index: usize,
366    /// The replacement text after applying this step.
367    ///
368    /// The boolean specifies whether the replacement begins with the
369    /// left boundary char.
370    pub post_replacement: (bool, Vec<Char>),
371    /// The position of the cursor after applying this step.
372    pub post_cursor_position: usize,
373}
374
375#[cfg(test)]
376mod tests {
377    use core::Scaled;
378
379    use super::*;
380
381    const LIGAROO: &'static str = include_str!["ligaroo.plst"];
382
383    #[derive(PartialEq, Eq, Debug)]
384    enum Element {
385        Char(char),
386        Kern(core::Scaled),
387        Ligature(char, Rc<str>),
388    }
389
390    #[derive(Default)]
391    struct ElementEmitter(Vec<Element>);
392
393    impl Emitter for ElementEmitter {
394        fn emit_character(&mut self, c: char) {
395            self.0.push(Element::Char(c))
396        }
397
398        fn emit_kern(&mut self, kern: core::Scaled) {
399            self.0.push(Element::Kern(kern))
400        }
401
402        fn emit_ligature(&mut self, c: char, original: Rc<str>) {
403            self.0.push(Element::Ligature(c, original))
404        }
405    }
406
407    fn run_test(program: &str, input: &str, want: Vec<Element>) {
408        let source = LIGAROO.replace("(LIGTABLE", &format!["(LIGTABLE\n{program}"]);
409        let pl_file = crate::pl::File::from_pl_source_code(&source).0;
410        let program = CompiledProgram::compile_from_pl_file(&pl_file).0;
411        let mut emitter: ElementEmitter = Default::default();
412        program.run(input, &mut emitter);
413        assert_eq!(emitter.0, want);
414    }
415
416    macro_rules! tests {
417        ( $(
418            ($name: ident, $program: expr, $input: expr, $want: expr, ),
419        )+ ) => { $(
420            #[test]
421            fn $name() {
422                use Element::*;
423                let program = $program;
424                let input = $input;
425                let want = $want;
426                run_test(program, input, want);
427            }
428        )+ };
429    }
430
431    tests!(
432        // AB -> ^1
433        (
434            single_lig_1,
435            "
436                (LABEL C A)
437                (LIG C B C 1)
438                (KRN C 1 R 0.1)
439                (STOP)
440
441                (LABEL C 1)
442                (KRN C B R 0.3)
443                (STOP)
444            ",
445            "AB",
446            vec![Ligature('1', "AB".into())],
447        ),
448        // AB -> ^A1
449        (
450            single_lig_2,
451            "
452                (LABEL C A)
453                (/LIG C B C 1)
454                (KRN C 1 R 0.1)
455                (STOP)
456
457                (LABEL C 1)
458                (KRN C B R 0.3)
459                (STOP)
460            ",
461            "AB",
462            vec![Char('A'), Kern(Scaled::ONE), Ligature('1', "B".into())],
463        ),
464        // AB -> A^1
465        (
466            single_lig_3,
467            "
468                (LABEL C A)
469                (/LIG> C B C 1)
470                (KRN C 1 R 0.1)
471                (STOP)
472
473                (LABEL C 1)
474                (KRN C B R 0.3)
475                (STOP)
476            ",
477            "AB",
478            vec![Char('A'), Ligature('1', "B".into()),],
479        ),
480        // AB -> ^1B
481        (
482            single_lig_4,
483            "
484                (LABEL C A)
485                (LIG/ C B C 1)
486                (KRN C 1 R 0.1)
487                (STOP)
488
489                (LABEL C 1)
490                (KRN C B R 0.3)
491                (STOP)
492            ",
493            "AB",
494            vec![Ligature('1', "A".into()), Kern(Scaled::ONE * 3), Char('B'),],
495        ),
496        // AB -> 1^B
497        (
498            single_lig_5,
499            "
500                (LABEL C A)
501                (LIG/> C B C 1)
502                (KRN C 1 R 0.1)
503                (STOP)
504
505                (LABEL C 1)
506                (KRN C B R 0.3)
507                (STOP)
508            ",
509            "AB",
510            vec![Ligature('1', "A".into()), Char('B'),],
511        ),
512        // AB -> ^A1B
513        (
514            single_lig_6,
515            "
516                (LABEL C A)
517                (/LIG/ C B C 1)
518                (KRN C 1 R 0.1)
519                (STOP)
520
521                (LABEL C 1)
522                (KRN C B R 0.3)
523                (STOP)
524            ",
525            "AB",
526            vec![
527                Char('A'),
528                Kern(Scaled::ONE),
529                Ligature('1', "".into()),
530                Kern(Scaled::ONE * 3),
531                Char('B'),
532            ],
533        ),
534        // AB -> A^1B
535        (
536            single_lig_7,
537            "
538                (LABEL C A)
539                (/LIG/> C B C 1)
540                (KRN C 1 R 0.1)
541                (STOP)
542
543                (LABEL C 1)
544                (KRN C B R 0.3)
545                (STOP)
546            ",
547            "AB",
548            vec![
549                Char('A'),
550                Ligature('1', "".into()),
551                Kern(Scaled::ONE * 3),
552                Char('B'),
553            ],
554        ),
555        // AB -> A1^B
556        (
557            single_lig_8,
558            "
559                (LABEL C A)
560                (/LIG/>> C B C 1)
561                (KRN C 1 R 0.1)
562                (STOP)
563
564                (LABEL C 1)
565                (KRN C B R 0.3)
566                (STOP)
567            ",
568            "AB",
569            vec![Char('A'), Ligature('1', "".into()), Char('B'),],
570        ),
571        // AB -> A^B
572        // This is the same as single_lig_5, but the replacement character
573        // is the same as the character that is removed. In theory lig(A, A)
574        // could be replaced by char(A), and this test verifies that it is not.
575        (
576            no_op_lig,
577            "
578                (LABEL C A)
579                (LIG/> C B C A)
580                (STOP)
581            ",
582            "AB",
583            vec![Ligature('A', "A".into()), Char('B'),],
584        ),
585        // AB -> ^1, 1C -> ^2
586        (
587            multiple_lig_1,
588            "
589                (LABEL C A)
590                (LIG C B C 1)
591
592                (LABEL C 1)
593                (LIG C C C 2)
594                (STOP)
595            ",
596            "ABC",
597            vec![Ligature('2', "ABC".into()),],
598        ),
599        // AB -> ^A1B, 1B -> 2
600        (
601            multiple_lig_2,
602            "
603                (LABEL C A)
604                (/LIG/ C B C 1)
605                (LABEL C 1)
606                (LIG C B C 2)
607                (STOP)
608            ",
609            "AB",
610            vec![Char('A'), Ligature('2', "B".into()),],
611        ),
612        // AA -> 1^A multiple times
613        (
614            multiple_lig_3,
615            "
616                (LABEL C A)
617                (LIG/ C A C 1)
618                (STOP)
619            ",
620            "AAAAA",
621            vec![
622                Ligature('1', "A".into()),
623                Ligature('1', "A".into()),
624                Ligature('1', "A".into()),
625                Ligature('1', "A".into()),
626                Char('A'),
627            ],
628        ),
629        // AA -> ^A multiple times
630        (
631            multiple_lig_4,
632            "
633                (LABEL C A)
634                (LIG C A C A)
635                (STOP)
636            ",
637            "AAAAAA",
638            vec![Ligature('A', "AAAAAA".into()),],
639        ),
640        // AB -> ^A1, A1 -> ^21
641        (
642            multiple_lig_5,
643            "
644                (LABEL C A)
645                (/LIG C B C 1)
646                (LIG/ C 1 C 2)
647                (STOP)
648            ",
649            "AB",
650            vec![Ligature('2', "A".into()), Ligature('1', "B".into()),],
651        ),
652        // AB -> ^A1, A1 -> ^21, 21 -> 3
653        (
654            multiple_lig_6,
655            "
656                (LABEL C A)
657                (/LIG C B C 1)
658                (LIG/ C 1 C 2)
659                (LABEL C 2)
660                (LIG C 1 C 3)
661                (STOP)
662            ",
663            "AB",
664            vec![Ligature('3', "AB".into())],
665        ),
666        // AB -> ^1, 1C -> 12^C
667        (
668            multiple_lig_7,
669            "
670                (LABEL C A)
671                (LIG C B C 1)
672                (STOP)
673
674                (LABEL C 1)
675                (/LIG/>> C C C 2)
676                (STOP)
677            ",
678            "ABC",
679            vec![
680                Ligature('1', "AB".into()),
681                Ligature('2', "".into()),
682                Char('C'),
683            ],
684        ),
685        (
686            kern_after_lig_1,
687            "
688                (LABEL C A)
689                (LIG C B C 1)
690                (STOP)
691
692                (LABEL C 1)
693                (KRN C C R 0.1)
694            ",
695            "ABC",
696            vec![Ligature('1', "AB".into()), Kern(Scaled::ONE), Char('C'),],
697        ),
698        (
699            kern_after_lig_2,
700            "
701                (LABEL C A)
702                (LIG C B C 1)
703                (STOP)
704
705                (LABEL C 1)
706                (KRN C A R 0.1)
707            ",
708            "ABAB",
709            vec![
710                Ligature('1', "AB".into()),
711                Kern(Scaled::ONE),
712                Ligature('1', "AB".into()),
713            ],
714        ),
715        (
716            left_boundary_char_1,
717            "
718                (LABEL BOUNDARYCHAR)
719                (LIG C A C 1)
720            ",
721            "A",
722            vec![Ligature('1', "|A".into()),],
723        ),
724        (
725            left_boundary_char_2,
726            "
727                (LABEL BOUNDARYCHAR)
728                (/LIG/ C A C 1)
729                (/LIG/ C 1 C 2)
730            ",
731            "A",
732            vec![
733                Ligature('2', "|".into()),
734                Ligature('1', "".into()),
735                Char('A'),
736            ],
737        ),
738        (
739            left_boundary_char_3,
740            "
741                (LABEL BOUNDARYCHAR)
742                (/LIG/ C A C 1)
743            ",
744            "A",
745            vec![Ligature('1', "|".into()), Char('A'),],
746        ),
747        /*
748        (
749            right_boundary_char_1,
750            "
751                (LABEL C M)
752                (/LIG/ C L C N)
753                (STOP)
754            ",
755            "N",
756            vec![Char('N'), Ligature('Q', "|".into()),],
757        ),
758        (
759            right_boundary_char_2,
760            "
761                (LABEL C N)
762                (/LIG/ C L C Q)
763                (STOP)
764            ",
765            "M",
766            vec![
767                Char('M'),
768                Ligature('N', "".into()),
769                Ligature('Q', "|".into()),
770            ],
771        ),
772        */
773        (
774            right_boundary_char_3,
775            "
776                (LABEL C A)
777                (LIG C L C 1)
778                (STOP)
779            ",
780            "A",
781            vec![Ligature('1', "A|".into()),],
782        ),
783        (
784            right_boundary_char_4,
785            "
786                (LABEL C A)
787                (KRN C L R 1)
788                (STOP)
789            ",
790            "A",
791            vec![Char('A'), Kern(Scaled::ONE * 10),],
792        ),
793    );
794}