tfm/ligkern/mod.rs
1//! Lig/kern programming.
2//!
3//! TFM files can include information about ligatures and kerns.
4//! A [ligature](https://en.wikipedia.org/wiki/Ligature_(writing))
5//! is a special character that can replace two or more adjacent characters.
6//! For example, the pair of characters ae can be replaced by the æ ligature which is a single character.
7//! A [kern](https://en.wikipedia.org/wiki/Kerning) is special space inserted between
8//! two adjacent characters to align them better.
9//! For example, a kern can be inserted between A and V to compensate for the large
10//! amount of space created by the specific combination of these two characters.
11//!
12//! ## The lig/kern programming language
13//!
14//! TFM provides ligature and kern data in the form of
15//! "instructions in a simple programming language that explains what to do for special letter pairs"
16//! (quoting TFtoPL.2014.13).
17//! This lig/kern programming language can be used to specify instructions like
18//! "replace the pair (a,e) by æ" and
19//! "insert a kern of width -0.1pt between the pair (A,V)".
20//! But it can also specify more complex behaviors.
21//! For example, a lig/kern program can specify "replace the pair (x,y) by the pair (z,y)".
22//!
23//! In general for any pair of characters (x,y) the program specifies zero or one lig/kern instructions.
24//! After this instruction is executed, there may be a new
25//! pair of characters remaining, as in the (x,y) to (z,y) instruction.
26//! The lig/kern instruction for this pair is then executed, if it exists.
27//! This process continues until there are no more instructions left to run.
28//!
29//! Lig/kern instructions are represented in this module by the [`lang::Instruction`] type.
30//!
31//! ## Related code by Knuth
32//!
33//! The TFtoPL and PLtoTF programs don't contain any code for running lig/kern programs.
34//! They only contain logic for translating between the `.tfm` and `.pl`
35//! formats for lig/kern programs, and for doing some validation as described below.
36//! Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
37//!
38//! One of the challenges with lig/kern programs is that they can contain infinite loops.
39//! Here is a simple example of a lig/kern program with two instruction and an infinite loop:
40//!
41//! - Replace (x,y) with (z,y) (in property list format, `(LABEL C x)(LIG/ C y C z)`)
42//! - Replace (z,y) with (x,y) (in property list format, `(LABEL C z)(LIG/ C y C x)`)
43//!
44//! When this program runs (x,y) will be swapped with (z,y) ad infinitum.
45//! See TFtoPL.2014.88 for more examples.
46//!
47//! Both TFtoPL and PLtoTF contain code that checks that a lig/kern program
48//! does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125).
49//! The algorithm for detecting infinite loops is a topological sorting algorithm
50//! over a graph where each node is a pair of characters.
51//! However it's a bit complicated because the full graph cannot be constructed without
52//! running the lig/kern program.
53//!
54//! TeX does not check for infinite loops, presumably under the assumption that any `.tfm` file will have
55//! been generated by PLtoTF and thus already validated.
56//! However TeX does check for interrupts when executing lig/kern programs so that
57//! at least a user can terminate TeX if an infinite loop is hit.
58//! (See the `check_interrupt` line in KnuthTeX.2021.1040.)
59//!
60//! ## Functionality in this module
61//!
62//! This module handles lig/kern programs in a different way,
63//! inspired by the ["parse don't validate"](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
64//! philosophy.
65//! This module is able to represent raw lig/kern programs as a vector of [`lang::Instruction`] values.
66//! But can also _compile_ lig/kern programs (into a [`CompiledProgram`]).
67//! This compilation process essentially executes the lig/kern program for every possible character pair.
68//! The result is a map from each character pair to the full list of
69//! replacement characters and kerns for that pair.
70//! If there is an infinite loop in the program this compilation will naturally fail.
71//! The compiled program is thus a "parsed" version of the lig/kern program
72//! and it is impossible for infinite loops to appear in it.
73//!
74//! An advantage of this model is that the lig/kern program does not need to be repeatedly
75//! executed in the main hot loop of TeX.
76//! This may make TeX faster.
77//! However the compiled lig/kern program does have a larger memory footprint than the raw program,
78//! and so it may be slower if TeX is memory bound.
79
80mod compiler;
81use crate::ligkern::compiler::Replacement;
82use crate::Char;
83use crate::FixWord;
84use std::collections::HashMap;
85use std::rc::Rc;
86pub mod lang;
87
88/// A compiled lig/kern program.
89///
90/// The default value is an empty program with no kerns or ligatures.
91#[derive(Clone, Debug, Default)]
92pub struct CompiledProgram {
93 right_boundary_char: Option<Char>,
94 replacements: HashMap<(Option<Char>, Char), compiler::Replacement>,
95}
96
97#[derive(Clone, Debug, PartialEq, Eq)]
98enum IntermediateOp {
99 // Emit the kern.
100 Kern(core::Scaled),
101 // Emit the char in the payload.
102 C(compiler::C),
103}
104
105impl CompiledProgram {
106 /// Compile a lig/kern program.
107 pub fn compile(
108 program: &lang::Program,
109 design_size: FixWord,
110 kerns: &[FixWord],
111 entrypoints: HashMap<Char, u16>,
112 ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
113 compiler::compile(program, design_size, kerns, &entrypoints)
114 }
115
116 /// Compile a lig/kern program from a PL file.
117 pub fn compile_from_pl_file(
118 pl_file: &super::pl::File,
119 ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
120 let entrypoints = pl_file.lig_kern_entrypoints(false);
121 // The kerns array, which is used for KernAtIndex operations, is empty
122 // because PL files do not have such operations.
123 CompiledProgram::compile(
124 &pl_file.lig_kern_program,
125 pl_file.header.design_size,
126 &[],
127 entrypoints,
128 )
129 }
130
131 /// Compile a lig/kern program from a TFM file.
132 pub fn compile_from_tfm_file(
133 tfm_file: &mut super::File,
134 ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
135 let entrypoints: HashMap<Char, u16> = tfm_file
136 .lig_kern_entrypoints()
137 .into_iter()
138 .filter_map(|(c, e)| {
139 tfm_file
140 .lig_kern_program
141 .unpack_entrypoint(e)
142 .ok()
143 .map(|e| (c, e))
144 })
145 .collect();
146 CompiledProgram::compile(
147 &tfm_file.lig_kern_program,
148 tfm_file.header.design_size,
149 &tfm_file.kerns,
150 entrypoints,
151 )
152 }
153
154 fn get_replacement(
155 &self,
156 left_char: Option<Char>,
157 right_char: Option<Char>,
158 ) -> Option<&Replacement> {
159 let right_char = match right_char {
160 None => self.right_boundary_char?,
161 Some(c) => c,
162 };
163 self.replacements.get(&(left_char, right_char))
164 }
165
166 fn get_replacement_utf8(
167 &self,
168 left_char: Option<char>,
169 right_char: Option<char>,
170 ) -> Option<&Replacement> {
171 let left_char = match left_char {
172 None => None,
173 Some(c) => {
174 let Ok(c) = c.try_into() else {
175 return None;
176 };
177 Some(c)
178 }
179 };
180 let right_char = match right_char {
181 None => None,
182 Some(c) => {
183 let Ok(c) = c.try_into() else {
184 return None;
185 };
186 Some(c)
187 }
188 };
189 self.get_replacement(left_char, right_char)
190 }
191
192 /// Returns an iterator over all pairs `(char,char)` that have a replacement
193 /// specified in the lig/kern program.
194 pub fn all_pairs_with_replacements(&self) -> Vec<(Option<Char>, Char)> {
195 let mut v: Vec<(Option<Char>, Char)> = self.replacements.keys().copied().collect();
196 v.sort();
197 v
198 }
199
200 /// Returns whether this program is seven-bit safe.
201 ///
202 /// A lig/kern program is seven-bit safe if the replacement for any
203 /// pair of seven-bit safe characters
204 /// consists only of seven-bit characters.
205 /// Conversely a program is seven-bit unsafe if there is a
206 /// pair of seven-bit characters whose replacement
207 /// contains a non-seven-bit character.
208 pub fn is_seven_bit_safe(&self) -> bool {
209 self.all_pairs_with_replacements()
210 .into_iter()
211 .filter(|(l, r)| l.map(|c| c.is_seven_bit()).unwrap_or(true) && r.is_seven_bit())
212 .flat_map(|(l, r)| self.get_replacement(l, Some(r)))
213 .all(|rep| {
214 rep.0.iter().all(|op| match op {
215 IntermediateOp::Kern(_) => true,
216 IntermediateOp::C(c) => c.c.is_seven_bit(),
217 }) && rep.1.c.is_seven_bit()
218 })
219 }
220
221 /// Run this lig/kern program.
222 pub fn run<T: Emitter>(&self, word: &str, emitter: &mut T) {
223 let mut left: Option<char> = None;
224 let mut left_in_original = true;
225 let mut lig_pieces: Option<String> = None;
226 let mut iter = word.chars();
227 loop {
228 let right = iter.next();
229 if left.is_none() && right.is_none() {
230 break;
231 }
232
233 match self.get_replacement_utf8(left, right) {
234 Some(replacement) => {
235 for op in &replacement.0 {
236 match op {
237 IntermediateOp::Kern(kern) => emitter.emit_kern(*kern),
238 IntermediateOp::C(compiler::C {
239 c,
240 is_lig: false,
241 consumes_left,
242 consumes_right,
243 }) => {
244 debug_assert!(consumes_left);
245 debug_assert!(!consumes_right);
246 match lig_pieces.take() {
247 Some(l) => {
248 // This happens when left is a ligature.
249 emitter.emit_ligature((*c).into(), l.into());
250 }
251 None => {
252 emitter.emit_character((*c).into());
253 }
254 }
255 }
256 IntermediateOp::C(compiler::C {
257 c,
258 is_lig: true,
259 consumes_left,
260 consumes_right,
261 }) => {
262 let mut s = lig_pieces.take().unwrap_or_default();
263 if *consumes_left && left_in_original {
264 // TODO: figure out where in TeX the '|' comes from!
265 s.push(left.unwrap_or('|'));
266 }
267 if *consumes_right {
268 s.push(right.unwrap_or('|'));
269 }
270 emitter.emit_ligature((*c).into(), s.into());
271 }
272 }
273 }
274 match replacement.1 {
275 compiler::C {
276 c,
277 is_lig: false,
278 consumes_left: _,
279 consumes_right,
280 } => {
281 debug_assert!(consumes_right);
282 if right.is_none() {
283 left = None;
284 } else {
285 left = Some(c.into());
286 left_in_original = true; // = consumes_right;
287 }
288 }
289 compiler::C {
290 c,
291 is_lig: true,
292 consumes_left,
293 consumes_right,
294 } => {
295 let s = lig_pieces.get_or_insert(Default::default());
296 if consumes_left && left_in_original {
297 s.push(left.unwrap_or('|'));
298 }
299 if consumes_right {
300 s.push(right.unwrap_or('|'));
301 }
302 left = Some(c.into());
303 left_in_original = false;
304 }
305 }
306 }
307 None => {
308 if let Some(left) = left {
309 match lig_pieces.take() {
310 Some(l) => {
311 emitter.emit_ligature(left, l.into());
312 }
313 None => {
314 emitter.emit_character(left);
315 }
316 }
317 }
318 left = right;
319 left_in_original = true;
320 }
321 }
322 }
323 }
324}
325
326/// Implementations of this trait determine how characters, kerns and ligatures
327/// are handled when running a lig/kern program.
328pub trait Emitter {
329 fn emit_character(&mut self, c: char);
330 fn emit_kern(&mut self, kern: core::Scaled);
331 fn emit_ligature(&mut self, c: char, original: Rc<str>);
332}
333
334/// An error returned from lig/kern compilation.
335///
336/// TODO: rename Cycle everywhere including the docs
337#[derive(Clone, Debug, PartialEq, Eq)]
338pub struct InfiniteLoopError {
339 /// The pair of characters the starts the infinite loop.
340 pub starting_pair: (Option<Char>, Char),
341}
342
343impl InfiniteLoopError {
344 pub fn pltotf_message(&self) -> String {
345 let left = match self.starting_pair.0 {
346 Some(c) => format!["'{:03o}", c.0],
347 None => "boundary".to_string(),
348 };
349 format!(
350 "Infinite ligature loop starting with {} and '{:03o}!",
351 left, self.starting_pair.1 .0
352 )
353 }
354 pub fn pltotf_section(&self) -> u8 {
355 125
356 }
357}
358
359/// One step in a lig/kern infinite loop.
360///
361/// A vector of these steps is returned in a [`InfiniteLoopError`].
362#[derive(Clone, Debug, PartialEq, Eq)]
363pub struct InfiniteLoopStep {
364 /// The index of the instruction to apply in this step.
365 pub instruction_index: usize,
366 /// The replacement text after applying this step.
367 ///
368 /// The boolean specifies whether the replacement begins with the
369 /// left boundary char.
370 pub post_replacement: (bool, Vec<Char>),
371 /// The position of the cursor after applying this step.
372 pub post_cursor_position: usize,
373}
374
375#[cfg(test)]
376mod tests {
377 use core::Scaled;
378
379 use super::*;
380
381 const LIGAROO: &'static str = include_str!["ligaroo.plst"];
382
383 #[derive(PartialEq, Eq, Debug)]
384 enum Element {
385 Char(char),
386 Kern(core::Scaled),
387 Ligature(char, Rc<str>),
388 }
389
390 #[derive(Default)]
391 struct ElementEmitter(Vec<Element>);
392
393 impl Emitter for ElementEmitter {
394 fn emit_character(&mut self, c: char) {
395 self.0.push(Element::Char(c))
396 }
397
398 fn emit_kern(&mut self, kern: core::Scaled) {
399 self.0.push(Element::Kern(kern))
400 }
401
402 fn emit_ligature(&mut self, c: char, original: Rc<str>) {
403 self.0.push(Element::Ligature(c, original))
404 }
405 }
406
407 fn run_test(program: &str, input: &str, want: Vec<Element>) {
408 let source = LIGAROO.replace("(LIGTABLE", &format!["(LIGTABLE\n{program}"]);
409 let pl_file = crate::pl::File::from_pl_source_code(&source).0;
410 let program = CompiledProgram::compile_from_pl_file(&pl_file).0;
411 let mut emitter: ElementEmitter = Default::default();
412 program.run(input, &mut emitter);
413 assert_eq!(emitter.0, want);
414 }
415
416 macro_rules! tests {
417 ( $(
418 ($name: ident, $program: expr, $input: expr, $want: expr, ),
419 )+ ) => { $(
420 #[test]
421 fn $name() {
422 use Element::*;
423 let program = $program;
424 let input = $input;
425 let want = $want;
426 run_test(program, input, want);
427 }
428 )+ };
429 }
430
431 tests!(
432 // AB -> ^1
433 (
434 single_lig_1,
435 "
436 (LABEL C A)
437 (LIG C B C 1)
438 (KRN C 1 R 0.1)
439 (STOP)
440
441 (LABEL C 1)
442 (KRN C B R 0.3)
443 (STOP)
444 ",
445 "AB",
446 vec![Ligature('1', "AB".into())],
447 ),
448 // AB -> ^A1
449 (
450 single_lig_2,
451 "
452 (LABEL C A)
453 (/LIG C B C 1)
454 (KRN C 1 R 0.1)
455 (STOP)
456
457 (LABEL C 1)
458 (KRN C B R 0.3)
459 (STOP)
460 ",
461 "AB",
462 vec![Char('A'), Kern(Scaled::ONE), Ligature('1', "B".into())],
463 ),
464 // AB -> A^1
465 (
466 single_lig_3,
467 "
468 (LABEL C A)
469 (/LIG> C B C 1)
470 (KRN C 1 R 0.1)
471 (STOP)
472
473 (LABEL C 1)
474 (KRN C B R 0.3)
475 (STOP)
476 ",
477 "AB",
478 vec![Char('A'), Ligature('1', "B".into()),],
479 ),
480 // AB -> ^1B
481 (
482 single_lig_4,
483 "
484 (LABEL C A)
485 (LIG/ C B C 1)
486 (KRN C 1 R 0.1)
487 (STOP)
488
489 (LABEL C 1)
490 (KRN C B R 0.3)
491 (STOP)
492 ",
493 "AB",
494 vec![Ligature('1', "A".into()), Kern(Scaled::ONE * 3), Char('B'),],
495 ),
496 // AB -> 1^B
497 (
498 single_lig_5,
499 "
500 (LABEL C A)
501 (LIG/> C B C 1)
502 (KRN C 1 R 0.1)
503 (STOP)
504
505 (LABEL C 1)
506 (KRN C B R 0.3)
507 (STOP)
508 ",
509 "AB",
510 vec![Ligature('1', "A".into()), Char('B'),],
511 ),
512 // AB -> ^A1B
513 (
514 single_lig_6,
515 "
516 (LABEL C A)
517 (/LIG/ C B C 1)
518 (KRN C 1 R 0.1)
519 (STOP)
520
521 (LABEL C 1)
522 (KRN C B R 0.3)
523 (STOP)
524 ",
525 "AB",
526 vec![
527 Char('A'),
528 Kern(Scaled::ONE),
529 Ligature('1', "".into()),
530 Kern(Scaled::ONE * 3),
531 Char('B'),
532 ],
533 ),
534 // AB -> A^1B
535 (
536 single_lig_7,
537 "
538 (LABEL C A)
539 (/LIG/> C B C 1)
540 (KRN C 1 R 0.1)
541 (STOP)
542
543 (LABEL C 1)
544 (KRN C B R 0.3)
545 (STOP)
546 ",
547 "AB",
548 vec![
549 Char('A'),
550 Ligature('1', "".into()),
551 Kern(Scaled::ONE * 3),
552 Char('B'),
553 ],
554 ),
555 // AB -> A1^B
556 (
557 single_lig_8,
558 "
559 (LABEL C A)
560 (/LIG/>> C B C 1)
561 (KRN C 1 R 0.1)
562 (STOP)
563
564 (LABEL C 1)
565 (KRN C B R 0.3)
566 (STOP)
567 ",
568 "AB",
569 vec![Char('A'), Ligature('1', "".into()), Char('B'),],
570 ),
571 // AB -> A^B
572 // This is the same as single_lig_5, but the replacement character
573 // is the same as the character that is removed. In theory lig(A, A)
574 // could be replaced by char(A), and this test verifies that it is not.
575 (
576 no_op_lig,
577 "
578 (LABEL C A)
579 (LIG/> C B C A)
580 (STOP)
581 ",
582 "AB",
583 vec![Ligature('A', "A".into()), Char('B'),],
584 ),
585 // AB -> ^1, 1C -> ^2
586 (
587 multiple_lig_1,
588 "
589 (LABEL C A)
590 (LIG C B C 1)
591
592 (LABEL C 1)
593 (LIG C C C 2)
594 (STOP)
595 ",
596 "ABC",
597 vec![Ligature('2', "ABC".into()),],
598 ),
599 // AB -> ^A1B, 1B -> 2
600 (
601 multiple_lig_2,
602 "
603 (LABEL C A)
604 (/LIG/ C B C 1)
605 (LABEL C 1)
606 (LIG C B C 2)
607 (STOP)
608 ",
609 "AB",
610 vec![Char('A'), Ligature('2', "B".into()),],
611 ),
612 // AA -> 1^A multiple times
613 (
614 multiple_lig_3,
615 "
616 (LABEL C A)
617 (LIG/ C A C 1)
618 (STOP)
619 ",
620 "AAAAA",
621 vec![
622 Ligature('1', "A".into()),
623 Ligature('1', "A".into()),
624 Ligature('1', "A".into()),
625 Ligature('1', "A".into()),
626 Char('A'),
627 ],
628 ),
629 // AA -> ^A multiple times
630 (
631 multiple_lig_4,
632 "
633 (LABEL C A)
634 (LIG C A C A)
635 (STOP)
636 ",
637 "AAAAAA",
638 vec![Ligature('A', "AAAAAA".into()),],
639 ),
640 // AB -> ^A1, A1 -> ^21
641 (
642 multiple_lig_5,
643 "
644 (LABEL C A)
645 (/LIG C B C 1)
646 (LIG/ C 1 C 2)
647 (STOP)
648 ",
649 "AB",
650 vec![Ligature('2', "A".into()), Ligature('1', "B".into()),],
651 ),
652 // AB -> ^A1, A1 -> ^21, 21 -> 3
653 (
654 multiple_lig_6,
655 "
656 (LABEL C A)
657 (/LIG C B C 1)
658 (LIG/ C 1 C 2)
659 (LABEL C 2)
660 (LIG C 1 C 3)
661 (STOP)
662 ",
663 "AB",
664 vec![Ligature('3', "AB".into())],
665 ),
666 // AB -> ^1, 1C -> 12^C
667 (
668 multiple_lig_7,
669 "
670 (LABEL C A)
671 (LIG C B C 1)
672 (STOP)
673
674 (LABEL C 1)
675 (/LIG/>> C C C 2)
676 (STOP)
677 ",
678 "ABC",
679 vec![
680 Ligature('1', "AB".into()),
681 Ligature('2', "".into()),
682 Char('C'),
683 ],
684 ),
685 (
686 kern_after_lig_1,
687 "
688 (LABEL C A)
689 (LIG C B C 1)
690 (STOP)
691
692 (LABEL C 1)
693 (KRN C C R 0.1)
694 ",
695 "ABC",
696 vec![Ligature('1', "AB".into()), Kern(Scaled::ONE), Char('C'),],
697 ),
698 (
699 kern_after_lig_2,
700 "
701 (LABEL C A)
702 (LIG C B C 1)
703 (STOP)
704
705 (LABEL C 1)
706 (KRN C A R 0.1)
707 ",
708 "ABAB",
709 vec![
710 Ligature('1', "AB".into()),
711 Kern(Scaled::ONE),
712 Ligature('1', "AB".into()),
713 ],
714 ),
715 (
716 left_boundary_char_1,
717 "
718 (LABEL BOUNDARYCHAR)
719 (LIG C A C 1)
720 ",
721 "A",
722 vec![Ligature('1', "|A".into()),],
723 ),
724 (
725 left_boundary_char_2,
726 "
727 (LABEL BOUNDARYCHAR)
728 (/LIG/ C A C 1)
729 (/LIG/ C 1 C 2)
730 ",
731 "A",
732 vec![
733 Ligature('2', "|".into()),
734 Ligature('1', "".into()),
735 Char('A'),
736 ],
737 ),
738 (
739 left_boundary_char_3,
740 "
741 (LABEL BOUNDARYCHAR)
742 (/LIG/ C A C 1)
743 ",
744 "A",
745 vec![Ligature('1', "|".into()), Char('A'),],
746 ),
747 /*
748 (
749 right_boundary_char_1,
750 "
751 (LABEL C M)
752 (/LIG/ C L C N)
753 (STOP)
754 ",
755 "N",
756 vec![Char('N'), Ligature('Q', "|".into()),],
757 ),
758 (
759 right_boundary_char_2,
760 "
761 (LABEL C N)
762 (/LIG/ C L C Q)
763 (STOP)
764 ",
765 "M",
766 vec![
767 Char('M'),
768 Ligature('N', "".into()),
769 Ligature('Q', "|".into()),
770 ],
771 ),
772 */
773 (
774 right_boundary_char_3,
775 "
776 (LABEL C A)
777 (LIG C L C 1)
778 (STOP)
779 ",
780 "A",
781 vec![Ligature('1', "A|".into()),],
782 ),
783 (
784 right_boundary_char_4,
785 "
786 (LABEL C A)
787 (KRN C L R 1)
788 (STOP)
789 ",
790 "A",
791 vec![Char('A'), Kern(Scaled::ONE * 10),],
792 ),
793 );
794}