//! Lig/kern programming.
//!
//! TFM files can include information about ligatures and kerns.
//! A [ligature](https://en.wikipedia.org/wiki/Ligature_(writing))
//!     is a special character that can replace two or more adjacent characters.
//! For example, the pair of characters ae can be replaced by the æ ligature, which is a single character.
//! A [kern](https://en.wikipedia.org/wiki/Kerning) is a special space inserted between
//!     two adjacent characters to align them better.
//! For example, a kern can be inserted between A and V to compensate for the large
//!     amount of space created by the specific combination of these two characters.
//!
//! ## The lig/kern programming language
//!
//! TFM provides ligature and kern data in the form of
//!     "instructions in a simple programming language that explains what to do for special letter pairs"
//!     (quoting TFtoPL.2014.13).
//! This lig/kern programming language can be used to specify instructions like
//!     "replace the pair (a,e) by æ" and
//!     "insert a kern of width -0.1pt between the pair (A,V)".
//! But it can also specify more complex behaviors.
//! For example, a lig/kern program can specify "replace the pair (x,y) by the pair (z,y)".
//!
//! In general, for any pair of characters (x,y), the program specifies at most one lig/kern instruction.
//! After this instruction is executed, there may be a new
//!     pair of characters remaining, as in the (x,y) to (z,y) instruction.
//! The lig/kern instruction for this pair is then executed, if it exists.
//! This process continues until there are no more instructions left to run.
//!
//! Lig/kern instructions are represented in this module by the [`lang::Instruction`] type.
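//!
//! As a rough illustration of this execution model, the following is a minimal sketch
//!     that assumes a toy instruction set with only two kinds of instructions and no kerns.
//! The names `ToyInstruction` and `run` are hypothetical and exist only for this example;
//!     the real instruction type is [`lang::Instruction`].
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical, simplified instruction (not the real `lang::Instruction`).
//! enum ToyInstruction {
//!     /// Replace the pair (x,y) by a single ligature character.
//!     Ligature(char),
//!     /// Replace the pair (x,y) by the pair (z,y).
//!     ReplaceLeft(char),
//! }
//!
//! /// Runs the toy program on the pair (left,right).
//! /// Returns `None` if `max_steps` instructions were executed without finishing.
//! fn run(
//!     program: &HashMap<(char, char), ToyInstruction>,
//!     mut left: char,
//!     right: char,
//!     max_steps: usize,
//! ) -> Option<Vec<char>> {
//!     for _ in 0..max_steps {
//!         match program.get(&(left, right)) {
//!             // No instruction for this pair: execution is finished.
//!             None => return Some(vec![left, right]),
//!             // The pair collapses to one character, so no pair remains.
//!             Some(ToyInstruction::Ligature(z)) => return Some(vec![*z]),
//!             // A new pair (z,y) remains; look up its instruction next.
//!             Some(ToyInstruction::ReplaceLeft(z)) => left = *z,
//!         }
//!     }
//!     None
//! }
//!
//! fn main() {
//!     let mut program = HashMap::new();
//!     program.insert(('a', 'e'), ToyInstruction::Ligature('æ'));
//!     program.insert(('x', 'y'), ToyInstruction::ReplaceLeft('z'));
//!     assert_eq!(run(&program, 'a', 'e', 10), Some(vec!['æ']));
//!     assert_eq!(run(&program, 'x', 'y', 10), Some(vec!['z', 'y']));
//! }
//! ```
//!
//! The step budget is only a device of this sketch for guaranteeing termination.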
//!
//! ## Related code by Knuth
//!
//! The TFtoPL and PLtoTF programs don't contain any code for running lig/kern programs.
//! They only contain logic for translating between the `.tfm` and `.pl`
//!     formats for lig/kern programs, and for doing some validation as described below.
//! Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
//!
//! One of the challenges with lig/kern programs is that they can contain infinite loops.
//! Here is a simple example of a lig/kern program with two instructions and an infinite loop:
//!
//! - Replace (x,y) with (z,y) (in property list format, `(LABEL C x)(LIG/ C y C z)`)
//! - Replace (z,y) with (x,y) (in property list format, `(LABEL C z)(LIG/ C y C x)`)
//!
//! When this program runs, (x,y) and (z,y) will replace each other ad infinitum.
//! See TFtoPL.2014.88 for more examples.
//!
//! Both TFtoPL and PLtoTF contain code that checks that a lig/kern program
//!     does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125).
//! The algorithm for detecting infinite loops is a topological sorting algorithm
//!     over a graph where each node is a pair of characters.
//! However, the algorithm is complicated by the fact that the full graph cannot be constructed without
//!     running the lig/kern program.
//!
//! TeX does not check for infinite loops, presumably under the assumption that any `.tfm` file will have
//!     been generated by PLtoTF and thus already validated.
//! However, TeX does check for interrupts when executing lig/kern programs,
//!     so a user can at least terminate TeX if an infinite loop is hit.
//! (See the `check_interrupt` line in KnuthTeX.2021.1040.)
//!
//! ## Functionality in this module
//!
//! This module handles lig/kern programs in a different way,
//!     inspired by the ["parse don't validate"](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
//!     philosophy.
//! This module is able to represent raw lig/kern programs as a vector of [`lang::Instruction`] values.
//! It can also _compile_ lig/kern programs into a [`CompiledProgram`].
//! This compilation process essentially executes the lig/kern program for every possible character pair.
//! The result is a map from each character pair to the full list of
//!     replacement characters and kerns for that pair.
//! If there is an infinite loop in the program this compilation will naturally fail.
//! The compiled program is thus a "parsed" version of the lig/kern program
//!     and it is impossible for infinite loops to appear in it.
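//!
//! The idea can be sketched on a toy model in which every instruction has the form
//!     "replace the pair (x,y) by the pair (z,y)".
//! This is not the algorithm used by this module or by PLtoTF:
//!     it detects loops with a simple set of visited pairs,
//!     and the hypothetical `compile` function below returns a `Result`,
//!     whereas [`CompiledProgram::compile`] returns a tuple.
//!
//! ```rust
//! use std::collections::{BTreeMap, HashSet};
//!
//! type Pair = (char, char);
//!
//! /// Resolves every pair that has a rule to the pair that finally replaces it.
//! /// Returns `Err(pair)` for the first starting pair found to run forever.
//! fn compile(rules: &BTreeMap<Pair, Pair>) -> Result<BTreeMap<Pair, Pair>, Pair> {
//!     let mut compiled = BTreeMap::new();
//!     for &start in rules.keys() {
//!         let mut seen = HashSet::new();
//!         let mut current = start;
//!         // Execute the toy program for this pair until no rule applies.
//!         while let Some(&next) = rules.get(&current) {
//!             if !seen.insert(current) {
//!                 // The pair was already visited, so execution never terminates.
//!                 return Err(start);
//!             }
//!             current = next;
//!         }
//!         compiled.insert(start, current);
//!     }
//!     Ok(compiled)
//! }
//!
//! fn main() {
//!     let mut rules = BTreeMap::new();
//!     rules.insert(('x', 'y'), ('z', 'y'));
//!     rules.insert(('z', 'y'), ('w', 'y'));
//!     let compiled = compile(&rules).unwrap();
//!     assert_eq!(compiled[&('x', 'y')], ('w', 'y'));
//!
//!     // Closing the cycle makes compilation fail.
//!     rules.insert(('w', 'y'), ('x', 'y'));
//!     assert!(compile(&rules).is_err());
//! }
//! ```
//!
//! Once compilation succeeds, looking up a pair in the resulting map cannot loop.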
//!
//! An advantage of this model is that the lig/kern program does not need to be repeatedly
//!     executed in the main hot loop of TeX.
//! This may make TeX faster.
//! However, the compiled lig/kern program has a larger memory footprint than the raw program,
//!     and so it may be slower if TeX is memory bound.

mod compiler;
use crate::Char;
use crate::Number;
use std::collections::BTreeMap;
use std::collections::HashMap;
pub mod lang;

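/// A compiled lig/kern program.
///
/// Replacements are stored in flat vectors and addressed by index ranges.
#[derive(Debug)]
pub struct CompiledProgram {
    /// Maps each left character to the half-open index range `lower..upper` of its entries in `pairs`.
    left_to_pairs: BTreeMap<Char, (u16, u16)>,
    /// Right characters and their replacements, grouped by left character.
    pairs: Vec<(Char, RawReplacement)>,
    /// Shared storage for the middle characters and kerns of all replacements.
    middle_chars: Vec<(Char, Number)>,
}

/// A replacement whose middle characters are stored as an index range
///     into the `middle_chars` vector of the [`CompiledProgram`].
#[derive(Debug, Clone)]
struct RawReplacement {
    left_char_operation: LeftCharOperation,
    middle_char_bounds: std::ops::Range<u16>,
    last_char: Char,
}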

impl CompiledProgram {
    /// Compile a lig/kern program.
    pub fn compile(
        program: &lang::Program,
        kerns: &[Number],
        entrypoints: HashMap<Char, u16>,
    ) -> (CompiledProgram, Option<InfiniteLoopError>) {
        compiler::compile(program, kerns, &entrypoints)
    }

    /// Get an iterator over the full lig/kern replacement for a pair of characters.
    pub fn get_replacement_iter(&self, left_char: Char, right_char: Char) -> ReplacementIter {
        self.get_replacement(left_char, right_char)
            .into_iter(left_char)
    }

    /// Get the full lig/kern replacement for a pair of characters.
    pub fn get_replacement(&self, left_char: Char, right_char: Char) -> Replacement {
        if let Some((lower, upper)) = self.left_to_pairs.get(&left_char) {
            for (candidate_right_char, replacement) in
                &self.pairs[(*lower as usize)..(*upper as usize)]
            {
                if *candidate_right_char != right_char {
                    continue;
                }
                return if replacement.middle_char_bounds.end == 0 {
                    Replacement {
                        left_char_operation: replacement.left_char_operation,
                        middle_chars: &[],
                        last_char: replacement.last_char,
                    }
                } else {
                    Replacement {
                        left_char_operation: replacement.left_char_operation,
                        middle_chars: &self.middle_chars[replacement.middle_char_bounds.start
                            as usize
                            ..replacement.middle_char_bounds.end as usize],
                        last_char: replacement.last_char,
                    }
                };
            }
        }
        Replacement::no_op(right_char)
    }

    /// Returns an iterator over all pairs `(char,char)` that have a replacement
    ///     specified in the lig/kern program.
    pub fn all_pairs_having_replacement(&self) -> impl '_ + Iterator<Item = (Char, Char)> {
        PairsIter {
            current_left: Char(0),
            left_iter: self.left_to_pairs.iter(),
            right_chars: vec![],
            program: self,
        }
    }

    /// Returns whether this program is seven-bit safe.
    ///
    /// A lig/kern program is seven-bit safe if the replacement for any
    ///     pair of seven-bit characters
    ///     consists only of seven-bit characters.
    /// Conversely a program is seven-bit unsafe if there is a
    ///     pair of seven-bit characters whose replacement
    ///     contains a non-seven-bit character.
    pub fn is_seven_bit_safe(&self) -> bool {
        self.all_pairs_having_replacement()
            .filter(|(l, r)| l.is_seven_bit() && r.is_seven_bit())
            .flat_map(|(l, r)| self.get_replacement_iter(l, r))
            .all(|(c, _kern)| c.is_seven_bit())
    }
}

/// An error returned from lig/kern compilation.
///
/// TODO: rename Cycle everywhere including the docs
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopError {
    /// The pair of characters that starts the infinite loop.
    ///
    /// The left element is `None` when the loop starts at the boundary.
    pub starting_pair: (Option<Char>, Char),
}

impl InfiniteLoopError {
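    /// Returns the message for this error, in the style of PLtoTF.
    ///
    /// Characters are printed as three-digit octal numbers.
    pub fn pltotf_message(&self) -> String {
        let left = match self.starting_pair.0 {
            Some(c) => format!("'{:03o}", c.0),
            None => "boundary".to_string(),
        };
        format!(
            "Infinite ligature loop starting with {} and '{:03o}!",
            left, self.starting_pair.1 .0
        )
    }
    /// Returns the section of PLtoTF associated with this error.
    pub fn pltotf_section(&self) -> u8 {
        125
    }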
}

/// One step in a lig/kern infinite loop.
///
/// A vector of these steps is returned in an [`InfiniteLoopError`].
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopStep {
    /// The index of the instruction to apply in this step.
    pub instruction_index: usize,
    /// The replacement text after applying this step.
    ///
    /// The boolean specifies whether the replacement begins with the
    /// left boundary char.
    pub post_replacement: (bool, Vec<Char>),
    /// The position of the cursor after applying this step.
    pub post_cursor_position: usize,
}

/// Data structure describing the replacement of a character pair in a lig/kern program.
pub struct Replacement<'a> {
    /// Operation to perform on the left character.
    pub left_char_operation: LeftCharOperation,
    /// Slice of characters and kerns to insert after the left character.
    pub middle_chars: &'a [(Char, Number)],
    /// Last character to insert.
    pub last_char: Char,
}

impl<'a> Replacement<'a> {
    fn no_op(right_char: Char) -> Replacement<'a> {
        Replacement {
            left_char_operation: LeftCharOperation::Retain,
            middle_chars: &[],
            last_char: right_char,
        }
    }
}

impl<'a> Replacement<'a> {
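    /// Converts this replacement into an iterator over its characters and kerns,
    ///     given the left character of the original pair.
    pub fn into_iter(self, left_char: Char) -> ReplacementIter<'a> {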
        ReplacementIter {
            left_char,
            full_operation: self,
            state: IterState::LeftChar,
        }
    }
}

/// Operation to perform on the left character of a lig/kern pair.
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
pub enum LeftCharOperation {
    /// Retain the left character and do not add a kern.
    Retain,
    /// Delete the left character.
    Delete,
    /// Retain the left character and append the specified kern.
    AppendKern(Number),
}

/// Iterator over the replacement of a character pair in a lig/kern program.
pub struct ReplacementIter<'a> {
    left_char: Char,
    full_operation: Replacement<'a>,
    state: IterState,
}

enum IterState {
    LeftChar,
    MiddleChar(usize),
    LastChar,
    Exhausted,
}

impl<'a> ReplacementIter<'a> {
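    /// Computes the state that follows the current one,
    ///     along with the item (if any) produced by the transition.
    fn i(&self) -> (IterState, Option<(Char, Number)>) {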
        match self.state {
            IterState::LeftChar => (
                IterState::MiddleChar(0),
                match self.full_operation.left_char_operation {
                    LeftCharOperation::Retain => Some((self.left_char, Number::ZERO)),
                    LeftCharOperation::Delete => None,
                    LeftCharOperation::AppendKern(kern) => Some((self.left_char, kern)),
                },
            ),
            IterState::MiddleChar(i) => match self.full_operation.middle_chars.get(i).copied() {
                None => (IterState::LastChar, None),
                Some(t) => (IterState::MiddleChar(i + 1), Some(t)),
            },
            IterState::LastChar => (
                IterState::Exhausted,
                Some((self.full_operation.last_char, Number::ZERO)),
            ),
            IterState::Exhausted => (IterState::Exhausted, None),
        }
    }
}

impl<'a> Iterator for ReplacementIter<'a> {
    type Item = (Char, Number);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let (state, r) = self.i();
            self.state = state;
            match (&self.state, r) {
                (_, Some(t)) => return Some(t),
                (IterState::Exhausted, _) => return None,
                (_, _) => {}
            }
        }
    }
}

/// An iterator over all pairs of characters that have a lig/kern replacement in a program.
struct PairsIter<'a, L> {
    current_left: Char,
    left_iter: L,
    right_chars: Vec<Char>,
    program: &'a CompiledProgram,
}

impl<'a, L: 'a + Iterator<Item = (&'a Char, &'a (u16, u16))>> Iterator for PairsIter<'a, L> {
    type Item = (Char, Char);
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Drain the right characters of the current left character first.
            if let Some(right_char) = self.right_chars.pop() {
                return Some((self.current_left, right_char));
            }
            // Then advance to the next left character, skipping it if it has no pairs
            // (the original code would panic on an empty range).
            let (&new_left, (lower, upper)) = self.left_iter.next()?;
            self.current_left = new_left;
            self.right_chars = self.program.pairs[*lower as usize..*upper as usize]
                .iter()
                .map(|t| t.0)
                .collect();
        }
    }
}