//! Lig/kern programming.
//!
//! TFM files can include information about ligatures and kerns.
//! A [ligature](https://en.wikipedia.org/wiki/Ligature_(writing))
//! is a special character that can replace two or more adjacent characters.
//! For example, the pair of characters ae can be replaced by the æ ligature which is a single character.
//! A [kern](https://en.wikipedia.org/wiki/Kerning) is a special space inserted between
//! two adjacent characters to align them better.
//! For example, a kern can be inserted between A and V to compensate for the large
//! amount of space created by the specific combination of these two characters.
//!
//! ## The lig/kern programming language
//!
//! TFM provides ligature and kern data in the form of
//! "instructions in a simple programming language that explains what to do for special letter pairs"
//! (quoting TFtoPL.2014.13).
//! This lig/kern programming language can be used to specify instructions like
//! "replace the pair (a,e) by æ" and
//! "insert a kern of width -0.1pt between the pair (A,V)".
//! But it can also specify more complex behaviors.
//! For example, a lig/kern program can specify "replace the pair (x,y) by the pair (z,y)".
//!
//! In general, for any pair of characters (x,y), the program specifies zero or one lig/kern instructions.
//! After this instruction is executed, there may be a new
//! pair of characters remaining, as in the (x,y) to (z,y) instruction.
//! The lig/kern instruction for this pair is then executed, if it exists.
//! This process continues until there are no more instructions left to run.
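/*!
The execution loop described above can be sketched as follows.
This is a minimal illustration under simplifying assumptions, not the implementation in this module:
the `Rule` type and the `run` function are hypothetical stand-ins (the real instructions are [`lang::Instruction`] values),
kerns are omitted, and the rules are assumed to contain no infinite loop.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a lig/kern instruction (hypothetical).
#[derive(Clone, Copy)]
enum Rule {
    /// Replace the pair (x,y) with the single character `c`.
    Lig(char),
    /// Replace the pair (x,y) with the pair (c,y).
    ReplaceLeft(char),
}

/// Applies rules to the pair until no rule matches the current pair,
/// then returns the characters that remain.
/// Assumes the rules contain no infinite loop.
fn run(rules: &HashMap<(char, char), Rule>, mut left: char, right: char) -> Vec<char> {
    while let Some(rule) = rules.get(&(left, right)) {
        match *rule {
            // The pair collapses to one character, so no pair remains.
            Rule::Lig(c) => return vec![c],
            // A new pair remains; look up its instruction next.
            Rule::ReplaceLeft(c) => left = c,
        }
    }
    vec![left, right]
}
```

With the rules "replace (x,y) by (z,y)" and "replace (z,y) by w", `run(&rules, 'x', 'y')` returns `['w']`,
and a pair with no rule is returned unchanged.
*/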
//!
//! Lig/kern instructions are represented in this module by the [`lang::Instruction`] type.
//!
//! ## Related code by Knuth
//!
//! The TFtoPL and PLtoTF programs don't contain any code for running lig/kern programs.
//! They only contain logic for translating between the `.tfm` and `.pl`
//! formats for lig/kern programs, and for doing some validation as described below.
//! Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
//!
//! One of the challenges with lig/kern programs is that they can contain infinite loops.
//! Here is a simple example of a lig/kern program with two instructions and an infinite loop:
//!
//! - Replace (x,y) with (z,y) (in property list format, `(LABEL C x)(LIG/ C y C z)`)
//! - Replace (z,y) with (x,y) (in property list format, `(LABEL C z)(LIG/ C y C x)`)
//!
//! When this program runs, (x,y) will be swapped with (z,y) ad infinitum.
//! See TFtoPL.2014.88 for more examples.
//!
//! Both TFtoPL and PLtoTF contain code that checks that a lig/kern program
//! does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125).
//! The algorithm for detecting infinite loops is a topological sorting algorithm
//! over a graph where each node is a pair of characters.
//! However it's a bit complicated because the full graph cannot be constructed without
//! running the lig/kern program.
//!
//! TeX does not check for infinite loops, presumably under the assumption that any `.tfm` file will have
//! been generated by PLtoTF and thus already validated.
//! However TeX does check for interrupts when executing lig/kern programs so that
//! at least a user can terminate TeX if an infinite loop is hit.
//! (See the `check_interrupt` line in KnuthTeX.2021.1040.)
//!
//! ## Functionality in this module
//!
//! This module handles lig/kern programs in a different way,
//! inspired by the ["parse don't validate"](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
//! philosophy.
//! This module can represent raw lig/kern programs as a vector of [`lang::Instruction`] values,
//! but it can also _compile_ lig/kern programs (into a [`CompiledProgram`]).
//! This compilation process essentially executes the lig/kern program for every possible character pair.
//! The result is a map from each character pair to the full list of
//! replacement characters and kerns for that pair.
//! If there is an infinite loop in the program, this compilation will naturally fail
//! and report an [`InfiniteLoopError`].
//! The compiled program is thus a "parsed" version of the lig/kern program
//! and it is impossible for infinite loops to appear in it.
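/*!
The idea of compiling by executing every pair can be sketched as follows.
This is an illustration under simplifying assumptions, not the algorithm in the `compiler` module:
`Rule`, `LoopError`, `Compiled` and this `compile` function are hypothetical names,
kerns and boundary characters are omitted, and only the left character can change,
so a repeated left character is enough to prove that execution never terminates.

```rust
use std::collections::{BTreeMap, HashSet};

/// Simplified stand-in for a lig/kern instruction (hypothetical).
#[derive(Clone, Copy)]
enum Rule {
    /// Replace the pair (x,y) with the single character `c`.
    Lig(char),
    /// Replace the pair (x,y) with the pair (c,y).
    ReplaceLeft(char),
}

/// Hypothetical error: the pair whose execution never terminates.
#[derive(Debug, PartialEq)]
struct LoopError(char, char);

/// Map from each pair to its final replacement characters.
type Compiled = BTreeMap<(char, char), Vec<char>>;

/// Runs the program once for every pair that has a rule and records the
/// final replacement. Pairs that loop forever are reported as errors
/// and left out of the map.
fn compile(rules: &BTreeMap<(char, char), Rule>) -> (Compiled, Vec<LoopError>) {
    let mut compiled = BTreeMap::new();
    let mut errors = Vec::new();
    'pairs: for &(start_left, right) in rules.keys() {
        let mut left = start_left;
        let mut seen = HashSet::new();
        while let Some(rule) = rules.get(&(left, right)) {
            // Seeing the same left character twice means we are in a cycle.
            if !seen.insert(left) {
                errors.push(LoopError(start_left, right));
                continue 'pairs;
            }
            match *rule {
                Rule::Lig(c) => {
                    compiled.insert((start_left, right), vec![c]);
                    continue 'pairs;
                }
                Rule::ReplaceLeft(c) => left = c,
            }
        }
        compiled.insert((start_left, right), vec![left, right]);
    }
    (compiled, errors)
}
```

For the two-instruction program with (x,y) and (z,y) shown earlier in these docs,
this sketch reports both pairs as looping and leaves them out of the map,
so the resulting map cannot contain an infinite loop.
*/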
//!
//! An advantage of this model is that the lig/kern program does not need to be repeatedly
//! executed in the main hot loop of TeX.
//! This may make TeX faster.
//! However the compiled lig/kern program does have a larger memory footprint than the raw program,
//! and so it may be slower if TeX is memory bound.
mod compiler;
use crate::Char;
use crate::FixWord;
use std::collections::BTreeMap;
use std::collections::HashMap;
pub mod lang;
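/// A compiled lig/kern program.
#[derive(Clone, Debug)]
pub struct CompiledProgram {
    /// Maps each left character to the `(lower, upper)` index range of its entries in `pairs`.
    left_to_pairs: BTreeMap<Char, (u16, u16)>,
    /// For each pair, the right character and the replacement for the pair.
    pairs: Vec<(Char, RawReplacement)>,
    /// Pool of (character, kern) entries indexed by `RawReplacement::middle_char_bounds`.
    middle_chars: Vec<(Char, FixWord)>,
}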
#[derive(Debug, Clone)]
struct RawReplacement {
    left_char_operation: LeftCharOperation,
    middle_char_bounds: std::ops::Range<u16>,
    last_char: Char,
}
/// Lig/kern operation on two characters.
#[derive(Debug, Clone)]
pub enum Op {
    /// Do nothing.
    None,
    /// Insert a kern between the two characters.
    ///
    /// TODO: should return core::Scaled.
    Kern(FixWord),
    /// Replace the two characters with the specified ligature character.
    SimpleLig(Char),
    /// Replace the two characters with the specified sequence of characters and kerns.
    ///
    /// TODO: change the vec to a reference.
    ComplexLig(Vec<(Char, FixWord)>, Char),
}
impl Op {
    /// Builds the sequence of characters, each with the kern that follows it,
    /// that results from applying this operation to the given pair.
    pub fn build_sequence(&self, left_char: Char, right_char: Char) -> Vec<(Char, FixWord)> {
        match self {
            Op::None => vec![(left_char, FixWord::ZERO), (right_char, FixWord::ZERO)],
            Op::Kern(fix_word) => vec![(left_char, *fix_word), (right_char, FixWord::ZERO)],
            Op::SimpleLig(char) => vec![(*char, FixWord::ZERO)],
            Op::ComplexLig(items, char) => {
                let mut v = items.clone();
                v.push((*char, FixWord::ZERO));
                v
            }
        }
    }
}
impl CompiledProgram {
    /// Compile a lig/kern program.
    pub fn compile(
        program: &lang::Program,
        kerns: &[FixWord],
        entrypoints: HashMap<Char, u16>,
    ) -> (CompiledProgram, Vec<InfiniteLoopError>) {
        compiler::compile(program, kerns, &entrypoints)
    }
    /// Get an operation between two characters.
    ///
    /// TODO: what about boundaries?
    pub fn get_op(&self, left_char: Char, right_char: Char) -> Op {
        let Some((lower, upper)) = self.left_to_pairs.get(&left_char) else {
            return Op::None;
        };
        for (candidate_right_char, r) in &self.pairs[(*lower as usize)..(*upper as usize)] {
            if *candidate_right_char != right_char {
                continue;
            }
            let first_or = match (
                r.left_char_operation,
                r.middle_char_bounds.end == 0,
                r.last_char == right_char,
            ) {
                (LeftCharOperation::Retain, true, true) => return Op::None,
                (LeftCharOperation::AppendKern(fix_word), true, true) => return Op::Kern(fix_word),
                (LeftCharOperation::Delete, true, _) => return Op::SimpleLig(r.last_char),
                (LeftCharOperation::Retain, _, _) => Some((left_char, FixWord::ZERO)),
                (LeftCharOperation::AppendKern(fix_word), _, _) => Some((left_char, fix_word)),
                (LeftCharOperation::Delete, false, _) => None,
            };
            let mut vc = vec![];
            if let Some(first) = first_or {
                vc.push(first);
            }
            vc.extend_from_slice(
                &self.middle_chars
                    [r.middle_char_bounds.start as usize..r.middle_char_bounds.end as usize],
            );
            return Op::ComplexLig(vc, r.last_char);
        }
        Op::None
    }
    /// Returns an iterator over all pairs `(char,char)` that have an operation
    /// specified in the lig/kern program.
    pub fn all_pairs_having_ops(&self) -> impl '_ + Iterator<Item = (Char, Char)> {
        PairsIter {
            current_left: Char(0),
            left_iter: self.left_to_pairs.iter(),
            right_chars: vec![],
            program: self,
        }
    }
    /// Returns whether this program is seven-bit safe.
    ///
    /// A lig/kern program is seven-bit safe if the replacement for any
    /// pair of seven-bit characters
    /// consists only of seven-bit characters.
    /// Conversely, a program is seven-bit unsafe if there is a
    /// pair of seven-bit characters whose replacement
    /// contains a non-seven-bit character.
    pub fn is_seven_bit_safe(&self) -> bool {
        self.all_pairs_having_ops()
            .filter(|(l, r)| l.is_seven_bit() && r.is_seven_bit())
            .map(|(l, r)| self.get_op(l, r))
            .all(|op| match op {
                Op::None => true,
                Op::Kern(_) => true,
                Op::SimpleLig(char) => char.is_seven_bit(),
                Op::ComplexLig(items, char) => {
                    items.into_iter().all(|(c, _)| c.is_seven_bit()) && char.is_seven_bit()
                }
            })
    }
}
/// An error returned from lig/kern compilation.
///
/// TODO: rename Cycle everywhere including the docs
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopError {
    /// The pair of characters that starts the infinite loop.
    pub starting_pair: (Option<Char>, Char),
}
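impl InfiniteLoopError {
    /// Returns the message for this error in the style of PLtoTF.
    ///
    /// Characters are printed as three-digit octal numbers.
    pub fn pltotf_message(&self) -> String {
        let left = match self.starting_pair.0 {
            Some(c) => format!("'{:03o}", c.0),
            None => "boundary".to_string(),
        };
        format!(
            "Infinite ligature loop starting with {} and '{:03o}!",
            left, self.starting_pair.1 .0
        )
    }
    /// Returns the PLtoTF section number associated with this error.
    pub fn pltotf_section(&self) -> u8 {
        125
    }
}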
/// One step in a lig/kern infinite loop.
///
/// A vector of these steps is returned in an [`InfiniteLoopError`].
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopStep {
    /// The index of the instruction to apply in this step.
    pub instruction_index: usize,
    /// The replacement text after applying this step.
    ///
    /// The boolean specifies whether the replacement begins with the
    /// left boundary char.
    pub post_replacement: (bool, Vec<Char>),
    /// The position of the cursor after applying this step.
    pub post_cursor_position: usize,
}
/// Operation to perform on the left character of a lig/kern pair.
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
enum LeftCharOperation {
    /// Retain the left character and do not add a kern.
    Retain,
    /// Delete the left character.
    Delete,
    /// Retain the left character and append the specified kern.
    AppendKern(FixWord),
}
/// An iterator over all pairs of characters that have a lig/kern replacement in a program.
struct PairsIter<'a, L> {
    current_left: Char,
    left_iter: L,
    right_chars: Vec<Char>,
    program: &'a CompiledProgram,
}
impl<'a, L: 'a + Iterator<Item = (&'a Char, &'a (u16, u16))>> Iterator for PairsIter<'a, L> {
    type Item = (Char, Char);
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Drain the right characters buffered for the current left character.
            if let Some(right_char) = self.right_chars.pop() {
                return Some((self.current_left, right_char));
            }
            // Move on to the next left character, or finish if there is none.
            // Looping (rather than unwrapping) skips a left character with an empty range.
            let (&new_left, (lower, upper)) = self.left_iter.next()?;
            self.current_left = new_left;
            self.right_chars = self.program.pairs[*lower as usize..*upper as usize]
                .iter()
                .map(|t| t.0)
                .collect();
        }
    }
}