//! Lig/kern programming.
//!
//! TFM files can include information about ligatures and kerns.
//! A [ligature](https://en.wikipedia.org/wiki/Ligature_(writing))
//! is a special character that can replace two or more adjacent characters.
//! For example, the pair of characters ae can be replaced by the æ ligature, which is a single character.
//! A [kern](https://en.wikipedia.org/wiki/Kerning) is a special space inserted between
//! two adjacent characters to align them better.
//! For example, a kern can be inserted between A and V to compensate for the large
//! amount of space created by the specific combination of these two characters.
//!
//! ## The lig/kern programming language
//!
//! TFM provides ligature and kern data in the form of
//! "instructions in a simple programming language that explains what to do for special letter pairs"
//! (quoting TFtoPL.2014.13).
//! This lig/kern programming language can be used to specify instructions like
//! "replace the pair (a,e) by æ" and
//! "insert a kern of width -0.1pt between the pair (A,V)".
//! But it can also specify more complex behaviors.
//! For example, a lig/kern program can specify "replace the pair (x,y) by the pair (z,y)".
//!
//! In general, for any pair of characters (x,y), the program specifies at most one lig/kern instruction.
//! After this instruction is executed, there may be a new
//! pair of characters remaining, as in the (x,y) to (z,y) instruction.
//! The lig/kern instruction for this pair is then executed, if it exists.
//! This process continues until there are no more instructions left to run.
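//!
//! A minimal sketch of this process, using a hypothetical toy model rather than
//! the real [`lang::Instruction`] type: the names `ToyInstruction` and `run_pair`
//! are invented for illustration, and only two kinds of instruction are modeled.
//! The step cap is an assumption of the sketch, added because nothing here
//! guarantees that a program terminates.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical, simplified instruction (not the real `lang::Instruction`).
//! #[derive(Clone, Copy)]
//! pub enum ToyInstruction {
//!     /// Replace the pair (x, y) by the single character z.
//!     Ligature(char),
//!     /// Replace the pair (x, y) by the pair (z, y).
//!     ReplaceLeft(char),
//! }
//!
//! /// Runs the toy program on the pair `(left, right)` for at most `max_steps`
//! /// steps and returns the characters that remain.
//! pub fn run_pair(
//!     program: &HashMap<(char, char), ToyInstruction>,
//!     mut left: char,
//!     right: char,
//!     max_steps: usize,
//! ) -> Vec<char> {
//!     for _ in 0..max_steps {
//!         match program.get(&(left, right)) {
//!             // No instruction for this pair: the process stops.
//!             None => break,
//!             // The pair collapses into one character, so no pair remains.
//!             Some(ToyInstruction::Ligature(z)) => return vec![*z],
//!             // A new pair (z, right) remains; its instruction is looked up next.
//!             Some(ToyInstruction::ReplaceLeft(z)) => left = *z,
//!         }
//!     }
//!     vec![left, right]
//! }
//! ```
//!
//! In this sketch, a program that maps (x,y) to `ReplaceLeft('z')` and (z,y) to
//! `Ligature('w')` turns the pair (x,y) into the single character w in two steps.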
//!
//! Lig/kern instructions are represented in this module by the [`lang::Instruction`] type.
//!
//! ## Related code by Knuth
//!
//! The TFtoPL and PLtoTF programs don't contain any code for running lig/kern programs.
//! They only contain logic for translating between the `.tfm` and `.pl`
//! formats for lig/kern programs, and for doing some validation as described below.
//! Lig/kern programs are actually executed in TeX; see KnuthTeX.2021.1032-1040.
//!
//! One of the challenges with lig/kern programs is that they can contain infinite loops.
//! Here is a simple example of a lig/kern program with two instructions and an infinite loop:
//!
//! - Replace (x,y) with (z,y) (in property list format, `(LABEL C x)(LIG/ C y C z)`)
//! - Replace (z,y) with (x,y) (in property list format, `(LABEL C z)(LIG/ C y C x)`)
//!
//! When this program runs, (x,y) will be swapped with (z,y) ad infinitum.
//! See TFtoPL.2014.88 for more examples.
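//!
//! The loop in this two-instruction program can be observed with a small sketch.
//! It uses a hypothetical toy model (`ToyProgram` and `find_loop` are invented names)
//! that only supports rules of the form "replace (x,y) with (z,y)", and it detects
//! the loop by remembering the pairs already visited from a single starting pair.
//! This is a simpler technique than the algorithm used by TFtoPL and PLtoTF.
//!
//! ```rust
//! use std::collections::{HashMap, HashSet};
//!
//! /// Hypothetical toy program: maps the pair (x, y) to the character z
//! /// that replaces x, giving the new pair (z, y).
//! pub type ToyProgram = HashMap<(char, char), char>;
//!
//! /// Returns the first pair that is reached twice when starting from `start`,
//! /// or `None` if execution terminates.
//! pub fn find_loop(program: &ToyProgram, start: (char, char)) -> Option<(char, char)> {
//!     let mut seen = HashSet::new();
//!     let mut pair = start;
//!     // `insert` returns false when the pair has been visited before.
//!     while seen.insert(pair) {
//!         match program.get(&pair) {
//!             Some(&z) => pair = (z, pair.1),
//!             None => return None,
//!         }
//!     }
//!     Some(pair)
//! }
//! ```
//!
//! With the rules (x,y) to (z,y) and (z,y) to (x,y), starting from (x,y) visits
//! (x,y), then (z,y), then (x,y) again, so the sketch reports (x,y).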
//!
//! Both TFtoPL and PLtoTF contain code that checks that a lig/kern program
//! does not contain infinite loops (TFtoPL.2014.88-95 and PLtoTF.2014.116-125).
//! The algorithm for detecting infinite loops is a topological sorting algorithm
//! over a graph where each node is a pair of characters.
//! However, the algorithm is complicated by the fact that the full graph cannot be
//! constructed without running the lig/kern program.
//!
//! TeX does not check for infinite loops, presumably under the assumption that any `.tfm` file will have
//! been generated by PLtoTF and thus already validated.
//! However, TeX does check for interrupts when executing lig/kern programs so that
//! at least a user can terminate TeX if an infinite loop is hit.
//! (See the `check_interrupt` line in KnuthTeX.2021.1040.)
//!
//! ## Functionality in this module
//!
//! This module handles lig/kern programs in a different way,
//! inspired by the ["parse don't validate"](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
//! philosophy.
//! This module can represent raw lig/kern programs as a vector of [`lang::Instruction`] values,
//! but it can also _compile_ lig/kern programs (into a [`CompiledProgram`]).
//! This compilation process essentially executes the lig/kern program for every possible character pair.
//! The result is a map from each character pair to the full list of
//! replacement characters and kerns for that pair.
//! If there is an infinite loop in the program, compilation detects it and reports an [`InfiniteLoopError`].
//! The compiled program is thus a "parsed" version of the lig/kern program
//! and it is impossible for infinite loops to appear in it.
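//!
//! The idea can be sketched on a hypothetical toy model; this is not the
//! implementation in the `compiler` submodule. The names `compile_toy` and
//! `ToyLoopError` are invented, only rules of the form "replace (x,y) with (z,y)"
//! are modeled, and the sketch returns a `Result`, whereas
//! [`CompiledProgram::compile`] returns a tuple.
//!
//! ```rust
//! use std::collections::{HashMap, HashSet};
//!
//! /// Hypothetical error type, loosely modeled on `InfiniteLoopError`.
//! #[derive(Debug, PartialEq, Eq)]
//! pub struct ToyLoopError {
//!     pub starting_pair: (char, char),
//! }
//!
//! /// Compiles rules of the form "replace (x, y) with (z, y)" into a map from
//! /// each pair that has a rule to the final left character for that pair.
//! pub fn compile_toy(
//!     rules: &HashMap<(char, char), char>,
//! ) -> Result<HashMap<(char, char), char>, ToyLoopError> {
//!     let mut compiled = HashMap::new();
//!     for &(left, right) in rules.keys() {
//!         let mut seen = HashSet::new();
//!         let mut current = left;
//!         // Follow the chain of rules until no rule applies to (current, right).
//!         while let Some(&next) = rules.get(&(current, right)) {
//!             // Seeing the same left character twice means the chain never ends.
//!             if !seen.insert(current) {
//!                 return Err(ToyLoopError { starting_pair: (left, right) });
//!             }
//!             current = next;
//!         }
//!         compiled.insert((left, right), current);
//!     }
//!     Ok(compiled)
//! }
//! ```
//!
//! Looking up a pair in the resulting map takes a single step, however long the
//! chain of rules was, and a looping program never produces a map at all.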
//!
//! An advantage of this model is that the lig/kern program does not need to be repeatedly
//! executed in the main hot loop of TeX.
//! This may make TeX faster.
//! However, the compiled lig/kern program has a larger memory footprint than the raw program,
//! so it may be slower if TeX is memory bound.
mod compiler;
use crate::Char;
use crate::Number;
use std::collections::BTreeMap;
use std::collections::HashMap;
pub mod lang;
/// A compiled lig/kern program.
#[derive(Debug)]
pub struct CompiledProgram {
left_to_pairs: BTreeMap<Char, (u16, u16)>,
pairs: Vec<(Char, RawReplacement)>,
middle_chars: Vec<(Char, Number)>,
}
#[derive(Debug, Clone)]
struct RawReplacement {
left_char_operation: LeftCharOperation,
middle_char_bounds: std::ops::Range<u16>,
last_char: Char,
}
impl CompiledProgram {
/// Compile a lig/kern program.
pub fn compile(
program: &lang::Program,
kerns: &[Number],
entrypoints: HashMap<Char, u16>,
) -> (CompiledProgram, Option<InfiniteLoopError>) {
compiler::compile(program, kerns, &entrypoints)
}
/// Get an iterator over the full lig/kern replacement for a pair of characters.
pub fn get_replacement_iter(&self, left_char: Char, right_char: Char) -> ReplacementIter {
self.get_replacement(left_char, right_char)
.into_iter(left_char)
}
/// Get the full lig/kern replacement for a pair of characters.
pub fn get_replacement(&self, left_char: Char, right_char: Char) -> Replacement {
if let Some((lower, upper)) = self.left_to_pairs.get(&left_char) {
for (candidate_right_char, replacement) in
&self.pairs[(*lower as usize)..(*upper as usize)]
{
if *candidate_right_char != right_char {
continue;
}
return if replacement.middle_char_bounds.end == 0 {
Replacement {
left_char_operation: replacement.left_char_operation,
middle_chars: &[],
last_char: replacement.last_char,
}
} else {
Replacement {
left_char_operation: replacement.left_char_operation,
middle_chars: &self.middle_chars[replacement.middle_char_bounds.start
as usize
..replacement.middle_char_bounds.end as usize],
last_char: replacement.last_char,
}
};
}
}
Replacement::no_op(right_char)
}
/// Returns an iterator over all pairs `(char,char)` that have a replacement
/// specified in the lig/kern program.
pub fn all_pairs_having_replacement(&self) -> impl '_ + Iterator<Item = (Char, Char)> {
PairsIter {
current_left: Char(0),
left_iter: self.left_to_pairs.iter(),
right_chars: vec![],
program: self,
}
}
/// Returns whether this program is seven-bit safe.
///
/// A lig/kern program is seven-bit safe if the replacement for any
/// pair of seven-bit characters
/// consists only of seven-bit characters.
/// Conversely a program is seven-bit unsafe if there is a
/// pair of seven-bit characters whose replacement
/// contains a non-seven-bit character.
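///
/// A minimal sketch of this definition on plain bytes. The names
/// `toy_is_seven_bit` and `toy_is_seven_bit_safe` are hypothetical, and the
/// sketch assumes that "seven-bit" means a character code below 128.
///
/// ```rust
/// /// Hypothetical stand-in for `Char::is_seven_bit` (assumes codes below 128).
/// pub fn toy_is_seven_bit(c: u8) -> bool {
///     c < 128
/// }
///
/// /// Each entry is a pair of characters together with its full replacement.
/// pub fn toy_is_seven_bit_safe(replacements: &[((u8, u8), Vec<u8>)]) -> bool {
///     replacements
///         .iter()
///         // Only pairs made of two seven-bit characters are constrained.
///         .filter(|((l, r), _)| toy_is_seven_bit(*l) && toy_is_seven_bit(*r))
///         // Every character in their replacement must also be seven-bit.
///         .all(|(_, out)| out.iter().all(|c| toy_is_seven_bit(*c)))
/// }
/// ```
///
/// Under this sketch, a pair that already contains a non-seven-bit character
/// never makes a program unsafe, whatever its replacement is.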
pub fn is_seven_bit_safe(&self) -> bool {
self.all_pairs_having_replacement()
.filter(|(l, r)| l.is_seven_bit() && r.is_seven_bit())
.flat_map(|(l, r)| self.get_replacement_iter(l, r))
.all(|(c, _kern)| c.is_seven_bit())
}
}
/// An error returned from lig/kern compilation.
///
/// TODO: rename Cycle everywhere including the docs
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopError {
/// The pair of characters that starts the infinite loop.
pub starting_pair: (Option<Char>, Char),
}
impl InfiniteLoopError {
/// Returns the error message that PLtoTF prints for this infinite loop.
pub fn pltotf_message(&self) -> String {
let left = match self.starting_pair.0 {
Some(c) => format!("'{:03o}", c.0),
None => "boundary".to_string(),
};
format!(
"Infinite ligature loop starting with {} and '{:03o}!",
left, self.starting_pair.1 .0
)
}
/// Returns the number of the PLtoTF section that reports this error (PLtoTF.2014.125).
pub fn pltotf_section(&self) -> u8 {
125
}
}
/// One step in a lig/kern infinite loop.
///
/// A vector of these steps is returned in an [`InfiniteLoopError`].
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InfiniteLoopStep {
/// The index of the instruction to apply in this step.
pub instruction_index: usize,
/// The replacement text after applying this step.
///
/// The boolean specifies whether the replacement begins with the
/// left boundary char.
pub post_replacement: (bool, Vec<Char>),
/// The position of the cursor after applying this step.
pub post_cursor_position: usize,
}
/// Data structure describing the replacement of a character pair in a lig/kern program.
pub struct Replacement<'a> {
/// Operation to perform on the left character.
pub left_char_operation: LeftCharOperation,
/// Slice of characters and kerns to insert after the left character.
pub middle_chars: &'a [(Char, Number)],
/// Last character to insert.
pub last_char: Char,
}
impl<'a> Replacement<'a> {
fn no_op(right_char: Char) -> Replacement<'a> {
Replacement {
left_char_operation: LeftCharOperation::Retain,
middle_chars: &[],
last_char: right_char,
}
}
}
impl<'a> Replacement<'a> {
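/// Converts this replacement into an iterator over its characters,
/// each paired with a kern (zero if there is none).
///
/// `left_char` is the left character of the original pair.
pub fn into_iter(self, left_char: Char) -> ReplacementIter<'a> {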
ReplacementIter {
left_char,
full_operation: self,
state: IterState::LeftChar,
}
}
}
/// Operation to perform on the left character of a lig/kern pair.
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
pub enum LeftCharOperation {
/// Retain the left character and do not add a kern.
Retain,
/// Delete the left character.
Delete,
/// Retain the left character and append the specified kern.
AppendKern(Number),
}
/// Iterator over the replacement of a character pair in a lig/kern program.
pub struct ReplacementIter<'a> {
left_char: Char,
full_operation: Replacement<'a>,
state: IterState,
}
enum IterState {
LeftChar,
MiddleChar(usize),
LastChar,
Exhausted,
}
impl<'a> ReplacementIter<'a> {
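/// Computes the next iterator state, along with the item (if any)
/// produced by the transition into that state.
fn i(&self) -> (IterState, Option<(Char, Number)>) {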
match self.state {
IterState::LeftChar => (
IterState::MiddleChar(0),
match self.full_operation.left_char_operation {
LeftCharOperation::Retain => Some((self.left_char, Number::ZERO)),
LeftCharOperation::Delete => None,
LeftCharOperation::AppendKern(kern) => Some((self.left_char, kern)),
},
),
IterState::MiddleChar(i) => match self.full_operation.middle_chars.get(i).copied() {
None => (IterState::LastChar, None),
Some(t) => (IterState::MiddleChar(i + 1), Some(t)),
},
IterState::LastChar => (
IterState::Exhausted,
Some((self.full_operation.last_char, Number::ZERO)),
),
IterState::Exhausted => (IterState::Exhausted, None),
}
}
}
impl<'a> Iterator for ReplacementIter<'a> {
type Item = (Char, Number);
fn next(&mut self) -> Option<Self::Item> {
loop {
let (state, r) = self.i();
self.state = state;
match (&self.state, r) {
(_, Some(t)) => return Some(t),
(IterState::Exhausted, _) => return None,
(_, _) => {}
}
}
}
}
/// An iterator over all pairs of characters that have a lig/kern replacement in a program.
struct PairsIter<'a, L> {
current_left: Char,
left_iter: L,
right_chars: Vec<Char>,
program: &'a CompiledProgram,
}
impl<'a, L: 'a + Iterator<Item = (&'a Char, &'a (u16, u16))>> Iterator for PairsIter<'a, L> {
type Item = (Char, Char);
fn next(&mut self) -> Option<Self::Item> {
match self.right_chars.pop() {
Some(right_char) => Some((self.current_left, right_char)),
None => match self.left_iter.next() {
None => None,
Some((&new_left, (lower, upper))) => {
self.current_left = new_left;
self.right_chars = self.program.pairs[*lower as usize..*upper as usize]
.iter()
.map(|t| t.0)
.collect();
Some((new_left, self.right_chars.pop().unwrap()))
}
},
}
}
}