Font and text overhaul #30161

QuLogic · 2025-06-10T21:03:43Z

PR summary

This PR is intended to hold all font and text PRs from the Font and text overhaul project.

To avoid overwhelming the main repo with the churn of test image replacements, this PR comes from my fork and should only ever have one commit more than the text-overhaul branch: the one with the changes to test images.

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
Plotting related features are demonstrated in an example
New Features and API Changes are noted with a directive and release note
Documentation complies with general and docstring guidelines

Fix center of rotation with rotation_mode='anchor'

Glyph indices are specific to each font. It does not make sense to fall back based on glyph index to another font. This could only really be populated by calling `FT2Font.set_text`, but even that was fragile. If a fallback font was used for a character with the same glyph index as a previous character in the main font, then these lookups could be overwritten to the fallback instead of the main font, with a completely different character! Fortunately, nothing actually uses or requires a fallback through glyph indices.
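To illustrate why a lookup keyed on glyph index is fragile, here is a minimal sketch in plain Python. The two toy character maps and their glyph indices are invented for illustration; they are not taken from any real font or from the `FT2Font` internals.

```python
# Toy character maps: character -> glyph index.
# The indices are made up; in real fonts they are arbitrary per-font numbers.
MAIN_FONT = {"A": 3, "B": 4}
FALLBACK_FONT = {"\u03a9": 3}  # Greek capital omega, same index as "A" above


def font_for(char):
    """Return (font name, glyph index), preferring the main font."""
    if char in MAIN_FONT:
        return "main", MAIN_FONT[char]
    return "fallback", FALLBACK_FONT[char]


def index_by_glyph(text):
    """Fragile design: remember which font to use, keyed by glyph index."""
    lookup = {}
    for char in text:
        font, glyph = font_for(char)
        lookup[glyph] = font  # a later character can overwrite an earlier one
    return lookup


def index_by_char(text):
    """Robust design: remember which font to use, keyed by character."""
    return {char: font_for(char)[0] for char in text}
```

With the text `"A\u03a9"`, the glyph-keyed table ends up pointing index 3 at the fallback font, so a later lookup for the glyph of "A" would land in the wrong font, while the character-keyed table keeps both entries apart.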

Remove ttconv backwards-compatibility code

Remove fallback code for glyph indices

QuLogic · 2025-06-19T22:27:25Z

Also, if you would like to follow along with the figure changes, I've posted a branch that does the changes per merge commit: https://github.com/QuLogic/matplotlib/tree/text-overhaul-figures-per-commit

This allows checking that there are no _new_ failures, without committing the new figures to the repo until the branch is complete.

ci: Preload existing test images from text-overhaul-figures branch

Also, check some expected conditions at parse time instead of somewhere during use of the data.

ci: Fix image preload with multiple conflicts

Add typing to AFM parser

ft2font: Split layouting from set_text

Add os.PathLike support to FT2Font constructor, and FontManager

If the larger glyphs for an auto-sized character in `cmex10` use a character that is in the `latex_to_bakoma` table, then that character is mapped an extra time, usually into `cmr10`. Thus we end up with a large version of a "normal" character, such as an exclamation point.

Instead, map these glyphs through the `latex_to_bakoma` table by using their glyph names as "commands". This ensures they don't get double-mapped to the wrong font and fixes the following issues:

- slash (/) uses a comma at the larger sizes
- right parenthesis uses an exclamation point at the largest size
- left and right braces use parentheses at the largest size
- right floor uses a percentage sign at the largest size
- left ceiling uses an ampersand at the largest size

Also, drop the regular-size braces, as they are the same as the first `big`-sized version.
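The double-mapping problem can be sketched with a tiny stand-in table. Everything here is hypothetical: `TABLE` is a two-entry imitation of `latex_to_bakoma`, and the glyph name and character codes are illustrative assumptions rather than values copied from matplotlib.

```python
# Hypothetical, heavily reduced stand-in for the latex_to_bakoma table:
# command, character, or glyph name -> (font name, character code).
TABLE = {
    "!": ("cmr10", 0x21),               # the ordinary exclamation point
    "parenrightBigg": ("cmex10", 0x21),  # assumed name of a large ")" glyph
}


def resolve_by_char(font, code):
    """Buggy route: treat the code as a character and re-map it via the table."""
    return TABLE.get(chr(code), (font, code))


def resolve_by_name(glyph_name):
    """Fixed route: look the glyph up by its name, used like a command."""
    return TABLE[glyph_name]
```

Resolving code `0x21` of `cmex10` by character yields the `cmr10` exclamation point, whereas resolving by glyph name stays in `cmex10`.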

Fix auto-sized glyphs with BaKoMa fonts

For character codes outside the embedded font limits (256 for type 3 and 65536 for type 42), we output them as XObjects instead of using text commands. But there is nothing in the PDF spec that requires any specific encoding like this.

Since we now support subsetting all fonts before embedding, split each font into groups based on the maximum character code (e.g., 256-entry groups for type 3), then switch text strings to a different font subset and re-map character codes to it when necessary.

This means all text is true text (albeit with some strange encoding), and we no longer need any XObjects for glyphs. For users of non-English text, this means it will become selectable and copyable again.

Fixes matplotlib#21797
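As a rough illustration of the grouping described above, the following sketch splits a character code into a subset number and a code within that subset. The function name `split_code` is made up for this example and is not part of matplotlib.

```python
def split_code(charcode, subset_size=256):
    """Return (subset index, code within subset) for a character code.

    subset_size is 256 for a Type 3 style limit and 65536 for Type 42.
    """
    return divmod(charcode, subset_size)


# An ASCII letter stays in subset 0 with its usual code.
ascii_slot = split_code(ord("A"))
# HIRAGANA LETTER A (U+3042) lands in subset 0x30 with code 0x42.
kana_slot = split_code(0x3042)
# An emoji (U+1F600) needs a second subset even with the Type 42 limit.
emoji_slot = split_code(0x1F600, subset_size=65536)
```

Text is then written with the font subset selected by the first value and the second value as the character code.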

For Type 3 fonts, add a `ToUnicode` mapping (which was added in PDF 1.2), and for Type 42 fonts, correct the Unicode encoding, which should be UTF-16BE, not UCS2.
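The difference between UCS-2 and UTF-16BE only shows up outside the Basic Multilingual Plane, where UTF-16 needs a surrogate pair. A minimal sketch using Python's built-in codec (the helper name `to_unicode_hex` is an assumption for this example):

```python
def to_unicode_hex(text):
    """Encode text as UTF-16BE and return it as uppercase hex digits."""
    return text.encode("utf-16-be").hex().upper()


bmp_char = to_unicode_hex("A")            # one 16-bit unit
emoji = to_unicode_hex("\U0001F600")      # two 16-bit units (surrogate pair)
```

A character such as U+1F600 cannot be represented in a single 16-bit unit, which is why a UCS-2 style mapping is not sufficient for it.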

These characters are outside the BMP and should test subset splitting for type 42 output in PDF.

pdf: Improve text with characters outside embedded font limits

No need to repeat the calculation of subset blocks, but instead offload it to `track_glyph`.

Instead of splitting fonts into `subset_size` blocks and writing text as character code modulo `subset_size`, compress the blocks by doing two things:

1. Preserve the character code if it lies in the first block. This keeps ASCII (for Type 3) and the Basic Multilingual Plane (for Type 42) as their normal codes.
2. Push everything else into the next spot in the next block, splitting by `subset_size` as necessary.

This should reduce the number of additional font subsets to embed.
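The two rules above can be sketched as a small allocator. This is an illustrative simplification, not the actual `CharacterTracker` code: the class name, its attributes, and the decision to start handing out codes at 0 in each later block are all assumptions.

```python
class SubsetAllocator:
    """Assign (subset, code) slots, keeping first-block codes unchanged."""

    def __init__(self, subset_size):
        self.subset_size = subset_size
        self.assigned = {}            # charcode -> (subset, code)
        self.next_free = subset_size  # first slot after block 0

    def track(self, charcode):
        if charcode in self.assigned:
            return self.assigned[charcode]
        if charcode < self.subset_size:
            # Rule 1: codes in the first block keep their normal value.
            slot = (0, charcode)
        else:
            # Rule 2: everything else takes the next free spot, rolling
            # over into a new block every subset_size entries.
            slot = divmod(self.next_free, self.subset_size)
            self.next_free += 1
        self.assigned[charcode] = slot
        return slot
```

With a `subset_size` of 256, U+3042 and U+1F600 share subset 1 here, whereas plain modulo splitting would place them in two different, mostly empty subsets.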

If mixing languages, sometimes a single character may use different glyphs in one document. In that case, we need to give it a new character code in the next subset, since subset 0 is preserving character codes.

For ligatures or complex shapings, multiple characters may map to a single glyph. In this case, we still want to output a single character code for the string using the font subset, but the `ToUnicode` map should give back all the characters.
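Both special cases (one character with several glyphs, and several characters sharing one glyph) can be handled by keying the bookkeeping on the pair of source characters and glyph index. The sketch below is a hypothetical simplification: `track_glyph` is named after the method mentioned in this PR, but its signature and the class around it are assumptions.

```python
class GlyphTracker:
    """Track glyphs per font subset and remember their source characters."""

    def __init__(self, subset_size=256):
        self.subset_size = subset_size
        self.slots = {}       # (chars, glyph index) -> (subset, code)
        self.to_unicode = {}  # (subset, code) -> source characters
        self.next_free = subset_size

    def track_glyph(self, chars, glyph):
        key = (chars, glyph)
        if key in self.slots:
            return self.slots[key]
        slot = None
        if len(chars) == 1 and ord(chars) < self.subset_size:
            candidate = (0, ord(chars))
            # Keep the normal code only if no other glyph already owns it.
            if candidate not in self.to_unicode:
                slot = candidate
        if slot is None:
            slot = divmod(self.next_free, self.subset_size)
            self.next_free += 1
        self.slots[key] = slot
        # The reverse map returns every character the glyph stands for.
        self.to_unicode[slot] = chars
        return slot
```

A second glyph for "a" is pushed into subset 1, and a single ligature glyph for "ffi" gets one character code whose reverse mapping is the full three-character string.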

Previously, this was supposed to "upgrade" type 3 to type 42 if the number of glyphs overflowed. However, as `CharacterTracker` can suggest a new subset for other reasons (i.e., multiple glyphs for the same character or a glyph for multiple characters may go to a second subset), we do need proper subset handling here as well. Since that is now done, we can drop the "promotion" from type 3 to type 42, as we don't get too many glyphs in each embedded font.

Prepare `CharacterTracker` for advanced font features

Font features allow font designers to provide alternate glyphs or shaping within a single font. These features may be accessed via special tags corresponding to internal tables of glyphs.

The mplcairo backend supports font features via an elaborate re-use of the font file path [1]. This commit adds the API to make this officially supported in the main user API.

[1] https://github.com/matplotlib/mplcairo/blob/v0.6.1/README.rst#font-formats-and-features

Add font feature API to Text

Previously, in a mathtext string like `r"$\sin x$"`, a thin space would (correctly) be added between "sin" and "x", but that space would be missing in expressions like `r"$\max f$"`. The difference arose because of the slightly different handling of subscripts and superscripts after the `\sin` and `\max` operators: `\sin^n` puts the superscript as a normal exponent, but `\max_x` puts the subscript centered below the operator name (an "overunder" symbol). The previous code for inserting the thin space did not handle the "overunder" case; fix that.

The new behavior is tested by the change in test_operator_space, as well as by mathtext1_dejavusans_06. The change in mathtext_foo_29 arises because the extra thin space now inserted after `\limsup` slightly shifts the centering of the whole string.

Ideally that thin space should be suppressed if there's no token after the operator, but that is not currently implemented for e.g. `\sin` either (compare the right-alignments in `text(.5, .9, r"$\sin$", ha="right"); text(.5, .8, r"$\mathrm{sin}$", ha="right"); axvline(.5)`, where the extra thin space after `\sin` is visible), so this patch just makes things more consistent.

Rename _in_subscript_or_superscript to the more descriptive _needs_space_after_subsuper; simplify its setting in operatorname(); avoid the need to introduce an extra explicitly-typed spaced_nucleus variable.

anntzer and others added 3 commits June 5, 2025 03:33

Remove ttconv backwards-compatibility code

8caff88

Fix center of rotation with rotation_mode='anchor'

c44db77

Merge pull request matplotlib#29199 from WPurre/fixing-rotation-bug

f1cdc19

Fix center of rotation with rotation_mode='anchor'

QuLogic added this to the v3.11.0 milestone Jun 10, 2025

QuLogic added the status: waiting for other PR label Jun 10, 2025

QuLogic added this to Font and text overhaul Jun 10, 2025

github-project-automation bot moved this to Waiting for other PR in Font and text overhaul Jun 10, 2025

github-actions bot added topic: mplot3d backend: agg labels Jun 10, 2025

QuLogic added 4 commits June 12, 2025 18:46

Merge pull request matplotlib#30145 from QuLogic/remove-ttconv

7dafe63

Remove ttconv backwards-compatibility code

Merge pull request matplotlib#30168 from QuLogic/no-glyph-fallback

67d1a02

Remove fallback code for glyph indices

Merge branch 'main' into text-overhaul

bb9aae4

QuLogic force-pushed the text-overhaul-figures branch from 045897c to e4be26c Compare June 19, 2025 22:15

github-actions bot added topic: path handling backend: pdf topic: text/fonts labels Jun 19, 2025

QuLogic force-pushed the text-overhaul-figures branch from e4be26c to 2b3f5c5 Compare June 19, 2025 22:25

QuLogic mentioned this pull request Jun 26, 2025

Fixed several accuracy bugs with image resampling #30184

QuLogic added 8 commits July 7, 2025 19:06

Merge branch 'main' into text-overhaul

a7fd524

ci: Preload existing test images from text-overhaul-figures branch

389373e

This allows checking that there are no _new_ failures, without committing the new figures to the repo until the branch is complete.

Merge pull request matplotlib#30231 from QuLogic/preload-ci

4d47644

ci: Preload existing test images from text-overhaul-figures branch

Add typing to AFM parser

a018606

Also, check some expected conditions at parse time instead of somewhere during use of the data.

ci: Fix image preload with multiple conflicts

aff20cf

Merge pull request matplotlib#30274 from QuLogic/fix-preload

572540d

ci: Fix image preload with multiple conflicts

Merge pull request matplotlib#30134 from QuLogic/afm-typing

f231f2e

Add typing to AFM parser

Remove kerning_factor from tests

7b4d725

QuLogic force-pushed the text-overhaul-figures branch from 2b3f5c5 to b17bef1 Compare July 10, 2025 04:06

github-actions bot added the backend: ps label Jul 10, 2025

QuLogic and others added 2 commits September 25, 2025 13:50

Merge pull request matplotlib#30595 from QuLogic/libraqm-refactor

70563bd

ft2font: Split layouting from set_text

Merge pull request matplotlib#30573 from QuLogic/font-pathlike

1b3ba17

Add os.PathLike support to FT2Font constructor, and FontManager

QuLogic force-pushed the text-overhaul-figures branch from 16619c6 to 2117a71 Compare September 25, 2025 21:31

QuLogic added 7 commits September 25, 2025 17:32

Remove dead code from Auto{Height,Width}Char

e422def

Merge pull request matplotlib#29936 from QuLogic/bakoma-sizes

b6be596

Fix auto-sized glyphs with BaKoMa fonts

pdf: Correct Unicode mapping for out-of-range font chunks

1c4af68

For Type 3 fonts, add a `ToUnicode` mapping (which was added in PDF 1.2), and for Type 42 fonts, correct the Unicode encoding, which should be UTF-16BE, not UCS2.

TST: Add emoji to multi-font text

6cedcf7

These characters are outside the BMP and should test subset splitting for type 42 output in PDF.

DOC: Add a release note for PDF font embedding fixes

c908bbf

QuLogic force-pushed the text-overhaul-figures branch from 2117a71 to d56b646 Compare September 26, 2025 00:08

Merge pull request matplotlib#30512 from QuLogic/pdf-text-subsets

a1ed4ef

pdf: Improve text with characters outside embedded font limits

QuLogic force-pushed the text-overhaul-figures branch from d56b646 to 9703849 Compare September 27, 2025 05:54

QuLogic added 5 commits September 29, 2025 17:45

Deduplicate CharacterTracker.track implementation

50f76ff

No need to repeat the calculation of subset blocks, but instead offload it to `track_glyph`.

pdf: Fix first-block characters using multiple glyph representations

70dc388

If mixing languages, sometimes a single character may use different glyphs in one document. In that case, we need to give it a new character code in the next subset, since subset 0 is preserving character codes.

github-actions bot added the status: needs rebase label Oct 2, 2025

QuLogic and others added 8 commits October 2, 2025 19:00

Merge pull request matplotlib#30608 from QuLogic/simpler-track

ed4ca6c

Prepare `CharacterTracker` for advanced font features

Merge pull request matplotlib#29695 from QuLogic/font-features

a6ac58f

Add font feature API to Text

Merge branch 'main' into text-overhaul

a0fb5cf

Merge branch 'main' into text-overhaul

707f384

Tweak sub/superscript spacing implementation.

17428e3

Rename _in_subscript_or_superscript to the more descriptive _needs_space_after_subsuper; simplify its setting in operatorname(); avoid the need to introduce an extra explicitly-typed spaced_nucleus variable.

Update test images for font/text overhaul

ca9c54c

QuLogic force-pushed the text-overhaul-figures branch from 51d80cc to ca9c54c Compare November 6, 2025 17:26

github-actions bot removed the status: needs rebase label Nov 6, 2025

Update test images for previous libraqm-vector changes

2c24cd0
