
Conversation

@QuLogic
Member

@QuLogic QuLogic commented Jun 10, 2025

PR summary

This PR is intended to hold all font and text PRs from the "Font and text overhaul" project.

To avoid overwhelming the main repo with the churn of test image replacements, this PR comes from my fork and should only ever have one commit more than the text-overhaul branch: the commit with the changes to test images.

PR checklist

QuLogic added 4 commits June 12, 2025 18:46
Glyph indices are specific to each font. It does not make sense to fall
back based on glyph index to another font.

This could only really be populated by calling `FT2Font.set_text`, but
even that was fragile. If a fallback font was used for a character with
the same glyph index as a previous character in the main font, then
these lookups could be overwritten to the fallback instead of the main
font, with a completely different character!

Fortunately, nothing actually uses or requires a fallback through glyph
indices.
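
As a rough illustration of the collision described above, here is a toy sketch. The two font tables, the glyph index values, and the `glyph_to_font` cache are hypothetical stand-ins, not Matplotlib's actual data structures.

```python
# Hypothetical character -> glyph index tables for two fonts.
# Each font assigns glyph indices independently of every other font.
main_font = {"A": 36, "B": 37}
fallback_font = {"あ": 36}  # same index as "A" in main_font, different glyph

# A cache keyed only by glyph index cannot tell the two fonts apart.
glyph_to_font = {}
for char in "Aあ":
    if char in main_font:
        glyph_to_font[main_font[char]] = "main"
    else:
        glyph_to_font[fallback_font[char]] = "fallback"

# The entry recorded for "A" (index 36) has been overwritten by the fallback.
clobbered = glyph_to_font[main_font["A"]]

# Keying by character instead avoids the collision.
char_to_font = {c: ("main" if c in main_font else "fallback") for c in "Aあ"}
```

In this sketch `clobbered` ends up as `"fallback"` even though "A" lives in the main font, while the character-keyed table keeps the two apart.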
Remove ttconv backwards-compatibility code
@QuLogic
Member Author

QuLogic commented Jun 19, 2025

Also, if you would like to follow along with the figure changes, I've posted a branch that does the changes per merge commit: https://github.com/QuLogic/matplotlib/tree/text-overhaul-figures-per-commit

QuLogic added 8 commits July 7, 2025 19:06
This allows checking that there are no _new_ failures, without
committing the new figures to the repo until the branch is complete.
ci: Preload existing test images from text-overhaul-figures branch
Also, check some expected conditions at parse time instead of somewhere
during use of the data.
ci: Fix image preload with multiple conflicts
@QuLogic QuLogic force-pushed the text-overhaul-figures branch from 2b3f5c5 to b17bef1 Compare July 10, 2025 04:06
QuLogic and others added 2 commits September 25, 2025 13:50
Add os.PathLike support to FT2Font constructor, and FontManager
@QuLogic QuLogic force-pushed the text-overhaul-figures branch from 16619c6 to 2117a71 Compare September 25, 2025 21:31
If the larger glyphs for an auto-sized character in `cmex10` use a
character that is in the `latex_to_bakoma` table, then that character
will be mapped an extra time into `cmr10` (usually). Thus we end up with
a large version of a "normal" character, such as an exclamation point.

Instead map these glyphs through the `latex_to_bakoma` table by using
their glyph names as "commands". This ensures they don't get
double-mapped to the wrong font and fixes the following issues:

- slash (/) uses a comma at the larger sizes
- right parenthesis uses an exclamation point at the largest size
- left and right braces use parentheses at the largest size
- right floor uses a percentage sign at the largest size
- left ceiling uses an ampersand at the largest size

Also, drop the regular size braces, as they are the same as the first
`big`-sized version.
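
The double mapping can be sketched with a toy table. The entries, the character code, and the glyph name `parenrightBigg` below are illustrative assumptions; the real `latex_to_bakoma` table is much larger and these helper functions do not exist in Matplotlib.

```python
# Toy stand-in for the real table: key -> (font name, character code).
latex_to_bakoma = {
    "!": ("cmr10", 0x21),                # ordinary exclamation point
    "parenrightBigg": ("cmex10", 0x21),  # assumed glyph name used as a "command"
}


def lookup_by_char(code):
    """Sketch of the old path: treat the cmex10 code as a character."""
    # chr(0x21) is "!", which is in the table, so the code is mapped again.
    return latex_to_bakoma.get(chr(code), ("cmex10", code))


def lookup_by_name(glyph_name):
    """Sketch of the new path: look the glyph up by its name."""
    return latex_to_bakoma[glyph_name]


double_mapped = lookup_by_char(0x21)      # lands in cmr10: an exclamation point
direct = lookup_by_name("parenrightBigg")  # stays in cmex10
```

Looking up by name never passes through the character-keyed entries, so the large glyph cannot be redirected to a text font.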
Fix auto-sized glyphs with BaKoMa fonts
For character codes outside the embedded font limits (256 for type 3 and
65536 for type 42), we output them as XObjects instead of using text
commands. But there is nothing in the PDF spec that requires any
specific encoding like this.

Since we now support subsetting all fonts before embedding, split each
font into groups based on the maximum character code (e.g., 256-entry
groups for type 3), then switch text strings to a different font subset
and re-map character codes to it when necessary.

This means all text is true text (albeit with some strange encoding),
and we no longer need any XObjects for glyphs. For users of non-English
text, this means it will become selectable and copyable again.

Fixes matplotlib#21797
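
A minimal sketch of the grouping described above, assuming the simplest scheme where the subset is the character code divided by the group size and the in-subset code is the remainder. The function names are hypothetical and are not part of the PDF backend.

```python
from itertools import groupby


def split_code(charcode, subset_size):
    """Return (subset index, character code within that subset)."""
    return divmod(charcode, subset_size)


def runs_by_subset(text, subset_size):
    """Break a string into consecutive runs that share one font subset."""
    coded = [split_code(ord(c), subset_size) for c in text]
    return [
        (subset, [code for _, code in group])
        for subset, group in groupby(coded, key=lambda pair: pair[0])
    ]


# With 256-entry groups (the type 3 limit), ASCII stays in subset 0 and
# U+65E5 lands in subset 0x65 with code 0xE5.
runs = runs_by_subset("A日", 256)
```

Each run can then be written with a single font-selection command followed by the re-mapped codes, which is what keeps the text selectable.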
For Type 3 fonts, add a `ToUnicode` mapping (which was added in PDF
1.2), and for Type 42 fonts, correct the Unicode encoding, which should
be UTF-16BE, not UCS2.
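
The difference only shows up outside the BMP: UCS-2 has no way to represent those code points, while UTF-16BE encodes them as a surrogate pair. A small sketch (the helper name is hypothetical):

```python
def to_utf16be_hex(text):
    """Hex string of the UTF-16BE encoding, as used for ToUnicode targets."""
    return text.encode("utf-16-be").hex().upper()


bmp = to_utf16be_hex("A")               # one 16-bit unit
astral = to_utf16be_hex("\U0001F600")   # two 16-bit units (surrogate pair)
```

Here `bmp` is `0041` and `astral` is `D83DDE00`.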
These characters are outside the BMP and should test subset splitting
for type 42 output in PDF.
@QuLogic QuLogic force-pushed the text-overhaul-figures branch from 2117a71 to d56b646 Compare September 26, 2025 00:08
pdf: Improve text with characters outside embedded font limits
@QuLogic QuLogic force-pushed the text-overhaul-figures branch from d56b646 to 9703849 Compare September 27, 2025 05:54
No need to repeat the calculation of subset blocks, but instead offload
it to `track_glyph`.
Instead of splitting fonts into `subset_size` blocks and writing text as
character code modulo `subset_size`, compress the blocks by doing two
things:

1. Preserve the character code if it lies in the first block. This keeps
   ASCII (for Type 3) and the Basic Multilingual Plane (for Type 42) as
   their normal codes.
2. Push everything else into the next spot in the next block, splitting
   by `subset_size` as necessary.

This should reduce the number of additional font subsets to embed.
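
The two rules above can be sketched as follows. `SubsetTracker` is a hypothetical stand-in written for illustration; it is not the actual `CharacterTracker` implementation and ignores details such as reserved codes.

```python
class SubsetTracker:
    """Toy model of the compressed subset assignment."""

    def __init__(self, subset_size):
        self.subset_size = subset_size
        self.assigned = {}  # charcode -> (subset index, code in subset)
        self.overflow = []  # charcodes pushed past the first block, in order

    def track(self, charcode):
        if charcode in self.assigned:
            return self.assigned[charcode]
        if charcode < self.subset_size:
            # Rule 1: codes in the first block keep their value.
            result = (0, charcode)
        else:
            # Rule 2: take the next free spot, starting a new block
            # every `subset_size` entries.
            subset, code = divmod(len(self.overflow), self.subset_size)
            result = (subset + 1, code)
            self.overflow.append(charcode)
        self.assigned[charcode] = result
        return result


tracker = SubsetTracker(256)
ascii_a = tracker.track(ord("A"))  # stays at (0, 65)
first = tracker.track(0x65E5)      # first overflow character -> (1, 0)
second = tracker.track(0x672C)     # next overflow character -> (1, 1)
again = tracker.track(0x65E5)      # already tracked, same slot as before
```

Under the plain modulo scheme these two CJK characters would have required two extra subsets (0x65 and 0x67); here they share one.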
If mixing languages, sometimes a single character may use different
glyphs in one document. In that case, we need to give it a new character
code in the next subset, since subset 0 is preserving character codes.
For ligatures or complex shapings, multiple characters may map to a
single glyph. In this case, we still want to output a single character
code for the string using the font subset, but the `ToUnicode` map
should give back all the characters.
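
One way to picture both cases is to key the code assignment on the pair of source characters and glyph index, and to write the full character string as the `ToUnicode` target. This is only a sketch: the function names, the glyph indices, and the starting code are assumptions, and only the `<code> <string>` line format follows the usual `bfchar` layout.

```python
def assign_codes(shaped, first_free=1):
    """Give each distinct (characters, glyph index) pair its own code."""
    codes = {}
    for chars, glyph in shaped:
        if (chars, glyph) not in codes:
            codes[(chars, glyph)] = first_free + len(codes)
    return codes


def tounicode_entry(code, chars):
    """Format one bfchar-style line mapping a code back to its characters."""
    target = chars.encode("utf-16-be").hex().upper()
    return f"<{code:02X}> <{target}>"


# Hypothetical shaping output: one ligature glyph for "ffi", and the same
# character rendered with two different glyphs (e.g. two language variants).
shaped = [("ffi", 210), ("直", 500), ("直", 501)]
codes = assign_codes(shaped)
entries = [tounicode_entry(code, chars) for (chars, _), code in codes.items()]
```

The ligature gets a single code whose entry expands to three characters, and the two glyph variants get separate codes that both map back to the same character.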
Previously, this was supposed to "upgrade" type 3 to type 42 if the
number of glyphs overflowed. However, as `CharacterTracker` can suggest
a new subset for other reasons (i.e., multiple glyphs for the same
character or a glyph for multiple characters may go to a second subset),
we do need proper subset handling here as well.

Since that is now done, we can drop the "promotion" from type 3 to type
42, as we don't get too many glyphs in each embedded font.
QuLogic and others added 8 commits October 2, 2025 19:00
Prepare `CharacterTracker` for advanced font features
Font features allow font designers to provide alternate glyphs or
shaping within a single font. These features may be accessed via special
tags corresponding to internal tables of glyphs.

The mplcairo backend supports font features via an elaborate re-use of
the font file path [1]. This commit adds the API to make this officially
supported in the main user API.

[1] https://github.com/matplotlib/mplcairo/blob/v0.6.1/README.rst#font-formats-and-features
Previously, in a mathtext string like `r"$\sin x$"`, a thin space would
(correctly) be added between "sin" and "x", but that space would be
missing in expressions like `r"$\max f$"`.  The difference arose because
of the slightly different handling of subscripts and superscripts
after the `\sin` and `\max` operators: `\sin^n` puts the superscript as
a normal exponent, but `\max_x` puts the subscript centered below the
operator name ("overunder" symbol).  The previous code for inserting the
thin space did not handle the "overunder" case; fix that.  The new
behavior is tested by the change in test_operator_space, as well as by
mathtext1_dejavusans_06.

The change in mathtext_foo_29 arises because the extra thin space now
inserted after `\limsup` slightly shifts the centering of the whole
string.  Ideally that thin space should be suppressed if there's no
token after the operator, but that's not something currently implemented
either for e.g. `\sin` (compare e.g. the right-alignments in
`text(.5, .9, r"$\sin$", ha="right"); text(.5, .8, r"$\mathrm{sin}$", ha="right"); axvline(.5)`
where the extra thin space after `\sin` is visible), so this patch just
makes things more consistent.
Rename _in_subscript_or_superscript to the more descriptive
_needs_space_after_subsuper; simplify its setting in operatorname();
avoid the need to introduce an extra explicitly-typed spaced_nucleus
variable.

Projects

Status: Waiting for other PR


3 participants