-
-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Prepare CharacterTracker for advanced font features
#30608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
58874b6 to
5afd71b
Compare
CharacterTracker.track implementationCharacterTracker for advanced font features
|
Note, I think the original commit was small, and the remaining ended up small enough, that I just put them all in this PR. |
| print("%%BeginProlog", file=fh) | ||
| if not mpl.rcParams['ps.useafm']: | ||
| Ndict += len(ps_renderer._character_tracker.used) | ||
| Ndict += sum(map(len, ps_renderer._character_tracker.used.values()), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 0 at the end is unneeded.
No need to repeat the calculation of subset blocks, but instead offload it to `track_glyph`.
5e89363 to
662ac58
Compare
Instead of splitting fonts into `subset_size` blocks and writing text as character code modulo `subset_size`, compress the blocks by doing two things: 1. Preserve the character code if it lies in the first block. This keeps ASCII (for Type 3) and the Basic Multilingual Plane (for Type 42) as their normal codes. 2. Push everything else into the next spot in the next block, splitting by `subset_size` as necessary. This should reduce the number of additional font subsets to embed.
If mixing languages, sometimes a single character may use different glyphs in one document. In that case, we need to give it a new character code in the next subset, since subset 0 is preserving character codes.
662ac58 to
c21a0b0
Compare
|
OK, I've handled all your comments, I think. I also fixed subsetting in the PostScript backend, noted above. There are 3 test image changes:
|
anntzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor point regarding a comment.
For ligatures or complex shapings, multiple characters may map to a single glyph. In this case, we still want to output a single character code for the string using the font subset, but the `ToUnicode` map should give back all the characters.
Previously, this was supposed to "upgrade" type 3 to type 42 if the number of glyphs overflowed. However, as `CharacterTracker` can suggest a new subset for other reasons (i.e., multiple glyphs for the same character or a glyph for multiple characters may go to a second subset), we do need proper subset handling here as well. Since that is now done, we can drop the "promotion" from type 3 to type 42, as we don't get too many glyphs in each embedded font.
c21a0b0 to
d781040
Compare
|
Removed the image changes (and moved them to the |
|
Linting issues are known (#30626), so merging over those. |
PR summary
The original code split fonts into subsets based on the character modulo the subset size (determined by font type limits). This had some limitations:
ordon a multi-char string.To fix this,
CharacterTrackernow tracks characters and glyphs more closely. Specifically,With these changes, the complex/font features/languages tests in #30607 produce correct results.
PR checklist