3312 Commits

Author SHA1 Message Date
Jonas Jenwald
f26f984fa0 Improve validation in the Catalog.prototype.openAction getter
When the /OpenAction data is an Array we're currently using it as-is which could theoretically cause problems in corrupt PDF documents, hence we ensure that a "raw" destination is actually valid. (This change is covered by existing unit-tests.)

*Note:* In the Dictionary case we're using the `Catalog.parseDestDictionary` method, which already handles all of the necessary validation.
2025-05-10 11:51:58 +02:00
calixteman
293506ada7
Merge pull request #19903 from Snuffleupagus/shorten-fieldObjects-getter
Shorten the `PDFDocument.prototype.fieldObjects` getter slightly
2025-05-09 15:49:51 +02:00
Jonas Jenwald
1f7581b5c6 Shorten the PDFDocument.prototype.fieldObjects getter slightly
The effect is probably not even measurable, however this patch ever so slightly reduces the asynchronicity in the `fieldObjects` getter. These changes should be safe since:

 - We're inside of the `PDFDocument`-class and the `annotationGlobals`-getter, which will always return a (shadowed) Promise and won't throw `MissingDataException`s, can be accessed directly without going through the `BasePdfManager`-instance.

 - The `acroForm`-dictionary can be accessed through the `annotationGlobals`-data, removing the need to "manually" look it up and thus the need for using `Promise.all` here.

 - We can also lookup the /Fields-data, in the `acroForm`-dictionary, synchronously since the initial `formInfo.hasFields` check guarantees that it's available.
2025-05-07 17:47:09 +02:00
Jonas Jenwald
36fafbc05c Use object destructuring a bit more in the src/core/document.js file 2025-05-07 13:41:50 +02:00
Jonas Jenwald
92b065c87e Replace a number of semi-private fields with actual private ones in src/core/document.js
These are fields that can be moved out of their class constructors, and be initialized directly.
2025-05-07 13:41:44 +02:00
Jonas Jenwald
39803a9f25 Replace a number of semi-private methods with actual private ones in src/core/document.js
There's a few remaining cases that are used with either cached getters or `BasePdfManager.prototype.ensure`-methods, and those cannot be converted.
2025-05-07 13:41:36 +02:00
Jonas Jenwald
0ded85e9b3 Add a Page helper method to create a PartialEvaluator-instance
Currently we repeat the same identical code five times in the `Page`-class when creating a `PartialEvaluator`-instance, which given the number of parameters it needs seems like unnecessary duplication.
2025-05-07 13:41:29 +02:00
Jonas Jenwald
62009ffa70 Simplify how the ObjectLoader is used
The `ObjectLoader.prototype.load` method has a fast-path, which avoids any lookup/parsing if the entire PDF document is already loaded.
However, we still need to create an `ObjectLoader`-instance which seems unnecessary in that case.

Hence we introduce a *static* `ObjectLoader.load` method, which will help avoid creating `ObjectLoader`-instances needlessly and also (slightly) shortens the call-sites.
To ensure that the new method will be used, we extend the `no-restricted-syntax` ESLint rule to "forbid" direct usage of `new ObjectLoader()`.
2025-05-06 15:49:59 +02:00
Jonas Jenwald
ef1ad675c2 Unify method return values in the ObjectLoader class
Given that all the methods are already asynchronous we can just use `await` more throughout this code, rather than having to explicitly return function-calls and `undefined`.
Note also how none of the `ObjectLoader.prototype.load` call-sites use the return value.
2025-05-06 15:43:00 +02:00
Calixte Denizet
ac925f4f1b Downscale jpeg2000 images, if needed, while decoding them
It fixes #19517.
2025-05-05 22:39:59 +02:00
Jonas Jenwald
d9548b1c18 Slightly re-factor how we pre-load fonts and images in XFA documents
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
2025-05-04 13:44:33 +02:00
Jonas Jenwald
604153957a Reduce duplication when parsing fonts in loadXfaFonts
Currently we repeat virtually the same code when calling the `PartialEvaluator.prototype.handleSetFont` method, which we can avoid by introducing an inline helper function.
2025-05-04 13:42:17 +02:00
Jonas Jenwald
2979e23f3c Ensure that XFAFactory.prototype.isValid returns a boolean value
Considering the name of the method, and how it's actually being used, you'd expect it to return a boolean value.
Given how it's currently being used this inconsistency doesn't cause any issues, however we should still fix this.
2025-05-04 13:42:17 +02:00
Tim van der Meij
5ca57fbd4b
Merge pull request #19885 from Snuffleupagus/loadXfaImages-simplify
Simplify the `loadXfaImages` method and related code
2025-05-04 13:41:06 +02:00
Tim van der Meij
22cb3080ee
Merge pull request #19887 from Snuffleupagus/serializeXfaData-simplify
Simplify the `serializeXfaData` method and related code
2025-05-04 13:38:01 +02:00
Jonas Jenwald
b3e16800f5 Remove the BasePdfManager.prototype.catalog getter
This is only invoked *once* and it can be trivially replaced by the `ensureCatalog`-method, since the code where it's used is already asynchronous.
2025-05-03 13:40:23 +02:00
Jonas Jenwald
b531720d9c Simplify the serializeXfaData method and related code
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
2025-05-03 11:20:42 +02:00
Jonas Jenwald
122822a750 Simplify the loadXfaImages method and related code
Currently we create an intermediate `Dict` during parsing, however that seems unnecessary since (note especially the second point):
 - The `NameOrNumberTree.prototype.getAll` method will already resolve any references, as needed, during parsing.
 - The `Catalog.prototype.xfaImages` getter is invoked, via the `BasePdfManager`-instance, such that any `MissingDataException`s are already handled correctly.
2025-05-02 11:53:41 +02:00
calixteman
91bfe12f38
Merge pull request #19883 from gpanakkal/checkbutton-tostyle
Fix arguments in `toStyle` call in `CheckButton`
2025-05-01 22:08:03 +02:00
Gautam Panakkal
7bba3bd4ad Add missing this arg to toStyle in CheckButton.prototype.[$toHTML] 2025-05-01 10:19:28 -07:00
Jonas Jenwald
b629bafd1c Allow to, optionally, keep Unicode escape sequences in stringToPDFString (PR 17331 follow-up)
Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.

Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
 - Parsing named destinations[2] and URLs.
 - Handling "strings" that are actually /Name-instances.
 - Building a lookup Object/Map based on some PDF data-structure.

*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.

---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".

[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
2025-04-30 20:51:10 +02:00
Jonas Jenwald
254431df1e Avoid extra lookup/parsing when all destinations are already available
Whenever we cannot find a destination we'll fallback to checking all destinations, to account for e.g. out-of-order NameTrees, and in those cases any subsequent destination-lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
2025-04-30 15:26:00 +02:00
Jonas Jenwald
0922aa9e9d
Merge pull request #19880 from Snuffleupagus/numberToString-assert-number
Assert that `numberToString` is called with a number (issue 19877)
2025-04-29 20:35:32 +02:00
Jonas Jenwald
f5faf86180 Assert that numberToString is called with a number (issue 19877)
*NOTE:* Given that this is an *internal* function, used only in the worker-thread, it's not clear to me that this is an entirely "necessary" change.
2025-04-29 20:31:24 +02:00
Calixte Denizet
7a251b206e Fix the bbox when saving a rotated text field (bug 1963407) 2025-04-29 18:49:07 +02:00
Jonas Jenwald
abc9522886 Avoid (most) string parsing when removing/replacing the hash property of a URL 2025-04-25 23:13:05 +02:00
calixteman
efc5c3c231
Merge pull request #19862 from calixteman/bug1961423
Fix 'print to pdf' on Mac with a cid font (bug 1961423)
2025-04-25 15:11:55 +02:00
Jonas Jenwald
312c85bfd6
Merge pull request #19815 from Snuffleupagus/getMergedResources-size
Ensure that "local" /Contents stream-dict /Resources aren't empty (PR 19803 follow-up)
2025-04-25 10:46:04 +02:00
Calixte Denizet
785991a97c Fix 'print to pdf' on Mac with a cid font (bug 1961423) 2025-04-24 20:19:12 +02:00
Jonas Jenwald
64007e777e Ensure that the /Form XObject /Resources-entry is actually a dictionary (issue 19848) 2025-04-23 10:19:20 +02:00
Jonas Jenwald
adc9eb5a5a Always fallback to checking all destinations, when lookup fails (issue 19835)
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
2025-04-20 14:53:10 +02:00
Jonas Jenwald
91ba147317 Check that the Object.prototype hasn't been incorrectly extended (PR 11582 follow-up)
This complements, and extends, the existing check of the `Array.prototype` in the worker-thread.
To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.
2025-04-18 12:19:29 +02:00
calixteman
4b1875c8c0
Merge pull request #19825 from calixteman/bug1961107
Avoid to create any subarrays when optimizing 'save, transform, constructPath, restore' (bug 1961107)
2025-04-17 19:42:28 +02:00
Calixte Denizet
d7cbda6cb5 Avoid to create any subarrays when optimizing 'save, transform, constructPath, restore' (bug 1961107)
Removing those `subarray`calls helps to improve performance by a factor 6 on Linux and by a factor of 3
on Windows 11.
2025-04-17 19:14:01 +02:00
Jonas Jenwald
bf553f22da Ensure that the /P-entry is actually a dictionary in StructTreePage.prototype.addNode
This may fix issue 19822, but without a test-case it's simply impossible to know for sure.
2025-04-17 14:01:53 +02:00
Jonas Jenwald
76f23ce3b5 Catch, and ignore, errors during Page.prototype.getStructTree
This way any errors thrown during parsing of the page-structTree will not be forwarded to the viewer.
2025-04-17 13:57:30 +02:00
Jonas Jenwald
245d9ba925 Ensure that "local" /Contents stream-dict /Resources aren't empty (PR 19803 follow-up)
This is a small, and quite possibly pointless, optimization which ensures that any "local" /Resources aren't empty, to avoid needlessly trying to load and merge dictionaries.
2025-04-14 09:58:15 +02:00
Jonas Jenwald
834423b51d Add more logical assignment in the src/ folder
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).
2025-04-12 17:28:33 +02:00
Jonas Jenwald
1c80412f61 Change PDFDocument.prototype._xfaStreams to return a Map
Using a `Map` rather than an `Object` is a nicer, since it has better support for both iteration and checking if a key exists.
We also change the initial values to be `null`, rather than empty strings, and reduce duplication when creating the `Map`.

*Please note:* Since this is worker-thread code, these changes are "invisible" at the API-level.
2025-04-12 12:47:22 +02:00
Jonas Jenwald
1048508dd1 Catch circular references in /Form XObjects (issue 19800)
For simplicity we will abort /Form XObject parsing *immediately* when encountering a circular reference, rather than letting it continue up until some limit (as e.g. PDFium appears to do), which should be fine since there are never any guarantees if/how *corrupt* PDF documents will render.
2025-04-11 16:54:22 +02:00
Jonas Jenwald
7a94fafd30 Prefer /Resources from the /Contents stream-dict, if available
In rare cases /Resources are also found in the /Contents stream-dict, in addition to in the /Page dict, hence we need to prefer those when available; see `issue18894.pdf`.
2025-04-11 16:54:22 +02:00
Jonas Jenwald
835a456767 Use adjustWidths unconditionally for all embedded fonts (issue 19802)
Previously we'd only do this for Type1/CFF fonts, see e.g. PR 6736, since the font-program may update the /FontMatrix.
However, it seems that we should do this unconditionally to account for fonts with non-default /FontMatrix-entries in the font-dictionary (which seem to be pretty rare).
2025-04-11 15:01:35 +02:00
Jonas Jenwald
fbc4f4b12a Handle non-integer and out-of-range values correctly in Indexed color spaces
In PDF version 2.0 the handling of Indexed color spaces was clarified as follows:
> The index value should be an integer in the range 0 to hival. If the value is a real number, it shall be rounded to the nearest integer (0.5 values shall be rounded up); if it is outside the range 0 to hival, it shall be adjusted to the nearest value within that range.

Please refer to https://github.com/pdf-association/pdf-differences/tree/main/IndexedColor
2025-04-09 15:31:49 +02:00
Jonas Jenwald
12c7c7b0af
Merge pull request #19773 from Snuffleupagus/inline-PDFImage-createRawMask
Inline `PDFImage.createRawMask` in the `PDFImage.createMask` method
2025-04-08 17:19:09 +02:00
Jonas Jenwald
dc3e24a76a Inline PDFImage.createRawMask in the PDFImage.createMask method
After the introduction of `OffscreenCanvas` support we now have *two separate* mask-methods in the `PDFImage` class, and the reason that they were not combined is likely that we need the "raw" bytes when parsing Type3-glyph image masks.
However, that case is easy to support simply by disabling `OffscreenCanvas` usage when parsing Type3-glyphs and that way we're able to reduce some code duplication.

Another slightly strange property of the `PDFImage.createMask` method is that it needs various image-dictionary parameters *manually* provided, which is probably because this is very old code.
That feels slightly unwieldy, and we instead change the method to pass in the image-stream directly and do the necessary data-lookup internally.

A side-effect of this re-factoring is that we now support using the custom `isSingleOpaquePixel` operator in Type3-glyphs, which shouldn't hurt even though it seems extremely unlikely for that to ever happen in Type3-glyphs.
2025-04-08 12:01:50 +02:00
Jonas Jenwald
d882d0869c Move the IDENTITY_MATRIX constant into src/core/core_utils.js (PR 19772 follow-up)
After the changes in PR 19772 the `IDENTITY_MATRIX` constant is now only used on the worker-thread, which leads to Webpack marking the code as unused in the *built* `pdf.mjs` file; see https://phabricator.services.mozilla.com/D244533#change-8oITAexCvrlQ
2025-04-07 11:40:18 +02:00
Calixte Denizet
4c63905a18 Avoid to create an array when setting the text matrix 2025-04-05 20:45:26 +02:00
Jonas Jenwald
7cfb1be650
Merge pull request #19758 from Snuffleupagus/OperatorList-setOptions
Initialize the `isOffscreenCanvasSupported` option, in the `OperatorList` class, once per document
2025-04-05 18:45:55 +02:00
calixteman
7eef7dfc78
Merge pull request #19763 from calixteman/simplify_updaterect
Replace UpdateRectMinMax by getAxialAlignedBoundingBox
2025-04-04 21:33:05 +02:00
Calixte Denizet
e7a951547d Replace UpdateRectMinMax by getAxialAlignedBoundingBox
and don't use array destructuring because it induces a memory and perf penalties.
2025-04-04 19:57:55 +02:00