When the /OpenAction data is an Array we're currently using it as-is which could theoretically cause problems in corrupt PDF documents, hence we ensure that a "raw" destination is actually valid. (This change is covered by existing unit-tests.)
*Note:* In the Dictionary case we're using the `Catalog.parseDestDictionary` method, which already handles all of the necessary validation.
This way it helps to reduce the overall canvas dimensions and make the rendering faster.
The drawback is that when scrolling, the page can be blurry in waiting for the rendering.
The default value is 200% on desktop and will be 100% for GeckoView.
The effect is probably not even measurable, however this patch ever so slightly reduces the asynchronicity in the `fieldObjects` getter. These changes should be safe since:
- We're inside of the `PDFDocument`-class and the `annotationGlobals`-getter, which will always return a (shadowed) Promise and won't throw `MissingDataException`s, can be accessed directly without going through the `BasePdfManager`-instance.
- The `acroForm`-dictionary can be accessed through the `annotationGlobals`-data, removing the need to "manually" look it up and thus the need for using `Promise.all` here.
- We can also lookup the /Fields-data, in the `acroForm`-dictionary, synchronously since the initial `formInfo.hasFields` check guarantees that it's available.
Currently we repeat the same identical code five times in the `Page`-class when creating a `PartialEvaluator`-instance, which given the number of parameters it needs seems like unnecessary duplication.
The `ObjectLoader.prototype.load` method has a fast-path, which avoids any lookup/parsing if the entire PDF document is already loaded.
However, we still need to create an `ObjectLoader`-instance which seems unnecessary in that case.
Hence we introduce a *static* `ObjectLoader.load` method, which will help avoid creating `ObjectLoader`-instances needlessly and also (slightly) shortens the call-sites.
To ensure that the new method will be used, we extend the `no-restricted-syntax` ESLint rule to "forbid" direct usage of `new ObjectLoader()`.
Given that all the methods are already asynchronous we can just use `await` more throughout this code, rather than having to explicitly return function-calls and `undefined`.
Note also how none of the `ObjectLoader.prototype.load` call-sites use the return value.
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
Currently we repeat virtually the same code when calling the `PartialEvaluator.prototype.handleSetFont` method, which we can avoid by introducing an inline helper function.
Considering the name of the method, and how it's actually being used, you'd expect it to return a boolean value.
Given how it's currently being used this inconsistency doesn't cause any issues, however we should still fix this.
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
Currently we create an intermediate `Dict` during parsing, however that seems unnecessary since (note especially the second point):
- The `NameOrNumberTree.prototype.getAll` method will already resolve any references, as needed, during parsing.
- The `Catalog.prototype.xfaImages` getter is invoked, via the `BasePdfManager`-instance, such that any `MissingDataException`s are already handled correctly.
Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.
Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
- Parsing named destinations[2] and URLs.
- Handling "strings" that are actually /Name-instances.
- Building a lookup Object/Map based on some PDF data-structure.
*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.
---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".
[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
Whenever we cannot find a destination we'll fallback to checking all destinations, to account for e.g. out-of-order NameTrees, and in those cases any subsequent destination-lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
Modern Node.js versions now include a `navigator` implementation, with a few basic properties, that's actually enough for the PDF.js use-cases; please see https://nodejs.org/api/globals.html#navigator
Unfortunately we still support Node.js version `20`, hence we add a basic polyfill since that allows simplifying the code slightly.
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
This complements, and extends, the existing check of the `Array.prototype` in the worker-thread.
To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.
This is a small, and quite possibly pointless, optimization which ensures that any "local" /Resources aren't empty, to avoid needlessly trying to load and merge dictionaries.
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.
Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.
*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).