278 Commits

Author SHA1 Message Date
Jonas Jenwald
834423b51d Add more logical assignment in the src/ folder
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).
2025-04-12 17:28:33 +02:00
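As a rough illustration of the distinction drawn in 834423b51d (this is not code from the patch, and the object and property names are made up): `??=` only assigns when the left-hand side is `null` or `undefined`, whereas `||=` also overwrites other falsy values such as `0` or an empty string.

```javascript
// Hypothetical option objects, used only to illustrate the two operators.
const explicit = { scale: 0, name: "" };

// Nullish coalescing assignment: 0 is neither null nor undefined, so it is kept.
explicit.scale ??= 1;

// Logical OR assignment: the empty string is falsy, so it is replaced.
explicit.name ||= "default";

const empty = {};
// The property is undefined here, so both operators would assign.
empty.scale ??= 1;
```

This is presumably why the commit reserves `??=` for cases where the surrounding code makes it obvious that a falsy-but-valid value cannot be lost.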
Jonas Jenwald
1c80412f61 Change PDFDocument.prototype._xfaStreams to return a Map
Using a `Map` rather than an `Object` is nicer, since it has better support for both iteration and checking if a key exists.
We also change the initial values to be `null`, rather than empty strings, and reduce duplication when creating the `Map`.

*Please note:* Since this is worker-thread code, these changes are "invisible" at the API-level.
2025-04-12 12:47:22 +02:00
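A minimal sketch of the pattern described in 1c80412f61, assuming nothing about the real `_xfaStreams` contents (the stream names and data below are hypothetical): the `Map` is created from a list of names with `null` initial values, and lookups do not pick up inherited `Object` properties.

```javascript
// Hypothetical stream names; the real keys used in PDF.js may differ.
const names = ["template", "datasets", "config"];

// Build the Map in one place, with `null` (rather than "") as initial value.
const streams = new Map(names.map(name => [name, null]));
streams.set("template", "<template/>");

// Direct iteration over [key, value] pairs.
const missing = [];
for (const [name, data] of streams) {
  if (data === null) {
    missing.push(name);
  }
}

// Key checks are unambiguous on a Map, unlike `in` on a plain Object.
const mapHasToString = streams.has("toString");
const objectHasToString = "toString" in {};
```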
Jonas Jenwald
7a94fafd30 Prefer /Resources from the /Contents stream-dict, if available
In rare cases /Resources are also found in the /Contents stream-dict, in addition to the /Page dict, hence we need to prefer those when available; see `issue18894.pdf`.
2025-04-11 16:54:22 +02:00
Jonas Jenwald
d00482380a Introduce more async code in the src/core/document.js file
2025-03-17 13:20:51 +01:00
Jonas Jenwald
3e8d01ad7c Move the calculateMD5 function into its own file
This allows us to remove a closure, and we also change the code to initialize various constants lazily.
2025-03-08 15:56:05 +01:00
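The lazy initialization mentioned in 3e8d01ad7c can be sketched roughly as follows. This is a generic illustration, not the actual `calculateMD5` code: the table contents, `getTable` and `buildCount` are all hypothetical.

```javascript
// Module-level state replaces the closure; nothing is built at import time.
let table = null;
let buildCount = 0; // Only here to make the laziness observable.

function getTable() {
  if (table === null) {
    // Hypothetical constants, computed on first use only.
    buildCount++;
    table = new Int32Array(64);
    for (let i = 0; i < table.length; i++) {
      table[i] = i * i;
    }
  }
  return table;
}
```

Callers that never need the table never pay for building it, and repeated calls reuse the same array.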
Jonas Jenwald
7b5cd9cddd Use arrow functions with some Promise.then calls
A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 19:57:38 +01:00
Jonas Jenwald
4be79748c9 Add a GlobalColorSpaceCache to reduce unnecessary re-parsing
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`. The global cache should help reduce unnecessary re-parsing, especially for e.g. `ICCBased` ColorSpaces once we properly support those.
2025-03-01 14:21:05 +01:00
Jonas Jenwald
d428db63c3 Improve the "FontFallback" handling on the worker-thread
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
2025-02-24 09:34:58 +01:00
Jonas Jenwald
36979e9eb2 Fix all outstanding ESLint arrow-body-style warnings
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
2025-02-17 15:45:44 +01:00
Tim van der Meij
4d4e1befeb Merge pull request #19289 from Snuffleupagus/issue-19281
Skip LinkAnnotations when collecting field objects (issue 19281)
2025-01-04 13:32:18 +01:00
Jonas Jenwald
6f062abb76 Skip LinkAnnotations when collecting field objects (issue 19281)
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
2025-01-04 11:54:45 +01:00
Jonas Jenwald
74c1795c9f Use Dict iteration more (PR 19051 follow-up)
There are a few cases where we loop through the result of `Dict.prototype.getKeys` and then manually look up the values, which after PR 19051 can be replaced with direct iteration instead.
2025-01-02 15:09:19 +01:00
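The change in 74c1795c9f can be pictured with a small stand-in class. `MiniDict` below is hypothetical and much simpler than the real PDF.js `Dict`; in particular, the assumption that iteration yields `[key, value]` pairs is made for this sketch only.

```javascript
// Hypothetical stand-in for a PDF dictionary; not the real PDF.js `Dict`.
class MiniDict {
  #map = new Map();

  set(key, value) {
    this.#map.set(key, value);
  }

  get(key) {
    return this.#map.get(key);
  }

  getKeys() {
    return [...this.#map.keys()];
  }

  // Iteration support: yields [key, value] pairs.
  *[Symbol.iterator]() {
    yield* this.#map;
  }
}

const dict = new MiniDict();
dict.set("Type", "Page");
dict.set("Rotate", 90);

// Before: loop over the keys, then look up each value manually.
const before = [];
for (const key of dict.getKeys()) {
  before.push([key, dict.get(key)]);
}

// After: iterate the dictionary directly.
const after = [];
for (const [key, value] of dict) {
  after.push([key, value]);
}
```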
Jonas Jenwald
2c0cc48d1b Replace the forEach method in Dict with "proper" iteration support
2024-11-17 12:45:32 +01:00
Calixte Denizet
4bf7787084 Simplify saving added/modified annotations.
Having this map to collect the different changes will allow us to know if some objects have already been modified.
2024-11-12 10:59:38 +01:00
Jonas Jenwald
0b864ee7d5 Shorten the Page.prototype.userUnit getter slightly
2024-11-10 16:30:07 +01:00
Jonas Jenwald
b26dc19392 Ensure that serializing of StructTree-data cannot fail during loading
I discovered that doing skip-cache reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would *intermittently* cause (some of) the AnnotationLayers to break with errors printed in the console (see below).

In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will look up various data and all of those cases can fail during loading when streaming and/or range requests are being used.

Finally, to prevent any future errors, ensure that the viewer won't break in these sorts of situations.

```
Uncaught (in promise)
Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55

#renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15

Uncaught (in promise)
Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
```
2024-11-01 17:43:59 +01:00
Jonas Jenwald
8f47d06d07 Add helper functions to allow using new Uint8Array methods
This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64

*Please note:* These are not actual polyfills, but only implement what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.
2024-10-29 10:22:35 +01:00
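The approach in 8f47d06d07 (use the native method where it exists, fall back otherwise) might look roughly like the sketch below. The helper name `toHexString` is hypothetical and the fallback is only one possible implementation; `Uint8Array.prototype.toHex` itself comes from the linked proposal.

```javascript
// Hypothetical helper name; the feature-detection pattern is the point here.
function toHexString(bytes) {
  if (typeof bytes.toHex === "function") {
    // Native method, available in engines that ship the proposal.
    return bytes.toHex();
  }
  // Fallback: two lowercase hex digits per byte.
  return Array.from(bytes, b => b.toString(16).padStart(2, "0")).join("");
}

const hex = toHexString(new Uint8Array([0, 15, 255]));
```

Because this is a plain function rather than a patch of the prototype, it matches the note above that these helpers are not actual polyfills.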
Jonas Jenwald
f9fc477080 Improve the implementation of the PDFDocument.fingerprints-getter
- Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already *implicitly* assuming a particular length.

 - Move the constants into the `fingerprints`-getter, since they're not used anywhere else.

 - Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method; see https://github.com/tc39/proposal-arraybuffer-base64
2024-10-29 10:22:35 +01:00
Jonas Jenwald
662bd022ce Reduce duplication in the PDFDocument.calculationOrderIds getter
2024-10-08 12:24:09 +02:00
Jonas Jenwald
e3b5ed2e40 Improve the promise-caching in the PDFDocument.fieldObjects getter
After PR 18845 we're accessing this getter more, hence it seems like a good idea to ensure that the initial `formInfo` access is covered as well.
While unlikely to be a problem in practice, at least theoretically that data may not be available and the code in `fieldObjects` could thus currently be *unintentionally* invoked more than once.
2024-10-08 12:15:04 +02:00
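The idea behind e3b5ed2e40, as far as the message describes it, is that the whole asynchronous chain (including its first data access) sits behind a single cached promise. A minimal sketch under that assumption follows; `FakeDocument`, `#collect` and `loadCount` are hypothetical and do not mirror the real `PDFDocument` code.

```javascript
// Hypothetical class; only the promise-caching pattern is illustrated.
class FakeDocument {
  #fieldObjectsPromise = null;
  loadCount = 0; // Makes repeated invocation observable.

  async #collect() {
    // Everything, including the initial lookup, happens inside the
    // cached promise, so it cannot be triggered a second time.
    this.loadCount++;
    return { fields: [] };
  }

  get fieldObjects() {
    // Create the promise once, then hand out the same one.
    return (this.#fieldObjectsPromise ??= this.#collect());
  }
}

const doc = new FakeDocument();
const first = doc.fieldObjects;
const second = doc.fieldObjects;
```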
Calixte Denizet
3103deaa44 Fix missing annotation parent by using the one from the Fields entry
Fixes #15096.
2024-10-04 20:00:19 +02:00
Calixte Denizet
c9050be863 [Editor] Add the possibility to save an updated stamp annotation (bug 1921291)
2024-10-02 11:45:16 +02:00
Calixte Denizet
2481a4bab9 Write the display flags in F entry when saving an annotation (issue 18072)
2024-10-01 17:26:39 +02:00
Calixte Denizet
0382dd0e25 [Editor] When deleting an annotation with popup, then delete the popup too
2024-09-26 17:52:25 +02:00
Tim van der Meij
ccb141e211 Merge pull request #18393 from Snuffleupagus/mustBeViewedWhenEditing-params
Check the relevant parameters inside of the `mustBeViewedWhenEditing` method
2024-07-05 15:33:45 +02:00
Jonas Jenwald
38528d1116 Remove the renderForms parameter from the Annotation getOperatorList methods
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
2024-07-05 12:25:18 +02:00
Jonas Jenwald
5f744904ac Check the relevant parameters inside of the mustBeViewedWhenEditing method
Similar to the `mustBeViewed` method, we can check the relevant parameters within the `mustBeViewedWhenEditing` method itself since that (in my opinion) slightly helps readability of the code in the `src/core/document.js` file.
2024-07-05 11:38:55 +02:00
Jonas Jenwald
a4ffc1066c Move the internal API/Worker isEditing-state into RenderingIntentFlag
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
2024-07-04 23:34:30 +02:00
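To illustrate why folding `isEditing` into a flag set (a4ffc1066c) avoids passing an extra boolean around, here is a small sketch. The flag names and numeric values are illustrative assumptions, not the real `RenderingIntentFlag` definition.

```javascript
// Illustrative values only; the real RenderingIntentFlag may differ.
const IntentFlag = Object.freeze({
  DISPLAY: 0x01,
  PRINT: 0x02,
  IS_EDITING: 0x04,
});

// A single number carries all of the state...
const intent = IntentFlag.DISPLAY | IntentFlag.IS_EDITING;

// ...and any consumer can test the bits it cares about.
function describeIntent(value) {
  return {
    display: !!(value & IntentFlag.DISPLAY),
    print: !!(value & IntentFlag.PRINT),
    isEditing: !!(value & IntentFlag.IS_EDITING),
  };
}

const described = describeIntent(intent);
```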
Calixte Denizet
64635f3b35 [api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
 - if the annotation has to be composed with the page then the canvas must be correctly
   composed with its parent. That means we should move the canvas under canvasWrapper
   and we should extract composing info from the drawing instructions...
   Currently it's the case with highlight annotations.
 - we use some extra memory for those canvases even if the user will never edit them, which
   is the case for example when opening a pdf in Fenix.

So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
2024-07-02 14:11:40 +02:00
Jonas Jenwald
27436d52b2 Reduce indentation when parsing new annotations in getOperatorList
This code has, over the years, become more complex and less indentation generally helps readability.
2024-05-25 12:00:44 +02:00
Jonas Jenwald
3afa9bfc42 Improve /Page validation for linearized documents (issue 18138)
The referenced PDF document contains corrupt linearization-data, that doesn't point to the *first* page as intended.
2024-05-22 12:04:02 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays
2024-05-03 22:37:50 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This checks primarily Arrays, but also some other properties, that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Calixte Denizet
901d995a7e Correctly update the xref table when an annotation is deleted
2024-04-18 21:27:39 +02:00
Calixte Denizet
4e1b96c781 [Annotations] Widget annotations must be in front of the other ones (bug 1883609)
2024-03-05 19:04:58 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Calixte Denizet
09b4fe6a30 Get the field name from its parent when it doesn't have one when collecting fields (bug 1864136)
Some fields, somewhere under the Fields entry in the AcroForm, may have no name (in T)
while having a parent that does have a name but isn't itself somewhere under Fields.
As a side-effect, this patch prevents infinite loops caused by potential cycles
under Fields.
2023-11-13 14:41:14 +01:00
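The two behaviours described in 09b4fe6a30 (falling back to the parent's name, and not looping forever on cyclic data) can be sketched as below. The node shape (`name`, `parent`, `kids`) and the function are hypothetical simplifications: real field dictionaries use /T, /Parent and /Kids entries and References, and this sketch only looks one level up.

```javascript
// Hypothetical node shape: { name?, parent?, kids? }.
function collectFieldNames(roots) {
  const names = [];
  const visited = new Set(); // Guards against cycles in the tree.
  const stack = [...roots];

  while (stack.length > 0) {
    const field = stack.pop();
    if (visited.has(field)) {
      continue; // Already handled; following it again would loop.
    }
    visited.add(field);

    // Use the field's own name, else the one from its parent.
    const name = field.name ?? field.parent?.name;
    if (name !== undefined) {
      names.push(name);
    }
    stack.push(...(field.kids ?? []));
  }
  return names;
}

// A nameless field whose named parent is not reachable from the roots,
// and which (incorrectly) lists itself as its own kid.
const outsideParent = { name: "group" };
const child = { parent: outsideParent, kids: [] };
child.kids.push(child);

const collected = collectFieldNames([child]);
```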
Jonas Jenwald
ff62fc8e2c Skip fieldObjects that are not actually References
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
2023-11-08 14:39:13 +01:00
Jonas Jenwald
65c827b0eb Ensure that fieldObjects and #collectFieldObjects handles References correctly
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.

To address this we change all data-lookups to be asynchronous instead.
2023-11-08 14:38:57 +01:00
Calixte Denizet
acc62f80de Don't try to collect a nonexistent field because of an invalid ref
2023-11-07 19:58:29 +01:00
Jonas Jenwald
18a661b6a0 Merge pull request #16920 from Snuffleupagus/annotationGlobals
Slightly reduce asynchronicity when parsing Annotations
2023-09-09 09:55:49 +02:00
Calixte Denizet
52cc1220e4 Simplify writeObject function
This avoids duplicating the code that gets the encrypt transform and,
last but not least, it avoids forgetting about encryption.
2023-09-08 19:59:59 +02:00
Jonas Jenwald
df9cce39c0 Slightly reduce asynchronicity when parsing Annotations
Over time the amount of "document level" data potentially needed during parsing of Annotations has increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.

An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.

One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
2023-09-08 13:27:27 +02:00
Calixte Denizet
d185db2b70 Add tagged annotations in the structure tree (bug 1850797)
2023-08-31 12:35:32 +02:00
Calixte Denizet
ee3ac35e05 Revert fix for bug 1838855 (bug 1849876)
The issue described in the mentioned bug is really because
Acrobat is rendering the XFA instead of the Acroform.
The original patch just tried to work around the issue but it
induces some regressions.
2023-08-23 12:34:41 -04:00
Jonas Jenwald
4d19db0b19 Re-format the code to account for prettier and globals updates
The `prettier` update slightly changed the formatting of some await-expressions; please see https://github.com/prettier/prettier/blob/main/CHANGELOG.md#302

The `globals` update removed the need for some eslint-disable statements; please see https://github.com/sindresorhus/globals/releases/tag/v13.21.0
2023-08-19 09:30:34 +02:00
Calixte Denizet
e2f20a1afe [Annotation] Strip out the array index in the path only when the path is from a terminal node (bug 1847733)
2023-08-08 15:05:27 +02:00
Calixte Denizet
71960bea64 Don't print hidden annotations (bug 1815196)
and correctly handle the NoView and NoPrint flags when they're changed
from JS.
2023-07-31 13:04:15 +02:00