782 Commits

Author SHA1 Message Date
Jonas Jenwald
f3f88eecb4 Use an AbortController to remove the temporary "error" handler for the worker 2024-06-15 14:35:32 +02:00
Jonas Jenwald
2d0e08f1c8 Introduce a helper method for resolving the PDFWorker promise
This avoids having to repeat the same code multiple times, since besides resolving the promise we also need to send the "configure" message to the worker-thread.
2024-06-15 14:35:30 +02:00
Jonas Jenwald
8d4456172b Reduce duplication when handling the "test" message from the worker
The feature-testing on the worker-thread has been simplified in previous pull requests, which means that we can simplify this main-thread handler as well.
2024-06-15 14:35:28 +02:00
Jonas Jenwald
0a36b667e4 Use an early return in PDFWorker.prototype._initialize when workers are disabled
This helps reduce overall indentation in the method, thus leading to slightly less code.
Also, remove an old comment referring to Chrome 15 since that's no longer relevant now.
2024-06-15 14:35:12 +02:00
Jonas Jenwald
d6612b3427 Remove some now redundant validation in getDocument
Given that we now check/validate all options properly this old code can be simplified.
2024-06-15 14:35:10 +02:00
Calixte Denizet
ff6180a4c9 Add an option to enable/disable hardware acceleration (bug 1902012) 2024-06-12 18:41:07 +02:00
Jonas Jenwald
06334c97ef Improve the loadingParams functionality in the API
- Move the definition of the `loadingParams` Object, to simplify the code.

 - Add a unit-test, since none existed and the viewer depends on this functionality.
2024-05-24 09:26:40 +02:00
Jonas Jenwald
15b5808eee [api-minor] Re-factor the basic textLayer-functionality
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.

This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.

Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
2024-05-17 14:20:20 +02:00
Tim van der Meij
4db843617f
Merge pull request #18047 from Snuffleupagus/issue-18042
Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
2024-05-15 15:40:18 +02:00
Jonas Jenwald
6b171540b7 Initialize the networkStream synchronously in getDocument
This is fairly old code, and at some point the need for this to be asynchronous disappeared.
2024-05-14 17:04:25 +02:00
Jonas Jenwald
cbb8748a22 Inline the _fetchDocument helper function in getDocument
This function has been modified a number of times over the years, and at this point it's small/simple enough that we can just inline the code instead.
2024-05-14 16:29:41 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
298d72133e
Merge pull request #18051 from Snuffleupagus/NodePackages
[api-minor] Re-factor how Node.js packages/polyfills are  loaded (issue 17245)
2024-05-14 11:43:57 +02:00
Jonas Jenwald
761abc7cc3
Merge pull request #18066 from Snuffleupagus/rm-FontFaceObject-ignoreErrors
Remove the `ignoreErrors` option from the `FontFaceObject` class
2024-05-14 09:49:08 +02:00
Jonas Jenwald
5f6f1686b5 Remove the ignoreErrors option from the FontFaceObject class
- The `stopAtErrors` API option, which is the inverse of the "internal" `ignoreErrors` option, is explicitly documented as applying to *parsing* (i.e. the worker-thread) while the `FontFaceObject` class is used during rendering (i.e. the main-thread); see b6765403a1/src/display/api.js (L164-L167)

 - A glyph that fails in the `FontRendererFactory`, on the worker-thread, will already cause (overall) parsing to stop when `ignoreErrors === false` hence checking the option on the main-thread as well seems redundant; see b6765403a1/src/core/evaluator.js (L4527-L4533)

 - Removing this option simplifies the code, and slightly reduces the number of options that we need to handle in the main-thread code.
2024-05-11 10:18:23 +02:00
Jonas Jenwald
5e50479ac6 Use more object destructuring in the "commonobj" handler in the API 2024-05-11 09:44:10 +02:00
Jonas Jenwald
4a8d742592 Move the reporting of page Stats into the API
This avoids having to add a couple of event listeners in the viewer, when debugging is enabled, and is consistent with the existing handling of `FontInspector` and `StepperManager` in the API.
2024-05-11 09:42:05 +02:00
Jonas Jenwald
2643570364 [api-minor] Re-factor how Node.js packages/polyfills are loaded (issue 17245)
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.

Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await it thus seems that we'll have to try and fix this.

Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".

Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
2024-05-06 23:20:03 +02:00
Jonas Jenwald
f6cd03955b [api-minor] Move the page reference/number caching into the API
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
2024-04-29 18:54:06 +02:00
Calixte Denizet
551e63901c Simplify the way to pass the glyph drawing instructions from the worker to the main thread
and remove the use of eval in the font loader.
2024-04-27 21:28:31 +02:00
Jonas Jenwald
7206d0a237 Validate explicit destinations on the worker-thread to prevent DataCloneError (issue 17981)
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread has access to various primitives.
2024-04-22 22:51:35 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
Jonas Jenwald
0022310b9c
Merge pull request #17706 from Snuffleupagus/Node-Fetch-API
[api-minor] Use the Fetch API, when supported, to load PDF documents in Node.js environments
2024-03-19 11:04:28 +01:00
Jonas Jenwald
3c78ff5fb0 [api-minor] Implement basic support for OptionalContent Usage dicts (issue 5764, bug 1826783)
The following are some highlights of this patch:
 - In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.

 - In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.

 - Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
   This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seem to be necessary except in the viewer itself (if at all).[1]

---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
2024-03-12 13:18:15 +01:00
Jonas Jenwald
eded037d06 [api-minor] Use the Fetch API, when supported, to load PDF documents in Node.js environments
Given that modern Node.js versions now implement support for a fair number of "browser" APIs, we can utilize the standard Fetch API to load PDF documents that are specified via http/https URLs.

Please find compatibility information at:
 - https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API#browser_compatibility
 - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#fetch
 - https://developer.mozilla.org/en-US/docs/Web/API/Response#browser_compatibility
 - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#response
2024-02-21 22:38:42 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Jonas Jenwald
06cd278808 Simplify the signature of the PDFDataTransportStream constructor
Given that we need to pass in a `PDFDataRangeTransport`-instance a number of the needed parameters can be obtained from it, rather than having to specify them manually.
2024-02-03 13:10:42 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Jonas Jenwald
9dfe9c552c Use shorter arrow functions where possible
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.

For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).
2024-01-21 10:13:12 +01:00
Jonas Jenwald
b37536c38c Remove the isArrayBuffer helper function
This old helper function can now be replaced with `ArrayBuffer.isView()` and/or `instanceof ArrayBuffer` checks, as needed depending on the situation.
2024-01-19 14:10:52 +01:00
Calixte Denizet
f84f48b5d0 Avoid to have the text layer mismatching the rendered text with mismatching locales (bug 1869001)
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent,
consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally in adding a lang="ja" in viewer.html
(or with an other language for Japanese users).
2024-01-04 19:20:20 +01:00
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
e547b198a3 Compute the length of the final image-bitmap/data on the worker-thread
Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
b09f238436 Add iteration support in the PDFObjects class
This (obviously) only includes "resolved" data, and will be used in an upcoming patch.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
ade692ff2e Set a type for the Blob used in createCDNWrapper (issue 17259)
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
2023-11-12 09:30:26 +01:00
Jonas Jenwald
d5acbbccd3 Update the ESLint globals list (PR 17055 follow-up)
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.

Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
2023-10-15 11:38:10 +02:00
Jonas Jenwald
af9a7b0003 Tweak PDFWorkerUtil.createCDNWrapper to account for JavaScript modules (PR 17055 follow-up) 2023-10-14 11:34:40 +02:00
Jonas Jenwald
0238cf134d Don't store page-level data, in the API, after cleanup has run (bug 1854145)
For large/complex images it's possible that the image-data arrives in the API *after* the page has been scrolled out-of-view and thus been cleaned-up. In this case we obviously shouldn't cache such page-level data, since it'll first of all be unused and secondly can increase memory usage *a lot*.
Also, ensure that we *immediately* release any `ImageBitmap` data in this case to help reclaim memory faster.
2023-10-11 11:51:42 +02:00
Jonas Jenwald
8bd3cc0313 [api-minor] Stop polyfilling structuredClone in legacy builds
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.

Note that `structuredClone`, with transfers, is only used in two spots:
 - The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seem fine.
 - The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
2023-10-07 16:52:47 +02:00
Jonas Jenwald
927e50f5d4 [api-major] Output JavaScript modules in the builds (issue 10317)
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]

In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.

Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.

One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]

---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility

[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.

[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.

[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
2023-10-07 09:31:08 +02:00
Jonas Jenwald
0a970ee443 [api-major] Remove the fallbackWorkerSrc functionality in browsers
The user should *always* provide a correct `GlobalWorkerOptions.workerSrc` value when using the PDF.js library in browser environments. Note that the fallback:
 - Has been deprecated ever since PR 11418, first released in version `2.4.456` over three years ago.
 - Was always a best-effort solution, with no guarantees that it'd actually work correctly.
 - With upcoming changes, w.r.t. outputting JavaScript modules, it'd now be more diffiult to determine the correct value.
2023-10-06 12:12:30 +02:00
Jonas Jenwald
426209c6e6
Merge pull request #16699 from Snuffleupagus/rm-svg
[api-major] Remove the SVG back-end (PR 15173 follow-up)
2023-10-03 15:13:14 +02:00
Jonas Jenwald
3ced0dec1b [api-major] Remove the SVG back-end (PR 15173 follow-up)
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
2023-10-01 23:14:29 +02:00
Jonas Jenwald
f87ec67ab1 [api-major] Remove various deprecated functionality and options 2023-09-23 17:44:09 +02:00
Jonas Jenwald
1b8441dacc Don't pass in unused pageColors to CanvasGraphics.endDrawing (PR 16380 follow-up)
This became unnecessary in PR 16380, however we forgot to update one of the API call-sites.
2023-08-28 16:14:22 +02:00
Jonas Jenwald
9b4efe2c2f Use WeakSet.prototype.delete() unconditionally in the InternalRenderTask class
It's not necessary to check if an object exists before trying remove it from a `WeakSet`; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakSet/delete#return_value
2023-08-28 16:10:33 +02:00
Jonas Jenwald
ec3d2be761 Introduce more optional chaining in the code-base
Also, use logical OR assignment a bit more.
2023-08-26 10:52:23 +02:00
Jonas Jenwald
988ce2820b Initialize the PDFWorker.#workerPorts WeakMap lazily
By default this WeakMap isn't needed, and it's simple enough to initialize it lazily instead.
2023-08-19 16:18:38 +02:00
Jonas Jenwald
2993c7725b [Firefox] Exclude more workerPort related code in MOZCENTRAL builds
Given that this code is (and has always been) unused in the Firefox PDF Viewer, we don't need to include it in that build-target.
2023-08-19 15:52:00 +02:00