Modern Node.js versions now include a `navigator` implementation, with a few basic properties, that's actually enough for the PDF.js use-cases; please see https://nodejs.org/api/globals.html#navigator
Unfortunately we still support Node.js version `20`, hence we add a basic polyfill since that allows simplifying the code slightly.
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
- Warn if the "regular" PDF.js build is used in Node.js environments, since that won't load any of the relevant polyfills.
- Warn if the `require` function cannot be accessed, since currently we're just "swallowing" any errors.
Given that `ImageData` has been supported for many years in all browsers, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/ImageData#browser_compatibility), we have a `typeof` check that's only necessary in Node.js environments.
Since the `@napi-rs/canvas` package provides that functionality, we can thus add an `ImageData` polyfill which allows us to ever so slightly simplify the code.
The `@napi-rs/canvas` package has fewer dependencies, which should *hopefully* make installing and using it easier for `pdfjs-dist` end-users. (Over the years we've seen, repeatedly, that `canvas` can be difficult to install successfully.)
Furthermore, this package includes more functionality (such as `Path2D`) which reduces the overall number of dependencies in the PDF.js project.
One point to note is that `@napi-rs/canvas` is a fair bit newer than `canvas`, and has a lot fewer users, however looking at the commit history it does seem to be actively maintained.
Note that I've successfully tested the [Node.js examples](https://github.com/mozilla/pdf.js/tree/master/examples/node), in particular the `pdf2png` one, with this patch applied and things appear to work fine.
Please see:
- https://www.npmjs.com/package/@napi-rs/canvas
- https://github.com/Brooooooklyn/canvas
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
This moves more functionality into the base-class, rather than having to duplicate that in the extending classes.
For consistency, also updates the `BaseStandardFontDataFactory` and introduces more `async`/`await` in various relevant code.
This way we allow the rest of the packages to be loaded successfully, such that e.g. the Node.js unit-tests work correctly.
Note that this occurred after updating the `node-canvas` package to version `3.0.0-rc2`, however it's not immediately clear to me if it's a problem there or in the `path2d` package; see also nilzona/path2d-polyfill/issues/84.
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.
Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await it thus seems that we'll have to try and fix this.
Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".
Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.
Please see https://eslint.org/docs/latest/rules/arrow-body-style
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.
Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.
This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
The existing Node.js-specific polyfills depend on the `node-canvas` package, which has unfortunately (repeatedly) shown to cause trouble for many users. We attempted to improve the situation by listing the relevant packages as `optionalDependencies`, but that didn't seem to really fix the problem.
With this patch the library should be able to load in Node.js-environments even if polyfilling fails, and any errors will instead occur during rendering. Obviously this is *not* a proper solution, since it basically moves the problem to another part of the code-base.
However for certain "simpler" use-cases, such as e.g. text-extraction, these changes should hopefully improve general usability of the PDF.js library in Node.js-environments.
*Please note:* For most PDF documents rendering should still work though, since `DOMMatrix` is *currently* only used with Patterns and `Path2D` only with Type3-fonts and Patterns.
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
In the last couple of years we've been quicker to remove support for older browsers/environments, which means that at this point in time we don't bundle that many polyfills. (The polyfills are also generally simpler nowadays, ever since we removed support for e.g. Internet Explorer.)
Rather than having to *manually* handle the polyfills, we can actually let Babel take care of bundling the necessary polyfills for us; please refer to https://babeljs.io/docs/babel-preset-env
The only exception here is the Node.js-specific compatibility-code, which is moved into the `src/display/node_utils.js` file. This ought to be fine since workers are not available/used in Node.js-environments.
*Please note:* For the `legacy`-builds this will increase the size of the *built* files, however that seems like a very small price to pay in order to simplify maintenance of the general PDF.js library.
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.
These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.
*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
- Transfer functions are not very commonly used in PDF documents.
- Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
- The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
This first of all simplifies the file, since we no longer need dummy-classes and can instead *directly* define the actual classes.
Furthermore, and more importantly, this means that we no longer need to bundle this code in e.g. MOZCENTRAL-builds which reduces the size of *built* `pdf.js` file slightly.
Given that these factories are being used in *different* files, for Browser respectively Node.js implementations, it seems reasonable to move them into their own file instead.
Previously this rule has been enabled in the `web/` folder, and in select files in the `src/` sub-folders.
Note that a number of the files in the `src/display/` folder were already enforcing the `no-var` rule, and thanks to Prettier the necessary re-writing will be (mostly) handled automatically.
Please find additional details about the ESLint rule at https://eslint.org/docs/rules/no-var
This moves, and slightly simplifies, code that's currently residing in the unit-test utils into the actual library, such that it's bundled with `GENERIC`-builds and used in e.g. the API-code.
As an added bonus, this also brings out-of-the-box support for CMaps in e.g. the Node.js examples.