This addresses an inconsistency in the viewer, since the thumbnails don't respect the `maxCanvasPixels` option.
Note that, as far as I know, this has not lead to any bugs since the thumbnails render with a fixed (and small) width, however it really cannot hurt to address this (especially after the introduction of the `maxCanvasDim` option).
To support this a new `OutputScale`-method was added, to avoid having to duplicate code in multiple files.
Rather than modifying the "raw" dimensions of the page, we'll instead apply the `userUnit` as an *additional* scale-factor via CSS.
*Please note:* It's not clear to me if this solution is fully correct either, or if there's other problems with it, but it at least *appears* to work.
---
With these changes, the following CSS variables are now assumed to be available/set as necessary: `--total-scale-factor`, `--scale-factor`, `--user-unit`, `--scale-round-x`, and `--scale-round-y`.
The purpose of these changes is to make it more difficult to accidentally include logging statements, used during development and debugging, when submitting patches for review.
For (almost) all code residing in the `src/` folder we should use our existing helper functions to ensure that all logging can be controlled via the `verbosity` API-option.
For the `test/unit/` respectively `test/integration/` folders we shouldn't need any "normal" logging, but it should be OK to print the *occasional* warning/error message.
Please find additional details about the ESLint rule at https://eslint.org/docs/latest/rules/no-console
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
This moves more functionality into the base-class, rather than having to duplicate that in the extending classes.
For consistency, also updates the `BaseStandardFontDataFactory` and introduces more `async`/`await` in various relevant code.
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.
As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
Somehow I managed to mess up the URL creation relevant to e.g. MOZCENTRAL builds, which is breaking the pending PDF.js update in mozilla-central; sorry about that!
To avoid future issues, we'll now always check if absolute filter-URLs are necessary regardless of the build-target.
This functionality is purposely limited to development mode and GENERIC builds, since it's unnecessary in e.g. the *built-in* Firefox PDF Viewer, and will only be used when a `<base>`-element is actually present.
*Please note:* We also have tests in mozilla-central that will *indirectly* ensure that relative filter-URLs work as intended in the Firefox PDF Viewer, see https://searchfox.org/mozilla-central/source/toolkit/components/pdfjs/test/browser_pdfjs_filters.js
---
To test that the issue is fixed, the following code can be used:
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<base href=".">
<title>base href (issue 18406)</title>
</head>
<body>
<ul>
<li>Place this code in a file, named `base_href.html`, in the root of the PDF.js repository</li>
<li>Run <pre>npx gulp dist-install</pre></li>
<li>Run <pre>npx gulp server</pre></li>
<li>Open <a href="http://localhost:8888/base_href.html">http://localhost:8888/base_href.html</a> in a browser</li>
<li>Compare rendering with <a href="http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf">http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf</a></li>
</ul>
<canvas id="the-canvas" style="border: 1px solid black; direction: ltr;"></canvas>
<script src="/node_modules/pdfjs-dist/build/pdf.mjs" type="module"></script>
<script id="script" type="module">
//
// If absolute URL from the remote server is provided, configure the CORS
// header on that server.
//
const url = '/test/pdfs/issue16287.pdf';
//
// The workerSrc property shall be specified.
//
pdfjsLib.GlobalWorkerOptions.workerSrc =
'/node_modules/pdfjs-dist/build/pdf.worker.mjs';
//
// Asynchronous download PDF
//
const loadingTask = pdfjsLib.getDocument(url);
const pdf = await loadingTask.promise;
//
// Fetch the first page
//
const page = await pdf.getPage(1);
const scale = 1.5;
const viewport = page.getViewport({ scale });
// Support HiDPI-screens.
const outputScale = window.devicePixelRatio || 1;
//
// Prepare canvas using PDF page dimensions
//
const canvas = document.getElementById("the-canvas");
const context = canvas.getContext("2d");
canvas.width = Math.floor(viewport.width * outputScale);
canvas.height = Math.floor(viewport.height * outputScale);
canvas.style.width = Math.floor(viewport.width) + "px";
canvas.style.height = Math.floor(viewport.height) + "px";
const transform = outputScale !== 1
? [outputScale, 0, 0, outputScale, 0, 0]
: null;
//
// Render PDF page into canvas context
//
const renderContext = {
canvasContext: context,
transform,
viewport,
};
page.render(renderContext);
</script>
</body>
</html>
```
and implement then in using some SVG filters and composition.
Composing in using destination-in in order to multiply RGB components by
the alpha from the mask isn't perfect: it'd be a way better to natively have
alpha masks support, it induces some small rounding errors and consequently
computed RGB are approximatively correct.
In term of performance, it's a real improvement, for example, the pdf in
issue #17779 is now rendered in few seconds.
There are still some room for improvement, but overall it should be a way
better.
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.
Please see https://eslint.org/docs/latest/rules/arrow-body-style
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.
For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).
In order to do that we must change the text layer opacity to 1 but
it has several implications:
- the selection color must have an alpha component,
- the background color of the span used for highlighted words
must have an alpha component either, but now the opacity is 1
we can use some backdrop-filters in HCM making the highlighted
words more visible.
- fix a regression caused by #17196: the css variable --hcm-highlight-filter
has to live under the #viewer element because in HCM it's overwritten
by js at this level, hence links annotations for example didn't
have the right colors when hovered.
- Extend the `fetchData` helper function to also support fetching of "blob" data.
- Use the `fetchData` helper function more in the code-base, when fetching non-PDF data. Given that the Fetch API isn't supported for all protocols, this should improve compatibility for the PDF.js library.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
- Modify the text and background colors in popup to fit a11y requirements
- Add a backdrop filter on clickable areas in using a svg filter mapping
canvas colors to Highlight and HighlightText ones.
The idea is to apply an overall filter on each page: the main advantage
is to have some filtered images which could help to make them visible for
some users.
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.
These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.
*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
- Transfer functions are not very commonly used in PDF documents.
- Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
- The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
In the general PDF.js library multiple PDF documents may be opened on the same web-page, which is why we many years ago started using document-specific identifiers to prevent issues with global data such e.g. with fonts.
Hence we need to treat the identifiers generated by the `FilterFactory` in the same way, since the SVG-filters for two separate PDF documents may otherwise get identical ids.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kind of images.
It'll slightly improve performances (and maybe slightly decrease memory use).
Since an image can be rendered in using some transfer maps but because of
OffscreenCanvas we don't have the underlying pixels array the transfer maps
stuff is re-implemented in using the SVG filter feComponentTransfer.
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
While reviewing recent patches, I couldn't help but noticing that we now have a lot of call-sites that manually access the `PageViewport.viewBox`-property.
Rather than repeating that verbatim all over the code-base, this patch adds a lazily computed and cached getter for this data instead.
*Please note:* I don't really expect that this is will be an observable change, since virtually all PDF documents already order e.g. /MediaBox and /CropBox entries correctly.
By normalizing boundingBoxes already on the worker-thread, we can be sure that even a corrupt document won't cause issues.
Note how we're passing the `view`-getter to the `PartialEvaluator.getTextContent` method, in order to detect textContent which is outside of the page, hence it makes sense to ensure that it's formatted as expected.
Furthermore, by normalizing this once on the worker-tread we should no longer have to worry about a possibly negative width/height in the `PageViewport` constructor.
Finally, the patch also simplifies the `view`-getter a little bit.