769 Commits

Author SHA1 Message Date
Yeoh Joer
2221ccf160 Add the missing return type for PDFWorker.fromPort() 2025-02-28 02:49:44 +08:00
Jonas Jenwald
839e23f5c2 Send disableFontFace and fontExtraProperties as part of the exported font-data
These options are needed in the `FontFaceObject` class, and indirectly in `FontLoader` as well, which means that we currently need to pass them around manually in the API.
Given that the options are (obviously) available on the worker-thread, it's very easy to just provide them when creating `Font`-instances and then send them as part of the exported font-data. This way we're able to simplify the code (primarily on the main-thread), and note that `Font`-instances even had a `disableFontFace`-field already (but it wasn't properly initialized).
2025-02-24 09:34:48 +01:00
Jonas Jenwald
641e2f506e [api-minor] Re-factor how the useWorkerFetch option is used internally
With the recently added OpenJPEG no-wasm fallback we need to send the `wasmUrl` option to the worker-thread *regardless* of the value of the `useWorkerFetch` option, since the fallback won't work if we don't have a URL to `import` it from.
For consistency the code is re-factored to always send the factory-urls to the worker-thread, and simply check the `useWorkerFetch` option there instead.

Also, as a follow-up to PR 19525, introduce a new `useWasm` option that can be used in e.g. browser-tests to forcibly disable WebAssembly usage.
2025-02-22 09:56:53 +01:00
Nicolò Ribaudo
458b2ee402
[api-minor] Render high-res partial page views when falling back to CSS zoom (bug 1492303)
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.

This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render portion of a
page in a smaller canvas without hiting the maximun canvas size limit.

Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
2025-02-21 10:00:55 -08:00
Jonas Jenwald
e3ea92603d
Merge pull request #19493 from Snuffleupagus/URL-parse
Introduce some `URL.parse()` usage in the code-base
2025-02-21 10:40:32 +01:00
Jonas Jenwald
06e4580f8b Ensure that the useWorkerFetch fallback value is always a boolean
If either of the factory-urls are missing or invalid, the fallback value would currently become `useWorkerFetch === null`.
While that is obviously a falsy value, which means that the code still works as intended, we should ensure that this is consistent.
2025-02-16 14:04:30 +01:00
Jonas Jenwald
c2e33307b1 Introduce some URL.parse() usage in the code-base
This (fairly new) static method allows parsing URLs without having to wrap `new URL(...)` calls within `try...catch` blocks, thus simplifying the code; see https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static

For older browsers/environments the functionality will be polyfilled, but *only* in `legacy` builds, via `core-js`; see https://github.com/zloirock/core-js?tab=readme-ov-file#url-and-urlsearchparams

*Please note:* This is currently limited to the `src/`- and `web/`-folders, such that we don't break development/testing, since the functionality is not available in all Node.js versions that we support; see https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static#browser_compatibility
2025-02-15 19:10:36 +01:00
Jonas Jenwald
88e5da1e37 Combine the main-thread message handlers for CMap-, StandardFontData-, and Wasm-files
Currently we have three separate and virtually identical message handlers for this data, which can easily be combined into a single message handler instead.
2025-02-07 14:33:15 +01:00
Jonas Jenwald
fa3358baf9 [api-minor] Add more validation for the cMapUrl, standardFontDataUrl, and wasmUrl parameters
Given that we now have a few different factory-url parameters, we introduce a helper function for parsing them.

*Please note:* These parameters have always been documented as requiring a trailing slash[1], which we can now easily enforce during the `getDocument`-call.

---
[1] I recall that we've occasionally seen issues because users miss that detail, and the new Error should hopefully be more easily actionable than one thrown during rendering/parsing.
2025-02-04 10:27:31 +01:00
Jonas Jenwald
9241e1be8c [api-minor] Simplify clean-up of page resources after rendering
After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).

Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.

Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
 - We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
 - We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
 - The amount of available RAM on devices should be a lot higher, since more than a decade has passed.

A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).

---
[1] More specifically:
 - `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
 - `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
 - `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
2025-01-22 12:19:44 +01:00
Jonas Jenwald
db43f158dc Inline the default Factory-definitions in getDocument
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.

 - A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.

 - For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
2025-01-18 14:09:14 +01:00
Jonas Jenwald
1ddce76a8b Simplify the JSDocs for the various getDocument Factory-parameters
Given that we nowadays provide default Node.js versions of these Factory-parameters it no longer seems necessary to mention that environment specifically.
2025-01-16 23:01:36 +01:00
Calixte Denizet
94b4b54ef6 [api-major] Add openjpeg.wasm to pdf.js (bug 1935076)
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.

So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
2025-01-16 21:09:50 +01:00
Jonas Jenwald
6bde49a606 Reduce duplication when handling "DocException" and "PasswordRequest" messages
Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need.

Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function.

Finally, the patch makes the following smaller changes:
 - Avoid needlessly re-creating exceptions in the `wrapReason` function.
 - Use a slightly shorter parameter name in the `wrapReason` function.
 - Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.
2024-12-26 12:55:49 +01:00
Jonas Jenwald
490e740365 [api-minor] Remove deprecated getDocument options (PR 18776 follow-up) 2024-12-15 14:13:44 +01:00
Jonas Jenwald
6153b15231 Remove the raw path-strings after creating the actual Path2D glyph-objects
The `Path2D` glyph-objects are cached on the `FontFaceObject`-instance, so we can save a little bit of memory by removing the raw path-strings once they're no longer needed.
2024-12-09 15:09:18 +01:00
Jonas Jenwald
c6e3fc4fe6 Take the userUnit into account in the PageViewport class (issue 19176) 2024-12-08 15:51:04 +01:00
Tim van der Meij
2bbf9b432c
Merge pull request #19099 from timvandermeij/updates
Update dependencies and translations to the most recent versions
2024-12-01 14:54:07 +01:00
Jonas Jenwald
4b0900fdfa Use even more optional chaining in the src/display/api.js file
This slightly shortens the code, in various `destroy`-methods, which cannot hurt.
Also, use pre-processor checks to simplify `PDFDocumentLoadingTask.destroy` in the Firefox PDF Viewer since the `PDFWorker.fromPort`-method isn't used there.
2024-11-30 12:06:27 +01:00
Tim van der Meij
725ae4998c
Fix the type definition of the fingerprints getter in src/display/api.js
The `Array` type takes one parameter that describes the possible types
of the inner  elements. This parameter may exist of pipes to indicate
multiple possible types. However, the current type definition provides
multiple parameters; this is incorrect syntax, and TypeScript 5.7+ now
fails on this due to stricter syntax validation.

This commit fixes the issue by changing the type definition to the
proper syntax, which together with the accompanying comment about the
contents of the fingerprints array should document it sufficiently.
2024-11-24 21:53:45 +01:00
Jonas Jenwald
471284f51b [api-minor] Disable ImageDecoder usage by default in Chromium browsers
Given that there are multiple issues with `ImageDecoder` in Chromium browsers, affecting both BMP and JPEG images, for now we (by default) disable that functionality there to avoid problems.

This also means that we can remove the previously added, and separate, `isChrome` API-option.
2024-11-14 12:05:15 +01:00
Jonas Jenwald
65eedfb0fc [api-minor] Add a getDocument option to disable ImageDecoder usage
This allows end-users to forcibly disable `ImageDecoder` usage, even if the browser appears to support it (similar to the pre-existing option for `OffscreenCanvas`).
2024-11-12 17:12:42 +01:00
Jonas Jenwald
cbf0ca71bf [api-minor] Only support the Fetch API for "remote" PDF documents in Node.js environments
The Fetch API has been supported since Node.js version 18, see https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API#browser_compatibility
2024-11-03 16:18:10 +01:00
Jonas Jenwald
c7407230c1 [api-minor] Load Node.js packages/polyfills with process.getBuiltinModule
This allows *synchronous* loading of Node.js modules and (indirectly) packages, thus simplifying the code a fair bit.
2024-11-03 16:13:58 +01:00
Jonas Jenwald
4e12906061 Move the various DOM-factories into their own files
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.

 - By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.

 - This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.

*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
2024-11-01 13:31:28 +01:00
Jonas Jenwald
7572382c7a Change the "FetchBuiltInCMap"/"FetchStandardFontData" message-handlers to be asynchronous
This way we can directly throw Errors, rather than having to "manually" return rejected Promises, which is ever so slightly shorter.

Also, since `useWorkerFetch` is always true in MOZCENTRAL builds these message-handlers should not be invoked there.
2024-10-31 09:29:11 +01:00
Jonas Jenwald
afb4813d1c Simplify the "ReaderHeadersReady" message-handler in the API
We can convert the handler to an `async` function, which removes the need to create a temporary Promise here.
Given the age of this code it shouldn't hurt to simplify it a little bit.
2024-10-29 14:59:39 +01:00
Calixte Denizet
b649b6f8dd Use a BMP decoder when resizing an image
The image decoding won't block the main thread any more.
For now, it isn't enabled for Chrome because issue6741.pdf leads to a crash.
2024-10-28 14:09:52 +01:00
Jonas Jenwald
788eabc76a Re-factor the MessageHandler-class event handler function
- Change the "message" event handler function to a private method.

 - Remove the "message" event listener with an `AbortSignal`.

 - Extend the `LoopbackPort`-class with `AbortSignal` support.
2024-10-16 10:16:27 +02:00
Jonas Jenwald
a989244570 Slightly re-factor the transportFactory initialization in getDocument
Given that the `WorkerTransport`-constructor will access all possible factory-instances, let's ensure that the `transportFactory`-object always has a consistent shape regardless of other options.

Also, since `useWorkerFetch` is always true in MOZCENTRAL builds we never need to create `CMapReaderFactory`/`StandardFontDataFactory`-instances there.
2024-09-26 12:16:05 +02:00
Jonas Jenwald
bb302dd993 [api-minor] Pass CanvasFactory/FilterFactory, rather than instances, to getDocument
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.

As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
2024-09-23 11:26:30 +02:00
Jonas Jenwald
f77a29d675 Simplify the code that picks the appropriate NetworkStream-implementation
This code is quite old and has been moved/re-factored a few times over the years, however we can simplify this even further since we don't actually need a function to determine what NetworkStream-implementation to use.
2024-09-17 12:23:43 +02:00
Jonas Jenwald
c4fdb28573 Remove PDFWorkerUtil and move its contents into PDFWorker instead
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
2024-07-29 11:22:43 +02:00
Tim van der Meij
c77b97daff
Update the JS/CSS files for the new Prettier/Stylelint versions 2024-07-13 16:29:47 +02:00
Jonas Jenwald
a4ffc1066c Move the internal API/Worker isEditing-state into RenderingIntentFlag
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
2024-07-04 23:34:30 +02:00
Calixte Denizet
64635f3b35 [api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
 - if the annotation has to be composed with the page then the canvas must be correctly
   composed with its parent. That means we should move the canvas under canvasWrapper
   and we should extract composing info from the drawing instructions...
   Currently it's the case with highlight annotations.
 - we use some extra memory for those canvas even if the user will never edit them, which
   the case for example when opening a pdf in Fenix.

So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
2024-07-02 14:11:40 +02:00
Jonas Jenwald
a4f1a9a41b Cancel the requestAnimationFrame in the API when cancelling rendering
Errors related to this `requestAnimationFrame` show up intermittently when running the integration-tests on the bots, however I've been unable to reproduce it locally.
Hence I cannot guarantee that it's enough to fix the timing issues, however this should be generally safe since the `requestAnimationFrame` invokes the `_next`-method and the first thing that one does is check that rendering hasn't been cancelled.
2024-06-26 17:26:02 +02:00
Jonas Jenwald
f3f88eecb4 Use an AbortController to remove the temporary "error" handler for the worker 2024-06-15 14:35:32 +02:00
Jonas Jenwald
2d0e08f1c8 Introduce a helper method for resolving the PDFWorker promise
This avoids having to repeat the same code multiple times, since besides resolving the promise we also need to send the "configure" message to the worker-thread.
2024-06-15 14:35:30 +02:00
Jonas Jenwald
8d4456172b Reduce duplication when handling the "test" message from the worker
The feature-testing on the worker-thread has been simplified in previous pull requests, which means that we can simplify this main-thread handler as well.
2024-06-15 14:35:28 +02:00
Jonas Jenwald
0a36b667e4 Use an early return in PDFWorker.prototype._initialize when workers are disabled
This helps reduce overall indentation in the method, thus leading to slightly less code.
Also, remove an old comment referring to Chrome 15 since that's no longer relevant now.
2024-06-15 14:35:12 +02:00
Jonas Jenwald
d6612b3427 Remove some now redundant validation in getDocument
Given that we now check/validate all options properly this old code can be simplified.
2024-06-15 14:35:10 +02:00
Calixte Denizet
ff6180a4c9 Add an option to enable/disable hardware acceleration (bug 1902012) 2024-06-12 18:41:07 +02:00
Jonas Jenwald
06334c97ef Improve the loadingParams functionality in the API
- Move the definition of the `loadingParams` Object, to simplify the code.

 - Add a unit-test, since none existed and the viewer depends on this functionality.
2024-05-24 09:26:40 +02:00
Jonas Jenwald
15b5808eee [api-minor] Re-factor the basic textLayer-functionality
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.

This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.

Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
2024-05-17 14:20:20 +02:00
Tim van der Meij
4db843617f
Merge pull request #18047 from Snuffleupagus/issue-18042
Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
2024-05-15 15:40:18 +02:00
Jonas Jenwald
6b171540b7 Initialize the networkStream synchronously in getDocument
This is fairly old code, and at some point the need for this to be asynchronous disappeared.
2024-05-14 17:04:25 +02:00
Jonas Jenwald
cbb8748a22 Inline the _fetchDocument helper function in getDocument
This function has been modified a number of times over the years, and at this point it's small/simple enough that we can just inline the code instead.
2024-05-14 16:29:41 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00