797 Commits

Author SHA1 Message Date
calixteman
9f576beee8
Merge branch 'master' into marked-content-lang 2025-11-21 18:22:30 +01:00
Calixte Denizet
50c48cf11b Add telemetry for tagged pdfs (bug 1997134) 2025-11-17 19:47:16 +01:00
Calixte Denizet
bc87f4e8d6 Add the possibility to create a pdf from different ones (bug 1997379)
For now it's just possible to create a single pdf in selecting some pages in different pdf sources.
The merge is for now pretty basic (it's why it's still a WIP) none of these data are merged for now:
 - the struct trees
 - the page labels
 - the outlines
 - named destinations
For there are 2 new ref tests where some new pdfs are created: one with some extracted pages and an other
one (encrypted) which is just rewritten.
The ref images are generated from the original pdfs in selecting the page we want and the new images are
taken from the generated pdfs.
2025-11-07 14:57:48 +01:00
Edoardo Cavazza
e4569c5d22 Extract Lang attribute for marked contents 2025-10-29 19:50:28 +01:00
Aditi
fa631806bf Serialize pattern data into ArrayBuffer
Follow up on https://github.com/mozilla/pdf.js/pull/20197,
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.

It sets up the stage for us to eventually switch to a
SharedArrayBuffer in the future.
2025-10-11 01:58:07 +05:30
Calixte Denizet
19ff148163 Fix incremental saving with hybrid references
This patch removes some previous fixes which are now likely fixed by #17636.

Fixes #20302.
2025-10-04 18:31:55 +02:00
Ujjwal Sharma
4bed7370f4 [WIP] Serialize font data into an ArrayBuffer
This PR serializes font data into an ArrayBuffer
that is then transfered from the worker to the
main thread. It's more efficient than the current
solution which clones the "export data" object
which includes the font data as a Uint8Array.

It prepares us to switch to a SharedArrayBuffer
in the future, which would allow us to share
the font data with multiple agents, which would be
crucial for the upcoming "renderer" worker.
2025-09-19 12:02:40 +05:30
Nicolò Ribaudo
42b4d97115
Fix JSDoc description in src/display/api.js 2025-09-12 15:01:41 +02:00
Nicolò Ribaudo
e4ea2e0c79
Store ops bboxes in a linear Uint8Array
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.

Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.

This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).

To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is high enough resolution for the usecase.
2025-09-09 10:24:48 +02:00
Nicolò Ribaudo
6a22da9c2e
Add logic to track rendering area of various PDF ops
This commit is a first step towards #6419, and it can also help with
first compute which ops can affect what is visible in that part of
the page.

This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.

Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have three rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.

This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affected the PDF area on the
canvas.

All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.

The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that they depend on
- it highlights on the PDF itself the bounding box
2025-08-22 18:26:59 +02:00
Calixte Denizet
9e5ee1e5a7 [Editor] Add the ability to get all the editable annotations in a pdf document
We want to be able to show all the comments in a pdf even if the pages where they are
haven't been rendered.
And it'll help to fix the issue #18915.
2025-08-18 21:31:11 +02:00
Knut Hühne
760c8d632c
Mark canvasContext as optional
In #20016, the `canvasContext` property of `RenderParameters` was deprecated in favor of the new `canvas` property.

The JSDoc was updated to include the new parameter along with the old one.

I think the old one should be enclosed in `[]` to mark it as optional (and to allow usage from TypeScript with just the `canvas` parameter provided).

I also reordered the properties so that all required properties come first, follow by optional ones.
2025-08-06 12:38:30 +02:00
Calixte Denizet
1c15ba7789 Use the canvas context from mozPrintCallback when printing a pdf from the Firefox viewer
The context is specific to the callback and cannot be created from the canvas itself.
2025-07-11 12:55:39 +02:00
Ujjwal Sharma
b1b728d47f [api-minor] Move getContext call to InternalRenderTask
This is a precursor to moving the call into a
worker thread to let us use `OffscreenCanvas`. The
current position wouldn't work since we make
transformations to the canvas object after the
getContext call, which isn't allowed for
OffscreenCanvas. Also it isn't allowed to clone or
`transferControlToOffscreen` the canvas after the
`getContext` call.
2025-07-04 00:53:51 +02:00
Jonas Jenwald
e91b480c09 Move the PDFObjects class to its own file
This isn't directly part of the official API, and having this class in its own file could help avoid future changes (e.g. issue 18148) affecting the size of the `src/display/api.js` file unnecessarily.
2025-05-20 13:47:36 +02:00
Jonas Jenwald
0105237af6 Move a few helper functions/classes out of the src/display/api.js file
Given that this file represents the official API, it's difficult to avoid it becoming fairly large as we add new functionality. However, it also contains a couple of smaller (and internal) helpers that we can move into a new utils-file.

Also, we inline the `DEFAULT_RANGE_CHUNK_SIZE` constant since it's only used *once* and its value has never been changed in over a decade.
2025-05-20 13:47:36 +02:00
Tim van der Meij
f6e4b1cf4a
Merge pull request #19945 from Snuffleupagus/NetworkStream-rm-Node-Error
Remove Node.js-specific checks when using the Fetch API
2025-05-18 12:05:13 +02:00
Jonas Jenwald
a882195e9b Remove Node.js-specific checks when using the Fetch API
Given that Node.js has full support for the Fetch API since version 21, see the "History" data at https://nodejs.org/api/globals.html#fetch, it seems unnecessary for us to manually check for various globals before using it.

Since our primary development target is browsers in general, and Firefox in particular, being able to remove Node.js-specific compatibility code is always helpful.

Note that we still, for now, support Node.js version 20 and if the relevant globals are not available then Errors will instead be thrown from within the `PDFFetchStream` class.
2025-05-18 10:49:02 +02:00
Jonas Jenwald
99b23ea1f6 Use private fields in the PDFDataRangeTransport class 2025-05-18 10:16:46 +02:00
Jonas Jenwald
fc697b3602 Utilize private fields and methods more in the PDFWorker class
This replaces, wherever possible, the old semi-private fields and methods with actually private ones.
2025-05-17 18:05:49 +02:00
Jonas Jenwald
ab672f0b77 Replace PDFWorker.fromPort with a generic PDFWorker.create method
This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.
2025-05-17 16:13:41 +02:00
Jonas Jenwald
e5e9d18289 Allow using the workerPort option in Firefox 2025-05-15 19:30:43 +02:00
Jonas Jenwald
6b961c424f Update Webpack to version 5.99.5 (issue 19808)
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.

Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.

*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
2025-04-13 16:48:19 +02:00
Jonas Jenwald
8bcc3664c9 [api-minor] Attempt to improve support for using the PDF.js builds with Vite
Similar to Webpack there's apparently other bundlers that will not leave `import`-calls alone unless magic comments are used.
Hence we extend the builder to also append `/* @vite-ignore */` comments to `import`-calls, in order to attempt to improve support for using the PDF.js builds together with Vite.

This patch also renames `__non_webpack_import__` to `__raw_import__` since the functionality is no longer bundler-specific.

***PLEASE NOTE:*** This patch is provided as-is, and it does *not* mean that the PDF.js project can/will provide official support for Vite.
2025-03-28 16:34:00 +01:00
Jonas Jenwald
9e8d4e4d46 [api-minor] Attempt to support fetching the raw data of the PDF document from the PDFDocumentLoadingTask-instance (issue 15085)
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
2025-03-16 10:09:44 +01:00
Calixte Denizet
7280540901 [api-minor] Use an icc profile for converting CMYK to RGB 2025-03-10 14:18:20 +01:00
Tim van der Meij
be6ef64a2c
Merge pull request #19577 from Snuffleupagus/isValidExplicitDest-reuse
Re-use the `isValidExplicitDest` helper function in the worker/viewer
2025-03-02 15:50:54 +01:00
Jonas Jenwald
165d90fe26 Re-use the isValidExplicitDest helper function in the worker/viewer
Currently we re-implement the same helper function twice, which in hindsight seems like the wrong decision since that way it's quite easy for the implementations to accidentally diverge.
The reason for doing it this way was because the code in the worker-thread is able to check for `Ref`- and `Name`-instances directly, which obviously isn't possible in the viewer but can be solved by passing validation-functions to the helper.
2025-03-01 12:08:56 +01:00
Yeoh Joer
2221ccf160 Add the missing return type for PDFWorker.fromPort() 2025-02-28 02:49:44 +08:00
Jonas Jenwald
839e23f5c2 Send disableFontFace and fontExtraProperties as part of the exported font-data
These options are needed in the `FontFaceObject` class, and indirectly in `FontLoader` as well, which means that we currently need to pass them around manually in the API.
Given that the options are (obviously) available on the worker-thread, it's very easy to just provide them when creating `Font`-instances and then send them as part of the exported font-data. This way we're able to simplify the code (primarily on the main-thread), and note that `Font`-instances even had a `disableFontFace`-field already (but it wasn't properly initialized).
2025-02-24 09:34:48 +01:00
Jonas Jenwald
641e2f506e [api-minor] Re-factor how the useWorkerFetch option is used internally
With the recently added OpenJPEG no-wasm fallback we need to send the `wasmUrl` option to the worker-thread *regardless* of the value of the `useWorkerFetch` option, since the fallback won't work if we don't have a URL to `import` it from.
For consistency the code is re-factored to always send the factory-urls to the worker-thread, and simply check the `useWorkerFetch` option there instead.

Also, as a follow-up to PR 19525, introduce a new `useWasm` option that can be used in e.g. browser-tests to forcibly disable WebAssembly usage.
2025-02-22 09:56:53 +01:00
Nicolò Ribaudo
458b2ee402
[api-minor] Render high-res partial page views when falling back to CSS zoom (bug 1492303)
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.

This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render portion of a
page in a smaller canvas without hiting the maximun canvas size limit.

Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
2025-02-21 10:00:55 -08:00
Jonas Jenwald
e3ea92603d
Merge pull request #19493 from Snuffleupagus/URL-parse
Introduce some `URL.parse()` usage in the code-base
2025-02-21 10:40:32 +01:00
Jonas Jenwald
06e4580f8b Ensure that the useWorkerFetch fallback value is always a boolean
If either of the factory-urls are missing or invalid, the fallback value would currently become `useWorkerFetch === null`.
While that is obviously a falsy value, which means that the code still works as intended, we should ensure that this is consistent.
2025-02-16 14:04:30 +01:00
Jonas Jenwald
c2e33307b1 Introduce some URL.parse() usage in the code-base
This (fairly new) static method allows parsing URLs without having to wrap `new URL(...)` calls within `try...catch` blocks, thus simplifying the code; see https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static

For older browsers/environments the functionality will be polyfilled, but *only* in `legacy` builds, via `core-js`; see https://github.com/zloirock/core-js?tab=readme-ov-file#url-and-urlsearchparams

*Please note:* This is currently limited to the `src/`- and `web/`-folders, such that we don't break development/testing, since the functionality is not available in all Node.js versions that we support; see https://developer.mozilla.org/en-US/docs/Web/API/URL/parse_static#browser_compatibility
2025-02-15 19:10:36 +01:00
Jonas Jenwald
88e5da1e37 Combine the main-thread message handlers for CMap-, StandardFontData-, and Wasm-files
Currently we have three separate and virtually identical message handlers for this data, which can easily be combined into a single message handler instead.
2025-02-07 14:33:15 +01:00
Jonas Jenwald
fa3358baf9 [api-minor] Add more validation for the cMapUrl, standardFontDataUrl, and wasmUrl parameters
Given that we now have a few different factory-url parameters, we introduce a helper function for parsing them.

*Please note:* These parameters have always been documented as requiring a trailing slash[1], which we can now easily enforce during the `getDocument`-call.

---
[1] I recall that we've occasionally seen issues because users miss that detail, and the new Error should hopefully be more easily actionable than one thrown during rendering/parsing.
2025-02-04 10:27:31 +01:00
Jonas Jenwald
9241e1be8c [api-minor] Simplify clean-up of page resources after rendering
After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).

Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.

Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
 - We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
 - We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
 - The amount of available RAM on devices should be a lot higher, since more than a decade has passed.

A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).

---
[1] More specifically:
 - `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
 - `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
 - `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
2025-01-22 12:19:44 +01:00
Jonas Jenwald
db43f158dc Inline the default Factory-definitions in getDocument
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.

 - A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.

 - For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
2025-01-18 14:09:14 +01:00
Jonas Jenwald
1ddce76a8b Simplify the JSDocs for the various getDocument Factory-parameters
Given that we nowadays provide default Node.js versions of these Factory-parameters it no longer seems necessary to mention that environment specifically.
2025-01-16 23:01:36 +01:00
Calixte Denizet
94b4b54ef6 [api-major] Add openjpeg.wasm to pdf.js (bug 1935076)
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.

So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
2025-01-16 21:09:50 +01:00
Jonas Jenwald
6bde49a606 Reduce duplication when handling "DocException" and "PasswordRequest" messages
Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need.

Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function.

Finally, the patch makes the following smaller changes:
 - Avoid needlessly re-creating exceptions in the `wrapReason` function.
 - Use a slightly shorter parameter name in the `wrapReason` function.
 - Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.
2024-12-26 12:55:49 +01:00
Jonas Jenwald
490e740365 [api-minor] Remove deprecated getDocument options (PR 18776 follow-up) 2024-12-15 14:13:44 +01:00
Jonas Jenwald
6153b15231 Remove the raw path-strings after creating the actual Path2D glyph-objects
The `Path2D` glyph-objects are cached on the `FontFaceObject`-instance, so we can save a little bit of memory by removing the raw path-strings once they're no longer needed.
2024-12-09 15:09:18 +01:00
Jonas Jenwald
c6e3fc4fe6 Take the userUnit into account in the PageViewport class (issue 19176) 2024-12-08 15:51:04 +01:00
Tim van der Meij
2bbf9b432c
Merge pull request #19099 from timvandermeij/updates
Update dependencies and translations to the most recent versions
2024-12-01 14:54:07 +01:00
Jonas Jenwald
4b0900fdfa Use even more optional chaining in the src/display/api.js file
This slightly shortens the code, in various `destroy`-methods, which cannot hurt.
Also, use pre-processor checks to simplify `PDFDocumentLoadingTask.destroy` in the Firefox PDF Viewer since the `PDFWorker.fromPort`-method isn't used there.
2024-11-30 12:06:27 +01:00
Tim van der Meij
725ae4998c
Fix the type definition of the fingerprints getter in src/display/api.js
The `Array` type takes one parameter that describes the possible types
of the inner  elements. This parameter may exist of pipes to indicate
multiple possible types. However, the current type definition provides
multiple parameters; this is incorrect syntax, and TypeScript 5.7+ now
fails on this due to stricter syntax validation.

This commit fixes the issue by changing the type definition to the
proper syntax, which together with the accompanying comment about the
contents of the fingerprints array should document it sufficiently.
2024-11-24 21:53:45 +01:00
Jonas Jenwald
471284f51b [api-minor] Disable ImageDecoder usage by default in Chromium browsers
Given that there are multiple issues with `ImageDecoder` in Chromium browsers, affecting both BMP and JPEG images, for now we (by default) disable that functionality there to avoid problems.

This also means that we can remove the previously added, and separate, `isChrome` API-option.
2024-11-14 12:05:15 +01:00
Jonas Jenwald
65eedfb0fc [api-minor] Add a getDocument option to disable ImageDecoder usage
This allows end-users to forcibly disable `ImageDecoder` usage, even if the browser appears to support it (similar to the pre-existing option for `OffscreenCanvas`).
2024-11-12 17:12:42 +01:00