This commit updates the release pipeline to use OIDC trusted publishing
now that we have configured it between GitHub Actions and NPM. This
solution allows us to remove the token variable (because there is no
longer a fixed token) and provenance flag (because provenance
attestations are generated by default with this approach); refer to
https://docs.npmjs.com/trusted-publishers for more information.
For now its use is limited to the comment sidebar, but it'll also be used for the new one containing the thumbnails.
This also makes the sidebar more accessible with the keyboard or a screen reader.
This dependency got introduced during the move to Fluent for
localization (bug 1858715), but it wasn't added to `package.json`
at the time. This worked because other dependencies already
installed it, but we shouldn't rely on that, so this commit
explicitly includes it in `package.json` instead.
Fixes 66982a2a.
The linter found some issues in viewer.html with `</input>`, which isn't required, and a missing closing div in test/resources/reftest-analyzer.html.
The HTML can now be nicely formatted. In order to not break the build for mozilla-central, the preprocessor has been fixed to take the whitespace at the beginning of a comment line into account.
And finally, make .prettierrc (which is supposed to be either JSON or YAML) itself lintable.
Doing so has a number of advantages:
- it removes code duplication, thereby improving readability;
- it removes hardcoded editor IDs, by using the `getNextEditorId` helper
function that was previously introduced for the highlight editor
integration tests, thereby improving readability and reusability;
- it removes potential for intermittent failures by not proceeding until
the freetext editor is fully created and all assertions pass, which
didn't happen consistently before because the code wasn't centralized.
For now it's only possible to create a single PDF by selecting some pages from different PDF sources.
The merge is pretty basic for now (which is why this is still a WIP); none of the following data are merged yet:
- the struct trees
- the page labels
- the outlines
- named destinations
There are two new ref tests where some new PDFs are created: one with some extracted pages and another one (encrypted) which is just rewritten.
The ref images are generated from the original PDFs by selecting the pages we want, and the new images are taken from the generated PDFs.
Metric-compatible font with Times New Roman created by ParaType, based on
their serif font PT Serif, released under OFL-1.1 license.
https://www.paratype.com/fonts/pt/pt-astra-serif
Signed-off-by: Coelacanthus <uwu@coelacanthus.name>
It'll help to make math equations "visible" for screen readers.
MS Office has a specific way of adding some MathML code to a structure tree leaf, and this patch handles it.
Move the current pointer field of DrawingEditor to a CurrentPointer class in tools.js: the pointer type fields have been moved to a CurrentPointer object in tools.js. This object is used by eraser.js and ink.js.
Only reset the pointer type when the user selects a new mode: clear the pointer type when changing mode, instead of at the end of the session. This seems more stable, as the method isn't called this way when the user changes pages. Also, clear the pointer type when the mode is changed by an event (the user changes the editor type); otherwise the same pointer type is kept (for example when the document is changed).
On GitHub Actions this test could fail with `Expected 1.3125 to be less
than 1` due to slight differences in the decimals of the `rect.y` value.
However, for this check we're only really interested in full pixels, so
this commit rounds the value up to the next integer to avoid being
affected by small decimal differences, which makes the test pass on
GitHub Actions too.
For the integration tests we prefer non-linked test cases because those
PDF files are directly checked into the Git repository and thus don't
need a separate download step that linked test cases do.
However, for the freetext and ink integration tests we currently use
`aboutstacks.pdf` which is a linked test case. Fortunately we don't need
to use it because for most tests we don't actually use any properties of
it: we only create editors on top of the canvas, but for that any PDF
file works, so we can simply use the non-linked `empty.pdf` file instead.
The only exception is the "aria-owns" test that needs a line of text from
the PDF file, so we move that particular test to a dedicated `describe`
block and adapt it to use the non-linked `attachment.pdf` file that just
contains a single line of text that can be used for this purpose.
The changes combined make 12 more integration tests run out-of-the-box
after a Git clone, which also simplifies running on GitHub Actions.
We used an SVG string, which can be passed to the `Path2D` constructor, but that's a bit slower than
building the path step by step.
Having numerical data instead of a string will also help the font data serialization.
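As a rough illustration (the path data here is made up), the two approaches look like:
```
// Parsing an SVG path string at construction time:
const fromString = new Path2D("M 0 0 L 10 0 L 10 10 Z");

// Building the same path step by step, which skips the string parsing:
const fromSteps = new Path2D();
fromSteps.moveTo(0, 0);
fromSteps.lineTo(10, 0);
fromSteps.lineTo(10, 10);
fromSteps.closePath();
```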
The position of the text rendered by `showText` is affected
incrementally by the preceding `showText` operations "on the same line".
For this reason, we keep track of all of them (with their dependencies)
in `sameLineText`.
`sameLineText` can be reset whenever we explicitly position the text
somewhere else. We were previously only doing it for `moveText`, and
this patch updates the code to also do it on `setTextMatrix` (which
resets `this.current.x/y` in `CanvasGraphics` to 0).
The complexity of the dependency tracking for subsequent `sameLineText`
operations grows quadratically with the number of operations on the same
line, so this patch fixes the performance problem when there are whole
pages of text that only use `setTextMatrix` and not `moveText`.
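A minimal sketch of the reset logic described above (the names mirror this description, not necessarily the actual implementation):
```
switch (fnId) {
  case OPS.moveText:
  case OPS.setTextMatrix:
    // Text is explicitly repositioned, so earlier showText operations can
    // no longer affect the position of the next one.
    sameLineText.length = 0;
    break;
  case OPS.showText:
    sameLineText.push(currentOp);
    break;
}
```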
Python 3.14 is the current stable version, released on October 7th. The
dependencies we use also support Python 3.14 now, most importantly
`fonttools` for which the OS-specific builds have been published (see
the `cp314` wheels on https://pypi.org/project/fonttools/#files).
Follow-up on https://github.com/mozilla/pdf.js/pull/20197.
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.
It sets up the stage for us to eventually switch to a
SharedArrayBuffer in the future.
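For reference, a generic sketch of the worker-side transfer (the message shape and `byteLength` are placeholders); the transfer list makes the ArrayBuffer move instead of being cloned, detaching it on the worker side:
```
const buffer = new ArrayBuffer(byteLength);
// ...fill `buffer` with the serialized pattern data...
self.postMessage({ type: "patternData", buffer }, [buffer]);
```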
Instead of looking at every bbox, we use a grid (64x64) where each cell of the grid is associated with the bboxes
touching it.
In order to get the potential bboxes containing a point, we just have to compute the number of the cell containing
it, and using the association described above we can quickly know if the point is contained.
With the pdf in the mentioned bug, it's ~20 times faster.
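A sketch of the lookup (hypothetical names, including `bboxContains`): map the point to its cell and only test the bboxes registered for that cell:
```
const GRID_SIZE = 64;
const col = Math.min(Math.floor((x / pageWidth) * GRID_SIZE), GRID_SIZE - 1);
const row = Math.min(Math.floor((y / pageHeight) * GRID_SIZE), GRID_SIZE - 1);
// Each grid entry holds the bboxes touching that cell.
const candidates = grid[row * GRID_SIZE + col];
const hit = candidates.some(bbox => bboxContains(bbox, x, y));
```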
The `width` and `height` arguments for `setDims` have been removed in
PR #20285, but for two calls in the highlight code they remained. This
commit removes them as they are no longer used in the method itself.
Fixes 0722faa9.
But keep a lower quality when enableOptimizedPartialRendering is true, because we need to compensate for the time
used to compute the bboxes, and since subsequent renderings are faster it's more acceptable to see
a lower-quality image for a few tenths of a second.
[Annotation] In reading mode with the new comment stuff enabled, use the comment popup for annotations without a popup but with some contents (bug 1991029)
This PR serializes font data into an ArrayBuffer that is then
transferred from the worker to the main thread. It's more efficient
than the current solution, which clones the "export data" object
that includes the font data as a Uint8Array.
It prepares us to switch to a SharedArrayBuffer in the future,
which would allow us to share the font data with multiple agents;
that would be crucial for the upcoming "renderer" worker.
Change `info` to use `console.info` and `warn` to use `console.warn`.
This not only makes sense semantically but also in practice: server-side
runtimes like Deno write `console.log` to stdout and `console.warn` to
stderr (`console.info` goes to stdout, unfortunately). This is important
because logging to stdout can break some CLI apps.
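In code, the change amounts to (sketch):
```
function info(msg) {
  console.info(`Info: ${msg}`); // stdout in Deno/Node, like console.log
}

function warn(msg) {
  console.warn(`Warning: ${msg}`); // stderr, so it can't corrupt piped output
}
```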
Before this patch, when `enableOptimizedPartialRendering`
is enabled we would record the bounding boxes of the
various operations on the first render.
This patch changes it to happen on the first render that we
know will also need a detail view, so that the performance
cost is not paid when the detail view is not used.
The size of the canvas has significant impact on the rendering
performance. If we are going to render a high-res detail
view on top of the full-page canvas, we can further
reduce the full-page canvas resolution to improve
rendering time without affecting the resolution seen by
the user.
Users will see the lower resolution when quickly scrolling around the
page, but it will then be replaced with the high-res
detail view.
This can happen with a PDF that has a large text layer.
Instead of waiting for the first rendered page to enable the buttons, we wait for
a rendered annotation editor layer.
In order to see the issue this patch is fixing:
- open a pdf with some highlights and a comment on page 1, at page 7
- open the comment sidebar
- click on the comment on page 1
Opening at page 7 leaves page 1 not fully rendered, which means that when jumping to it
with the sidebar, we re-use what we have instead of redrawing it.
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.
Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.
This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).
To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is a high enough resolution for the use case.
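A sketch of that quantization (hypothetical names): normalized page coordinates are snapped onto the 256x256 grid so each coordinate fits in a single byte, rounding outwards so the stored box never undershoots the real one:
```
const quantized = new Uint8Array(numOps * 4);
quantized.set(
  [
    Math.floor((bbox.minX / pageWidth) * 255),
    Math.floor((bbox.minY / pageHeight) * 255),
    Math.ceil((bbox.maxX / pageWidth) * 255),
    Math.ceil((bbox.maxY / pageHeight) * 255),
  ],
  opIndex * 4
);
```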
Bug 1978027 was fixed upstream 10 days ago, so this integration
test can be enabled for Firefox too now that it passes with recent
Nightly versions.
Before the introduction of the `renderRichText` helper function we
exclusively used `this.#html` for XFA rich text and exclusively used
`this.#contentsObj` for plain text. However, after the refactoring we
tried to access `this.#contentsObj.dir` in both cases, which fails for
XFA rich text because `this.#contentsObj` is `null` in that case.
This commit fixes the issue by using optional chaining to make sure we
don't try to access non-existent `this.#contentsObj` properties, which
makes the `must update an existing annotation and show the right popup`
freetext integration test pass again.
Fixes #20237.
Fixes 35c90984.
Most places have a newline before/after `before{Each,All}`,
`after{Each,All}` and `it` to visually separate the blocks for clarity,
but in a handful of places this wasn't done. This commit removes the
inconsistencies so that the test code is formatted consistently.
The helper function was used in a number of places, but also a lot of
places contained the annotation selector string inline. This commit
makes sure that all places use `getAnnotationSelector` consistently to
make sure the annotation selector string is only defined in a single
place and to improve readability of the test code.
This test called `closeSinglePage` manually at the end of the test,
which is inconsistent with all other tests that call `closePages` in an
`afterEach` block. This commit fixes the difference for consistency.
If the document changes, the comment state from the old document should
be replaced with that of the new document. To do this the comment
manager is destroyed, but the corresponding comment sidebar wasn't
destroyed yet, which resulted in the comment state from the old document
still being visible for the new document.
This commit fixes the issue by hiding the comment sidebar if the comment
manager is destroyed. Note that hiding the comment sidebar effectively
destroys all its state, and we already set the annotation mode to "none"
on document change so we don't want to keep showing the comment sidebar
anyway.
This function is required for the commenting feature: the user will be able to click
on a comment in the sidebar and the document will be scrolled accordingly.
Locally, on Arch Linux, this integration test permafails:
```
1) Text layer Text selection using selection carets doesn't jump when moving selection
Message:
second selection:
Expected '(frequently executed) bytecode sequences, records
them, and compiles them to fast native code. We call such a s' to roughly match /frequently .* We call such a se/s.
Stack:
at <Jasmine>
at UserContext.<anonymous> (file:///home/timvandermeij/Documenten/Ontwikkeling/pdf.js/Code/test/integration/text_layer_spec.mjs:521:12)
Message:
third selection:
Expected '(frequently executed) bytecode sequences, records
them, and compiles them to fast native code. We call such a s' to roughly match /frequently .* We call such a se/s.
Stack:
at <Jasmine>
at UserContext.<anonymous> (file:///home/timvandermeij/Documenten/Ontwikkeling/pdf.js/Code/test/integration/text_layer_spec.mjs:529:12
```
The exact selection can differ a bit per OS/browser. In this case the
last character was consistently not selected while on other platforms it
is, so this commit fixes the issue by relaxing the regex to not consider
the final character so that the test passes if the rest matches.
This removes the following error from the integration test logs:
```
JavaScript error: http://127.0.0.1:59283/build/generic/build/pdf.mjs, line 3879:
TypeError: EventTarget.addEventListener: 'signal' member of AddEventListenerOptions is not an object.
```
Fixes 636ff50.
Extends 63651885.
This commit is a first step towards #6419, and it can also help with
rendering just a part of the page: we can first compute which ops can
affect what is visible in that part of the page.
This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.
Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have two rendering operations: the `showText` op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.
This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affect the PDF area on the
canvas.
All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.
The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that it depends on
- it highlights the bounding box on the PDF itself
We don't need AI/ML features in the tests, so this should reduce CPU
usage by not having the inference process running. Moreover, it prevents
the following lines from being logged in the test output:
```
JavaScript error: resource://gre/actors/MLEngineParent.sys.mjs, line 509: Error: Unable to get the ML engine from Remote Settings.
JavaScript error: resource://gre/actors/MLEngineParent.sys.mjs, line 1279: TypeError: can't access property "postMessage", this[#port] is null
```
Implement a delay for Chrome protocol calls in the integration tests, and skip the "must check that an existing highlight is ignored on hovering" integration test on Windows
This is a temporary measure to reduce noise until #20136 is fixed. Note
that this shouldn't be an issue in terms of coverage because we still
run the test on Linux.
In Chrome, protocol calls are faster than in Firefox and thus trigger in
quicker succession. This can cause intermittent failures because new
protocol calls can run before events triggered by the previous protocol
calls have had a chance to be processed (essentially causing events to
get lost).
This commit fixes the issue by configuring Chrome with a protocol call
delay value that gives it a more similar execution speed to Firefox
(which also gives us more consistency between the two browser runs).
Note that this doesn't negatively impact the overall runtime of the
integration tests because Puppeteer already waits for a test to complete
in both browsers before continuing to the next one and Chrome
consistently was, and with this patch still slightly is, faster in
completing the tests.
It should fix the error:
```
JavaScript error: http://127.0.0.1:43303/build/generic/build/pdf.mjs, line 1445:
TypeError: EventTarget.addEventListener: 'signal' member of AddEventListenerOptions is not an object.
```
that we hit when running integration tests on the Linux bot.
In #20016, the `canvasContext` property of `RenderParameters` was deprecated in favor of the new `canvas` property.
The JSDoc was updated to include the new parameter along with the old one.
I think the old one should be enclosed in `[]` to mark it as optional (and to allow usage from TypeScript with just the `canvas` parameter provided).
I also reordered the properties so that all required properties come first, followed by optional ones.
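An abridged sketch of the intended JSDoc shape (only these two properties shown):
```
/**
 * @typedef {Object} RenderParameters
 * @property {HTMLCanvasElement} canvas - The canvas to render to.
 * @property {CanvasRenderingContext2D} [canvasContext] - Deprecated; kept
 *   only for backwards compatibility.
 */
```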
The fixed -400px horizontal offset used by
scrollIntoView led to horizontal scroll only moving
part-way right on narrow screens. The highlights near
the right edge remained partly or completely off
screen.
This centres the highlighted match on any viewport width while
clamping the left margin to 20-400px. On very narrow screens
the scrollbar now moves all the way to the right instead of
stopping midway.
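A sketch of the computation (hypothetical names, not the exact viewer code):
```
// Center the match horizontally, but keep the offset within 20-400px so
// narrow viewports can still scroll all the way to the right.
const margin = Math.max(20, Math.min(400, (viewportWidth - matchWidth) / 2));
scrollIntoView(matchElement, { left: -margin });
```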
It's useful for users highlighting with NVDA.
They have to enable native selection and then select some text.
In this case a selectionchange event is only triggered once the selection is done.
Typing in the text field causes a sandbox event to trigger, which we
should await to avoid continuing to the next part of the test before
the sandbox event is fully processed.
Ensure that negative “LW” entries in an ExtGState dictionary are converted to their absolute values when the “gs” operator is processed.
See PR https://github.com/mozilla/pdf.js/pull/19639
It takes some time for the viewer alert to be updated after the editor
is committed, but the current tests don't await that and proceed too
fast to the viewer alert string assertion. This commit fixes the issue
by waiting for the expected viewer alert string to appear instead.
It fixes #20065.
The only way to get a path (from the path generator) is when the font is embedded.
So when we need a path (disableFontFace: true, or when we want to use a pattern for stroking/filling), it's impossible
to fulfil that requirement.
In order to screenshot the page and assert that it's monochrome,
providing a regression test for #18694, the viewer background is
configured to match the page background because screenshotting the page
always captures a small part of the viewer background as well, and this
way we can easily go over all pixels and check that they are all equal.
However, in addition to configuring the viewer background the test also
hides the toolbar and removes the page border. Especially the latter
makes `scrollIntoView` fail in both Chrome and Firefox with recent
Puppeteer versions, for reasons which remain a bit unclear.
Fortunately both hiding the toolbar and removing the page border are not
actually necessary (anymore) for the test to work, so we can simply
remove those actions to fix the issue and reduce the amount of code. To
make sure that the test still covers the original issue correctly, we've
reverted the changes from #18698 and verified that the test still fails
as expected.
Fixes #19811.
Fixes 68332ec2.
This is a precursor to moving the call into a
worker thread to let us use `OffscreenCanvas`. The
current position wouldn't work since we make
transformations to the canvas object after the
getContext call, which isn't allowed for
OffscreenCanvas. Also it isn't allowed to clone or
`transferControlToOffscreen` the canvas after the
`getContext` call.
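The ordering constraint is easy to demonstrate:
```
const canvas = document.createElement("canvas");
canvas.getContext("2d"); // attaches a context to this canvas/thread

// Throws an InvalidStateError, because a rendering context already exists:
canvas.transferControlToOffscreen();
```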
Necessary because when there is no Popup annotation created along
with a Text annotation, the Popup annotation created by pdf.js
does not receive the noRotate flag.
The shadow was taken into account when computing the bounding box of the section
containing the link, and it was making the clip path wrong.
Since the shadow is almost invisible because of the opacity, the yellow color and the clip,
we can remove it without causing any visual regressions (and as a side effect it'll avoid
using resources to compute it when displayed).
After clicking the find button we need to wait for the input field to
appear before we try to type in it, but instead we were waiting for the
button to appear. However, the button is always present, so the current
`waitForSelector` call is basically a no-op, and this can cause the
integration tests to fail intermittently if we continue before the
input field is actually visible.
This appears to be due to a typo first introduced in commit
d10da907dac86c103b787127110a59aed82c7aaa that has been copy/pasted.
This commit fixes the issue by waiting for the input field to be visible
before we continue typing in it.
On macOS, the PDF backend used when printing uses the CID from the font,
so if a char has a null CID then it's equivalent to `.notdef` and some viewers
don't display it.
The problem in the original code is that `getTextAt` is called too
early and therefore returns unexpected text content. This can happen
because we call `increaseScale` and then wait for the page's text layer
to be visible, but it can take some time before the zoom actually
occurs/completes in the viewer and in the meantime the old (pre-zoom)
text layer may still be visible, causing us to continue too soon because
we don't validate that we're dealing with the post-zoom text layer.
This commit fixes the issue by simply waiting for the expected text to
show up at the given origin coordinates, which makes the test work
independent of viewer actions/timing.
This isn't directly part of the official API, and having this class in its own file could help avoid future changes (e.g. issue 18148) affecting the size of the `src/display/api.js` file unnecessarily.
Given that this file represents the official API, it's difficult to avoid it becoming fairly large as we add new functionality. However, it also contains a couple of smaller (and internal) helpers that we can move into a new utils-file.
Also, we inline the `DEFAULT_RANGE_CHUNK_SIZE` constant since it's only used *once* and its value has never been changed in over a decade.
The `constructPath` op receives as arguments not only the
information to construct the path, but also the op to apply
the path to (such as "fill", or "stroke").
This commit updates the Stepper tool in the debugger to decode
that op, showing its name rather than just its numeric ID.
To make it easier to tell which PDF.js version/commit that the *built* files correspond to, they have (since many years) included `pdfjsVersion` and `pdfjsBuild` constants with that information.
As currently implemented this has a few shortcomings:
- It requires manually adding the code, with its preprocessor statements, in all relevant files.
- It requires ESLint disable statements, since it's obviously unused code.
- Being unused, this code is removed in the minified builds.
- This information would be more appropriate as comments, however Babel discards all comments during building.
- It would be helpful to have this information at the top of the *built* files, however it's being moved during building.
To address all of these issues, we'll instead utilize Webpack to insert the version/commit information as a comment placed just after the license header.
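A sketch of how that can be done with Webpack's `BannerPlugin` (the `version`/`commit` variables are placeholders for the real build metadata):
```
import webpack from "webpack";

const banner = new webpack.BannerPlugin({
  banner: `// pdfjsVersion = ${version}, pdfjsBuild = ${commit}`,
  raw: true, // insert the comment verbatim instead of wrapping it
});
```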
We have a fallback for the common case of Type3 fonts without a /FontDescriptor dictionary, however we also need to handle the case where it's present but lacking the required /FontName entry.
The `Page.evaluate()` and `Mouse.click()` APIs in Puppeteer both return
a promise; see https://pptr.dev/api/puppeteer.page.evaluate and
https://pptr.dev/api/puppeteer.mouse.click, and should therefore be
awaited before proceeding in tests to make sure that the test's behavior
is deterministic and avoid intermittent failures.
The following command was used to find potential places to fix:
`grep -nEr "[^await|return] page\." test/integration/*`
The clipboard, used via the `copyImage` helper function, is a shared
resource, so access to it cannot happen concurrently because it could
result in tests overwriting each other's contents. Most tests using
the clipboard are therefore run sequentially, but only the stamp
editor's undo-related tests weren't, so this commit fixes the issue
by running those tests sequentially too.
Given that Node.js has full support for the Fetch API since version 21, see the "History" data at https://nodejs.org/api/globals.html#fetch, it seems unnecessary for us to manually check for various globals before using it.
Since our primary development target is browsers in general, and Firefox in particular, being able to remove Node.js-specific compatibility code is always helpful.
Note that we still, for now, support Node.js version 20 and if the relevant globals are not available then Errors will instead be thrown from within the `PDFFetchStream` class.
This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.
Without this "fake" workers may be ignored in the API, which isn't really what you want when manually providing the `disableWorker=true` hash parameter. (Note that this requires the `pdfBugEnabled` option/preference to be set as well.)
Also, after the changes in PR 19810 we can just load the "fake" worker directly in development mode and don't need to manually assign it to the global scope.
Emscripten generates code that allows the caller to provide the Wasm
module (through Module.instantiateWasm), with a fallback in case
.instantiateWasm is not provided. We always define instantiateWasm, so
we can hard-code the check and let our dead code elimination logic
remove the unused fallback.
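For context, the hook Emscripten exposes looks roughly like this (`wasmBinary` is a placeholder for however the module bytes are obtained):
```
Module.instantiateWasm = (imports, successCallback) => {
  WebAssembly.instantiate(wasmBinary, imports).then(({ instance, module }) => {
    successCallback(instance, module);
  });
  return {}; // an empty object signals asynchronous instantiation
};
```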
This commit also improves the dead code elimination logic so that if
a function declaration becomes unused as a result of removing dead
code, the function itself is removed.
They were removed in the minified build, but the code that made the comments necessary was still there (just minified). This commit updates the Terser config to preserve them.
The default value of Terser's `comments` option is [`/@preserve|@copyright|@lic|@cc_on|^\**!/i`](d528103b7c/lib/output.js (L178C12-L178C53)), however the only type of comment it was actually matching in our case is `@lic`, for the license header in minified files. Thus the new regexp is `/@lic|webpackIgnore|@vite-ignore/i`.
*This is something that occurred to me when reviewing the latest PDF.js update in mozilla-central.*
Currently we duplicate essentially the same code in both the `OutputScale.prototype.limitCanvas` and `PDFPageDetailView.prototype.update` methods, which seems unnecessary, and to avoid that we introduce a new `OutputScale.capPixels` method that is used to compute the maximum canvas pixels.
When the /OpenAction data is an Array we're currently using it as-is which could theoretically cause problems in corrupt PDF documents, hence we ensure that a "raw" destination is actually valid. (This change is covered by existing unit-tests.)
*Note:* In the Dictionary case we're using the `Catalog.parseDestDictionary` method, which already handles all of the necessary validation.
This way it helps to reduce the overall canvas dimensions and make the rendering faster.
The drawback is that when scrolling, the page can be blurry in waiting for the rendering.
The default value is 200% on desktop and will be 100% for GeckoView.
The effect is probably not even measurable, however this patch ever so slightly reduces the asynchronicity in the `fieldObjects` getter. These changes should be safe since:
- We're inside of the `PDFDocument`-class and the `annotationGlobals`-getter, which will always return a (shadowed) Promise and won't throw `MissingDataException`s, can be accessed directly without going through the `BasePdfManager`-instance.
- The `acroForm`-dictionary can be accessed through the `annotationGlobals`-data, removing the need to "manually" look it up and thus the need for using `Promise.all` here.
- We can also lookup the /Fields-data, in the `acroForm`-dictionary, synchronously since the initial `formInfo.hasFields` check guarantees that it's available.
Currently we repeat the same identical code five times in the `Page`-class when creating a `PartialEvaluator`-instance, which given the number of parameters it needs seems like unnecessary duplication.
Node.js version 24 was just released, see https://github.com/nodejs/release#release-schedule, hence we should run tests in that version in order to help catch any possible issues as soon as possible.
Also, since version 23 will reach EOL (end-of-life) in less than a month we stop running tests in that version.
The `ObjectLoader.prototype.load` method has a fast-path, which avoids any lookup/parsing if the entire PDF document is already loaded.
However, we still need to create an `ObjectLoader`-instance which seems unnecessary in that case.
Hence we introduce a *static* `ObjectLoader.load` method, which will help avoid creating `ObjectLoader`-instances needlessly and also (slightly) shortens the call-sites.
To ensure that the new method will be used, we extend the `no-restricted-syntax` ESLint rule to "forbid" direct usage of `new ObjectLoader()`.
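A sketch of such a rule entry (the actual selector/message may differ):
```
"no-restricted-syntax": [
  "error",
  {
    selector: "NewExpression[callee.name='ObjectLoader']",
    message: "Use the static `ObjectLoader.load` method instead.",
  },
],
```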
Given that all the methods are already asynchronous we can just use `await` more throughout this code, rather than having to explicitly return function-calls and `undefined`.
Note also how none of the `ObjectLoader.prototype.load` call-sites use the return value.
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
Currently we repeat virtually the same code when calling the `PartialEvaluator.prototype.handleSetFont` method, which we can avoid by introducing an inline helper function.
Considering the name of the method, and how it's actually being used, you'd expect it to return a boolean value.
Given how it's currently being used this inconsistency doesn't cause any issues, however we should still fix this.
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
Currently we create an intermediate `Dict` during parsing, however that seems unnecessary since (note especially the second point):
- The `NameOrNumberTree.prototype.getAll` method will already resolve any references, as needed, during parsing.
- The `Catalog.prototype.xfaImages` getter is invoked, via the `BasePdfManager`-instance, such that any `MissingDataException`s are already handled correctly.
Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.
Hence it seems that we need a way to optionally disable that behaviour, to prevent a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
- Parsing named destinations[2] and URLs.
- Handling "strings" that are actually /Name-instances.
- Building a lookup Object/Map based on some PDF data-structure.
*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.
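A minimal sketch of the opt-out (hypothetical option name):
```
function stringToPDFString(str, keepEscapeSequences = false) {
  if (!keepEscapeSequences) {
    // An ESC-delimited sequence marks the language/country of the following
    // text (PDF 32000-1, section 7.9.2.2) and isn't part of the text itself.
    str = str.replace(/\x1b[^\x1b]*\x1b/g, "");
  }
  // ...the regular UTF-16/PDFDocEncoding decoding continues here...
  return str;
}
```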
---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".
[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
Whenever we cannot find a destination we'll fallback to checking all destinations, to account for e.g. out-of-order NameTrees, and in those cases any subsequent destination-lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
Instead, we update the visible canvas every 500ms.
With a large canvas, updating at 60fps leads to a lot of gfx transactions and it can take a lot of time.
For example, with wuppertal_2012.pdf on Windows, displaying it at 150% takes around 14 minutes (!) without
this patch, while it takes only around 14 seconds with it. Even at 30% it helps to improve the performance
by around 20%.
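A sketch of the throttling (hypothetical names):
```
const UPDATE_INTERVAL = 500; // ms
let lastUpdate = 0;

function maybeUpdateVisibleCanvas(now) {
  if (now - lastUpdate >= UPDATE_INTERVAL) {
    visibleCtx.drawImage(drawingCanvas, 0, 0);
    lastUpdate = now;
  }
}
```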
Modern Node.js versions now include a `navigator` implementation, with a few basic properties, that's actually enough for the PDF.js use-cases; please see https://nodejs.org/api/globals.html#navigator
Unfortunately we still support Node.js version `20`, hence we add a basic polyfill since that allows simplifying the code slightly.
This integration test fails intermittently because the undo button can
only be activated if focus can be put on it, and that in turn can only
happen if it's visible. The test tried to make sure that the undo bar
is visible, but checking for the absence of the `hidden` attribute is
unfortunately not enough to assert visibility according to Puppeteer
documentation [1]. Moreover, the undo button wasn't checked at all.
To fix the issue we let Puppeteer do the visibility detection for the
undo bar by providing the `visible: true` option to `waitForSelector`
[2]. This is consistent with the other tests that already do this, and
also with the existing code that detects if the undo bar is hidden
(which uses the `hidden: true` option of `waitForSelector`). Moreover,
we wait for the undo button to be present before putting focus on it.
For consistency, and to avoid intermittent failures elsewhere, we mirror
this solution to the other undo bar/button tests of the various editors.
[1] https://pptr.dev/api/puppeteer.elementhandle.isvisible
[2] https://pptr.dev/api/puppeteer.waitforselectoroptions
The intention with PR 19819 was to change how colours are specified for the viewer UI, however it wasn't intended to affect elements in the Annotation- and XFA-layers.
Hence we enforce the light `color-scheme` for these layers, and also make sure to provide a default `color` for PopupAnnotations.
Perhaps we should update the debugger CSS to properly account for the light/dark theme, however since that's not UI that end-users ever see we simply force using the light theme for now.
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
This complements, and extends, the existing check of the `Array.prototype` in the worker-thread.
To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.
This is a small, and quite possibly pointless, optimization which ensures that any "local" /Resources aren't empty, to avoid needlessly trying to load and merge dictionaries.
Remove debug code from the integration tests, and skip the "must check that canvas perfectly fits the page whatever the zoom level" integration test in Chrome
This is a temporary measure to reduce noise until #19811 is fixed. Note
that this shouldn't be an issue in terms of coverage because we still
run the test in Firefox.
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.
Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.
*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
It's no longer necessary after commit 1c73e52 that caused the document
to be closed properly between tests, and this therefore partly reverts
commit 973b67f.
This commit configures Jasmine to no longer run the tests in a fixed
order, which combined with the previous isolation commits avoids being
able to accidentally introduce dependencies between integration tests.
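The relevant Jasmine setting is roughly:
```
jasmine.getEnv().configure({
  random: true, // shuffle spec order per run; set `seed` to reproduce a run
});
```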
To avoid being able to introduce dependencies between tests, this commit
makes sure that we close the document between tests so that we can't
accidentally rely on state set by a previous test.
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).
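The distinction that drives that choice:
```
obj.a ??= 1; // assigns only when obj.a is null or undefined
obj.b ||= 1; // also assigns when obj.b is any falsy value ("", 0, false)
```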
Using a `Map` rather than an `Object` is a nicer, since it has better support for both iteration and checking if a key exists.
We also change the initial values to be `null`, rather than empty strings, and reduce duplication when creating the `Map`.
*Please note:* Since this is worker-thread code, these changes are "invisible" at the API-level.
For simplicity we will abort /Form XObject parsing *immediately* when encountering a circular reference, rather than letting it continue up until some limit (as e.g. PDFium appears to do), which should be fine since there are never any guarantees if/how *corrupt* PDF documents will render.
In rare cases /Resources are also found in the /Contents stream-dict, in addition to in the /Page dict, hence we need to prefer those when available; see `issue18894.pdf`.
Previously we'd only do this for Type1/CFF fonts, see e.g. PR 6736, since the font-program may update the /FontMatrix.
However, it seems that we should do this unconditionally to account for fonts with non-default /FontMatrix-entries in the font-dictionary (which seem to be pretty rare).
In PDF version 2.0 the handling of Indexed color spaces was clarified as follows:
> The index value should be an integer in the range 0 to hival. If the value is a real number, it shall be rounded to the nearest integer (0.5 values shall be rounded up); if it is outside the range 0 to hival, it shall be adjusted to the nearest value within that range.
Please refer to https://github.com/pdf-association/pdf-differences/tree/main/IndexedColor
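As code, the clarified behaviour amounts to:
```
// Round real numbers to the nearest integer (0.5 rounds up), then clamp
// the result into the [0, hival] range.
function adjustIndex(value, hival) {
  return Math.max(0, Math.min(hival, Math.round(value)));
}
```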
After the introduction of `OffscreenCanvas` support we now have *two separate* mask-methods in the `PDFImage` class, and the reason that they were not combined is likely that we need the "raw" bytes when parsing Type3-glyph image masks.
However, that case is easy to support simply by disabling `OffscreenCanvas` usage when parsing Type3-glyphs and that way we're able to reduce some code duplication.
Another slightly strange property of the `PDFImage.createMask` method is that it needs various image-dictionary parameters *manually* provided, which is probably because this is very old code.
That feels slightly unwieldy, and we instead change the method to pass in the image-stream directly and do the necessary data-lookup internally.
A side-effect of this re-factoring is that we now support using the custom `isSingleOpaquePixel` operator in Type3-glyphs, which shouldn't hurt even though it seems extremely unlikely for that to ever happen in Type3-glyphs.
This disallows the following types of `export` declaration:
- `export class A {}`/`export function A() {}`
- `export default class A {}`/`export default function A() {}`
- `export let A`/`export const A`/`export var A`
While allowing
- `export { A }`
- `export default A`
The current text layer approach based on absolutely positioned
`<span>` elements by default causes flickering with text selection,
and we have browser-specific workarounds to solve that.
In Chrome, the workaround involves moving the `.endOfContent` element to
right after the last element that contains some selected content. This
works well in simple PDFs, but breaks when we have `span.markedContent`
elements. Given a text layer structure like the following, rendered
as four consecutive lines:
```html
<span class="markedContent">
<br>
<span>development enter the construction phase (estimated at around</span>
</span>
<span class="markedContent">
<br>
<span>300 MEUR).</span>
</span>
<span class="markedContent">
<br>
<span>Kreate's EBITA increased to 2.8 MEUR (Q4'23: 2.7 MEUR) and the</span>
</span>
<span class="markedContent">
<br>
<span>margin rose to 3.7% (Q4'23: 3.4%). However, profitability was</span>
</span>
```
when starting to select from inside the first line and dragging down
to the empty space after the second line, Chrome will anchor the
selection at the beginning of either the `<br>` or the `<span>` inside
the last `.markedContent`, depending on whether the selection is in
"per-character mode" (i.e. click and drag) or "per-word mode" (i.e.
double click and drag). This causes us to insert the `.endOfContent`
element in the wrong place (one element too far), which causes one
more line to be selected, which triggers another `"selectionchange"`
event, which causes us to move `.endOfContent` again, and so on, looping
until the whole page is selected.
This commit fixes the issue by making sure that when the end of the
selection range points to the _beginning_ of an element, we walk back
the DOM to find the first non-empty element, and attach `.endOfContent`
to the end of that.
These `getAll` methods are not used anywhere within the PDF.js code-base, outside of tests, and were mostly added (speculatively) for third-party users.
To still allow access to the same data we instead introduce iterators on these classes, which (slightly) shortens the code and allows us to remove the `objectFromMap` helper function.
A summary of the changes in this patch:
- Replace the `getAll` methods with iterators in the following classes: `AnnotationStorage`, `Metadata`, and `OptionalContentGroup`.
- Change, and also re-name, `AnnotationStorage.prototype.setAll` into a test-only method since it's not used elsewhere.
- Remove the `Metadata.prototype.has` method, since it's only used in tests and can be trivially replaced by calling `Metadata.prototype.get` and checking if the returned value is `null`.
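A sketch of the iterator-based replacement (hypothetical internals):
```
class AnnotationStorageLike {
  #map = new Map();

  // Replaces `getAll`: consumers can use `for...of` or spreading instead.
  *[Symbol.iterator]() {
    yield* this.#map;
  }
}

// Usage: const entries = [...storage];
```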
In order to use the PDF.js library in Node.js environments the `process.getBuiltinModule` functionality must be available, which was released in [version `20.16.0`](https://nodejs.org/en/blog/release/v20.16.0), however we've seen repeated issues filed by users on older `20.x` versions.
We want to iterate through the data in the `computeMD5` function, and `Map`s have "nicer" support for that than generic objects.
(Somewhat recently `Map` performance was improved in Firefox, however this also isn't really performance sensitive code.)
Note that we load all wasm-files manually, however the Emscripten Compiler (emcc) unfortunately generates `URL`s for fallback wasm-file loading.
In the PDF.js build-scripts we work-around that by using suitable Webpack-options, however that apparently doesn't work when third-party users re-bundle our code and we thus try to work-around this by adding "ignore comments" to these `URL`s (similar to how we handle `import`-statements).
It seems that the `@napi-rs/canvas` dependency has *basic* canvas-filter support, whereas the "old" `canvas` dependency didn't, hence we no longer need the Node.js-specific checks in the `src/display/canvas.js` file.
Note that I've successfully tested the [`pdf2png` example](https://github.com/mozilla/pdf.js/tree/master/examples/node/pdf2png) with this patch applied and things appear to work as before.
After PR 19731 the format of compiled Type3-glyphs is now simple enough that the compilation can be moved to the worker-thread, without introducing any significant additional complexity.
This allows us to, ever so slightly, simplify the implementation in `src/display/canvas.js` since the Type3 operatorLists will now directly include standard path-rendering operators (using the format introduced in PR 19689).
As part of these changes we also stop caching Type3 image masks since: we've not come across any cases where that actually helps, they're usually fairly small, and it simplifies the code.
Note that one "negative" change introduced in this patch is that we'll now compile Type3-glyphs *eagerly*, whereas previously we'd only do that lazily upon their first use.
However, this doesn't seem to impact performance in any noticeable way since the compilation is fast enough (way below 1 ms/glyph in my testing) and Type3-fonts are also limited to just 256 glyphs. Also, many (or most?) Type3-fonts don't even use image masks and are thus not affected by these changes.
Using the Firefox profiler (with JS allocations tracking) and wuppertal.pdf, I noticed
we were using a bit too much memory in a function which is supposed to just compute two numbers.
The memory used by itself isn't so important, but having too many objects leads to wasting time
GCing them.
So this patch aims to simplify it a bit.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 5 to 0 by fixing the following issues
in the "basic operations" block:
- Most tests relied on the first test to enable freetext editing mode.
For isolation we now do it explicitly in all tests.
- Most tests relied on the other tests having created editors. For
isolation we now create the editors explicitly in the tests themselves.
- Most tests relied on previous tests for the editor numbering. For
isolation we change the editor numbering to the one after initial
document load. Since we can't have state (editors) from a previous
test anymore we can remove various `clearAll` calls as well.
Currently we explicitly specify the fn-`OPS` both when adding entries to the operatorList and to the image-caches, and by using a temporary variable we can reduce a bit of duplication (similar to the existing args-handling).
The color-parameter is already available through `IR` (i.e. the internal representation), and after the changes in PR 4824 (which landed in 2014) we no longer need any special handling for it.
In the included PDF document the Type3-font doesn't contain any glyph definition for "space", despite that character being referenced in the /Contents stream.
While missing Type3-glyphs obviously cannot be rendered, we still need to update the current canvas position such that any char/word-spacing is correctly applied.
The test-case was found at https://github.com/pdf-association/pdf-differences/tree/main/Type3WordSpacing
Similar to Webpack there's apparently other bundlers that will not leave `import`-calls alone unless magic comments are used.
Hence we extend the builder to also append `/* @vite-ignore */` comments to `import`-calls, in order to attempt to improve support for using the PDF.js builds together with Vite.
This patch also renames `__non_webpack_import__` to `__raw_import__` since the functionality is no longer bundler-specific.
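What the builder now emits looks roughly like (`rawUrl` is a placeholder):
```
const mod = await import(/* webpackIgnore: true */ /* @vite-ignore */ rawUrl);
```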
***PLEASE NOTE:*** This patch is provided as-is, and it does *not* mean that the PDF.js project can/will provide official support for Vite.
Originally this function would "manually" invoke the rendering commands for Type3-glyphs, however that was changed some time ago:
- Initial `Path2D` support was added in PR 14858, but the old code kept for Node.js compatibility.
- Since PR 15951 we've been using a `Path2D` polyfill in Node.js environments.
Hence, after the previous commit, we can further simplify this function by *directly* returning/using the `Path2D` object when rendering Type3-glyphs; see also https://github.com/mozilla/pdf.js/pull/19731#discussion_r2018712695
While this won't improve performance significantly, when compared to the introduction of `Path2D`, it definitely cannot hurt.
Rather than updating the transform every time that we're painting a Type3-glyph, we can instead just compute the "final" coordinates during building of the `Path2D` objects.
NVDA behaves differently depending on whether the user is hovering or focusing an added signature.
An aria-description is read in both cases, while an aria-label is not.
By changing the return format of the various `pointsCallback` functions we can use the `Util.rectBoundingBox` helper in the `MarkupAnnotation.prototype._setDefaultAppearance` method as well, thus shortening the code slightly.
This avoids the current situation where we're accessing it through various dictionaries, since that's a somewhat brittle solution given that in the general case a `Dict`-instance may not have the `xref`-field set (see e.g. the empty-Dict).
By tweaking a few local variable names we can shorten various viewer-component initialization code, and we can also reduce some duplication when assigning components to the `PDFViewerApplication`-scope.
Currently we have a `Util`-helper for computing the bounding-box of a Bézier curve, however for simple points and rectangles we repeat virtually identical code in many spots throughout the code-base.
- Introduce new `Util.pointBoundingBox` and `Util.rectBoundingBox` helpers.
- Remove the "fallback" from `Util.bezierBoundingBox` and only support passing in a `minMax`-array, since there's only a single call-site using the other format and it could be easily updated.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 6 to 5 by fixing the following issues
in the "create editor with keyboard" block:
- The second test relied on the first test to enable freetext editing
mode and put focus on the page (annotation layer). For isolation we
now do that explicitly in the second test.
- The second test relied on the first test for the editor numbering. For
isolation we change the editor numbering to the one after initial
document load.
Moreover, the test names have been updated to clarify which scenario is
being tested, which came up during comparison of the changes against
commit ea5eafa to make sure that we are still testing the originally
intended scenarios (confirmed by disabling the relevant code from the
commit per scenario and noticing the corresponding test failing).
The rendering bug with issue17779.pdf is due to the fact that we call save on the suspended ctx
but not on the current ctx. So each time we have something like save/transform/restore,
the transform is not "removed" when restoring.
So this patch just applies the save/restore operations to the ctx, and mirrors them on the suspended one.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 8 to 6 by fixing the following issues
in the "move editor with arrows" block:
- The second and third test relied on the first test to enable freetext
editing mode. For isolation we now do it explicitly in both tests.
- The second test relied on the first test having created an editor. For
isolation we now create the editor explicitly in the second test.
- The third test relied on the previous tests for the editor numbering.
For isolation we change the editor numbering to the one after initial
document load. Since we can't have state (editors) from a previous
test anymore we can remove the `clearAll` call as well.
With this patch, all the path components are collected in the worker until a path
operation is met (i.e. stroke, fill, ...).
Then in the canvas a Path2D is created, and it will replace the path data transferred from the worker;
this way, when rescaling, the Path2D can be reused.
In terms of performance, using Path2D very slightly improves speed when scaling the canvas.
To avoid being able to introduce dependencies between tests, and to
bring existing dependencies to the surface, this commit makes sure that
we close the document between tests so that we can't accidentally rely
on state set by a previous test. This prevents multiple tests from
failing if one of them fails and makes debugging easier by being able to
run each test on their own independent of other tests.
This commit, combined with the previous ones, is enough to make the
scripting integration test suite pass consistently if random mode in
Jasmine is enabled, proving that the tests are fully isolated now.
The integration tests are order-dependent because they rely on input
field state set by a previous test. This commit fixes the issue by
updating the values to match the initial state of the document, which
makes sure that we don't build upon values from previous tests while
still testing the intended logic in the individual tests like before.
By checking for the expected value directly we can shorten the code, and
it simplifies removing the dependencies between the tests in the next
commit (by having fewer places to change). Note that this follows the
same pattern as PRs #19192, #19001 and #18399 and also helps to remove
any further possibilities for intermittent failures.
It's already enabled by default in Firefox, and since there are no open issues regarding auto-linking I suppose that we can attempt to enable it unconditionally.
- Let the `writeString` helper function return the new offset, to avoid having to recompute that in multiple spots.
- In the `computeMD5` helper function we can create the `md5Buffer` via Array-destructuring, rather than using a manual loop.
Currently the MD5 computation doesn't actually work (at all?), since we're invoking the `calculateMD5` function without providing all of the necessary parameters and the PDF-data thus isn't taken into account.
Fixing this caused unit-tests to fail, which isn't that surprising since the current date/time is used in the MD5 computation, and we thus utilize Jasmine to work around that.
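A sketch of how the date/time can be pinned with Jasmine's mock clock (the specific date is just an example):
```js
describe("MD5 computation", function () {
  beforeEach(function () {
    // Pin the current date/time so the computed MD5 hash is deterministic.
    jasmine.clock().install();
    jasmine.clock().mockDate(new Date(2024, 0, 1));
  });

  afterEach(function () {
    jasmine.clock().uninstall();
  });

  // ... assertions on the computed MD5 value go here ...
});
```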
- Point the `addSignatureDescription` and `editSignatureDescription` labels to their actual `input`-elements (this way clicking the label will actually focus the input).
- Add the event listener to the `addSignatureDescription`-input, rather than its `span`-element (this is consistent with the `editSignatureDescription` case).
- Correctly check if the `addSignatureDescription`-input is empty, since we're accidentally comparing with its `span`-element.
- Remove unbalanced, and likely accidentally added, `</span>` tags.
This is an admittedly very basic polyfill, which I've thrown together based on reading https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static and related MDN articles, to allow us to remove a bunch of inline feature testing.
Compared to PR 19218 it's obviously much more "primitive", however the implementation is simple and it doesn't suffer from any licensing issues (since I wrote the code myself).
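Such a primitive polyfill might look roughly like this (a sketch, not the exact shipped code):
```js
if (typeof AbortSignal.any !== "function") {
  AbortSignal.any = function (signals) {
    const controller = new AbortController();
    for (const signal of signals) {
      if (signal.aborted) {
        // One input signal is already aborted; mirror it immediately.
        controller.abort(signal.reason);
        break;
      }
      signal.addEventListener(
        "abort",
        () => controller.abort(signal.reason),
        { once: true }
      );
    }
    return controller.signal;
  };
}
```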
Looking at PR 18492 it doesn't seem that the `$percent` variable, for the `pdfjs-editor-new-alt-text-ai-model-downloading-progress` l10n-string, was ever used.
This integration test fails intermittently because of concurrent
clipboard access due to running the test in parallel in both browsers.
It can be reproduced by introducing `await waitForTimeout(1000)` between
the copy and paste operations.
This commit fixes the issue by running the test sequentially instead,
mirroring the change from commit 0e94f2bd.
This commit improves validation of the API options for the ICC color
space logic. If `useWasm` is `true` but the corresponding `wasmUrl`
or `iccUrl` API options are not provided we can avoid requesting
files with `null` URLs which always results in a 404 response.
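The kind of validation added amounts to something like this (a sketch; the actual option handling in the source may differ):
```js
// If WASM-based ICC support is requested but the required URLs are
// missing, disable it up-front rather than fetching `null` URLs.
if (useWasm && !(wasmUrl && iccUrl)) {
  useWasm = false;
}
```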
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
This is a major version bump, and the changelog at
https://github.com/metalsmith/layouts/releases/tag/v3.0.0
indicates a breaking change that impacts us, namely that we need to
explicitly define the pattern and transformer that we wish to use.
- Add a couple of `limit` parameters in cases where those were "missing", for `String.prototype.split()` calls, to avoid unnecessary parsing.
- Remove some "pointless" initial trimming of leading/trailing spaces, since that's already done at a later parsing step in many cases.
Given that the `draw` method is already asynchronous we can easily inline this old helper method, which shortens the code and improves consistency in the code-base (note the `BasePDFPageView`-implementation).
Having some interactive elements forces the screen readers to switch to form mode
and consequently they delegate the keyboard stuff to the browser.
This patch sets an aria label on each editor in order to have a better description than just
'application'.
Currently we lookup the `devicePixelRatio`, with fallback handling, in a number of spots in the code-base.
Rather than duplicating code we can instead add a new static method in the `OutputScale` class, since that one is now exposed in the API.
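Something along these lines (the method name and fallback are assumptions for illustration):
```js
class OutputScale {
  // Centralized lookup with the usual fallback for environments where
  // `devicePixelRatio` is undefined (e.g. workers, Node.js).
  static get pixelRatio() {
    return globalThis.devicePixelRatio || 1;
  }
}
```
Call-sites can then use `OutputScale.pixelRatio` instead of repeating the fallback logic.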
This patch extends the approach of PR 14543, by also treating e.g. minus signs followed by '(' or '<' as zero.
Inside of a /Contents stream those characters will generally mean the start of one or more glyphs.
With the exception of a different hashing function these classes are identical, and the base class thus helps reduce code duplication.
This patch reduces the size of the `gulp mozcentral` build by 1344 bytes, which isn't a lot but still cannot hurt.
This patch fixes:
- the style of the primary/secondary buttons in the dialog which weren't fully compliant with the latest specs.
- the style of the input field in HCM (wrong background)
- the color of the link in the image tab.
When the DOM structure of the viewer was updated in PR 18385 it caused the `secondaryToolbar` to accidentally start closing when clicking inside of it, since the `secondaryToolbar` now resides *under* the `toolbar` in the DOM.
**Steps to reproduce:**
- Open the viewer.
- Open the `secondaryToolbar`.
- Try to change document rotation at least *twice*.
**Expected behaviour:**
The document rotation can be changed an arbitrary number of times.
**Actual results:**
The `secondaryToolbar` closes after changing rotation just once.
For Type3 glyphs with `d1` operators it's easy to compute a fallback bounding box, however for `d0` the situation is more difficult.
Given that we nowadays compute the min/max of basic path-rendering operators on the worker-thread, we can utilize that by parsing these Type3 operatorLists to guess a more suitable fallback bounding box.
This addresses an inconsistency in the viewer, since the thumbnails don't respect the `maxCanvasPixels` option.
Note that, as far as I know, this has not led to any bugs since the thumbnails render with a fixed (and small) width, however it really cannot hurt to address this (especially after the introduction of the `maxCanvasDim` option).
To support this a new `OutputScale`-method was added, to avoid having to duplicate code in multiple files.
One of the images has a corrupt SMask, where the /Height-entry is bogus; see the excerpt below (via https://brendandahl.github.io/pdf.js.utils/browser/).
```
SMask (stream) [id: 17, gen: 0]
ColorSpace = /DeviceGray
Height = /Length
Subtype = /Image
Filter = /FlateDecode
Type = /XObject
Width = 157
Matte (array)
BitsPerComponent = 8
Length = 3893
<view contents> download
```
Hence we enable SMask/Mask images to fall back to the parent image dimensions, and also add more validation of the width/height to get a better error message when that data is wrong.
These methods were introduced in *the first* commit of PR 4938, however they became unused in *the second* commit of that PR.
Hence it seems that we've been accidentally shipping a small amount of unused code for over a decade.
While initializing the `padded` Uint8Arrays we're manually assigning zeros to some entries, which is completely unnecessary since that's the default value for a TypedArray, and instead we can just increment the index.
When creating the `StyleMapping` for the "xfa-font-horizontal-scale" and "xfa-font-vertical-scale" properties there's currently pointless `Math.min` usage, since we're not actually comparing with anything.
When zooming, we should skip rendering the detail canvas until the
zoom is done, similarly to how normal page rendering is delayed.
To do so, it is enough to skip the detail view while zooming, since the
main view rendering that already happens after the delay will also
trigger rendering of the detail views.
This combines the `PDFFunctionFactory.create` and `PDFFunctionFactory.createFromArray` methods, which helps simplify and shorten the code.
Additionally, to simplify the parameter handling we pass the `PDFFunctionFactory`-instance directly to the various `PDFFunction`-methods.
This may not be possible to trigger in practice, however it seems that if `StampAnnotation.prototype.mustBeViewedWhenEditing` is called back-to-back with `isEditing === true` set then the second invocation could overwrite the `#savedHasOwnCanvas`-field and thus lose its initial state.
Currently we have a number of spots in the code-base where we need to clamp a value to a [min, max] range. This is either implemented using `Math.min`/`Math.max` or with a local helper function, which leads to some unnecessary duplication.
Hence this patch adds and re-uses a single helper function for this, which we'll hopefully be able to remove in the future once https://github.com/tc39/proposal-math-clamp/ becomes generally available.
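The helper is essentially the classic one-liner (sketch):
```js
// Clamp `value` to the inclusive [min, max] range; same spirit as the
// proposed `Math.clamp`.
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

clamp(5, 0, 3); // 3
clamp(-1, 0, 3); // 0
```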
With the changes in PR 19564 the actual `ColorSpace`-classes were separated from the various static "helper" methods.
Hence it seems that we can now simplify/shorten this old code to instead cache the "standard" ColorSpaces directly on the `ColorSpaceUtils`-class.
This patch reduces the number of `ColorSpaceUtils` static-methods, and in particular the `parseAsync` method is removed and it's now instead possible to have `parse` optionally return a Promise.
This thus removes the need to manually check if a `ColorSpace`-instance is cached, note the changes in the `src/core/evaluator.js` file.
Browsers not only limit the maximum total canvas area, but additionally also limit their maximum width/height which affects PDF documents with e.g. very tall and narrow pages.
To address this we add a new `maxCanvasDim` viewer-option, which in Firefox will use a browser preference, such that both the total canvas area and the width/height will affect when CSS-zooming is used.
Some ColorSpaces can reference other ColorSpaces, and since it's fairly common for many pages to use the same ColorSpaces (see e.g. `issue17061.pdf`) this can help reduce a lot of unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces.
The current logic assumes that all spans in the text layer contain
only one text node, and thus that the position information
returned by `highlighter._convertMatches` can be directly used
on the element's only child.
This is not true in case of highlighted search results: they will be
injected in the DOM as `<span>` elements, causing the `<span>`s
in the text layer to have more than one child.
This patch fixes the problem by properly converting the (span, offset)
pair into a (textNode, offset) pair that points to the right text node.
Update the test to wait for the `pagerendered` event of a specific
page (1), so that the `pagerendered` event of other pages from a
previously running render doesn't resolve the `waitForPageRendered`
promise.
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`, which should help reduce unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces once we properly support those.
Currently we re-implement the same helper function twice, which in hindsight seems like the wrong decision since that way it's quite easy for the implementations to accidentally diverge.
The reason for doing it this way was because the code in the worker-thread is able to check for `Ref`- and `Name`-instances directly, which obviously isn't possible in the viewer but can be solved by passing validation-functions to the helper.
This appears to have broken in PR 17431, since prior to that the `downloadFile` function returned a string (via a callback) on failure and now we're instead rejecting with an Error.
Auto-linking requires a normal textLayer which XFA documents obviously don't have, and currently XFA documents cause "pointless" error messages to be logged in the console.
Currently we're first loading the font, and then for Type3 fonts we're invoking `loadType3Data` every time that the font is encountered.
That seems completely unnecessary, and it's probably connected to the age of this code, since the `loadType3Data`-method will only run once anyway (note the caching).
The `viewerCssTheme` was removed in #17222 and subsequently reenabled in #17293,
but only for Chromium and generic builds. This commit reenables the
function using the new method introduced in #17293.
This restores the behaviour that existed prior to PR 19128, see e.g. 8727a04ae5/web/pdf_page_view.js (L908-L911), since `RenderingCancelledException` should still overwrite any previously seen Error.
Currently this `assert` isn't actually doing what it's supposed to, since the `FontLoader`-class doesn't have a `disableFontFace`-field.
The `FontFaceObject`-class on the other hand has such a field, hence we update the method-signature to be able to check the intended thing.
None of the "composite", "subtype", or "type" properties are normally used on the main-thread and/or in the API, hence there's no need to include them in the exported font-data by default.
Given that these properties may still be useful when debugging, and that `debugger.mjs` actually relies on the "type" property, they will instead only be sent to the main-thread when the `fontExtraProperties` API-option is used.
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
These options are needed in the `FontFaceObject` class, and indirectly in `FontLoader` as well, which means that we currently need to pass them around manually in the API.
Given that the options are (obviously) available on the worker-thread, it's very easy to just provide them when creating `Font`-instances and then send them as part of the exported font-data. This way we're able to simplify the code (primarily on the main-thread), and note that `Font`-instances even had a `disableFontFace`-field already (but it wasn't properly initialized).
Web archive no longer has the revision for the saved PDF referenced in
that test case, so this updates that link to a more recent revision to
enable the tests to run on a clean clone.
The idea is to avoid having the pasted editor hide the copied one:
the user could think that nothing happened.
So the top-left corner of the pasted one is moved to the bottom-right corner of the copied one.
This replaces the various copies of this logic with a single helper that
we template for each editor type, similar to what we already do for the
`switchToEditor` helper.
When a drawing was moved with arrow keys and then printed or saved, the move wasn't applied to the final output.
So the fix is just about calling onTranslated once the translation is done.
With the recently added OpenJPEG no-wasm fallback we need to send the `wasmUrl` option to the worker-thread *regardless* of the value of the `useWorkerFetch` option, since the fallback won't work if we don't have a URL to `import` it from.
For consistency the code is re-factored to always send the factory-urls to the worker-thread, and simply check the `useWorkerFetch` option there instead.
Also, as a follow-up to PR 19525, introduce a new `useWasm` option that can be used in e.g. browser-tests to forcibly disable WebAssembly usage.
This patch fixes an exception that was thrown when pasting.
And while writing the test and comparing the paths in the svg, I found a difference
which is fixed thanks to calling the right constructor (to take the inheritance into account)
in inkdraw.js.
When scrolling quickly, the constant re-rendering of the detail view
significantly affects rendering performance, causing Firefox to
not render even the _background canvas_, which is just a static canvas
not being re-drawn by JavaScript.
This commit changes the viewer to only render the detail view while
scrolling if its rendering hasn't just been cancelled. This means that:
- when the user is scrolling slowly, we have enough time to render the
detail view before we need to change its area, so the user always
sees the full screen as high resolution.
- when the user is scrolling quickly, as soon as we have to cancel a
rendering we just give up, and the user will see the lower resolution
canvas. When the user then stops scrolling, we render the detail view
for the new visible area.
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.
This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render a portion of a
page in a smaller canvas without hitting the maximum canvas size limit.
Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
This base class contains the generic logic for:
- Creating a canvas and showing it when appropriate
- Rendering in the canvas
- Keeping track of the rendering state
Those type tests are performing type checking on a project using DOM APIs, intended to reflect the usage in a non-node project.
Not loading the node types in that project ensures that the library type declarations don't force a dependency on the node types.
Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block.
Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.
During the XRef stream parsing we're attempting to lookup an entry that hasn't yet been found, since parsing is currently running, and given that we'd also cache free/missing XRef entries we'd then return an incorrect value during normal PDF parsing.
The simplest solution here is to just not cache free/missing XRef entries, since a properly generated PDF document shouldn't be trying to access objects it doesn't contain.
Furthermore, the amount of "extra" parsing now needed for such XRef entries shouldn't be significant enough to be an issue.
It fixes #19505.
We were invalidating throwing actions (by setting event.rc to false) and the whole event process was stopped.
Now we're just dumping the exception in the console: the action is skipped and event.rc is not set,
otherwise the input fields aren't updated with KeyStroke actions.
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
If either of the factory-urls are missing or invalid, the fallback value would currently become `useWorkerFetch === null`.
While that is obviously a falsy value, which means that the code still works as intended, we should ensure that this is consistent.
- Use `TypedArray.prototype.set()` rather than a manual loop when building the `key`.
- Use an existing local variable to avoid re-computing the length of the `encryptionKey`.
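Roughly, the first change has this shape (a sketch with illustrative names):
```js
// Before: copying byte-by-byte with a manual loop.
for (let i = 0; i < pad.length; i++) {
  key[i] = pad[i];
}

// After: a single bulk copy into the `key` TypedArray.
key.set(pad, 0);
```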
It's now pretty common that we only want to close a `dialog` *if* it's currently active, to avoid throwing errors, and this new method provides a shorter and more convenient way to achieve that.
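A sketch of such a method (the name is an assumption for illustration):
```js
// Close the <dialog> only when it's currently shown, so callers don't
// need their own `if (dialog.open)` checks scattered around.
function closeIfActive(dialog) {
  if (dialog.open) {
    dialog.close();
  }
}
```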
Rather than modifying the "raw" dimensions of the page, we'll instead apply the `userUnit` as an *additional* scale-factor via CSS.
*Please note:* It's not clear to me if this solution is fully correct either, or if there's other problems with it, but it at least *appears* to work.
---
With these changes, the following CSS variables are now assumed to be available/set as necessary: `--total-scale-factor`, `--scale-factor`, `--user-unit`, `--scale-round-x`, and `--scale-round-y`.
While investigating a bug, which I've not yet had time to fully analyze and report, I found that if there's ever an error thrown from the `Autolinker` class it'll prevent the annotationEditorLayer from rendering *and* the renderTask itself will be treated as having failed.
Given that most inferred links will overlap existing LinkAnnotations, creating a lot of unused `borderStyle` objects seem unnecessary.
Hence we can move that into the `AnnotationLayer.prototype.addLinkAnnotations` method instead, which also allows us to slightly reduce the API-surface.
*Note:* For the issue mentioned on Matrix it'll obviously still make sense to improve the regular expression to detect more URL edge-cases.
However it occurred to me that even once that particular case is fixed there'll always be a risk that inferred links could overlap, and effectively block, the actual LinkAnnotations.
Hence this patch removes the URL comparison to ensure that overlapping inferred links will always be ignored.
To avoid being able to introduce dependencies between tests, and to
bring existing dependencies to the surface, this commit makes sure that
we close the document between tests so that we can't accidentally rely
on state set by a previous test. This prevents multiple tests from
failing if one of them fails and makes debugging easier by being able to
run each test on their own independent of other tests.
This commit, combined with the previous one, is enough to make the ink
editor integration test suite pass consistently if random mode in
Jasmine is enabled, proving that the tests are fully isolated now.
The second test of the basic operations block for the ink editor
depends on the first test to work. This becomes visible if we only run
the second test, using `fit`, which always fails with:
`ProtocolError: Waiting for selector '.annotationEditorLayer' failed:
Runtime.callFunctionOn timed out. Increase the 'protocolTimeout' setting
in launch/connect calls for a higher timeout if needed.`
The problem is that the second test doesn't enable the ink editor and
relies on the first test having done that already (because we don't
close the document between tests yet). This commit fixes the issue by
unconditionally enabling the ink editor in the second test to remove the
dependency between the two tests so they both pass in isolation.
This patch updates the minimum supported browsers as follows:
- Google Chrome 110, which was released on 2023-02-07; see https://chromereleases.googleblog.com/2023/02/stable-channel-update-for-desktop.html
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Currently we have three separate and virtually identical message handlers for this data, which can easily be combined into a single message handler instead.
Automatically detect links in the text content of a file and generate
link annotations at the appropriate locations, achieving automatic
link detection and hyperlinking.
With recent changes made to the GitHub issues-UI the "Blank issue" alternative is now showing up quite prominently, which can easily negate the point of our bug/feature templates and lead to incomplete issues being filed.
Given that we now have a few different factory-url parameters, we introduce a helper function for parsing them.
*Please note:* These parameters have always been documented as requiring a trailing slash[1], which we can now easily enforce during the `getDocument`-call.
---
[1] I recall that we've occasionally seen issues because users miss that detail, and the new Error should hopefully be more easily actionable than one thrown during rendering/parsing.
This pattern was already followed quite consistently outside of the
freetext editor integration tests, so this commit aligns the remaining
places for consistency. This also helps to make the tests more compact
and to reduce the number of changes in follow-up changes.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
freetext editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
This CSS feature is now available in *most* browsers that we support, with old Chromium-based browsers being the only exception; please see https://developer.mozilla.org/en-US/docs/Web/CSS/color_value/color-mix#browser_compatibility
From this data we see that the feature in question has been supported since Chrome 111, which was released on 2023-03-01 (i.e. almost two years ago).
Please note that we've never guaranteed that all features and functionality will be available in the oldest supported browsers.
Furthermore, even with the `color-mix` fallback removed PopupAnnotations will still function just as before but may render with the default color (defined in the CSS-file) rather than the one specified in the PDF document.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
highlight editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
At the time of PR 12896 the `fontBoundingBox{Ascent, Descent}` properties were not yet available by default in Firefox, however that's no longer the case since Firefox 116; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1801198.
Hence this patch which replaces the "full" fallback with a warning and uses the `ascent`/`descent` values from the fonts in the PDF document (as we did previously). Obviously the TextLayer won't look as good in that case, but it's a simpler and shorter solution.
Currently we're manually computing the width/height of the /Rect-entry in a number of spots throughout the worker-thread Annotation code, which these new getters help avoid.
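The getters presumably boil down to something like this (a sketch; the `rectangle` name and its [x1, y1, x2, y2] layout are assumed):
```js
class Annotation {
  // `this.rectangle` is assumed to hold a normalized [x1, y1, x2, y2]
  // /Rect-entry; both names here are illustrative.
  get width() {
    return this.rectangle[2] - this.rectangle[0];
  }

  get height() {
    return this.rectangle[3] - this.rectangle[1];
  }
}
```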
Currently we're initializing the image-options for every page, which seems unnecessary since it should suffice to do that once per document.
Also, this changes the `BasePdfManager` constructor to improve readability/documentation a little bit.
This patch is adding some code in order to extract a drawing as curves from an image.
The algorithm is basically the following:
- reduce the dimensions
- make it gray
- apply a bilateral filter in order to add some blurryness while keeping the edges
- compute the histogram
- guess the background color, which should contain a large majority of the pixels
- make a binary image
- extract the contours using the Suzuki algorithm
- apply the Douglas-Peucker algorithm in order to reduce the number of points
The algorithm is improvable but it should work pretty well if there's a clear difference between
the background and the drawing.
In a v2 we could use a ML model in order to improve the extraction.
There are a few changes related to the UI in order to make the tool usable, but they're very basic
for the moment.
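As an illustration of two of the steps above, the histogram and background-guessing parts can be sketched as:
```js
// Sketch: compute a 256-bin histogram of a grayscale image and guess
// the background as the most common gray level.
function guessBackground(gray /* Uint8Array of gray levels */) {
  const histogram = new Uint32Array(256);
  for (const level of gray) {
    histogram[level]++;
  }
  let background = 0;
  for (let i = 1; i < 256; i++) {
    if (histogram[i] > histogram[background]) {
      background = i;
    }
  }
  return background;
}
```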
This patch fixes a bug that caused incorrect curve shapes when an endpoint lies beyond the page boundaries. It adds a check for the endpoint's position, and if it is outside the page, the point is excluded from the shape's coordinates.
Steps to reproduce this in `master`:
1. Open https://mozilla.github.io/pdf.js/web/viewer.html
2. Use the "Open"-button (in the secondaryToolbar), or drag-and-drop, to load another PDF document.
3. Enable the highlight-editor.
4. Try to pick a new colour.
Note how it's no longer possible to change the default highlight-colour.
The reason for this is that we're only initializing the viewer-toolbar `ColorPicker` *once*, which doesn't work since every PDF document gets its own `AnnotationEditorUIManager`-instance. To address this we simply need to re-initialize the viewer-toolbar `ColorPicker`, and note that this patch won't affect the Firefox PDF Viewer.
With the changes in PR 18843 the `AnnotationEditorUIManager.prototype.updateMode` method is now asynchronous, which we need to take into account when dispatching the "annotationeditormodechanged" event.
After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).
Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.
Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
- We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
- We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
- The amount of available RAM on devices should be a lot higher, since more than a decade has passed.
A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).
---
[1] More specifically:
- `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
- `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
- `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
It fixes #19360.
Each glyph in the test case has a fill and a stroke pattern, so the current transform used
to scale the glyph outline must be the same.
When setting the stroke color to green, I noticed that the last outline contains some non-closed
subpaths, so when generating the glyph outline, every time we 'moveTo', we close the previous
subpath.
Code in the `web/` folder cannot import directly from the `src/` folder, since that could result in most (or all) main-thread code being bundled into the viewer, and must rather be imported via the `pdfjs-lib` alias.
Let's use ESLint to help enforce this, please find additional details in https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-restricted-paths.md
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
ink editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
stamp editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls and other helper
function calls that contained hardcoded IDs (by updating them to take
editor selectors as arguments instead of editor IDs), which helps to
isolate the tests and to simplify follow-up patches.
Puppeteer recently got updated to version 24.0.0+, so we're past version
23.9.1, where the integration test failed before, and we can enable it
again now that it passes in Chrome.
This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication.
This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
The `selectorEditor` name is used 57 times, but only in the freetext
editor integration tests file. The `editorSelector` name on the other
hand is used 241 times in all editor integration test files, so this
commit renames the former name to the latter name to achieve consistency
in variable names across all editor integration test files, which also
simplifies upcoming changes.
- Most of these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.
- A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.
- For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
This prepares for a future where we're using more than one wasm-file, originating in different `external/`-folders, by extending the existing `gulp.watch` usage.
The following diff illustrates how to add more entries:
```diff
diff --git a/gulpfile.mjs b/gulpfile.mjs
index 0e0a5a1ac..1502755be 100644
--- a/gulpfile.mjs
+++ b/gulpfile.mjs
@@ -655,6 +655,10 @@ function createWasmBundle() {
base: "external/openjpeg",
encoding: false,
}),
+ gulp.src(["external/foobar/*.wasm"], {
+ base: "external/foobar",
+ encoding: false,
+ }),
]);
}
@@ -2125,7 +2129,7 @@ gulp.task(
},
function watchWasm() {
gulp.watch(
- "external/openjpeg/*",
+ ["external/openjpeg/*", "external/foobar/*"],
{ ignoreInitial: false },
gulp.series("dev-wasm")
);
```
Rather than waiting for the upstream patch to reach the Firefox version we're using with Puppeteer, let's just set the same preference as done in https://phabricator.services.mozilla.com/D234320.
Currently we're not checking that the response is actually OK before getting the data, which means that rather than throwing an error we can get an empty `ArrayBuffer`.
To avoid duplicating code we can move an existing helper into `src/core/core_utils.js` and re-use it when fetching the JPX wasm-file as well.
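The added check is essentially (a sketch with an illustrative error message):
```js
async function fetchBinaryData(url) {
  const response = await fetch(url);
  if (!response.ok) {
    // A 404 would otherwise yield unusable data instead of a clear error.
    throw new Error(`Failed fetching "${url}": ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```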
Given that we nowadays provide default Node.js versions of these Factory-parameters it no longer seems necessary to mention that environment specifically.
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.
Besides an error message, the new `ResponseException` instances also include:
- A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.
- A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make it easier to use another wasm file (which must export
_jp2_decode, _malloc and _free).
When a dash separates two digits, it's very likely to not be a hyphen
inserted to split a word into two lines (e.g. "par\n-ser"), but rather
either a minus sign, a range, or a date. For example, in the tracemonkey
PDF there is `2008-02` (a date) split across two lines.
Preserving the dash, similarly to how we do for compound words, allows
searches for "2008-02" to find a match.
In Firefox, double-clicking on a stamp annotation triggers text
selection (selecting the last text element in the dom before the
annotation): this triggers the logic to make annotations not interfere
with text selection, which in turns prevents the double click from
triggering the annotation editor.
This commit fixes the problem by making annotations non-selectable, so
that clicking on them does not trigger text selection. Freetext
annotations were already non-selectable, so this commit doesn't change
that. However, we need to explicitly mark text in popups as selectable.
In the affected font the total number of mapping-entries is `1142348`, and no less than `997473` of them are duplicates.
Given that every duplicate causes a lot of Array elements to be moved this becomes extremely inefficient, which we can avoid by keeping track of seen `charCode`s and directly build the final mappings-Array instead.
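A sketch of the deduplication idea (names are hypothetical):
```js
// Instead of inserting duplicates and repeatedly splicing them out,
// remember which charCodes we've already seen and build the final
// mappings-array in one pass.
function dedupeMappings(rawMappings) {
  const seen = new Set();
  const mappings = [];
  for (const map of rawMappings) {
    if (seen.has(map.charCode)) {
      continue; // skip the duplicate entries up-front
    }
    seen.add(map.charCode);
    mappings.push(map);
  }
  return mappings;
}
```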
Currently we re-implement a number of helper functions specifically for this code, which seems completely unnecessary since there's already general purpose ones available in the `src/core/core_utils.js` file.
It lets the user make a pinch gesture with a finger on a page with a drawing
and the second finger on another page.
On mobile, it's pretty easy to end up in such a situation.
It fixes #19239.
When the canvas doesn't exist the editor has no image: that's fine because the editor is invisible.
Once it's made visible, the canvas is set when the annotation layer has been rendered.
This appears to have regressed in PR 13808, since it removed the `matrix`-entry from the array returned by the `MeshShading.prototype.getIR` method *without* also updating the indexes in the `MeshShadingPattern` constructor.
Reasons for removal:
- These tests never generated any warnings from OSS-Fuzz, in over a year.
- An error thrown during image decoding will lead to a broken/missing image, not a security problem.
- These tests rely on the Jazzer.js library, which has a number of problems: It now causes failures in Node.js v23 in the CI tests, it's no longer being maintained upstream, and it lacks support for some (fairly common) CPU architectures.
The ink editor integration tests already use a helper function for
committing the editor, so this commit mirrors the approach to the
freetext editor integration tests. This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication (220 lines of code removed);
- it reduces the number of `getEditorSelector` calls (32 calls removed)
that contained hardcoded IDs, which helps to isolate the tests and to
simplify follow-up patches.
This commit applies the `waitForUnselectedEditor` helper function to the
remaining places, that most likely predate the introduction of the
helper function, to deduplicate the code and to have a unified way of
checking if a given editor is unselected.
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
There's a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually look-up the values, which after PR 19051 can be replaced with direct iteration instead.
The `getSelectedEditors` function is largely a copy of the `getEditors`
function, with three small differences:
1. `getEditors` allows getting any kind of editor whereas
`getSelectedEditors` is hardcoded to only getting selected editors.
2. `getEditors` returns editor selectors (strings) whereas
`getSelectedEditors` returns editor IDs (integers).
3. `getSelectedEditors` returns a sorted array of editor IDs whereas
`getEditors` does not ensure that the array is sorted.
This commit makes the `getEditors` function a drop-in replacement for
the `getSelectedEditors` function to deduplicate the code and to have a
unified way of getting editors.
Note that we don't actually use the contents of the returned array
(only its length), so we can safely change `getEditors` to return a
sorted array of integer editor IDs instead. Sorting the array makes the
return value deterministic, which is a nice property for test stability,
and integer IDs are also easier to handle in test assertions. Note that
the corresponding selector strings can also easily be obtained from the
integer IDs using the `getEditorSelector` function if needed.
Originally the code in this file was used in both the GENERIC and MOZCENTRAL builds, however that's no longer the case.
Hence we can now directly call `NetworkManager.prototype.request` and remove these old "helper" methods that only had a single call-site each.
From section [11.6.4.3 Mask Shape and Opacity](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G10.4848628) in the PDF specification:
- An image XObject may contain its own *soft-mask image* in the form of a subsidiary image XObject in the `SMask` entry of the image dictionary (see "Image Dictionaries"). This mask, if present, shall override any explicit or colour key mask specified by the image dictionary's `Mask` entry. Either form of mask in the image dictionary shall override the current soft mask in the graphics state.
As part of the changes in PR 4259, which landed over ten years ago, the `glyphNameMap` property on `Font`-instances was removed.
The reason that this didn't cause any bugs is that we always fallback on `getGlyphsUnicode`, and when using that data we also rely on `StandardEncoding`, hence we should just remove the unused parameter from the `Type2Compiled` constructor.
While [bug 1893645](https://bugzilla.mozilla.org/show_bug.cgi?id=1893645) was fixed some time ago now, it still shouldn't hurt to also assert that the `fontMatrix` is always valid when invoking the `compileGlyph` method.
Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need.
Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function.
Finally, the patch makes the following smaller changes:
- Avoid needlessly re-creating exceptions in the `wrapReason` function.
- Use a slightly shorter parameter name in the `wrapReason` function.
- Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.
This commit makes sure that the font tests are no longer reported as
being unit tests and that the PDF.js logo is shown in the browser tab to
make PDF.js-specific resources/tabs more easily identifiable during e.g.
development.
The reporter is used in both the unit and the font tests, so this commit
moves it to the test root folder to more clearly indicate that this is a
shared resource and so the font tests don't have to reach into the unit
tests folder to import it (which improves separation).
This file is specific to the integration tests, so this commit moves it
to bundle the integration test logic a bit better and to match the
unit/font tests in terms of folder structure for consistency.
Without these calls we'll not actually wait for saving to complete when document destruction runs; compare with other `WorkerTask`-usage in this file.
While I cannot imagine that this has caused any problems for library users, the code is however not technically correct as-is.
Note that PR 19212 tried to change the test-server to fix the intermittent failure in Google Chrome, however that unfortunately caused *other* unit-tests to start failing.
As long as this unit-test still runs successfully in Mozilla Firefox that should be enough, and it doesn't seem like a good use of our time to hunt down a bug that only happens in Google Chrome.
When computing the left offset of the highlighted text, we cannot use
.offsetLeft because the text might have been scaled through CSS, and it
needs to be taken into account.
Use `.getClientRects()`/`.getBoundingClientRect()` instead, which will
return measurements scaled appropriately.
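A sketch of the difference:
```js
function getScaledLeft(span) {
  // .offsetLeft ignores CSS transforms, so for text scaled with e.g.
  // `transform: scale(...)` it reports the unscaled position:
  //   const left = span.offsetLeft;
  // getBoundingClientRect() returns the rendered geometry instead,
  // with any CSS scaling already applied.
  return span.getBoundingClientRect().left;
}
```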
This bug only seems to reproduce in Google Chrome, since browsers apparently sort response headers differently.
When the issue occurs the "raw" response headers string looks like this:
```
content-length: 525404\r\ncontent-type: \r\n
```
and since we trim *any* leading/trailing white-space characters the "content-type" header isn't detected correctly, which thus leads to `new Headers(...)` throwing.
Hence we'll keep regular spaces at the end of the "raw" response headers string, while still removing all other kinds of trailing white-space characters.
*Note:* The response headers parsing was based on https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/getAllResponseHeaders#examples
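The adjusted trimming might look like this (a sketch, not the exact shipped regex):
```js
function getRawResponseHeaders(xhr) {
  // Strip trailing CR/LF (and similar), but keep plain spaces so an
  // empty header value, e.g. "content-type: \r\n", stays detectable.
  return xhr.getAllResponseHeaders().replace(/[\t\n\f\r]+$/, "");
}
```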
- it's now possible to start a drawing with a pen and use fingers to zoom
or scroll without interacting with the current drawing;
- it's now possible to draw with a finger and then zoom with two fingers.
When an editor is moved with the keyboard or by dragging it, it is moved in the DOM in order
to make screen readers happy. But this move is slightly postponed thanks to a setTimeout(..., 0).
The failures were very likely due to the fact that intermittently the DOM move was done in
the middle of the next key sequence, which made the on-screen move fail.
Instead of conditionally checking if the `.canvasWrapper` already
has a child element and then inserting the `canvas` before it, we can
use `.prepend` which always injects the new element as the first
child.
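In other words (sketch):
```js
// Before: conditional insertion depending on existing children.
function addCanvas(canvasWrapper, canvas) {
  if (canvasWrapper.firstChild) {
    canvasWrapper.insertBefore(canvas, canvasWrapper.firstChild);
  } else {
    canvasWrapper.append(canvas);
  }
}

// After: `prepend` always injects the element as the first child.
function addCanvasSimpler(canvasWrapper, canvas) {
  canvasWrapper.prepend(canvas);
}
```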
While drawing, when zooming fast enough, it's possible, intermittently, to have the canvas
after the svg, which makes the svg invisible.
So this patch makes sure to have the canvas at the right position.
It was due to the resize observer, which is removed thanks to this patch.
In order to reuse the dragAndDrop function in tests, this patch slightly refactors it
to make it easier to use.
The `Path2D` glyph-objects are cached on the `FontFaceObject`-instance, so we can save a little bit of memory by removing the raw path-strings once they're no longer needed.
The `must check input for US zip format` integration test fails pretty
consistently in Puppeteer 23.4.0+ with `Expected '12341' to equal
'12345'`. This is reproducible with the `pdf.sandbox.external.js` hack
from https://github.com/mozilla/pdf.js/issues/18396#issuecomment-2211273743.
Investigation uncovered two issues at play here:
1. We do two `clearInput` calls, but don't await processing of the two
sandbox events that are triggered by that action. The three tests that
use `issue14307.pdf` are in different `describe` blocks and therefore
reload the PDF file, so we can simply remove those calls because the
inputs are already empty by default.
2. We don't await processing of the sandbox events that occur after
switching to another text field. This causes the expectation failure
because the typing actions will happen too soon and interfere with
the sandbox event processing. We solve the issue by explicitly
awaiting the sandbox roundtrip.
Moreover, similar to PR #19001 and #18399 we remove any remaining room
for intermittent issues by directly checking for the expected value,
which also results in shorter code.
- Warn if the "regular" PDF.js build is used in Node.js environments, since that won't load any of the relevant polyfills.
- Warn if the `require` function cannot be accessed, since currently we're just "swallowing" any errors.
This requires two changes on our side:
- The order of exports in `web/viewer{-geckoview}.js` changes slightly
because `eslint-plugin-perfectionist` aligned the sorting order with
the `eslint-plugin-sort-exports` plugin we used before. This restores
the change from commit 347f155.
- The `eslint-plugin-import` plugin contains a bug that causes the new
version of `eslint-plugin-perfectionist` to be reported as unresolved.
This issue is tracked upstream, and since the plugin works fine we
can simply extend the ignore list we already have to avoid this error
until the upstream bug is fixed.
These DOM elements are `input type="checkbox"` and (natively) only support being toggled with the `Space` key, however we can extend an existing event-listener to "manually" support the `Enter` key as well.
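A sketch of the extension (the element name is illustrative):
```js
function acceptEnterToggle(checkbox) {
  checkbox.addEventListener("keydown", event => {
    // Checkboxes natively only toggle on Space; accept Enter as well.
    if (event.key === "Enter") {
      checkbox.checked = !checkbox.checked;
      checkbox.dispatchEvent(new Event("change", { bubbles: true }));
    }
  });
}
```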
Converting errors to string drops their stack trace, making it more
difficult to debug their actual reason. We can instead pass the error
objects as-is to console.warn/error, so that Firefox/Chrome devtools
will show both the stack trace of the console.warn/error call, and the
original stack trace of the error.
This commit also enables the `unicorn/no-console-spaces` ESLint rule,
which avoids accidental extra spaces when passing multiple parameters to
`console.*` methods.
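For example (function and message names are illustrative):
```js
async function tryRender(renderPage) {
  try {
    await renderPage();
  } catch (error) {
    // Passing the Error object keeps its stack trace in devtools,
    // unlike string conversion ("Rendering failed: " + error).
    console.warn("Rendering failed:", error);
  }
}
```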
This helps ensure that loading errors are always handled correctly, and note that both `PDFNetworkStreamFullRequestReader` and `PDFNetworkStreamRangeRequestReader` already provided such a callback.
Similar to the regular toolbarButtons that can be toggled, this ensures that it's always possible to tell when the findbar "buttons" are hovered/focused.
When a user deletes any number of annotations, they are notified of the action
by a popup message with an undo button. Besides that, this change reuses the
existing messageBar CSS class from the new alt-text dialog as much as possible.
Move .messageBar out of .dialog into its own standalone class in order
to reuse as much of it as possible for the upcoming undo-message feature
for annotations.
Some tests rely on the presence of a server that serves PDF files.
When tests are run from a web browser, the test files and PDF files are
served by the same server (WebServer), but in Node.js that server is not
around.
Currently, the tests that depend on it start a minimal Node.js server
that re-implements part of the functionality from WebServer.
To avoid code duplication when tests depend on more complex behaviors,
this patch replaces createTemporaryNodeServer with the existing
WebServer, wrapped in a new test utility that has the same interface in
Node.js and non-Node.js environments (=TestPdfsServer).
This patch has been tested by running the refactored tests in the
following three configurations:
1. From the browser:
- http://localhost:8888/test/unit/unit_test.html?spec=api
- http://localhost:8888/test/unit/unit_test.html?spec=fetch_stream
2. Run specific tests directly with jasmine without legacy bundling:
`JASMINE_CONFIG_PATH=test/unit/clitests.json ./node_modules/.bin/jasmine --filter='^api|^fetch_stream'`
3. `gulp unittestcli`
This allows us to remove an ESLint disable-statement for `arrow-body-style`, without affecting readability of the code, and fetching the metadata and the page in parallel should be a *tiny* bit more efficient as well.
- Use `this` in all scopes where that's possible, to avoid having to spell out `WorkerMessageHandler` everywhere.
- Inline the `isMessagePort` helper function, since there's only a single call-site.
- Use a static initialization block to move more code into the `WorkerMessageHandler` class itself.
This slightly shortens the code, in various `destroy`-methods, which cannot hurt.
Also, use pre-processor checks to simplify `PDFDocumentLoadingTask.destroy` in the Firefox PDF Viewer since the `PDFWorker.fromPort`-method isn't used there.
The date was created in UTC+0 and then amended using set-Month/Date, which take into account
the user timezone.
With this patch we build all the dates in the user timezone.
It helps to slightly decrease memory use by reducing the number of created arrays.
When searching for "a" in pdf.pdf, the time spent in getOriginalIndex is decreased by
around 30%.
This patch makes a clear separation between the way to draw and the editing stuff.
It adds a class DrawEditor which should be extended in order to create new drawing tools.
As an example, the ink tool has been rewritten in order to use it.
Given that `browsertest` repeatedly times out in Google Chrome, and considering that Firefox is the primary development target, we stop running them on the bots to avoid having to repeatedly deal with this.
Note that we already disabled these tests *on Windows* almost three years ago, because of stability issues; see PR 14392.
test/unit/api_spec.js is the only JS file in the tree with trailing
whitespace. Because `trim_trailing_whitespace = true` in .editorconfig,
any editor supporting EditorConfig would trim whitespace when the file
is changed, which results in test failures.
This commit fixes the issue by trimming the trailing whitespace and
adjusting the test expectations.
The `Array` type takes one parameter that describes the possible types
of the inner elements. This parameter may exist of pipes to indicate
multiple possible types. However, the current type definition provides
multiple parameters; this is incorrect syntax, and TypeScript 5.7+ now
fails on this due to stricter syntax validation.
This commit fixes the issue by changing the type definition to the
proper syntax, which together with the accompanying comment about the
contents of the fingerprints array should document it sufficiently.
The following cases are excluded in the patch:
- The Firefox PDF Viewer, since it has been fixed on the platform side already; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1683940
- The `PDFNodeStream`-implementation, used in Node.js environments, since after recent changes that code only supports `file://`-URLs.
Also updates the `PDFNetworkStreamFullRequestReader.read`-method to await the headers before returning any data, similar to the implementation in `src/display/fetch_stream.js`.
*Note:* The relevant unit-tests are updated to await the `headersReady` Promise before dispatching range requests, since that's consistent with the actual usage in the `src/`-folder.
The test-only createTemporaryNodeServer helper featured a path traversal
vulnerability. This enables attackers with network access to the device
to read arbitrary files while unit tests are running that activate this
test server.
This patch fixes the issue by validating the requested paths.
To test this vulnerability before the patch:
1. Run the test-only server:
```
node -e 'console.log(require("./test/unit/test_utils.js").createTemporaryNodeServer().port)'
```
2. From another terminal, send the following request (modify the port to
the port reported in the previous step):
```
curl --path-as-is http://localhost:45755/../../package.json
```
Before the patch, the second step would traverse the directory, and
return results from the root of the PDF.js repository, instead of files
within test/pdfs/.
With the patch, the server refuses the request with HTTP status 400.
This is fairly old code, and by making the function `async` we can handle initialization errors "automatically" without the need for try-catch statements.
And tweak the highlight one a bit (e.g. it's useless to have 64-bit floating point numbers
when 32-bit ones are enough).
It's a required step for the refactoring of the ink tool (in order to use the draw layer).
It avoids calling several functions acting on the same SVG element.
We can remove most feature testing from this helper function, with the exception of `randomUUID` since that's only available in "secure contexts", and also remove the fallback code-path.
Note that this code was only added for Node.js compatibility, and it's no longer necessary now that the minimum support version is `20`; see also https://developer.mozilla.org/en-US/docs/Web/API/Crypto#browser_compatibility
Finally, this patch also adds a basic unit-test for the helper function.
JSON imports are now supported by all tools used in PDF.js' build
process. The `chromecom.js` file is bundled by webpack and
import attributes are thus removed, so browser compatibility for this
new syntax is not relevant.
This integration test fails intermittently, locally at least in Chrome
with Puppeteer 23.4.0+, with the following errors:
```
In chrome: Expected '123Hello' to equal 'Hello123'.
In chrome: Expected '123Hello' to equal '123'.
```
This happens because the test before it left queued sandbox events
behind. We don't close the document between tests, so those get run
when we click the textbox in this test and that interferes with our
selection/typing actions. This commit fixes the issue by flushing the
queued sandbox events in the first test, which makes sure that state
no longer leaks through to the next test and thus improves isolation.
Moreover, similar to commit 3adf8b6 we use safer assertions to avoid
further intermittent failures, and we replace the `page.$eval` call
with a simpler Home button push like we already do in e.g. the test
helpers. This combined makes the code shorter and simpler.
In PR #11300 Gitpod support got introduced, but we re-evaluated that
decision in #11732. In PR #11800 the support was partially reverted,
but the actual Gitpod files were kept to not outright break potential
workflows because at the time we were not sure if, and if so how often,
Gitpod was actually used for contributing to PDF.js.
However, in addition to the concerns mentioned in #11732 after five
years we haven't seen any contributions that clearly originated from
Gitpod, and the files have not been updated after e.g. PR #11807 and
PR #17913 because it's not a workflow that we maintain or are able
to test (nor have we seen Gitpod community contributions for this).
This commit therefore removes the remaining Gitpod files to reduce
maintenance burden for PDF.js. Note that users of Gitpod can still
contribute to PDF.js via the platform; we just don't provide/manage
workspace files from this repository anymore.
- Use optional chaining when clearing various data-structures, since that's slightly shorter.
- Ensure that checking for the existence of `#_hcmCache`-data won't cause it to be initialized. (Note how `this.#hcmCache` is actually a getter.)
- Actually reset the `#_hcmCache`-data when that's appropriate, e.g. when closing the PDF document.
- Ensure that `pdfjsTestingUtils` is available when running benchmarking, since that shouldn't be done in TESTING-mode.
- Exclude the `test/stats/results/` folder from linting, since it'll contain *generated* JSON-files.
When iterating through `useCMap` the value is already available, without having to manually invoke the `lookup`-method.
While this will likely not affect performance in any noticeable way, it's nonetheless unnecessary to look up an already available value twice.
This was previously attempted in PR 13371, but had to be reverted because of issues related to SystemJS (which has since been removed).
Also, while unrelated, shortens an existing conditional assignment.
The purpose of these changes is to make it more difficult to accidentally include logging statements, used during development and debugging, when submitting patches for review.
For (almost) all code residing in the `src/` folder we should use our existing helper functions to ensure that all logging can be controlled via the `verbosity` API-option.
For the `test/unit/` and `test/integration/` folders we shouldn't need any "normal" logging, but it should be OK to print the *occasional* warning/error message.
Please find additional details about the ESLint rule at https://eslint.org/docs/latest/rules/no-console
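As a rough sketch (the exact configuration in the repository may differ), the rule could be scoped per folder in the flat ESLint config along these lines:
```js
// eslint.config.mjs (sketch): forbid console usage in `src/`, while still
// allowing the occasional warning/error message in the test folders.
export default [
  {
    files: ["src/**/*.js"],
    rules: { "no-console": "error" },
  },
  {
    files: ["test/unit/**/*.js", "test/integration/**/*.js"],
    rules: { "no-console": ["error", { allow: ["warn", "error"] }] },
  },
];
```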
Given that there are multiple issues with `ImageDecoder` in Chromium browsers, affecting both BMP and JPEG images, for now we (by default) disable that functionality there to avoid problems.
This also means that we can remove the previously added, and separate, `isChrome` API-option.
Python 3.13 is the current version and was released over a month ago
(see https://devguide.python.org/versions). The dependencies we use now
support Python 3.13, most importantly `fonttools` which uses OS-specific
builds and for which compatibility got introduced in
https://github.com/fonttools/fonttools/pull/3656 and the corresponding
`cp313` wheels for all distributions are published on
https://pypi.org/project/fonttools/#files.
Moreover, we fix a forgotten `npx` usage in the font tests README which
was encountered while testing this patch.
This is unblocked now that all dependencies have been updated and the
flat configuration format (compatible with ESLint 8 and 9) was
introduced first. The following deprecation warnings during `npm
install` are resolved by this upgrade:
```
npm warn deprecated @humanwhocodes/config-array@0.13.0: Use @eslint/config-array instead
npm warn deprecated @humanwhocodes/object-schema@2.0.3: Use @eslint/object-schema instead
npm warn deprecated eslint@8.57.1: This version is no longer supported. Please see https://eslint.org/version-support for other options.
```
Note that according to https://eslint.org/version-support ESLint 8 is
officially EOL now, and ESLint 9 has been released for over seven
months and is the only officially supported version.
Fixes #17928.
This allows end-users to forcibly disable `ImageDecoder` usage, even if the browser appears to support it (similar to the pre-existing option for `OffscreenCanvas`).
Flat config is the new config system used by ESLint 9.
To make the migration easier, they also added
flat config support to ESLint 8.
This commit migrates the various ESLint configs in the repository to use
the new system, **without** upgrading to ESLint 9 yet.
It fixes #19008.
In Firefox on macOS, the default padding is set to 4px, and in Firefox for iOS it's set to 13px.
The padding is useless for such buttons.
Given that `ImageData` has been supported for many years in all browsers, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/ImageData#browser_compatibility), we have a `typeof` check that's only necessary in Node.js environments.
Since the `@napi-rs/canvas` package provides that functionality, we can thus add an `ImageData` polyfill which allows us to ever so slightly simplify the code.
The `@napi-rs/canvas` package has fewer dependencies, which should *hopefully* make installing and using it easier for `pdfjs-dist` end-users. (Over the years we've seen, repeatedly, that `canvas` can be difficult to install successfully.)
Furthermore, this package includes more functionality (such as `Path2D`) which reduces the overall number of dependencies in the PDF.js project.
One point to note is that `@napi-rs/canvas` is a fair bit newer than `canvas` and has a lot fewer users; however, looking at the commit history, it does seem to be actively maintained.
Note that I've successfully tested the [Node.js examples](https://github.com/mozilla/pdf.js/tree/master/examples/node), in particular the `pdf2png` one, with this patch applied and things appear to work fine.
Please see:
- https://www.npmjs.com/package/@napi-rs/canvas
- https://github.com/Brooooooklyn/canvas
Fixes https://github.com/mozilla/pdf.js/issues/18957.
https://github.com/mozilla/pdf.js/pull/18682 introduced a regression that causes the following error:
```
Uncaught TypeError: Failed to construct 'Headers': Invalid name
at PDFNetworkStreamFullRequestReader._onHeadersReceived (pdf.mjs:10214:29)
at NetworkManager.onStateChange (pdf.mjs:10103:22)
```
The mentioned PR replaced a call to `getResponseHeader()` with `getAllResponseHeaders()` without handling cases where it may return null or an empty string. Quote from the [docs](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/getAllResponseHeaders#return_value):
> Returns:
>
> A string representing all of the response's headers (except those whose field name is Set-Cookie) separated by CRLF, or null if no response has been received. If a network error happened, an empty string is returned.
Run the following code and observe the error in the console. Note that the URL is intentionally set to an invalid value to simulate a network error:
```html
<script src="//mozilla.github.io/pdf.js/build/pdf.mjs" type="module"></script>
<script type="module">
var url = 'blob:';
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.mjs';
var loadingTask = pdfjsLib.getDocument(url);
loadingTask.promise
.then((pdf) => console.log('PDF loaded'))
.catch((reason) => console.error(reason));
</script>
```
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure are missing, you might argue that it's not really a PDF document.
In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.
*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
This patch updates the minimum supported environments as follows:
- Node.js 20, which was released on 2023-04-18 and has now entered the "Maintenance"-phase; see https://github.com/nodejs/release#release-schedule
Furthermore, note also that Node.js 18 will fairly soon reach EOL.
This integration test fails intermittently because we're not
(correctly) awaiting the sandbox actions. The `27R` field in
`issue14862.pdf` triggers sandbox events for every typing action, but
for the backspace and "a" character typing actions we weren't awaiting
the sandbox trip at all, and for other places we weren't awaiting it
fully (causing some characters to be missed in the assertion).
This commit fixes the issues by using the appropriate helper functions,
similar to what we did in PR #18399. Not only is this shorter in terms
of code, but it also fixes the near-permafail for this test with newer
versions of Puppeteer.
- This helper function has only a single call-site, and the function is fairly short.
- It'll only be invoked if range requests are *disabled*, or if the entire PDF manages to load *before* the headers are resolved (which is very unlikely).
Hence, by default, this helper function is not invoked.
- By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code.
Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).
I discovered that doing skip-cache re-reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would *intermittently* cause (some of) the AnnotationLayers to break with errors printed in the console (see below).
In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will lookup various data and all of those cases can fail during loading when streaming and/or range requests are being used.
Finally, to prevent any future errors, ensure that the viewer won't break in this sort of situation.
```
Uncaught (in promise)
Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
\#renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15
Uncaught (in promise)
Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
```
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
This way we can directly throw Errors, rather than having to "manually" return rejected Promises, which is ever so slightly shorter.
Also, since `useWorkerFetch` is always true in MOZCENTRAL builds these message-handlers should not be invoked there.
One goal is to make the code for drawing with the Ink tool similar to the one for free highlighting:
it doesn't really make sense to have such different ways of doing almost the same thing.
When the zoom level is high, this avoids creating a too-large canvas covering the whole page, which consumes
more memory, makes the drawing very slow, and makes the overall user experience pretty bad.
A second goal is to be able to easily implement more drawing tools where we would just have to implement
how to draw from the pointer coordinates.
We can convert the handler to an `async` function, which removes the need to create a temporary Promise here.
Given the age of this code it shouldn't hurt to simplify it a little bit.
This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64
*Please note:* These are not actual polyfills; they only implement what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.
- Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already *implicitly* assuming a particular length.
- Move the constants into the `fingerprints`-getter, since they're not used anywhere else.
- Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method, as sketched below; see https://github.com/tc39/proposal-arraybuffer-base64
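For reference, a minimal sketch of the native method (available e.g. in Firefox 133+):
```js
// `Uint8Array.prototype.toHex` converts the bytes directly to a hex string,
// making a manual `hexString` helper unnecessary.
const id = new Uint8Array([0x01, 0x23, 0xab, 0xcd]);
console.log(id.toHex()); // "0123abcd"
```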
This extends PR 13796 to also handle the case where sub-streams contain invalid data, i.e. anything that isn't a Stream, however please note that in these cases there's no guarantee that we'll render the page "correctly".
Note that Adobe Reader, i.e. the PDF reference implementation, cannot render the last page of the referenced PDF document.
Currently we manually localize and update the DOM-elements of the AltText-button, and it seems nicer to utilize Fluent "properly" for that task.
This can be achieved by introducing an explicit `span`-element on the AltText-button (similar to e.g. the regular toolbar-buttons), and adding a few more l10n-strings, since that allows just setting the `data-l10n-id`-attribute on all the relevant DOM-elements.
Finally, note how we no longer need to localize any strings eagerly when initializing the various editors.
Move the `ImageResizer._goodSquareLength` definition into the class itself, since the current position shouldn't be necessary, and also convert it into an actually private field.
It fixes #18956.
In patch #18029, for performance reasons and because I thought it was useless, I deliberately chose not to fill the mask
with the backdrop color when it's fully black: it was a bad idea.
So in this patch we always add the backdrop color to the mask.
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Given that we've not shipped, nor used, anything except binary CMaps for years let's just cache them unconditionally (since that's a tiny bit less code).
The PDF document is clearly corrupt, since it has /FontFile2 entries that are Dictionaries which obviously isn't correct.
While there's obviously no guarantee that things will look perfect this way, actually rendering the text at all should be an improvement in general.
Unfortunately it turns out to be somewhat common for users to provide a bunch of "unrelated" information in this field, or even stating their entire problem there, rather than placing it under the appropriate headings further down in the template.
This moves more functionality into the base-class, rather than having to duplicate that in the extending classes.
For consistency, also updates the `BaseStandardFontDataFactory` and introduces more `async`/`await` in various relevant code.
With the recent re-factoring of the viewer CSS rules we now have some duplication of the `mask-image` definitions for the print/download buttons in the secondaryToolbar; note 17419de157/web/viewer.css (L1204-L1210)
The `getSpanRectFromText` helper function returns the location as float
values. This could be desirable in cases where the exact values matter
(for example during comparisons), but in the text layer tests we don't
need this precision. Moreover, the Puppeteer `page.mouse.move` API
apparently doesn't work correctly if float values are given as input.
Note that this test only failed because it couldn't move to the initial
selection position; any subsequent moves actually worked because the
`moveInSteps` helper function already rounded all values correctly.
This commit fixes the issue by consistently rounding all values that we
pass to Puppeteer's `page.mouse.move` API.
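In other words, the fix boils down to something like the following sketch (the page number and text are illustrative):
```js
// `getSpanRectFromText` returns float values, so round the coordinates
// before handing them to Puppeteer's `page.mouse.move` API.
const rect = await getSpanRectFromText(page, 1, "some text on the page");
await page.mouse.move(
  Math.round(rect.x + rect.width / 2),
  Math.round(rect.y + rect.height / 2)
);
```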
By using "data-l10n-attrs" it's possible to instruct Fluent to localize *custom* attributes, which means that we don't need to manually translate/update the "default-content" in FreeText editors.
The plugins have originally been introduced in commit d63da81 for the
`eslint-plugin-mozilla` dependency, but the `eslint-plugin-mozilla`
plugin got removed in commit be93d53 and we also don't use the plugins
ourselves in e.g. our `.eslintrc` files (as evidenced by `npx gulp lint`
not failing while it does fail if we remove any of the other plugins).
This way we allow the rest of the packages to be loaded successfully, such that e.g. the Node.js unit-tests work correctly.
Note that this occurred after updating the `node-canvas` package to version `3.0.0-rc2`, however it's not immediately clear to me if it's a problem there or in the `path2d` package; see also nilzona/path2d-polyfill/issues/84.
- Change the "message" event handler function to a private method.
- Remove the "message" event listener with an `AbortSignal`.
- Extend the `LoopbackPort`-class with `AbortSignal` support.
The code parses the /RBGroups entry in the OC configuration dict and adds the property `rbGroups` to instances of the OptionalContentGroup class. `rbGroups` holds an array of Sets, where each Set instance represents an RB group the OptionalContentGroup instance is a member of. Such a Set instance contains all OCG ids within the corresponding RB group. The RB groups an OCG is associated with are processed when its visibility is set to true, as required by the PDF spec.
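The resulting shape is roughly as follows (the ids are made up):
```js
// Each Set lists every OCG id in one radio-button group, and an OCG may
// appear in several such groups.
const rbGroups = [
  new Set(["ocg1", "ocg2", "ocg3"]),
  new Set(["ocg1", "ocg4"]),
];
// When "ocg1" is switched on, the other members of each group it belongs
// to are switched off, as required by the PDF specification.
```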
Given that the sub-title of that document is "Public domain texts for young people." and that the images have clear sources at the end of the document, it should (hopefully) be OK to add it to the repository rather than relying on a linked test-case.
Since this value is used to allocate an array, it makes sense to avoid using too much memory.
Per the specification, this value must be in the range [0, 255] (see section 8.6.6.3).
This patch removes the unused property 'highVal'.
When playing with a pen, I noticed that sometimes a free highlight still has its
gray outline when another one is drawn: for some reason the pointerleave event isn't
triggered.
This is similar to a lot of other code, where we've been replacing explicit `removeEventListener`-calls.
Given the somewhat limited availability of `AbortSignal.any()`, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static#browser_compatibility), this event listener will no longer be immediately removed in older browsers (however that should be fine).
After PR 18845 we're accessing this getter more, hence it seems like a good idea to ensure that the initial `formInfo` access is covered as well.
While unlikely to be a problem in practice, at least theoretically that data may not be available and the code in `fieldObjects` could thus currently be *unintentionally* invoked more than once.
The default `page.type()` API from Puppeteer works for text fields that
only dispatch a sandbox event on e.g. focus loss (i.e. after all
characters have been inserted), and for those we can also use the
default typing delay from Puppeteer instead of defining our own value.
However, it doesn't work correctly for text fields where every character
insertion dispatches a sandbox event. This is because processing the
sandbox event takes some time and Puppeteer must wait for that before it
can (safely) insert the next character. This commit therefore introduces
a helper function to type a given value correctly in such text fields.
Not only does this fix intermittent failures if our delay was too low
for sandbox processing to complete, but it also speeds up the tests by
eliminating our delays in places where they were (much) higher than
necessary. In total the runtime of the scripting integration test suite
goes from 137 seconds before this patch to 100 seconds after this patch.
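Conceptually the helper behaves roughly like this sketch (names and details are illustrative, not the actual implementation):
```js
// Type one character at a time, awaiting the sandbox round-trip after each
// keystroke so that the next character can be inserted safely.
async function typeAndWaitForSandbox(page, selector, value) {
  for (const character of value) {
    await page.type(selector, character);
    await waitForSandboxTrip(page); // helper used elsewhere in these tests
  }
}
```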
Note that the new version of `eslint-plugin-import` contains support
for ESLint 9.
Moreover, the new version of Babel removed the leading space for import
comments (see https://github.com/babel/babel/pull/16780), so we update
the corresponding test expectation to match this.
Fixes a part of #17928.
A recent change in Firefox induced too much difference between the text widths computed using a Canvas
and the ones computed by the text layout engine when rendering the text layer. Consequently, the
text selection can be bad on Windows with some fonts like Arial or Consolas.
This patch is a workaround that tries to prefer fonts which don't have the problem.
Strangely enough the code works just fine as-is in the GENERIC viewer, so there must be some difference between the Firefox built-in Fluent implementation and the Fluent.js one.
It seems that we can work-around the problem by handling this l10n-arg the same way that we handle dates in the AnnotationLayer, and testing this with the Firefox Devtools it seems that it should work.
It fixes #18849.
When such an annotation is deleted, we make sure that there is some data
to restore.
The previous version of this patch made undoing an SVG deletion buggy, so it's fixed now and
an integration test has been added for this case.
With the changes made in PR 18385 the `toolbarViewer` element is now shorter than before, since the padding is applied on its `toolbarContainer` parent-element instead.
This causes two separate regressions:
- Clicking at the very top/bottom of the toolbar no longer closes the secondaryToolbar like previously.
- The `CaretBrowsingMode`-constructor no longer computes the toolbar-height correctly.
Given how/where the `container`-property is being used these changes *should* thus be safe.
The current implementation won't work correctly in some cases, e.g. if RBGroups are present, which means that it's possible for the UI to get out-of-sync with the actual optionalContent-state.
To avoid querying the DOM unnecessarily we cache the last known UI-state and compare with the actual optionalContent-state, which thus replaces the previously used hash-comparison.
This patch updates the minimum supported browsers as follows:
- Google Chrome 103[1], which was released on 2022-06-21; see https://chromereleases.googleblog.com/2022/06/stable-channel-update-for-desktop_21.html
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
---
[1] This is consistent with the minimum supported version in the recently updated Google Chrome addon.
We disable any non-default CursorTool in PresentationMode and during Editing, since they don't make sense there and to prevent problems such as e.g. [bug 1792422](https://bugzilla.mozilla.org/show_bug.cgi?id=1792422).
Hence it seems like a good idea to *also* disable the relevant SecondaryToolbar-buttons, to avoid the user being surprised that the CursorTools-buttons do nothing if clicked.
Given the following excerpt from the [Wikipedia article](https://en.wikipedia.org/wiki/Gill_Sans), mapping this to Helvetica should hopefully be fine:
> It has been described as "the British Helvetica" because of its lasting popularity in British design.
Given that the `WorkerTransport`-constructor will access all possible factory-instances, let's ensure that the `transportFactory`-object always has a consistent shape regardless of other options.
Also, since `useWorkerFetch` is always true in MOZCENTRAL builds we never need to create `CMapReaderFactory`/`StandardFontDataFactory`-instances there.
The first goal of this patch was to remove the tabindex because it helps
to improve overall a11y. That led to moving some HTML elements associated
with the buttons, which helped to position these elements relative to their
buttons.
Consequently it was easy to change the toolbar height (configurable in Firefox
with the pref browser.uidensity): it's the second goal of this patch.
For a11y reasons we want to be able to change the height of the toolbar to make
the buttons larger.
This is unblocked because in commit bb302dd the default value for the
constructor got removed, which apparently confused TypeScript before.
Fixes #18770.
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.
As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
Currently, the CSS for a separator is something like { height: 1px; background-color: ... }.
But its rendering depends on its position on the screen.
So instead of setting the height to 1px, we just set something like { border-top: 1px solid ...; },
this way the final rendering is exactly the same for all the separators.
This code is quite old and has been moved/re-factored a few times over the years, however we can simplify this even further since we don't actually need a function to determine what NetworkStream-implementation to use.
After the removal of 'unsafe-eval' CSP in #18651, WebAssembly fails to
load, resulting in issues such as seen in #18457.
Manifest Version 3 does not allow 'unsafe-eval', but does accept the more
specific 'wasm-unsafe-eval' as of Chrome 103. Note that manifest.json
already sets minimum_chrome_version to 103.
This patch also adds `object-src 'self'` because it was required until
Chrome 110. As of Chrome 111, the default is `object-src 'self'` and
`object-src` is no longer required. We could drop `object-src` in the
future, but for now we need to include it to support Chrome 103 - 110.
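The resulting manifest entry presumably looks something like this (a sketch; the actual manifest may differ):
```json
{
  "content_security_policy": {
    "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'"
  }
}
```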
Note that the textContent is returned in "chunks" from the API, through the use of `ReadableStream`s, and on the main-thread we're (normally) using just one temporary canvas in order to measure the size of the textLayer `span`s; see the [`#layout`](5b4c2fe1a8/src/display/text_layer.js (L396-L428)) method.
*Order of events, for parallel textLayer rendering:*
1. Call [`render`](5b4c2fe1a8/src/display/text_layer.js (L155-L177)) of the textLayer for page A.
2. Immediately call `render` of the textLayer for page B.
3. The first text-chunk for pageA arrives, and it's parsed/layout which means updating the cached [fontSize/fontFamily](5b4c2fe1a8/src/display/text_layer.js (L409-L413)) for the textLayer of page A.
4. The first text-chunk for pageB arrives, which means updating the cached fontSize/fontFamily *for the textLayer of page B* since this data is unique to each `TextLayer`-instance.
5. The second text-chunk for pageA arrives, and we don't update the canvas-font since the cached fontSize/fontFamily still apply from step 3 above.
Where this potentially breaks down is between the last steps, since we're using just one temporary canvas for all measurements but have *individual* fontSize/fontFamily caches for each textLayer.
Hence it's possible that the canvas-font has actually changed, despite the cached values suggesting otherwise, and to address this we instead cache the fontSize/fontFamily globally through a new (static) helper method.
*Note:* Includes a basic unit-test, using dummy text-content, which fails on `master` and passes with this patch.
Finally, pun intended, ensure that temporary textLayer-data is cleared *before* the `render`-promise resolves to avoid any intermittent problems in the unit-tests.
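A minimal sketch of the idea (the field and method names are hypothetical):
```js
class TextLayer {
  // Shared across all TextLayer instances, mirroring the fact that a single
  // temporary canvas is used for all text measurements.
  static #canvasFontCache = { fontFamily: null, fontSize: 0 };

  static #ensureCtxFont(ctx, fontSize, fontFamily) {
    const cache = TextLayer.#canvasFontCache;
    if (cache.fontFamily === fontFamily && cache.fontSize === fontSize) {
      return; // The shared canvas already uses this font.
    }
    ctx.font = `${fontSize}px ${fontFamily}`;
    cache.fontFamily = fontFamily;
    cache.fontSize = fontSize;
  }
}
```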
Fix regression from #18711. `urlFilter` is a key of `condition`, not of
the `rule`. Consequently, the feature detection method failed to detect
the availability of the feature in Chrome 128+.
The minimum required version is Chrome 103 because wildcard support for
web_accessible_resources[].extension_id was introduced in 103, in
c9caeb1a08
A way to broaden compatibility is to drop that key. By doing so, the
minimum required Chrome version is then 96, because of the
chrome.storage.session API (and declarativeNetRequestWithHostAccess).
Since gulpfile.js already defines "Chrome >= 98" and there is no real
point in expanding support to older versions, I'm just setting the
minimum version to 103.
- Replace DOM-based pdfHandler.html (background page) with background.js
(extension service worker).
- Adjust logic of background scripts to account for the fact that the
scripts can execute repeatedly during a browser session. Primarily,
register relevant extension event handlers at the top level and use
in-memory storage.session API to keep track of initialization state.
- Extension URL router: replace blocking webRequest with the service
worker-specific "fetch" event.
- PDF detection: replace blocking webRequest with declarativeNetRequest.
This requires Chrome 128+. The next commit will add a fallback for
earlier Chrome versions.
In MV3, the background script is a service worker. localStorage is not
supported in service workers. Switch to storage.local instead.
Data migration is not implemented because it is not needed due to the
privacy-friendly design of the telemetry: In practice the data is reset
about every 4 weeks, when the major version of Chrome is updated.
restoretab.js was added in https://github.com/mozilla/pdf.js/pull/6233
with the purpose of restoring lost tabs when Chrome closes all extension
tabs when it reloads the extension. This forced reload can happen when
the user toggles the "Allow access to file URLs" option.
This logic does not work any more, and since the use of localStorage is
a blocker in migrating to MV3, this patch just drops the logic.
MV3 does not support chrome_style in options_ui. Remove it and replace
it with the minimal amount of styles that still has some spacing around
the individual settings for readability.
webRequestBlocking is unavailable in MV3. Non-blocking webRequest can
still be used to detect the Referer, but we have to use
declarativeNetRequest to change the Referer header as needed.
*Please note:* As a general rule we probably don't need to fix things affecting *custom* implementations of the default viewer, but in this case a "targeted" fix seem possible that shouldn't (famous last words) break anything else.
Ensure that the `.visibleMediumView` class, which is used when the viewer becomes narrow, cannot override the `display`-value for DOM elements that are being *explicitly* hidden either through the associated attribute or class.
The idea is to insert a span in the text layer with its ARIA role set to img
and use the bounding box provided by the attribute field in the tag dict in
order to have non-null dimensions for the image to make it "visible".
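A rough sketch of that approach (the element names and dimensions are illustrative):
```js
// Create a placeholder span for the image, exposed to screen readers.
const span = document.createElement("span");
span.setAttribute("role", "img");
// Give it non-null dimensions, using the bounding box found in the tag
// dictionary's attribute field.
span.style.cssText =
  "position: absolute; left: 10%; top: 20%; width: 30%; height: 15%;";
textLayerDiv.append(span);
```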
In hindsight it occurred to me that there's a couple of smaller issues with this method after it's made asynchronous (in PR 18658).
- If the `render`-method is invoked back-to-back the existing caching doesn't guarantee that re-parsing won't occur, which we can address by introducing a new (private) promise.
- If there's any errors fetching and/or parsing the structTree-data, we'd attempt to parse it again on re-rendering despite that being pointless.
It has been tested with VoiceOver (macOS) and with NVDA (Windows).
When an added stamp annotation is focused, the screen reader will announce
that it's a figure containing a graphic with the added alt-text.
The `Headers` functionality is now available in all browsers/environments that we support, which allows us to consolidate and simplify how the `httpHeaders` API-option is handled; see https://developer.mozilla.org/en-US/docs/Web/API/Headers#browser_compatibility
Also, simplifies the old `NetworkManager`-constructor a little bit.
It was recently brought to my attention that using partial or generated localization ids is bad for maintainability, hence this patch goes through the code-base and replaces any such occurrences.
Currently we repeat virtually the same http/https request code in two different classes in this file, which seems unnecessary and it leads to more code.
With the introduction of Fluent the `getLanguage`-method became synchronous, hence it no longer seems necessary to do the metric-locale check eagerly in the constructor and it can instead be "delayed" until actually needed.
It was recently brought to my attention that using partial or generated localization ids is bad for maintainability, which means that PR 18636 wasn't the correct thing to do.
However, just reverting that one doesn't really fix the problems which is why this patch updates *every* l10n-id in the `PDFDocumentProperties` class (but doesn't touch any `viewer.ftl`-files). Obviously this leads to more verbose code, but that cannot really be helped.
We can remove the `reset`-parameter, since it's redundant, given that it's only used after `PDFDocumentProperties.#reset` has been invoked which means that `this.#fieldData === null` which is equivalent to resetting.
Also, we don't need to have two separate loops in order to update the UI in this method.
Finally, inline the `DEFAULT_FIELD_CONTENT` constant now that it's only used once.
The Node.js url.parse API (https://nodejs.org/api/url.html#urlparseurlstring-parsequerystring-slashesdenotehost)
is deprecated because it's prone to security issues (to the point that Node.js doesn't even publish CVEs for it anymore).
The official recommendation is to instead use the global URL constructor, available both in Node.js and in browsers.
Node.js filesystem APIs accept URL objects as parameter, so this also avoids a few URL->filepath conversions.
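For example (a small sketch):
```js
import fs from "fs";

// Deprecated: require("url").parse("file:///tmp/example.pdf").
// Preferred: the global URL constructor, whose result can be passed
// directly to the Node.js filesystem APIs.
const fileUrl = new URL("file:///tmp/example.pdf");
const data = fs.readFileSync(fileUrl);
```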
This l10n-string is being re-defined once for every editor, i.e. currently four times, which seems unnecessary.
To avoid having to check if this l10n-string exists first, we can utilize rest parameters to move it into the `AnnotationEditor._l10nPromise` Map-definition instead.
Currently we manually localize and update the DOM-elements of the editor-resizers, and it seems nicer to utilize Fluent for that task.
This can be achieved by updating the l10n-strings to directly target the `aria-label` and then just setting the `data-l10n-id` on the DOM-elements.
In preparation for migrating the Chrome extension to Manifest Version 3,
this patch removes parts of the manifest that are obsolete and/or
unsupported in MV3.
Remove ftp mentions: ftp support was dropped 6 years ago, in Chrome 59.
Remove file_browser_handlers: does not work on Chrome OS according to
https://github.com/mozilla/pdf.js/issues/14161 . MV3 replacement needs
a different manifest key and logic, which will be added later.
Remove content_security_policy: MV3 does not support unsafe-eval CSP,
and PDF.js does not require it. This may result in a mild performance
degradation in PDFs that contain PostScript.
Remove page_action logic: When this logic was originally introduced,
Chrome showed page action buttons in the address bar, which enabled
the extension to display the button on specific PDF viewer tabs only.
In Chrome 49 (2016), pageActions were dropped from the address bar and
put in a UI area that is hidden by default. The user can pin the button
to be unconditionally visible on the toolbar, which is distracting.
Because the UX is no longer serving the original needs, this patch
removes page_action from the manifest.
This major version contains three breaking changes that impact us:
- The `product` option has been renamed to the more suitable `browser`.
- The `page.screenshot()` API returns a `Uint8Array` instead of a
`Buffer`, but since `pngjs` requires a `Buffer` object we need to do
the conversion using `Buffer.from()` before passing data to `pngjs`.
- The browser configuration should be set using a configuration file
instead of environment variables. Note that as a bonus this allows us
to remove the `cross-env` dependency since that was only used to set
the Puppeteer environment variable equally for all operating systems.
For more information about the changes between the old and new Puppeteer
versions refer to https://github.com/puppeteer/puppeteer/releases.
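Regarding the `pngjs` conversion mentioned above, the change presumably boils down to something like:
```js
import { PNG } from "pngjs";

// `page.screenshot()` now returns a `Uint8Array`, while `pngjs` still
// expects a Node.js `Buffer`, hence the explicit conversion.
const data = await page.screenshot();
const png = PNG.sync.read(Buffer.from(data));
```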
The `AnnotationLayer` may not display correctly formatted data in PopupAnnotations, especially in the GENERIC viewer, since it's using native methods[1] that depend on the *browser* locale instead of the viewer locale as intended.
With Fluent we're able to improve things since it's got built-in support for formatting dates. Not only does this simplify the JavaScript code slightly, but it also gives the localizer more fine-grained control of the desired output.
Please find additional information here:
- https://projectfluent.org/fluent/guide/builtins.html
- https://projectfluent.org/fluent/guide/functions.html
---
[1] `toLocaleDateString`, and `toLocaleTimeString`.
The `PDFDocumentProperties` dialog may not display correctly formatted data, especially in the GENERIC viewer, since it's using native methods[1] that depend on the *browser* locale instead of the viewer locale as intended.
At the time when this dialog was introduced that was probably all we could easily do, but with Fluent we're able to improve things since it's got built-in support for formatting numbers and dates. Not only does this simplify the JavaScript code, but it also gives the localizer more fine-grained control of the desired output.
Please find additional information here:
- https://projectfluent.org/fluent/guide/builtins.html
- https://projectfluent.org/fluent/guide/functions.html
---
[1] `toLocaleString`, `toLocaleDateString`, and `toLocaleTimeString`.
Currently we *manually* fetch the "pdfjs-additional-layers" string and update the DOM-element, which was needed since we want to avoid triggering a bunch of otherwise unnecessary translation when appending the entire layer-tree to the DOM.
By introducing a new helper method in the `L10n`-class we can avoid this, and instead use a "data-l10n-id" attribute on the element (as most other viewer code does nowadays).
Given the length of the l10n-strings we can slightly reduce verbosity, and thus overall code-size, by introducing a helper method for fetching l10n-data.
While testing this I stumbled upon an issue in the `L10n`-class, where an optional chaining operator was placed incorrectly since the underlying method always returns an Array; see 48e2a62ed4/fluent-dom/src/localization.js (L38-L77)
- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent
- Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await on the cached Promise without making the requests pipeline
- This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately
- PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes
- I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays
Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font.
Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see b47c7eca83/src/core/fonts.js (L1186-L1190)
This integration test fails intermittently because we cache the initial
total value to be able to compare it to the new total value at the end
of the test to check that it's different before doing the assertions.
However, this doesn't work as expected because the second `clearInput`
call triggers an intermediate total value calculation because it clicks
on another input field and that triggers a sandbox event.
This results in the `waitForFunction` calls always resolving immediately
and since we don't use other means of waiting until the calculation is
done (using e.g. `waitForSandboxTrip`) we basically rely on the time
between the final click and the assertions to be enough for the sandbox
to do its work. If it's not done in that time, we do the assertions
with older values and that makes the test fail.
This commit fixes the issue by simply waiting for the total value to be
what we expect it to be. This requires less code, is more consistent
with the other integration tests and removes the possibility of doing
assertions against older values.
In PR #18574 setting `window.uiManager` was moved into the `src` folder
to avoid intermittent integration test failures because at the time we
lacked a way to register event listeners early (before PDF.js loads).
However, in PR #18617 this functionality got introduced, so we can now
use the new way of setting up the event bus in the tests to move this
back to the `test` folder again and to reduce the amount of test-only
code in the main codebase as discussed in PR #18574.
Partially reverts e037c5711d3d2413669e9b6c275986adf24a295b.
The function evaluateOnNewDocument in Puppeteer allows us to execute some JS before the PDF.js code
is loaded.
It allows us to stub some setters before they are used and then set some event handlers very early.
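A hedged sketch of the pattern (the stubbed property is illustrative):
```js
await page.evaluateOnNewDocument(() => {
  // Stub a setter so that we're notified as soon as the viewer assigns the
  // value, allowing event handlers to be registered before PDF.js runs.
  let value;
  Object.defineProperty(window, "uiManager", {
    get: () => value,
    set(newValue) {
      value = newValue;
      // ...register event listeners here, before any events can fire...
    },
  });
});
```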
We introduced quite a few new strings recently, but during the last few
rounds of updates we didn't see new translations updates becoming
available. I only just now recalled the announcement that Mozilla is
moving from Mercurial to Git, and indeed the Mercurial page at
https://hg.mozilla.org/l10n-central hasn't been updated since July
anymore and that's where we used to pull our translations from.
This commit fixes the issue by changing the URLs to the Mozilla Git
repositories hosted on GitHub instead.
The way that the debugging hash-parameter parsing is implemented leads to a lot of boilerplate code in this method, since *most* of the cases are very similar.[1]
With just a few exceptions most of the options can be handled automatically, by defining an appropriate checker for each option.
---
[1] With the recent introduction of TESTING-only options the size of this method increased a fair bit.
The problem seems to be caused by the browser trying to "restore" editing input-elements, in the various toolbars, to their previous values when the tab is re-opened.
Hence the simplest solution appears to be to move the event handling into the editor-code, which is also less code overall, since the listener then won't be registered early enough for the problem to appear.
This patch allows embedders of PDF.js to provide custom match
logic for searching in PDFs. This is done by subclassing the
PDFFindController class and overriding the `match` method.
`match` is called once per PDF page, receives as parameters the
search query, the page contents, and the page index, and returns
an array of { index, length } objects representing the search
results.
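A minimal sketch of what an embedder could do with this (assuming the `match` contract described above):
```js
// A find controller whose matching is regex-based instead of literal.
class RegExpFindController extends PDFFindController {
  match(query, pageContent, pageIndex) {
    const results = [];
    for (const m of pageContent.matchAll(new RegExp(query, "g"))) {
      results.push({ index: m.index, length: m[0].length });
    }
    return results;
  }
}
```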
This commit fixes a regression from #18596 where the "Add image" button
styles broke because it's a labeled toolbar button but the labeled
toolbar button CSS rules were only scoped to the secondary toolbar.
However, nowadays labeled toolbar buttons are also used in e.g. the
editor parameters toolbar, and this highlighted that we didn't clearly
encode the concept of labeled toolbar buttons in the CSS so far.
This patch extracts the CSS rules for labeled toolbar buttons that were
previously limited to the secondary toolbar into a dedicated generic
class that can be applied on top of any unlabeled toolbar button to
convert it to a labeled toolbar button, regardless of its position in
the DOM. This also makes the distinction clearer in the HTML, and the
new CSS scope for the toolbar buttons lays the foundation for combining
the other toolbar buttons rules in there later on.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Using the `:is(a)` selector we can move the previously separate rules
into the main `.toolbarButton` override rules.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Now that we have dedicated CSS scopes for the findbar and the secondary
toolbar we have ended up with two CSS blocks with identical selectors,
so now we can combine them for simplicity and to remove some rules in
the first block that were always overridden by the second block.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
We have a number of base-classes that are only intended to be extended, but never to be used directly. To help enforce this during development these base-class constructors will check for direct usage, however that code is obviously not needed in the actual builds.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~2.7` kilo-bytes, which isn't a lot but still cannot hurt.
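The guard presumably looks something like this sketch, wrapped in a pre-processor block so that it disappears from the builds:
```js
class BaseFactory {
  constructor() {
    if (typeof PDFJSDev === "undefined" || PDFJSDev.test("TESTING")) {
      // Development-only check, stripped from the actual builds.
      if (this.constructor === BaseFactory) {
        throw new Error("Cannot initialize BaseFactory directly.");
      }
    }
  }
}
```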
The secondary toolbar CSS rules predate the general availability of CSS
nesting, which makes them more difficult to understand and change
safely. The primary issues are that the rules are spread over the
`viewer.css` file, they share blocks with other elements and the scope
of the rules is sometimes bigger than necessary.
This refactoring groups all remaining secondary toolbar rules together,
scoped to the top-level `#secondaryToolbar` element, for improved
overview and isolation. Note that this patch only intends to move the
existing rules around and not change any behavior.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Secondary toolbar buttons are toolbar buttons with some extra rules,
mainly to make them wider and have visible labels. However, this
similarity is currently not clearly reflected in the implementation
because the secondary toolbar buttons use a different CSS class,
`secondaryToolbarButton`, compared to the other toolbar buttons that
use the `toolbarButton` CSS class. This also causes some duplication
in the rules and requires extra care to keep the common bits for the
`secondaryToolbarButton` class in sync with the `toolbarButton` class.
Fortunately, now that we have a dedicated CSS scope for the secondary
toolbar, we can simplify this by giving all secondary toolbar buttons
the `toolbarButton` class and explicitly listing the required overrides
in the `#secondaryToolbar` scope instead. Doing so removes most of the
special-casing for secondary toolbar buttons while explicitly listing
the differences in a single place for a better overview. It also lays
the foundation for making all toolbar buttons respect the
`browser.uidensity` Firefox preference later by reducing differences.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The secondary toolbar CSS rules predate the general availability of CSS
nesting, which makes them more difficult to understand and change
safely. The primary issues are that the rules are spread over the
`viewer.css` file, they share blocks with other elements and the scope
of the rules is sometimes bigger than necessary.
This refactoring groups all CSS rules for the secondary toolbar button
container/icons together, scoped to the top-level `#secondaryToolbar`
element, for improved overview and isolation. Note that this patch only
intends to move the existing rules around and not change any behavior.
Moreover, this patch does not move the rules for the secondary toolbar
itself and the secondary toolbar buttons; those will be part of a
follow-up patch and will be easier once this is in place first.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Because of an old oversight (by me) we don't stop sidebar resizing when the browser window loses focus, which seems generally wrong and can also lead to duplicate mouse-related event listeners being registered.
There's a fair number of event listeners in the editor-code that we're currently removing "manually", by keeping references to their event handler functions.
This was necessary since we have a "global" `AbortController` that applies to all event listeners used in the editor-code, however it's now possible to combine multiple `AbortSignal`s; please see https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static
Since this functionality is [fairly new](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static#browser_compatibility) the viewer will check that `AbortSignal.any()` is available before enabling the editing-functionality.
(It should hopefully be fairly straightforward, famous last words, for users to implement a polyfill to allow editing in older browsers.)
Finally, this patch also adds checks and test-only asserts to ensure that we don't add duplicate event listeners in various editor-code.
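Conceptually the change amounts to the following sketch (the element and handler are illustrative):
```js
const globalController = new AbortController(); // applies to all editor listeners
const localController = new AbortController(); // applies to a single editor

// Aborting either controller removes the listener automatically, replacing
// the previous "manual" removeEventListener bookkeeping.
const signal = AbortSignal.any([globalController.signal, localController.signal]);
element.addEventListener("pointerdown", onPointerDown, { signal });
```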
The findbar CSS rules predate the general availability of CSS nesting,
which makes them more difficult to understand and change safely. The
primary issues are that the findbar rules are spread all over the
`viewer.css` file, they share blocks with non-findbar elements and the
scope of the rules is sometimes bigger than necessary.
This refactoring groups all findbar-related CSS rules together, scoped
to the top-level `#findbar` element, for improved overview and
isolation. Note that this patch only intends to move the existing rules
around and not change any behavior yet, but it does lay the foundation
for e.g. making the findbar respect the `browser.uidensity` preference
in Firefox in follow-up work.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
It looks like this has accidentally been copy/pasted from the
`Textfields and focus` block, which is the only one in which a test
actually uses it. We can therefore safely remove it from all other
blocks where no test uses it.
This commit updates the Babel plugin to:
- apply the same flattening logic that we already
have for blocks, to flatten blocks nested inside
class static blocks
- remove class static blocks when, after flattening
all the blocks they contain, they are empty.
Before this commit, the transform output was the
same as the input.
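As an illustration, consider the following input (a sketch; the plugin transforms it as described above):
```js
// A redundant block nested inside a class static block; after the plugin
// runs, the inner block is flattened into the static block, and a static
// block left empty after flattening is removed entirely.
class Example {
  static {
    {
      globalThis.initialized = true;
    }
  }
}
```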
Given that we're removing event listeners with `AbortSignal` it's no longer necessary to keep a reference to a few of the event handler functions in order to remove them.
Hence we can simply inline the relevant `bind`-calls instead, which reduces the code-size a tiny bit.
The `waitForClick` helper function is functionality-wise mostly a
reduced copy of the more generic `waitForEvent` helper function that
we use for other integration tests, so we can safely replace it to
reduce the amount of code.
Moreover, the `waitForClick` code is prone to intermittent failures
given recent assertion failures we have seen on the bots (one of them
is linked in #18396) while `waitForEvent` has recently been fixed to
avoid intermittent failures, so using it should also get rid of the
flakiness for these integration tests.
The `options` handling, for the download-methods, was originally added in PR 16391 and became obsolete in PR 17771.
This fixes failures, in mozilla-central tests, which appeared after landing PR 18527.
Given that these event handlers are virtually identical, obviously with the exception of the name-parameter, let's reduce a little bit of code duplication.
- Shorten the names of the event listeners.
- Use `bind` to pass in the `PDFViewerApplication`-scope to the event handlers.
This makes the event handler code (a lot) less verbose, and this change is possible now that we're removing event listeners with `AbortSignal`.
- Move the GENERIC-only event listeners into the same pre-processor block.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~4.3` kilo-bytes, which isn't a lot but still cannot hurt.
Given that users fairly often report issues with unsupported browsers/environments it cannot hurt to provide a link to the relevant section in the FAQ.
We have a fair number of (effectively) single-line event handlers in the `web/app.js` file, which leads to unnecessarily verbose code. These can, without affecting readability too much, be replaced either by:
- Using `bind` for the simplest cases.
- Using arrow-functions for the remaining ones.
Note that this is possible since we started removing event listeners with `AbortSignal`, which means that we no longer need to keep a reference to the event handler functions to be able to remove them.
Given that the old event handler functions use fairly long function names, and the way that they access `PDFViewerApplication` (given their scope), they impact the overall code-size unnecessarily.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~3.7` kilo-bytes, which isn't a lot but still cannot hurt.
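A sketch of the two patterns (the event names and handlers are illustrative):
```js
const { eventBus } = PDFViewerApplication;

// Using `bind` for the simplest cases:
eventBus._on("download", PDFViewerApplication.download.bind(PDFViewerApplication));

// Using an arrow-function for the remaining ones:
eventBus._on("firstpage", () => {
  PDFViewerApplication.page = 1;
});
```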
This patch adds a new entry in the secondary menu in order to open a dialog to let the user:
- disables the alt-text generation thanks to an ML model;
- deletes the alt-text model downloaded in Firefox;
- disables the new alt-text flow.
Unfortunately it turns out (perhaps unsurprisingly) that even the new bug report template isn't stopping users from leaving out the single most important part, i.e. `Attach (recommended) or Link to PDF file`, despite it now being marked as a required field.
Added annotations could have some quadpoints (highlight, ink).
The isNumberArray check was returning false and consequently the annotation wasn't
printable.
The tests didn't catch this issue because the quadpoints were passed as Array.
So driver.js has been updated to pass them as Float32Array, in order
to be in a situation similar to the real-life one.
Over time a couple of event listeners have been placed in the constructor, despite there being an existing helper method for that purpose. To improve the code organization, let's move these to the intended method instead.
This refactoring lays the foundation for making the toolbar height
configurable in Firefox via the `browser.uidensity` preference. For
this to work correctly the toolbar height must be defined in a single
place that can easily be updated dynamically, hence this patch which
moves it to a CSS variable in such a way that the rest of the UI adapts
correctly if the value is changed.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The HTML button elements we use are all regular buttons that don't
submit form data to a server. According to
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/button#notes
those buttons should all have their type set to `button` explicitly, but
we only do that for a handful of them. This commit fixes the issue by
consistently giving all our buttons the `button` type.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The sidebar and secondary toolbar both have a reference to their toggle
buttons in their own sections in `getViewerConfiguration`, so it makes
sense for the findbar to do the same.
While we actually have a findbar-specific reference to the toggle
button, I noticed that we don't use it consistently because the toolbar
also has a reference to the exact same toggle button and we use both in
the code. This is probably for historical reasons: the docstring in the
toolbar file indicates that the `viewFind` element is an input to the
component, but that option is never actually used in the code itself.
This commit fixes the issue by removing the toolbar-specific reference,
since it's not actually used (anymore) in the toolbar code, so that we
consistently use the findbar-specific reference everywhere.
This dependency got introduced in PR #10293, almost six years ago now,
because `eslint-plugin-mozilla` didn't work without it but also didn't
require it as a dependency itself.
However, nowadays `eslint-plugin-mozilla` works just fine without it,
and other dependencies that need it correctly require it themselves.
This can be seen using `npm ls globals`:
```
$ npm ls globals
pdf.js
├─┬ @babel/core@7.24.9
│ └─┬ @babel/traverse@7.25.0
│ └── globals@11.12.0
├─┬ @babel/preset-env@7.25.0
│ └─┬ @babel/plugin-transform-classes@7.25.0
│ └── globals@11.12.0
├─┬ eslint-plugin-unicorn@55.0.0
│ └── globals@15.8.0 deduped
├─┬ eslint@8.57.0
│ ├─┬ @eslint/eslintrc@2.1.4
│ │ └── globals@13.24.0
│ └── globals@13.24.0
└── globals@15.8.0
```
Further proof that `eslint-plugin-mozilla` (no longer) uses `globals` is
from a source code search in
https://searchfox.org/mozilla-central/search?q=globals&path=&case=false&regexp=false.
The only results for `eslint-plugin-mozilla` refer to a file named
`globals.js`, but the `globals` NPM package is not actually imported
anywhere.
Given this we should be able to safely get rid of this explicit
dependency on our end now.
For the Firefox pdf viewer, we want to use AI to guess an alt-text when adding an image to a pdf.
For now the telemetry stuff is not implemented and will come soon.
In order to test it locally:
- set enableAltText, enableFakeMLManager and enableUpdatedAddImage to true.
or in Firefox:
- set browser.ml.enable, pdfjs.enableAltText and pdfjs.enableUpdatedAddImage to true.
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
The current error-messages also mention internal parameters, which an end-user obviously doesn't have to care about. So, let's try to avoid confusion here by only including the API parameters.
- Initialize all user-options upfront, to allow getting the options without having to check if they exist first. (This is easier to implement after PR 18475 landed.)
- Move the user-options into a private field in `AppOptions`. (This code is old enough to pre-date general availability of that class feature.)
- Move code/methods limited to GENERIC-builds into `AppOptions`.
Currently we'll only dispatch events, for the options that support it, when updating preferences. Since this could potentially lead to inconsistent behaviour, let's avoid any future surprises by *always* dispatching events regardless of how the relevant option is being changed.
Obviously we should then also dispatch events in `AppOptions.set`, and to avoid adding even more duplicated code this method is changed into a wrapper for `AppOptions.setAll`.
While this is technically a tiny bit less efficient I don't think it matters in practice, since outside of development- and debug-mode `AppOptions.set` is barely used.
When selecting text, hovering over an element
causes all the text between (according to the DOM
order) the current selection and that element to
be selected.
This means that when, while selecting, the cursor
moves over a link, all the text in the page gets
selected. Setting `user-select: none` on the link
annotations would improve the situation, but it
still makes it impossible to extend the selection
within a link without using Shift+arrows keys on
the keyboard.
This commit fixes the problem by setting
`pointer-events: none` on the `<section>`s in the
annotation layer while selecting some text. This
way, they are ignored for hit-testing and do not
affect selection.
It is still impossible to _start_ a selection
inside a link, as the link text is covered by the
link annotation.
Fixes #18266
Since "localeProperties" is needed in Firefox, let's remove a tiny bit of option duplication by using it in the GENERIC builds as well.
For convenience, the old debug-only "locale" hash-parameter is kept intact.
The `streamqueue` dependency is only used for the test targets in the
Gulpfile to make sure that the test types are run in series. This is
done by modelling the test processes as readable streams and then having
`streamqueue` combine them into a single readable stream for Gulp that
processes the inner readable streams in series (in contrast to the
`ordered-read-streams` dependency which is very similar but processes
the inner streams in parallel).
However, modelling the test processes as readable streams is a bit odd
because we're not actually streaming any data as one might expect.
Instead, we only use them to signal test process completion/abortion.
Fortunately nowadays, with modern Gulp versions, we don't need readable
streams and `streamqueue` anymore because we can achieve the same result
with simple asynchronous functions that can be passed to e.g.
`gulp.series()` calls. Note that we already do this in various places,
and overall it should be a better fit for test process invocations.
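A rough sketch of that pattern, with hypothetical task names and test-runner arguments:
```js
import { spawn } from "child_process";
import gulp from "gulp";

// Wrap a test process in a promise that settles on completion/abortion,
// replacing the readable-stream signalling that `streamqueue` relied on.
function runTests(args) {
  return new Promise((resolve, reject) => {
    const testProcess = spawn("node", args, { stdio: "inherit" });
    testProcess.on("close", code =>
      code === 0 ? resolve() : reject(new Error(`Exit code ${code}`))
    );
  });
}

// `gulp.series()` runs the async functions one after the other, which is
// exactly the in-series behavior that `streamqueue` provided.
gulp.task(
  "test",
  gulp.series(
    async function unitTests() {
      await runTests(["./test/test.mjs", "--unitTest"]);
    },
    async function integrationTests() {
      await runTests(["./test/test.mjs", "--integration"]);
    }
  )
);
```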
For options with varying types, see `useSystemFonts`, we're not sufficiently validating the type when setting a new value. This means that for an option that uses `OptionKind.UNDEF_ALLOWED` we'd allow *any* value, which is obviously not the intention.
Hence we instead introduce a new and *optional* `type`-field that allows specifying exactly which types are valid when multiple ones are supported.
*Note:* This obviously didn't occur to me until after PR 18465 landed, sorry about that!
This option is old enough that it predates e.g. the introduction of AppOptions, so it probably cannot hurt to re-factor this a little bit now.
- In development-mode we can just set this directly in AppOptions.
- In the extension-builds we still need to set it dynamically, however by moving this code we get the benefit of being able to avoid storing a data-URL in that case; note how [the API ignores those anyway](98e772727e/src/display/api.js (L256-L262)).
To avoid introducing any inline "hacks" in the viewer-code this meant adding `useSystemFonts` to the AppOptions, which thus required some new functionality since the default value should be `undefined` given how the option is handled in the API; note [this code](ed83d7c5e1/src/display/api.js (L298-L301)).
Finally, also moves the definition of the development-mode `window.isGECKOVIEW` property to the HTML file such that it's guaranteed to be set regardless of how and when it's accessed.
The old implementation in `PDFViewerApplication.download` means that if the `getDocument`-call hasn't yet downloaded the *entire* PDF document it will be re-downloaded. This seems generally undesirable since:
- In some (probably rare) cases a URL may not be valid an arbitrary number of times, which means that the download may fail.
- It will lead to wasted resources, since we'll end up fetching the same PDF document *twice* in that case (once via the `getDocument`-call and once to allow the user to save it).
Hence this patch suggests that we change this very old code to instead always call the `PDFDocumentProxy.getData` method, since that'll trigger immediate downloading of the remaining document via the existing `getDocument`-call.
Finally, the patch removes the `PDFViewerApplication.downloadComplete` property since it's now unused.
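In outline, assuming a `pdfDocument` instance from `getDocument` and a generic download helper (the actual viewer code goes through its `DownloadManager`):
```js
// `getData` triggers immediate downloading of the remaining document via
// the existing getDocument-call and resolves with the complete bytes.
const data = await pdfDocument.getData();
const blob = new Blob([data], { type: "application/pdf" });
// Hypothetical helper standing in for the viewer's DownloadManager.
saveBlob(blob, filename);
```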
This was only necessary to prevent (unlikely) visual glitches when `disableAutoFetch = true` is being used.
However, it turns out that we can move this functionality into the `ProgressBar` class instead by checking if the entire PDF document has loaded. This works since the API is always reporting 100% loading progress regardless of how the document was loaded; see [this code](ed83d7c5e1/src/display/api.js (L2735-L2740)).
After the changes in PR 18413 we're now relying even more on `AppOptions` and it thus seems like a good idea to ensure that no invalid values can be added.
Hence the `AppOptions.{set, setAll}` methods will now *unconditionally* validate that the type of the values agree with the default-options.
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
To avoid having to request and await various browser data early during the PDF Viewer initialization, we can include that data in the existing preference fetching instead. With other planned changes to the PDF Viewer, the current situation would only become worse over time.
*Note:* Technically this data doesn't consist of preference-values, however we're already including other non-prefs in this list (e.g. `isInAutomation`) and doing it this way simplifies the overall implementation.
As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used.
Hence this seems like something that should be kept *internal* and thus only exposed in TESTING-builds.
Switching to an editing mode can be asynchronous (e.g. if an editable annotation exists on a
visible page), so we must add a new editor only when the page rendering is done.
Somehow I managed to mess up the URL creation relevant to e.g. MOZCENTRAL builds, which is breaking the pending PDF.js update in mozilla-central; sorry about that!
To avoid future issues, we'll now always check if absolute filter-URLs are necessary regardless of the build-target.
The Git logic for pushing to the `pdfjs-dist` repository was already
removed in PR #18350, but we forgot to also remove the logic that
creates a Git repository in the first place. This commit addresses the
open action item for that from
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661367441.
Moreover, we implement the suggestion from
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661364026
about moving the Git URL to a separate variable for consistency and we
update the homepage URL to the HTTPS scheme to avoid an HTTP -> HTTPS
redirect and enforce TLS-encrypted connections.
In PR #18356 the new tab page logic was disabled to prevent Firefox
from logging failed network connections to Contile, the Mozilla Tiles
service that is used for the new tab page [1]. However, recently this
log reappeared locally and on the bots:
```
console.warn: TopSitesFeed: Failed to fetch data from Contile server:
NetworkError when attempting to fetch resource.
```
It looks like Contile communication is also triggered from other places
in Firefox such as the URL bar [2], so this commit fixes the issue by
disabling network connections to Contile [3] altogether regardless of
their origin within Firefox. Note that we don't revert the change from
PR #18356 because as noted in [4] it can't hurt to keep that disabled
too to avoid overhead for a feature we don't use in the tests.
[1] https://github.com/mozilla-services/contile
[2] 196ef8360e/browser/components/urlbar/UrlbarProviderTopSites.sys.mjs (L38)
[3] 196ef8360e/browser/components/newtab/lib/TopSitesFeed.sys.mjs (L111)
[4] https://github.com/mozilla/pdf.js/pull/18356#issuecomment-2200354730
When the mouse was hovering over an existing highlight, all the text in the page
was selected.
So when the user is selecting some text or drawing a free highlight, mouse
events are now disabled for the existing editors.
Because of the editor z-index, the toolbar belonging to a highlight created
before a second adjacent one can be overlapped by this new one.
So when the user selects an editor we now show it in front of all the other
ones to make sure that it can be used normally.
This functionality is purposely limited to development mode and GENERIC builds, since it's unnecessary in e.g. the *built-in* Firefox PDF Viewer, and will only be used when a `<base>`-element is actually present.
*Please note:* We also have tests in mozilla-central that will *indirectly* ensure that relative filter-URLs work as intended in the Firefox PDF Viewer, see https://searchfox.org/mozilla-central/source/toolkit/components/pdfjs/test/browser_pdfjs_filters.js
---
To test that the issue is fixed, the following code can be used:
```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <base href=".">
  <title>base href (issue 18406)</title>
</head>
<body>
  <ul>
    <li>Place this code in a file, named `base_href.html`, in the root of the PDF.js repository</li>
    <li>Run <pre>npx gulp dist-install</pre></li>
    <li>Run <pre>npx gulp server</pre></li>
    <li>Open <a href="http://localhost:8888/base_href.html">http://localhost:8888/base_href.html</a> in a browser</li>
    <li>Compare rendering with <a href="http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf">http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf</a></li>
  </ul>

  <canvas id="the-canvas" style="border: 1px solid black; direction: ltr;"></canvas>

  <script src="/node_modules/pdfjs-dist/build/pdf.mjs" type="module"></script>

  <script id="script" type="module">
    //
    // If absolute URL from the remote server is provided, configure the CORS
    // header on that server.
    //
    const url = '/test/pdfs/issue16287.pdf';

    //
    // The workerSrc property shall be specified.
    //
    pdfjsLib.GlobalWorkerOptions.workerSrc =
      '/node_modules/pdfjs-dist/build/pdf.worker.mjs';

    //
    // Asynchronous download of PDF
    //
    const loadingTask = pdfjsLib.getDocument(url);
    const pdf = await loadingTask.promise;

    //
    // Fetch the first page
    //
    const page = await pdf.getPage(1);
    const scale = 1.5;
    const viewport = page.getViewport({ scale });
    // Support HiDPI-screens.
    const outputScale = window.devicePixelRatio || 1;

    //
    // Prepare canvas using PDF page dimensions
    //
    const canvas = document.getElementById("the-canvas");
    const context = canvas.getContext("2d");

    canvas.width = Math.floor(viewport.width * outputScale);
    canvas.height = Math.floor(viewport.height * outputScale);
    canvas.style.width = Math.floor(viewport.width) + "px";
    canvas.style.height = Math.floor(viewport.height) + "px";

    const transform = outputScale !== 1
      ? [outputScale, 0, 0, outputScale, 0, 0]
      : null;

    //
    // Render PDF page into canvas context
    //
    const renderContext = {
      canvasContext: context,
      transform,
      viewport,
    };
    page.render(renderContext);
  </script>
</body>
</html>
```
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
Given:
```css
html,
body {
height: 100%;
}
body {
display: flex;
}
```
The `<div>` appended to the `<body>` will take up the full height of the
viewport due to the implicit `align-items: stretch` of flex containers.
This results in an incorrect computed `minFontSize` value.
In the MOZCENTRAL build the `BasePreferences` class is almost unused, since it's only being used to initialize and update the `AppOptions` which means that theoretically we could move the relevant code there.
However for the other build targets (e.g. GENERIC and CHROME) we still need to keep the `BasePreferences` class, although we can re-factor things to move the necessary validation inside of `AppOptions` and thus simplify the code and reduce duplication.
The patch also moves the event dispatching, for changed preference values, into `AppOptions` instead.
Code inspection uncovered that quite a few integration tests don't wait
for scripting to be ready before proceeding, which is a source of
intermittent failures, especially on slower machines where e.g. creating
the sandbox takes longer.
This commit fixes the issues by introducing a `waitForScripting` helper
function and calling it consistently in all scripting integration tests.
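A plausible shape for that helper, assuming the viewer exposes a `scriptingReady` flag (the real implementation in the test utilities may differ):
```js
// Resolves once the viewer reports that the scripting sandbox is ready,
// so tests don't race against sandbox creation on slow machines.
function waitForScripting(page) {
  return page.waitForFunction(
    "window.PDFViewerApplication.scriptingReady === true"
  );
}
```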
Add unit test to check compatibility with such cmaps.
In the PDF in issue 18099, the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
`<start_char_code> <end_char_code> <start_unicode_codepoint>`
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD probably have read
`<00> <7F> <0000>`
Instead it omitted two leading zeros from the UTF-16 like this:
`<00> <7F> <00>`
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this.
This unit test fails occasionally (albeit much less than before thanks
to PR #17663), so we change the parsing time check's divisor to prevent
it from happening again. If the last page's rendering time is less than
or equal to 50% of the first page's rendering time that should be enough
proof that no worker thread re-parsing occurred while also providing a
wide enough range to avoid intermittents.
Note that the assertion is now equal to the one we already have in the
"caches image resources at the document/page level, with main-thread
copying of complex images (issue 11518)" unit test which seems to work
reliably so far.
This patch fixes a situation that'll probably never happen, but nonetheless seems like something that we should address.
Currently the "updatedPreference" listener isn't registered *until after* the preferences have been read and initialized, which leaves a very short window of time where a preference change could theoretically be skipped.
If uncaught exceptions occur in the tests (which happened in #17962 and
can be triggered manually by throwing an error in `integration-boot.js`)
the teardown logic of the tests doesn't get to run and thus spawned
browser processes are not closed properly. Given that `test.mjs` is the
only process that has a reference to them they will become orphaned and
keep running if `test.mjs` exits without explicitly closing them.
This commit fixes the issue by always closing the browsers if uncaught
exceptions occur, and we make sure to log them for debugging purposes.
This integration test fails intermittently because we're not (correctly)
awaiting the character limit increase sandbox action.
For the first increase an attempt was made to handle this, but it doesn't
work correctly because the text in the field is `abcdefghijklmnopq` and
it's not truncated until the sandbox action is completed, so because
we didn't await that we could pass the `value !== "abcdefgh"` check
because `abcdefghijklmnopq !== abcdefgh` is true. For the second increase
we didn't have a check in place.
This commit fixes the issues by using the `waitForSandboxTrip` and
`waitForSelector` helper functions to make sure that we can only proceed
if the sandbox action is completed and the DOM element is updated.
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
Similar to the `mustBeViewed` method, we can check the relevant parameters within the `mustBeViewedWhenEditing` method itself since that (in my opinion) slightly helps readability of the code in the `src/core/document.js` file.
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
- if the annotation has to be composed with the page then the canvas must be correctly
composed with its parent. That means we should move the canvas under canvasWrapper
and we should extract composing info from the drawing instructions...
Currently it's the case with highlight annotations.
- we use some extra memory for those canvases even if the user will never edit them, which
is the case for example when opening a pdf in Fenix.
So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, the pages with some editable annotations are redrawn, but
without them: they'll be replaced by their counterparts in the annotation editor layer.
For provenance, enabled in PR #18352, to work the repository URL in
`package.json` is required to match the repository URL of the GitHub
Actions invocation. This should fix the following error we encountered
publishing a new release today:
```
npm error 422 Unprocessable Entity - PUT https://registry.npmjs.org/pdfjs-dist - Error verifying sigstore provenance bundle: Failed to validate repository information: package.json: "repository.url" is "git+https://github.com/mozilla/pdfjs-dist.git", expected to match "https://github.com/mozilla/pdf.js" from provenance
```
It should help to avoid having such garbage in the logs:
```
console.warn: TopSitesFeed: Failed to fetch data from Contile server: NetworkError when attempting to fetch resource.
JavaScript error: , line 0: TypeError: NetworkError when attempting to fetch resource.
```
This PR switches from `npm install` to `npm ci` on CI. This enables some additional checks to ensure repo integrity when using CI/CD.
Read more: https://docs.npmjs.com/cli/v10/commands/npm-ci
This commit removes the following warnings from the `npm publish` output:
```
npm warn publish npm auto-corrected some errors in your package.json when publishing. Please run "npm pkg fix" to address these errors.
npm warn publish errors corrected:
npm warn publish Removed invalid "scripts"
npm warn publish "repository.url" was normalized to "git+https://github.com/mozilla/pdfjs-dist.git"
```
For the "scripts" section it turns out that if the package doesn't have
any scripts it's expected to explicitly set it to an empty object; refer
to https://github.com/npm/cli/issues/6918 and
https://github.com/denoland/dnt/pull/414.
Errors related to this `requestAnimationFrame` show up intermittently when running the integration-tests on the bots, however I've been unable to reproduce it locally.
Hence I cannot guarantee that it's enough to fix the timing issues, however this should be generally safe since the `requestAnimationFrame` invokes the `_next`-method and the first thing that one does is check that rendering hasn't been cancelled.
While the event listener is removed during testing, the `requestAnimationFrame` isn't cancelled and that occasionally shows up when the integration-tests are run on the bots.
Debugging #17931 uncovered a race condition in the way we use the
`waitForEvent` function. Currently the following happens:
1. We call `waitForEvent`, which starts execution of the function body
and immediately returns a promise.
2. We do the action that triggers the event.
3. We await the promise, which resolves if the event is triggered or
the timeout is reached.
The problem is in step 1: function body execution has started, but not
necessarily completed. Given that we don't await the promise, we
immediately trigger step 2 and it's not unlikely that the event we
trigger arrives before the event listener is actually registered in the
function body of `waitForEvent` (which is slower because it needs to be
evaluated in the page context and there is some other logic before the
actual `addEventListener` call).
This commit fixes the issue by passing the action to `waitForEvent` as
a callback so `waitForEvent` itself can call it once it's safe to do so.
This should make sure that we always register the event listener before
triggering the event, and because we shouldn't miss events anymore we
can also remove the retry logic for pasting.
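A simplified sketch of the fixed pattern (not the exact helper code):
```js
// Register the in-page listener first, then run the triggering action, so
// the event can no longer fire before the listener exists.
async function waitForEvent(page, eventName, action) {
  // Awaiting evaluateHandle guarantees addEventListener has already run;
  // the array wrapper stops Puppeteer from awaiting the inner promise here.
  const handle = await page.evaluateHandle(
    name => [
      new Promise(resolve =>
        document.addEventListener(name, resolve, { once: true })
      ),
    ],
    eventName
  );
  await action(); // Safe: the listener is registered at this point.
  await page.evaluate(([eventPromise]) => eventPromise, handle);
}

// Hypothetical usage for pasting:
// await waitForEvent(page, "paste", () => kbPaste(page));
```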
The integration tests are currently not consistent in how they do
copy/pasting: some tests use the `kbCopy`/`kbPaste` functions with
waiting for the event inline, some have their own helper function to
combine those actions and some even call `kbCopy`/`kbPaste` without
waiting for the event at all (which can cause intermittent failures).
This commit fixes the issues by providing a set of four helper functions
that all tests use and that abstract e.g. waiting for the event away
from the caller. This makes the individual tests simpler and consistent,
reduces code duplication and fixes possible intermittent failures
due to not waiting for events to trigger.
Browsers have an accessibility option that allows users to enforce
a minimum font size for all text rendered in the page, regardless
of what the font-size CSS property says. For example, it can be
found in Firefox under `font.minimum-size.x-western`.
When rendering the <span>s in the text layer, this causes the
text layer to not be aligned anymore with the underlying canvas.
While normally accessibility features should not be worked around,
in this case it is *not* improving accessibility:
- the text is transparent, so making it bigger doesn't make it more
readable
- the selection UX for users with that accessibility option enabled
is worse than for other users (it's basically unusable).
While there is technically no way to ignore that minimum font size,
this commit does it by multiplying all the `font-size`s in the text
layer by minFontSize, and then scaling all the `<span>`s down by
1/minFontSize.
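A minimal sketch of the idea, assuming `minFontSize` is measured once with a probe element and that `span`/`fontHeight` come from the text layer loop; the real code wires this up differently (e.g. via CSS variables):
```js
// Measure the enforced minimum: if a 1px font computes to N px, the
// browser multiplies small font sizes by up to N.
const probe = document.createElement("div");
probe.style.fontSize = "1px";
document.body.append(probe);
const minFontSize = parseFloat(getComputedStyle(probe).fontSize) || 1;
probe.remove();

// Render each <span> at minFontSize times its real size (so the enforced
// minimum never changes it), then scale it back down to compensate.
span.style.fontSize = `${fontHeight * minFontSize}px`;
span.style.transform = `scale(${1 / minFontSize})`;
span.style.transformOrigin = "0 0";
```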
This code contains the same bug that the previous commit fixed in
`waitForEvent`, namely that we don't clear the timeout if the event
is triggered. By using the now fixed `waitForEvent` function we not
only deduplicate this code but we also fix this issue so that no
incorrect timeout logs show up anymore.
Debugging #17931, by printing all parts of the event lifecycle including
timestamps, uncovered that some events for which a timeout was logged
actually did get triggered correctly in the browser. Going over the code
and discovering https://stackoverflow.com/questions/47107465/puppeteer-how-to-listen-to-object-events#comment117661238_65534026
showed what went wrong: if the event we wait for is triggered then
`Promise.race` resolves, but that doesn't automatically cancel the
timeout. The tests didn't fail on this because `Promise.race` resolved
correctly, but slightly later once the timeout was reached we would see
spurious log lines about timeouts for the already-triggered events.
This commit fixes the issue by canceling the timeout if the event we're
waiting for has triggered.
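The essence of the fix, in simplified form (`eventPromise`, `eventName` and the timeout value are assumed from the surrounding helper):
```js
let timeoutId = null;
const timeoutPromise = new Promise(resolve => {
  timeoutId = setTimeout(() => resolve("timeout"), 5000);
});
const result = await Promise.race([eventPromise, timeoutPromise]);
// Previously missing: without this the timeout still fired later and
// logged a spurious line for an already-triggered event.
clearTimeout(timeoutId);
if (result === "timeout") {
  console.log(`Timed out waiting for event "${eventName}"`);
}
```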
Currently errors in `afterAll` are logged, but don't fail the tests.
This could cause new errors during test teardown to go by unnoticed.
Moreover, the integration tests use a different reporting mechanism which
also handles errors differently (this is an extra reason to do #12730).
This patch fixes the issues by consistently handling errors in
`suiteStarted` and `suiteDone` in both reporting mechanisms.
Fixes #18319.
This integration test contains three issues:
- The `page.bringToFront()` call is not awaited, even though it returns
a promise (see https://pptr.dev/api/puppeteer.page.bringtofront). Note
that in other tests we do this correctly already.
- The `page.waitForSelector()` call at the end is unnecessary because
that exact condition is already checked at the end of the
`waitForImage` function we call just before this line; see
https://github.com/mozilla/pdf.js/blob/master/test/integration/stamp_editor_spec.mjs#L74.
- The pages should be closed in reversed order; please refer to the
description in #18318 for more details.
Fixes #18318.
This integration test is currently the only one that spawns a separate
browser instance. However, while it closes the browser once it's done,
it doesn't close the page (and therefore doesn't call the `testingClose`
method) like the other integration tests do.
This commit fixes this difference by closing the page before closing the
browser, thereby ensuring all regular cleanup logic gets called and we
avoid (intermittent) shutdown tracebacks in the logs. This allows
upcoming integration tests that spawn a separate browser instance to
reuse this pattern to cleanly end the test.
Given that we integrate the `closeSinglePage` code from #17962 for this
patch, @calixteman is credited as the co-author.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
This commit fixes the following log line that currently shows up for
every type of tests that involve a browser:
`System addon update list error SyntaxError: XMLHttpRequest.open:
'http://%(server)s/dummy-system-addons.xml' is not a valid URL.`
If Firefox is in testing mode, the system addons update URL is
configured to a dummy URL so that it can't actually update (see for
example the same value in Marionette at
https://searchfox.org/mozilla-central/source/testing/marionette/client/marionette_driver/geckoinstance.py#109),
but this doesn't stop Firefox from trying to update, and when it does
it logs this line because the URL is obviously invalid.
Hence this patch which disables system addon updates altogether so
Firefox doesn't attempt to use the dummy URL anymore. The browser
updates are all managed by Puppeteer, and regular updates have already
been disabled too (see
6937a76f0a/packages/browsers/src/browser-data/firefox.ts (L302-L303)).
Allow apps with supportsIntegratedFind to better monitor the find state.
A recognized use case is the Firefox findbar: its "not found" sound must
consider `entireWord` and only make noise when it is off.
See related implementation in
https://hg.mozilla.org/mozilla-central/rev/16b902cbcf26
This change can help if we have to move the implementation from cpp to jsm.
Ensure that we never round the canvas dimensions above `maxCanvasPixels`
by rounding them to the preceding multiple of the display ratio rather
than the succeeding one.
This commit introduces a test to ensure that:
- When zooming below the maxCanvasPixels limit, the canvas is rendered
with the correct size (the two sides are multiplied by the zoom factor).
- When zooming above the maxCanvasPixels limit, the canvas size is capped.
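A hedged sketch of the rounding change described above (variable names illustrative, not the exact viewer code):
```js
// Cap the canvas area at maxCanvasPixels by rounding each dimension down
// to the preceding multiple of the display ratio; rounding up could push
// width * height back above the limit.
function capCanvasSize(width, height, ratio, maxCanvasPixels) {
  const scale = Math.sqrt(maxCanvasPixels / (width * height));
  return {
    width: Math.floor((width * scale) / ratio) * ratio,
    height: Math.floor((height * scale) / ratio) * ratio,
  };
}
```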
This has two advantages, as far as I'm concerned:
- The tests don't need to manually invoke multiple functions to properly clean-up, which reduces the risk of missing something.
- By collecting all the relevant clean-up in one method, rather than spreading it out, we get a much better overview of exactly what is being reset.
The release builds are currently not reproducible because ZIP files
record the modification date of files generated during the build
process, meaning that two builds from identical source code, made
at different times, result in different output.
This is undesirable because it makes detecting differences in the output
harder, for instance recently during the Gulp 5 efforts, because the
modification date differences are irrelevant and could obscure actually
important differences in the output during e.g. code changes. Moreover,
reproducibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring the ZIP file creation to use
the (fixed) date of the last Git commit for which the release is being
made. With this the build is fully reproducible so that identical source
code builds result in bit-by-bit identical output artifacts.
To improve readability we convert the compression method to take a
parameter object and use template strings where useful.
The JSDoc builds are currently not reproducible because a timestamp is
included in the output, meaning that two builds from identical source
code, made at different times, result in different output.
This is undesirable because it makes diffing the output difficult, for
instance recently during the Gulp 5 efforts, because the timestamp
differences are irrelevant and could obscure actually important
differences in the output during e.g. code changes. Moreover,
reproducibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring JSDoc to not include the
timestamps in the output. It's not relevant for end users and without it
the build is fully reproducible so that identical source code builds
result in bit-by-bit identical output artifacts.
Note that this option sadly can only be set via a configuration file,
and not via the command line parameters like we used to have, so for
consistency we also move the other options into the configuration file
so they are all in one place and the Gulpfile becomes a bit simpler.
After the previous commit this method has only a single call-site, hence we can inline the needed part of that check directly in `PDFViewerApplication.download` instead.
Currently saving a modified PDF document may fail *intermittently*, if it's triggered before the entire document has been downloaded.
When saving was originally added we only supported forms, and such PDF documents are usually small/simple enough for this issue to be difficult to trigger. However, with editing-support now available as well it's possible to modify much larger documents and this issue thus becomes easier to trigger.
One way to reproduce this issue *consistently* is to:
- Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableHistory=true&disableStream=true&disableAutoFetch=true
- Add an annotation on the first page, it doesn't matter what kind.
- Save the document.
- Open the resulting document, and notice that with the `master` branch the annotation is missing.
This should make the API documentation slightly quicker to access for
users by removing an extra click. Moreover, it makes the API
documentation blend in with the rest of the website/theme (one of the
points in #6526).
Fixes #18249.
This avoids having to repeat the same code multiple times, since besides resolving the promise we also need to send the "configure" message to the worker-thread.
The feature-testing on the worker-thread has been simplified in previous pull requests, which means that we can simplify this main-thread handler as well.
This helps reduce overall indentation in the method, thus leading to slightly less code.
Also, remove an old comment referring to Chrome 15 since that's no longer relevant now.
Wintersmith is no longer maintained given that the most recent version
is from six years ago, and all vulnerabilities that NPM reports
originate from Wintersmith's dependencies. Metalsmith, and its plugins,
on the other hand have recently had releases and don't have known
vulnerabilities. In fact, the number of reported vulnerabilities by NPM
even goes down to zero with this patch applied.
This commit therefore replaces Wintersmith with Metalsmith by providing
a transparent drop-in replacement, in a way that requires the least
amount of changes to the code and the generated output.
Note that this patch does update our versions of jQuery, Bootstrap and
the Highlight.js theme because the previous versions were very outdated
and didn't work correctly with Metalsmith. Moreover, those old versions
contained vulnerabilities that are hereby fixed.
Fixes #18198.
It's recommended to always install dependencies locally in the project
folder because global dependencies can easily conflict with other
projects and, because they are not managed by the project, diverge from
versions defined in e.g. `package.json`. Previously we installed
`gulp-cli` globally because at the time we lacked a convenient mechanism
to use Gulp otherwise, but nowadays NPM provides the `npx` command for
that purpose and recommends using it over global installations (see
https://docs.npmjs.com/downloading-and-installing-packages-globally
and PR #17489 that provided the ground work for using it).
This commit therefore updates our README and website to no longer recommend
installing `gulp-cli` globally but instead installing it locally from the
already existing entries in `package.json` like all other dependencies
we use. Not only does this remove the special-casing for `gulp-cli`
which simplifies the installation procedure, it also ensures that the
version ranges provided in `package.json` are respected.
This change is similar to the change in commit 92de2b7.
Fixes #18232.
Fixes 98ef8a1.
If for example dd:mm is failing we just try with d:m, which is equivalent
to the regex /d{1,2}:m{1,2}/. This way the user is allowed to omit the
leading 0 for the first days/months.
- Use a CSS rule to display the wait-cursor during copying. Since copying may take a little while in long documents, there's a theoretical risk that something else could change the cursor in the meantime and just resetting to the saved-cursor could thus be incorrect.
- Remove the `interruptCopyCondition` listener with an AbortController, since that's slightly shorter code.
This method has only a single call-site in the viewer, since it's used as a fallback, and the functionality can be moved into the `DownloadManager.download` method instead.
There's no specification for that (even if it's possible to get an idea from
the xfa specs) so we just want to hide them in order to avoid displaying something
wrong.
This helper method is simpler/shorter than it originally was[1] and with recent refactoring so is the `render`-method, hence we can just inline this code now.
---
[1] It used to e.g. dispatch the "textlayerrendered" event.
Part of this code is really old and pre-dates general availability of things such as `Blob` and `URL.createObjectURL`. To avoid having to duplicate the Blob-creation in the viewer, we can move this into `DownloadManager.download` instead.
Also, remove a couple of unnecessary `await` statements since the methods in question are synchronous.
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
This is a major version bump, and the changelog at
https://github.com/gulpjs/gulp/releases/tag/v5.0.0 indicates one
breaking change that impacts us, namely that streams are now by default
interpreted/transformed to UTF-8 encoding. This breaks `gulp.src` calls
that work on binary files such as images or CMaps, but is fortunately
easy to fix for us by disabling re-encoding for all `gulp.src` calls
(see https://github.com/gulpjs/gulp/issues/2764#issuecomment-2063415792
for more information). This restores the previous behavior of copying
the files as-is without Gulp performing any transformations to it, which
is what we want because Gulp is only used for bundling and we make sure
that the source files have the right encoding.
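For example (paths illustrative), copying binary assets now needs the new `encoding` option disabled:
```js
// Gulp 5 re-encodes stream contents to UTF-8 by default, which corrupts
// binary files; { encoding: false } restores the copy-as-is behavior.
gulp
  .src("external/bcmaps/**/*", { encoding: false })
  .pipe(gulp.dest("build/generic/web/cmaps/"));
```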
Instead of sending to the main thread an array of Objects for a list of points (or quadpoints),
we'll send just a basic float buffer.
It should slightly improve performance (especially when cloning the data) and use slightly less memory.
This parameter allows defining which point should remain
fixed while scaling the document. It can be used, for example,
to implement "zoom around the cursor" or "zoom around
pinch center".
The logic was previously implemented in `web/app.js`, but
moving it to the viewer scaling utilities themselves makes it
easier to implement similar zooming functionalities in
other embedders.
`updateScale` receives a `drawingDelay`, a `scaleFactor` and/or a number of `steps`.
If `scaleFactor` is a positive number different from `1`, the current scale is multiplied by
that number. Otherwise, if `steps` is a positive integer, the current scale is multiplied by
`DEFAULT_SCALE_DELTA` `steps` times. Finally, if `steps` is a negative integer, the
current scale is divided by `DEFAULT_SCALE_DELTA` `abs(steps)` times.
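Illustrative calls matching these semantics (the exact signature in the viewer may differ slightly):
```js
// Multiply the current scale by 1.25, with a 400 ms drawing delay so the
// intermediate zoom states stay CSS-only.
pdfViewer.updateScale({ drawingDelay: 400, scaleFactor: 1.25 });

// Zoom in two steps: scale *= DEFAULT_SCALE_DELTA ** 2.
pdfViewer.updateScale({ steps: 2 });

// Zoom out one step: scale /= DEFAULT_SCALE_DELTA.
pdfViewer.updateScale({ steps: -1 });
```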
In order to share common parts between different dialogs, this patch
aims to slightly refactor the css by making it more generic.
This way it'll be simpler to add a new dialog (we want to add a new one when
leaving an unsaved document).
After the re-factoring in PR 18104 there's now a *theoretical* risk that a pending `TextLayer` is never removed, which we can avoid by not registering it until `render` is invoked.
Note that this doesn't affect the viewer or tests, but if a third-party user calls `new TextLayer(...)` without a following call of either the `render`- or `cancel`-method we'd block global clean-up without this patch.
The `setTextContentSource` functionality is very old code, and was introduced years ago together with streaming of textContent.
By moving the `streamTextContent`-call into the `TextLayerBuilder` class we collect more functionality in one place and slightly reduce the amount of code needed.
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
This should avoid the latest JS features appearing in the TypeScript definitions, and given the currently supported browsers/environments (in PDF.js) we shouldn't need to target an even older ES-version.
Apparently we forgot to update this in the version 4 release, so better late than never I suppose.
- Use `.mjs` extensions where appropriate.
- Remove mention of the `debugger`-functionality, since that's not really relevant to users.
- Unify the whitespace handling to use spaces consistently.
- Move the definition of the `loadingParams` Object, to simplify the code.
- Add a unit-test, since none existed and the viewer depends on this functionality.
Over time the number of integration tests that get the rectangle for a
given selector has increased quite a bit, and the code to do so has
consequently become duplicated.
This commit refactors the integration tests to move the rectangle
fetching code to a single place, which reduces the code by over 400
lines and makes the individual tests simpler.
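A plausible shape for the shared helper (the real one lives in the integration test utilities):
```js
// Fetch the bounding rectangle of the element matching the selector.
// A DOMRect isn't serializable, so copy the fields the tests need.
function getRect(page, selector) {
  return page.$eval(selector, el => {
    const { x, y, width, height } = el.getBoundingClientRect();
    return { x, y, width, height };
  });
}

// Example usage in a test:
// const rect = await getRect(page, ".annotationEditorLayer");
```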
- Use `this` rather than explicitly spelling out the class-name in the static `#enableGlobalSelectionListener` method, since that leads to shorter code. Given that all the relevant static fields are *private* ESLint will catch any scope errors in the code.
- Reduce a little bit of duplication when using the `#selectionChangeAbortController` signal.
- Utilize early returns in the "selectionchange" event handler, since that reduces overall indentation which helps readability a tiny bit.
The `merge-stream` dependency is no longer maintained and doesn't work
in combination with Gulp 5 anymore (for more information refer to
https://github.com/gulpjs/gulp/issues/2802#issuecomment-2094130656).
Fortunately the Gulp team maintains a drop-in replacement dependency
called `ordered-read-streams` with the same API as `merge-stream`.
Indeed, running all affected Gulp targets and comparing build artifacts
with `diff -r <old> <new>` confirms that no unexpected changes are made.
Fixes a part of #17922.
This commit replaces most `waitForTimeout` occurrences with calls to
`waitForFunction` or `waitForSandboxTrip`. Note that the occurrences in
the "must check that focus/blur callbacks aren't called" test remain
until we find a good way to ensure that nothing happened after the tab
switches (because currently we can't be sure that nothing happens since
there is nothing to await).
This commit adds a test for 0603d1ac1843bc4098d74382beda6cc511350ccd.
Before the fix the `pagerendered` events would be fired just 2-3
milliseconds after the call to `increaseScale`/`decreaseScale`.
Fixes issue #16843.
In certain cases, the text layer was misaligned
due to a difference between the `lang` attribute
of the viewer and the canvas. This commit addresses
the problem by adding the `lang` attribute to the canvas.
The issue was caused because PDF.js uses serif/sans-serif
fonts to generate the text layer and relies on system fonts.
The difference in the `lang` attribute led to different fonts
being picked, causing the misalignment.
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src/core/{cff_font, type1_font}.js` files.
The `through2` dependency got introduced over four years ago in #11325 to
replace the unmaintained `gulp-transform` dependency. However, sadly the
same holds for `through2` since the last release was also four years ago.
Fortunately the `through2` dependency can trivially be replaced with the
built-in Node.js `stream.Transform` API nowadays. In fact, the `through2`
dependency mentions themselves in their README already that they are "a
tiny wrapper around Node.js streams.Transform". The `stream.Transform`
API is available in all Node.js versions we support, and in Node.js 6
already the simplified constructor approach for `stream.Transform` got
introduced to simplify creating custom stream transformers; see
https://nodejs.org/docs/latest-v6.x/api/stream.html#stream_new_stream_transform_options.
This commit therefore replaces `through2` by switching to the
`stream.Transform` API directly so we don't need any wrappers anymore.
Note that for our case the only change we have to make is to enable
object mode, see https://nodejs.org/api/stream.html#object-mode, because
we pass in `VinylFile` objects instead of e.g. regular `Buffer` objects.
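A sketch of what the Gulpfile's transform helper looks like with this change (details may differ):
```js
import { Transform } from "stream";

// Drop-in replacement for through2: the simplified stream.Transform
// constructor in object mode, since VinylFile objects are passed through.
function transform(charEncoding, transformFunction) {
  return new Transform({
    objectMode: true,
    transform(vinylFile, encoding, done) {
      const transformedFile = vinylFile.clone();
      transformedFile.contents = Buffer.from(
        transformFunction(transformedFile.contents),
        charEncoding
      );
      done(null, transformedFile);
    },
  });
}
```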
I have confirmed in two ways that this is indeed a drop-in replacement:
- Running the Gulp targets that call the `transform` function and
diffing the resulting `build` folder before/after this patch, with
`diff -r build-old/ build-new/`, to ensure that there are no
unexpected changes in the output.
- Changing the Gulpfile to, instead of UTF-8, transform the files to
ASCII, and diffing the resulting `build` folder to confirm that the
transformation logic works and produces different results, such as:
```
diff build/lib/core/standard_fonts.js build-ascii/lib/core/standard_fonts.js
284c284
< t["Trinité"] = true;
---
> t["Trinit�"] = true;
```
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.
This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.
Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
*Please note:* This doesn't really affect the viewer, but may affect the library API if multiple PDF documents are opened in parallel.
Since we clean-up "global" textLayer-data when destroying a PDF document, this means that other active PDFs could potentially break by invoking `cleanupTextLayer` unconditionally. Note that textLayer rendering is an asynchronous task, and we thus need to ensure those are all finished before running clean-up.
The `needle` dependency originally got introduced in #12024, almost four
years ago, to be able to use pre-built binaries for the `canvas`
dependency on macOS. However, nowadays the `needle` dependency isn't
used by `canvas` anymore, or any other package we use for that matter,
as shown by the empty NPM dependency tree:
```
$ npm ls needle
pdf.js
└── needle@3.3.1
```
Investigation showed that the `canvas` package depends on the
`node-pre-gyp` package which in turn depended on `needle` (see
https://github.com/Automattic/node-canvas/issues/1110#issuecomment-411232630),
but in version 1.0.0 of `node-pre-gyp` from three years ago the `needle`
dependency got dropped in favor of `node-fetch` (see
a74f5e367c/CHANGELOG.md (L52)).
This explains why the NPM dependency tree is empty now and proves that
we can safely get rid of this dependency now.
In Node.js 14.14.0 the `fs.rmSync` function was added that removes files
and directories. The `recursive` option is used to remove directories
and their contents, making it a drop-in replacement for the `rimraf`
dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `rimraf` and reduce the
number of project dependencies.
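The replacement is essentially a one-liner (directory name illustrative):
```js
import fs from "fs";

// Before: rimraf.sync("build/");
// After: the built-in equivalent, available since Node.js 14.14.
fs.rmSync("build/", { recursive: true, force: true });
```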
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
I broke this accidentally in PR 18089, sorry about that!
Note that since `#processItems` is private we can no longer just "replace" the method as was done in PR 18052.
The issue from #18003 hasn't been shown to be caused by PDF.js, but it
did surface that we don't have (direct) unit test coverage for the
`BaseException` class. This made it more difficult to prove that the
`stack` property was already available on exception instances, but more
importantly it caused the CI to be green even though the suggested
change would have caused the `stack` property to disappear.
To avoid future regressions, for e.g. similar changes or a rewrite from
a closure to a proper class, this commit introduces a dedicated unit
test for `BaseException` that asserts that our exception instances
indeed expose all expected properties.
The Puppeteer update should in particular be helpful for us because it
contains improved WebDriver BiDi compatibility, a newer Chrome version
(both might help for #17962) and an official deprecation of CDP for
Firefox. Note that the latter doesn't require changes on our end because
we already use WebDriver BiDi unconditionally for Firefox since commit
4db0174. The full release notes can be found at
https://github.com/puppeteer/puppeteer/releases/tag/puppeteer-core-v22.8.0.
When selecting on a touch screen device, whenever the finger moves to a
blank area (so over `div.textLayer` directly rather than on a `<span>`),
the selection jumps to include all the text between the beginning of the
.textLayer and the selection side that is not being moved.
The existing selection flickering fix when using the mouse cannot be
trivially re-used on mobile, because when modifying a selection on
a touchscreen device Firefox will not emit any pointer event (and
Chrome will emit them inconsistently). Instead, we have to listen to the
'selectionchange' event.
The fix is different in Firefox and Chrome:
- on Firefox, we have to make sure that, when modifying the selection,
hovering on blank areas will hover on the .endOfContent element
rather than on the .textLayer element. This is done by adjusting the
z-indexes so that .endOfContent is above .textLayer.
- on Chrome, hovering on blank areas needs to trigger hovering on an
element that is either immediately after (or immediately before,
depending on which side of the selection the user is moving) the
currently selected text. This is done by moving the .endOfContent
element around between the correct `<span>`s in the text layer.
The new anti-flickering code is also used when selecting using a mouse:
the improvement in Firefox is only observable on multi-page selection,
while in Chrome it also affects selection within a single page.
After this commit, the `z-index`es inside .textLayer are as follows:
- .endOfContent has `z-index: 0`
- everything else has `z-index: 1`
- except for .markedContent, which have `z-index: 0`
and their contents have `z-index: 1`.
`.textLayer` has an explicit `z-index: 0` to introduce a new stacking context,
so that its contents are not drawn on top of `.annotationLayer`.
- Change all possible semi-private methods into properly private ones. Note that this code is old enough to predate standard classes.
- Move the `appendText` helper function into `TextLayerRenderTask`, as a private method, to avoid having to manually pass in the scope.
- Simplify `#layoutText` by directly passing in all necessary data. This is possible after the changes in PR 18052.
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
- These changes will allow a simpler way of implementing PR 17770.
- The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).
- This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
- We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).
- Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
*Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
If a user manually calls `PDFPageView.prototype.update()` with a `drawingDelay`-option then it'll always be necessary to re-call the method *without* a delay afterwards, regardless of the `maxCanvasPixels`-value (e.g. even when CSS-only zooming is used).
This only happens when it's safe to do so. The exceptions are:
- when the class extends another subclass: removing the constructor would remove the error about the missing super() call
- when there are default parameters, that could have side effects
- when there are destructured parameters, that could have side effects
- The `stopAtErrors` API option, which is the inverse of the "internal" `ignoreErrors` option, is explicitly documented as applying to *parsing* (i.e. the worker-thread) while the `FontFaceObject` class is used during rendering (i.e. the main-thread); see b6765403a1/src/display/api.js (L164-L167)
- A glyph that fails in the `FontRendererFactory`, on the worker-thread, will already cause (overall) parsing to stop when `ignoreErrors === false` hence checking the option on the main-thread as well seems redundant; see b6765403a1/src/core/evaluator.js (L4527-L4533)
- Removing this option simplifies the code, and slightly reduces the number of options that we need to handle in the main-thread code.
This avoids having to add a couple of event listeners in the viewer, when debugging is enabled, and is consistent with the existing handling of `FontInspector` and `StepperManager` in the API.
This global was only introduced to work-around problems caused by the GENERIC PDF.js build using top level await. Since that was removed in the previous commit, this global is now dead code.
This limit is currently completely non-functional, since the check happens *after* the entire textLayer has been parsed and appended to the DOM. It seems that this has been *accidentally* broken ever since the introduction of `ReadableStream` support.
The reason that this hasn't caused noticeable textLayer-related performance issues in practice is probably because we nowadays manage to coalesce the textLayer into fewer overall DOM elements, whereas years ago many PDF documents ended up with one DOM element *per* glyph.
By moving this check, and thus restoring the functionality, we're also able to remove the `render` helper function and simplify the code.
The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.
Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await, it seems that we'll have to try and fix this.
Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".
Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
This commit replaces `waitForFunction` calls that use
`document.activeElement` to wait for an element to get focus by simpler
`waitForSelector` expressions that use the `:focus` selector. Note that
we already use this in other tests, so this improves consistency too.
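As an illustration, the change amounts to something along these lines (a sketch, assuming Puppeteer's `page` object from the test scope; the selector is hypothetical):

```js
// Before: polling `document.activeElement` manually.
await page.waitForFunction(
  sel => document.activeElement === document.querySelector(sel),
  {},
  "#editorFreeTextInput" // hypothetical selector
);

// After: letting Puppeteer wait for the `:focus` pseudo-class directly.
await page.waitForSelector("#editorFreeTextInput:focus");
```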
This commit replaces a `waitForTimeout` occurrence with an equivalent
`waitForSelector` expression, and removes two other `waitForTimeout`
occurrences that are obsolete because we already wait for an observable
event to trigger or class change to happen.
Note that the other `waitForTimeout` occurrences in this file are either
part of #17931 or remain until we find a good way to ensure that nothing
happened (because currently there is nothing we can await there).
- Check that the `filename` is actually a string, before parsing it further.
- Use proper "shadowing" in the `filename` getter.
- Add a bit more validation of the data in `pickPlatformItem`.
- Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
It fixes issues #14982 and #14724.
The main problem with upscaling a canvas is that it can induce some pixelation
(see issue #14724). So this patch just removes the limit, and as a side
effect it fixes issue #14982.
As far as I can tell, from looking at different profiles (especially some memory profiles)
using the Firefox Profiler, I don't see any noticeable difference in terms of
memory use.
and implement them using some SVG filters and compositing.
Compositing with destination-in, in order to multiply the RGB components by
the alpha from the mask, isn't perfect: it'd be way better to have native
alpha mask support. The current approach induces some small rounding errors,
so the computed RGB values are only approximately correct.
In terms of performance it's a real improvement: for example, the PDF in
issue #17779 is now rendered in a few seconds.
There's still some room for improvement, but overall it should be way
better.
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
We no longer need the helper method to *potentially* call itself once data is available, and can instead take full advantage of async/await by inlining the code.
Rather than having to maintain a dummy `IPDFLinkService` implementation, we can simply let it extend the primary `PDFLinkService` class instead.
Note how *some* methods already checked for an active PDF document, and those checks only needed to be used consistently throughout `PDFLinkService` in order for this to work.
Node.js 22 was just released, and it seems like it's not compatible
with the `canvas` package. This commit pins the version on GitHub
Actions to Node.js 21 as a temporary workaround.
This commit should be reverted once
https://github.com/Automattic/node-canvas/issues/2377
is fixed.
This commit replaces all `waitForTimeout` occurrences with the
appropriate `waitForFunction` calls. Note that in some places they were
already present, so in those cases we could simply remove the
`waitForTimeout` call altogether.
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread has access to various primitives.
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
Given that the `decode` method only returns the actual image-data, a user would now need to invoke `parseImageProperties` to obtain e.g. the width and height.
This method only accepts `BaseStream`-instances, which are (obviously) not exposed, hence we extend it in IMAGE_DECODERS builds to wrap TypedArray data into the expected format.
By using the `signal` option when invoking `addEventListener`, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#signal), we're able to remove an arbitrary number of event listeners with (effectively) a single line of code.
Besides getting rid of a bunch of `removeEventListener`-calls, which means shorter code, we no longer need to manually keep track of event-handling functions.
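A minimal sketch of the pattern (the listener names are illustrative):

```js
const abortController = new AbortController();
const onResize = () => console.log("resized");
const onScroll = () => console.log("scrolled");

// Register any number of listeners with the same signal...
window.addEventListener("resize", onResize, { signal: abortController.signal });
window.addEventListener("scroll", onScroll, { signal: abortController.signal });

// ...and remove all of them with (effectively) a single line of code,
// without keeping references to the event-handling functions.
abortController.abort();
```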
The timeout was introduced in commit 402e3fe with equal timeouts around
the helper function call. In commit 55e5af2 the timeouts around the
helper function call have been removed, and it looks like the helper
function itself was not updated purely due to an oversight.
The operations here should not require any timeouts because the promises
only resolve once the action is completed, which also explains why
removing the timeouts surrounding the helper function calls went without
any problems. It should therefore be safe to remove this timeout too.
This commit changes the `JpxImage.decode` method signature to define the
`ignoreColorSpace` argument as optional with a default value. Note that
we already set this default value in the `getBytes` method of the
`src/core/decode_stream.js` file since this option only seems useful for
certain special cases and therefore shouldn't be mandatory to provide.
Moreover, the JPX fuzzer is changed to use the new `JpxImage` API.
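Roughly speaking, the signature becomes something like this (a sketch; the exact parameter list is an assumption):

```js
class JpxImage {
  // `ignoreColorSpace` is only useful for certain special cases, hence
  // callers no longer need to provide it explicitly.
  static decode(data, ignoreColorSpace = false) {
    // Actual JPX decoding elided in this sketch.
    return new Uint8Array(data.length);
  }
}

// Both invocations are now valid:
JpxImage.decode(new Uint8Array(8));
JpxImage.decode(new Uint8Array(8), /* ignoreColorSpace = */ true);
```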
We should not wait for an arbitrary amount of time, which can easily
cause intermittent failures, but wait for a value change instead. Note
that this patch mirrors the approach we already use in other scripting
integration tests that also check for a value change; see e.g. the
"must check that a field has the correct formatted value" test.
This patch updates the minimum supported browsers as follows:
- Safari 16.4, which was released on 2023-03-27; see https://developer.apple.com/documentation/safari-release-notes/safari-16_4-release-notes
Nowadays we usually try, where feasible and possible, to support browsers/environments that are about two years old. The reasons for limiting support to a slightly more recent Safari version include:
- Safari has always been slower, compared to other browsers, at implementing e.g. new JavaScript features.
- Trying to provide support for Safari is often difficult, and over the years we have seen *a lot* of bugs that are specific to Safari.
- Safari is, and has been for many years, only listed as "mostly" supported in the FAQ.
- This allows us to remove feature-testing, only relevant to Safari, from the main code-base.
By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the built-in Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js core contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
We should not wait for an arbitrary amount of time, which can easily
cause intermittent failures, but wait for a property value change
instead. Note that this patch mirrors the approach we already use in
other scripting integration tests that also check for a visibility
change; see e.g. the "must show a text field and then make in invisible
when content is removed" test.
In Node.js 14.14.0 the `fs.rmSync` function was added that removes files
and directories. The `recursive` option is used to remove directories
and their contents, making it a drop-in replacement for the `rimraf`
dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `rimraf` and reduce the
number of project dependencies in the test folder.
This commit also gets rid of the indirection via the `removeDirSync`
test helper function by simply calling `fs.rmSync` directly.
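A small sketch of the replacement (the path is illustrative, and the `force` option is an extra nicety that mirrors `rimraf`'s tolerance for missing paths):

```js
import fs from "fs";

// Previously: rimraf.sync("build/generic");
// The `recursive` option (Node.js 14.14.0+) removes the directory and
// all of its contents; `force` avoids an error if the path is missing.
fs.rmSync("build/generic", { recursive: true, force: true });
```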
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
In Node.js 10.12.0 the `recursive` option was added to `fs.mkdirSync`.
This option allows us to create a directory and all its parent
directories if they do not exist, making it a drop-in replacement for
the `mkdirp` dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `mkdirp` and reduce the
number of project dependencies.
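And similarly for directory creation (the path is illustrative):

```js
import fs from "fs";

// Previously: mkdirp.sync("build/generic/web");
// With the `recursive` option (Node.js 10.12.0+) all missing parent
// directories are created as well, and existing ones are not an error.
fs.mkdirSync("build/generic/web", { recursive: true });
```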
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
This allows us to get rid of the `@private` JSDoc comments, which were
used to convey intent back when proper private methods could not be used
yet in JavaScript. This improves code readability/maintenance and enables
better usage validation by tooling such as ESLint.
This commit implements the following improvements for the test:
- Replace the hardcoded highlight padding with a dynamic calculation. It
turns out that the amount of padding can differ per system, possibly
due to e.g. local (font) settings, which could cause the test to fail.
The test now no longer implicitly depends on the environment.
- Remove the page offset subtraction. It's only needed for one assertion,
and we can simply add the page offset there once.
- Improve the "magic" numbers in the test. The number 5 is removed now
that the padding calculation made it obsolete, the number 28 is changed
to 30 because that's the actual value in the PDF data and the number 4
is explained a bit more to state that the space also counts as a glyph.
- Annotate the steps and checks to improve readability of the test and
to explain why certain calculations have to be performed.
These CSS rules were imported straight from mozilla-central, and contain a property that causes the RTL-rule to be skipped everywhere else (even when using the development viewer in Firefox).
The `waitForTimeout` function should not be used anymore and only exists
for old usages that have to be rewritten, but there was nothing in place
to signal this. This commit therefore implements a linting rule, specific
to the integration tests, to make it clear that this function should no
longer be used. We exclude the old usages from it because we are already
tracking those in #17656 (so this patch is mostly to not make the scope
of that issue bigger).
It's recommended to always install dependencies locally in the project
folder because global dependencies can easily conflict with other
projects and, because they are not managed by the project, diverge from
versions defined in e.g. `package.json`. Previously we installed
`gulp-cli` globally because at the time we lacked a convenient mechanism
to use Gulp otherwise, but nowadays NPM provides the `npx` command for
that purpose and recommends using it over global installations (see
https://docs.npmjs.com/downloading-and-installing-packages-globally
and PR #17489 that provided the ground work for using it).
This commit therefore updates our GitHub Actions workflows to no longer
install `gulp-cli` globally but instead install it locally from the
already existing entries in `package.json` like all other dependencies
we use. Not only does this remove the special-casing for `gulp-cli`
which simplifies the workflow definitions, it also ensures that the
version ranges provided in `package.json` are respected. This makes the
local and workflow setups more similar, but is also relevant for the
upcoming upgrade to Gulp 5 which from a quick try is a bit involved and
having `package.json` be the single source of truth for the dependency
versions we use is therefore important.
The code was added in order to guess if an editor has been moved but
it's simpler to just set a variable when it's dragged or moved with
the keyboard. This way we remove a bit of asynchronicity.
The PDF specification states that empty dash arrays, i.e. arrays with
zero elements, are in fact valid. In that case the dash array simply
corresponds to a solid, unbroken line. However, this case was erroneously
being flagged as invalid and therefore the annotation was not drawn
because its width was set to zero. This commit fixes the issue by
allowing dash arrays to have a length of zero.
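A minimal sketch of the relaxed validation (the helper name is hypothetical; the real code lives in the annotation border-style handling):

```js
function isValidDashArray(dashArray) {
  // Per the PDF specification an empty dash array is valid, and simply
  // corresponds to a solid, unbroken line.
  if (dashArray.length === 0) {
    return true;
  }
  // Otherwise all elements must be non-negative numbers, with at least
  // one of them being positive.
  return (
    dashArray.every(d => typeof d === "number" && d >= 0) &&
    dashArray.some(d => d > 0)
  );
}

console.log(isValidDashArray([])); // true (solid line)
console.log(isValidDashArray([3, 1])); // true
console.log(isValidDashArray([0, 0])); // false
```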
This allows us to get rid of the `@private` JSDoc comments, which were
used to convey intent back when proper private methods could not be used
yet in JavaScript. This improves code readability/maintenance and enables
better usage validation by tooling such as ESLint.
This commit removes the final callbacks in this code by switching to a
promises-based interface, overall simplifying the code. Moreover, we
document why we write to files on disk and modernize the code using e.g.
template strings.
The original `test.py` code, see
c2376e5cea/test/test.py,
did not have any timeout logic for TTX, but it got introduced when
`test.py` was ported from Python to JavaScript as `test.js` in
c2376e5cea (diff-a561630bb56b82342bc66697aee2ad96efddcbc9d150665abd6fb7ecb7c0ab2f).
However, I don't think we've ever actually seen TTX timing out.
Moreover, back then we used a very old version of TTX and ran the font
tests on the bots (where a hanging process would block other jobs and
would require a manual action to fix), so this code was most likely
only included defensively.
Fortunately, nowadays it should not be necessary anymore because we use
the most recent version of TTX (which either returns the result or
errors out, but isn't known to hang on inputs) and we run the font tests
on GitHub Actions which doesn't block other jobs anymore and also
automatically times the job out for us in the unlikely event that a hang
would ever occur.
In short, we can safely remove this logic to simplify the code and to get
rid of a callback.
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers
The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
- In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
- In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
- In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
- In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
- For the `_onePageRenderedCapability` case the `settled`-check is used in an `EventBus`-listener which is *removed* on its first (valid) invocation.
- For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
- In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.
---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
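Concretely, the replacement looks along these lines:

```js
// Before: const capability = new PromiseCapability();
const { promise, resolve, reject } = Promise.withResolvers();

promise.then(value => console.log(value)); // Logs "rendered".
resolve("rendered");

// Resolving again is a no-op, since the value of a Promise cannot be
// changed once it has been fulfilled or rejected.
resolve("ignored");
```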
With recent improvements to the `AppOptions`, e.g. with better validation and testing, one remaining "annoyance" is the `compatibilityParams` handling. Especially since there's only *a single* parameter left, limited to GENERIC builds.
To further reduce the amount of unnecessary code in e.g. the Firefox PDF Viewer, we can move the `compatibilityParams` handling into the user-options instead since that keeps the previous precedence order between default/user-options.
This patch updates the minimum supported browsers as follows:
- Google Chrome 98, which was released on 2022-02-01; see https://chromereleases.googleblog.com/2022/02/stable-channel-update-for-desktop.html
The primary reason for this version bump is `structuredClone` support, please see https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility
At this point in time we're using `structuredClone` in different parts of the code-base, and it's unfortunately functionality that's difficult to polyfill *completely* (affects the `transfer` option specifically).
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
- Use slightly shorter variable names when initializing the preferences.
- Correctly copy the "old" preference-values before writing to storage, since Objects are passed by reference in JavaScript. (Only applies to the GENERIC viewer.)
- Use `await` fully when writing new preference-values to storage. (Only applies to the GENERIC viewer.)
- Stub the `get`-method in the Firefox PDF Viewer, since it's unused there nowadays.
The original bug was because the parent was null when trying to show
a highlight annotation, which led to an exception.
That led me to think about having some null/non-null parent when removing
an editor: it's a mess especially if a destroyed parent is still attached
to an editor. Consequently, this patch always sets the parent to null when
deleting the editor.
This old method is essentially just adding, a small amount of, unnecessary indirection and we can easily invoke `PDFViewerApplication.open` directly from the extension-specific code instead.
The following are some highlights of this patch:
- In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.
- In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.
- Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seem to be necessary except in the viewer itself (if at all).[1]
---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
Given that the Fetch API is supported since Node.js 18 we should be able to use it when downloading l10n files, which allows us to simplify the code and to make it fully `async`.
The fact that the highlight-thickness can only be changed in "free" mode isn't really obvious visually in the toolbar, so attempt to provide at least some indication of the `disabled`-state by "dimming" the slider.
While implementing caret browsing mode in PDF.js, I didn't notice that selectstart isn't always triggered.
So this patch removes the use of selectstart and relies only on selectionchange.
In order to simplify the selection management, the selection code is moved into the AnnotationUIManager:
- it simplifies the code;
- it allows having only one listener for selectionchange, instead of one per visible page
for selectstart.
I had to add a delay in the integration tests for highlighting (there's a comment with an explanation);
it isn't really nice, but it's the only way I found, and in real life there always is a delay between
press and release.
The function caretPositionFromPoint returns the position within the last visible element,
and sometimes there are some elements on top of the ones in the text layer.
So the idea is to hide the visible elements which aren't in the text layer, in order
to get the right caret position.
*Please note:* This is a micro optimization, hence I fully understand if the patch is rejected.
Currently we create two temporary Arrays and have to iterate twice in total when building the final `hexNumbers` Array.
With this patch there's only one temporary Array and a single iteration required to build the final `hexNumbers` Array.
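For illustration, a sketch of the single-iteration approach (the surrounding code is assumed):

```js
// One temporary Array and a single iteration, by using the mapping
// callback of `Array.from` directly:
const hexNumbers = Array.from({ length: 256 }, (_, n) =>
  n.toString(16).padStart(2, "0")
);

console.log(hexNumbers[10]); // "0a"
console.log(hexNumbers[255]); // "ff"
```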
When highlighting, the annotation editor layer is disabled to get pointer events
from the text layer, but the annotation layer must then be disabled as well, in
order to avoid bad interactions.
Previously we'd simply export this directly from `web/app_options.js`, which meant that it'd be technically possible to *accidentally* modify the `compatibilityParams` Object when accessing it.
To avoid this we instead introduce a new `AppOptions`-method that is used to lookup data in `compatibilityParams`, which means that we no longer need to export this Object.
Based on these changes, it's now possible to simplify some existing code in `AppOptions` by taking full advantage of the nullish coalescing (`??`) operator.
Given that the "PREFERENCE" kind is used e.g. to generate the preference-list for the Firefox PDF Viewer, those options need to be carefully validated.
With this patch we'll now check this unconditionally in development mode, during testing, and when creating the preferences in the gulpfile.
Over time, as we've started relying more and more on import maps, the number of aliases have increased a lot. This is now affecting the size and readability of `createWebpackConfig`, which was already fairly large and complex, hence moving the aliases to their own function should help improve things a little bit.
As part of the changes in PR 17686 we "accidentally" enabled source-maps for the *minified* builds, which seems unnecessary since those have never been included in the `pdfjs-dist` output.
Locally this patch reduces the run-time of `gulp minified` by ~15 percent.
Rather than first building the library and then use Terser "manually" to minify the files, we can utilize a Webpack plugin to combine these steps which helps to simplify the gulpfile.
The `handler` method contained this code in two inline functions,
triggered via callbacks, which made the `handler` method big and harder
to read. Moreover, this code relied on variables from the outer scope,
which made it harder to reason about because the inputs and outputs
weren't easily visible.
This commit fixes the problems by extracting the request checking code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings. The logic is now
self-contained in a single method that can be read from top to bottom
without callbacks and with comments annotating each check/section.
- Run the minification in "parallel" since that should be a *tiny* bit more efficient.
- Don't rename the minified files since that seems unnecessary, especially considering that they are only used in the `dist-pre` target where we currently change the name back manually.
After the changes in PR 17637 there's no longer any reason to invoke `tweakWebpackOutput` without an argument, since the `__non_webpack_import__` re-writing was moved into the Babel plugin.
This way we can avoid a (little) bit of unnecessary parsing during building.
The specs are unclear about what kind of xref table format must be used.
From checking the validity of some PDFs in Acrobat's Preflight tool,
we can infer that keeping the same format is the correct approach.
The PDF in the mentioned bug, after having been changed, wasn't correctly
displayed in either Chrome or Acrobat: it's now fixed.
Note how we're using custom `__non_webpack_import__`-calls in the code-base, that we replace during the post-processing stage of the build, to be able to write `import`-calls that Webpack will leave alone during parsing.
This work-around is necessary since we let Babel discard all comments, given that we generally don't need/want them in the builds, hence why we cannot utilize `/* webpackIgnore: true */`-comments in the source-code.
After the changes in PR 17563 it thus seems to me that we should be able to just move this re-writing into the Babel plugin instead.
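A rough sketch of what such a Babel visitor could look like (this is an assumption about the implementation, not the actual plugin code):

```js
// Re-write `__non_webpack_import__(...)` into a dynamic `import(...)`
// call, once Webpack's parsing is no longer in the picture.
module.exports = ({ types: t }) => ({
  visitor: {
    CallExpression(path) {
      const { callee } = path.node;
      if (t.isIdentifier(callee, { name: "__non_webpack_import__" })) {
        path.replaceWith(
          t.callExpression(t.import(), path.node.arguments)
        );
      }
    },
  },
});
```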
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the directory listing code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the range file serving code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the file serving code into
a dedicated private method, and modernizing it to use e.g. `const`/`let`
instead of `var` and using template strings.
Currently the `web/app.js` file pulls in various build-specific dependencies, via the use of import maps, and those files in turn import from `web/app.js` thus creating undesirable import cycles.
To avoid this we instead pass in a `PDFViewerApplication`-reference, immediately after it's been created, to the relevant code.
Note that we use an ESLint plugin rule, see `import/no-cycle`, that is normally able to catch import cycles. However, in this case import maps are involved which is why this wasn't caught.
Looking at the *built* files you'll notice some lines containing nothing more than a semicolon. This is the result of (mostly top-level) `if`-statements, which include `PDFJSDev`-checks, that evaluate to `false` during Babel parsing.
This has always annoyed me a bit, and looking at the Babel plugin it seems that we can fix this simply by *removing* the relevant nodes.
This part of the (modern) preprocessor is now dead code, since we no longer use `require` statements anywhere in the main code-base.
Note that as part of the changes leading up to PDF.js version `4` we removed all[1] the remaining `require` statements, and we also have an ESLint rule to ensure that no new ones are accidentally added.
---
[1] With two small exceptions, in benchmarking-code and in the Webpack-example.
Given that we need to pass in a `PDFDataRangeTransport`-instance a number of the needed parameters can be obtained from it, rather than having to specify them manually.
This should be a *tiny* bit more efficient, since it avoids parsing substrings that we don't care about.
*Please note:* I cannot find an ESLint rule to enforce this automatically.
- Ensure that localization works in the GENERIC viewer, even if the necessary locale files cannot be loaded.
This was the behaviour prior to the introduction of Fluent, and it seems worthwhile to keep that (especially since we already bundle the en-US strings anyway).
- Let the `GenericL10n`-implementation use the *bundled* en-US strings directly when no language is provided.
- Remove the `NullL10n`-implementation, and simply fallback to `GenericL10n`, to reduce the maintenance burden of viewer-components localization.
- Indirectly, given the previous point, stop exporting `NullL10n` in the viewer-components since it's now removed.
Note that it was never really intended to be used directly and only existed as a fallback.
*Please note:* This doesn't affect the Firefox PDF Viewer, thanks to the use of import maps.
When a highlight is self-intersecting, the outline was also drawn inside it.
In order to remove it, we use an SVG mask to exclude the inner shape
when drawing the outlines.
That changes the outline from 1px,white-2px,blue-1px,white to
2px,white-2px,blue: the part of the stroke which is inside the shape
is removed because of the mask.
All of our static evaluation & dead-code elimination transforms need to
happen in post-order, transforming inner nodes first. This is so that
in complex nested cases all transforms see the simplified version of
their inner nodes.
For example:
async getNimbusExperimentData() {
if (!PDFJSDev.test("GECKOVIEW")) { return null; }
// other code
}
-> [evaluation of PDFJSDev.*]
async getNimbusExperimentData() {
if (!false) { return null; }
// other code
}
-> [!false -> true]
async getNimbusExperimentData() {
if (true) { return null; }
// other code
}
-> [if (true) -> replace with the if branch]
async getNimbusExperimentData() {
return null;
// other code
}
-> [early return -> remove dead code]
async getNimbusExperimentData() {
  return null;
}
This was done correctly in all cases except for our `UnaryExpression`
transform, which was happening in pre-order.
Having this parameter among a list of DOM-elements seems slightly strange now; however, this is very old code, hence the explanation for why it was done this way is most likely historical (as is often the case).
Hence we can simply move this into `AppOptions` instead, which seems more appropriate overall.
Given that only the GENERIC viewer supports opening more than one PDF document, we can simplify things a tiny bit by instead generating the necessary DOM-element in JavaScript.
This unit-test is now failing in up to date versions of Node.js respectively Chromium-browsers, since `CompressionStream` no longer produces consistent data across all environments/browsers.
However logging the compressed TypedArray produced by `writeStream`, with Firefox respectively Chrome, and then feeding *both* of those TypedArray as input to `DecompressionStream` produced the same (correct) result in both browsers.
Hence the *exact* output of `CompressionStream` shouldn't matter, as long as we're able to successfully decompress it when the resulting PDF document is opened with the PDF.js library, and the unit-test is thus extended to check this.
Starting with Chrome 120.0.6099.109 (shipped with Puppeteer 21.8.0+) the
unit test fails in Chrome as well. The issue is tracked in #17399, but
for now we'll only run the unit test in Firefox so we can continue to
update Puppeteer while also still having a browser in which it runs,
until we figure out why the behavior of `CompressionStream` changed.
The `DefaultExternalServices` code, which is used to provide build-specific functionality, is very old. This results in a pattern where we first initialize `PDFViewerApplication.externalServices` and then *override* it for the different builds.
By converting `DefaultExternalServices` into a "regular" class, and leveraging import maps, we can directly initialize the correct instance depending on the build.
Given the simplicity of the `createPreferences` method, we can leverage import maps to directly initialize the correct `Preferences`-instance depending on the build.
Given the simplicity of the `createDownloadManager` method, we can leverage import maps to directly initialize the correct `DownloadManager`-instance depending on the build.
The latest mozilla-central update has test failures, because some CSS variables are not "properly" referenced; in particular:
- Give `--hcm-highlight-selected-filter` a default value, of `none`, similar to the previously existing HCM filter.
- Remove the `--mix-blend-mode` variable, since it's unused.
It isn't really a fix for the mentioned bug, but it slightly improves things.
By reducing the memory use, the time spent in the GC is reduced as well.
The algorithm to compute the bounding box is the same as before, but it has just
been rewritten to be more efficient.
This commit converts the pdfjsdev-loader transform into a Babel plugin,
to skip an AST->string->AST round-trip.
Before this commit, the webpack build process was:
1. Babel parses the code
2. Babel transforms the AST
3. Babel generates the code
4. Acorn parses the code
5. pdfjsdev-loader transforms the AST
6. @javascript-obfuscator/escodegen generates the code
7. Webpack parses the file
8. Webpack concatenates the files
After this commit, it is reduced to:
1. Babel parses the code
2. Babel transforms the AST
3. babel-plugin-pdfjs-preprocessor transforms the AST
4. Babel generates the code
5. Webpack parses the file
6. Webpack concatenates the files
This change improves the build time by ~25% (tested on MacBook Air M2):
- `gulp lib`: 3.4s to 2.6s
- `gulp dist`: 36s to 29s
- `gulp generic`: 5.5s to 4.0s
- `gulp mozcentral`: 4.7s to 3.2s
The new Babel plugin doesn't support the `saveComments` option of
pdfjsdev-loader, and it just always discards comments. Even though
pdfjsdev-loader supported multiple values for that option, it was
effectively ignored due to `acorn` dropping comments by default.
This is in preparation for the next commit, which will convert
preprocessor2.mjs to a Babel plugin. The purpose of this commit
is to help git track the rename regardless of the large amount
of changes.
In the Gulpfile only the exit codes of `test.mjs` child processes
erroneously aren't checked. This causes failures in `test.mjs` to be
logged but not propagated to the master process, which in turn causes
test runners such as GitHub Actions to succeed because they only
monitor the master process. This is easy to reproduce by throwing an
error at the top of `test.mjs` and running `gulp makeref` or `gulp
unittest`: the error is logged, but the task that spawned the child
process succeeds and the master process exits with exit code 0. This is
problematic because it can easily cause errors to go by unnoticed.
This commit fixes the issue by making sure that the `test.mjs`
invocations are handled in the same way as the other child processes
in the file, i.e., if the child process exits with a non-zero exit code
then the master process also exits with a non-zero exit code. After this
patch the error is still logged, but the task now also fails and the
master process exits with exit code 1 to properly signal failure.
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.
Please see https://eslint.org/docs/latest/rules/arrow-body-style
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.
For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).
The `if` statement is no longer necessary because the Node.js versions
that didn't provide `dns.setDefaultResultOrder` are no longer supported,
but looking into this a bit more it turns out that the entire workaround
is no longer necessary because the issue got fixed in Firefox 105 in bug
1769994. Indeed, Firefox now starts nicely with the workaround removed.
Reverts 60ed3cd297c4045b90f4114a74e5baa4ef1c5056.
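For reference, the removed workaround was along these lines (a sketch, including the now-redundant `if` statement mentioned above):

```js
import dns from "dns";

// Work-around for Firefox failing to connect to a `localhost` server
// when the DNS result order preferred IPv6; fixed in Firefox 105.
if (dns.setDefaultResultOrder !== undefined) {
  dns.setDefaultResultOrder("ipv4first");
}
```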
In order to do that we must change the text layer opacity to 1, but
it has several implications:
- the selection color must have an alpha component;
- the background color of the span used for highlighted words
must have an alpha component as well, but now that the opacity is 1
we can use some backdrop-filters in HCM, making the highlighted
words more visible;
- fix a regression caused by #17196: the CSS variable --hcm-highlight-filter
has to live under the #viewer element, because in HCM it's overwritten
by JS at this level; hence link annotations, for example, didn't
have the right colors when hovered.
The free highlighting is enabled when the mouse pointer isn't on some text.
Then we draw a shape with smoothed borders corresponding to the movement of
the mouse.
Printing/saving and changing the thickness will come later.
and try to load the font family (guessed from the font name) before trying
the local substitution.
The local(...) command expects a real font name and not a predefined
substitution, which is why we try the font family first.
By always removing the "visibilitychange" listener in the `PDFViewer.#onePageRenderedOrForceFetch`-method we can (ever so slightly) reduce duplication in the code.
Ensure that users cannot provide incorrect values when trying to set the global worker-options.
This patch was prompted by occasionally seeing users manually loading the `pdf.worker.mjs`-file and then assigning it to the `workerSrc`-option, something that obviously doesn't make sense and will cause fake-workers to be used (with poor performance as a result).
When the text of an annotation is extracted using getTextContent, consecutive white spaces
are just replaced by one space. So this patch adds an option to make sure that white
spaces are preserved when the appearance is parsed.
For the case where there's no appearance, we can have a fast path to get the correct string
from the Contents entry.
When an existing FreeText is edited, spaces (0x20) are replaced by non-breaking ones (0xa0)
to make all of them visible on screen.
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent;
consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally by adding lang="ja" in viewer.html
(or another language for Japanese users).
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/secondary_toolbar.js` file.
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/toolbar.js` file.
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
This commit changes the code to use a template string and to use `const`
instead of `var`. Combined with the previous commits this allows for
enabling the ESLint `no-var` rule for this file now.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- uses `Array.includes` for shorter response code checking code.
This test intermittently fails, likely because the auto-print is triggered fast enough
that we don't manage to catch it.
So this patch sets a listener very early, in order to be sure that
we'll be aware that a print has been triggered.
It seems this unit-test now fails consistently in "all" up-to-date Node.js versions. We should probably try and understand why, but for now just disable it to get passing CI tests.
There's obviously a few things wrong with the Annotations in the referenced PDF document, however parsing of an Annotation shouldn't just break if the /BS-entry isn't a dictionary.
When opening a PDF from the secondary toolbar, a second color picker is added.
So in order to avoid that, we just stop listening for the annotationeditoruimanager
event in the toolbar.
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now, but it's something which could
be changed in the Firefox settings in the future.
Each highlight has, in its own toolbar, a color picker to change just its color.
The different color pickers are so similar (modulo a few differences in their styles) that
this patch introduces a new class, ColorPicker, which provides a color picker component
that could be reused in future editors.
All in all, a large part of this patch is dedicated to the color picker itself and its style,
and the rest is almost just a matter of wiring up the component.
When a PDF has a FreeText without an appearance, we use a fake font in order to render it,
and that leads to creating a few new refs for the font.
But then when we're saving, we create some new refs which start at the same number
as the previously created ones.
Consequently, when saving, we're using some wrong objects (like a font) to check if
we're able to render the newly added FreeText.
In order to fix this bug, we just remove the persistent refs (which are only used
when rendering/printing) during the saving.
Having just tested PR 17337 locally I noticed that especially the `JpxImage`-test causes a "ridiculous" amount of warning messages to be printed, which doesn't seem helpful.
Given that only actual `Error`s should be relevant here, we can easily disable this logging during the tests.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
This unfortunately broke in PR 17060, since I had completely forgotten about https://bugzilla.mozilla.org/show_bug.cgi?id=1632644#c5 when writing that patch.
The easiest solution, while slightly unfortunate, seems to be to add a couple of non-standard hash parameters specifically for the PDF attachment use-case in the Firefox PDF Viewer. (Note that we cannot use "nameddest" here, since we also need to support the stringified destination-Array case.)
It seems this unit-test started failing in Node.js version 20.10 as well. We should probably try and understand why, but for now just disable it to get passing CI tests.
Given that this event listener is only used to trigger rendering after the sidebar has been opened/closed, we can utilize the existing one in the `PDFSidebar` class for this purpose instead. That one is registered on the sidebar DOM-element, and is needed to remove a CSS-class indicating that the sidebar is moving.
It fixes a few errors in the CSS for HCM.
It now complies with the specs from UI/UX.
Only the foreground must change in HCM, and not the background, similarly to what
we had for the alt-text button before moving it.
After the two previous commits, which removed the remaining call-sites, this method is no longer used and can thus be removed.
As mentioned in the JSDocs for the now removed method, synchronous communication between the viewer and the platform code isn't really a good idea.
Once this patch has landed in mozilla-central some additional clean-up of the platform code will also be possible.
The return value is not, nor has it ever been, used for anything and we should thus be able to just send the message.
Note that the responses are already handled by the "message" event listener registered above.
This commit fixes the JSDoc comment for the `annotationEditorMode` setter.
The types tests fail on that now because the input value was changed from
a number to an object with various properties in recent patches, but the
JSDoc comment was not updated accordingly.
Moreover, the types tests also fail because TypeScript 5.3 assumes that
getters and setters have equal return and input value types, which is
arguably also what one would expect, but our `annotationEditorMode`
getter and setter deviate from that because the getter returns a number
while the setter accepts an object. Given that it seems more important
to document the setter entirely, including the meaning and types of its
properties, and the type of the getter can easily be inferred from this
comment and the other JSDoc comments that have `annotationEditorMode` in
it, we remove the getter type to make the types tests pass again.
- Extend the `fetchData` helper function to also support fetching of "blob" data.
- Use the `fetchData` helper function more in the code-base, when fetching non-PDF data. Given that the Fetch API isn't supported for all protocols, this should improve compatibility for the PDF.js library.
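A minimal sketch of such a helper, combining the "blob" support added here with the types from the earlier re-factoring described further below, and assuming the Fetch API is available (the real helper presumably also falls back to e.g. `XMLHttpRequest` for protocols where `fetch` isn't supported):

```js
async function fetchData(url, type = "text") {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(response.statusText);
  }
  switch (type) {
    case "arraybuffer":
      return response.arrayBuffer();
    case "blob":
      return response.blob();
    case "json":
      return response.json();
  }
  return response.text();
}
```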
Currently the SVG images for the loading-icons exist in two versions, for the light- respectively dark-theme, which nowadays are the only "duplicated" icons left.
The reason for this is that these icons are being used in `input`-elements, where the regular `mask-image` approach used for all buttons doesn't work.
To address this we add containers for the `input`-elements, such that we have a "regular" DOM-element where we can use `mask-image`.
The goal is to be able to get these outlines to fill the shape corresponding
to a text selection, in order to highlight some text contents.
The outlines will be used to show selected or hovered highlights.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
This is consistent with the implementation used in the (now removed) webL10n-library, and by only using lowercase language-codes internally in the `L10n`-implementations we should avoid future issues e.g. when users manually set the `locale`-option (in the default viewer).
Some fields, somewhere under the Fields entry in the AcroForm dictionary, could have no name (in /T)
but have a parent which has a name yet isn't itself somewhere under Fields.
As a side effect, this patch prevents infinite loops caused by potential cycles
under Fields.
This commit migrates the font tests away from the bots. Not only are the
font tests broken on the Windows bot since some time, they also run on
Python 2 (end of life since January 2020) and `ttx` 3.19.0 (released in
November 2017). The latter is installed via a submodule, which requires
more complicated logic for finding and running `ttx`.
We solve the issues by implementing a modern workflow that installs the
most recent stable Python and `ttx` (`fonttools` package) versions. This
simplifies the `ttx` driver code as well because it can now assume `ttx`
is available on the path (just like we do for e.g. `node` invocations).
GitHub Actions takes care of creating a virtual environment with
`fonttools` in it so that the `ttx` entrypoint is available. Locally
the font tests can be run in a similar way by creating and sourcing a
virtual environment with `fonttools` in it before running the font
tests, and a README file is included with instructions for doing so.
This commit prepares for running the font tests on GitHub Actions where
we can't spin up headful browsers because there are no display
capabilities on the workers. This will also be useful for porting other
test targets to GitHub Actions at a later time, as well as running the
tests locally in headless mode.
This commit prepares for the introduction of extra options in later
commits by changing the function signatures of the `startBrowser(s)`
functions to take parameter objects instead of plain parameters. This
makes the call sites explicitly state which parameters they pass,
improving overall readability as well.
The current logic is more complicated than it needs to be because it's
passing a callback function to `startBrowsers` instead of a string.
This commit simplifies the logic by passing the base URL as a string to
`startBrowsers` and having it do further augmentation internally,
thereby removing all indirection of the function calls to `makeTestUrl`
and the inner function it returned.
After recent PRs the size and scope of the CI workflow is now reduced, and this patch tries to simplify things further. More specifically we can directly specify the gulp-tasks in the workflow, and thus clean-up the `gulpfile` a tiny bit.
Note that this will technically be slower, since the tests are now run in series (rather than in parallel), however `gulp externaltest` runs so quickly that it really won't matter in practice.
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
Currently the background-color of the `editorParamsToolbar`s doesn't match that of the arrow, which is especially noticeable in dark mode (see zoomed-in screen-shots below).
The simplest solution seems to be to just style the `editorParamsToolbar`s like the `secondaryToolbar`, to limit the amount of CSS changes required.
This should *hopefully* fix 17228, by tweaking the build scripts to give the GENERIC viewer something to await to avoid breaking third-party users of the standalone viewer components.
This button is *only* used in the GENERIC viewer, and will currently be visible either in the main or secondary toolbars (depending on the viewer width).
To simplify upcoming changes, and to avoid then having to complicate the relevant CSS rules unnecessarily, let's place the "Open file"-button permanently in the secondary toolbar instead.
(Note that the GENERIC viewer also, since five years, supports drag-and-drop in order to open local files.)
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.
To address this we change all data-lookups to be asynchronous instead.
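A rough sketch of the change (illustrative only; the real code is the private `#collectFieldObjects` method in `src/core/document.js`):

```js
// Before: a synchronous lookup inside the Promise, where a thrown
// MissingDataException is never caught and parsing simply breaks:
//
//   const field = xref.fetch(fieldRef);
//
// After: an asynchronous lookup, which waits for the missing data range
// to be loaded instead of throwing:
async function collectFieldObjects(xref, fieldRef) {
  const field = await xref.fetchAsync(fieldRef);
  // ...collect the field, then recurse into any /Kids entries...
  return field;
}
```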
Mozilla takes the security of our software seriously. If you believe you have found a security vulnerability in PDF.js, please report it to us as described below.
## Reporting security vulnerabilities
**Please don't report security vulnerabilities through public GitHub issues.**
Instead, please report security vulnerabilities in [Bugzilla](https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&component=PDF%20Viewer&groups=firefox-core-security) and make sure that the checkbox in the "Security" section is checked so the required access controls are automatically configured.
The Mozilla security team will process the bug as described in [Mozilla's security bugs policy](https://www.mozilla.org/en-US/about/governance/policies/security-group/bugs).
Feel free to stop by our [Matrix room](https://chat.mozilla.org/#/room/#pdfjs:mozilla.org) for questions or guidance.
@@ -40,7 +40,7 @@ PDF.js is built into version 19+ of Firefox.
+ The official extension for Chrome can be installed from the [Chrome Web Store](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm).
*This extension is maintained by [@Rob--W](https://github.com/Rob--W).*
-+ Build Your Own - Get the code as explained below and issue `gulp chromium`. Then open
++ Build Your Own - Get the code as explained below and issue `npx gulp chromium`. Then open
Chrome, go to `Tools > Extension` and load the (unpackaged) extension from the
directory `build/chromium`.
@@ -52,19 +52,15 @@ To get a local copy of the current code, clone it using git:
$ cd pdf.js
Next, install Node.js via the [official package](https://nodejs.org) or via
-[nvm](https://github.com/creationix/nvm). You need to install the gulp package
-globally (see also [gulp's getting started](https://github.com/gulpjs/gulp/tree/master/docs/getting-started)):
-$ npm install -g gulp-cli
-If everything worked out, install all dependencies for PDF.js:
+[nvm](https://github.com/creationix/nvm). If everything worked out, install
+all dependencies for PDF.js:
$ npm install
Finally, you need to start a local web server as some browsers do not allow opening
PDF files using a `file://` URL. Run:
-$ gulp server
+$ npx gulp server
and then you can open:
@@ -81,11 +77,11 @@ It is also possible to view all test PDF files on the right side by opening:
In order to bundle all `src/` files into two production scripts and build the generic
viewer, run:
-$ gulp generic
+$ npx gulp generic
If you need to support older browsers, run:
-$ gulp generic-legacy
+$ npx gulp generic-legacy
This will generate `pdf.js` and `pdf.worker.js` in the `build/generic/build/` directory (respectively `build/generic-legacy/build/`).
Both scripts are needed but only `pdf.js` needs to be included since `pdf.worker.js` will
@@ -94,7 +90,7 @@ be loaded by `pdf.js`. The PDF.js files are large and should be minified for pro
## Using PDF.js in a web application
To use PDF.js in a web application you can choose to use a pre-built version of the library
-or to build it from source. We supply pre-built versions for usage with NPM and Bower under
+or to build it from source. We supply pre-built versions for usage with NPM under
the `pdfjs-dist` name. For more information and examples please refer to the
[wiki page](https://github.com/mozilla/pdf.js/wiki/Setup-pdf.js-in-a-website) on this subject.
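For instance, a minimal usage sketch with the generic build (the file paths and document URL below are placeholders, not part of the repo):

```js
// Assumes build/generic/build/pdf.js was loaded via a <script> tag,
// which exposes the `pdfjsLib` global. Only pdf.js is referenced directly;
// the library loads pdf.worker.js itself from the configured location.
pdfjsLib.GlobalWorkerOptions.workerSrc = "build/generic/build/pdf.worker.js";

// "helloworld.pdf" is a placeholder document URL.
const loadingTask = pdfjsLib.getDocument("helloworld.pdf");
loadingTask.promise.then(pdf => {
  console.log(`Loaded a PDF with ${pdf.numPages} page(s).`);
});
```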
@@ -111,7 +107,7 @@ You can play with the PDF.js API directly from your browser using the live demos
-More examples can be found in the [examples folder](https://github.com/mozilla/pdf.js/tree/master/examples/). Some of them are using the pdfjs-dist package, which can be built and installed in this repo directory via `gulp dist-install` command.
+More examples can be found in the [examples folder](https://github.com/mozilla/pdf.js/tree/master/examples/). Some of them are using the pdfjs-dist package, which can be built and installed in this repo directory via `npx gulp dist-install` command.
For an introduction to the PDF.js code, check out the presentation by our
-We're currently working on <a href="draft/index.html">better API docs</a>, but the API is well documented in [api.js](https://github.com/mozilla/pdf.js/blob/master/src/display/api.js).
+The generated API documentation, from the inline comments in [api.js](https://github.com/mozilla/pdf.js/blob/master/src/display/api.js), is available below.
+<iframe src="draft/index.html" title="PDF.js API documentation"></iframe>
└── package.json - package definition and dependencies
```
## Trying the Viewer
-With the prebuilt or source version, open `web/viewer.html` in a browser and the test pdf should load. Note: the worker is not enabled for file:// urls, so use a server. If you're using the source build and have node, you can run `gulp server`.
+With the prebuilt or source version, open `web/viewer.html` in a browser and the test pdf should load. Note: the worker is not enabled for file:// urls, so use a server. If you're using the source build and have node, you can run `npx gulp server`.
"description":"The theme to use.\n0 = Use system theme.\n1 = Light theme.\n2 = Dark theme.",
"type":"integer",
"enum":[0,1,2],
"default":2
},
"showPreviousViewOnLoad":{
"description":"DEPRECATED. Set viewOnLoad to 1 to disable showing the last page/position on load.",
"type":"boolean",
@@ -10,11 +17,7 @@
"title":"View position on load",
"description":"The position in the document upon load.\n -1 = Default (uses OpenAction if available, otherwise equal to `viewOnLoad = 0`).\n 0 = The last viewed page/position.\n 1 = The initial page/position.",
"type":"integer",
"enum":[
-1,
0,
1
],
"enum":[-1,0,1],
"default":0
},
"defaultZoomDelay":{
@@ -34,13 +37,7 @@
"title":"Sidebar state on load",
"description":"Controls the state of the sidebar upon load.\n -1 = Default (uses PageMode if available, otherwise the last position if available/enabled).\n 0 = Do not show sidebar.\n 1 = Show thumbnails in sidebar.\n 2 = Show document outline in sidebar.\n 3 = Show attachments in sidebar.",
"type":"integer",
"enum":[
-1,
0,
1,
2,
3
],
"enum":[-1,0,1,2,3],
"default":-1
},
"enableHandToolOnLoad":{
@@ -48,14 +45,49 @@
"type":"boolean",
"default":false
},
"enableHWA":{
"title":"Enable hardware acceleration",
"description":"Whether to enable hardware acceleration.",
"type":"boolean",
"default":true
},
"enableAltText":{
"type":"boolean",
"default":false
},
"enableGuessAltText":{
"type":"boolean",
"default":true
},
"enableAltTextModelDownload":{
"type":"boolean",
"default":true
},
"enableNewAltTextWhenAddingImage":{
"type":"boolean",
"default":true
},
"altTextLearnMoreUrl":{
"type":"string",
"default":""
},
"commentLearnMoreUrl":{
"type":"string",
"default":""
},
"enableSignatureEditor":{
"type":"boolean",
"default":false
},
"enableUpdatedAddImage":{
"type":"boolean",
"default":false
},
"cursorToolOnLoad":{
"title":"Cursor tool on load",
"description":"The cursor tool that is enabled upon load.\n 0 = Text selection tool.\n 1 = Hand tool.",
"type":"integer",
"enum":[
0,
1
],
"enum":[0,1],
"default":0
},
"pdfBugEnabled":{
@@ -70,6 +102,14 @@
"description":"Whether to allow execution of active content (JavaScript) by PDF files.",
"description":"Whether to disable range requests (not recommended).",
@@ -101,23 +141,14 @@
"title":"Text layer mode",
"description":"Controls if the text layer is enabled, and the selection mode that is used.\n 0 = Disabled.\n 1 = Enabled.",
"type":"integer",
"enum":[
0,
1
],
"enum":[0,1],
"default":1
},
"externalLinkTarget":{
"title":"External links target window",
"description":"Controls how external links will be opened.\n 0 = default.\n 1 = replaces current window.\n 2 = new window/tab.\n 3 = parent.\n 4 = in top window.",
"type":"integer",
"enum":[
0,
1,
2,
3,
4
],
"enum":[0,1,2,3,4],
"default":0
},
"disablePageLabels":{
@@ -137,24 +168,18 @@
},
"annotationMode":{
"type":"integer",
"enum":[
0,
1,
2,
3
],
"enum":[0,1,2,3],
"default":2
},
"annotationEditorMode":{
"type":"integer",
"enum":[
-1,
0,
3,
15
],
"enum":[-1,0,3,15],
"default":0
},
"capCanvasAreaFactor":{
"type":"integer",
"default":200
},
"enablePermissions":{
"type":"boolean",
"default":false
@@ -183,25 +208,14 @@
"title":"Scroll mode on load",
"description":"Controls how the viewer scrolls upon load.\n -1 = Default (uses the last position if available/enabled).\n 3 = Page scrolling.\n 0 = Vertical scrolling.\n 1 = Horizontal scrolling.\n 2 = Wrapped scrolling.",
"type":"integer",
"enum":[
-1,
0,
1,
2,
3
],
"enum":[-1,0,1,2,3],
"default":-1
},
"spreadModeOnLoad":{
"title":"Spread mode on load",
"description":"Whether the viewer should join pages into spreads upon load.\n -1 = Default (uses the last position if available/enabled).\n 0 = No spreads.\n 1 = Odd spreads.\n 2 = Even spreads.",
"type":"integer",
"enum":[
-1,
0,
1,
2
],
"enum":[-1,0,1,2],
"default":-1
},
"forcePageColors":{
@@ -218,6 +232,21 @@
"description":"The color is a string as defined in CSS. Its goal is to help improve readability in high contrast mode",
"type":"string",
"default":"CanvasText"
},
"enableAutoLinking":{
"description":"Enable creation of hyperlinks from text that look like URLs.",
"type":"boolean",
"default":true
},
"enableComment":{
"description":"Enable creation of comment annotations.",
"type":"boolean",
"default":false
},
"enableOptimizedPartialRendering":{
"description":"Enable tracking of PDF operations to optimize partial rendering.",