This commit updates the release pipeline to use OIDC trusted publishing
now that we have configured it between GitHub Actions and NPM. This
solution allows us to remove the token variable (because there is no
longer a fixed token) and provenance flag (because provenance
attestations are generated by default with this approach); refer to
https://docs.npmjs.com/trusted-publishers for more information.
For now its use is limited to the comment sidebar, but it'll also be used for the new one
containing the thumbnails.
This also makes the sidebar more accessible with the keyboard or a screen reader.
This dependency was introduced during the move to Fluent for
localization (bug 1858715), but it wasn't added to `package.json`
at the time. This worked because other dependencies already
installed it, but we shouldn't rely on that, so this commit
explicitly includes it in `package.json` instead.
Fixes 66982a2a.
The linter found some issues in viewer.html with `</input>`, which isn't required,
and a missing closing div in test/resources/reftest-analyzer.html.
The HTML can now be nicely formatted. In order to not break the build for
mozilla-central, the preprocessor has been fixed to take the whitespace at the
beginning of a comment line into account.
And finally, make .prettierrc (which is supposed to be either JSON or YAML)
itself lintable.
Doing so has a number of advantages:
- it removes code duplication, thereby improving readability;
- it removes hardcoded editor IDs, by using the `getNextEditorId` helper
function that was previously introduced for the highlight editor
integration tests, thereby improving readability and reusability;
- it removes potential for intermittent failures by not proceeding until
the freetext editor is fully created and all assertions pass, which
didn't happen consistently before because the code wasn't centralized.
For now it's only possible to create a single PDF by selecting some pages from different PDF sources.
The merge is pretty basic for now (which is why it's still a WIP); none of these data are merged yet:
- the struct trees
- the page labels
- the outlines
- named destinations
There are two new ref tests where some new PDFs are created: one with some extracted pages and another
one (encrypted) which is just rewritten.
The ref images are generated from the original PDFs by selecting the pages we want, and the new images
are taken from the generated PDFs.
A metric-compatible alternative to Times New Roman created by ParaType, based on
their serif font PT Serif and released under the OFL-1.1 license.
https://www.paratype.com/fonts/pt/pt-astra-serif
Signed-off-by: Coelacanthus <uwu@coelacanthus.name>
It'll help to make math equations "visible" to screen readers.
MS Office has a specific way of adding MathML code to structure tree leaves,
and this patch handles it.
Move the current pointer field of DrawingEditor to a CurrentPointer class in tools.js: the pointer type fields have been moved to a CurrentPointer object in tools.js. This object is used by eraser.js and ink.js.
Only reset the pointer type when the user selects a new mode: clear the pointer type when changing modes, instead of at the end of the session. This is more stable, as the method isn't called this way when the user changes pages. Also, clear the pointer type when the mode is changed by an event (the user changes the editor type), otherwise the same pointer type is kept (for example when the document is changed).
On GitHub Actions this test could fail with `Expected 1.3125 to be less
than 1` due to slight differences in the decimals of the `rect.y` value.
However, for this check we're only really interested in full pixels, so
this commit rounds the value up to the next integer to not be affected
by small decimal differences so the test passes on GitHub Actions too.
For the integration tests we prefer non-linked test cases because those
PDF files are directly checked into the Git repository and thus don't
need a separate download step that linked test cases do.
However, for the freetext and ink integration tests we currently use
`aboutstacks.pdf` which is a linked test case. Fortunately we don't need
to use it because for most tests we don't actually use any properties of
it: we only create editors on top of the canvas, but for that any PDF
file works, so we can simply use the non-linked `empty.pdf` file instead.
The only exception is the "aria-owns" test that needs a line of text from
the PDF file, so we move that particular test to a dedicated `describe`
block and adapt it to use the non-linked `attachment.pdf` file that just
contains a single line of text that can be used for this purpose.
The changes combined make 12 more integration tests run out-of-the-box
after a Git clone, which also simplifies running on GitHub Actions.
We used an SVG string, which can be passed to the Path2D constructor, but it's a bit slower than
building the path step by step.
Having numerical data instead of a string will also help with the font data serialization.
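For reference, a minimal sketch of the two approaches using the standard `Path2D` API (the coordinates are illustrative):
```js
// Previous approach: parse an SVG path string in the Path2D constructor.
const fromSvgString = new Path2D("M 0 0 L 10 0 L 10 10 Z");

// New approach: build the same path step by step, skipping string parsing.
const stepByStep = new Path2D();
stepByStep.moveTo(0, 0);
stepByStep.lineTo(10, 0);
stepByStep.lineTo(10, 10);
stepByStep.closePath();
```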
The position of the text rendered by `showText` is affected
incrementally by the preceding `showText` operations "on the same line".
For this reason, we keep track of all of them (with their dependencies)
in `sameLineText`.
`sameLineText` can be reset whenever we explicitly position the text
somewhere else. We were previously only doing it for `moveText`, and
this patch updates the code to also do it on `setTextMatrix` (which
resets `this.current.x/y` in `CanvasGraphics` to 0).
The complexity of the dependency tracking for subsequent `sameLineText`
operations grows quadratically with the number of operations on the same
line, so this patch fixes the performance problem when there are whole
pages of text that only use `setTextMatrix` and not `moveText`.
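A hypothetical sketch of the reset rule (the op names match pdf.js's `OPS` constants; the surrounding evaluation loop and variable names are assumed):
```js
// Both ops explicitly reposition the text, so any accumulated
// same-line tracking no longer applies and can be dropped.
if (fn === OPS.moveText || fn === OPS.setTextMatrix) {
  sameLineText.length = 0;
}
```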
Python 3.14 is the current stable version, released on October 7th. The
dependencies we use also support Python 3.14 now, most importantly
`fonttools` for which the OS-specific builds have been published (see
the `cp314` wheels on https://pypi.org/project/fonttools/#files).
Follow-up to https://github.com/mozilla/pdf.js/pull/20197.
This serializes pattern data into an ArrayBuffer which is
then transferred from the worker to the main thread.
It sets the stage for us to eventually switch to a
SharedArrayBuffer.
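A minimal sketch of the transfer, using the standard worker `postMessage` API (the helper name is illustrative):
```js
// Passing the buffer in the transfer list moves it to the main thread
// instead of structured-cloning it, avoiding a copy.
const buffer = serializePatternData(pattern); // assumed to return an ArrayBuffer
postMessage({ type: "patternData", buffer }, [buffer]);
```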
Instead of looking at every bbox, we use a 64x64 grid where each cell of the grid is associated with the bboxes
touching it.
In order to get the potential bboxes containing a point, we just have to compute the index of the cell containing
it and, using the associated list described above, we can quickly know if the point is contained.
With the PDF in the mentioned bug, it's ~20 times faster.
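A minimal sketch of the cell lookup, assuming coordinates normalized to [0, 1) and a `cells` array mapping each cell index to the bboxes touching it:
```js
const GRID_SIZE = 64;

// Return the bboxes that might contain the point (x, y); only these
// candidates then need an exact containment check.
function candidateBBoxes(cells, x, y) {
  const col = Math.min(Math.floor(x * GRID_SIZE), GRID_SIZE - 1);
  const row = Math.min(Math.floor(y * GRID_SIZE), GRID_SIZE - 1);
  return cells[row * GRID_SIZE + col] ?? [];
}
```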
The `width` and `height` arguments for `setDims` have been removed in
PR #20285, but for two calls in the highlight code they remained. This
commit removes them as they are no longer used in the method itself.
Fixes 0722faa9.
But keep a lower quality when enableOptimizedPartialRendering is true, because we need to compensate
for the time used to compute the bboxes, and since subsequent renderings are faster it's more acceptable
to see a lower-quality image for a few tenths of a second.
[Annotation] In reading mode with the new comment stuff enabled, use the comment popup for annotations without a popup but with some contents (bug 1991029)
This PR serializes font data into an ArrayBuffer
that is then transferred from the worker to the
main thread. It's more efficient than the current
solution, which clones the "export data" object
that includes the font data as a Uint8Array.
It prepares us to switch to a SharedArrayBuffer
in the future, which would allow us to share
the font data with multiple agents, which would be
crucial for the upcoming "renderer" worker.
Change `info` to use console.info and the `warn` function
to use console.warn. This not only makes sense semantically,
but also in practice: server-side runtimes like Deno write
console.log to stdout and console.warn to stderr
(console.info, unfortunately, also goes to stdout).
This is important because logging to stdout
can break some CLI apps.
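The change, sketched (the wrapper shapes are assumptions, not the exact helpers):
```js
// Use the console method matching the severity, so runtimes that route
// console.warn to stderr keep stdout clean for CLI output.
function info(msg) {
  console.info(`Info: ${msg}`);
}
function warn(msg) {
  console.warn(`Warning: ${msg}`);
}
```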
Before this patch, when `enableOptimizedPartialRendering`
is enabled we would record the bounding boxes of the
various operations on the first render.
This patch changes it to happen on the first render that we
know will also need a detail view, so that the performance
cost is not paid when the detail view is not used.
The size of the canvas has significant impact on the rendering
performance. If we are going to render a high-res detail
view on top of the full-page canvas, we can further
reduce the full-page canvas resolution to improve
rendering time without affecting the resolution seen by
the user.
Users will see the lower resolution when quickly scrolling around the
page, but it will then be replaced with the high-res
detail view.
This can happen with a PDF that has a large text layer.
Instead of waiting for the first rendered page to enable the buttons, we
wait for a rendered annotation editor layer.
In order to see the issue this patch is fixing:
- open a pdf with some highlights and a comment on page 1, at page 7
- open the comment sidebar
- click on the comment on page 1
Opening at page 7 leaves a page that isn't fully rendered, which means that when jumping to it
with the sidebar we re-use what we have instead of redrawing it.
This PR changes the way we store bounding boxes so that they use less
memory and can be more easily shared across threads in the future.
Instead of storing the bounding box and list of dependencies for each
operation that renders _something_, we now only store the bounding box
of _every_ operation and no dependencies list. The bounding box of
each operation covers the bounding box of all the operations affected
by it that render something. For example, the bounding box of a
`setFont` operation will be the bounding box of all the `showText`
operations that use that font.
This affects the debugging experience in pdfBug, since now the bounding
box of an operation may be larger than what it renders itself. To help
with this, now when hovering on an operation we also highlight (in red)
all its dependents. We highlight with white stripes operations that do
not affect any part of the page (i.e. with an empty bbox).
To save memory, we now save bounding box x/y coordinates as uint8
rather than float64. This effectively gives us a 256x256 uniform grid
that covers the page, which is a high enough resolution for the use case.
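One possible sketch of the quantization, assuming page-relative coordinates (the rounding choice is an assumption):
```js
// Store a bbox as four uint8 values on a 256x256 grid over the page;
// floor the minimum and ceil the maximum so the stored box never shrinks.
function quantizeBBox(x0, y0, x1, y1, pageWidth, pageHeight) {
  return new Uint8Array([
    Math.floor((x0 / pageWidth) * 255),
    Math.floor((y0 / pageHeight) * 255),
    Math.ceil((x1 / pageWidth) * 255),
    Math.ceil((y1 / pageHeight) * 255),
  ]);
}
```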
Bug 1978027 was fixed upstream 10 days ago, so this integration
test can be enabled for Firefox too now that it passes with recent
Nightly versions.
Before the introduction of the `renderRichText` helper function we
exclusively used `this.#html` for XFA rich text and exclusively used
`this.#contentsObj` for plain text. However, after the refactoring we
tried to access `this.#contentsObj.dir` in both cases, which fails for
XFA rich text because `this.#contentsObj` is `null` in that case.
This commit fixes the issue by using optional chaining to make sure we
don't try to access non-existent `this.#contentsObj` properties, which
makes the `must update an existing annotation and show the right popup`
freetext integration test pass again.
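The shape of the fix, sketched with a plain object standing in for the private field:
```js
const contentsObj = null; // the XFA rich text case
const dir = contentsObj?.dir; // undefined instead of a TypeError
```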
Fixes #20237.
Fixes 35c90984.
Most places have a newline before/after `before{Each,All}`,
`after{Each,All}` and `it` to visually separate the blocks for clarity,
but in a handful of places this wasn't done. This commit removes the
inconsistencies so that the test code is formatted consistently.
The helper function was used in a number of places, but also a lot of
places contained the annotation selector string inline. This commit
makes sure that all places use `getAnnotationSelector` consistently to
make sure the annotation selector string is only defined in a single
place and to improve readability of the test code.
This test called `closeSinglePage` manually at the end of the test,
which is inconsistent with all other tests that call `closePages` in an
`afterEach` block. This commit fixes the difference for consistency.
If the document changes, the comment state from the old document should
be replaced with that of the new document. To do this the comment
manager is destroyed, but the corresponding comment sidebar wasn't
destroyed yet, which resulted in the comment state from the old document
still being visible for the new document.
This commit fixes the issue by hiding the comment sidebar if the comment
manager is destroyed. Note that hiding the comment sidebar effectively
destroys all its state, and we already set the annotation mode to "none"
on document change so we don't want to keep showing the comment sidebar
anyway.
This function is required for the commenting feature: the user will be able to click
on a comment in the sidebar and the document will be scrolled accordingly.
Locally, on Arch Linux, this integration test permafails:
```
1) Text layer Text selection using selection carets doesn't jump when moving selection
Message:
second selection:
Expected '(frequently executed) bytecode sequences, records
them, and compiles them to fast native code. We call such a s' to roughly match /frequently .* We call such a se/s.
Stack:
at <Jasmine>
at UserContext.<anonymous> (file:///home/timvandermeij/Documenten/Ontwikkeling/pdf.js/Code/test/integration/text_layer_spec.mjs:521:12)
Message:
third selection:
Expected '(frequently executed) bytecode sequences, records
them, and compiles them to fast native code. We call such a s' to roughly match /frequently .* We call such a se/s.
Stack:
at <Jasmine>
at UserContext.<anonymous> (file:///home/timvandermeij/Documenten/Ontwikkeling/pdf.js/Code/test/integration/text_layer_spec.mjs:529:12
```
The exact selection can differ a bit per OS/browser. In this case the
last character was consistently not selected while on other platforms it
is, so this commit fixes the issue by relaxing the regex to not consider
the final character so that the test passes if the rest matches.
This removes the following error from the integration test logs:
```
JavaScript error: http://127.0.0.1:59283/build/generic/build/pdf.mjs, line 3879:
TypeError: EventTarget.addEventListener: 'signal' member of AddEventListenerOptions is not an object.
```
Fixes 636ff50.
Extends 63651885.
This commit is a first step towards #6419, and it can also help by
first computing which ops can affect what is visible in a given part of
the page.
This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.
Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
Here we have two rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.
This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affect the relevant PDF area on
the canvas.
All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.
The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that it depends on;
- it highlights the bounding box on the PDF itself.
We don't need AI/ML features in the tests, so this should reduce CPU
usage by not having the inference process running. Moreover, it prevents
the following lines from being logged in the test output:
```
JavaScript error: resource://gre/actors/MLEngineParent.sys.mjs, line 509: Error: Unable to get the ML engine from Remote Settings.
JavaScript error: resource://gre/actors/MLEngineParent.sys.mjs, line 1279: TypeError: can't access property "postMessage", this[#port] is null
```
Implement a delay for Chrome protocol calls in the integration tests, and skip the "must check that an existing highlight is ignored on hovering" integration test on Windows
This is a temporary measure to reduce noise until #20136 is fixed. Note
that this shouldn't be an issue in terms of coverage because we still
run the test on Linux.
In Chrome protocol calls are faster than in Firefox and thus trigger in
quicker succession. This can cause intermittent failures because new
protocol calls can run before events triggered by the previous protocol
calls had a chance to be processed (essentially causing events to get
lost).
This commit fixes the issue by configuring Chrome with a protocol call
delay value that gives it a more similar execution speed as Firefox
(which also gives us more consistency between the two browser runs).
Note that this doesn't negatively impact the overall runtime of the
integration tests because Puppeteer already waits for a test to complete
in both browsers before continuing to the next one and Chrome
consistently was, and with this patch still slightly is, faster in
completing the tests.
It should fix the error:
```
JavaScript error: http://127.0.0.1:43303/build/generic/build/pdf.mjs, line 1445:
TypeError: EventTarget.addEventListener: 'signal' member of AddEventListenerOptions is not an object.
```
that we see when running integration tests on the Linux bot.
In #20016, the `canvasContext` property of `RenderParameters` was deprecated in favor of the new `canvas` property.
The JSDoc was updated to include the new parameter along with the old one.
I think the old one should be enclosed in `[]` to mark it as optional (and to allow usage from TypeScript with just the `canvas` parameter provided).
I also reordered the properties so that all required properties come first, followed by the optional ones.
The fixed -400px horizontal offset used by
scrollIntoView led to horizontal scroll only moving
part-way right on narrow screens. The highlights near
the right edge remained partly or completely off
screen.
This centres the highlighted match on any viewport width while
clamping the left margin to 20-400px. On very narrow screens
the scrollbar now moves all the way to the right instead of
stopping midway.
It's useful for users highlighting with NVDA.
They have to enable native selection and then select some text.
In this case only a selectionchange event is triggered once the selection is done.
Typing in the text field causes a sandbox event to trigger, which we
should await to avoid continuing to the next part of the test before
the sandbox event is fully processed.
Ensure that negative “LW” entries in an ExtGState dictionary are converted to their absolute values when the “gs” operator is processed.
See PR https://github.com/mozilla/pdf.js/pull/19639
It takes some time for the viewer alert to be updated after the editor
is committed, but the current tests don't await that and proceed too
fast to the viewer alert string assertion. This commit fixes the issue
by waiting for the expected viewer alert string to appear instead.
It fixes #20065.
The only way to get a path (from the path generator) is when the font is embedded.
So when we need a path (disableFontFace: true, or when we want to use a pattern for
stroking/filling), it's impossible to fulfil that for non-embedded fonts.
In order to screenshot the page and assert that it's monochrome,
providing a regression test for #18694, the viewer background is
configured to match the page background because screenshotting the page
always captures a small part of the viewer background as well, and this
way we can easily go over all pixels and check that they are all equal.
However, in addition to configuring the viewer background the test also
hides the toolbar and removes the page border. Especially the latter
makes `scrollIntoView` fail in both Chrome and Firefox with recent
Puppeteer versions, for reasons which remain a bit unclear.
Fortunately both hiding the toolbar and removing the page border is not
actually necessary (anymore) for the test to work, so we can simply
remove those actions to fix the issue and reduce the amount of code. To
make sure that the test still covers the original issue we reverted the
changes from #18698 and verified that the test then still fails as expected.
Fixes#19811.
Fixes 68332ec2.
This is a precursor to moving the call into a
worker thread to let us use `OffscreenCanvas`. The
current position wouldn't work since we make
transformations to the canvas object after the
getContext call, which isn't allowed for
OffscreenCanvas. Also it isn't allowed to clone or
`transferControlToOffscreen` the canvas after the
`getContext` call.
Necessary because when there is no Popup annotation created along
with a Text annotation, the Popup annotation created by pdf.js
does not receive the noRotate flag.
The shadow was taken into account when computing the bounding box of the section
containing the link, and it was making the clip path wrong.
Since the shadow is almost invisible because of the opacity, the yellow color and the clip,
we can remove it without causing any visual regressions (and as a side effect it avoids
using resources to compute it when displayed).
After clicking the find button we need to wait for the input field to
appear before we try to type in it, but instead we were waiting for the
button to appear. However, the button is always present, so the current
`waitForSelector` call is basically a no-op, and this can cause the
integration tests to fail intermittently if we continue before the
input field is actually visible.
This appears to be due to a typo, first introduced in commit
d10da907dac86c103b787127110a59aed82c7aaa, that has been copy/pasted.
This commit fixes the issue by waiting for the input field to be visible
before we continue typing in it.
On Mac, the PDF backend used when printing uses the CID from the font,
so if a char has a null CID then it's equivalent to .notdef and some viewers
don't display it.
The problem in the original code is that `getTextAt` is called too
early and therefore returns unexpected text content. This can happen
because we call `increaseScale` and then wait for the page's text layer
to be visible, but it can take some time before the zoom actually
occurs/completes in the viewer and in the meantime the old (pre-zoom)
text layer may still be visible, causing us to continue too soon because
we don't validate that we're dealing with the post-zoom text layer.
This commit fixes the issue by simply waiting for the expected text to
show up at the given origin coordinates, which makes the test work
independent of viewer actions/timing.
This isn't directly part of the official API, and having this class in its own file could help avoid future changes (e.g. issue 18148) affecting the size of the `src/display/api.js` file unnecessarily.
Given that this file represents the official API, it's difficult to avoid it becoming fairly large as we add new functionality. However, it also contains a couple of smaller (and internal) helpers that we can move into a new utils-file.
Also, we inline the `DEFAULT_RANGE_CHUNK_SIZE` constant since it's only used *once* and its value has never been changed in over a decade.
The `constructPath` op receives as arguments not only the
information to construct the path, but also the op to apply
the path to (such as "fill", or "stroke").
This commit updates the Stepper tool in the debugger to decode
that op, showing its name rather than just its numeric ID.
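A sketch of the decoding, assuming the `OPS` map that pdf.js exports (the import path is illustrative):
```js
import { OPS } from "pdfjs-dist";

// Invert the name -> id mapping once, then resolve names by numeric ID.
const opNameById = Object.fromEntries(
  Object.entries(OPS).map(([name, id]) => [id, name])
);
console.log(opNameById[OPS.eoFill]); // "eoFill"
```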
To make it easier to tell which PDF.js version/commit that the *built* files correspond to, they have (since many years) included `pdfjsVersion` and `pdfjsBuild` constants with that information.
As currently implemented this has a few shortcomings:
- It requires manually adding the code, with its preprocessor statements, in all relevant files.
- It requires ESLint disable statements, since it's obviously unused code.
- Being unused, this code is removed in the minified builds.
- This information would be more appropriate as comments, however Babel discards all comments during building.
- It would be helpful to have this information at the top of the *built* files, however it's being moved during building.
To address all of these issues, we'll instead utilize Webpack to insert the version/commit information as a comment placed just after the license header.
We have a fallback for the common case of Type3 fonts without a /FontDescriptor dictionary, however we also need to handle the case where it's present but lacking the required /FontName entry.
The `Page.evaluate()` and `Mouse.click()` APIs in Puppeteer both return
a promise; see https://pptr.dev/api/puppeteer.page.evaluate and
https://pptr.dev/api/puppeteer.mouse.click, and should therefore be
awaited before proceeding in tests to make sure that the test's behavior
is deterministic and avoid intermittent failures.
The following command was used to find potential places to fix:
`grep -nEr "[^await|return] page\." test/integration/*`
The clipboard, used via the `copyImage` helper function, is a shared
resource, so access to it cannot happen concurrently because it could
result in tests overwriting each other's contents. Most tests using
the clipboard are therefore run sequentially, but only the stamp
editor's undo-related tests weren't, so this commit fixes the issue
by running those tests sequentially too.
Given that Node.js has full support for the Fetch API since version 21, see the "History" data at https://nodejs.org/api/globals.html#fetch, it seems unnecessary for us to manually check for various globals before using it.
Since our primary development target is browsers in general, and Firefox in particular, being able to remove Node.js-specific compatibility code is always helpful.
Note that we still, for now, support Node.js version 20 and if the relevant globals are not available then Errors will instead be thrown from within the `PDFFetchStream` class.
This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.
Without this "fake" workers may be ignored in the API, which isn't really what you want when manually providing the `disableWorker=true` hash parameter. (Note that this requires the `pdfBugEnabled` option/preference to be set as well.)
Also, after the changes in PR 19810 we can just load the "fake" worker directly in development mode and don't need to manually assign it to the global scope.
Emscripten generates code that allows the caller to provide the Wasm
module (through Module.instantiateWasm), with a fallback in case
.instantiateWasm is not provided. We always define instantiateWasm, so
we can hard-code the check and let our dead code elimination logic
remove the unused fallback.
This commit also improves the dead code elimination logic so that if
a function declaration becomes unused as a result of removing dead
code, the function itself is removed.
They were removed in the minified build, but the code that made the comments necessary was still there (just minified). This commit updates the Terser config to preserve them.
The default value of Terser's `comments` option is [`/@preserve|@copyright|@lic|@cc_on|^\**!/i`](d528103b7c/lib/output.js (L178C12-L178C53)), however the only type of comment it was actually matching in our case is `@lic`, for the license header in minified files. Thus the new regexp is `/@lic|webpackIgnore|@vite-ignore/i`.
*This is something that occurred to me when reviewing the latest PDF.js update in mozilla-central.*
Currently we duplicate essentially the same code in both the `OutputScale.prototype.limitCanvas` and `PDFPageDetailView.prototype.update` methods, which seems unnecessary, and to avoid that we introduce a new `OutputScale.capPixels` method that is used to compute the maximum canvas pixels.
When the /OpenAction data is an Array we're currently using it as-is which could theoretically cause problems in corrupt PDF documents, hence we ensure that a "raw" destination is actually valid. (This change is covered by existing unit-tests.)
*Note:* In the Dictionary case we're using the `Catalog.parseDestDictionary` method, which already handles all of the necessary validation.
This way it helps to reduce the overall canvas dimensions and make the rendering faster.
The drawback is that when scrolling, the page can be blurry while waiting for the rendering.
The default value is 200% on desktop and will be 100% for GeckoView.
The effect is probably not even measurable, however this patch ever so slightly reduces the asynchronicity in the `fieldObjects` getter. These changes should be safe since:
- We're inside of the `PDFDocument`-class and the `annotationGlobals`-getter, which will always return a (shadowed) Promise and won't throw `MissingDataException`s, can be accessed directly without going through the `BasePdfManager`-instance.
- The `acroForm`-dictionary can be accessed through the `annotationGlobals`-data, removing the need to "manually" look it up and thus the need for using `Promise.all` here.
- We can also lookup the /Fields-data, in the `acroForm`-dictionary, synchronously since the initial `formInfo.hasFields` check guarantees that it's available.
Currently we repeat the same identical code five times in the `Page`-class when creating a `PartialEvaluator`-instance, which given the number of parameters it needs seems like unnecessary duplication.
Node.js version 24 was just released, see https://github.com/nodejs/release#release-schedule, hence we should run tests in that version in order to help catch any possible issues as soon as possible.
Also, since version 23 will reach EOL (end-of-life) in less than a month we stop running tests in that version.
The `ObjectLoader.prototype.load` method has a fast-path, which avoids any lookup/parsing if the entire PDF document is already loaded.
However, we still need to create an `ObjectLoader`-instance which seems unnecessary in that case.
Hence we introduce a *static* `ObjectLoader.load` method, which will help avoid creating `ObjectLoader`-instances needlessly and also (slightly) shortens the call-sites.
To ensure that the new method will be used, we extend the `no-restricted-syntax` ESLint rule to "forbid" direct usage of `new ObjectLoader()`.
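The shape of the new helper, sketched (the fast-path condition is an assumption):
```js
class ObjectLoader {
  // A static method lets call-sites skip instance creation entirely
  // when the whole document is already loaded.
  static load(obj, keys, xref) {
    if (xref.stream.isDataLoaded) {
      return Promise.resolve(); // fast-path: nothing left to fetch
    }
    return new ObjectLoader(obj, keys, xref).load();
  }
  // ...
}
```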
Given that all the methods are already asynchronous we can just use `await` more throughout this code, rather than having to explicitly return function-calls and `undefined`.
Note also how none of the `ObjectLoader.prototype.load` call-sites use the return value.
Rather than "manually" invoking the methods from the `src/core/worker.js` file we introduce a single `PDFDocument`-method that handles this for us, and make the current methods private.
Since this code is only invoked at most *once* per document, and only for XFA documents, we can use `BasePdfManager.prototype.ensureDoc` directly rather than needing a stand-alone method.
Currently we repeat virtually the same code when calling the `PartialEvaluator.prototype.handleSetFont` method, which we can avoid by introducing an inline helper function.
Considering the name of the method, and how it's actually being used, you'd expect it to return a boolean value.
Given how it's currently being used this inconsistency doesn't cause any issues, however we should still fix this.
Rather than having a dedicated `BasePdfManager`-method for this one call-site we can instead change `PDFDocument.prototype.serializeXfaData` to a non-async method, that we invoke via `BasePdfManager.prototype.ensureDoc`.
Currently we create an intermediate `Dict` during parsing, however that seems unnecessary since (note especially the second point):
- The `NameOrNumberTree.prototype.getAll` method will already resolve any references, as needed, during parsing.
- The `Catalog.prototype.xfaImages` getter is invoked, via the `BasePdfManager`-instance, such that any `MissingDataException`s are already handled correctly.
Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.
The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.
Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
- Parsing named destinations[2] and URLs.
- Handling "strings" that are actually /Name-instances.
- Building a lookup Object/Map based on some PDF data-structure.
*NOTE:* The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.
---
[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".
[2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.
Whenever we cannot find a destination we'll fallback to checking all destinations, to account for e.g. out-of-order NameTrees, and in those cases any subsequent destination-lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
Instead, we update the visible canvas every 500ms.
With a large canvas, updating at 60fps leads to a lot of gfx transactions and can take a lot of time.
For example, with wuppertal_2012.pdf on Windows, displaying it at 150% takes around 14 min(!) without
this patch, whereas it takes only around 14 sec with it. Even at 30% it helps to improve the performance
by around 20%.
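A minimal sketch of the throttling (the names are illustrative):
```js
const UPDATE_INTERVAL_MS = 500;
let lastUpdate = 0;

// Called on every frame, but only copies the rendered content to the
// visible canvas at most once every 500ms.
function maybeUpdateVisibleCanvas(timestamp) {
  if (timestamp - lastUpdate >= UPDATE_INTERVAL_MS) {
    lastUpdate = timestamp;
    copyToVisibleCanvas(); // hypothetical helper copying the offscreen buffer
  }
}
```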
Modern Node.js versions now include a `navigator` implementation, with a few basic properties, that's actually enough for the PDF.js use-cases; please see https://nodejs.org/api/globals.html#navigator
Unfortunately we still support Node.js version `20`, hence we add a basic polyfill since that allows simplifying the code slightly.
This integration test fails intermittently because the undo button can
only be activated if focus can be put on it, and that in turn can only
happen if it's visible. The test tried to make sure that the undo bar
is visible, but checking for the absence of the `hidden` attribute is
unfortunately not enough to assert visibility according to Puppeteer
documentation [1]. Moreover, the undo button wasn't checked at all.
To fix the issue we let Puppeteer do the visibility detection for the
undo bar by providing the `visible: true` option to `waitForSelector`
[2]. This is consistent with the other tests that already do this, and
also with the existing code that detects if the undo bar is hidden
(which uses the `hidden: true` option of `waitForSelector`). Moreover,
we wait for the undo button to be present before putting focus on it.
For consistency, and to avoid intermittent failures elsewhere, we mirror
this solution to the other undo bar/button tests of the various editors.
[1] https://pptr.dev/api/puppeteer.elementhandle.isvisible
[2] https://pptr.dev/api/puppeteer.waitforselectoroptions
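As a sketch of the pattern (the selectors are illustrative; the `visible` option is Puppeteer's actual API):
```js
// Let Puppeteer perform the visibility detection rather than checking
// for the absence of the `hidden` attribute ourselves.
await page.waitForSelector(".undoBar", { visible: true });
await page.waitForSelector(".undoBar .undoButton", { visible: true });
```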
The intention with PR 19819 was to change how colours are specified for the viewer UI, however it wasn't intended to affect elements in the Annotation- and XFA-layers.
Hence we enforce the light `color-scheme` for these layers, and also make sure to provide a default `color` for PopupAnnotations.
Perhaps we should update the debugger CSS to properly account for the light/dark theme, however since that's not UI that end-users ever see we simply force using the light theme for now.
In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding.
To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.
This complements, and extends, the existing check of the `Array.prototype` in the worker-thread.
To simplify the implementation we'll now abort immediately, rather than collecting all "bad" properties.
This is a small, and quite possibly pointless, optimization which ensures that any "local" /Resources aren't empty, to avoid needlessly trying to load and merge dictionaries.
Remove debug code from the integration tests, and skip the "must check that canvas perfectly fits the page whatever the zoom level" integration test in Chrome
This is a temporary measure to reduce noise until #19811 is fixed. Note
that this shouldn't be an issue in terms of coverage because we still
run the test in Firefox.
In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.
Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.
*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
It's no longer necessary after commit 1c73e52 that caused the document
to be closed properly between tests, and this therefore partly reverts
commit 973b67f.
This commit configures Jasmine to no longer run the tests in a fixed
order, which combined with the previous isolation commits avoids being
able to accidentally introduce dependencies between integration tests.
To avoid being able to introduce dependencies between tests this commit
makes sure that we close the document between tests so that we can't
accidentally rely on state set by a previous test.
This patch uses nullish coalescing assignment in cases where it's immediately obvious from surrounding code that doing so is safe, and logical OR assignment elsewhere (mostly the changes in XFA code).
Using a `Map` rather than an `Object` is a nicer, since it has better support for both iteration and checking if a key exists.
We also change the initial values to be `null`, rather than empty strings, and reduce duplication when creating the `Map`.
*Please note:* Since this is worker-thread code, these changes are "invisible" at the API-level.
For simplicity we will abort /Form XObject parsing *immediately* when encountering a circular reference, rather than letting it continue up until some limit (as e.g. PDFium appears to do), which should be fine since there are never any guarantees if/how *corrupt* PDF documents will render.
In rare cases /Resources are also found in the /Contents stream-dict, in addition to in the /Page dict, hence we need to prefer those when available; see `issue18894.pdf`.
Previously we'd only do this for Type1/CFF fonts, see e.g. PR 6736, since the font-program may update the /FontMatrix.
However, it seems that we should do this unconditionally to account for fonts with non-default /FontMatrix-entries in the font-dictionary (which seem to be pretty rare).
In PDF version 2.0 the handling of Indexed color spaces was clarified as follows:
> The index value should be an integer in the range 0 to hival. If the value is a real number, it shall be rounded to the nearest integer (0.5 values shall be rounded up); if it is outside the range 0 to hival, it shall be adjusted to the nearest value within that range.
Please refer to https://github.com/pdf-association/pdf-differences/tree/main/IndexedColor
After the introduction of `OffscreenCanvas` support we now have *two separate* mask-methods in the `PDFImage` class, and the reason that they were not combined is likely that we need the "raw" bytes when parsing Type3-glyph image masks.
However, that case is easy to support simply by disabling `OffscreenCanvas` usage when parsing Type3-glyphs and that way we're able to reduce some code duplication.
Another slightly strange property of the `PDFImage.createMask` method is that it needs various image-dictionary parameters *manually* provided, which is probably because this is very old code.
That feels slightly unwieldy, and we instead change the method to pass in the image-stream directly and do the necessary data-lookup internally.
A side-effect of this re-factoring is that we now support using the custom `isSingleOpaquePixel` operator in Type3-glyphs, which shouldn't hurt even though it seems extremely unlikely for that to ever happen in Type3-glyphs.
This disallows the following types of `export` declaration:
- `export class A {}`/`export function A() {}`
- `export default class A {}`/`export default function A() {}`
- `export let A`/`export const A`/`export var A`
While allowing
- `export { A }`
- `export default A`
The current text layer approach based on absolutely positioned
`<span>` elements by default causes flickering with text selection,
and we have browser-specific workarounds to solve that.
In Chrome, the workaround involves moving the `.endOfContent` element to
right after the last element that contains some selected content. This
works well in simple PDFs, but breaks when we have `span.markedContent`
elements. Given a text layer structure like the following, rendered
as four consecutive lines:
```html
<span class="markedContent">
<br>
<span>development enter the construction phase (estimated at around</span>
</span>
<span class="markedContent">
<br>
<span>300 MEUR).</span>
</span>
<span class="markedContent">
<br>
<span>Kreate's EBITA increased to 2.8 MEUR (Q4'23: 2.7 MEUR) and the</span>
</span>
<span class="markedContent">
<br>
<span>margin rose to 3.7% (Q4'23: 3.4%). However, profitability was</span>
</span>
```
when starting to select from inside the first line and dragging down
to the empty space after the second line, Chrome will anchor the
selection at the beginning of either the `<br>` or the `<span>` inside
the last `.markedContent`, depending on whether the selection is in
"per-character mode" (i.e. click and drag) or "per-word mode" (i.e.
double click and drag). This causes us to insert the `.endOfContent`
element in the wrong place (one element too far), which causes one
more line to be selected, which triggers another `"selectionchange"`
event, which causes us to move `.endOfContent` again, and so on, looping
until when the whole page is selected.
This commit fixes the issue by making sure that when the end of the
selection range points to the _beginning_ of an element, we walk back
the DOM to find the first non-empty element, and attach `.endOfContent`
to the end of that.
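A rough sketch of the walk-back, with the exact traversal simplified:
```js
// When the range ends at offset 0 (the very beginning of a node), step
// backwards through the DOM to the first preceding non-empty node and
// anchor `.endOfContent` there instead.
function findAnchorNode(range) {
  let node = range.endContainer;
  if (range.endOffset === 0) {
    do {
      node = node.previousSibling ?? node.parentNode;
    } while (node && node.textContent === "");
  }
  return node;
}
```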
These `getAll` methods are not used anywhere within the PDF.js code-base, outside of tests, and were mostly added (speculatively) for third-party users.
To still allow access to the same data we instead introduce iterators on these classes, which (slightly) shortens the code and allows us to remove the `objectFromMap` helper function.
A summary of the changes in this patch:
- Replace the `getAll` methods with iterators in the following classes: `AnnotationStorage`, `Metadata`, and `OptionalContentGroup`.
- Change, and also re-name, `AnnotationStorage.prototype.setAll` into a test-only method since it's not used elsewhere.
- Remove the `Metadata.prototype.has` method, since it's only used in tests and can be trivially replaced by calling `Metadata.prototype.get` and checking if the returned value is `null`.
In order to use the PDF.js library in Node.js environments the `process.getBuiltinModule` functionality must be available, which was released in [version `20.16.0`](https://nodejs.org/en/blog/release/v20.16.0), however we've seen repeated issues filed by users on older `20.x` versions.
We want to iterate through the data in the `computeMD5` function, and `Map`s have "nicer" support for that than generic objects.
(Somewhat recently `Map` performance was improved in Firefox, however this also isn't really performance sensitive code.)
Note that we load all wasm-files manually, however the Emscripten Compiler (emcc) unfortunately generates `URL`s for fallback wasm-file loading.
In the PDF.js build-scripts we work-around that by using suitable Webpack-options, however that apparently doesn't work when third-party users re-bundle our code and we thus try to work-around this by adding "ignore comments" to these `URL`s (similar to how we handle `import`-statements).
It seems that the `@napi-rs/canvas` dependency has *basic* canvas-filter support, whereas the "old" `canvas` dependency didn't, hence we no longer need the Node.js-specific checks in the `src/display/canvas.js` file.
Note that I've successfully tested the [`pdf2png` example](https://github.com/mozilla/pdf.js/tree/master/examples/node/pdf2png) with this patch applied and things appear to work as before.
After PR 19731 the format of compiled Type3-glyphs is now simple enough that the compilation can be moved to the worker-thread, without introducing any significant additional complexity.
This allows us to, ever so slightly, simplify the implementation in `src/display/canvas.js` since the Type3 operatorLists will now directly include standard path-rendering operators (using the format introduced in PR 19689).
As part of these changes we also stop caching Type3 image masks since: we've not come across any cases where that actually helps, they're usually fairly small, and it simplifies the code.
Note that one "negative" change introduced in this patch is that we'll now compile Type3-glyphs *eagerly*, whereas previously we'd only do that lazily upon their first use.
However, this doesn't seem to impact performance in any noticeable way since the compilation is fast enough (way below 1 ms/glyph in my testing) and Type3-fonts are also limited to just 256 glyphs. Also, many (or most?) Type3-fonts don't even use image masks and are thus not affected by these changes.
Using the Firefox profiler (with JS allocations tracking) and wuppertal.pdf, I noticed
we were using a bit too much memory in a function which is supposed to just compute two numbers.
The memory used by itself isn't so important, but having too many objects leads to wasting some time
GC'ing them.
So this patch aims to simplify it a bit.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 5 to 0 by fixing the following issues
in the "basic operations" block:
- Most tests relied on the first test to enable freetext editing mode.
For isolation we now do it explicitly in all tests.
- Most tests relied on the other tests having created editors. For
isolation we now create the editors explicitly in the tests themselves.
- Most tests relied on previous tests for the editor numbering. For
isolation we change the editor numbering to the one after initial
document load. Since we can't have state (editors) from a previous
test anymore we can remove various `clearAll` calls as well.
Currently we explicitly specify the fn-`OPS` both when adding entries to the operatorList and to the image-caches, and by using a temporary variable we can reduce a bit of duplication (similar to the existing args-handling).
The color-parameter is already available through `IR` (i.e. the internal representation), and after the changes in PR 4824 (which landed in 2014) we no longer need any special handling for it.
In the included PDF document the Type3-font doesn't contain any glyph definition for "space", despite that character being referenced in the /Contents stream.
While missing Type3-glyphs obviously cannot be rendered, we still need to update the current canvas position such that any char/word-spacing is correctly applied.
The test-case was found at https://github.com/pdf-association/pdf-differences/tree/main/Type3WordSpacing
Similar to Webpack there's apparently other bundlers that will not leave `import`-calls alone unless magic comments are used.
Hence we extend the builder to also append `/* @vite-ignore */` comments to `import`-calls, in order to attempt to improve support for using the PDF.js builds together with Vite.
This patch also renames `__non_webpack_import__` to `__raw_import__` since the functionality is no longer bundler-specific.
***PLEASE NOTE:*** This patch is provided as-is, and it does *not* mean that the PDF.js project can/will provide official support for Vite.
Originally this function would "manually" invoke the rendering commands for Type3-glyphs, however that was changed some time ago:
- Initial `Path2D` support was added in PR 14858, but the old code kept for Node.js compatibility.
- Since PR 15951 we've been using a `Path2D` polyfill in Node.js environments.
Hence, after the previous commit, we can further simplify this function by *directly* returning/using the `Path2D` object when rendering Type3-glyphs; see also https://github.com/mozilla/pdf.js/pull/19731#discussion_r2018712695
While this won't improve performance significantly, when compared to the introduction of `Path2D`, it definitely cannot hurt.
Rather than updating the transform every time that we're painting a Type3-glyph, we can instead just compute the "final" coordinates during building of the `Path2D` objects.
NVDA behaves differently depending on whether the user is hovering over or focusing an added signature.
An aria-description is read in both cases, while an aria-label is not.
By changing the return format of the various `pointsCallback` functions we can use the `Util.rectBoundingBox` helper in the `MarkupAnnotation.prototype._setDefaultAppearance` method as well, thus shortening the code slightly.
This avoids the current situation where we're accessing it through various dictionaries, since that's a somewhat brittle solution given that in the general case a `Dict`-instance may not have the `xref`-field set (see e.g. the empty-Dict).
By tweaking a few local variable names we can shorten various viewer-component initialization code, and we can also reduce some duplication when assigning components to the `PDFViewerApplication`-scope.
Currently we have a `Util`-helper for computing the bounding-box of a Bézier curve, however for simple points and rectangles we repeat virtually identical code in many spots throughout the code-base.
- Introduce new `Util.pointBoundingBox` and `Util.rectBoundingBox` helpers.
- Remove the "fallback" from `Util.bezierBoundingBox` and only support passing in a `minMax`-array, since there's only a single call-site using the other format and it could be easily updated.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 6 to 5 by fixing the following issues
in the "create editor with keyboard" block:
- The second test relied on the first test to enable freetext editing
mode and put focus on the page (annotation layer). For isolation we
now do that explicitly in the second test.
- The second test relied on the first test for the editor numbering. For
isolation we change the editor numbering to the one after initial
document load.
Moreover, the test names have been updated to clarify which scenario is
being tested, which came up during comparison of the changes against
commit ea5eafa to make sure that we are still testing the originally
intended scenarios (confirmed by disabling the relevant code from the
commit per scenario and noticing the corresponding test failing).
The rendering bug with issue17779.pdf is due to the fact that we call save on the suspended ctx
but not on the current ctx. So each time we have something like save/transform/restore, the
transform is not "removed" when restoring.
So this patch just applies the save/restore operations to the ctx, and they are mirrored on the
suspended one.
This commit reduces the number of freetext editor integration test suite
failures, in full isolation, from 8 to 6 by fixing the following issues
in the "move editor with arrows" block:
- The second and third test relied on the first test to enable freetext
editing mode. For isolation we now do it explicitly in both tests.
- The second test relied on the first test having created an editor. For
isolation we now create the editor explicitly in the second test.
- The third test relied on the previous tests for the editor numbering.
For isolation we change the editor numbering to the one after initial
document load. Since we can't have state (editors) from a previous
test anymore we can remove the `clearAll` call as well.
With this patch, all the path components are collected in the worker until a path
operation is met (i.e. stroke, fill, ...).
Then in the canvas a Path2D is created and replaces the path data transferred from the worker;
this way, when rescaling, the Path2D can be reused.
In terms of performance, using Path2D very slightly improves speed when scaling the canvas.
To avoid being able to introduce dependencies between tests, and to
bring existing dependencies to the surface, this commit makes sure that
we close the document between tests so that we can't accidentally rely
on state set by a previous test. This prevents multiple tests from
failing if one of them fails and makes debugging easier by being able to
run each test on their own independent of other tests.
This commit, combined with the previous ones, is enough to make the
scripting integration test suite pass consistently if random mode in
Jasmine is enabled, proving that the tests are fully isolated now.
The integration tests are order-dependent because they rely on input
field state set by a previous test. This commit fixes the issue by
updating the values to match the initial state of the document, which
makes sure that we don't build upon values from previous tests while
still testing the intended logic in the individual tests like before.
By checking for the expected value directly we can shorten the code, and
it simplifies removing the dependencies between the tests in the next
commit (by having fewer places to change). Note that this follows the
same pattern as PRs #19192, #19001 and #18399 and also helps to remove
any further possibilities for intermittent failures.
It's already enabled by default in Firefox, and since there are no open issues regarding auto-linking I suppose that we can attempt to enable it unconditionally.
- Let the `writeString` helper function return the new offset, to avoid having to recompute that in multiple spots.
- In the `computeMD5` helper function we can create the `md5Buffer` via Array-destructuring, rather than using a manual loop.
Currently the MD5 computation doesn't actually work (at all?), since we're invoking the `calculateMD5` function without providing all of the necessary parameters and the PDF-data thus isn't taken into account.
Fixing this caused unit-tests to fail, which isn't that surprising since the current date/time is used in the MD5 computation, and we thus utilize Jasmine to work-around that.
- Point the `addSignatureDescription` and `editSignatureDescription` labels to their actual `input`-elements (this way clicking the label will actually focus the input).
- Add the event listener to the `addSignatureDescription`-input, rather than its `span`-element (this is consistent with the `editSignatureDescription` case).
- Correctly check if the `addSignatureDescription`-input is empty, since we're accidentally comparing with its `span`-element.
- Remove unbalanced, and likely accidentally added, `</span>` tags.
This is an admittedly very basic polyfill, to allow us to remove a bunch of inline feature testing, that I've thrown together based on reading https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static and related MDN articles.
Compared to PR 19218 it's obviously much more "primitive", however the implementation is simple and it doesn't suffer from any licensing issues (since I wrote the code myself).
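For illustration, a basic polyfill along those lines could look as follows (a sketch, not the actual PDF.js code):

```js
if (typeof AbortSignal.any !== "function") {
  AbortSignal.any = function (signals) {
    const controller = new AbortController();
    for (const signal of signals) {
      if (signal.aborted) {
        // An input signal already aborted: propagate immediately.
        controller.abort(signal.reason);
        break;
      }
      signal.addEventListener(
        "abort",
        () => controller.abort(signal.reason),
        // Once our own signal aborts, the remaining listeners are removed.
        { once: true, signal: controller.signal }
      );
    }
    return controller.signal;
  };
}
```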
Looking at PR 18492 it doesn't seem that the `$percent` variable, for the `pdfjs-editor-new-alt-text-ai-model-downloading-progress` l10n-string, was ever used.
This integration test fails intermittently because of concurrent
clipboard access due to running the test in parallel in both browsers.
It can be reproduced by introducing `await waitForTimeout(1000)` between
the copy and paste operations.
This commit fixes the issue by running the test sequentially instead,
mirroring the change from commit 0e94f2bd.
This commit improves validation of the API options for the ICC color
space logic. If `useWasm` is `true` but the corresponding `wasmUrl`
or `iccUrl` API options are not provided we can avoid requesting
files with `null` URLs which always results in a 404 response.
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
This is a major version bump, and the changelog at
https://github.com/metalsmith/layouts/releases/tag/v3.0.0
indicates a breaking change that impacts us, namely that we need to
explicitly define the pattern and transformer that we wish to use.
- Add a couple of `limit` parameters in cases where those were "missing", for `String.prototype.split()` calls, to avoid unnecessary parsing.
- Remove some "pointless" initial trimming of leading/trailing spaces, since that's already done at a later parsing step in many cases.
Given that the `draw` method is already asynchronous we can easily inline this old helper method, which shortens the code and improves consistency in the code-base (note the `BasePDFPageView`-implementation).
Having some interactive elements forces screen readers to switch to form mode,
and consequently they delegate keyboard handling to the browser.
This patch sets an aria label on each editor in order to have a better description than just
'application'.
Currently we lookup the `devicePixelRatio`, with fallback handling, in a number of spots in the code-base.
Rather than duplicating code we can instead add a new static method in the `OutputScale` class, since that one is now exposed in the API.
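A possible shape for such a method (the exact name is an assumption):

```js
class OutputScale {
  // Centralized devicePixelRatio lookup with fallback handling, replacing
  // the per-call-site feature testing; defaults to 1 when the environment
  // doesn't provide `devicePixelRatio` (e.g. Node.js).
  static get pixelRatio() {
    return globalThis.devicePixelRatio || 1;
  }
}
```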
This patch extends the approach of PR 14543, by also treating e.g. minus signs followed by '(' or '<' as zero.
Inside of a /Contents stream those characters will generally mean the start of one or more glyphs.
With the exception of a different hashing function these classes are identical, and the base class thus helps reduce code duplication.
This patch reduces the size of the `gulp mozcentral` build by 1344 bytes, which isn't a lot but still cannot hurt.
This patch fixes:
- the style of the primary/secondary buttons in the dialog, which wasn't fully compliant with the latest specs.
- the style of the input field in HCM (wrong background)
- the color of the link in the image tab.
When the DOM structure of the viewer was updated in PR 18385 it caused the `secondaryToolbar` to accidentally start closing when clicking inside of it, since the `secondaryToolbar` now resides *under* the `toolbar` in the DOM.
**Steps to reproduce:**
- Open the viewer.
- Open the `secondaryToolbar`.
- Try to change document rotation at least *twice*.
**Expected behaviour:**
The document rotation can be changed an arbitrary number of times.
**Actual results:**
The `secondaryToolbar` closes after changing rotation just once.
For Type3 glyphs with `d1` operators it's easy to compute a fallback bounding box, however for `d0` the situation is more difficult.
Given that we nowadays compute the min/max of basic path-rendering operators on the worker-thread, we can utilize that by parsing these Type3 operatorLists to guess a more suitable fallback bounding box.
This addresses an inconsistency in the viewer, since the thumbnails don't respect the `maxCanvasPixels` option.
Note that, as far as I know, this has not led to any bugs since the thumbnails render with a fixed (and small) width, however it really cannot hurt to address this (especially after the introduction of the `maxCanvasDim` option).
To support this a new `OutputScale`-method was added, to avoid having to duplicate code in multiple files.
One of the images has a corrupt SMask, where the /Height-entry is bogus; see the excerpt below (via https://brendandahl.github.io/pdf.js.utils/browser/).
```
SMask (stream) [id: 17, gen: 0]
ColorSpace = /DeviceGray
Height = /Length
Subtype = /Image
Filter = /FlateDecode
Type = /XObject
Width = 157
Matte (array)
BitsPerComponent = 8
Length = 3893
<view contents> download
```
Hence we enable SMask/Mask images to fallback to the parent image dimensions, and also add more validation of the width/height to get a better error message when that data is wrong.
These methods were introduced in *the first* commit of PR 4938, however they became unused in *the second* commit of that PR.
Hence it seems that we've been accidentally shipping a small amount of unused code for over a decade.
While initializing the `padded` Uint8Arrays we're manually assigning zeros to some entries, which is completely unnecessary since that's the default value for a TypedArray, and instead we can just increment the index.
When creating the `StyleMapping` for the "xfa-font-horizontal-scale" and "xfa-font-vertical-scale" properties there's currently pointless `Math.min` usage, since we're not actually comparing with anything.
When zooming, we should skip rendering the detail canvas until the
zoom is done, similarly to how normal page rendering is delayed.
To do so, it is enough to skip the detail view while zooming, since the
main view rendering that already happens after the delay will also
trigger rendering of the detail views.
This combines the `PDFFunctionFactory.create` and `PDFFunctionFactory.createFromArray` methods, which helps simplify and shorten the code.
Additionally, to simplify the parameter handling we pass the `PDFFunctionFactory`-instance directly to the various `PDFFunction`-methods.
This may not be possible to trigger in practice, however it seems that if `StampAnnotation.prototype.mustBeViewedWhenEditing` is called back-to-back with `isEditing === true` set then the second invocation could overwrite the `#savedHasOwnCanvas`-field and thus lose its initial state.
Currently we have a number of spots in the code-base where we need to clamp a value to a [min, max] range. This is either implemented using `Math.min`/`Math.max` or with a local helper function, which leads to some unnecessary duplication.
Hence this patch adds and re-uses a single helper function for this, which we'll hopefully be able to remove in the future once https://github.com/tc39/proposal-math-clamp/ becomes generally available.
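A minimal sketch of such a helper (name assumed), mirroring the semantics of the proposed `Math.clamp`:

```js
function clamp(value, min, max) {
  // Returns `value` constrained to the inclusive [min, max] range.
  return Math.min(Math.max(value, min), max);
}

clamp(150, 0, 100); // 100
clamp(-5, 0, 100); // 0
clamp(42, 0, 100); // 42
```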
With the changes in PR 19564 the actual `ColorSpace`-classes were separated from the various static "helper" methods.
Hence it seems that we can now simplify/shorten this old code to instead cache the "standard" ColorSpaces directly on the `ColorSpaceUtils`-class.
This patch reduces the number of `ColorSpaceUtils` static-methods, and in particular the `parseAsync` method is removed and it's now instead possible to have `parse` optionally return a Promise.
This thus removes the need to manually check if a `ColorSpace`-instance is cached, note the changes in the `src/core/evaluator.js` file.
Browsers not only limit the maximum total canvas area, but additionally also limit their maximum width/height which affects PDF documents with e.g. very tall and narrow pages.
To address this we add a new `maxCanvasDim` viewer-option, which in Firefox will use a browser preference, such that both the total canvas area and the width/height will affect when CSS-zooming is used.
Some ColorSpaces can reference other ColorSpaces, and since it's fairly common for many pages to use the same ColorSpaces (see e.g. `issue17061.pdf`) this can help reduce a lot of unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces.
The current logic assumes that all spans in the text layer contain
only one text node, and thus that the position information
returned by `highlighter._convertMatches` can be directly used
on the element's only child.
This is not true in case of highlighted search results: they will be
injected in the DOM as `<span>` elements, causing the `<span>`s
in the text layer to have more than one child.
This patch fixes the problem by properly converting the (span, offset)
pair into a (textNode, offset) pair that points to the right text node.
Update the test to wait for the `pagerendered` event of a specific
page (1), so that the `pagerendered` event of other pages from a
previously running render doesn't resolve the `waitForPageRendered`
promise.
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`, which should help reduce unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces once we properly support those.
Currently we re-implement the same helper function twice, which in hindsight seems like the wrong decision since that way it's quite easy for the implementations to accidentally diverge.
The reason for doing it this way was because the code in the worker-thread is able to check for `Ref`- and `Name`-instances directly, which obviously isn't possible in the viewer but can be solved by passing validation-functions to the helper.
This appears to have broken in PR 17431, since prior to that the `downloadFile` function returned a string (via a callback) on failure and now we're instead rejecting with an Error.
Auto-linking requires a normal textLayer which XFA documents obviously don't have, and currently XFA documents cause "pointless" error messages to be logged in the console.
Currently we're first loading the font, and then for Type3 fonts we're invoking `loadType3Data` every time that the font is encountered.
That seems completely unnecessary, and it's probably connected to the age of this code, since the `loadType3Data`-method will only run once anyway (note the caching).
The `viewerCssTheme` was removed in #17222 and subsequently reenabled in #17293,
but only for Chromium and generic builds. This commit reenables the
function using the new method introduced in #17293.
This restores the behaviour that existed prior to PR 19128, see e.g. 8727a04ae5/web/pdf_page_view.js (L908-L911), since `RenderingCancelledException` should still overwrite any previously seen Error.
Currently this `assert` isn't actually doing what it's supposed to, since the `FontLoader`-class doesn't have a `disableFontFace`-field.
The `FontFaceObject`-class on the other hand has such a field, hence we update the method-signature to be able to check the intended thing.
None of the "composite", "subtype", or "type" properties are normally used on the main-thread and/or in the API, hence there's no need to include them in the exported font-data by default.
Given that these properties may still be useful when debugging, and that `debugger.mjs` actually relies on the "type" property, they will instead only be sent to the main-thread when the `fontExtraProperties` API-option is used.
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
These options are needed in the `FontFaceObject` class, and indirectly in `FontLoader` as well, which means that we currently need to pass them around manually in the API.
Given that the options are (obviously) available on the worker-thread, it's very easy to just provide them when creating `Font`-instances and then send them as part of the exported font-data. This way we're able to simplify the code (primarily on the main-thread), and note that `Font`-instances even had a `disableFontFace`-field already (but it wasn't properly initialized).
Web archive no longer has the revision for the saved PDF referenced in
that test case, so this updates that link to a more recent revision to
enable the tests to run on a clean clone.
The idea is to avoid having the pasted editor hiding the copied one:
the user could think that nothing happened.
So the top-left corner of the pasted one is moved to the bottom-right corner of the copied one.
This replaces the various copies of this logic with a single helper that
we template for each editor type, similar to what we already do for the
`switchToEditor` helper.
When a drawing was moved with the arrow keys and then printed or saved, the move wasn't applied to the output.
So the fix is just about calling onTranslated once the translation is done.
With the recently added OpenJPEG no-wasm fallback we need to send the `wasmUrl` option to the worker-thread *regardless* of the value of the `useWorkerFetch` option, since the fallback won't work if we don't have a URL to `import` it from.
For consistency the code is re-factored to always send the factory-urls to the worker-thread, and simply check the `useWorkerFetch` option there instead.
Also, as a follow-up to PR 19525, introduce a new `useWasm` option that can be used in e.g. browser-tests to forcibly disable WebAssembly usage.
This patch fixes an exception that was thrown when pasting.
And while writing the test and comparing the paths in the svg, I found a difference
which is fixed thanks to calling the right constructor (to take into account the inheritance)
in inkdraw.js.
When scrolling quickly, the constant re-rendering of the detail view
significantly affects rendering performance, causing Firefox to
not render even the _background canvas_, which is just a static canvas
not being re-drawn by JavaScript.
This commit changes the viewer to only render the detail view while
scrolling if its rendering hasn't just been cancelled. This means that:
- when the user is scrolling slowly, we have enough time to render the
detail view before we need to change its area, so the user always
sees the full screen in high resolution.
- when the user is scrolling quickly, as soon as we have to cancel a
rendering we just give up, and the user will see the lower resolution
canvas. When then the user stops scrolling, we render the detail view
for the new visible area.
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed-in PDFs to look blurry, and the text to be potentially
unreadable.
This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render a portion of a
page in a smaller canvas without hitting the maximum canvas size limit.
Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
This base class contains the generic logic for:
- Creating a canvas and showing it when appropriate
- Rendering in the canvas
- Keeping track of the rendering state
Those type tests are performing type checking on a project using DOM APIs, intended to reflect the usage in a non-node project.
Not loading the node types in that project ensures that the library type declarations don't force a dependency on the node types.
Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block.
Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.
During the XRef stream parsing we're attempting to lookup an entry that hasn't yet been found, since parsing is currently running, and given that we'd also cache free/missing XRef entries we'd then return an incorrect value during normal PDF parsing.
The simplest solution here is to just not cache free/missing XRef entries, since a properly generated PDF document shouldn't be trying to access objects it doesn't contain.
Furthermore, the amount of "extra" parsing now needed for such XRef entries shouldn't be significant enough to be an issue.
It fixes #19505.
We were invalidating throwing actions (by setting event.rc to false) and the whole event process was stopped.
Now we're just dumping the exception in the console: the action is skipped and event.rc is not set,
otherwise the input fields wouldn't be updated with Keystroke actions.
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
If either of the factory-urls are missing or invalid, the fallback value would currently become `useWorkerFetch === null`.
While that is obviously a falsy value, which means that the code still works as intended, we should ensure that this is consistent.
- Use `TypedArray.prototype.set()` rather than a manual loop when building the `key`.
- Use an existing local variable to avoid re-computing the length of the `encryptionKey`.
It's now pretty common that we only want to close a `dialog` *if* it's currently active, to avoid throwing errors, and this new method provides a shorter and more convenient way to achieve that.
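A sketch of what such a method could look like (all names here are assumptions):

```js
async function closeIfActive(overlayManager, dialog) {
  // Only close the dialog when it's the currently active one, since
  // closing an inactive dialog would throw.
  if (overlayManager.active === dialog) {
    await overlayManager.close(dialog);
  }
}
```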
Rather than modifying the "raw" dimensions of the page, we'll instead apply the `userUnit` as an *additional* scale-factor via CSS.
*Please note:* It's not clear to me if this solution is fully correct either, or if there's other problems with it, but it at least *appears* to work.
---
With these changes, the following CSS variables are now assumed to be available/set as necessary: `--total-scale-factor`, `--scale-factor`, `--user-unit`, `--scale-round-x`, and `--scale-round-y`.
While investigating a bug, which I've not yet had time to fully analyze and report, I found that if there's ever an error thrown from the `Autolinker` class it'll prevent the annotationEditorLayer from rendering *and* the renderTask itself will be treated as having failed.
Given that most inferred links will overlap existing LinkAnnotations, creating a lot of unused `borderStyle` objects seem unnecessary.
Hence we can move that into the `AnnotationLayer.prototype.addLinkAnnotations` method instead, which also allows us to slightly reduce the API-surface.
*Note:* For the issue mentioned on Matrix it'll obviously still make sense to improve the regular expression to detect more URL edge-cases.
However it occurred to me that even once that particular case is fixed there'll always be a risk that inferred links could overlap, and effectively block, the actual LinkAnnotations.
Hence this patch removes the URL comparison to ensure that overlapping inferred links will always be ignored.
To avoid being able to introduce dependencies between tests, and to
bring existing dependencies to the surface, this commit makes sure that
we close the document between tests so that we can't accidentally rely
on state set by a previous test. This prevents multiple tests from
failing if one of them fails and makes debugging easier by being able to
run each test on their own independent of other tests.
This commit, combined with the previous one, is enough to make the ink
editor integration test suite pass consistently if random mode in
Jasmine is enabled, proving that the tests are fully isolated now.
The second test of the basic operations block for the ink editor
depends on the first test to work. This becomes visible if we only run
the second test, using `fit`, which always fails with:
`ProtocolError: Waiting for selector '.annotationEditorLayer' failed:
Runtime.callFunctionOn timed out. Increase the 'protocolTimeout' setting
in launch/connect calls for a higher timeout if needed.`
The problem is that the second test doesn't enable the ink editor and
relies on the first test having done that already (because we don't
close the document between tests yet). This commit fixes the issue by
unconditionally enabling the ink editor in the second test to remove the
dependency between the two tests so they both pass in isolation.
This patch updates the minimum supported browsers as follows:
- Google Chrome 110, which was released on 2023-02-07; see https://chromereleases.googleblog.com/2023/02/stable-channel-update-for-desktop.html
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Currently we have three separate and virtually identical message handlers for this data, which can easily be combined into a single message handler instead.
Automatically detect links in the text content of a file and
generate link annotations at the appropriate locations to achieve
automatic link detection and hyperlinking.
With recent changed made to the GitHub issues-UI the "Blank issue" alternative is now showing up quite prominently, which can easily negate the point of our bug/feature templates and lead to incomplete issues being filed.
Given that we now have a few different factory-url parameters, we introduce a helper function for parsing them.
*Please note:* These parameters have always been documented as requiring a trailing slash[1], which we can now easily enforce during the `getDocument`-call.
---
[1] I recall that we've occasionally seen issues because users miss that detail, and the new Error should hopefully be more easily actionable than one thrown during rendering/parsing.
This pattern was already followed quite consistently outside of the
freetext editor integration tests, so this commit aligns the remaining
places for consistency. This also helps to make the tests more compact
and to reduce the number of changes in follow-up changes.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
freetext editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
This CSS feature is now available in *most* browsers that we support, with old Chromium-based browsers being the only exception; please see https://developer.mozilla.org/en-US/docs/Web/CSS/color_value/color-mix#browser_compatibility
From this data we see that the feature in question has been supported since Chrome 111, which was released on 2023-03-01 (i.e. almost two years ago).
Please note that we've never guaranteed that all features and functionality will be available in the oldest supported browsers.
Furthermore, even with the `color-mix` fallback removed PopupAnnotations will still function just as before but may render with the default color (defined in the CSS-file) rather than the one specified in the PDF document.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
highlight editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
At the time of PR 12896 the `fontBoundingBox{Ascent, Descent}` properties were not yet available by default in Firefox, however that's no longer the case since Firefox 116; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1801198.
Hence this patch which replaces the "full" fallback with a warning and uses the `ascent`/`descent` values from the fonts in the PDF document (as we did previously). Obviously the TextLayer won't look as good in that case, but it's a simpler and shorter solution.
Currently we're manually computing the width/height of the /Rect-entry in a number of spots throughout the worker-thread Annotation code, which these new getters help avoid.
Currently we're initializing the image-options for every page, which seems unnecessary since it should suffice to do that once per document.
Also, this changes the `BasePdfManager` constructor to improve readability/documentation a little bit.
This patch adds some code in order to extract a drawing as curves from an image.
The algorithm is basically the following:
- reduce the dimensions
- make it gray
- apply a bilateral filter in order to add some blurryness while keeping the edges
- compute the histogram
- guess the background color, which should contain a large majority of the pixels
- make a binary image
- extract the contours using the Suzuki algorithm
- apply the Douglas-Peucker algorithm in order to reduce the number of points (see the sketch below)
The algorithm is improvable but it should work pretty well if there's a clear difference between
the background and the drawing.
In a v2 we could use a ML model in order to improve the extraction.
There are a few changes related to the UI in order to make the tool usable, but they're very basic
for the moment.
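As an illustration of the point-reduction step above, here's a compact Douglas-Peucker sketch (illustrative only, not the actual implementation):

```js
// Reduces a polyline (array of [x, y] pairs) to the points that deviate
// more than `eps` from the chord between the segment endpoints.
function douglasPeucker(points, eps) {
  if (points.length <= 2) {
    return points;
  }
  const [x1, y1] = points[0];
  const [x2, y2] = points.at(-1);
  const dx = x2 - x1,
    dy = y2 - y1;
  const norm = Math.hypot(dx, dy) || 1;

  // Find the point farthest from the chord.
  let maxDist = -1,
    maxIndex = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const [x, y] = points[i];
    const dist = Math.abs(dy * x - dx * y + x2 * y1 - y2 * x1) / norm;
    if (dist > maxDist) {
      maxDist = dist;
      maxIndex = i;
    }
  }
  // All intermediate points are close enough: keep only the endpoints.
  if (maxDist <= eps) {
    return [points[0], points.at(-1)];
  }
  // Otherwise split at the farthest point and recurse on both halves.
  const left = douglasPeucker(points.slice(0, maxIndex + 1), eps);
  const right = douglasPeucker(points.slice(maxIndex), eps);
  return left.slice(0, -1).concat(right);
}
```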
This patch fixes a bug that caused incorrect curve shapes when an endpoint lies beyond the page boundaries. It adds a check for the endpoint's position, and if it is outside the page, the point is excluded from the shape's coordinates.
Steps to reproduce this in `master`:
1. Open https://mozilla.github.io/pdf.js/web/viewer.html
2. Use the "Open"-button (in the secondaryToolbar), or drag-and-drop, to load another PDF document.
3. Enable the highlight-editor.
4. Try to pick a new colour.
Note how it's no longer possible to change the default highlight-colour.
The reason for this is that we're only initializing the viewer-toolbar `ColorPicker` *once*, which doesn't work since every PDF document gets its own `AnnotationEditorUIManager`-instance. To address this we simply need to re-initialize the viewer-toolbar `ColorPicker`, and note that this patch won't affect the Firefox PDF Viewer.
With the changes in PR 18843 the `AnnotationEditorUIManager.prototype.updateMode` method is now asynchronous, which we need to take into account when dispatching the "annotationeditormodechanged" event.
After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).
Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.
Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
- We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
- We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
- The amount of available RAM on devices should be a lot higher, since more than a decade has passed.
A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).
---
[1] More specifically:
- `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
- `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
- `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
It fixes #19360.
Each glyph in the test case has a fill and a stroke pattern, so the current transform used
to scale the glyph outline must be the same.
When setting the stroke color to green, I noticed that the last outline contains some non-closed
subpaths, so when generating the glyph outline, every time we 'moveTo', we close the previous
subpath.
Code in the `web/` folder cannot import directly from the `src/` folder, since that could result in most (or all) main-thread code being bundled into the viewer, and must rather be imported via the `pdfjs-lib` alias.
Let's use ESLint to help enforce this, please find additional details in https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-restricted-paths.md
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
ink editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
In most integration tests we already use the pattern of defining the
editor selector once and reusing it in the rest of the test, but it's
not fully consistent everywhere yet. This commit fixes that for the
stamp editor integration tests, which has multiple advantages:
- it improves consistency between the various editor integration tests;
- it removes duplicate function calls and aligns variable definitions;
- it reduces the number of `getEditorSelector` calls and other helper
function calls that contained hardcoded IDs (by updating them to take
editor selectors as arguments instead of editor IDs), which helps to
isolate the tests and to simplify follow-up patches.
Puppeteer recently got updated to version 24.0.0+, so we're past the
23.9.1 and earlier versions where the integration test failed before, and
we can enable it again now that it passes in Chrome.
This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication.
This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication;
- it reduces the number of `getEditorSelector` calls that contained
hardcoded IDs, which helps to isolate the tests and to simplify
follow-up patches.
The `selectorEditor` name is used 57 times, but only in the freetext
editor integration tests file. The `editorSelector` name on the other
hand is used 241 times in all editor integration test files, so this
commit renames the former name to the latter name to achieve consistency
in variable names across all editor integration test files, which also
simplifies upcoming changes.
- Most of these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.
- A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.
- For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
This prepares for a future where we're using more than one wasm-file, originating in different `external/`-folders, by extending the existing `gulp.watch` usage.
The following diff illustrates how to add more entries:
```diff
diff --git a/gulpfile.mjs b/gulpfile.mjs
index 0e0a5a1ac..1502755be 100644
--- a/gulpfile.mjs
+++ b/gulpfile.mjs
@@ -655,6 +655,10 @@ function createWasmBundle() {
base: "external/openjpeg",
encoding: false,
}),
+ gulp.src(["external/foobar/*.wasm"], {
+ base: "external/foobar",
+ encoding: false,
+ }),
]);
}
@@ -2125,7 +2129,7 @@ gulp.task(
},
function watchWasm() {
gulp.watch(
- "external/openjpeg/*",
+ ["external/openjpeg/*", "external/foobar/*"],
{ ignoreInitial: false },
gulp.series("dev-wasm")
);
```
Rather than waiting for the upstream patch to reach the Firefox version we're using with Puppeteer, let's just set the same preference as done in https://phabricator.services.mozilla.com/D234320.
Currently we're not checking that the response is actually OK before getting the data, which means that rather than throwing an error we can get an empty `ArrayBuffer`.
To avoid duplicating code we can move an existing helper into `src/core/core_utils.js` and re-use it when fetching the JPX wasm-file as well.
Given that we nowadays provide default Node.js versions of these Factory-parameters it no longer seems necessary to mention that environment specifically.
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.
Besides an error message, the new `ResponseException` instances also include:
- A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.
- A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
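A sketch of how calling code might consume the new exception (the loading call and names besides the fields above are hypothetical):

```js
async function load(url) {
  try {
    return await fetchDocument(url); // hypothetical loading call
  } catch (ex) {
    if (ex instanceof ResponseException) {
      if (ex.missing) {
        // Cases where MissingPDFException was previously thrown.
        console.error(`Document not found (HTTP status ${ex.status}).`);
      } else {
        // Cases where UnexpectedResponseException was previously thrown.
        console.error(`Unexpected response (HTTP status ${ex.status}).`);
      }
    }
    throw ex;
  }
}
```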
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make it easier to use another wasm file (which must export
_jp2_decode, _malloc and _free).
When a dash separates two digits, it's very likely to not be a hyphen
inserted to split a word into two lines (e.g. "par\n-ser"), but rather
either a minus sign, a range, or a date. For example, in the tracemonkey
PDF there is `2008-02` (a date) split across two lines.
Preserving the dash, similarly to how we do for compound words, allows
searches for "2008-02" to find a match.
In Firefox, double-clicking on a stamp annotation triggers text
selection (selecting the last text element in the dom before the
annotation): this triggers the logic to make annotations not interfere
with text selection, which in turns prevents the double click from
triggering the annotation editor.
This commit fixes the problem by making annotations non-selectable, so
that clicking on them does not trigger text selection. Freetext
annotations were already non-selectable, so this commit doesn't change
that. However, we need to explicitly mark text in popups as selectable.
In the affected font the total number of mapping-entries is `1142348`, and no less than `997473` of them are duplicates.
Given that every duplicate causes a lot of Array elements to be moved this becomes extremely inefficient, which we can avoid by keeping track of seen `charCode`s and directly build the final mappings-Array instead.
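The deduplication idea, sketched with an assumed mapping shape:

```js
// Track seen charCodes so duplicates never enter the mappings array,
// avoiding the costly element moves caused by inserting duplicates.
const seenCharCodes = new Set();
const mappings = [];
for (const { charCode, glyphId } of rawMappings) {
  if (seenCharCodes.has(charCode)) {
    continue; // Duplicate entry, skip it.
  }
  seenCharCodes.add(charCode);
  mappings.push({ charCode, glyphId });
}
```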
Currently we re-implement a number of helper functions specifically for this code, which seems completely unnecessary since there's already general purpose ones available in the `src/core/core_utils.js` file.
It lets the user make a pinch gesture with a finger on a page with a drawing
and the second finger on another page.
On mobile, it's pretty easy to end up in such a situation.
It fixes #19239.
When the canvas doesn't exist the editor has no image: it's fine because the editor is invisible.
Once it's made visible, the canvas is set when the annotation layer has been rendered.
This appears to have regressed in PR 13808, since it removed the `matrix`-entry from array returned by the `MeshShading.prototype.getIR` method *without* also updating the indexes in the `MeshShadingPattern` constructor.
Reasons for removal:
- These tests never generated any warnings from OSS-Fuzz, in over a year.
- An error thrown during image decoding will lead to a broken/missing image, not a security problem.
- These tests rely on the Jazzer.js library, which has a number of problems: It now causes failures in Node.js v23 in the CI tests, it's no longer being maintained upstream, and it lacks support for some (fairly common) CPU architectures.
The ink editor integration tests already use a helper function for
committing the editor, so this commit mirrors the approach to the
freetext editor integration tests. This has multiple advantages:
- it improves consistency between the various editor integration tests;
- it makes the code easier to read/understand;
- it reduces code duplication (220 lines of code removed);
- it reduces the number of `getEditorSelector` calls (32 calls removed)
that contained hardcoded IDs, which helps to isolate the tests and to
simplify follow-up patches.
This commit applies the `waitForUnselectedEditor` helper function to the
remaining places, that most likely predate the introduction of the
helper function, to deduplicate the code and to have a unified way of
checking if a given editor is unselected.
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
There are a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually looking up the values, which after PR 19051 can be replaced with direct iteration instead.
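Sketched with hypothetical names, and assuming the iteration support added in PR 19051:

```js
// Before: loop over the keys, then look up each value manually.
for (const key of dict.getKeys()) {
  const value = dict.get(key);
  process(key, value);
}

// After: iterate the entries directly.
for (const [key, value] of dict) {
  process(key, value);
}
```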
The `getSelectedEditors` function is largely a copy of the `getEditors`
function, with three small differences:
1. `getEditors` allows getting any kind of editor whereas
`getSelectedEditors` is hardcoded to only getting selected editors.
2. `getEditors` returns editor selectors (strings) whereas
`getSelectedEditors` returns editor IDs (integers).
3. `getSelectedEditors` returns a sorted array of editor IDs whereas
`getEditors` does not ensure that the array is sorted.
This commit makes the `getEditors` function a drop-in replacement for
the `getSelectedEditors` function to deduplicate the code and to have a
unified way of getting editors.
Note that we don't actually use the contents of the returned array
(only its length), so we can safely change `getEditors` to return a
sorted array of integer editor IDs instead. Sorting the array makes the
return value deterministic, which is a nice property for test stability,
and integer IDs are also easier to handle in test assertions. Note that
the corresponding selector strings can also easily be obtained from the
integer IDs using the `getEditorSelector` function if needed.
Originally the code in this file was used in both the GENERIC and MOZCENTRAL builds, however that's no longer the case.
Hence we can now directly call `NetworkManager.prototype.request` and remove these old "helper" methods that only had a single call-site each.
From section [11.6.4.3 Mask Shape and Opacity](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G10.4848628) in the PDF specification:
- An image XObject may contain its own *soft-mask image* in the form of a subsidiary image XObject in the `SMask` entry of the image dictionary (see "Image Dictionaries"). This mask, if present, shall override any explicit or colour key mask specified by the image dictionary's `Mask` entry. Either form of mask in the image dictionary shall override the current soft mask in the graphics state.
As part of the changes in PR 4259, which landed over ten years ago, the `glyphNameMap` property on `Font`-instances was removed.
The reason that this didn't cause any bugs is that we always fallback on `getGlyphsUnicode`, and when using that data we also rely on `StandardEncoding`, hence we should just remove the unused parameter from the `Type2Compiled` constructor.
While [bug 1893645](https://bugzilla.mozilla.org/show_bug.cgi?id=1893645) was fixed some time ago now, it still shouldn't hurt to also assert that the `fontMatrix` is always valid when invoking the `compileGlyph` method.
Rather than having to manually implement the exception-handling for the "DocException" message, we can instead re-use (and slightly extend) the existing `wrapReason` function since that one already does what we need.
Furthermore, we can also simplify handling of the "PasswordRequest" message a little bit and again re-use the `wrapReason` function.
Finally, the patch makes the following smaller changes:
- Avoid needlessly re-creating exceptions in the `wrapReason` function.
- Use a slightly shorter parameter name in the `wrapReason` function.
- Remove the unused entries in the `CallbackKind`/`StreamKind` enumerations.
This commit makes sure that the font tests are no longer reported as
being unit tests and that the PDF.js logo is shown in the browser tab to
make PDF.js-specific resources/tabs more easily identifiable during e.g.
development.
The reporter is used in both the unit and the font tests, so this commit
moves it to the test root folder to more clearly indicate that this is a
shared resource and so the font tests don't have to reach into the unit
tests folder to import it (which improves separation).
This file is specific to the integration tests, so this commit moves it
to bundle the integration test logic a bit better and to match the
unit/font tests in terms of folder structure for consistency.
Without these calls we'll not actually wait for saving to complete when document destruction runs; compare with other `WorkerTask`-usage in this file.
While I cannot imagine that this has caused any problems for library users, the code is however not technically correct as-is.
Note that PR 19212 tried to change the test-server to fix the intermittent failure in Google Chrome, however that unfortunately caused *other* unit-tests to start failing.
As long as this unit-test still runs successfully in Mozilla Firefox that should be enough, and it doesn't seem like a good use of our time to hunt down a bug that only happens in Google Chrome.
When computing the left offset of the highlighted text, we cannot use
.offsetLeft because the text might have been scaled through CSS, and it
needs to be taken into account.
Use `.getClientRects()`/`.getBoundingClientRect()` instead, which will
return measurements scaled appropriately.
This bug only seems to reproduce in Google Chrome, since browsers apparently sort response headers differently.
When the issue occurs the "raw" response headers string looks like this:
```
content-length: 525404\r\ncontent-type: \r\n
```
and since we trim *any* leading/trailing white-space characters the "content-type" header isn't detected correctly, which thus leads to `new Headers(...)` throwing.
Hence we'll keep regular spaces at the end of the "raw" response headers string, while still removing all other kinds of trailing white-space characters.
*Note:* The response headers parsing was based on https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/getAllResponseHeaders#examples
- it's now possible to start a drawing with a pen and use fingers to zoom
or scroll without interacting with the current drawing;
- it's now possible to draw with a finger and then zoom with two fingers.
When an editor is moved with the keyboard or by dragging it, it is moved in the DOM in order
to make screen readers happy. But this move is slightly postponed thanks to a setTimeout(..., 0).
The failures were very likely due to the fact that intermittently the DOM move was done in
the middle of the next key sequence, which made the on-screen move fail.
Instead of conditionally checking if the `.canvasWrapper` already
has a child element and then inserting the `canvas` before it, we can
use `.prepend` which always injects the new element as the first
child.
While drawing, when zooming fast enough, it's possible, intermittently, to have the canvas
after the svg, which makes the svg invisible.
So this patch makes sure the canvas is at the right position.
It was due to the resize observer, which is removed thanks to this patch.
In order to reuse the dragAndDrop function in tests, this patch slightly refactors it
to make it easier to use.
The `Path2D` glyph-objects are cached on the `FontFaceObject`-instance, so we can save a little bit of memory by removing the raw path-strings once they're no longer needed.
The `must check input for US zip format` integration test fails pretty
consistently in Puppeteer 23.4.0+ with `Expected '12341' to equal
'12345'`. This is reproducible with the `pdf.sandbox.external.js` hack
from https://github.com/mozilla/pdf.js/issues/18396#issuecomment-2211273743.
Investigation uncovered two issues at play here:
1. We do two `clearInput` calls, but don't await processing of the two
sandbox events that are triggered by that action. The three tests that
use `issue14307.pdf` are in different `describe` blocks and therefore
reload the PDF file, so we can simply remove those calls because the
inputs are already empty by default.
2. We don't await processing of the sandbox events that occur after
switching to another text field. This causes the expectation failure
because the typing actions will happen too soon and interfere with
the sandbox event processing. We solve the issue by explicitly
awaiting the sandbox roundtrip.
Moreover, similar to PR #19001 and #18399 we remove any remaining room
for intermittent issues by directly checking for the expected value,
which also results in shorter code.
- Warn if the "regular" PDF.js build is used in Node.js environments, since that won't load any of the relevant polyfills.
- Warn if the `require` function cannot be accessed, since currently we're just "swallowing" any errors.
This requires two changes on our side:
- The order of exports in `web/viewer{-geckoview}.js` changes slightly
because `eslint-plugin-perfectionist` aligned the sorting order with
the `eslint-plugin-sort-exports` plugin we used before. This restores
the change from commit 347f155.
- The `eslint-plugin-import` plugin contains a bug that causes the new
version of `eslint-plugin-perfectionist` to be reported as unresolved.
This issue is tracked upstream, and since the plugin works fine we
can simply extend the ignore list we already have to avoid this error
until the upstream bug is fixed.
These DOM elements are `input type="checkbox"` and (natively) only support being toggled with the `Space` key, however we can extend an existing event-listener to "manually" support the `Enter` key as well.
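A sketch of the extended listener (element name assumed):

```js
checkboxButton.addEventListener("keydown", event => {
  if (event.key === "Enter") {
    event.preventDefault();
    // click() toggles the checkbox and fires the usual events, matching
    // the native Space behaviour.
    checkboxButton.click();
  }
});
```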
Converting errors to string drops their stack trace, making it more
difficult to debug their actual reason. We can instead pass the error
objects as-is to console.warn/error, so that Firefox/Chrome devtools
will show both the stack trace of the console.warn/error call, and the
original stack trace of the error.
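For example, with a hypothetical message and error:

```js
const reason = new Error("something went wrong");

// Before: the template literal stringifies the error, dropping its stack.
console.warn(`Loading failed: ${reason}`);

// After: pass the object as-is; devtools then show both the stack of the
// console call and the error's original stack.
console.warn("Loading failed:", reason);
```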
This commit also enables the `unicorn/no-console-spaces` ESLint rule,
which avoids accidental extra spaces when passing multiple parameters to
`console.*` methods.
This helps ensure that loading errors are always handled correctly, and note that both `PDFNetworkStreamFullRequestReader` and `PDFNetworkStreamRangeRequestReader` already provided such a callback.
Similar to the regular toolbarButtons that can be toggled, this ensures that it's always possible to tell when the findbar "buttons" are hovered/focused.
When a user deletes any number of annotations, they are notified of the action
by a popup message with an undo button. Besides that, this change reuses the
existing messageBar CSS class from the new alt-text dialog as much as possible.
Move .messageBar out of .dialog into its own standalone class in order
to reuse as much of it for the upcoming feature for an undo message for
annotations.
Some tests rely on the presence of a server that serves PDF files.
When tests are run from a web browser, the test files and PDF files are
served by the same server (WebServer), but in Node.js that server is not
around.
Currently, the tests that depend on it start a minimal Node.js server
that re-implements part of the functionality from WebServer.
To avoid code duplication when tests depend on more complex behaviors,
this patch replaces createTemporaryNodeServer with the existing
WebServer, wrapped in a new test utility that has the same interface in
Node.js and non-Node.js environments (=TestPdfsServer).
This patch has been tested by running the refactored tests in the
following three configurations:
1. From the browser:
- http://localhost:8888/test/unit/unit_test.html?spec=api
- http://localhost:8888/test/unit/unit_test.html?spec=fetch_stream
2. Run specific tests directly with jasmine without legacy bundling:
`JASMINE_CONFIG_PATH=test/unit/clitests.json ./node_modules/.bin/jasmine --filter='^api|^fetch_stream'`
3. `gulp unittestcli`
This allows us to remove an ESLint disable-statement for `arrow-body-style`, without affecting readability of the code, and fetching the metadata and the page in parallel should be a *tiny* bit more efficient as well.
- Use `this` in all scopes where that's possible, to avoid having to spell out `WorkerMessageHandler` everywhere.
- Inline the `isMessagePort` helper function, since there's only a single call-site.
- Use a static initialization block to move more code into the `WorkerMessageHandler` class itself.
This slightly shortens the code, in various `destroy`-methods, which cannot hurt.
Also, use pre-processor checks to simplify `PDFDocumentLoadingTask.destroy` in the Firefox PDF Viewer since the `PDFWorker.fromPort`-method isn't used there.
The date was created in UTC+0 and then amended using set-Month/Date, which take into account
the user timezone.
With this patch we build all the dates in the user timezone.
It helps to slightly decrease memory use by reducing the number of created arrays.
When searching for "a" in pdf.pdf, the time spent in getOriginalIndex is decreased by
around 30%.
This patch makes a clear separation between the way to draw and the editing stuff.
It adds a class DrawEditor which should be extended in order to create new drawing tools.
As an example, the ink tool has been rewritten in order to use it.
Given that `browsertest` repeatedly timeout in Google Chrome, and considering that Firefox is the primary development target, we stop running them on the bots to avoid having to repeatedly deal with this.
Note that we already disabled these tests *on Windows* almost three years ago, because of stability issues; see PR 14392.
test/unit/api_spec.js is the only JS file in the tree with trailing
whitespace. Because `trim_trailing_whitespace = true` in .editorconfig,
any editor supporting EditorConfig would trim whitespace when the file
is changed, which results in test failures.
This commit fixes the issue by trimming the trailing whitespace and
adjusting the test expectations.
The `Array` type takes one parameter that describes the possible types
of the inner elements. This parameter may consist of pipes to indicate
multiple possible types. However, the current type definition provides
multiple parameters; this is incorrect syntax, and TypeScript 5.7+ now
fails on this due to stricter syntax validation.
This commit fixes the issue by changing the type definition to the
proper syntax, which together with the accompanying comment about the
contents of the fingerprints array should document it sufficiently.
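For illustration (the element types here are hypothetical):

```js
// Invalid: `Array` takes a single type parameter, so multiple parameters
// are rejected by TypeScript 5.7+.
/** @type {Array<string, null>} */
let broken;

// Valid: combine the possible element types with pipes inside the single
// parameter.
/** @type {Array<string | null>} */
let fixed;
```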
The following cases are excluded in the patch:
- The Firefox PDF Viewer, since it has been fixed on the platform side already; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1683940
- The `PDFNodeStream`-implementation, used in Node.js environments, since after recent changes that code only supports `file://`-URLs.
Also updates the `PDFNetworkStreamFullRequestReader.read`-method to await the headers before returning any data, similar to the implementation in `src/display/fetch_stream.js`.
*Note:* The relevant unit-tests are updated to await the `headersReady` Promise before dispatching range requests, since that's consistent with the actual usage in the `src/`-folder.
The test-only createTemporaryNodeServer helper featured a path traversal
vulnerability. This enables attackers with network access to the device
to read arbitrary files while unit tests are running that activate this
test server.
This patch fixes the issue by validation of paths.
To test this vulnerability before the patch:
1. Run the test-only server:
```
node -e 'console.log(require("./test/unit/test_utils.js").createTemporaryNodeServer().port)'
```
2. From another terminal, send the following request (modify the port to
the port reported in the previous step):
```
curl --path-as-is http://localhost:45755/../../package.json
```
Before the patch, the second step would traverse the directory, and
return results from the root of the PDF.js repository, instead of files
within test/pdfs/.
With the patch, the server refuses the request with HTTP status 400.
This is fairly old code, and by making the function `async` we can handle initialization errors "automatically" without the need for try-catch statements.
And tweak the highlight one a bit (e.g. it's useless to have 64-bit floating point numbers
when 32-bit ones are enough).
It's a required step for the refactoring of the ink tool (in order to use the draw layer).
It avoids calling several functions acting on the same SVG element.
We can remove most feature testing from this helper function, with the exception of `randomUUID` since that's only available in "secure contexts", and also remove the fallback code-path.
Note that this code was only added for Node.js compatibility, and it's no longer necessary now that the minimum support version is `20`; see also https://developer.mozilla.org/en-US/docs/Web/API/Crypto#browser_compatibility
Finally, this patch also adds a basic unit-test for the helper function.
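The simplified helper then looks roughly like this (a sketch, not the exact code):
```js
function getUuid() {
  // `randomUUID` is only available in secure contexts, so this feature
  // testing has to stay.
  if (typeof crypto.randomUUID === "function") {
    return crypto.randomUUID();
  }
  // No feature testing for `crypto`/`getRandomValues` anymore, since
  // Node.js >= 20 and all supported browsers provide them.
  const buf = new Uint8Array(32);
  crypto.getRandomValues(buf);
  return Array.from(buf, b => b.toString(16).padStart(2, "0")).join("");
}
```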
JSON imports are now supported by all tools used in PDF.js' build
process. The `chromecom.js` file is bundled by webpack and
import attributes are thus removed, so browser compatibility for this
new syntax is not relevant.
This integration test fails intermittently, locally at least in Chrome
with Puppeteer 23.4.0+, with the following errors:
```
In chrome: Expected '123Hello' to equal 'Hello123'.
In chrome: Expected '123Hello' to equal '123'.
```
This happens because the test before it left queued sandbox events
behind. We don't close the document between tests, so those get run
when we click the textbox in this test and that interferes with our
selection/typing actions. This commit fixes the issue by flushing the
queued sandbox events in the first test, which makes sure that state
no longer leaks through to the next test and thus improves isolation.
Moreover, similar to commit 3adf8b6 we use safer assertions to avoid
further intermittent failures, and we replace the `page.$eval` call
with a simpler Home button press like we already do in e.g. the test
helpers. Combined, this makes the code shorter and simpler.
In PR #11300 Gitpod support got introduced, but we re-evaluated that
decision in #11732. In PR #11800 the support was partially reverted,
but the actual Gitpod files were kept to not outright break potential
workflows because at the time we were not sure if, and if so how often,
Gitpod was actually used for contributing to PDF.js.
However, in addition to the concerns mentioned in #11732 after five
years we haven't seen any contributions that clearly originated from
Gitpod, and the files have not been updated after e.g. PR #11807 and
PR #17913 because it's not a workflow that we maintain or are able
to test (nor have we seen Gitpod community contributions for this).
This commit therefore removes the remaining Gitpod files to reduce the
maintenance burden for PDF.js. Note that users of Gitpod can still
contribute to PDF.js via the platform; we just don't provide/manage
workspace files from this repository anymore.
- Use optional chaining when clearing various data-structures, since that's slightly shorter.
- Ensure that checking for the existence of `#hcmCache`-data won't cause it to be initialized. (Note how `this.#hcmCache` is actually a getter.)
- Actually reset the `#hcmCache`-data when that's appropriate, e.g. when closing the PDF document.
- Ensure that `pdfjsTestingUtils` is available when running benchmarking, since that shouldn't be done in TESTING-mode.
- Exclude the `test/stats/results/` folder from linting, since it'll contain *generated* JSON-files.
When iterating through `useCMap` the value is already available, without having to manually invoke the `lookup`-method.
While this will likely not affect performance in any noticeable way, it's nonetheless unnecessary to look up an already available value twice.
This was previously attempted in PR 13371, but had to be reverted because of issues related to SystemJS (which has since been removed).
Also, while unrelated, shortens an existing conditional assignment.
The purpose of these changes is to make it more difficult to accidentally include logging statements, used during development and debugging, when submitting patches for review.
For (almost) all code residing in the `src/` folder we should use our existing helper functions to ensure that all logging can be controlled via the `verbosity` API-option.
For the `test/unit/` respectively `test/integration/` folders we shouldn't need any "normal" logging, but it should be OK to print the *occasional* warning/error message.
Please find additional details about the ESLint rule at https://eslint.org/docs/latest/rules/no-console
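For instance, the test folders could use a configuration entry along these lines (illustrative only, not the exact project settings):
```js
{
  files: ["test/unit/**", "test/integration/**"],
  rules: {
    // Allow the occasional warning/error message, but flag plain
    // console.log statements left over from debugging.
    "no-console": ["error", { allow: ["warn", "error"] }],
  },
}
```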
Given that there are multiple issues with `ImageDecoder` in Chromium browsers, affecting both BMP and JPEG images, for now we (by default) disable that functionality there to avoid problems.
This also means that we can remove the previously added, and separate, `isChrome` API-option.
Python 3.13 is the current version and was released over a month ago
(see https://devguide.python.org/versions). The dependencies we use now
support Python 3.13, most importantly `fonttools` which uses OS-specific
builds and for which compatibility got introduced in
https://github.com/fonttools/fonttools/pull/3656 and the corresponding
`cp313` wheels for all distributions are published on
https://pypi.org/project/fonttools/#files.
Moreover, we fix a forgotten `npx` usage in the font tests README, which
was encountered while testing this patch.
This is unblocked now that all dependencies have been updated and the
flat configuration format (compatible with ESLint 8 and 9) was
introduced first. The following deprecation warnings during `npm
install` are resolved by this upgrade:
```
npm warn deprecated @humanwhocodes/config-array@0.13.0: Use @eslint/config-array instead
npm warn deprecated @humanwhocodes/object-schema@2.0.3: Use @eslint/object-schema instead
npm warn deprecated eslint@8.57.1: This version is no longer supported. Please see https://eslint.org/version-support for other options.
```
Note that according to https://eslint.org/version-support ESLint 8 is
officially EOL now, and ESLint 9 has been released for over seven
months and is the only officially supported version.
Fixes #17928.
This allows end-users to forcibly disable `ImageDecoder` usage, even if the browser appears to support it (similar to the pre-existing option for `OffscreenCanvas`).
Flat config is the new config system used by ESLint 9.
To make the migration easier, they also added
flat config support to ESLint 8.
This commit migrates the various ESLint configs in the repository to use
the new system, **without** upgrading to ESLint 9 yet.
It fixes #19008.
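In the flat config format the configuration becomes an exported array of config objects, roughly like this minimal (illustrative) example:
```js
// eslint.config.mjs
import globals from "globals";

export default [
  {
    files: ["src/**/*.js", "web/**/*.js"],
    languageOptions: { ecmaVersion: "latest", globals: globals.browser },
    rules: { "no-var": "error" },
  },
];
```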
In Firefox on macOS the default padding is set to 4px, and in Firefox for iOS it's set to 13px.
This padding is useless for such buttons.
Given that `ImageData` has been supported for many years in all browsers, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/ImageData#browser_compatibility), we have a `typeof` check that's only necessary in Node.js environments.
Since the `@napi-rs/canvas` package provides that functionality, we can thus add an `ImageData` polyfill which allows us to ever so slightly simplify the code.
The `@napi-rs/canvas` package has fewer dependencies, which should *hopefully* make installing and using it easier for `pdfjs-dist` end-users. (Over the years we've seen, repeatedly, that `canvas` can be difficult to install successfully.)
Furthermore, this package includes more functionality (such as `Path2D`) which reduces the overall number of dependencies in the PDF.js project.
One point to note is that `@napi-rs/canvas` is a fair bit newer than `canvas` and has a lot fewer users; however, looking at the commit history it does seem to be actively maintained.
Note that I've successfully tested the [Node.js examples](https://github.com/mozilla/pdf.js/tree/master/examples/node), in particular the `pdf2png` one, with this patch applied and things appear to work fine.
Please see:
- https://www.npmjs.com/package/@napi-rs/canvas
- https://github.com/Brooooooklyn/canvas
Fixes https://github.com/mozilla/pdf.js/issues/18957.
https://github.com/mozilla/pdf.js/pull/18682 introduced a regression that causes the following error:
```
Uncaught TypeError: Failed to construct 'Headers': Invalid name
at PDFNetworkStreamFullRequestReader._onHeadersReceived (pdf.mjs:10214:29)
at NetworkManager.onStateChange (pdf.mjs:10103:22)
```
The mentioned PR replaced a call to `getResponseHeader()` with `getAllResponseHeaders()` without handling cases where it may return null or an empty string. Quote from the [docs](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/getAllResponseHeaders#return_value):
> Returns:
>
> A string representing all of the response's headers (except those whose field name is Set-Cookie) separated by CRLF, or null if no response has been received. If a network error happened, an empty string is returned.
Run the following code and observe the error in the console. Note that the URL is intentionally set to an invalid value to simulate a network error:
```html
<script src="//mozilla.github.io/pdf.js/build/pdf.mjs" type="module"></script>
<script type="module">
var url = 'blob:';
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.mjs';
var loadingTask = pdfjsLib.getDocument(url);
loadingTask.promise
.then((pdf) => console.log('PDF loaded'))
.catch((reason) => console.error(reason));
</script>
```
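Handling the return value defensively, along these lines, avoids the constructor error (a sketch, not the exact fix):
```js
const rawHeaders = xhr.getAllResponseHeaders();
const headers = new Headers();
// `rawHeaders` may be null (no response yet) or "" (network error).
for (const line of (rawHeaders || "").trim().split(/[\r\n]+/)) {
  if (!line) {
    continue;
  }
  const [name, ...rest] = line.split(": ");
  headers.append(name, rest.join(": "));
}
```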
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure are missing, you might argue that it's not really a PDF document.
In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.
*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
This patch updates the minimum supported environments as follows:
- Node.js 20, which was released on 2023-04-18 and has now entered the "Maintenance"-phase; see https://github.com/nodejs/release#release-schedule
Furthermore, note also that Node.js 18 will fairly soon reach EOL.
This integration test fails intermittently because we're not
(correctly) awaiting the sandbox actions. The `27R` field in
`issue14862.pdf` triggers sandbox events for every typing action, but
for the backspace and "a" character typing actions we weren't awaiting
the sandbox trip at all, and for other places we weren't awaiting it
fully (causing some characters to be missed in the assertion).
This commit fixes the issues by using the appropriate helper functions,
similar to what we did in PR #18399. Not only is this shorter in terms
of code, but it also fixes the near-permafail for this test with newer
versions of Puppeteer.
- This helper function has only a single call-site, and the function is fairly short.
- It'll only be invoked if range requests are *disabled*, or if the entire PDF manages to load *before* the headers are resolved (which is very unlikely).
Hence, by default, this helper function is not invoked.
- By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code.
Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).
I discovered that doing skip-cache re-reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would *intermittently* cause (some of) the AnnotationLayers to break with errors printed in the console (see below).
In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will lookup various data and all of those cases can fail during loading when streaming and/or range requests are being used.
Finally, to prevent any future errors, ensure that the viewer won't break in these sorts of situations.
```
Uncaught (in promise)
Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
\#renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15
Uncaught (in promise)
Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
```
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
This way we can directly throw Errors, rather than having to "manually" return rejected Promises, which is ever so slightly shorter.
Also, since `useWorkerFetch` is always true in MOZCENTRAL builds these message-handlers should not be invoked there.
One goal is to make the code for drawing with the Ink tool similar to the free highlighting one:
it doesn't really make sense to have such different ways of doing almost the same thing.
When the zoom level is high, it avoids creating an overly large canvas covering the whole page, which consumes
more memory, makes the drawing very slow and the overall user experience pretty bad.
A second goal is to be able to easily implement more drawing tools where we would just have to implement
how to draw from the pointer coordinates.
We can convert the handler to an `async` function, which removes the need to create a temporary Promise here.
Given the age of this code it shouldn't hurt to simplify it a little bit.
This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64
*Please note:* These are not actual polyfills; they only implement what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.
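For reference, the new methods work along these lines (see the linked proposal):
```js
const bytes = Uint8Array.fromBase64("SGVsbG8=");
console.log(bytes.toHex()); // "48656c6c6f"
console.log(bytes.toBase64()); // "SGVsbG8="
```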
- Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already *implicitly* assuming a particular length.
- Move the constants into the `fingerprints`-getter, since they're not used anywhere else.
- Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method; see https://github.com/tc39/proposal-arraybuffer-base64
This extends PR 13796 to also handle the case where sub-streams contain invalid data, i.e. anything that isn't a Stream, however please note that in these cases there's no guarantee that we'll render the page "correctly".
Note that Adobe Reader, i.e. the PDF reference implementation, cannot render the last page of the referenced PDF document.
Currently we manually localize and update the DOM-elements of the AltText-button, and it seems nicer to utilize Fluent "properly" for that task.
This can be achieved by introducing an explicit `span`-element on the AltText-button (similar to e.g. the regular toolbar-buttons), and adding a few more l10n-strings, since that allows just setting the `data-l10n-id`-attribute on all the relevant DOM-elements.
Finally, note how we no longer need to localize any strings eagerly when initializing the various editors.
Move the `ImageResizer._goodSquareLength` definition into the class itself, since the current position shouldn't be necessary, and also convert it into an actually private field.
It fixes #18956.
In patch #18029, for performance reasons and because I thought it was useless, I deliberately chose not to fill the mask
with the backdrop color when it's fully black: that was a bad idea.
So in this patch we always add the backdrop color to the mask.
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Given that we've not shipped, nor used, anything except binary CMaps for years let's just cache them unconditionally (since that's a tiny bit less code).
The PDF document is clearly corrupt, since it has /FontFile2 entries that are Dictionaries, which obviously isn't correct.
While there's obviously no guarantee that things will look perfect this way, actually rendering the text at all should be an improvement in general.
Unfortunately it turns out to be somewhat common for users to provide a bunch of "unrelated" information in this field, or even stating their entire problem there, rather than placing it under the appropriate headings further down in the template.
This moves more functionality into the base-class, rather than having to duplicate that in the extending classes.
For consistency, also updates the `BaseStandardFontDataFactory` and introduces more `async`/`await` in various relevant code.
With the recent re-factoring of the viewer CSS rules we now have some duplication of the `mask-image` definitions for the print/download buttons in the secondaryToolbar; note 17419de157/web/viewer.css (L1204-L1210)
The `getSpanRectFromText` helper function returns the location as float
values. This could be desirable in cases where the exact values matter
(for example during comparisons), but in the text layer tests we don't
need this precision. Moreover, the Puppeteer `page.mouse.move` API
apparently doesn't work correctly if float values are given as input.
Note that this test only failed because it couldn't move to the initial
selection position; any subsequent moves actually worked because the
`moveInSteps` helper function already rounded all values correctly.
This commit fixes the issue by consistently rounding all values that we
pass to Puppeteer's `page.mouse.move` API.
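In other words, the coordinates are rounded up front before any move, roughly (helper parameters illustrative):
```js
const rect = await getSpanRectFromText(page, pageNumber, text);
// Round the float values before handing them to Puppeteer.
await page.mouse.move(
  Math.round(rect.x + rect.width / 2),
  Math.round(rect.y + rect.height / 2)
);
```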
By using "data-l10n-attrs" it's possible to instruct Fluent to localize *custom* attributes, which means that we don't need to manually translate/update the "default-content" in FreeText editors.
The plugins have originally been introduced in commit d63da81 for the
`eslint-plugin-mozilla` dependency, but the `eslint-plugin-mozilla`
plugin got removed in commit be93d53 and we also don't use the plugins
ourselves in e.g. our `.eslintrc` files (as evidenced by `npx gulp lint`
not failing while it does fail if we remove any of the other plugins).
This way we allow the rest of the packages to be loaded successfully, such that e.g. the Node.js unit-tests work correctly.
Note that this occurred after updating the `node-canvas` package to version `3.0.0-rc2`, however it's not immediately clear to me if it's a problem there or in the `path2d` package; see also nilzona/path2d-polyfill/issues/84.
- Change the "message" event handler function to a private method.
- Remove the "message" event listener with an `AbortSignal`.
- Extend the `LoopbackPort`-class with `AbortSignal` support.
The code parses the /RBGroups entry in the OC configuration dictionary and adds the property `rbGroups` to instances of the OptionalContentGroup class. `rbGroups` is an array of Sets, where each Set instance represents an RB group that the OptionalContentGroup instance is a member of. Such a Set instance contains all OCG ids within the corresponding RB group. The RB groups an OCG is associated with are processed when its visibility is set to true, as required by the PDF specification.
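A sketch of how such a group could be enforced when an OCG is switched on (simplified; `getGroup` is an assumed accessor):
```js
setVisibility(id, visible = true) {
  const group = this.getGroup(id); // An OptionalContentGroup instance.
  group.visible = visible;
  if (!visible) {
    return;
  }
  // Per the PDF spec, turning one member of an RB group on turns
  // every other member of that group off.
  for (const rbGroup of group.rbGroups) {
    for (const otherId of rbGroup) {
      if (otherId !== id) {
        this.getGroup(otherId).visible = false;
      }
    }
  }
}
```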
Given that the sub-title of that document is "Public domain texts for young people." and that the images have clear sources at the end of the document, it should (hopefully) be OK to add it to the repository rather than relying on a linked test-case.
Since this value is used to allocate an array, it makes sense to avoid using too much memory.
From the specification, this value must be in the range [0, 255] (see section 8.6.6.3).
This patch also removes the unused property `highVal`.
When playing with a pen, I noticed that sometimes a free highlight still has its
gray outline when another one is drawn: for some reason the pointerleave event isn't
triggered.
This is similar to a lot of other code, where we've been replacing explicit `removeEventListener`-calls.
Given the somewhat limited availability of `AbortSignal.any()`, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static#browser_compatibility), this event listener will no longer be immediately removed in older browsers (however that should be fine).
After PR 18845 we're accessing this getter more, hence it seems like a good idea to ensure that the initial `formInfo` access is covered as well.
While unlikely to be a problem in practice, at least theoretically that data may not be available and the code in `fieldObjects` could thus currently be *unintentionally* invoked more than once.
The default `page.type()` API from Puppeteer works for text fields that
only dispatch a sandbox event on e.g. focus loss (i.e. after all
characters have been inserted), and for those we can also use the
default typing delay from Puppeteer instead of defining our own value.
However, it doesn't work correctly for text fields where every character
insertion dispatches a sandbox event. This is because processing the
sandbox event takes some time and Puppeteer must wait for that before it
can (safely) insert the next character. This commit therefore introduces
a helper function to type a given value correctly in such text fields.
Not only does this fix intermittent failures if our delay was too low
for sandbox processing to complete, but it also speeds up the tests by
eliminating our delays in places where they were (much) higher than
necessary. In total the runtime of the scripting integration test suite
goes from 137 seconds before this patch to 100 seconds after this patch.
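Conceptually the helper is simple (a sketch; the helper name is hypothetical while `waitForSandboxTrip` is the existing test utility):
```js
async function typeAndWaitForSandbox(page, selector, value) {
  for (const character of value) {
    // Type one character, then wait for the sandbox round-trip to finish
    // before inserting the next one, instead of using a fixed delay.
    await page.type(selector, character);
    await waitForSandboxTrip(page);
  }
}
```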
Note that the new version of `eslint-plugin-import` contains support
for ESLint 9.
Moreover, the new version of Babel removed the leading space for import
comments (see https://github.com/babel/babel/pull/16780), so we update
the corresponding test expectation to match this.
Fixes a part of #17928.
A recent change in Firefox introduced too much of a difference between the text widths computed using a Canvas
and the ones computed by the text layout engine when rendering the text layer. Consequently, the
text selection can be bad on Windows with some fonts like Arial or Consolas.
This patch is a workaround that tries to use, in the first place, some fonts which don't have the problem.
Strangely enough the code works just fine as-is in the GENERIC viewer, so there must be some difference between the Firefox built-in Fluent implementation and the Fluent.js one.
It seems that we can work around the problem by handling this l10n-arg the same way that we handle dates in the AnnotationLayer, and testing this with the Firefox DevTools suggests that it should work.
It fixes #18849.
When such an annotation is deleted, we make sure that there is some data
to restore.
The first version of this patch made undoing an SVG deletion buggy, so that's fixed now and
an integration test has been added for this case.
With the changes made in PR 18385 the `toolbarViewer` element is now shorter than before, since the padding is applied on its `toolbarContainer` parent-element instead.
This causes two separate regressions:
- Clicking at the very top/bottom of the toolbar no longer closes the secondaryToolbar like previously.
- The `CaretBrowsingMode`-constructor no longer computes the toolbar-height correctly.
Given how/where the `container`-property is being used these changes *should* thus be safe.
The current implementation won't work correctly in some cases, e.g. if RBGroups are present, which means that it's possible for the UI to get out-of-sync with the actual optionalContent-state.
To avoid querying the DOM unnecessarily we cache the last known UI-state and compare with the actual optionalContent-state, which thus replaces the previously used hash-comparison.
This patch updates the minimum supported browsers as follows:
- Google Chrome 103[1], which was released on 2022-06-21; see https://chromereleases.googleblog.com/2022/06/stable-channel-update-for-desktop_21.html
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
---
[1] This is consistent with the minimum supported version in the recently updated Google Chrome addon.
We disable any non-default CursorTool in PresentationMode and during Editing, since they don't make sense there and to prevent problems such as e.g. [bug 1792422](https://bugzilla.mozilla.org/show_bug.cgi?id=1792422).
Hence it seems like a good idea to *also* disable the relevant SecondaryToolbar-buttons, to avoid the user being surprised that the CursorTools-buttons do nothing if clicked.
Given the following excerpt from the [Wikipedia article](https://en.wikipedia.org/wiki/Gill_Sans), mapping this to Helvetica should hopefully be fine:
> It has been described as "the British Helvetica" because of its lasting popularity in British design.
Given that the `WorkerTransport`-constructor will access all possible factory-instances, let's ensure that the `transportFactory`-object always has a consistent shape regardless of other options.
Also, since `useWorkerFetch` is always true in MOZCENTRAL builds we never need to create `CMapReaderFactory`/`StandardFontDataFactory`-instances there.
The first goal of this patch was to remove the tabindex because doing so helps
to improve overall a11y. That led to moving some HTML elements associated
with the buttons, which helped to position these elements relative to their
buttons.
Consequently it was easy to change the toolbar height (configurable in Firefox
with the pref browser.uidensity), which is the second goal of this patch.
For a11y reasons we want to be able to change the height of the toolbar to make
the buttons larger.
This is unblocked because in commit bb302dd the default value for the
constructor got removed, which apparently confused TypeScript before.
Fixes #18770.
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.
As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
Currently, the CSS for a separator is something like { height: 1px; background-color: ... }.
But its rendering depends on its position on the screen.
So instead of setting the height to 1px, we just use something like { border-top: 1px solid ...; };
this way the final rendering is exactly the same for all the separators.
This code is quite old and has been moved/re-factored a few times over the years, however we can simplify this even further since we don't actually need a function to determine what NetworkStream-implementation to use.
After the removal of 'unsafe-eval' CSP in #18651, WebAssembly fails to
load, resulting in issues such as seen in #18457.
Manifest Version 3 does not allow 'unsafe-eval', but does accept the more
specific 'wasm-unsafe-eval' as of Chrome 103. Note that manifest.json
already sets minimum_chrome_version to 103.
This patch also adds `object-src 'self'` because it was required until
Chrome 110. As of Chrome 111, the default is `object-src 'self'` and
`object-src` is no longer required. We could drop `object-src` in the
future, but for now we need to include it to support Chrome 103 - 110.
Note that the textContent is returned in "chunks" from the API, through the use of `ReadableStream`s, and on the main-thread we're (normally) using just one temporary canvas in order to measure the size of the textLayer `span`s; see the [`#layout`](5b4c2fe1a8/src/display/text_layer.js (L396-L428)) method.
*Order of events, for parallel textLayer rendering:*
1. Call [`render`](5b4c2fe1a8/src/display/text_layer.js (L155-L177)) of the textLayer for page A.
2. Immediately call `render` of the textLayer for page B.
3. The first text-chunk for page A arrives, and it's parsed/laid out, which means updating the cached [fontSize/fontFamily](5b4c2fe1a8/src/display/text_layer.js (L409-L413)) for the textLayer of page A.
4. The first text-chunk for page B arrives, which means updating the cached fontSize/fontFamily *for the textLayer of page B*, since this data is unique to each `TextLayer`-instance.
5. The second text-chunk for page A arrives, and we don't update the canvas-font since the cached fontSize/fontFamily still apply from step 3 above.
Where this potentially breaks down is between the last steps, since we're using just one temporary canvas for all measurements but have *individual* fontSize/fontFamily caches for each textLayer.
Hence it's possible that the canvas-font has actually changed, despite the cached values suggesting otherwise, and to address this we instead cache the fontSize/fontFamily globally through a new (static) helper method.
*Note:* Includes a basic unit-test, using dummy text-content, which fails on `master` and passes with this patch.
Finally, pun intended, ensure that temporary textLayer-data is cleared *before* the `render`-promise resolves to avoid any intermittent problems in the unit-tests.
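Conceptually the new static helper looks like this (a sketch with assumed names):
```js
class TextLayer {
  static #canvasFont = { size: 0, family: "" };

  static #ensureCanvasFont(ctx, size, family) {
    // The cache is global (static), matching the single shared canvas,
    // rather than living on each TextLayer instance.
    const cached = TextLayer.#canvasFont;
    if (size !== cached.size || family !== cached.family) {
      ctx.font = `${size}px ${family}`;
      TextLayer.#canvasFont = { size, family };
    }
  }
}
```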
Fix regression from #18711. `urlFilter` is a key of `condition`, not of
the `rule`. Consequently, the feature detection method failed to detect
the availability of the feature in Chrome 128+.
The minimum required version is Chrome 103 because wildcard support for
web_accessible_resources[].extension_id was introduced in 103, in
c9caeb1a08
A way to broaden compatibility is to drop that key. By doing so, the
minimum required Chrome version is then 96, because of the
chrome.storage.session API (and declarativeNetRequestWithHostAccess).
Since gulpfile.js already defines "Chrome >= 98" and there is no real
point in expanding support to older versions, I'm just setting the
minimum version to 103.
- Replace DOM-based pdfHandler.html (background page) with background.js
(extension service worker).
- Adjust logic of background scripts to account for the fact that the
scripts can execute repeatedly during a browser session. Primarily,
register relevant extension event handlers at the top level and use
in-memory storage.session API to keep track of initialization state.
- Extension URL router: replace blocking webRequest with the service
worker-specific "fetch" event.
- PDF detection: replace blocking webRequest with declarativeNetRequest.
This requires Chrome 128+. The next commit will add a fallback for
earlier Chrome versions.
In MV3, the background script is a service worker. localStorage is not
supported in service workers. Switch to storage.local instead.
Data migration is not implemented because it is not needed due to the
privacy-friendly design of the telemetry: In practice the data is reset
about every 4 weeks, when the major version of Chrome is updated.
restoretab.js was added in https://github.com/mozilla/pdf.js/pull/6233
with the purpose of restoring lost tabs when Chrome closes all extension
tabs when it reloads the extension. This forced reload can happen when
the user toggles the "Allow access to file URLs" option.
This logic does not work any more, and since the use of localStorage is
a blocker in migrating to MV3, this patch just drops the logic.
MV3 does not support chrome_style in options_ui. Remove it and replace
it with the minimal amount of styles that still provides some spacing
around the individual settings for readability.
webRequestBlocking is unavailable in MV3. Non-blocking webRequest can
still be used to detect the Referer, but we have to use
declarativeNetRequest to change the Referer header as needed.
*Please note:* As a general rule we probably don't need to fix things affecting *custom* implementations of the default viewer, but in this case a "targeted" fix seem possible that shouldn't (famous last words) break anything else.
Ensure that the `.visibleMediumView` class, which is used when the viewer becomes narrow, cannot override the `display`-value for DOM elements that are being *explicitly* hidden either through the associated attribute or class.
The idea is to insert a span in the text layer with its ARIA role set to img
and use the bounding box provided by the attribute field in the tag dict in
order to have non-null dimensions for the image to make it "visible".
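Schematically (variable names assumed for illustration):
```js
const span = document.createElement("span");
span.setAttribute("role", "img");
span.setAttribute("aria-label", altText);
// Use the bounding box from the attribute field in the tag dict, so the
// element has non-null dimensions and screen readers treat it as visible.
const [x, y, width, height] = boundingBox;
span.style.cssText = `left: ${x}px; top: ${y}px;
  width: ${width}px; height: ${height}px;`;
textLayerDiv.append(span);
```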
In hindsight it occurred to me that there's a couple of smaller issues with this method after it's made asynchronous (in PR 18658).
- If the `render`-method is invoked back-to-back the existing caching doesn't guarantee that re-parsing won't occur, which we can address by introducing a new (private) promise.
- If there's any errors fetching and/or parsing the structTree-data, we'd attempt to parse it again on re-rendering despite that being pointless.
It has been tested with VoiceOver (macOS) and with NVDA (Windows).
When an added stamp annotation is focused, the screen reader will announce
that it's a figure containing a graphic with the added alt-text.
The `Headers` functionality is now available in all browsers/environments that we support, which allows us to consolidate and simplify how the `httpHeaders` API-option is handled; see https://developer.mozilla.org/en-US/docs/Web/API/Headers#browser_compatibility
Also, simplifies the old `NetworkManager`-constructor a little bit.
It was recently brought to my attention that using partial or generated localization ids is bad for maintainability, hence this patch goes through the code-base and replaces any such occurrences.
Currently we repeat virtually the same http/https request code in two different classes in this file, which seems unnecessary and leads to more code.
With the introduction of Fluent the `getLanguage`-method became synchronous, hence it no longer seems necessary to do the metric-locale check eagerly in the constructor and it can instead be "delayed" until actually needed.
It was recently brought to my attention that using partial or generated localization ids is bad for maintainability, which means that PR 18636 wasn't the correct thing to do.
However, just reverting that one doesn't really fix the problems which is why this patch updates *every* l10n-id in the `PDFDocumentProperties` class (but doesn't touch any `viewer.ftl`-files). Obviously this leads to more verbose code, but that cannot really be helped.
We can remove the `reset`-parameter, since it's redundant, given that it's only used after `PDFDocumentProperties.#reset` has been invoked which means that `this.#fieldData === null` which is equivalent to resetting.
Also, we don't need to have two separate loops in order to update the UI in this method.
Finally, inline the `DEFAULT_FIELD_CONTENT` constant now that it's only used once.
The Node.js url.parse API (https://nodejs.org/api/url.html#urlparseurlstring-parsequerystring-slashesdenotehost)
is deprecated because it's prone to security issues (to the point that Node.js doesn't even publish CVEs for it anymore).
The official recommendation is to instead use the global URL constructor, available both in Node.js and in browsers.
Node.js filesystem APIs accept URL objects as parameter, so this also avoids a few URL->filepath conversions.
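The replacement pattern is straightforward (standard APIs; the file path is just an example):
```js
import { readFileSync } from "fs";

// Deprecated: require("url").parse(urlString)
// Preferred: the global WHATWG URL constructor.
const fileUrl = new URL("file:///tmp/example.pdf");

// Node.js filesystem APIs accept URL objects directly, which avoids a
// separate URL-to-filepath conversion.
const data = readFileSync(fileUrl);
```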
This l10n-string is being re-defined once for every editor, i.e. currently four times, which seems unnecessary.
To avoid having to check if this l10n-string exists first, we can utilize rest parameters to move it into the `AnnotationEditor._l10nPromise` Map-definition instead.
Currently we manually localize and update the DOM-elements of the editor-resizers, and it seems nicer to utilize Fluent for that task.
This can be achieved by updating the l10n-strings to directly target the `aria-label` and then just setting the `data-l10n-id` on the DOM-elements.
In preparation for migrating the Chrome extension to Manifest Version 3,
this patch removes parts of the manifest that are obsolete and/or
unsupported in MV3.
Remove ftp mentions: ftp support was dropped 6 years ago, in Chrome 59.
Remove file_browser_handlers: does not work on Chrome OS according to
https://github.com/mozilla/pdf.js/issues/14161 . MV3 replacement needs
a different manifest key and logic, which will be added later.
Remove content_security_policy: MV3 does not support unsafe-eval CSP,
and PDF.js does not require it. This may result in a mild performance
degradation in PDFs that contain PostScript.
Remove page_action logic: When this logic was originally introduced,
Chrome showed page action buttons in the address bar, which enabled
the extension to display the button on specific PDF viewer tabs only.
In Chrome 49 (2016), pageActions were dropped from the address bar and
put in a UI area that is hidden by default. The user can pin the button
to be unconditionally visible on the toolbar, which is distracting.
Because the UX is no longer serving the original needs, this patch
removes page_action from the manifest.
This major version contains three breaking changes that impact us:
- The `product` option has been renamed to the more suitable `browser`.
- The `page.screenshot()` API returns a `Uint8Array` instead of a
`Buffer`, but since `pngjs` requires a `Buffer` object we need to do
the conversion using `Buffer.from()` before passing data to `pngjs`.
- The browser configuration should be set using a configuration file
instead of environment variables. Note that as a bonus this allows us
to remove the `cross-env` dependency since that was only used to set
the Puppeteer environment variable equally for all operating systems.
For more information about the changes between the old and new Puppeteer
versions refer to https://github.com/puppeteer/puppeteer/releases.
The `AnnotationLayer` may not display correctly formatted data in PopupAnnotations, especially in the GENERIC viewer, since it's using native methods[1] that depend on the *browser* locale instead of the viewer locale as intended.
With Fluent we're able to improve things since it's got built-in support for formatting dates. Not only does this simplify the JavaScript code slightly, but it also gives the localizer more fine-grained control of the desired output.
Please find additional information here:
- https://projectfluent.org/fluent/guide/builtins.html
- https://projectfluent.org/fluent/guide/functions.html
---
[1] `toLocaleDateString`, and `toLocaleTimeString`.
The `PDFDocumentProperties` dialog may not display correctly formatted data, especially in the GENERIC viewer, since it's using native methods[1] that depend on the *browser* locale instead of the viewer locale as intended.
At the time when this dialog was introduced that was probably all we could easily do, but with Fluent we're able to improve things since it's got built-in support for formatting numbers and dates. Not only does this simplify the JavaScript code, but it also gives the localizer more fine-grained control of the desired output.
Please find additional information here:
- https://projectfluent.org/fluent/guide/builtins.html
- https://projectfluent.org/fluent/guide/functions.html
---
[1] `toLocaleString`, `toLocaleDateString`, and `toLocaleTimeString`.
Currently we *manually* fetch the "pdfjs-additional-layers" string and update the DOM-element, which was needed since we want to avoid triggering a bunch of otherwise unnecessary translation when appending the entire layer-tree to the DOM.
By introducing a new helper method in the `L10n`-class we can avoid this, and instead use a "data-l10n-id" attribute on the element (as most other viewer code does nowadays).
Given the length of the l10n-strings we can slightly reduce verbosity, and thus overall code-size, by introducing a helper method for fetching l10n-data.
While testing this I stumbled upon an issue in the `L10n`-class, where an optional chaining operator was placed incorrectly since the underlying method always returns an Array; see 48e2a62ed4/fluent-dom/src/localization.js (L38-L77)
- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent (see the sketch after this list)
- Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await on the cached Promise without making the requests pipeline
- This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately
- PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes
- I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays
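A simplified sketch of the prefetching idea (`fetchAsync` is the existing XRef API; the details differ in the real patch):
```js
async function loadTopLevelKids(xref, kids) {
  // Kick off all fetches up front so the XHRs go out in parallel...
  const promises = new Map();
  for (const kid of kids) {
    promises.set(kid, xref.fetchAsync(kid));
  }
  // ...then await the already-started (cached) promises, so the requests
  // are no longer pipelined one behind the other.
  const pages = [];
  for (const promise of promises.values()) {
    pages.push(await promise);
  }
  return pages;
}
```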
Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font.
Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see b47c7eca83/src/core/fonts.js (L1186-L1190)
This integration test fails intermittently because we cache the initial
total value to be able to compare it to the new total value at the end
of the test to check that it's different before doing the assertions.
However, this doesn't work as expected because the second `clearInput`
call triggers an intermediate total value calculation because it clicks
on another input field and that triggers a sandbox event.
This results in the `waitForFunction` calls always resolving immediately
and since we don't use other means of waiting until the calculation is
done (using e.g. `waitForSandboxTrip`) we basically rely on the time
between the final click and the assertions to be enough for the sandbox
to do its work. If it is not done in that time, we do the assertions
with older values and that makes the test fail.
This commit fixes the issue by simply waiting for the total value to be
what we expect it to be. This requires less code, is more consistent
with the other integration tests and removes the possibility of doing
assertions against older values.
In PR #18574 setting `window.uiManager` was moved into the `src` folder
to avoid intermittent integration test failures because at the time we
lacked a way to register event listeners early (before PDF.js loads).
However, in PR #18617 this functionality got introduced, so we can now
use the new way of setting up the event bus in the tests to move this
back to the `test` folder again and to reduce the amount of test-only
code in the main codebase as discussed in PR #18574.
Partially reverts e037c5711d3d2413669e9b6c275986adf24a295b.
The evaluateOnNewDocument function in Puppeteer allows us to execute some JS before the PDF.js code
is loaded.
This lets us stub some setters before they are used, and then set up some event handlers very early.
We introduced quite a few new strings recently, but during the last few
rounds of updates we didn't see new translations updates becoming
available. I only just now recalled the announcement that Mozilla is
moving from Mercurial to Git, and indeed the Mercurial page at
https://hg.mozilla.org/l10n-central hasn't been updated since July,
and that's where we used to pull our translations from.
This commit fixes the issue by changing the URLs to the Mozilla Git
repositories hosted on GitHub instead.
The way that the debugging hash-parameter parsing is implemented leads to a lot of boilerplate code in this method, since *most* of the cases are very similar.[1]
With just a few exceptions most of the options can be handled automatically, by defining an appropriate checker for each option.
---
[1] With the recent introduction of TESTING-only options the size of this method increased a fair bit.
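For example, each option can declare a small checker so most of the parsing becomes table-driven (an illustrative shape; the real option names and values may differ):
```js
const hashParamCheckers = new Map([
  ["textlayer", v => ["off", "visible", "shadow", "hover"].includes(v)],
  ["disablerange", v => v === "true" || v === "false"],
]);

function applyHashParam(name, value) {
  const checker = hashParamCheckers.get(name);
  if (checker?.(value)) {
    AppOptions.set(name, value);
  }
}
```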
The problem seems to be caused by the browser trying to "restore" editing input-elements, in the various toolbars, to their previous values when the tab is re-opened.
Hence the simplest solution appears to be to move the event handling into the editor-code, which is also less code overall; this way the listener won't be registered early enough for the problem to appear.
This patch allows embedders of PDF.js to provide custom match
logic for searching in PDFs. This is done by subclassing the
PDFFindController class and overriding the `match` method.
`match` is called once per PDF page, receives as parameters the
search query, the page contents, and the page index, and returns
an array of { index, length } objects representing the search
results.
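For example, an embedder could implement regular-expression matching along these lines (a sketch, using the `{ index, length }` result shape described above):
```js
class RegExpFindController extends PDFFindController {
  match(query, pageContent, pageIndex) {
    const results = [];
    for (const m of pageContent.matchAll(new RegExp(query, "g"))) {
      // Each entry marks one match within the page's text content.
      results.push({ index: m.index, length: m[0].length });
    }
    return results;
  }
}
```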
This commit fixes a regression from #18596 where the "Add image" button
styles broke because it's a labeled toolbar button but the labeled
toolbar button CSS rules were only scoped to the secondary toolbar.
However, nowadays labeled toolbar buttons are also used in e.g. the
editor parameters toolbar, and this highlighted that we didn't clearly
encode the concept of labeled toolbar buttons in the CSS so far.
This patch extracts the CSS rules for labeled toolbar buttons that were
previously limited to the secondary toolbar into a dedicated generic
class that can be applied on top of any unlabeled toolbar button to
convert it to a labeled toolbar button, regardless of its position in
the DOM. This also makes the distinction clearer in the HTML, and the
new CSS scope for the toolbar buttons lays the foundation for combining
the other toolbar buttons rules in there later on.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Using the `:is(a)` selector we can move the previously separate rules
into the main `.toolbarButton` override rules.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Now that we have dedicated CSS scopes for the findbar and the secondary
toolbar we have ended up with two CSS blocks with identical selectors,
so now we can combine them for simplicity and to remove some rules in
the first block that were always overridden by the second block.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
We have a number of base-classes that are only intended to be extended, but never to be used directly. To help enforce this during development these base-class constructors will check for direct usage, however that code is obviously not needed in the actual builds.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~2.7` kilo-bytes, which isn't a lot but still cannot hurt.
The secondary toolbar CSS rules predate the general availability of CSS
nesting, which makes them more difficult to understand and change
safely. The primary issues are that the rules are spread over the
`viewer.css` file, they share blocks with other elements and the scope
of the rules is sometimes bigger than necessary.
This refactoring groups all remaining secondary toolbar rules together,
scoped to the top-level `#secondaryToolbar` element, for improved
overview and isolation. Note that this patch only intends to move the
existing rules around and not change any behavior.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Secondary toolbar buttons are toolbar buttons with some extra rules,
mainly to make them wider and have visible labels. However, this
similarity is currently not clearly reflected in the implementation
because the secondary toolbar buttons use a different CSS class,
`secondaryToolbarButton`, compared to the other toolbar buttons that
use the `toolbarButton` CSS class. This also causes some duplication
in the rules and requires extra care to keep the common bits for the
`secondaryToolbarButton` class in sync with the `toolbarButton` class.
Fortunately, now that we have a dedicated CSS scope for the secondary
toolbar, we can simplify this by giving all secondary toolbar buttons
the `toolbarButton` class and explicitly listing the required overrides
in the `#secondaryToolbar` scope instead. Doing so removes most of the
special-casing for secondary toolbar buttons while explicitly listing
the differences in a single place for a better overview. It also lays
the foundation for making all toolbar buttons respect the
`browser.uidensity` Firefox preference later by reducing differences.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The secondary toolbar CSS rules predate the general availability of CSS
nesting, which makes them more difficult to understand and change
safely. The primary issues are that the rules are spread over the
`viewer.css` file, they share blocks with other elements and the scope
of the rules is sometimes bigger than necessary.
This refactoring groups all CSS rules for the secondary toolbar button
container/icons together, scoped to the top-level `#secondaryToolbar`
element, for improved overview and isolation. Note that this patch only
intends to move the existing rules around and not change any behavior.
Moreover, this patch does not move the rules for the secondary toolbar
itself and the secondary toolbar buttons; those will be part of a
follow-up patch and will be easier once this is in place first.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
Because of an old oversight (by me) we don't stop sidebar resizing when the browser window loses focus, which seems generally wrong and can also lead to duplicate mouse-related event listeners being registered.
There's a fair number of event listeners in the editor-code that we're currently removing "manually", by keeping references to their event handler functions.
This was necessary since we have a "global" `AbortController` that applies to all event listeners used in the editor-code, however it's now possible to combine multiple `AbortSignal`s; please see https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static
Since this functionality is [fairly new](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal/any_static#browser_compatibility) the viewer will check that `AbortSignal.any()` is available before enabling the editing-functionality.
(It should hopefully be fairly straightforward, famous last words, for users to implement a polyfill to allow editing in older browsers.)
Finally, this patch also adds checks and test-only asserts to ensure that we don't add duplicate event listeners in various editor-code.
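Schematically, instead of keeping handler references around for later `removeEventListener` calls, the editor-code can now do (a sketch; names illustrative):
```js
// One "global" controller for the whole editing session, plus a local one
// for e.g. a single editor instance.
const globalController = new AbortController();
const localController = new AbortController();

element.addEventListener("pointerdown", onPointerDown, {
  // The listener is removed when *either* signal aborts.
  signal: AbortSignal.any([globalController.signal, localController.signal]),
});

// Tearing everything down requires no removeEventListener bookkeeping:
globalController.abort();
```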
The findbar CSS rules predate the general availability of CSS nesting,
which makes them more difficult to understand and change safely. The
primary issues are that the findbar rules are spread all over the
`viewer.css` file, they share blocks with non-findbar elements and the
scope of the rules is sometimes bigger than necessary.
This refactoring groups all findbar-related CSS rules together, scoped
to the top-level `#findbar` element, for improved overview and
isolation. Note that this patch only intends to move the existing rules
around and not change any behavior yet, but it does lay the foundation
for e.g. making the findbar respect the `browser.uidensity` preference
in Firefox in follow-up work.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
It looks like this has accidentally been copy/pasted from the
`Textfields and focus` block, which is the only one in which a test
actually uses it. We can therefore safely remove it from all other
blocks where no test uses it.
This commit updates the Babel plugin to:
- apply the same flattening logic that we already
have for blocks, to flatten blocks nested inside
class static blocks
- remove class static blocks when, after flattening
all the blocks they contain, they are empty.
Before this commit, the transform output was the
same as the input.
Given that we're removing event listeners with `AbortSignal` it's no longer necessary to keep a reference to a few of the event handler functions in order to remove them.
Hence we can simply inline the relevant `bind`-calls instead, which reduces the code-size a tiny bit.
The `waitForClick` helper function is functionality-wise mostly a
reduced copy of the more generic `waitForEvent` helper function that
we use for other integration tests, so we can safely replace it to
reduce the amount of code.
Moreover, the `waitForClick` code is prone to intermittent failures
given recent assertion failures we have seen on the bots (one of them
is linked in #18396), while `waitForEvent` has recently been fixed to
avoid intermittent failures, so using it should also get rid of the
flakiness for these integration tests.
The `options` handling, for the download-methods, was originally added in PR 16391 and became obsolete in PR 17771.
This fixes failures, in mozilla-central tests, which appeared after landing PR 18527.
Given that these event handlers are virtually identical, obviously with the exception of the name-parameter, let's reduce a little bit of code duplication.
- Shorten the names of the event listeners.
- Use `bind` to pass in the `PDFViewerApplication`-scope to the event handlers.
This makes the event handler code (a lot) less verbose, and this change is possible now that we're removing event listeners with `AbortSignal`.
- Move the GENERIC-only event listeners into the same pre-processor block.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~4.3` kilo-bytes, which isn't a lot but still cannot hurt.
Given that users fairly often report issues with unsupported browsers/environments it cannot hurt to provide a link to the relevant section in the FAQ.
We have a fair number of (effectively) single-line event handlers in the `web/app.js` file, which leads to unnecessarily verbose code. These can, without affecting readability too much, be replaced either by:
- Using `bind` for the simplest cases.
- Using arrow-functions for the remaining ones.
Note that this is possible since we started removing event listeners with `AbortSignal`, which means that we no longer need to keep a reference to the event handler functions to be able to remove them.
Given that the old event handler functions use fairly long function names, and the way that they access `PDFViewerApplication` (given their scope), they impact the overall code-size unnecessarily.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~3.7` kilo-bytes, which isn't a lot but still cannot hurt.
This patch adds a new entry in the secondary menu in order to open a dialog letting the user:
- disable the alt-text generation that uses an ML model;
- delete the alt-text model downloaded in Firefox;
- disable the new alt-text flow.
Unfortunately it turns out (perhaps unsurprisingly) that even the new bug report template isn't stopping users from leaving out the single most important part, i.e. `Attach (recommended) or Link to PDF file`, despite it now being marked as a required field.
Added annotations could have some quadpoints (highlight, ink).
The isNumberArray check was returning false and consequently the annotation wasn't
printable.
The tests didn't catch this issue because the quadpoints were passed as an Array.
So driver.js has been updated to pass them as a Float32Array, to be in a situation
similar to the real-life one.
Over time a couple of event listeners have been placed in the constructor, despite there being an existing helper method for that purpose. To improve the code organization, let's move these to the intended method instead.
This refactoring lays the foundation for making the toolbar height
configurable in Firefox via the `browser.uidensity` preference. For
this to work correctly the toolbar height must be defined in a single
place that can easily be updated dynamically, hence this patch which
moves it to a CSS variable in such a way that the rest of the UI adapts
correctly if the value is changed.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The HTML button elements we use are all regular buttons that don't
submit form data to a server. According to
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/button#notes
those buttons should all have their type set to `button` explicitly, but
we only do that for a handful of them. This commit fixes the issue by
consistently giving all our buttons the `button` type.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
The sidebar and secondary toolbar both have a reference to their toggle
buttons in their own sections in `getViewerConfiguration`, so it makes
sense for the findbar to do the same.
While we actually have a findbar-specific reference to the toggle
button, I noticed that we don't use it consistently because the toolbar
also has a reference to the exact same toggle button and we use both in
the code. This is probably for historical reasons: the docstring in the
toolbar file indicates that the `viewFind` element is an input to the
component, but that option is never actually used in the code itself.
This commit fixes the issue by removing the toolbar-specific reference,
since it's not actually used (anymore) in the toolbar code, so that we
consistently use the findbar-specific reference everywhere.
This dependency got introduced in PR #10293, almost six years ago now,
because `eslint-plugin-mozilla` didn't work without it but also didn't
require it as a dependency itself.
However, nowadays `eslint-plugin-mozilla` works just fine without it,
and other dependencies that need it correctly require it themselves.
This can be seen using `npm ls globals`:
```
$ npm ls globals
pdf.js
├─┬ @babel/core@7.24.9
│ └─┬ @babel/traverse@7.25.0
│ └── globals@11.12.0
├─┬ @babel/preset-env@7.25.0
│ └─┬ @babel/plugin-transform-classes@7.25.0
│ └── globals@11.12.0
├─┬ eslint-plugin-unicorn@55.0.0
│ └── globals@15.8.0 deduped
├─┬ eslint@8.57.0
│ ├─┬ @eslint/eslintrc@2.1.4
│ │ └── globals@13.24.0
│ └── globals@13.24.0
└── globals@15.8.0
```
Further proof that `eslint-plugin-mozilla` (no longer) uses `globals` is
from a source code search in
https://searchfox.org/mozilla-central/search?q=globals&path=&case=false&regexp=false.
The only results for `eslint-plugin-mozilla` refer to a file named
`globals.js`, but the `globals` NPM package is not actually imported
anywhere.
Given this we should be able to safely get rid of this explicit
dependency on our end now.
For the Firefox PDF viewer, we want to use AI to guess an alt-text when adding an image to a PDF.
For now the telemetry part is not implemented; it will come soon.
In order to test it locally:
- set enableAltText, enableFakeMLManager and enableUpdatedAddImage to true.
or in Firefox:
- set browser.ml.enable, pdfjs.enableAltText and pdfjs.enableUpdatedAddImage to true.
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
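As a hedged illustration (the names are hypothetical, not the actual code), these two features allow replacing module-level initialization helpers like so:
```js
const defaultOptions = {
  disableAutoFetch: { value: false },
  viewerCssTheme: { value: 0 },
};

class AppOptions {
  // Private field: the options map is no longer reachable from outside.
  static #opts = Object.create(null);

  // Static initialization block: runs once at class-definition time,
  // removing the need for separate module-level setup code.
  static {
    for (const [name, { value }] of Object.entries(defaultOptions)) {
      this.#opts[name] = value;
    }
  }

  static get(name) {
    return this.#opts[name];
  }
}
```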
The current error-messages also mention internal parameters, which an end-user obviously doesn't have to care about. So, let's try to avoid confusion here by only including the API parameters.
- Initialize all user-options upfront, to allow getting the options without having to check if they exist first. (This is easier to implement after PR 18475 landed.)
- Move the user-options into a private field in `AppOptions`. (This code is old enough to pre-date general availability of that class feature.)
- Move code/methods limited to GENERIC-builds into `AppOptions`.
Currently we'll only dispatch events, for the options that support it, when updating preferences. Since this could potentially lead to inconsistent behaviour, let's avoid any future surprises by *always* dispatching events regardless of how the relevant option is being changed.
Obviously we should then also dispatch events in `AppOptions.set`, and to avoid adding even more duplicated code this method is changed into a wrapper for `AppOptions.setAll`.
While this is technically a tiny bit less efficient I don't think it matters in practice, since outside of development- and debug-mode `AppOptions.set` is barely used.
When selecting text, hovering over an element
causes all the text between (according to the DOM
order) the current selection and that element to
be selected.
This means that when, while selecting, the cursor
moves over a link, all the text in the page gets
selected. Setting `user-select: none` on the link
annotations would improve the situation, but it
still makes it impossible to extend the selection
within a link without using Shift+arrows keys on
the keyboard.
This commit fixes the problem by setting
`pointer-events: none` on the `<section>`s in the
annotation layer while selecting some text. This
way, they are ignored for hit-testing and do not
affect selection.
It is still impossible to _start_ a selection
inside a link, as the link text is covered by the
link annotation.
Fixes #18266.
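A hedged sketch of the approach described above (simplified; the real change integrates with the text- and annotation-layer code):
```js
const sections = document.querySelectorAll(".annotationLayer section");

document.addEventListener("selectstart", () => {
  // While a selection is in progress the annotation <section>s must not
  // participate in hit-testing, otherwise hovering a link extends the
  // selection to all the text in between.
  for (const section of sections) {
    section.style.pointerEvents = "none";
  }
});

document.addEventListener("pointerup", () => {
  // Selection finished: restore normal link interaction.
  for (const section of sections) {
    section.style.pointerEvents = "";
  }
});
```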
Since "localeProperties" is needed in Firefox, let's remove a tiny bit of option duplication by using it in the GENERIC builds as well.
For convenience, the old debug-only "locale" hash-parameter is kept intact.
The `streamqueue` dependency is only used for the test targets in the
Gulpfile to make sure that the test types are run in series. This is
done by modelling the test processes as readable streams and then having
`streamqueue` combine them into a single readable stream for Gulp that
processes the inner readable streams in series (in contrast to the
`ordered-read-streams` dependency which is very similar but processes
the inner streams in parallel).
However, modelling the test processes as readable streams is a bit odd
because we're not actually streaming any data as one might expect.
Instead, we only use them to signal test process completion/abortion.
Fortunately nowadays, with modern Gulp versions, we don't need readable
streams and `streamqueue` anymore because we can achieve the same result
with simple asynchronous functions that can be passed to e.g.
`gulp.series()` calls. Note that we already do this in various places,
and overall it should be a better fit for test process invocations.
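A hedged sketch of the replacement (the test-runner invocation and flags are illustrative):
```js
import { spawn } from "child_process";
import gulp from "gulp";

function runTests(type) {
  // Resolve when the spawned test process exits; no readable stream needed.
  return new Promise((resolve, reject) => {
    const testProcess = spawn("node", ["./test/test.mjs", `--${type}`], {
      stdio: "inherit",
    });
    testProcess.on("close", code =>
      code === 0 ? resolve() : reject(new Error(`Tests failed: ${code}`))
    );
  });
}

// gulp.series accepts async functions directly and runs them in order,
// which is exactly what streamqueue previously emulated with streams.
export const test = gulp.series(
  async function unitTest() {
    await runTests("unitTest");
  },
  async function integrationTest() {
    await runTests("integration");
  }
);
```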
For options with varying types, see `useSystemFonts`, we're not sufficiently validating the type when setting a new value. This means that for an option that uses `OptionKind.UNDEF_ALLOWED` we'd allow *any* value, which is obviously not the intention.
Hence we instead introduce a new and *optional* `type`-field that allows specifying exactly which types are valid when multiple ones are supported.
*Note:* This obviously didn't occur to me until after PR 18465 landed, sorry about that!
This option is old enough that it predates e.g. the introduction of AppOptions, so it probably cannot hurt to re-factor this a little bit now.
- In development-mode we can just set this directly in AppOptions.
- In the extension-builds we still need to set it dynamically, however by moving this code we get the benefit of being able to avoid storing a data-URL in that case; note how [the API ignores those anyway](98e772727e/src/display/api.js (L256-L262)).
To avoid introducing any inline "hacks" in the viewer-code this meant adding `useSystemFonts` to the AppOptions, which thus required some new functionality since the default value should be `undefined` given how the option is handled in the API; note [this code](ed83d7c5e1/src/display/api.js (L298-L301)).
Finally, also moves the definition of the development-mode `window.isGECKOVIEW` property to the HTML file such that it's guaranteed to be set regardless of how and when it's accessed.
The old implementation in `PDFViewerApplication.download` means that if the `getDocument`-call hasn't yet downloaded the *entire* PDF document it will be re-downloaded. This seems generally undesirable since:
- In some (probably rare) cases a URL may not remain valid for an arbitrary number of requests, which means that the download may fail.
- It will lead to wasted resources, since we'll end up fetching the same PDF document *twice* in that case (once via the `getDocument`-call and once to allow the user to save it).
Hence this patch suggests that we change this very old code to instead always call the `PDFDocumentProxy.getData` method, since that'll trigger immediate downloading of the remaining document via the existing `getDocument`-call.
Finally, the patch removes the `PDFViewerApplication.downloadComplete` property since it's now unused.
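In terms of the public API, the new flow is roughly the following (a hedged sketch; `downloadManager`, `url` and `filename` come from the surrounding viewer code):
```js
// Triggers immediate downloading of the remaining document via the
// existing getDocument-call, instead of re-fetching the URL.
const data = await pdfDocument.getData(); // Uint8Array with the full PDF.
const blob = new Blob([data], { type: "application/pdf" });
downloadManager.download(blob, url, filename);
```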
This was only necessary to prevent (unlikely) visual glitches when `disableAutoFetch = true` is being used.
However, it turns out that we can move this functionality into the `ProgressBar` class instead by checking if the entire PDF document has loaded. This works since the API is always reporting 100% loading progress regardless of how the document was loaded; see [this code](ed83d7c5e1/src/display/api.js (L2735-L2740)).
After the changes in PR 18413 we're now relying even more on `AppOptions` and it thus seems like a good idea to ensure that no invalid values can be added.
Hence the `AppOptions.{set, setAll}` methods will now *unconditionally* validate that the type of the values agree with the default-options.
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
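A hedged sketch of the relaxed validation for /XYZ destinations (simplified; the real checks cover all destination types):
```js
function isValidXYZDest(dest) {
  if (!Array.isArray(dest) || dest.length < 4 || dest.length > 5) {
    return false;
  }
  const [, type, left, top, zoom] = dest;
  return (
    type?.name === "XYZ" &&
    (typeof left === "number" || left === null) &&
    (typeof top === "number" || top === null) &&
    // The zoom parameter should be present (possibly null) per the spec,
    // but we accept `undefined` to work around bad PDF generators.
    (zoom === undefined || typeof zoom === "number" || zoom === null)
  );
}
```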
To avoid having to request and await various browser data early during the PDF Viewer initialization, we can include that data in the existing preference fetching instead. With other planned changes to the PDF Viewer, the current situation would only become worse over time.
*Note:* Technically this data isn't made up of preference-values, however we're already including other non-prefs in this list (e.g. `isInAutomation`) and doing it this way simplifies the overall implementation.
As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used.
Hence this seems like something that should be kept *internal* and thus only exposed in TESTING-builds.
Switching to an editing mode can be asynchronous (e.g. if an editable annotation exists on a
visible page), so we must add a new editor only when the page rendering is done.
Somehow I managed to mess up the URL creation relevant to e.g. MOZCENTRAL builds, which is breaking the pending PDF.js update in mozilla-central; sorry about that!
To avoid future issues, we'll now always check if absolute filter-URLs are necessary regardless of the build-target.
The Git logic for pushing to the `pdfjs-dist` repository was already
removed in PR #18350, but we forgot to also remove the logic to
create a Git repository in the first place. This commit fixes that,
addressing the open review comment at
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661367441.
Moreover, we implement the suggestion from
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661364026
about moving the Git URL to a separate variable for consistency and we
update the homepage URL to the HTTPS scheme to avoid an HTTP -> HTTPS
redirect and enforce TLS-encrypted connections.
In PR #18356 the new tab page logic was disabled to prevent Firefox
from logging failed network connections to Contile, the Mozilla Tiles
service that is used for the new tab page [1]. However, recently this
log reappeared locally and on the bots:
```
console.warn: TopSitesFeed: Failed to fetch data from Contile server:
NetworkError when attempting to fetch resource.
```
It looks like Contile communication is also triggered from other places
in Firefox such as the URL bar [2], so this commit fixes the issue by
disabling network connections to Contile [3] altogether regardless of
their origin within Firefox. Note that we don't revert the change from
PR #18356 because as noted in [4] it can't hurt to keep that disabled
too to avoid overhead for a feature we don't use in the tests.
[1] https://github.com/mozilla-services/contile
[2] 196ef8360e/browser/components/urlbar/UrlbarProviderTopSites.sys.mjs (L38)
[3] 196ef8360e/browser/components/newtab/lib/TopSitesFeed.sys.mjs (L111)
[4] https://github.com/mozilla/pdf.js/pull/18356#issuecomment-2200354730
When the mouse was hovering over an existing highlight, all the text in the
page was selected.
So when the user is selecting some text or drawing a free highlight, mouse
events are now disabled for the existing editors.
Because of editor z-indexes, the toolbar belonging to a highlight created
before a second adjacent one can be overlapped by this new one.
So when the user selects an editor we just show it in front of all the other
ones to make sure that it can be used normally.
This functionality is purposely limited to development mode and GENERIC builds, since it's unnecessary in e.g. the *built-in* Firefox PDF Viewer, and will only be used when a `<base>`-element is actually present.
*Please note:* We also have tests in mozilla-central that will *indirectly* ensure that relative filter-URLs work as intended in the Firefox PDF Viewer, see https://searchfox.org/mozilla-central/source/toolkit/components/pdfjs/test/browser_pdfjs_filters.js
---
To test that the issue is fixed, the following code can be used:
```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <base href=".">
    <title>base href (issue 18406)</title>
  </head>
  <body>
    <ul>
      <li>Place this code in a file, named `base_href.html`, in the root of the PDF.js repository</li>
      <li>Run <pre>npx gulp dist-install</pre></li>
      <li>Run <pre>npx gulp server</pre></li>
      <li>Open <a href="http://localhost:8888/base_href.html">http://localhost:8888/base_href.html</a> in a browser</li>
      <li>Compare rendering with <a href="http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf">http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf</a></li>
    </ul>
    <canvas id="the-canvas" style="border: 1px solid black; direction: ltr;"></canvas>
    <script src="/node_modules/pdfjs-dist/build/pdf.mjs" type="module"></script>
    <script id="script" type="module">
      //
      // If absolute URL from the remote server is provided, configure the CORS
      // header on that server.
      //
      const url = '/test/pdfs/issue16287.pdf';

      //
      // The workerSrc property shall be specified.
      //
      pdfjsLib.GlobalWorkerOptions.workerSrc =
        '/node_modules/pdfjs-dist/build/pdf.worker.mjs';

      //
      // Asynchronous download PDF
      //
      const loadingTask = pdfjsLib.getDocument(url);
      const pdf = await loadingTask.promise;

      //
      // Fetch the first page
      //
      const page = await pdf.getPage(1);
      const scale = 1.5;
      const viewport = page.getViewport({ scale });
      // Support HiDPI-screens.
      const outputScale = window.devicePixelRatio || 1;

      //
      // Prepare canvas using PDF page dimensions
      //
      const canvas = document.getElementById("the-canvas");
      const context = canvas.getContext("2d");
      canvas.width = Math.floor(viewport.width * outputScale);
      canvas.height = Math.floor(viewport.height * outputScale);
      canvas.style.width = Math.floor(viewport.width) + "px";
      canvas.style.height = Math.floor(viewport.height) + "px";
      const transform = outputScale !== 1
        ? [outputScale, 0, 0, outputScale, 0, 0]
        : null;

      //
      // Render PDF page into canvas context
      //
      const renderContext = {
        canvasContext: context,
        transform,
        viewport,
      };
      page.render(renderContext);
    </script>
  </body>
</html>
```
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
Given:
```css
html,
body {
  height: 100%;
}

body {
  display: flex;
}
```
The `<div>` appended to the `<body>` will take up the full height of the
viewport due to the implicit `align-items: stretch` of flex containers.
This results in an incorrect computed `minFontSize` value.
In the MOZCENTRAL build the `BasePreferences` class is almost unused, since it's only being used to initialize and update the `AppOptions` which means that theoretically we could move the relevant code there.
However for the other build targets (e.g. GENERIC and CHROME) we still need to keep the `BasePreferences` class, although we can re-factor things to move the necessary validation inside of `AppOptions` and thus simplify the code and reduce duplication.
The patch also moves the event dispatching, for changed preference values, into `AppOptions` instead.
Code inspection uncovered that quite a few integration tests don't wait
for scripting to be ready before proceeding, which is a source of
intermittent failures, especially on slower machines where e.g. creating
the sandbox takes longer.
This commit fixes the issues by introducing a `waitForScripting` helper
function and calling it consistently in all scripting integration tests.
Add a unit test to check compatibility with such CMaps.
In the PDF in issue 18099, the toUnicode CMap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of Unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the Unicode code points are supposed to be given in UTF-16 BE, the PDF's line should probably have read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers handle it correctly, PDF.js should probably support this too.
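A hedged sketch of the tolerant interpretation (simplified; the real code lives in the CMap parser):
```js
// Map a bfrange like "<00> <7F> <00>": when the destination has fewer
// bytes than proper UTF-16 BE (<0000>), treat it as a small code point
// instead of as the high byte of a 16-bit character.
function mapBfRange(startCode, endCode, dstBytes) {
  let dst = 0;
  for (const byte of dstBytes) {
    dst = (dst << 8) | byte; // [0x00] -> 0x0000, not a high byte.
  }
  const map = new Map();
  for (let code = startCode; code <= endCode; code++) {
    map.set(code, String.fromCharCode(dst + (code - startCode)));
  }
  return map;
}

mapBfRange(0x00, 0x7f, [0x00]).get(0x41); // "A", not "\u4100".
```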
This unit test fails occasionally (albeit much less than before thanks
to PR #17663), so we change the parsing time check's divisor to prevent
it from happening again. If the last page's rendering time is less than
or equal to 50% of the first page's rendering time that should be enough
proof that no worker thread re-parsing occurred while also providing a
wide enough range to avoid intermittents.
Note that the assertion is now equal to the one we already have in the
"caches image resources at the document/page level, with main-thread
copying of complex images (issue 11518)" unit test which seems to work
reliably so far.
This patch fixes a situation that'll probably never happen, but nonetheless seems like something that we should address.
Currently the "updatedPreference" listener isn't registered *until after* the preferences have been read and initialized, which leaves a very short window of time where a preference change could theoretically be skipped.
If uncaught exceptions occur in the tests (which happened in #17962 and
can be triggered manually by throwing an error in `integration-boot.js`)
the teardown logic of the tests doesn't get to run and thus spawned
browser processes are not closed properly. Given that `test.mjs` is the
only process that has a reference to them they will become orphaned and
keep running if `test.mjs` exits without explicitly closing them.
This commit fixes the issue by always closing the browsers if uncaught
exceptions occur, and we make sure to log them for debugging purposes.
This integration test fails intermittently because we're not (correctly)
awaiting the character limit increase sandbox action.
For the first increase an attempt was made to handle this, but it doesn't
work correctly because the text in the field is `abcdefghijklmnopq` and
it's not truncated until the sandbox action is completed, so because
we didn't await that we could pass the `value !== "abcdefgh"` check
because `abcdefghijklmnopq !== abcdefgh` is true. For the second increase
we didn't have a check in place.
This commit fixes the issues by using the `waitForSandboxTrip` and
`waitForSelector` helper functions to make sure that we can only proceed
if the sandbox action is completed and the DOM element is updated.
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
Similar to the `mustBeViewed` method, we can check the relevant parameters within the `mustBeViewedWhenEditing` method itself since that (in my opinion) slightly helps readability of the code in the `src/core/document.js` file.
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
- if the annotation has to be composed with the page then the canvas must be correctly
composed with its parent. That means we should move the canvas under canvasWrapper
and we should extract composing info from the drawing instructions...
Currently it's the case with highlight annotations.
- we use some extra memory for those canvases even if the user will never edit them, which
is the case for example when opening a PDF in Fenix.
So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
For provenance, enabled in PR #18352, to work the repository URL in
`package.json` is required to match the repository URL of the GitHub
Actions invocation. This should fix the following error we encountered
publishing a new release today:
```
npm error 422 Unprocessable Entity - PUT https://registry.npmjs.org/pdfjs-dist - Error verifying sigstore provenance bundle: Failed to validate repository information: package.json: "repository.url" is "git+https://github.com/mozilla/pdfjs-dist.git", expected to match "https://github.com/mozilla/pdf.js" from provenance
```
It should help to avoid having garbage like this in the logs:
```
console.warn: TopSitesFeed: Failed to fetch data from Contile server: NetworkError when attempting to fetch resource.
JavaScript error: , line 0: TypeError: NetworkError when attempting to fetch resource.
```
This PR switches from `npm install` to `npm ci` on CI. This enables some additional checks to ensure repo integrity when using CI/CD.
Read more: https://docs.npmjs.com/cli/v10/commands/npm-ci
This commit removes the following warnings from the `npm publish` output:
```
npm warn publish npm auto-corrected some errors in your package.json when publishing. Please run "npm pkg fix" to address these errors.
npm warn publish errors corrected:
npm warn publish Removed invalid "scripts"
npm warn publish "repository.url" was normalized to "git+https://github.com/mozilla/pdfjs-dist.git"
```
For the "scripts" section it turns out that if the package doesn't have
any scripts it's expected to explicitly set it to an empty object; refer
to https://github.com/npm/cli/issues/6918 and
https://github.com/denoland/dnt/pull/414.
Errors related to this `requestAnimationFrame` show up intermittently when running the integration-tests on the bots, however I've been unable to reproduce it locally.
Hence I cannot guarantee that it's enough to fix the timing issues, however this should be generally safe since the `requestAnimationFrame` invokes the `_next`-method and the first thing that one does is check that rendering hasn't been cancelled.
While the event listener is removed during testing, the `requestAnimationFrame` isn't cancelled and that occasionally shows up when the integration-tests are run on the bots.
Debugging #17931 uncovered a race condition in the way we use the
`waitForEvent` function. Currently the following happens:
1. We call `waitForEvent`, which starts execution of the function body
and immediately returns a promise.
2. We do the action that triggers the event.
3. We await the promise, which resolves if the event is triggered or
the timeout is reached.
The problem is in step 1: function body execution has started, but not
necessarily completed. Given that we don't await the promise, we
immediately trigger step 2 and it's not unlikely that the event we
trigger arrives before the event listener is actually registered in the
function body of `waitForEvent` (which is slower because it needs to be
evaluated in the page context and there is some other logic before the
actual `addEventListener` call).
This commit fixes the issue by passing the action to `waitForEvent` as
a callback so `waitForEvent` itself can call it once it's safe to do so.
This should make sure that we always register the event listener before
triggering the event, and because we shouldn't miss events anymore we
can also remove the retry logic for pasting.
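A hedged sketch of the fixed helper (heavily simplified from the actual Puppeteer-based implementation):
```js
async function waitForEvent(page, eventName, action) {
  // evaluateHandle resolves once the listener is registered; it returns a
  // handle to the still-pending in-page promise (wrapped in an array so
  // that Puppeteer doesn't await it immediately).
  const handle = await page.evaluateHandle(
    name => [
      new Promise(resolve => {
        document.addEventListener(name, () => resolve(), { once: true });
      }),
    ],
    eventName
  );
  // The listener is now guaranteed to exist, so trigger the event...
  await action();
  // ...and wait for the in-page promise to resolve.
  await handle.evaluate(([promise]) => promise);
}

// Usage: the paste action is a callback instead of a prior caller step.
await waitForEvent(page, "paste", () => kbPaste(page));
```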
The integration tests are currently not consistent in how they do
copy/pasting: some tests use the `kbCopy`/`kbPaste` functions with
waiting for the event inline, some have their own helper function to
combine those actions and some even call `kbCopy`/`kbPaste` without
waiting for the event at all (which can cause intermittent failures).
This commit fixes the issues by providing a set of four helper functions
that all tests use and that abstract e.g. waiting for the event away
from the caller. This makes the individual tests simpler and consistent,
reduces code duplication and fixes possible intermittent failures
due to not waiting for events to trigger.
Browsers have an accessibility option that allows users to enforce
a minimum font size for all text rendered in the page, regardless
of what the font-size CSS property says. For example, it can be
found in Firefox under `font.minimum-size.x-western`.
When rendering the <span>s in the text layer, this causes the
text layer to not be aligned anymore with the underlying canvas.
While normally accessibility features should not be worked around,
in this case it is *not* improving accessibility:
- the text is transparent, so making it bigger doesn't make it more
readable
- the selection UX for users with that accessibility option enabled
is worse than for other users (it's basically unusable).
While there is technically no way to ignore that minimum font size,
this commit does it by multiplying all the `font-size`s in the text
layer by minFontSize, and then scaling all the `<span>`s down by
1/minFontSize.
This code contains the same bug that the previous commit fixed in
`waitForEvent`, namely that we don't clear the timeout if the event
is triggered. By using the now fixed `waitForEvent` function we not
only deduplicate this code but we also fix this issue so that no
incorrect timeout logs show up anymore.
Debugging #17931, by printing all parts of the event lifecycle including
timestamps, uncovered that some events for which a timeout was logged
actually did get triggered correctly in the browser. Going over the code
and discovering https://stackoverflow.com/questions/47107465/puppeteer-how-to-listen-to-object-events#comment117661238_65534026
showed what went wrong: if the event we wait for is triggered then
`Promise.race` resolves, but that doesn't automatically cancel the
timeout. The tests didn't fail on this because `Promise.race` resolved
correctly, but slightly later once the timeout was reached we would see
spurious log lines about timeouts for the already-triggered events.
This commit fixes the issue by canceling the timeout if the event we're
waiting for has triggered.
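A hedged sketch of the fix (simplified):
```js
function waitWithTimeout(eventPromise, ms, eventName) {
  let timeoutId;
  const timeout = new Promise(resolve => {
    timeoutId = setTimeout(() => {
      console.log(`Timed out waiting for event: ${eventName}`);
      resolve();
    }, ms);
  });
  return Promise.race([eventPromise, timeout]).finally(() => {
    // Cancel the pending timeout so that no spurious log line shows up
    // after the event has already been triggered.
    clearTimeout(timeoutId);
  });
}
```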
Currently errors in `afterAll` are logged, but don't fail the tests.
This could cause new errors during test teardown to go by unnoticed.
Moreover, the integration tests use a different reporting mechanism which
also handles errors differently (this is an extra reason to do #12730).
This patch fixes the issues by consistently handling errors in
`suiteStarted` and `suiteDone` in both reporting mechanisms.
Fixes #18319.
This integration test contains three issues:
- The `page.bringToFront()` call is not awaited, even though it returns
a promise (see https://pptr.dev/api/puppeteer.page.bringtofront). Note
that in other tests we do this correctly already.
- The `page.waitForSelector()` call at the end is unnecessary because
that exact condition is already checked at the end of the
`waitForImage` function we call just before this line; see
https://github.com/mozilla/pdf.js/blob/master/test/integration/stamp_editor_spec.mjs#L74.
- The pages should be closed in reversed order; please refer to the
description in #18318 for more details.
Fixes #18318.
This integration test is currently the only one that spawns a separate
browser instance. However, while it closes the browser once it's done,
it doesn't close the page (and therefore doesn't call the `testingClose`
method) like the other integration tests do.
This commit fixes this difference by closing the page before closing the
browser, thereby ensuring all regular cleanup logic gets called and we
avoid (intermittent) shutdown tracebacks in the logs. This allows
upcoming integration tests that spawn a separate browser instance to
reuse this pattern to cleanly end the test.
Given that we integrate the `closeSinglePage` code from #17962 for this
patch, @calixteman is credited as the co-author.
Co-authored-by: Calixte Denizet <calixte.denizet@gmail.com>
This commit fixes the following log line that currently shows up for
every type of tests that involve a browser:
`System addon update list error SyntaxError: XMLHttpRequest.open:
'http://%(server)s/dummy-system-addons.xml' is not a valid URL.`
If Firefox is in testing mode, the system addons update URL is
configured to a dummy URL so that it can't actually update (see for
example the same value in Marionette at
https://searchfox.org/mozilla-central/source/testing/marionette/client/marionette_driver/geckoinstance.py#109),
but this doesn't stop Firefox from trying to update, and when it does
it logs this line because the URL is obviously invalid.
Hence this patch which disables system addon updates altogether so
Firefox doesn't attempt to use the dummy URL anymore. The browser
updates are all managed by Puppeteer, and regular updates have already
been disabled too (see
6937a76f0a/packages/browsers/src/browser-data/firefox.ts (L302-L303)).
Allow apps with supportsIntegratedFind to better monitor the find state.
A recognized use case is the Firefox findbar: its "not found" sound must
consider `entireWord` and only make noise when it is off.
See related implementation in
https://hg.mozilla.org/mozilla-central/rev/16b902cbcf26
This change can help if we have to move the implementation from cpp to jsm.
Ensure that we never round the canvas dimensions above `maxCanvasPixels`
by rounding them to the preceding multiple of the display ratio rather
than the succeeding one.
This commit introduces a test to ensure that:
- When zooming below the maxCanvasPixels limit, the canvas is rendered
with the correct size (the two sides are multiplied by the zoom factor).
- When zooming above the maxCanvasPixels limit, the canvas size is capped.
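A hedged sketch of the capping logic (my own reconstruction, simplified; `width`/`height` are CSS sizes and `ratio` is the display ratio):
```js
function cappedSide(cssSize, ratio, scale) {
  // Round down to the preceding multiple of the display ratio, never up,
  // so the pixel budget cannot be exceeded by the rounding itself.
  return Math.floor((cssSize * scale) / ratio) * ratio;
}

function capCanvas(width, height, ratio, maxCanvasPixels) {
  const pixels = width * height * ratio ** 2;
  const scale =
    pixels > maxCanvasPixels
      ? Math.sqrt(maxCanvasPixels / (width * height))
      : ratio;
  return [cappedSide(width, ratio, scale), cappedSide(height, ratio, scale)];
}
```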
This has two advantages, as far as I'm concerned:
- The tests don't need to manually invoke multiple functions to properly clean-up, which reduces the risk of missing something.
- By collecting all the relevant clean-up in one method, rather than spreading it out, we get a much better overview of exactly what is being reset.
The release builds are currently not reproducible because ZIP files
record the modification date of files generated during the build
process, meaning that two builds from identical source code, made
at different times, result in different output.
This is undesirable because it makes detecting differences in the output
harder, for instance recently during the Gulp 5 efforts, because the
modification date differences are irrelevant and could obscure actually
important differences in the output during e.g. code changes. Moreover,
reproducibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring the ZIP file creation to use
the (fixed) date of the last Git commit for which the release is being
made. With this the build is fully reproducible so that identical source
code builds result in bit-by-bit identical output artifacts.
To improve readability we convert the compression method to take a
parameter object and use template strings where useful.
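A hedged sketch of the Gulpfile change (the file names are illustrative; `gulp-zip` accepts a `modifiedTime` option that overrides the per-file timestamps):
```js
import { execSync } from "child_process";
import gulp from "gulp";
import zip from "gulp-zip";

export function distZip() {
  // Use the (fixed) date of the last Git commit instead of "now", so that
  // identical source trees produce bit-by-bit identical archives.
  const modifiedTime = new Date(
    execSync("git log -1 --format=%cI").toString().trim()
  );
  return gulp
    .src("build/dist/**")
    .pipe(zip("pdfjs-dist.zip", { modifiedTime }))
    .pipe(gulp.dest("build"));
}
```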
The JSDoc builds are currently not reproducible because a timestamp is
included in the output, meaning that two builds from identical source
code, made at different times, result in different output.
This is undesirable because it makes diffing the output difficult, for
instance recently during the Gulp 5 efforts, because the timestamp
differences are irrelevant and could obscure actually important
differences in the output during e.g. code changes. Moreover,
reproducibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring JSDoc to not include the
timestamps in the output. It's not relevant for end users and without it
the build is fully reproducible so that identical source code builds
result in bit-by-bit identical output artifacts.
Note that this option sadly can only be set via a configuration file,
and not via the command line parameters like we used to have, so for
consistency we also move the other options into the configuration file
so they are all in one place and the Gulpfile becomes a bit simpler.
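A minimal `jsdoc.json` sketch (the paths are illustrative; `templates.default.includeDate` is the option that drops the timestamp):
```json
{
  "opts": {
    "destination": "build/jsdoc"
  },
  "templates": {
    "default": {
      "includeDate": false
    }
  }
}
```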
After the previous commit this method has only a single call-site, hence we can inline the needed part of that check directly in `PDFViewerApplication.download` instead.
Currently saving a modified PDF document may fail *intermittently*, if it's triggered before the entire document has been downloaded.
When saving was originally added we only supported forms, and such PDF documents are usually small/simple enough for this issue to be difficult to trigger. However, with editing-support now available as well it's possible to modify much larger documents and this issue thus becomes easier to trigger.
One way to reproduce this issue *consistently* is to:
- Open http://localhost:8888/web/viewer.html?file=/test/pdfs/pdf.pdf#disableHistory=true&disableStream=true&disableAutoFetch=true
- Add an annotation on the first page, it doesn't matter what kind.
- Save the document.
- Open the resulting document, and notice that with the `master` branch the annotation is missing.
This should make the API documentation slightly quicker to access for
users by removing an extra click. Moreover, it makes the API
documentation blend in with the rest of the website/theme (one of the
points in #6526).
Fixes #18249.
This avoids having to repeat the same code multiple times, since besides resolving the promise we also need to send the "configure" message to the worker-thread.
The feature-testing on the worker-thread has been simplified in previous pull requests, which means that we can simplify this main-thread handler as well.
This helps reduce overall indentation in the method, thus leading to slightly less code.
Also, remove an old comment referring to Chrome 15 since that's no longer relevant now.
Wintersmith is no longer maintained given that the most recent version
is from six years ago, and all vulnerabilities that NPM reports
originate from Wintersmith's dependencies. Metalsmith, and its plugins,
on the other hand have recently had releases and don't have known
vulnerabilities. In fact, the number of reported vulnerabilities by NPM
even goes down to zero with this patch applied.
This commit therefore replaces Wintersmith with Metalsmith by providing
a transparent drop-in replacement, in a way that requires the least
amount of changes to the code and the generated output.
Note that this patch does update our versions of jQuery, Bootstrap and
the Highlight.js theme because the previous versions were very outdated
and didn't work correctly with Metalsmith. Moreover, those old versions
contained vulnerabilities that are hereby fixed.
Fixes #18198.
It's recommended to always install dependencies locally in the project
folder because global dependencies can easily conflict with other
projects and, because they are not managed by the project, diverge from
versions defined in e.g. `package.json`. Previously we installed
`gulp-cli` globally because at the time we lacked a convenient mechanism
to use Gulp otherwise, but nowadays NPM provides the `npx` command for
that purpose and recommends using it over global installations (see
https://docs.npmjs.com/downloading-and-installing-packages-globally
and PR #17489 that provided the ground work for using it).
This commit therefore updates our README and website to no longer recommend
installing `gulp-cli` globally but instead installing it locally from the
already existing entries in `package.json` like all other dependencies
we use. Not only does this remove the special-casing for `gulp-cli`
which simplifies the installation procedure, it also ensures that the
version ranges provided in `package.json` are respected.
This change is similar to the change in commit 92de2b7.
Fixes #18232.
Fixes 98ef8a1.
If for example dd:mm is failing we just try with d:m, which is equivalent
to the regex /d{1,2}:m{1,2}/. This way the user is allowed to omit the
leading 0 for single-digit days/months.
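A hedged sketch of the fallback (simplified):
```js
function parseDayMonth(value) {
  // Strict dd:mm first, then the relaxed d:m equivalent of
  // /d{1,2}:m{1,2}/ so a leading 0 may be omitted.
  const match =
    /^(\d{2}):(\d{2})$/.exec(value) ?? /^(\d{1,2}):(\d{1,2})$/.exec(value);
  if (!match) {
    return null;
  }
  return { day: parseInt(match[1], 10), month: parseInt(match[2], 10) };
}

parseDayMonth("05:07"); // { day: 5, month: 7 }
parseDayMonth("5:7"); // also accepted, thanks to the fallback.
```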
- Use a CSS rule to display the wait-cursor during copying. Since copying may take a little while in long documents, there's a theoretical risk that something else could change the cursor in the meantime and just resetting to the saved-cursor could thus be incorrect.
- Remove the `interruptCopyCondition` listener with an AbortController, since that's slightly shorter code.
This method has only a single call-site in the viewer, since it's used as a fallback, and the functionality can be moved into the `DownloadManager.download` method instead.
There's no specification for that (even if it's possible to get an idea from
the XFA specs), so we just want to hide them in order to avoid displaying
something wrong.
This helper method is simpler/shorter than it originally was[1] and with recent refactoring so is the `render`-method, hence we can just inline this code now.
---
[1] It used to e.g. dispatch the "textlayerrendered" event.
Part of this code is really old and pre-dates general availability of things such as `Blob` and `URL.createObjectURL`. To avoid having to duplicate the Blob-creation in the viewer, we can move this into `DownloadManager.download` instead.
Also, remove a couple of unnecessary `await` statements since the methods in question are synchronous.
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
This is a major version bump, and the changelog at
https://github.com/gulpjs/gulp/releases/tag/v5.0.0 indicates one
breaking change that impacts us, namely that streams are now by default
interpreted/transformed to UTF-8 encoding. This breaks `gulp.src` calls
that work on binary files such as images or CMaps, but is fortunately
easy to fix for us by disabling re-encoding for all `gulp.src` calls
(see https://github.com/gulpjs/gulp/issues/2764#issuecomment-2063415792
for more information). This restores the previous behavior of copying
the files as-is without Gulp performing any transformations to it, which
is what we want because Gulp is only used for bundling and we make sure
that the source files have the right encoding.
Instead of sending an array of Objects for a list of points (or quadpoints)
to the main thread, we'll just send a basic float buffer.
It should slightly improve performance (especially when cloning the data) and use slightly less memory.
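A hedged sketch of the packing (an illustrative helper, not the actual code):
```js
// Flatten [{x, y}, ...] into a single Float32Array so the structured
// clone to the main thread copies one flat buffer instead of many
// small objects.
function packPoints(points) {
  const buffer = new Float32Array(points.length * 2);
  for (let i = 0; i < points.length; i++) {
    buffer[2 * i] = points[i].x;
    buffer[2 * i + 1] = points[i].y;
  }
  return buffer;
}

packPoints([{ x: 1, y: 2 }, { x: 3, y: 4 }]); // Float32Array [1, 2, 3, 4]
```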
This parameter allows defining which point should remain
fixed while scaling the document. It can be used, for example,
to implement "zoom around the cursor" or "zoom around
pinch center".
The logic was previously implemented in `web/app.js`, but
moving it to the viewer scaling utilities themselves makes it
easier to implement similar zooming functionalities in
other embedders.
`updateScale` receives a `drawingDelay`, a `scaleFactor` and/or a number of `steps`.
If `scaleFactor` is a positive number different from `1` the current scale is multiplied by
that number. Otherwise, if `steps` is a positive integer the current scale is multiplied by
`DEFAULT_SCALE_DELTA` `steps` times. Finally, if `steps` is a negative integer, the
current scale is divided by `DEFAULT_SCALE_DELTA` `abs(steps)` times.
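Hedged usage sketch, following the semantics described above (`pdfViewer` is assumed to be a viewer instance):
```js
pdfViewer.updateScale({ drawingDelay: 400, scaleFactor: 1.25 }); // scale ×1.25
pdfViewer.updateScale({ drawingDelay: 400, steps: 2 }); // ×DEFAULT_SCALE_DELTA, twice
pdfViewer.updateScale({ drawingDelay: 400, steps: -1 }); // ÷DEFAULT_SCALE_DELTA, once
```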
In order to share common parts between different dialogs, this patch
aims to slightly refactor the CSS, making it more generic.
This will simplify adding a new dialog (we want to add one when
leaving an unsaved document).
After the re-factoring in PR 18104 there's now a *theoretical* risk that a pending `TextLayer` is never removed, which we can avoid by not registering it until `render` is invoked.
Note that this doesn't affect the viewer or tests, but if a third-party user calls `new TextLayer(...)` without a following call of either the `render`- or `cancel`-method we'd block global clean-up without this patch.
The `setTextContentSource` functionality is very old code, and was introduced years ago together with streaming of textContent.
By moving the `streamTextContent`-call into the `TextLayerBuilder` class we collect more functionality in one place and slightly reduce the amount of code needed.
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
This should avoid the latest JS features appearing in the TypeScript definitions, and given the currently supported browsers/environments (in PDF.js) we shouldn't need to target an even older ES-version.
Apparently we forgot to update this in the version 4 release, so better late than never I suppose.
- Use `.mjs` extensions where appropriate.
- Remove mention of the `debugger`-functionality, since that's not really relevant to users.
- Unify the whitespace handling to use spaces consistently.
- Move the definition of the `loadingParams` Object, to simplify the code.
- Add a unit-test, since none existed and the viewer depends on this functionality.
Over time the number of integration tests that get the rectangle for a
given selector has increased quite a bit, and the code to do so has
consequently become duplicated.
This commit refactors the integration tests to move the rectangle
fetching code to a single place, which reduces the code by over 400
lines and makes the individual tests simpler.
- Use `this` rather than explicitly spelling out the class-name in the static `#enableGlobalSelectionListener` method, since that leads to shorter code. Given that all the relevant static fields are *private* ESLint will catch any scope errors in the code.
- Reduce a little bit of duplication when using the `#selectionChangeAbortController` signal.
- Utilize early returns in the "selectionchange" event handler, since that reduces overall indentation which helps readability a tiny bit.
The `merge-stream` dependency is no longer maintained and doesn't work
in combination with Gulp 5 anymore (for more information refer to
https://github.com/gulpjs/gulp/issues/2802#issuecomment-2094130656).
Fortunately the Gulp team maintains a drop-in replacement dependency
called `ordered-read-streams` with the same API as `merge-stream`.
Indeed, running all affected Gulp targets and comparing build artifacts
with `diff -r <old> <new>` confirms that no unexpected changes are made.
Fixes a part of #17922.
This commit replaces most `waitForTimeout` occurrences with calls to
`waitForFunction` or `waitForSandboxTrip`. Note that the occurrences in
the "must check that focus/blur callbacks aren't called" test remain
until we find a good way to ensure that nothing happened after the tab
switches (because currently we can't be sure that nothing happens since
there is nothing to await).
This commit adds a test for 0603d1ac1843bc4098d74382beda6cc511350ccd.
Before the fix the `pagerendered` events would be fired just 2-3
milliseconds after the call to `increaseScale`/`decreaseScale`.
Fixes issue #16843.
In certain cases, the text layer was misaligned
due to a difference between the `lang` attribute
of the viewer and the canvas. This commit addresses
the problem by adding the `lang` attribute to the canvas.
The issue was caused because PDF.js uses serif/sans-serif
fonts to generate the text layer and relies on system fonts.
The difference in the `lang` attribute led to different fonts
being picked, causing the misalignment.
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src/core/{cff_font, type1_font}.js` files.
The `through2` dependency got introduced over four years ago in #11325 to
replace the unmaintained `gulp-transform` dependency. However, sadly the
same holds for `through2` since the last release was also four years ago.
Fortunately the `through2` dependency can trivially be replaced with the
built-in Node.js `stream.Transform` API nowadays. In fact, the `through2`
dependency mentions themselves in their README already that they are "a
tiny wrapper around Node.js streams.Transform". The `stream.Transform`
API is available in all Node.js versions we support, and in Node.js 6
already the simplified constructor approach for `stream.Transform` got
introduced to simplify creating custom stream transformers; see
https://nodejs.org/docs/latest-v6.x/api/stream.html#stream_new_stream_transform_options.
This commit therefore replaces `through2` by switching to the
`stream.Transform` API directly so we don't need any wrappers anymore.
Note that for our case the only change we have to make is to enable
object mode, see https://nodejs.org/api/stream.html#object-mode, because
we pass in `VinylFile` objects instead of e.g. regular `Buffer` objects.
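A hedged sketch of the resulting helper (simplified; see also the verification described below):
```js
import { Transform } from "stream";

// Replacement for the previous through2-based helper: transform the
// contents of each Vinyl file that flows through the Gulp pipeline.
function transform(charEncoding, transformFn) {
  return new Transform({
    // Vinyl file objects flow through the stream, not raw Buffers.
    objectMode: true,
    transform(file, _encoding, done) {
      const transformed = transformFn(file.contents.toString(charEncoding));
      file.contents = Buffer.from(transformed, charEncoding);
      done(null, file);
    },
  });
}
```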
I have confirmed in two ways that this is indeed a drop-in replacement:
- Running the Gulp targets that call the `transform` function and
diffing the resulting `build` folder before/after this patch, with
`diff -r build-old/ build-new/`, to ensure that there are no
unexpected changes in the output.
- Changing the Gulpfile to, instead of UTF-8, transform the files to
ASCII, and diffing the resulting `build` folder to confirm that the
transformation logic works and produces different results, such as:
```
diff build/lib/core/standard_fonts.js build-ascii/lib/core/standard_fonts.js
284c284
< t["Trinité"] = true;
---
> t["Trinit�"] = true;
```
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.
This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.
Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
*Please note:* This doesn't really affect the viewer, but may affect the library API if multiple PDF documents are opened in parallel.
Since we clean-up "global" textLayer-data when destroying a PDF document, this means that other active PDFs could potentially break by invoking `cleanupTextLayer` unconditionally. Note that textLayer rendering is an asynchronous task, and we thus need to ensure those are all finished before running clean-up.
The `needle` dependency originally got introduced in #12024, almost four
years ago, to be able to use pre-built binaries for the `canvas`
dependency on macOS. However, nowadays the `needle` dependency isn't
used by `canvas` anymore, or any other package we use for that matter,
as shown by the empty NPM dependency tree:
```
$ npm ls needle
pdf.js
└── needle@3.3.1
```
Investigation showed that the `canvas` package depends on the
`node-pre-gyp` package which in turn depended on `needle` (see
https://github.com/Automattic/node-canvas/issues/1110#issuecomment-411232630),
but in version 1.0.0 of `node-pre-gyp` from three years ago the `needle`
dependency got dropped in favor of `node-fetch` (see
a74f5e367c/CHANGELOG.md (L52)).
This explains why the NPM dependency tree is empty now and proves that
we can safely get rid of this dependency now.
In Node.js 14.14.0 the `fs.rmSync` function was added that removes files
and directories. The `recursive` option is used to remove directories
and their contents, making it a drop-in replacement for the `rimraf`
dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `rimraf` and reduce the
number of project dependencies.
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
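The replacement is essentially a one-liner (a sketch; the directory is illustrative):
```js
import fs from "fs";

// `recursive` removes directories and their contents; `force` ignores
// missing paths, matching rimraf's behavior.
fs.rmSync("build/", { recursive: true, force: true });
```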
I broke this accidentally in PR 18089, sorry about that!
Note that since `#processItems` is private we can no longer just "replace" the method as was done in PR 18052.
The issue from #18003 hasn't been shown to be caused by PDF.js, but it
did surface that we don't have (direct) unit test coverage for the
`BaseException` class. This made it more difficult to prove that the
`stack` property was already available on exception instances, but more
importantly it caused the CI to be green even though the suggested
change would have caused the `stack` property to disappear.
To avoid future regressions, for e.g. similar changes or a rewrite from
a closure to a proper class, this commit introduces a dedicated unit
test for `BaseException` that asserts that our exception instances
indeed expose all expected properties.
The Puppeteer update should in particular be helpful for us because it
contains improved WebDriver BiDi compatibility, a newer Chrome version
(both might help for #17962) and an official deprecation of CDP for
Firefox. Note that the latter doesn't require changes on our end because
we already use WebDriver BiDi unconditionally for Firefox since commit
4db0174. The full release notes can be found at
https://github.com/puppeteer/puppeteer/releases/tag/puppeteer-core-v22.8.0.
When selecting on a touch screen device, whenever the finger moves to a
blank area (so over `div.textLayer` directly rather than on a `<span>`),
the selection jumps to include all the text between the beginning of the
.textLayer and the selection side that is not being moved.
The existing selection flickering fix when using the mouse cannot be
trivially re-used on mobile, because when modifying a selection on
a touchscreen device Firefox will not emit any pointer event (and
Chrome will emit them inconsistently). Instead, we have to listen to the
'selectionchange' event.
The fix is different in Firefox and Chrome:
- on Firefox, we have to make sure that, when modifying the selection,
hovering on blank areas will hover on the .endOfContent element
rather than on the .textLayer element. This is done by adjusting the
z-indexes so that .endOfContent is above .textLayer.
- on Chrome, hovering on blank areas needs to trigger hovering on an
element that is either immediately after (or immediately before,
depending on which side of the selection the user is moving) the
currently selected text. This is done by moving the .endOfContent
element around between the correct `<span>`s in the text layer.
The new anti-flickering code is also used when selecting using a mouse:
the improvement in Firefox is only observable on multi-page selection,
while in Chrome it also affects selection within a single page.
After this commit, the `z-index`es inside .textLayer are as follows:
- .endOfContent has `z-index: 0`
- everything else has `z-index: 1`
- except for .markedContent, which have `z-index: 0`
and their contents have `z-index: 1`.
`.textLayer` has an explicit `z-index: 0` to introduce a new stacking context,
so that its contents are not drawn on top of `.annotationLayer`.
- Change all possible semi-private methods into properly private ones. Note that this code is old enough to predate standard classes.
- Move the `appendText` helper function into `TextLayerRenderTask`, as a private method, to avoid having to manually pass in the scope.
- Simplify `#layoutText` by directly passing in all necessary data. This is possible after the changes PR 18052.
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
- These changes will allow a simpler way of implementing PR 17770.
- The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).
- This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
- We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).
- Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
*Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
If a user manually calls `PDFPageView.prototype.update()` with a `drawingDelay`-option then it'll always be necessary to re-call the method *without* a delay afterwards, regardless of the `maxCanvasPixels`-value (e.g. even when CSS-only zooming is used).
This only happens when it's safe to do so. The exceptions are:
- when the class extends another subclass: removing the constructor would remove the error about the missing super() call
- when there are default parameters, that could have side effects
- when there are destructured parameters, that could have side effects
- The `stopAtErrors` API option, which is the inverse of the "internal" `ignoreErrors` option, is explicitly documented as applying to *parsing* (i.e. the worker-thread) while the `FontFaceObject` class is used during rendering (i.e. the main-thread); see b6765403a1/src/display/api.js (L164-L167)
- A glyph that fails in the `FontRendererFactory`, on the worker-thread, will already cause (overall) parsing to stop when `ignoreErrors === false` hence checking the option on the main-thread as well seems redundant; see b6765403a1/src/core/evaluator.js (L4527-L4533)
- Removing this option simplifies the code, and slightly reduces the number of options that we need to handle in the main-thread code.
This avoids having to add a couple of event listeners in the viewer, when debugging is enabled, and is consistent with the existing handling of `FontInspector` and `StepperManager` in the API.
This global was only introduced to work-around problems caused by the GENERIC PDF.js build using top level await. Since that was removed in the previous commit, this global is now dead code.
This limit is currently completely non-functional, since the check happens *after* the entire textLayer has been parsed and appended to the DOM. It seems that this has been *accidentally* broken ever since the introduction of `ReadableStream` support.
The reason that this hasn't caused noticeable textLayer-related performance issues in practice is probably because we nowadays manage to coalesce the textLayer into fewer overall DOM elements, whereas years ago many PDF documents ended up with one DOM element *per* glyph.
By moving this check, and thus restoring the functionality, we're also able to remove the `render` helper function and simplify the code.
The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.
Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await, it seems that we'll have to try and fix this.
Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".
Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
This commit replaces `waitForFunction` calls that use
`document.activeElement` to wait for an element to get focus with simpler
`waitForSelector` expressions that use the `:focus` selector. Note that
we already use this in other tests, so this improves consistency too.
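For example (a sketch with a hypothetical selector; the actual tests use their own selectors):

```js
// Before: manually polling the active element.
await page.waitForFunction(
  sel => document.activeElement === document.querySelector(sel),
  {},
  ".annotationEditorLayer .selectedEditor" // hypothetical selector
);

// After: the simpler, equivalent wait using the :focus pseudo-class.
await page.waitForSelector(".annotationEditorLayer .selectedEditor:focus");
```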
This commit replaces a `waitForTimeout` occurrence with an equivalent
`waitForSelector` expression, and removes two other `waitForTimeout`
occurrences that are obsolete because we already wait for an observable
event to trigger or class change to happen.
Note that the other `waitForTimeout` occurrences in this file are either
part of #17931 or remain until we find a good way to ensure that nothing
happened (because currently there is nothing we can await there).
- Check that the `filename` is actually a string, before parsing it further.
- Use proper "shadowing" in the `filename` getter.
- Add a bit more validation of the data in `pickPlatformItem`.
- Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
It fixes issues #14982 and #14724.
The main problem of upscaling a canvas is that it can induce some pixelation
(see issue #14724). So this patch just removes the limit and, as a side
effect, it fixes issue #14982.
As far as I can tell, when looking at different profiles (especially memory
profiles) in the Firefox profiler, I don't see any noticeable difference in
terms of memory use.
and implement them using some SVG filters and composition.
Composing with destination-in, in order to multiply the RGB components by
the alpha from the mask, isn't perfect: it would be way better to natively
have alpha-mask support, since this approach induces some small rounding
errors and the computed RGB values are consequently only approximately correct.
In terms of performance it's a real improvement; for example, the PDF in
issue #17779 is now rendered in a few seconds.
There is still some room for improvement, but overall it should be way
better.
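A minimal sketch of the compositing step described above, assuming the mask has already been converted (e.g. via an SVG filter) into a canvas whose alpha channel holds the mask values; the actual rendering code is more involved:

```js
function applyAlphaMask(ctx, image, maskCanvas) {
  // Draw the image normally first.
  ctx.drawImage(image, 0, 0);
  // "destination-in" keeps the existing pixels, multiplied by the alpha of
  // what is drawn next, which approximates an alpha mask (modulo the small
  // rounding errors mentioned above).
  ctx.globalCompositeOperation = "destination-in";
  ctx.drawImage(maskCanvas, 0, 0);
  // Restore the default compositing mode.
  ctx.globalCompositeOperation = "source-over";
}
```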
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
We no longer need the helper method to *potentially* call itself once data is available, and can instead take full advantage of async/await by inlining the code.
Rather than having to maintain a dummy `IPDFLinkService` implementation, we can simply let it extend the primary `PDFLinkService` class instead.
Note how *some* methods already checked for an active PDF document, and those checks only needed to be used consistently throughout `PDFLinkService` in order for this to work.
Node.js 22 was just released, and it seems like it's not compatible
with the `canvas` package. This commit pins the version on GitHub
actions to Node.js 21 as a temporary workaround.
This commit should be reverted once
https://github.com/Automattic/node-canvas/issues/2377
is fixed.
This commit replaces all `waitForTimeout` occurrences with the
appropriate `waitForFunction` calls. Note that in some places they were
already present, so in those cases we could simply remove the
`waitForTimeout` call altogether.
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread doesn't have access to various primitives.
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
Given that the `decode` method only returns the actual image-data, a user would now need to invoke `parseImageProperties` to obtain e.g. the width and height.
This method only accepts `BaseStream`-instances, which are (obviously) not exposed, hence we extend it in IMAGE_DECODERS builds to wrap TypedArray data into the expected format.
By using the `signal` option when invoking `addEventListener`, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#signal), we're able to remove an arbitrary number of event listeners with (effectively) a single line of code.
Besides getting rid of a bunch of `removeEventListener`-calls, which means shorter code, we no longer need to manually keep track of event-handling functions.
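For instance (a minimal sketch with made-up listeners):

```js
const abortController = new AbortController();

// Any number of listeners can share the same signal...
window.addEventListener("resize", () => console.log("resize"), {
  signal: abortController.signal,
});
window.addEventListener("scroll", () => console.log("scroll"), {
  signal: abortController.signal,
});

// ...and a single call removes all of them, without having to keep
// references to the handler functions around.
abortController.abort();
```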
The timeout was introduced in commit 402e3fe with equal timeouts around
the helper function call. In commit 55e5af2 the timeouts around the
helper function call have been removed, and it looks like the helper
function itself was not updated purely due to an oversight.
The operations here should not require any timeouts because the promises
only resolve once the action is completed, which also explains why
removing the timeouts surrounding the helper function calls went without
any problems. It should therefore be safe to remove this timeout too.
This commit changes the `JpxImage.decode` method signature to define the
`ignoreColorSpace` argument as optional with a default value. Note that
we already set this default value in the `getBytes` method of the
`src/core/decode_stream.js` file since this option only seems useful for
certain special cases and therefore shouldn't be mandatory to provide.
Moreover, the JPX fuzzer is changed to use the new `JpxImage` API.
We should not wait for an arbitrary amount of time, which can easily
cause intermittent failures, but wait for a value change instead. Note
that this patch mirrors the approach we already use in other scripting
integration tests that also check for a value change; see e.g. the
"must check that a field has the correct formatted value" test.
This patch updates the minimum supported browsers as follows:
- Safari 16.4, which was released on 2023-03-27; see https://developer.apple.com/documentation/safari-release-notes/safari-16_4-release-notes
Nowadays we usually try, where feasible and possible, to support browsers/environments that are about two years old. The reasons for limiting support to a slightly more recent Safari version include:
- Safari has always been slower, compared to other browsers, at implementing e.g. new JavaScript features.
- Trying to provide support for Safari is often difficult, and over the years we have seen *a lot* of bugs that are specific to Safari.
- Safari is, and has been for many years, only listed as "mostly" supported in the FAQ.
- This allows us to remove feature-testing, only relevant to Safari, from the main code-base.
By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the built-in Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js core contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
We should not wait for an arbitrary amount of time, which can easily
cause intermittent failures, but wait for a property value change
instead. Note that this patch mirrors the approach we already use in
other scripting integration tests that also check for a visibility
change; see e.g. the "must show a text field and then make it invisible
when content is removed" test.
In Node.js 14.14.0 the `fs.rmSync` function was added that removes files
and directories. The `recursive` option is used to remove directories
and their contents, making it a drop-in replacement for the `rimraf`
dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `rimraf` and reduce the
number of project dependencies in the test folder.
This commit also gets rid of the indirection via the `removeDirSync`
test helper function by simply calling `fs.rmSync` directly.
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
In Node.js 10.12.0 the `recursive` option was added to `fs.mkdirSync`.
This option allows us to create a directory and all its parent
directories if they do not exist, making it a drop-in replacement for
the `mkdirp` dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `mkdirp` and reduce the
number of project dependencies.
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
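A minimal sketch of both drop-in replacements described in the two commits above (illustrative paths):

```js
import fs from "fs";

// Instead of rimraf: remove a directory and all of its contents.
fs.rmSync("build/generic", { recursive: true, force: true });

// Instead of mkdirp: create a directory, including missing parents.
fs.mkdirSync("build/generic/web", { recursive: true });
```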
This allows us to get rid of the `@private` JSDoc comments, which were
used to convey intent back when proper private methods could not be used
yet in JavaScript. This improves code readability/maintenance and enables
better usage validation by tooling such as ESLint.
This commit implements the following improvements for the test:
- Replace the hardcoded highlight padding with a dynamic calculation. It
turns out that the amount of padding can differ per system, possibly
due to e.g. local (font) settings, which could cause the test to fail.
The test now no longer implicitly depends on the environment.
- Remove the page offset subtraction. It's only needed for one assertion,
and we can simply add the page offset there once.
- Improve the "magic" numbers in the test. The number 5 is removed now
that the padding calculation made it obsolete, the number 28 is changed
to 30 because that's the actual value in the PDF data and the number 4
is explained a bit more to state that the space also counts as a glyph.
- Annotate the steps and checks to improve readability of the test and
to explain why certain calculations have to be performed.
These CSS rules were imported straight from mozilla-central, and contain a property that causes the RTL-rule to be skipped everywhere else (even when using the development viewer in Firefox).
The `waitForTimeout` function should not be used anymore and only exists
for old usages that have to be rewritten, but there was nothing in place
to signal this. This commit therefore implements a linting rule, specific
to the integration tests, to make it clear that this function should no
longer be used. We exclude the old usages from it because we are already
tracking those in #17656 (so this patch is mostly to not make the scope
of that issue bigger).
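Such a rule can be expressed with the built-in `no-restricted-properties` rule; a rough sketch (the actual configuration may differ):

```js
// ESLint configuration for the integration tests (sketch).
module.exports = {
  rules: {
    "no-restricted-properties": [
      "error",
      {
        property: "waitForTimeout",
        message:
          "waitForTimeout can cause intermittent failures; use e.g. waitForFunction or waitForSelector instead.",
      },
    ],
  },
};
```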
It's recommended to always install dependencies locally in the project
folder because global dependencies can easily conflict with other
projects and, because they are not managed by the project, diverge from
versions defined in e.g. `package.json`. Previously we installed
`gulp-cli` globally because at the time we lacked a convenient mechanism
to use Gulp otherwise, but nowadays NPM provides the `npx` command for
that purpose and recommends using it over global installations (see
https://docs.npmjs.com/downloading-and-installing-packages-globally
and PR #17489 that provided the ground work for using it).
This commit therefore updates our GitHub Actions workflows to no longer
install `gulp-cli` globally but instead install it locally from the
already existing entries in `package.json` like all other dependencies
we use. Not only does this remove the special-casing for `gulp-cli`
which simplifies the workflow definitions, it also ensures that the
version ranges provided in `package.json` are respected. This makes the
local and workflow setups more similar, but is also relevant for the
upcoming upgrade to Gulp 5 which from a quick try is a bit involved and
having `package.json` be the single source of truth for the dependency
versions we use is therefore important.
The code was added in order to guess if an editor has been moved but
it's simpler to just set a variable when it's dragged or moved with
the keyboard. This way we remove a bit of asynchronicity.
The PDF specification states that empty dash arrays, i.e. arrays with
zero elements, are in fact valid. In that case the dash array simply
corresponds to a solid, unbroken line. However, this case was erroneously
being flagged as invalid and therefore the annotation was not drawn
because its width was set to zero. This commit fixes the issue by
allowing dash arrays to have a length of zero.
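A sketch of the lenient validation (hypothetical helper name; the actual annotation code may differ):

```js
function isValidDashArray(dashArray) {
  if (!Array.isArray(dashArray)) {
    return false;
  }
  // An empty dash array is valid, per the PDF specification, and simply
  // corresponds to a solid, unbroken line.
  if (dashArray.length === 0) {
    return true;
  }
  // Otherwise all elements must be non-negative numbers, and at least one
  // of them must be positive.
  return (
    dashArray.every(d => Number.isFinite(d) && d >= 0) &&
    dashArray.some(d => d > 0)
  );
}
```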
This commit removes the final callbacks in this code by switching to a
promises-based interface, overall simplifying the code. Moreover, we
document why we write to files on disk and modernize the code using e.g.
template strings.
The original `test.py` code, see
c2376e5cea/test/test.py,
did not have any timeout logic for TTX, but it got introduced when
`test.py` was ported from Python to JavaScript as `test.js` in
c2376e5cea (diff-a561630bb56b82342bc66697aee2ad96efddcbc9d150665abd6fb7ecb7c0ab2f).
However, I don't think we've ever actually seen TTX timing out.
Moreover, back then we used a very old version of TTX and ran the font
tests on the bots (where a hanging process would block other jobs and
would require a manual action to fix), so this code was most likely
only included defensively.
Fortunately, nowadays it should not be necessary anymore because we use
the most recent version of TTX (which either returns the result or
errors out, but isn't known to hang on inputs) and we run the font tests
on GitHub Actions which doesn't block other jobs anymore and also
automatically times the job out for us in the unlikely event that a hang
would ever occur.
In short, we can safely remove this logic to simplify the code and to get
rid of a callback.
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers
The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
- In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
- In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
- In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
- In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
- For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
- For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
- In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.
---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
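In short, call-sites change roughly as follows (illustrative only):

```js
// Before: the custom helper class.
const capability = new PromiseCapability();
capability.resolve("data");
await capability.promise;

// After: the native equivalent.
const { promise, resolve, reject } = Promise.withResolvers();
resolve("data");
await promise;
```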
With recent improvements to the `AppOptions`, e.g. with better validation and testing, one remaining "annoyance" is the `compatibilityParams` handling. Especially since there's only *a single* parameter left, limited to GENERIC builds.
To further reduce the amount of unnecessary code in e.g. the Firefox PDF Viewer, we can move the `compatibilityParams` handling into the user-options instead since that keeps the previous precedence order between default/user-options.
This patch updates the minimum supported browsers as follows:
- Google Chrome 98, which was released on 2022-02-01; see https://chromereleases.googleblog.com/2022/02/stable-channel-update-for-desktop.html
The primary reason for this version bump is `structuredClone` support, please see https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility
At this point in time we're using `structuredClone` in different parts of the code-base, and it's unfortunately functionality that's difficult to polyfill *completely* (affects the `transfer` option specifically).
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
- Use slightly shorter variable names when initializing the preferences.
- Correctly copy the "old" preference-values before writing to storage, since Objects are passed by reference in JavaScript. (Only applies to the GENERIC viewer.)
- Use `await` fully when writing new preference-values to storage. (Only applies to the GENERIC viewer.)
- Stub the `get`-method in the Firefox PDF Viewer, since it's unused there nowadays.
The original bug was that the parent was null when trying to show
a highlight annotation, which led to an exception.
That led me to think about having some null/non-null parent when removing
an editor: it's a mess, especially if a destroyed parent is still attached
to an editor. Consequently, this patch always sets the parent to null when
deleting the editor.
This old method is essentially just adding, a small amount of, unnecessary indirection and we can easily invoke `PDFViewerApplication.open` directly from the extension-specific code instead.
The following are some highlights of this patch:
- In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.
- In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.
- Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seem to be necessary except in the viewer itself (if at all).[1]
---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
Given that the Fetch API is supported since Node.js 18 we should be able to use it when downloading l10n files, which allows us to simplify the code and to make it fully `async`.
The fact that the highlight-thickness can only be changed in "free" mode isn't really obvious visually in the toolbar, so attempt to provide at least some indication of the `disabled`-state by "dimming" the slider.
In implementing caret browsing mode in PDF.js, I didn't notice that selectstart isn't always triggered.
So this patch removes the use of selectstart and relies only on selectionchange.
In order to simplify the selection management, the selection code is moved into the AnnotationUIManager:
- it simplifies the code;
- it allows us to have only one listener for selectionchange, instead of one per visible page
for selectstart.
I had to add a delay in the integration tests for highlighting (there's a comment with an explanation);
it isn't really nice, but it's the only way I found, and in real life there is always a delay between
press and release.
The function caretPositionFromPoint returns the position within the last visible element,
and sometimes there are some elements on top of the ones in the text layer.
So the idea is to hide the visible elements which aren't in the text layer, in order
to get the right caret position.
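A sketch of the idea (hypothetical helper; the real implementation differs):

```js
function caretPositionInTextLayer(x, y, textLayer) {
  const hidden = [];
  try {
    let pos = document.caretPositionFromPoint(x, y);
    // While the hit node isn't part of the text layer, hide the covering
    // element and probe again, so the caret ends up in the text layer.
    while (pos && !textLayer.contains(pos.offsetNode)) {
      const elem =
        pos.offsetNode.nodeType === Node.ELEMENT_NODE
          ? pos.offsetNode
          : pos.offsetNode.parentElement;
      if (!elem) {
        break;
      }
      hidden.push([elem, elem.style.visibility]);
      elem.style.visibility = "hidden";
      pos = document.caretPositionFromPoint(x, y);
    }
    return pos;
  } finally {
    // Restore the visibility of everything we temporarily hid.
    for (const [elem, visibility] of hidden) {
      elem.style.visibility = visibility;
    }
  }
}
```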
*Please note:* This is a micro optimization, hence I fully understand if the patch is rejected.
Currently we create two temporary Arrays and have to iterate twice in total when building the final `hexNumbers` Array.
With this patch there's only one temporary Array and a single iteration required to build the final `hexNumbers` Array.
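Concretely, the change is along these lines (sketch):

```js
// Before: the spread creates a temporary Array, which map() then iterates
// again to build the final Array.
let hexNumbers = [...Array(256).keys()].map(n =>
  n.toString(16).padStart(2, "0")
);

// After: a single temporary Array and a single iteration, by passing the
// mapping function directly to Array.from.
hexNumbers = Array.from(Array(256).keys(), n =>
  n.toString(16).padStart(2, "0")
);
```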
When highlighting, the annotation editor layer is disabled in order to get pointer events
from the text layer, but the annotation layer must then be disabled as well, in
order to avoid bad interactions.
Previously we'd simply export this directly from `web/app_options.js`, which meant that it'd be technically possible to *accidentally* modify the `compatibilityParams` Object when accessing it.
To avoid this we instead introduce a new `AppOptions`-method that is used to lookup data in `compatibilityParams`, which means that we no longer need to export this Object.
Based on these changes, it's now possible to simplify some existing code in `AppOptions` by taking full advantage of the nullish coalescing (`??`) operator.
Given that the "PREFERENCE" kind is used e.g. to generate the preference-list for the Firefox PDF Viewer, those options need to be carefully validated.
With this patch we'll now check this unconditionally in development mode, during testing, and when creating the preferences in the gulpfile.
Over time, as we've started relying more and more on import maps, the number of aliases have increased a lot. This is now affecting the size and readability of `createWebpackConfig`, which was already fairly large and complex, hence moving the aliases to their own function should help improve things a little bit.
As part of the changes in PR 17686 we "accidentally" enabled source-maps for the *minified* builds, which seems unnecessary since those have never been included in the `pdfjs-dist` output.
Locally this patch reduces the run-time of `gulp minified` by ~15 percent.
Rather than first building the library and then use Terser "manually" to minify the files, we can utilize a Webpack plugin to combine these steps which helps to simplify the gulpfile.
The `handler` method contained this code in two inline functions,
triggered via callbacks, which made the `handler` method big and harder
to read. Moreover, this code relied on variables from the outer scope,
which made it harder to reason about because the inputs and outputs
weren't easily visible.
This commit fixes the problems by extracting the request checking code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings. The logic is now
self-contained in a single method that can be read from top to bottom
without callbacks and with comments annotating each check/section.
- Run the minification in "parallel" since that should be a *tiny* bit more efficient.
- Don't rename the minified files since that seems unnecessary, especially considering that they are only used in the `dist-pre` target where we currently change the name back manually.
After the changes in PR 17637 there's no longer any reason to invoke `tweakWebpackOutput` without an argument, since the `__non_webpack_import__` re-writing was moved into the Babel plugin.
This way we can avoid a (little) bit of unnecessary parsing during building.
The specs are unclear about what kind of xref table format must be used.
In checking the validity of some PDFs with the preflight tool from Acrobat,
we can guess that keeping the same format is the correct approach.
The PDF in the mentioned bug, after having been changed, wasn't displayed
correctly in either Chrome or Acrobat: it's now fixed.
Note how we're using custom `__non_webpack_import__`-calls in the code-base, that we replace during the post-processing stage of the build, to be able to write `import`-calls that Webpack will leave alone during parsing.
This work-around is necessary since we let Babel discard all comments, given that we generally don't need/want them in the builds, which is why we cannot utilize `/* webpackIgnore: true */`-comments in the source-code.
After the changes in PR 17563 it thus seems to me that we should be able to just move this re-writing into the Babel plugin instead.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the directory listing code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the range file serving code
into a dedicated private method, and modernizing it to use e.g. `const`/
`let` instead of `var` and using template strings.
The `handler` method contained this code in an inline function, which
made the `handler` method big and harder to read. Moreover, this code
relied on variables from the outer scope, which made it harder to reason
about because the inputs and outputs weren't easily visible.
This commit fixes the problems by extracting the file serving code into
a dedicated private method, and modernizing it to use e.g. `const`/`let`
instead of `var` and using template strings.
Currently the `web/app.js` file pulls in various build-specific dependencies, via the use of import maps, and those files in turn import from `web/app.js` thus creating undesirable import cycles.
To avoid this we instead pass in a `PDFViewerApplication`-reference, immediately after it's been created, to the relevant code.
Note that we use an ESLint plugin rule, see `import/no-cycle`, that is normally able to catch import cycles. However, in this case import maps are involved which is why this wasn't caught.
Looking at the *built* files you'll notice some lines containing nothing more than a semicolon. This is the result of (mostly top-level) `if`-statements, which include `PDFJSDev`-checks, that evaluate to `false` during Babel parsing.
This has always annoyed me a bit, and looking at the Babel plugin it seems that we can fix this simply by *removing* the relevant nodes.
This part of the (modern) preprocessor is now dead code, since we no longer use `require` statements anywhere in the main code-base.
Note that as part of the changes leading up to PDF.js version `4` we removed all[1] the remaining `require` statements, and we also have an ESLint rule to ensure that no new ones are accidentally added.
---
[1] With two small exceptions, in benchmarking-code and in the Webpack-example.
Given that we need to pass in a `PDFDataRangeTransport`-instance a number of the needed parameters can be obtained from it, rather than having to specify them manually.
This should be a *tiny* bit more efficient, since it avoids parsing substrings that we don't care about.
*Please note:* I cannot find an ESLint rule to enforce this automatically.
- Ensure that localization works in the GENERIC viewer, even if the necessary locale files cannot be loaded.
This was the behaviour prior to the introduction of Fluent, and it seems worthwhile to keep that (especially since we already bundle the en-US strings anyway).
- Let the `GenericL10n`-implementation use the *bundled* en-US strings directly when no language is provided.
- Remove the `NullL10n`-implementation, and simply fallback to `GenericL10n`, to reduce the maintenance burden of viewer-components localization.
- Indirectly, given the previous point, stop exporting `NullL10n` in the viewer-components since it's now removed.
Note that it was never really intended to be used directly and only existed as a fallback.
*Please note:* This doesn't affect the Firefox PDF Viewer, thanks to the use of import maps.
When a highlight is self-intersecting, the outline was drawn inside.
In order to remove it, we use an SVG mask to exclude the inner part of the shape
when drawing the outlines.
That leads to changing the outline from 1px,white-2px,blue-1px,white to
2px,white-2px,blue: the part of the stroke which is inside the shape
is removed because of the mask.
All of our static evaluation & dead-code elimination transforms need to
happen in post-order, transforming inner nodes first. This is so that
in complex nested cases all transforms see the simplified version of
their inner nodes.
For example:
async getNimbusExperimentData() {
if (!PDFJSDev.test("GECKOVIEW")) { return null; }
// other code
}
-> [evaluation of PDFJSDev.*]
async getNimbusExperimentData() {
if (!false) { return null; }
// other code
}
-> [!false -> true]
async getNimbusExperimentData() {
if (true) { return null; }
// other code
}
-> [if (true) -> replace with the if branch]
async getNimbusExperimentData() {
return null;
// other code
}
-> [early return -> remove dead code]
async getNimbusExperimentData() {
return null;
}
This was done correctly in all cases except for our `UnaryExpression`
transform, which was happening in pre-order.
Having this parameter among a list of DOM-elements seems slightly strange now, however this is very old code and the explanation for why it was done this way is, as is often the case, historical.
Hence we can simply move this into `AppOptions` instead, which seems more appropriate overall.
Given that only the GENERIC viewer supports opening more than one PDF document, we can simplify things a tiny bit by instead generating the necessary DOM-element in JavaScript.
This unit-test is now failing in up-to-date versions of Node.js respectively Chromium-browsers, since `CompressionStream` no longer produces consistent data across all environments/browsers.
However logging the compressed TypedArray produced by `writeStream`, with Firefox respectively Chrome, and then feeding *both* of those TypedArrays as input to `DecompressionStream` produced the same (correct) result in both browsers.
Hence the *exact* output of `CompressionStream` shouldn't matter, as long as we're able to successfully decompress it when the resulting PDF document is opened with the PDF.js library, and the unit-test is thus extended to check this.
Starting with Chrome 120.0.6099.109 (shipped with Puppeteer 21.8.0+) the
unit test fails in Chrome as well. The issue is tracked in #17399, but
for now we'll only run the unit test in Firefox so we can continue to
update Puppeteer while also still having a browser in which it runs,
until we figure out why the behavior of `CompressionStream` changed.
The `DefaultExternalServices` code, which is used to provide build-specific functionality, is very old. This results in a pattern where we first initialize `PDFViewerApplication.externalServices` and then *override* it for the different builds.
By converting `DefaultExternalServices` into a "regular" class, and leveraging import maps, we can directly initialize the correct instance depending on the build.
Given the simplicity of the `createPreferences` method, we can leverage import maps to directly initialize the correct `Preferences`-instance depending on the build.
Given the simplicity of the `createDownloadManager` method, we can leverage import maps to directly initialize the correct `DownloadManager`-instance depending on the build.
The latest mozilla-central update has test failures, because some CSS variables are not "properly" referenced; in particular:
- Give `--hcm-highlight-selected-filter` a default value, of `none`, similar to the previously existing HCM filter.
- Remove the `--mix-blend-mode` variable, since it's unused.
It isn't really a fix for the mentioned bug, but it slightly improves things.
By reducing the memory use, the time spent in the GC is reduced as well.
The algorithm to compute the bounding box is the same as before, but it has just
been rewritten to be more efficient.
This commit converts the pdfjsdev-loader transform into a Babel plugin,
to skip an AST->string->AST round-trip.
Before this commit, the webpack build process was:
1. Babel parses the code
2. Babel transforms the AST
3. Babel generates the code
4. Acorn parses the code
5. pdfjsdev-loader transforms the AST
6. @javascript-obfuscator/escodegen generates the code
7. Webpack parses the file
8. Webpack concatenates the files
After this commit, it is reduced to:
1. Babel parses the code
2. Babel transforms the AST
3. babel-plugin-pdfjs-preprocessor transforms the AST
4. Babel generates the code
5. Webpack parses the file
6. Webpack concatenates the files
This change improves the build time by ~25% (tested on MacBook Air M2):
- `gulp lib`: 3.4s to 2.6s
- `gulp dist`: 36s to 29s
- `gulp generic`: 5.5s to 4.0s
- `gulp mozcentral`: 4.7s to 3.2s
The new Babel plugin doesn't support the `saveComments` option of
pdfjsdev-loader, and it just always discards comments. Even though
pdfjsdev-loader supported multiple values for that option, it was
effectively ignored due to `acorn` dropping comments by default.
This is in preparation for the next commit, which will convert
preprocessor2.mjs to a Babel plugin. The purpose of this commit
is to help git track the rename regardless of the large amount
of changes.
In the Gulpfile only the exit codes of `test.mjs` child processes
erroneously aren't checked. This causes failures in `test.mjs` to be
logged but not propagated to the master process, which in turn causes
test runners such as GitHub Actions to succeed because they only
monitor the master process. This is easy to reproduce by throwing an
error at the top of `test.mjs` and running `gulp makeref` or `gulp
unittest`: the error is logged, but the task that spawned the child
process succeeds and the master process exits with exit code 0. This is
problematic because it can easily cause errors to go by unnoticed.
This commit fixes the issue by making sure that the `test.mjs`
invocations are handled in the same way as the other child processes
in the file, i.e., if the child process exits with a non-zero exit code
then the master process also exits with a non-zero exit code. After this
patch the error is still logged, but the task now also fails and the
master process exits with exit code 1 to properly signal failure.
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.
Please see https://eslint.org/docs/latest/rules/arrow-body-style
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.
For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilobyte (which isn't a lot but still can't hurt).
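For example:

```js
const items = [{ id: 2 }, { id: 1 }];

// Before: an explicit body and return statement.
const idsBefore = items.map(item => {
  return item.id;
});

// After: the shorter, equivalent expression body.
const idsAfter = items.map(item => item.id);
```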
The `if` statement is no longer necessary because the Node.js versions
that didn't provide `dns.setDefaultResultOrder` are no longer supported,
but looking into this a bit more it turns out that the entire workaround
is no longer necessary because the issue got fixed in Firefox 105 in bug
1769994. Indeed, Firefox now starts nicely with the workaround removed.
Reverts 60ed3cd297c4045b90f4114a74e5baa4ef1c5056.
In order to do that we must change the text layer opacity to 1, but
it has several implications:
- the selection color must have an alpha component;
- the background color of the span used for highlighted words
must have an alpha component as well, but now that the opacity is 1
we can use some backdrop-filters in HCM, making the highlighted
words more visible;
- fix a regression caused by #17196: the css variable --hcm-highlight-filter
has to live under the #viewer element, because in HCM it's overwritten
by JS at this level, hence link annotations, for example, didn't
have the right colors when hovered.
The free highlighting is enabled when the mouse pointer isn't on some text.
Then we draw a shape with smoothed borders corresponding to the movement of
the mouse.
Printing/saving and changing the thickness will come later.
and try to load the font family (guessed from the font name) before trying
the local substitution.
The local(...) command expects a real font name and not a predefined
substitution, which is why we try the font family first.
By always removing the "visibilitychange" listener in the `PDFViewer.#onePageRenderedOrForceFetch`-method we can (ever so slightly) reduce duplication in the code.
Ensure that users cannot provide incorrect values when trying to set the global worker-options.
This patch was prompted by occasionally seeing users manually loading the `pdf.worker.mjs`-file and then assigning it to the `workerSrc`-option, something that obviously doesn't make sense and will cause fake-workers to be used (with poor performance as a result).
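A sketch of the kind of validation involved (simplified; the actual implementation may differ):

```js
class GlobalWorkerOptions {
  static #src = "";

  static get workerSrc() {
    return this.#src;
  }

  static set workerSrc(val) {
    // Reject anything that isn't a string, e.g. an already loaded worker
    // module, since that cannot possibly be a valid worker URL.
    if (typeof val !== "string") {
      throw new Error("Invalid `workerSrc` type.");
    }
    this.#src = val;
  }
}
```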
When the text of an annotation is extracted using getTextContent, consecutive white spaces
are just replaced by one space. So this patch adds an option to make sure that white
spaces are preserved when the appearance is parsed.
For the case where there's no appearance, we can have a fast path to get the correct string
from the Content entry.
When an existing FreeText is edited, spaces (0x20) are replaced by non-breaking (0xa0) ones
to make all of them visible on screen.
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent;
consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally by adding lang="ja" in viewer.html
(or another language for Japanese users).
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/secondary_toolbar.js` file.
With modern JavaScript class features we can move the relevant event handling into private methods, and thus invoke it directly when resetting the toolbar UI-state.
*Please note:* This patch slightly reduces the size of the `web/toolbar.js` file.
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.
Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.
For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
- With the `master`-branch it takes >600 ms to render.
- With this patch that goes down to ~50 ms, which is one order of magnitude faster.
(Note that all other pages are, as expected, completely unaffected by these changes.)
This new main-thread copying is limited to "large" global images, since:
- Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
- With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
- This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
This commit changes the code to use a template string and to use `const`
instead of `var`. Combined with the previous commits this allows for
enabling the ESLint `no-var` rule for this file now.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks make the code harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- uses `Array.includes` for shorter response code checking code.
This test intermittently fails, likely because the auto-print is triggered fast enough
that we don't manage to catch it.
So this patch sets a listener very early, in order to be sure that
we'll be aware that a print has been triggered.
It seems this unit-test now fails consistently in "all" up-to-date Node.js versions. We should probably try and understand why, but for now just disable it to get passing CI tests.
There's obviously a few things wrong with the Annotations in the referenced PDF document, however parsing of an Annotation shouldn't just break if the /BS-entry isn't a dictionary.
When opening a pdf from the secondary toolbar, a second color picker is added.
So in order to avoid that, we just stop listening for annotationeditoruimanager
in the toolbar.
The doorhanger for highlighting has a basic color picker composed of 5 predefined colors
to set the default color to use.
These colors can be changed thanks to a preference for now, but it's something which could
be changed in the Firefox settings in the future.
Each highlight has, in its own toolbar, a color picker to just change its color.
The different color pickers are so similar (modulo a few differences in their styles) that
this patch introduces a new class ColorPicker, which provides a color picker component
that could be reused in future editors.
All in all, a large part of this patch is dedicated to the color picker itself and its style,
and the rest is almost just a matter of wiring up the component.
When a PDF has a FreeText without an appearance, we use a fake font in order to render it,
and that leads to creating a few new refs for the font.
But then, when saving, we create some new refs which start at the same number
as the previously created ones.
Consequently, when saving, we're using some wrong objects (like a font) to check if
we're able to render the newly added FreeText.
In order to fix this bug, we just remove the persistent refs (which are only used
when rendering/printing) during saving.
Having just tested PR 17337 locally I noticed that especially the `JpxImage`-test causes a "ridiculous" amount of warning messages to be printed, which doesn't seem helpful.
Given that only actual `Error`s should be relevant here, we can easily disable this logging during the tests.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- replaces the recursive function calls with a simple loop;
- uses `const`/`let` instead of `var`;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
The test helper code largely predates the introduction of modern
JavaScript features and should be refactored to improve readability.
In particular callbacks and recursive function calls make the code
harder to understand and maintain.
This commit:
- replaces the callback argument with returning a promise;
- uses `const` instead of `var`;
- uses arrow functions for shorter code;
- uses template strings for shorter string formatting code;
- improves the error messages to have more details.
This unfortunately broke in PR 17060, since I had completely forgotten about https://bugzilla.mozilla.org/show_bug.cgi?id=1632644#c5 when writing that patch.
The easiest solution, while slightly unfortunate, seems to be to add a couple of non-standard hash parameters specifically for the PDF attachment use-case in the Firefox PDF Viewer. (Note that we cannot use "nameddest" here, since we also need to support the stringified destination-Array case.)
It seems this unit-test started failing in Node.js version 20.10 as well. We should probably try and understand why, but for now just disable it to get passing CI tests.
Given that this event listener is only used to trigger rendering after the sidebar has been opened/closed, we can utilize the existing one in the `PDFSidebar` class for this purpose instead. That one is registered on the sidebar DOM-element, and is needed to remove a CSS-class indicating that the sidebar is moving.
It fixes a few errors in the CSS for HCM.
It now complies with the specs from UI/UX.
Only the foreground must change in HCM, and not the background, similarly to what
we had for the alt-text button before moving it.
After the two previous commits, which removed the remaining call-sites, this method is no longer used and can thus be removed.
As mentioned in the JSDocs for the now removed method, synchronous communication between the viewer and the platform code isn't really a good idea.
Once this patch has landed in mozilla-central some additional clean-up of the platform code will also be possible.
The return value is not, nor has it ever been, used for anything and we should thus be able to just send the message.
Note that the responses are already handled by the "message" event listener registered above.
This commit fixes the JSDoc comment for the `annotationEditorMode` setter.
The types tests fail on that now because the input value was changed from
a number to an object with various properties in recent patches, but the
JSDoc comment was not updated accordingly.
Moreover, the types tests also fail because TypeScript 5.3 assumes that
getters and setters have equal return and input value types, which is
arguably also what one would expect, but our `annotationEditorMode`
getter and setter deviate from that because the getter returns a number
while the setter accepts an object. Given that it seems more important
to document the setter entirely, including the meaning and types of its
properties, and the type of the getter can easily be inferred from this
comment and the other JSDoc comments that have `annotationEditorMode` in
it, we remove the getter type to make the types tests pass again.
- Extend the `fetchData` helper function to also support fetching of "blob" data.
- Use the `fetchData` helper function more in the code-base, when fetching non-PDF data. Given that the Fetch API isn't supported for all protocols, this should improve compatibility for the PDF.js library.
Currently the SVG images for the loading-icons exist in two versions, for the light- respectively dark-theme, which nowadays are the only "duplicated" icons left.
The reason for this is that these icons are being used in `input`-elements, where the regular `mask-image` approach used for all buttons doesn't work.
To address this we add containers for the `input`-elements, such that we have a "regular" DOM-element where we can use `mask-image`.
The goal is to be able to get these outlines to fill the shape corresponding
to a text selection in order to highlight some text contents.
The outlines will be used to show either selected or hovered highlights.
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]
- Expose the `fetchData` helper function in the API, such that the viewer is able to access it.
- Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.
---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
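A rough sketch of the extended helper (simplified; the real implementation also handles environments/protocols where the Fetch API isn't available):

```js
async function fetchData(url, type = "text") {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(response.statusText);
  }
  switch (type) {
    case "arraybuffer":
      return response.arrayBuffer();
    case "json":
      return response.json();
  }
  return response.text();
}
```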
This is consistent with the implementation used in the (now removed) webL10n-library, and by only using lowercase language-codes internally in the `L10n`-implementations we should avoid future issues e.g. when users manually set the `locale`-option (in the default viewer).
Some fields, somewhere under the Fields entry in the AcroForm, could have no name (in /T)
but have a parent which has a name yet isn't itself under Fields.
As a side-effect, this patch prevents infinite loops caused by potential cycles
under Fields.
This commit migrates the font tests away from the bots. Not only have the
font tests been broken on the Windows bot for some time, they also run on
Python 2 (end of life since January 2020) and `ttx` 3.19.0 (released in
November 2017). The latter is installed via a submodule, which requires
more complicated logic for finding and running `ttx`.
We solve the issues by implementing a modern workflow that installs the
most recent stable Python and `ttx` (`fonttools` package) versions. This
simplifies the `ttx` driver code as well because it can now assume `ttx`
is available on the path (just like we do for e.g. `node` invocations).
GitHub Actions takes care of creating a virtual environment with
`fonttools` in it so that the `ttx` entrypoint is available. Locally
the font tests can be run in a similar way by creating and sourcing a
virtual environment with `fonttools` in it before running the font
tests, and a README file is included with instructions for doing so.
This commit prepares for running the font tests on GitHub Actions where
we can't spin up headful browsers because there are no display
capabilities on the workers. This will also be useful for porting other
test targets to GitHub Actions at a later time, as well as running the
tests locally in headless mode.
This commit prepares for the introduction of extra options in later
commits by changing the function signatures of the `startBrowser(s)`
functions to take parameter objects instead of plain parameters. This
makes the call sites explicitly state which parameters they pass,
improving overall readability as well.
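Schematically (hypothetical parameters):

```js
// Before, the meaning of each argument was unclear at the call site:
//   startBrowser("firefox", false, "http://localhost:8888/test.html");

// After, a parameter object makes the call site self-documenting.
function startBrowser({ browserName, headless, startUrl }) {
  console.log(`Starting ${browserName} (headless: ${headless}) at ${startUrl}`);
}

startBrowser({
  browserName: "firefox",
  headless: false,
  startUrl: "http://localhost:8888/test.html",
});
```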
The current logic is more complicated than it needs to be because it's
passing a callback function to `startBrowsers` instead of a string.
This commit simplifies the logic by passing the base URL as a string to
`startBrowsers` and having it do further augmentation internally,
thereby removing all indirection of the function calls to `makeTestUrl`
and the inner function it returned.
After recent PRs the size and scope of the CI workflow is now reduced, and this patch tries to simplify things further. More specifically we can directly specify the gulp-tasks in the workflow, and thus clean-up the `gulpfile` a tiny bit.
Note that this will technically be slower, since the tests are now run in series (rather than in parallel), however `gulp externaltest` runs so quickly that it really won't matter in practice.
Hopefully this is enough to address the problem of initializing the Worker in Chromium-based browsers.
Locally I've tried to *force* use of `createCDNWrapper` in development mode, by commenting out the `isSameOrigin` checks, and worker-loading fails against `master` and works with this patch.
Currently the background-color of the `editorParamsToolbar`s doesn't match that of the arrow, which is especially noticeable in dark mode (see zoomed-in screen-shots below).
The simplest solution seems to be to just style the `editorParamsToolbar`s like the `secondaryToolbar`, to limit the amount of CSS changes required.
This should *hopefully* fix 17228, by tweaking the build scripts to give the GENERIC viewer something to await to avoid breaking third-party users of the standalone viewer components.
This button is *only* used in the GENERIC viewer, and will currently be visible either in the main or secondary toolbars (depending on the viewer width).
To simplify upcoming changes, and to avoid then having to complicate the relevant CSS rules unnecessarily, let's place the "Open file"-button permanently in the secondary toolbar instead.
(Note that the GENERIC viewer has also, for five years now, supported drag-and-drop in order to open local files.)
The `fieldObjects`-getter is implemented in the `PDFDocument` class, which means that the `this._localIdFactory`-property that we pass to `AnnotationFactory.create` doesn't actually exist.
The reason that this hasn't caused any bugs, that I'm aware of, is that all /Fields-entries need to be References to actually make sense.
The `fieldObjects`-getter itself is called, from `src/core/worker.js`, in a way that'll ensure that any `MissingDataException`s are handled. However the problem is that the actual data-lookups in `fieldObjects` and `#collectFieldObjects` are done inside of a Promise, which means that `MissingDataException`s won't be handled and parsing could thus break.
To address this we change all data-lookups to be asynchronous instead.
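A minimal sketch of the idea, using the PDF.js `XRef` conventions (`fetchIfRef` vs. `fetchIfRefAsync`); the actual code in `fieldObjects`/`#collectFieldObjects` is more involved:

```js
async function collectFieldObject(xref, fieldRef) {
  // Before: `xref.fetchIfRef(fieldRef)` could throw a MissingDataException
  // synchronously inside a Promise, where it wouldn't be handled.
  // After: the asynchronous variant lets such errors propagate (and be
  // handled) like any other async parsing error.
  return xref.fetchIfRefAsync(fieldRef);
}
```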
The `viewerCssTheme`-implementation has always been somewhat hacky, and now it's also *partially* broken ever since we've started using CSS nesting.
Trying to support nested media queries would thus require a lot more parsing of the CSS rules, which seems inefficient and thus generally undesirable.[1]
As discussed on Matrix, let's try to remove the `viewerCssTheme`-option and see if there's any (significant) fallout from this.
---
[1] If this option is brought back, it seems to me that it (in Firefox) should probably be set through the platform-code that handles theming.
Depending on the structure of the outline we could potentially need to expand a few levels, especially in long PDF documents, hence it cannot hurt to pause translation in that case as well.
With the changes in PR 17208, where browser-preferences are now handled as "regular" viewer-options, we can tweak the definition of `canvasMaxAreaInBytes` to slightly simplify things in the `PDFViewerApplication.open` method.
Hopefully this will allow us to catch bugs in new Node.js versions earlier, rather than having to wait for bug reports.
Given that `CompressionStream` is (currently) only potentially used when saving a *modified* PDF document, which is unlikely to be a common use-case in Node.js environments, let's just disable the affected unit-test for now.
Given that this branch is only necessary in development mode and *during* building, but is never actually used in the final viewer-bundles, we can utilize the pre-processor to ignore this code.
Currently we *synchronously* fetch a number of browser preferences/options, from the platform code, during the viewer and PDF document initialization paths, respectively.
This seems unnecessary, and we can re-factor the code to instead include the relevant data when fetching the regular viewer preferences.
The active LTS version is now based on Node.js version 20, hence let's update the relevant workflows to use that one instead; see https://en.wikipedia.org/wiki/Node.js#Releases
Given that we still support Node.js version 18, i.e. the maintenance LTS version, in the PDF.js library we'll keep testing both versions in GitHub Actions to prevent regressions.
I noticed the following warning in the GitHub Actions workflow logs:
`Configuration file not found: .github/linter_config.yml`
The configuration file is called `fluent_linter_config.yml` instead, so
this commit fixes the path so it points to the correct file.
Fixes 487816b.
The current stable version of Python is Python 3.12, see
https://www.python.org/downloads, so we should switch to that since
Python 3.10 is older and only receives security updates.
This commit tweaks the Fluent linter workflow to match our other
workflow files: we make sure the steps have a newline between them for
better readability, and we align the names and descriptions of the steps
with how they are written in those other workflow files.
There are environments that include *incomplete* polyfills for the `navigator`-object, which may thus cause the PDF.js library to break.
Despite that clearly not being our fault, it may still result in bug reports filed against the PDF.js project; see e.g. 15728.
Currently this even seems to affect *the latest* version of Node.js; see e.g. [here].
*Please note:* Thanks to the pre-processor none of these changes affect the Firefox PDF Viewer, however it does add "overhead" when working with and reviewing the affected code (which is why I'm not crazy about this).
*Please note:* While following the steps in the README still works with this patch, in the sense that the example runs and successfully renders a PDF document, I unfortunately cannot tell if it illustrates Webpack best practices.
- Remove the `errorWrapper`-element, since it simplifies the example and is consistent with the default viewer; see PR 15533.
- Simplify the l10n-handling, since the `NullL10n` should be able to translate everything e.g. without fallback values; see PR 17146.
Note that we must append the textLayer to the DOM *before* enabling the `highlighter` and `accessibilityManager`, to avoid breaking e.g. a pending search operation.
The least invasive solution, that I was able to come up with, is to introduce a new `TextLayerBuilder` callback-function for this purpose.
Currently the `WidgetAnnotationElement._getKeyModifier` method will always be falsy on Linux, which seems like a simple oversight. Looking at all the other `FeatureTest.platform` accesses we only handle the `isMac`-case specially, and it seems reasonable to do the same thing here.
The reason that this hasn't led to any bug reports is most likely that the `modifier`-property seems completely unused in the scripting-implementation.
Finally, with these changes we can (slightly) simplify the `FeatureTest.platform` implementation.
After PR 17177 the interface of `XfaLayerBuilder` is now inconsistent, since whether or not we directly append the xfaLayer to the DOM now depends on the rendering intent.
I forgot to include `web/l10n_utils.js` in PR 17161, which currently breaks `ConstL10n` since there's no longer a method called `setL10n`; sorry about that!
Most of the strings shouldn't contain special chars (<= 0x1F), so we can
have a fast path that just checks if the string contains at least one
such char.
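A minimal sketch of such a fast path (the slow path is simplified here, and the helper name is hypothetical):

```js
const SPECIAL_CHARS = /[\x00-\x1f]/;

function sanitize(str) {
  if (!SPECIAL_CHARS.test(str)) {
    return str; // Fast path: most strings end up here.
  }
  // Slow path: handle each special char (shown as simple removal).
  return str.replace(/[\x00-\x1f]/g, "");
}
```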
To prevent the *standalone* viewer-components from breaking, we need to ensure that the `NullL10n`-interface won't accidentally diverge from the actual `L10n`-implementations.
Looking at the `PDFThumbnailView.setPageLabel` method you'll see that we update e.g. the "aria-label" of the thumbnail-image for documents that contain (valid) pageLabels.
This isn't done in `PDFPageView`, which seems inconsistent, hence this patch.
This patch changes almost all viewer-components[1] to use "data-l10n-id"/"data-l10n-args" for localization, which means that in many cases we no longer need to pass around the `L10n`-instance any more.
One part of the code-base where the `L10n`-instance is still being used "directly" is the AnnotationEditors, however while it might be possible to convert (most of) that code as well that's not attempted in this patch.
---
[1] The one exception is the `PDFDocumentProperties` dialog, since the way it's currently implemented makes that less straightforward to fix without a lot of code changes.
*Please note:* In the Firefox PDF Viewer this findbar is only used for PDF documents placed in e.g. `<iframe>` elements.
By registering a `ResizeObserver` when the `PDFFindBar` is open we slightly unify and simplify how the findbar layout (row vs column) is handled.
This will be especially helpful with upcoming changes, where we'll make use of "data-l10n-id"/"data-l10n-args" to trigger translation in the viewer.
- The old translation engine handled language code casing slightly differently, hence we need to tweak the non-metric locale check in `PDFDocumentProperties` to account for that.
- Use only lowercase names for the pre-defined page names, to improve overall consistency.
*Please note:* These changes only affect the GENERIC build, since `NullL10n` is only a stub elsewhere (see PR 17135).
After the changes in PR 17115, which modernized and improved l10n-handling, the `NullL10n`-implementation is no longer a good fallback for the "proper" `L10n`-classes.
To improve this situation, especially for the *standalone* viewer-components, this patch makes the following changes:
- Let the `NullL10n`-implementation extend an actual `L10n`-class, which is constant and lazily initialized, to ensure that it works *exactly* like the "proper" ones.
- Automatically bundle the "en-US" l10n-strings in the build, via the pre-processor, such that we don't need to remember to manually update them.
- Ensure that the *standalone* viewer-components register their DOM-elements for translation, similar to the default viewer, since this will allow future code improvements by using "data-l10n-id"/"data-l10n-args" in most (if not all) parts of the viewer.
- Remove the `NullL10n` from the `AnnotationLayer`, to avoid affecting bundle size too much.
For third-party users that access the `AnnotationLayer`, as exposed in the main PDF.js library, they'll now need to *manually* register it for translation. (However, the *standalone* viewer-components still work given the point above.)
Use existing helper to calculate the Box
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Ensure that there are non-zero
Co-authored-by: Jonas Jenwald <jonas.jenwald@gmail.com>
Add a reference test for #17147
Given that there's now a bit more asynchronicity in the l10n-initialization in the Firefox PDF Viewer, after PR 17115, try to limit the impact of that by moving it to occur a tiny bit earlier in the default viewer initialization.
In Firefox debug builds, there is an assertion to check that we don't connect
a subelement of an already connected root. Thanks to this assertion, we can see
that the root has already been added to Fluent, hence we don't need to do it
a second time.
We no longer need to await the translation in order to update the
toolbar: it'll be done by Fluent, so we can safely remove the "localized"
event and avoid waiting for it.
*Please note:* This patch contains a couple of micro-optimizations, hence I understand if it's deemed unnecessary.
Move the `AppOptions` initialization into the `Preferences` constructor, since that allows us to remove a couple of function calls, a bit of asynchronicity and one loop that's currently happening in the early stages of the default viewer initialization.
Finally, move the `Preferences` initialization to occur a *tiny* bit earlier since that cannot hurt given that the entire viewer initialization depends on it being available.
Note that CSS-features such as e.g. `flex` didn't exist, or had poor cross-browser support, back when the JavaScript-based solution was initially implemented.
- For the generic viewer we use @fluent/dom and @fluent/bundle
- For the builtin pdf viewer in Firefox, we set a localization url
and then we rely on document.l10n which is a DOMLocalization object.
Currently we're unnecessarily converting data between strings and typed-arrays, when dealing with compressible data, in the `writeStream` function.
Note how we're *first* getting a string-representation of the stream, which involves converting the underlying typed-array into a string, only to immediately convert this back into a typed-array. This seems completely unnecessary, and is easy enough to avoid, and we'll now only do a *single* type-conversion in this function.
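Conceptually, with the `bytesToString`/`stringToBytes` helpers from `src/shared/util.js` (the surrounding `writeStream` code is simplified away):

```js
function getCompressibleBytes(stream) {
  // Before: typed array -> string -> typed array, i.e. two conversions.
  //   return stringToBytes(bytesToString(stream.getBytes()));
  // After: the compressible data stays a typed array throughout.
  return stream.getBytes();
}
```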
The current test fails intermittently only on Windows for unknown
reasons: the code is correct and on Linux it always passes. However, we
have already spent quite a lot of time on this test, so rather than
spending even more time on it I figured we should look at what behavior
the test is trying to check and find an alternative way to do it that
can't trigger this intermittent issue anymore.
This commit changes the test to use a term that only exists once in the
entire document so we cannot accidentally highlight another match
anymore. This doesn't change anything about the behavior that this test
aims to check: we still test searching in the XFA layer, we still test
that the original term is matched case-insensitively and we still test
that that match is actually highlighted. Note that the only objective of
the test is confirming that the search functionality covers the XFA
layer, so the exact phrase/match is not the interesting bit.
Given that we only use standard `import`/`export` statements now, after recent PRs, the "exports" global is unused.
Instead we add "__non_webpack_import__" to the `globals` to avoid having to sprinkle disable statements throughout the code.
Finally, the way that `globals` are defined has changed in ESLint and we should thus explicitly specify them as "readonly"; please find additional details at https://eslint.org/docs/latest/use/configure/language-options#specifying-globals
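For example, in an `.eslintrc`-style configuration this looks roughly like the following (only the relevant entry is shown):

```js
module.exports = {
  globals: {
    // Explicitly marked "readonly", per the current ESLint documentation.
    __non_webpack_import__: "readonly",
  },
};
```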
The previous change that set the timeout had an effect: we have seen
quite a few protocol timeouts now correctly being raised in the context
of the active test. However, we have also still seen a handful of cases
where this wasn't the case and the one second difference turned out to
be too low (likely because the operation was started slightly after one
second into the test run). We therefore tweak the value to be 75% of the
Jasmine timeout. This should be enough to catch operations that happen
later on in the test run, and if a single operation takes that long any
hope for success is already gone anyway.
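A sketch of the resulting Puppeteer configuration, assuming a 30 second Jasmine timeout (the actual plumbing in the test runner differs):

```js
import puppeteer from "puppeteer";

const JASMINE_TIMEOUT_MS = 30_000;

const browser = await puppeteer.launch({
  // Keep the CDP timeout below the Jasmine timeout, so that protocol
  // errors are raised in the context of the test that triggered them.
  protocolTimeout: Math.floor(JASMINE_TIMEOUT_MS * 0.75),
});
```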
It's not necessary because we have configured silent printing for
Firefox and Chrome in the browser arguments we pass in `test.mjs`. This
means that the print dialog is not even shown at all or disappears
automatically once printing is done, so the Escape key press serves no
purpose. Since it has been shown to time out, likely because the page
loses focus during printing, and because the page itself doesn't know
when the printing dialog is shown and we therefore can't possibly do the
key press at the right time anyway, this commit gets rid of it to
stabilize the test.
Those files only contain old debugging code that is not used/imported
anywhere anymore, which is generating code scanning alerts. Moreover,
they rely on globals/platform-specific code and don't import/export
logic properly.
Given the amount of work put into removing `require`-calls from the code-base, let's ensure that new ones aren't accidentally added in the future.
Note that we still have a couple of files where `require` is being used, in particular:
- The Node.js examples, however those will be updated to use `import` in PR 17081.
- The Webpack examples, and related support files, however I unfortunately don't know enough about Webpack to be able to update those. (Hopefully users of that code will help out here, once version `4` is released.)
- The `statcmp`-tool, since *some* of those `require`-calls cannot be converted to `import` without other code changes (and that file is only used during benchmarking).
Please find additional details at https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-commonjs.md
This *finally* allows us to mark the entire PDF.js library as a "module", which should thus conclude the (multi-year) effort to re-factor and improve how we import files/resources in the code-base.
This also means that the `gulp ci-test` target, which is what's run in GitHub Actions, now uses JavaScript modules since that's supported in modern Node.js versions.
For large/complex images it's possible that the image-data arrives in the API *after* the page has been scrolled out-of-view and thus been cleaned-up. In this case we obviously shouldn't cache such page-level data, since it'll first of all be unused and secondly can increase memory usage *a lot*.
Also, ensure that we *immediately* release any `ImageBitmap` data in this case to help reclaim memory faster.
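The eager release boils down to something like this (`cachedData` is a hypothetical stand-in for the page-level cache entry):

```js
function releaseImageData(cachedData) {
  if (cachedData.bitmap instanceof ImageBitmap) {
    cachedData.bitmap.close(); // Frees the backing memory right away.
  }
  cachedData.bitmap = null;
}
```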
The examples themselves were updated to account for JavaScript modules, which didn't require changing the actual URLs.
However, since it seems that JSFiddle doesn't support JavaScript modules in its separate "JavaScript" editing-area we need to change how we embed the examples to avoid showing a blank "JavaScript"-tab.
When pdfBug is true, the substitution font is used in the text layer, in
order to be able to tell via the devtools which font is really used.
And to be sure that fonts are loaded, the font cache isn't cleaned up
while the debugger is active.
The default protocol timeout is 180 seconds according to the
documentation at https://pptr.dev/api/puppeteer.browserconnectoptions,
but the Jasmine timeout we configure in the individual boot files is 30
seconds. The consequence of this is that if a protocol (CDP) error
occurs after 30 seconds Jasmine will fail the test, but the actual
protocol error from Puppeteer is raised much later in the context of
another test, which causes unrelated failures or tracebacks.
This commit fixes the problem by configuring Puppeteer to always use a
lower protocol timeout than the Jasmine timeout so that protocol errors
are always raised in the context of the test that actually triggered it.
It's been loaded as a JavaScript module for a long time, and given that the file is bundled as-is (without building) it seems reasonable to just change the file extension now.
The Windows bot is usually slower than the Linux bot, and therefore
text layer rendering is as well. However, the `autoprint` test awaited
text layer rendering to complete before activating the selector check,
which makes it timing-sensitive and causes it to never resolve because
the page is already printed (and the printed page div removed) by then.
This commit should fix the issue by activating the selector check as
soon as possible, namely as soon as the viewer appears, which should
ensure we're always registering the selector check in time because we're
doing it even before rendering is starting.
Comparing the currently supported browsers/environments, see [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) and the [MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility), the `structuredClone` polyfill is *only* needed in Google Chrome versions < 98. Because of some limitations in the core-js polyfill we're currently forced to special-case the `transfer` handling to prevent bugs, and it'd be nice to avoid that.
Note that `structuredClone`, with transfers, is only used in two spots:
- The `LoopbackPort` class, which is only used with fake workers. Given that fake workers should *never* be used in browsers, breaking that edge-case in older Google Chrome versions seems fine.
- The `AnnotationStorage` class, when Stamp-annotations have been added to the document. Given that Google Chrome isn't the main focus of development, breaking *part* of the editing-functionality in older Google Chrome versions should hopefully be acceptable.
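For reference, the standard API usage in question; with `transfer` the buffer is moved rather than copied, which the polyfill cannot fully emulate:

```js
const buffer = new ArrayBuffer(1024);
const clone = structuredClone({ data: buffer }, { transfer: [buffer] });
// The original buffer is detached after the transfer:
console.log(buffer.byteLength); // 0
console.log(clone.data.byteLength); // 1024
```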
To avoid problems with `export` statements in the QuickJS Javascript Engine, we can work around that by *explicitly* exposing `pdfjsScripting` globally instead.
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
The user should *always* provide a correct `GlobalWorkerOptions.workerSrc` value when using the PDF.js library in browser environments. Note that the fallback:
- Has been deprecated ever since PR 11418, first released in version `2.4.456` over three years ago.
- Was always a best-effort solution, with no guarantees that it'd actually work correctly.
- With upcoming changes, w.r.t. outputting JavaScript modules, it'd now be more difficult to determine the correct value.
The minified default viewer has never been distributed in either official releases or through pdfjs-dist, which means that it's most likely unused, and it has never been tested nor actively maintained.
Setting the alpha-value explicitly to `1` in `rgb` colors is unnecessary, since that's the default value, and this way we ever so slightly reduce the size of our CSS files.
Unfortunately I've not found a Stylelint rule to enforce this automatically, and the patch was generated using search and replace.
When an editing button is disabled and focused, and the user presses Enter (or
Space), an editor is automatically added at the center of the current page.
Subsequent editors can be created using the same keys within the focused page.
When an element has the hasOwnCanvas flag we must have an HTML container to attach
the canvas where the element will be rendered.
So the noHTML flag must take this information into account:
- in some cases the noHTML flag is reset depending on the hasOwnCanvas value;
- in some others, the hasOwnCanvas flag is set depending on the value of noHTML.
To reduce the risk of regressing something else, given that the issue only applies to a (for the default viewer) non-default configuration, this patch is purposely limited to only TextWidget-annotations in the display layer.
It happens only on Windows with Chrome.
For some reason the click event isn't correctly triggered, but it seems to
work correctly with pointerup.
And it seems that when drawing an SVG on an OffscreenCanvas we need to wait
a little in order to be able to transfer it, which is why this patch adds
a check on the canvas content.
This has been deprecated since version `2.15.349`, which is a year ago.
Removing this will also simplify some upcoming changes, specifically outputting of JavaScript modules in the builds.
This removes the only remaining old and non-standard handling of exports in the `web/`-folder, since some initial attempts at outputting JavaScript modules in the builds have identified this file as a potential problem.
While this uses a hard-coded list, for overall simplicity, I don't believe that that's a big problem since:
- Generating this file automatically would require a bunch more parsing *every single time* that the library is built.
- The official API-surface doesn't change often enough for this to really impede development in any significant way.
- The added unit-test helps ensure that this list cannot accidentally become outdated.
When the editor is invisible (because it's on a non-rendered page) its parent is null.
But when we undo its deletion, we need a parent to attach it to.
Given that this is accessed multiple times per page in the viewer, that leads to a number of (strictly speaking unneeded) function calls and allocated Objects for each invocation. By converting `layerProperties` to a lazily initialized Object we can avoid this.
When PR 17015 removed the `disabled` handling for the "Save"-button it left a bunch of now unused CSS rules behind, which seems like a simple oversight.
Rather than shipping "dead" CSS rules, let's remove those until such a time that they're actually needed.
The old pre-processor used for CSS, and HTML, files leaves comments intact which unnecessarily contributes to the overall size of the *built* CSS files (note that the built JavaScript files don't include comments).
Rather than trying to "hack" comment removal into the pre-processor it seems easier to use a PostCSS plugin instead. The one potential issue is that it also affects *some* whitespaces, and it's not clear to me if this'll work with the various CSS-related tests that run in mozilla-central.
Please refer to https://www.npmjs.com/package/postcss-discard-comments for additional information.
*For many non-English locales the translated strings will be longer, which is easy to forget about during development/review.*
Note how for some locales (e.g. Swedish) the altText-button ends up looking horizontally "cramped", hence it seems reasonable to add a bit of inline padding to improve this.
In the scripting integration tests we use a few different typing
delays, mostly 100 or 200 milliseconds. According to for example
https://www.typingpal.com/en/documentation/school-edition/pedagogical-resources/typing-speed,
a fast typing speed is around 300 characters per minute, which is 5
characters per second and therefore a delay of 200 milliseconds between
each keystroke. Note that this is already above average, so in practice
the delay will be even larger. Therefore the 100 milliseconds variant
is unrealistically fast and therefore not suitable for the integration
tests which aim to simulate the average user behavior.
On top of that, the quick typing speeds are problematic for the tests
that involve validation alert dialogs appearing during typing. In those
tests a handler is registered to close the dialog once it pops up, but
it takes time for Puppeteer to notice the dialog, trigger the handler
and close it. If the typing delay, which is the delay between the key
down and key up events according to the Puppeteer source code at
https://github.com/puppeteer/puppeteer/blob/master/packages/puppeteer-core/src/cdp/Input.ts#L209-L215,
is too short, the key up event will be fired before the dialog is
closed. In that time the text box we're typing in is not focused, so
when the dialog is closed the `page.type()` call on the text box will
never resolve because the key up event never reached the text box.
This commit aims to fix the issues by converting all 100 millisecond
delays to 200 milliseconds. For instance, the "must check input for US
zip format" test failed pretty consistently locally before, and hasn't
failed anymore with a 200 millisecond delay.
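In Puppeteer terms this is just the `delay` option of `page.type()` (the selector and text here are made up):

```js
// 200 ms between keystrokes approximates a fast typist
// (~300 characters per minute).
await page.type("#zipField", "12345", { delay: 200 });
```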
This integration test fails often because we wait for scripting to be
ready before we check the printed page, but most of the time the PDF
is already done printing before scripting is reported to be ready.
This happens because the print trigger is on the `Open` event, which is
one of the first events to be dispatched and, most notably, before
scripting is marked as ready; please see
https://github.com/mozilla/pdf.js/blob/master/web/pdf_scripting_manager.js#L176-L191.
Given that the PDF document is only one page, printing it is usually
finished between triggering the `Open` event and scripting reported
to be ready. If this happens the printed page is already destroyed
before we get to our actual test, which will then timeout because it
will never find the printed page in the DOM.
This commit fixes the problem by not awaiting scripting to be ready
because the fact that the printed page appears is already enough to know
that autoprint was triggered (after all, there is no other user
interaction involved here). While we're here we also switch to the
shorter `page.waitForSelector` function.
but keep it for the text area.
Disable pointerdown on the alt-text button to prevent dragging the editor
when the button is clicked (especially when the mouse moves slightly
between the down and the up events).
The dialog element handles closing with <kbd>Esc</kbd> automatically, however we're not reporting telemetry in that case.
In order to fix that, the easiest solution, as far as I'm concerned, seems to be moving the telemetry reporting into the dialog-close handler since it's always invoked.
This patch addresses an edge-case that'll probably never happen, but it nonetheless seems like something that we want to fix.
Note how we're using the `#currentEditor`-field to prevent opening the dialog when it's already active, and it being reset once the dialog has been closed.
By also resetting the `#currentEditor`-field during destruction, instead of waiting until the dialog has actually closed (assuming it's currently open), there's a *tiny* window of time[1] during which we could theoretically allow to (incorrectly) re-open the dialog and thus getting out-of-sync state in the viewer-component.
---
[1] Since the "close" event, on a dialog-element, is dispatched asynchronously by the browser.
When the user edits an existing alt-text and removes it, we want to be able
to save this state and consequently remove the done state from the
alt-text button.
Remove the button from its parent when the editor is removed: it should
help to save a few KB of memory.
Radio-buttons can also be toggled by clicking on their associated `label`-elements, and not only the `input`-elements themselves, however it seems that "pointerdown" event listeners don't cover that case.
Hence it's possible that telemetry could miss certain cases of a mouse being used, and the easiest solution seems to be to instead use "click" event listeners and just ignore keyboard-based events.
Rather than trying to be "clever" here, and possibly affect code readability negatively, let's just restore the `collectFields` parameter to address the unneeded parsing that now happens when printing new Annotations.
Given the limitations of the old pre-processor that's used for CSS/HTML files, this unfortunately isn't as "easy" to implement as it is for JavaScript code.
Since this is the first case where we've wanted to do conditional CSS imports, rather than trying to completely re-write the pre-processor, this patch settles for handling it explicitly in the `expandCssImports` function.
Looking at the save-telemetry values they're all boolean *except* for "alt_text_edit" in one instance, since `this.#previousAltText` may be an empty string (looking at the `editAltText` method) and this value may thus become an empty string as well.
When closing a document in the viewer, e.g. by running `PDFViewerApplication.close()` in the console, the `AltTextManager.#finish` method currently throws *unless* the `altText` dialog is actually open.
Similar to e.g. the PasswordPrompt, we should thus only attempt to close the `altText` dialog when it's open.
In the rare situation that an optional content dictionary lacks a /Type-entry we currently throw, which may prevent e.g. Form XObjects from rendering completely.
Fixes https://bugs.ghostscript.com/show_bug.cgi?id=707147
It's a part of the UX specifications. There's a drawing issue in Firefox
(see bug https://bugzilla.mozilla.org/1853288) but setting the
background-clip property to content-box seems to be a good workaround.
Especially on slower bots there is some time between clicking the
element and the actual visibility change, but we didn't await this and
checked the visibility state immediately after clicking. This can be
reproduced 100% of the time by introducing a delay in the `display` and
`hidden` handlers of the `_commonActions` shadow call.
This commit fixes the problem by waiting until the first visibility
change actually happened before continuing with the assertions.
This integration test currently fails intermittently on the bots because
of the fixed timeout in the test, which is sometimes too low on slower
systems. The issue can be reproduced 100% of the time by introducing a
delay just before dispatching the `switchannotationeditormode` event.
Puppeteer also discourages this and recommends waiting for a selector
instead, which we now do here. This ensures that the test only
continues if the element under test is available and therefore prevents
any timing problems.
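The pattern, sketched with a hypothetical selector:

```js
// Before: a fixed timeout, which may be too short on slower systems.
//   await new Promise(resolve => setTimeout(resolve, 2000));
// After: wait until the element under test actually exists/is shown.
await page.waitForSelector(".annotationEditorLayer", { visible: true });
```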
The x/y-coordinates are floats instead of integers like one might
expect. The current approach rounds both the old and the new
coordinates in order to do integer comparison. However, rounding each
coordinate individually causes too much loss of precision because,
depending on the decimal value, they are either rounded up or down
which causes intermittent off-by-one errors.
This commit fixes the problem by comparing coordinate differences
instead of the coordinates themselves. The precision loss is avoided
by subtracting the old from the new coordinate as-is and only rounding
the final result.
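A concrete illustration of the precision loss; both lines below describe the same 0.8 unit movement:

```js
const oldX = 10.6;
const newX = 11.4;
console.log(Math.round(newX) - Math.round(oldX)); // 11 - 11 = 0, off by one
console.log(Math.round(newX - oldX)); // Math.round(0.8) = 1, correct
```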
This integration test currently fails intermittently on the bots because
of the fixed timeout in the test, which is sometimes too low on slower
systems. The issue can be reproduced 100% of the time by introducing a
delay in the `WidgetAnnotationElement.showElementAndHideCanvas` method.
Puppeteer also discourages this and recommends waiting for a selector
instead, which we now do here. This ensures that the test only
continues if the element under test is available and therefore prevents
any timing problems.
We already use `page.$eval` in most other integration tests and it's
simpler because it already takes the selector as argument, so we don't
have to do a separate `querySelector` call ourselves.
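For example (hypothetical selector), the two forms compare as follows:

```js
// Before: a manual querySelector call inside page.evaluate().
const before = await page.evaluate(
  () => document.querySelector("#result").textContent
);
// After: page.$eval takes the selector as its first argument.
const after = await page.$eval("#result", el => el.textContent);
```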
When there is no tree, the tags for the new annotations are just put under the root element.
When there is a tree, we insert the new tags at the right place using the value
of structTreeParentId (added in PR #16916).
Now that modern JavaScript is fully supported also in the worker-thread we no longer need to keep old closures, which slightly reduces the size of the code.
Given that this is a shadowed getter, the `opMap` is already lazily initialized and it shouldn't be necessary to *also* use the `getLookupTableFactory` helper function here. Looking at the history of the code, it seems that this is simply a leftover from before JavaScript classes existed.
While this cache will not contain a huge amount of data in practice, it's nonetheless a *global* cache that currently will never be cleared.
This patch also removes the existing closure, since it shouldn't really be necessary nowadays given that the code is a JavaScript module which means that only explicitly listed properties will be exported.
When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet).
Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the *entire* PDF document has been loaded *before* we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded.
Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.
The unit test is re-enabled because it no longer seems to fail after 10
runs on Linux, where this used to fail often. Code inspection also shows
that the code is correct and shouldn't raise the previous exception
(anymore). Finally, a lot has changed since this test was disabled, such
as new Jasmine versions, a new Linux bot OS version and new browser
versions.
While it makes sense to check that the `destDict` parameter is indeed a Dictionary, since that data comes from the PDF document itself, the `resultObj` parameter is an internal PDF.js implementation detail that should always be correct (or tests will fail).
Over time the amount of "document level" data potentially needed during parsing of Annotations has increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.
An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.
One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
The classes were stripped out when creating the field name, but it
led to a wrong name.
Since class components in a path are irrelevant, they're just ignored
when searching for a node in the datasets.
The focus callback must be called only when the element has been blurred.
For example, the blur callback (which implies some potential validation) is not
called when the newly focused element is another tab, an alert dialog, etc.;
consequently the focus callback mustn't be called when the element gets its focus back.
While reviewing PR 16898 it occurred to me that it's currently impossible to trigger downloading of FileAttachment annotations using the keyboard.
Hence this patch adds `Ctrl + Enter` as the keyboard shortcut to download those, thus supplementing the existing double-clicking when using a mouse.
The goal is to always have something which is focusable to let the user select
it with the keyboard.
It fixes the mentioned bug because the annotation layer will now have a container
to attach the canvas to, for annotations having their own canvas.
`.grab-to-pan-grab:active` is `#viewerContainer` when the mouse is
pressed down. It is supposed to have a `cursor: grabbing` appearance
immediately on mousedown.
`.grab-to-pan-grabbing` is the overlay that is supposed to cover
everything, and also has the `cursor: grabbing` appearance. The "cover
everything" result is achieved through `position:fixed`, `inset:0`, etc.
The block with these CSS properties for "cover everything" is currently
shared by `.grab-to-pan-grab:active` and `.grab-to-pan-grabbing`, but
only "cursor" needs to be shared. The original JS and CSS code at
https://github.com/Rob--W/grab-to-pan.js shows that these were supposed
to be associated with the overlay only.
The PR that added this to PDF.js also shows that the "cover everything"
CSS properties were supposed to be limited to the overlay only:
https://github.com/mozilla/pdf.js/pull/4209#discussion-diff-9285917
But the final version of the PR mistakenly merged them together.
This patch rectifies that mistake.
The typescript compiler is now configured to know about the import map
to be able to resolve those imports and find the associated types.
As tsc outputs declaration files using the original module identifiers
and not the resolved ones, tsc-alias is used to post-process the
declaration files by resolving those paths.
Some configuration settings like `paths` cannot be provided through CLI
arguments but only in a configuration file. And when using a
configuration file, only a few options (like `--outDir`) can still be
provided through the CLI.
Using `removeNullCharacters` on the URL should be completely redundant, given the kind of data that we're passing to the `addLinkAttributes` helper function. Note that whenever we're handling a URL, originating in the worker-thread, in the viewer that helper function is always being used.
Furthermore, on the worker-thread all URLs are parsed with the `createValidAbsoluteUrl` helper function, which uses `new URL()` to ensure that a valid URL is obtained. Note that the `URL` constructor will either throw, or in some cases just ignore them, when encountering `\u0000`-characters during parsing.
Hence it should be *impossible* for a valid URL to contain `\u0000`-characters and we can thus simplify the viewer-code a tiny bit. The use of `removeNullCharacters` is most likely a left-over from back when `new URL()` wasn't generally available in browsers.
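A simplified sketch of the worker-thread validation, in the spirit of the `createValidAbsoluteUrl` helper (the real helper supports additional options):

```js
function createValidAbsoluteUrl(url, baseUrl) {
  try {
    // The URL parser either throws on, or percent-encodes, \u0000
    // characters, so a successfully constructed URL never contains
    // raw null characters.
    return new URL(url, baseUrl);
  } catch {
    return null; // Invalid URL.
  }
}
```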
Testing the `tagged_stamp.pdf` document locally in the viewer, I noticed that e.g. the /Alt entry for the StampAnnotation contains "Secondary text for stamp\u0000".
Elsewhere in the viewer we're skipping null-chars and it's easy enough to do that in the `StructTreeLayerBuilder` class as well. (Note that we generally let the API itself return the data as-is.)
This fixes invalid type references (either due to invalid paths for the
import or missing imports) in the JS doc, as well as some missing or
invalid parameter names for @param annotations.
The issue described in the mentioned bug is really because
Acrobat is rendering the XFA instead of the Acroform.
The original patch just tried to work around the issue, but it
induces some regressions.
This was added in PR 14899, over a year ago, however it's still completely unused in the PDF.js library/viewer. In hindsight I think that it was a mistake to add unused functionality, and the issue should probably have been WONTFIXed instead, however we probably can't just remove it now.
Thanks to the pre-processor, we can at least exclude this code in the *built-in* Firefox PDF Viewer.
After the changes in PR 16828 the `StampEditor` can now be initialized with a File, in addition to a URL, hence it seems that the `isEmpty` method ought to take that property into account as well.
Looking at this I also noticed that the assignment in the constructor may cause the `this.#bitmapUrl`/`this.#bitmapFile` fields to be `undefined`, which "breaks" the comparisons in the `isEmpty` method.
We could obviously fix those specific cases, but it seemed overall safer (with future changes) to just update the `isEmpty` method to be less sensitive to exactly how these fields are initialized and reset.
Currently we're repeating virtually the same code *four times* when fetching the bitmap-data, which seems unnecessary.
Also, ensure that the `#bitmapPromise` is always `null`ed by moving that into the `StampEditor.#getBitmapDone` method.
Currently this unit-test will pass just fine if compression is disabled, e.g. by commenting out the relevant code in the `src/core/writer.js` file.
While we don't have a simple way of *directly* checking that the Annotation text-content is compressed, we can however use the resulting file-size as a fairly good proxy. (Note that if compression is disabled the file-size is more than doubled.)
Please note that for performance reasons it's not really advised to use the same worker-thread *in parallel* for parsing multiple PDF documents, since they will then unnecessarily compete for resources.
However, given that it's still possible to do that e.g. when using the global `workerPort` it probably won't hurt to add a unit-test for this particular situation.
Given that the `PDFDocumentLoadingTask.destroy()`-method is documented as being asynchronous, you thus need to await its completion before attempting to load a new PDF document when using the global `workerPort`.
If you don't await destruction as intended then a new `getDocument`-call can remain pending indefinitely, without any kind of indication of the problem, as shown in the issue.
In order to improve the current situation, without unnecessarily complicating the API-implementation, we'll now throw during the `getDocument`-call if the global `workerPort` is in the process of being destroyed.
This part of the code-base has apparently never been covered by any tests, hence the patch adds unit-tests for both the *correct* usage (awaiting destruction) as well as the specific case outlined in the issue.
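A usage sketch of the intended pattern with the global `workerPort` (public PDF.js API; `url1`/`url2` are placeholders and error handling is elided):

```js
import * as pdfjsLib from "pdfjs-dist";

const task1 = pdfjsLib.getDocument(url1);
const doc1 = await task1.promise;
// `destroy()` is asynchronous and must be awaited before loading the
// next document, otherwise the following `getDocument`-call now throws
// (instead of, as before, remaining pending indefinitely).
await task1.destroy();

const task2 = pdfjsLib.getDocument(url2);
const doc2 = await task2.promise;
```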
- The `src/core/unicode.js` exclude ought to have become unnecessary already with PR 16200, which significantly shortened and simplified that file.
- The `src/core/glyphlist.js` exclude no longer seems necessary in practice either, possibly because of improvements in Babel.
The main stamp button will be used to just enter an add/edit image mode:
- the user can add a new image using the new button;
- the user can edit an image by resizing or moving it.
In image mode, when the user clicks on the page but not on an editor,
all the selected editors will be unselected.
After the `src/core/`-changes in PR 16779 the `PDFDocumentProxy.getJSActions` method should no longer be able to return *empty* entries, which means that we can simplify the "JavaScript support is not enabled"-warning in the viewer.
Furthermore, improve the auto-printing hack used when scripting is disabled.
Given that the FieldObjects are parsed in parallel, in combination with the existing caching in the `getPage`-method and `annotations`-getter, adding additional caches for this fallback code-path doesn't seem entirely necessary.
We're adding the action in the undo/redo stack whatever the status of the
operation was. This patch aims to add the action only when the image has been
successfully added.
When several editors are selected and the window loses and then gets back its focus,
the previously focused editor triggers its focus callback, making it the only
selected one.
This patch aims to avoid triggering the focus callback when the main window
gets its focus back.
When moving an element in the DOM, the focus is potentially lost, so we need to make sure
that the element that was focused before the move gets its focus back afterwards.
But we must take care not to execute any focus/blur callbacks, because the user didn't
do anything that should trigger such events: it's an implementation detail. For example,
when several editors are selected and moved, at the end the same ones must be selected,
so no element should receive a focus event that would set it as the only selected one.
There are 2 rotations we have to deal with: the viewer one and the editor one.
The previous implementation was a bit complex, and having to deal with these
rotations would have potentially increased that complexity.
So this patch aims to simplify the implementation and deal with all the possible
cases.
The main idea is to transform the mouse deltas according to the rotations and then
apply the resizing in the page coordinate system.
When resizing an editor we're currently using unidirectional cursors, please refer to https://developer.mozilla.org/en-US/docs/Web/CSS/cursor
Given that editors can (generally) be resized to become either smaller or larger, it seems overall more appropriate to use bidirectional cursors to make this clearer to the user.
Note that, as mentioned in the MDN article, some environments (which seems to apply to e.g. Windows 11) don't differentiate between the two cursor formats and simply use bidirectional ones unconditionally.
One additional benefit of these changes is that the relevant CSS rules become slightly more compact.
We obviously don't want to re-introduce any `require` usage in e.g. the viewer, since we should strive to only use native `import` statements wherever possible.[1]
Hopefully exposing e.g. the library globally in more cases won't break anything, however it's somewhat difficult for me to imagine all the ways in which third-party users may be accessing the PDF.js library. (Given the lack of a runnable test-case in the issue, I also cannot guarantee that this is enough to fully address the problem.)
---
[1] Ideally we should probably not rely on e.g. `pdfjsLib` being globally available in the *built* viewer, and rather always `import` the library instead.
Unfortunately this would require larger (possibly breaking) changes in the builds that we provide, however note that Firefox only recently got support for `import` in workers and that Webpack still only has *experimental* support for outputting "proper" modules.
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.
Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
Without this patch the password dialog is pretty difficult to use in the GeckoView-viewer, because of a number of missing CSS variables.
*Please note:* This patch makes no effort at actually styling the dialog to better suit the overall look of the GeckoView-viewer, but focuses solely on making it actually usable (since password protected PDF documents are somewhat rare).
If the current PDF document is closed while the password dialog is open, e.g. manually by calling `PDFViewerApplication.close()` from the console, the password dialog wouldn't be closed as intended.
*Please note:* This could only affect the GENERIC viewer, although it's very unlikely to ever happen, since that's the only one that supports opening more than one PDF document.
*Please note:* This situation should never happen in practice, but it nonetheless cannot hurt to fix this.
If the `PasswordPrompt.open` method would ever be called synchronously back-to-back *and* if opening of the dialog fails the first time, then the second invocation would remain pending indefinitely since we just clear out the capability.
Given that the `useOnlyCssZoom` option is essentially just a special-case of the `maxCanvasPixels` functionality, we can combine the two options in order to simplify the overall implementation.
Note that the `useOnlyCssZoom` functionality was only ever used, by default, in the PDF Viewer for the B2G/FirefoxOS project (which was abandoned years ago).
When searching for "endobj"-operators, make sure that we don't accidentally match a "trailer"-string in /Content-streams without /Filter-entries (i.e. streams that contain "raw" and thus human-readable data).
Currently we accidentally accept `cMapUrl` and `standardFontDataUrl` parameters that are empty strings or `null`, since e.g. `new URL(null, document.baseURI)` doesn't throw, when validating the `useWorkerFetch` parameter via the `isValidFetchUrl` helper function.
Please note that we are currently failing gracefully in this case, as intended, however the warning-messages printed in the console are perhaps less helpful without this patch.
When an editor is selected using the keyboard, it has the focus.
But if the editor is then unselected with the Escape key, the focus must
be removed, otherwise we still have a blue outline around it.
And add a few missing timeouts in the integration tests.
This is quite old code, however the error-handling no longer seems necessary for a couple of reasons:
- The `PDFViewerApplication.open` method is asynchronous, which means that it cannot throw a "raw" `Error` and the try-catch is not needed in that case.
- None of the other affected methods should throw, and if they do that'd rather indicate an *implementation* error in the code.
- Finally, and most importantly, with the `PDFViewerApplication.run` method now being asynchronous an (unlikely) `Error` thrown within it will lead to a rejected `Promise` and not affect execution of other code.
We can use modern JavaScript features, in this case optional chaining, to (ever so slightly) simplify how `ViewHistory` errors are handled.
Also, use arrow functions when handling a few other (very rare) errors during loading since that's a tiny bit shorter.
The way that the callback-methods are specified feels unnecessarily verbose, however we can introduce a short-hand to improve this.
Also, add a couple of new-lines to improve overall readability.
Selected editors can be moved using the arrow keys:
- up/down/left/right will move the editors by 1 page unit;
- ctrl (or meta)+up/down/left/right will move them by 10 page units.
The keyboard shortcuts (copy, paste, ...) didn't work correctly when the
main container was not focused.
This patch adds a few waitForTimeout calls in the integration test for FreeText
in order to avoid possible intermittent failures.
Given that the `debugger` is loaded as a module we can use "top level await" in development mode to access the necessary API-functionality, which removes the need to manually pass in the required properties.
- it'll improve the way images are resized: diagonally (keeping the ratio between dimensions)
or horizontally/vertically;
- the resizer was almost invisible in HCM;
- it makes a resize undoable/redoable.
In order to reproduce the original issue:
- switch to freetext mode
- add a text somewhere
- double click outside and add some text
- repeat the previous step several times
No text is selected during editing.
The existing Node.js-specific polyfills depend on the `node-canvas` package, which has unfortunately (repeatedly) shown to cause trouble for many users. We attempted to improve the situation by listing the relevant packages as `optionalDependencies`, but that didn't seem to really fix the problem.
With this patch the library should be able to load in Node.js-environments even if polyfilling fails, and any errors will instead occur during rendering. Obviously this is *not* a proper solution, since it basically moves the problem to another part of the code-base.
However for certain "simpler" use-cases, such as e.g. text-extraction, these changes should hopefully improve general usability of the PDF.js library in Node.js-environments.
*Please note:* For most PDF documents rendering should still work though, since `DOMMatrix` is *currently* only used with Patterns and `Path2D` only with Type3-fonts and Patterns.
In Gulp 4, which we use for years now, the `gulp.src()` function
supports the `removeBOM` option to disable the default BOM stripping,
so this commit uses that to get rid of our `vinyl-fs` dependency.
Note that this actually makes disabling BOM stripping work again. It's
currently broken because in `vinyl-fs` 3, which we have already used
since 2018 (commit 95de23e), the `stripBOM` option was renamed to
`removeBOM`, so the current code doesn't actually disable BOM stripping;
we have now confirmed that this has sadly been broken for years without
anyone noticing. Most likely this is because the BOM is not required for
UTF-8 documents, but while not necessary it also can't hurt to have it
for tools that use it to determine if a document is UTF-8.
*Please note:* This only removes the preference itself, however both the viewer-option and the actual implementation is still available.
The `useOnlyCssZoom` functionality was only ever used, by default, in the PDF Viewer for the B2G/FirefoxOS project (which was abandoned years ago). Given that CSS-only zooming can easily make the document look blurry even at low zoom levels, this functionality was only intended for low-powered mobile devices.
Hence it seems reasonable to remove the `useOnlyCssZoom` preference now, since neither the default viewer nor the GeckoView-specific viewer uses this functionality.
Trying to update Stylelint to version `15.10.1`, and beyond, broke linting. Looking at the changes the issue appears to be that the `bin/stylelint.js` file was replaced with `bin/stylelint.mjs` instead, which our `gulp lint` runner wasn't able to automatically find; see https://github.com/stylelint/stylelint/compare/15.10.0...15.10.1
When the flag is set, the appearance has to be generated from the value so it's
useless/meaningless to extract the content from the existing appearance.
When a pdf has /NeedAppearances set to true, the annotation appearance must be
generated from its value and we must take into account the hasOwnCanvas property.
*Please note:* I'm not aware of any bugs caused by this, however that might be more luck than anything else.
In PR 16392 the `incrementalUpdate` function, and all of its various helpers, were made asynchronous. However the call-site in `src/core/worker.js` wasn't updated, which means that we currently reset temporary XRef-entries while saving is ongoing.
By leveraging import maps we can get rid of *most* of the remaining `require`-calls in the `src/display/`-folder, since we should strive to use modern `import`-statements wherever possible.
The only remaining cases are Node.js-specific dependencies, since those seem very difficult to convert unless we start producing a bundle *specifically* for Node.js environments.
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
In the last couple of years we've been quicker to remove support for older browsers/environments, which means that at this point in time we don't bundle that many polyfills. (The polyfills are also generally simpler nowadays, ever since we removed support for e.g. Internet Explorer.)
Rather than having to *manually* handle the polyfills, we can actually let Babel take care of bundling the necessary polyfills for us; please refer to https://babeljs.io/docs/babel-preset-env
The only exception here is the Node.js-specific compatibility-code, which is moved into the `src/display/node_utils.js` file. This ought to be fine since workers are not available/used in Node.js-environments.
*Please note:* For the `legacy`-builds this will increase the size of the *built* files, however that seems like a very small price to pay in order to simplify maintenance of the general PDF.js library.
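For illustration, a configuration along these lines is what lets Babel handle the polyfilling (the exact targets and core-js version here are assumptions, not the project's actual settings):

```js
// babel.config.js -- a minimal sketch, not the PDF.js build configuration:
module.exports = {
  presets: [
    [
      "@babel/preset-env",
      {
        useBuiltIns: "usage", // inject a polyfill wherever a feature is used
        corejs: 3, // the polyfills themselves come from the core-js package
      },
    ],
  ],
};
```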
Localization of this button broke in PR 16340, which I assume was completely accidental, since the download-button now tries to access a l10n-id that was removed some time ago (see PR 15617).
Note how loading even the development viewer, i.e. http://localhost:8888/web/viewer-geckoview.html#locale=en-US, currently logs l10n-warnings on the `master` branch.
Having a `require` in this file has never made sense in e.g. the Firefox PDF Viewer and shouldn't really be necessary.
Possibly the idea was to facilitate some kind of third-party bundling, however the *built* `pdf.js` file has always exposed the API-contents globally.
Currently this class contains a few "special" code-paths for the COMPONENTS build-target, which normally wouldn't be a problem. However, in this particular case that means accessing code that we don't want to include unconditionally in all builds.
This is currently implemented using build-time `require`-calls which we nowadays want to avoid, and we should strive to remove all such cases from the code-base. (Generally speaking `import` is the future, and build-tools may not always play well with a mix of both formats.)
We can easily improve things here by using sub-classing for the COMPONENTS build-target, and then use the ability to re-name when exporting (to avoid breaking existing code).
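As a sketch of the pattern (the class names here are purely illustrative):

```js
class BaseWidget {
  // Code shared by all build-targets lives here.
}

class ComponentsWidget extends BaseWidget {
  // COMPONENTS-specific code-paths live only in this sub-class.
}

// Re-naming on export keeps existing call-sites working unchanged:
export { ComponentsWidget as Widget };
```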
Occasionally some test-suites may fail to start on the bots, however that's not correctly reflected in the botio-output posted to GitHub which makes it easy to accidentally overlook this situation.
Looking at the raw logs when that happens they always seem to contain a line such as `Run NaN tests` which means that we should be able to easily make this situation a *failure* as intended.
In order to reproduce the issue:
- scale down the image
- zoom the page and notice that the image is pixelated
So this patch allows the image to be redrawn when zooming.
- Take into account the page translation,
- Take into account the correct translation for the editor border,
- Take into account the position of the first glyph in the annotation,
- Take into account the rotation of the editor.
Closes #16633.
There's no good reason for getting this option multiple times in the same method. Also, we can slightly re-factor how the `editorStampButton` is made visible.
- Do the /Filter and /DecodeParms lookup in parallel, since that ought to be a *tiny* bit more efficient.
- Avoid code-duplication when `CompressionStream` isn't supported, since we already have a fallback code-path at the end of the function.
This regressed in PR 16659, when the signature of the `PDFViewer.annotationEditorMode`-setter was changed, and it currently leads to an Error being thrown when exiting PresentationMode.
Unfortunately I wasn't able to come up with a *simple* way to just replace the synchronous `require`-call, since we need to ensure that the default preferences are available when bundling starts.
Hence this patch adds a new intermediate parsing-step in all the relevant gulp-tasks, but this shouldn't affect build-times noticeably since the amount of extra parsing is very small.
*Please note:* It's very possible that there's a better way to handle this, however I figured that unblocking further ESM-work is more important than a "perfect" solution.
createImageBitmap doesn't work with SVG files (see bug 1841972), so we need to
work around this by using an Image.
When printing/saving we must rasterize the image, hence we use the biggest
bitmap as the image reference to avoid duplication or poor rendering quality.
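A sketch of the work-around, assuming a helper along these lines (not the actual implementation):

```js
// createImageBitmap rejects SVG blobs (bug 1841972), so decode through an
// HTMLImageElement and rasterize via an OffscreenCanvas instead.
async function svgToBitmap(blob, width, height) {
  const url = URL.createObjectURL(blob);
  try {
    const image = new Image();
    image.src = url;
    await image.decode(); // wait until the SVG is fully decoded
    const canvas = new OffscreenCanvas(width, height);
    canvas.getContext("2d").drawImage(image, 0, 0, width, height);
    return canvas.transferToImageBitmap();
  } finally {
    URL.revokeObjectURL(url);
  }
}
```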
The existing code is unable to *correctly* extract the color from the appearance-stream when the ColorSpace-data is "complex". To reproduce this:
- Open `freetexts.pdf` in the viewer.
- Note the purple color of the "Hello World from Preview" annotation.
- Enable any of the Editors.
- Note how the relevant annotation is now black.
Note how we're accidentally using the wrong operator when trying to parse CMYK colors. I'm not aware of any bugs caused by this, since it seems uncommon in practice for annotations to specify text-colors in CMYK format.
When there was a rotation, the generated bbox was wrong because of an inversion
between width and height.
This patch aims to fix this issue by rewriting the FreeText code generation
to have something similar to what Acrobat does.
It also fixes the name of the font, which wasn't the correct one when calling
the evaluator.
- Update the "Getting the Code" section to specifically mention Mozilla Firefox, since while the development viewer *works* it may look slightly "broken" in Chromium-based browsers. (This is caused by a lack of support for unprefixed CSS properties, e.g. `mask-image`, however this does *not* affect the built PDF.js viewer.)
- Remove the Twitter-link, since that account has not been updated since 2016 (i.e. over seven years ago).
Semantically, it is more correct to encode the fragment in the URL
instead of the URL-encoded `file` query parameter. This shouldn't matter
in practice, because `rewriteUrlClosure` in `chromecom.js` decodes the
`file` parameter and restores the fragment. However, as #16625 shows,
there was a case where this did not work as expected.
`PDFViewerApplication` reads from `location.hash` to initialize
`initialBookmark`. But when extensions/chromium/pdfHandler.js prepares
the redirect URL, the reference fragment is encoded instead of bare.
`rewriteUrlClosure` in `chromecom.js` is responsible for decoding the
URL, but that currently runs too late.
To fix this, update `initialBookmark` after rewriting the URL.
This was not a problem in the past because `rewriteUrlClosure` in
`chromecom.js` executed before the initialization of `initialBookmark`.
Given that the PDF.js library has never officially supported/documented that binary data can be provided as a `Buffer`, and that it's been explicitly deprecated in *four* releases, it seems reasonable that we outright reject such data instead (to reduce the amount of Node.js specific code-paths).
We've now been throwing an Error in *three* releases if the `canvasFactory` option is provided, hence it ought to be fine to stop doing that and simply ignore the option instead.
Rather than having to *manually* determine the potential `transfers` at various spots in the API, we can let the `AnnotationStorage.serializable` getter include this.
To further simplify things, we can also let the `serializable` getter compute and include the `hash`-string as well.
These options are completely unused in the PDF.js viewer, and given that the last update of the `GrabToPan`-code from upstream was in 2016 it shouldn't hurt to remove them.
Before this commit, lint-chromium complained without an obvious course
of action:
> Warning: Pref objects doesn't have the same length.
> Error: chromium/preferences_schema is not in sync
With this commit, the error message is more actionable:
> Warning: extensions/chromium/preferences_schema.json does not contain an entry for pref: enableFloatingToolbar
> Error: chromium/preferences_schema is not in sync
This is something that I completely overlooked during review of PR 16593, since the idea is (obviously) that the viewer-components should be usable as-is without the user needing to manually pass in any *additional* parameters.
To support this we can very easily expose the current `FilterFactory`-instance on the `PDFPageProxy`-class[1], and if needed initialize the highlight-filters when initializing the page (again limited to the viewer-components).
In order to minimize the size of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independently, but we use the same image within a page.
- Modify the text and background colors in popups to fit a11y requirements
- Add a backdrop filter on clickable areas by using an SVG filter that maps
canvas colors to the Highlight and HighlightText ones.
It occurred to me that we can actually run this unit-test in Node.js environments by making use of the preprocessor to stub out the browser globals there.
Given that nullish coalescing is now available in all environments/browser that we support, we can (ever so slightly) simplify handling of the `TESTING` build-target.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js-viewer API exposes the intended functionality, which means that things can easily break accidentally.
*Please note:* This unit-test cannot (easily) be run in Node.js-environments, since the `external/webL10n/l10n.js` file contains various browser-specific functionality.
These constants were added "speculatively" in PR 10820, almost four years ago, but have never actually been used. We already have issue 10982 that tracks *potentially* extending support for the affected annotation-format, however until that happens I really don't think that we should keep shipping completely unused code in the PDF.js library.
For the MOZCENTRAL build-target, i.e. the Firefox PDF Viewer, this reduces the total bundle size by 1.1 kilobytes.
Until now we've not actually had *any* tests that ensure that the *official* PDF.js API exposes the intended functionality, which means that things can easily break accidentally.
- Change (most) fields/methods into private ones, since that's now supported.
- Tweak the constructor-parameters, and simplify the sandbox initialization w.r.t. the viewer components.
- Remove some unused function/method parameters.
- Slightly simplify the "updatefromsandbox"-handler by using local variables and inverting some conditions.
Rather than sprinkling pre-processor statements throughout the viewer-code, simply "disable" the relevant `PDFViewer` setters instead.
Also, given that the GeckoView-specific viewer doesn't have a sidebar we don't actually need to explicitly ignore a `pageMode` during loading.
This helper function was added almost two years ago, in PR 13696, and it still has only a single call-site. Furthermore, with the changes made in PR 16572 it also cannot hurt to reduce the size of the `web/l10n_utils.js` file slightly.
Note how the [`ChromeActions.getPreferences` method](https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/components/pdfjs/content/PdfStreamConverter.sys.mjs#497-530) returns the preferences as a string, which we then have to convert back into an Object in the viewer.
Back when that code was originally written it wasn't possible to send Objects from the platform-code, however that's no longer the case and we should be able to (eventually) remove this unnecessary string-parsing now.
*Please note that in order to prevent breakage we'll need to land these changes in stages:*
- Land this patch in mozilla-central, as part of the regular PDF.js updates.
- Change the return type in the `ChromeActions.getPreferences` method, in a mozilla-central patch.
- Remove the string-handling from the `FirefoxPreferences._readFromStorage` method.
Please note that we've never had any functionality in the viewer itself that *set* preferences, and we've thus only ever read them.
For the GENERIC viewer it obviously makes sense for the user to be able to modify preferences, e.g. via the console, but that doesn't really apply to the *built-in* Firefox PDF Viewer since preferences are already accessible via `about:config` there. Hence it does seem somewhat strange to expose a limited part of the Firefox preference system in this way when we're not even using it.
Note that the unused preference setting-code also includes a fair amount of *additional* validation on the platform-side, such as limiting any possible preference changes to the `pdfjs.`-branch and also an explicit white-list of preference names[1], to make sure that this is safe; please see:
- https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/components/pdfjs/content/PdfStreamConverter.sys.mjs#458-495
- https://searchfox.org/mozilla-central/rev/4e8f62a231e71dc53eb50b6d74afca21d6b254e9/toolkit/modules/AsyncPrefs.sys.mjs#21-48
Assuming that this patch lands, I'll follow up with a mozilla-central patch to remove the code mentioned above.
---
[1] This hard-coded list contains preferences that no longer exist, and also at least one (fairly obvious) typo.
This method was added only for consistency with the `register`-method, however it's never actually been used. To avoid including dead code in the builds, let's just remove the `unregister`-method for now.
*Please note:* If this method ever becomes useful, it'll be trivial to revert this commit.
With the changes in PR 16552 we can now move general translation into the `AnnotationLayer` itself, which should improve things ever so slightly in third-party implementations where the default viewer isn't used.
*This is something that I completely overlooked during review of PR 16552, despite leaving a l10n-related comment.*
The new l10n-handling of PopupAnnotations assumes that the `AnnotationLayer` is always initialized with a l10n-instance, which might not actually be the case in third-party implementations where the default viewer isn't used.
To work-around that we'll now bundle, and fallback on, the existing `NullL10n`-implementation in GENERIC builds of the PDF.js library. This will only result in a slight file-size increase for the *built* `pdf.js` file, again limited to GENERIC builds, since the `web/l10n_utils.js` file has no dependencies.
Also, tweaks a couple of TESTING pre-processor checks to *only* include that code when running the reference tests.
- it'll help to be able to move popups on screen to let the user read the text
- popups won't inherit some properties from their parent:
- the popup can be misrendered if for example the parent has a clip-path property.
- add an outline to the popup when the parent is focused.
- hide a popup when it's clicked.
Fix handling of /Filter-entries, since the current implementation could potentially corrupt the data if there are multiple filters present.
Please note that filters are applied *sequentially* during decoding, starting from the first one in the Array, hence the first Array-entry needs to be /FlateDecode in order for things to actually work correctly.
To prevent a future bug, if we want to save more "complex" data such as images, also ensure that we include any existing /DecodeParms-entries when updating the /Filter-entry.
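A rough sketch of the intended handling, using a plain object as a stand-in for the actual PDF dictionary:

```js
function prependFlateFilter(dict) {
  // Filters are applied sequentially when decoding, so the newly added
  // /FlateDecode entry must come first in the /Filter array.
  const filters = [].concat(dict.Filter ?? []);
  const parms = [].concat(dict.DecodeParms ?? []);
  // /DecodeParms must stay aligned with /Filter, using null where a filter
  // takes no parameters:
  while (parms.length < filters.length) {
    parms.push(null);
  }
  dict.Filter = ["FlateDecode", ...filters];
  dict.DecodeParms = [null, ...parms];
}
```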
The existing unit-test doesn't work as intended, since the page never actually renders. Note how `cleanup` is *not* allowed to run when parsing and/or rendering is ongoing, however an (old) incorrect condition could prevent rendering from ever starting.
This is very old code, which has been slightly re-factored a couple of times (many years ago), however this doesn't appear to affect e.g. the default viewer since the incorrect behaviour seem highly dependent on "unlucky" timing.
Note also how at the start of the `PDFPageProxy.prototype.render`-method we purposely cancel any pending `cleanup`-call, to prevent unnecessary re-parsing for multiple sequential `render`-calls.
Finally, avoid running `cleanup` when document/page destruction has already started since it's pointless in that case.
After PR 16226 the deprecated SVG back-end is now unused in development mode, with the exception of unit-tests, hence we can re-factor how it's exposed in the API to avoid including a useless webpack-closure in e.g. the *built-in* Firefox PDF Viewer.
Given that this API method isn't used anywhere within the PDF.js library itself, except for the unit-tests, we can avoid including what's effectively dead code in e.g. the *built-in* Firefox PDF Viewer.
- Don't attempt to lookup an "SM" entry, since we're only using "SMask" in the `PDFImage` code and I also cannot find any mention in the PDF specification about that being a valid abbreviation for a Soft Mask entry. (There's only a `SM = Smoothness Tolerance` Graphics State parameter, which is obviously something completely different.)
- Don't lookup the /SMask and /Mask entries unless it's actually an inline image, since it's pointless otherwise.
- Last, but most importantly, only check for the *existence* of /SMask and /Mask entries but don't actually fetch the data. Note that if either one exists it'll contain a Stream, and those cannot be cached on the `XRef`-instance, which leads to unnecessary parsing/allocations and in this case we're not using the actual data for anything.
The original `trimCache` functionality was intended to be exposed on the
top-level `puppeteer` module, but due to a bug in Puppeteer this didn't
work correctly and we had to call `trimCache` on the default Puppeteer
node instance instead, which was fortunately exposed. However, since
this didn't feel like intended API usage, this bug was reported and is
now fixed in Puppeteer 20.5.0, so this commit updates Puppeteer to that
version so we can use the intended API.
The full history of this issue can be found at
https://github.com/puppeteer/puppeteer/issues/10174.
This patch is the result of me going through some old issues regarding non-embedded Wingdings support.
There's a few different things wrong in the referenced PDF document:
- The /BaseFont and /FontName entries don't agree on the name of the fonts, with one font using `/BaseFont /Wingdings-Regular` and `/FontName /wg09np` which obviously makes no sense.
To address this we'll compare the font-names against our lists of known ones and ignore /FontName entries that don't make sense iff the /BaseFont entry is a known font-name.
- The non-embedded Wingdings font also specifies an incorrect /Encoding, in this case /MacRomanEncoding, which should have been fixed by PR 16465. However this doesn't work since the font has *bogus* font-flags that fail to categorize the font as Symbolic.
To address this we'll also compare the font-name against the list of known symbol fonts.
As far as I can tell there's no particular reason for initializing `KeyboardManager`-instances eagerly, since the user may never use editing, and we can easily do this lazily instead by utilizing shadowed getters.
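The "shadowed getter" pattern looks roughly like this (the field name is illustrative; `shadow` mirrors the helper PDF.js keeps in `src/shared/util.js`):

```js
function shadow(obj, prop, value) {
  // Replace the getter with a plain data property, so the value is computed
  // once, on first access, and simply returned afterwards.
  Object.defineProperty(obj, prop, { value, enumerable: true, writable: false });
  return value;
}

class SomeEditor {
  static get _keyboardManager() {
    // The KeyboardManager is only instantiated if editing is actually used:
    return shadow(this, "_keyboardManager", new KeyboardManager(/* bindings */));
  }
}
```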
While it's slightly difficult to trigger in practice, unless the `defaultZoomDelay`-value is increased, it's currently possible to generate thumbnails from *partially* rendered pages when doing *temporary* CSS-only zooming.
We shouldn't dispatch a "pagerendered"-event when doing *temporary* CSS-only zooming, but simply wait until the actual rendering is done.
While I don't believe that this regression has caused any actual bugs, dispatching *duplicate* events is nonetheless inconsistent and should be fixed.
This patch updates the minimum supported browsers as follows:
- Google Chrome 92, which was released on 2021-07-20; see https://support.google.com/chrome/a/answer/10314655
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Given that this functionality is only relevant in third-party use-cases, for example the viewer-components, we can avoid needlessly including it in e.g. the MOZCENTRAL build.
This commit makes the following required changes:
- Replace the custom cache trimming logic with the (per our request)
newly added `trimCache` method in Puppeteer. Not only does this greatly
simplify our code and prevent having to import Puppeteer internals,
it's also necessary because Puppeteer 20 removed the `BrowserFetcher`
API in favor of the new separate `@puppeteer/browsers` package.
- Start browsers in series instead of in parallel. Parallel browser
starts have been broken since Puppeteer 19.1.0, and it turns out that they
have never been supported officially, so they worked more-or-less by accident.
Starting browsers in series is the supported way, is almost equally
fast and ensures that we avoid any race conditions during startup.
Finally, it also allows us to remove the `browserPromise` state on our
session objects.
Fixes #15865.
This patch does two things:
- Moves the updating of thumbnails into `web/app.js`, via a new `PDFSidebar` callback-function, to avoid having to include otherwise unnecessary parameters when initializing a `PDFSidebar`-instance.
- Only attempt to generate thumbnail-images from pages that are *cached* in the viewer. Note that only pages that exist in the `PDFPageViewBuffer`-instance can be rendered, hence it's not actually meaningful to check every single page when updating the thumbnails.
For large documents, with thousands of pages, this should be a tiny bit more efficient when e.g. opening the sidebar since we no longer need to check pages that we know have not been rendered.
The way that the cleanup was implemented in PR 12613 has always bothered me slightly, since the `isPageCached`-method that I introduced there always felt quite out-of-place in the `IPDFLinkService`-implementations.
By introducing a new "thumbnailrendered" event, similar to the existing "pagerendered" one, we're able to move the cleanup handling into the `PDFViewer`-class instead.
The way that this was implemented in PR 10217 has always bothered me slightly, since the `isPageVisible`-method that I introduced there always felt quite out-of-place in the `IPDFLinkService`-implementations.
Hence this is instead replaced by a callback-function in `PDFFindController`, to handle the page-visibility checks. Note that since the `PDFViewer`-constructor always sets this callback-function, e.g. the viewer-component examples still work as-is.
Now that font-substitution has been implemented, we should be able to do a much better job at supporting non-embedded Wingdings fonts.
Given that this is a Windows-specific font, see https://en.wikipedia.org/wiki/Wingdings, this is however not guaranteed to work (well) on other platforms.
The affected font is non-embedded ZapfDingbats, however the PDF document for some inexplicable reason specifies the encoding as "WinAnsiEncoding" (which is obviously wrong).
To work-around this bug in the PDF generator, we'll simply ignore any explicitly specified named encoding for non-embedded symbol fonts.
- Remove the dependency on fit-curve;
- Improve the way the current line is drawn, by using a Path2D and by
clearing only the last part of the curve instead of clearing
the whole canvas;
- Smooth the curve when drawing, to avoid having it change after
the drawing ends;
- Make the smoothing a bit less aggressive.
Given that inline images may contain "EI"-sequences in the image-data itself, actually finding the end-of-image operator isn't always straightforward.
Here we extend the implementation from PR 12028 to potentially check all of the following bytes, rather than stopping immediately. While we have fairly decent test-coverage for this code, whenever you're changing it there's unfortunately a slightly higher than normal risk of regressions. (You'd really wish that PDF generators just stop using inline images.)
- if the contour count is lower than -1, the glyph is very likely wrong,
so just remove it from the font;
- if a contour has the repeat flag then the repeat count mustn't be 0.
Looking at the behaviour in Adobe Reader it doesn't appear that attachments are sorted alphabetically, hence it doesn't seem necessary for us to do so either in the viewer.
An additional benefit of *not* sorting the attachments is that any "actual" attachments are now always placed at the top of the list in the sidebar, and if any `FileAttachment`-annotations exist in the document they will now be appended at the end.
The pdf linked in bug 1135277 contains a lot of stroke instructions.
Based on profiling with the Firefox profiler, this patch helps to reduce
the overall time spent in this function by 30%.
According to https://en.wikipedia.org/wiki/Impact_(typeface) this font should be available on all current versions of Windows, and with the recently added font-substitution we should actually be able to render it correctly (at least on Windows).
The `fontID` handling is quite old and predates the use of the `idFactory` to generate a unique id for each font, hence we can simplify this code a little bit.
When fixing bug 1766987, I thought the field's formatted value came from
the result of the format callback: I was wrong. The format callback is run
but the value is unused (maybe it's useful to set some global vars... or
it's just a bug in Acrobat). Anyway, the value to display is the one rendered
in the AP stream.
The field value setter has been simplified and that fixes issue #16409.
This essentially extends PR 11218 to also apply when looking up the final font-reference, via the XRef-table, fails because the font isn't available.
This patch also changes `PartialEvaluator.fallbackFontDict` to simply use "Helvetica" as the default font-name, since that seems generally reasonable given the now existing font-substitution code.
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)
For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
Given that the `css` property isn't constant, since it contains document/font ids, we cannot just check it directly. However, we can make use of regular expressions to ensure that the format is generally correct.
Despite this being a *major* version increase, it doesn't appear to require any updates in our test-suites.
Note in particular that the minimum supported browsers/environments were updated, however this isn't a problem given our recent support-changes in the PDF.js library.
Please find additional details at https://github.com/jasmine/jasmine/blob/main/release_notes/5.0.0.md
Originally the `PDFSidebarResizer` class was slightly larger, since the code used to contain e.g. feature testing for older (and no longer supported) browsers.
Given that there's some amount of overlap, when it comes to what DOM-elements and state that these classes need, it now seems reasonable to simply move the sidebar-resizing into the `PDFSidebar` class.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.1` kilobytes.
Similar to other toolbar/secondaryToolbar buttons that open toolbars or dialogs, it seems reasonable to use "aria-controls" for the editor-toolbar buttons as well.
On my computer, it takes a few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performance.
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement for FoxitSans.
- For now we just try to substitute standard fonts; the strategy is the following:
  * we try to find a font locally from a hardcoded list;
  * if that fails then we use Liberation as a fallback (only for Helvetica for the moment);
  * else we just fall back on the system serif/sans-serif/monospace font.
This patch updates the minimum supported browsers as follows:
- Safari 15.4, which was released on 2022-03-15; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_15
Nowadays we usually try, where feasible and possible, to support browsers that are about two years old. The reasons for limiting support to a *somewhat* more recent Safari version include:
- Throughout the history of the PDF.js project, Safari has always been the worst browser to attempt to support. Compared to other browsers there's a disproportionate number of bugs affecting Safari, especially on iOS, and in most cases those are browser-specific issues that we simply cannot address.[1]
- Safari has often been a lot slower, compared to other browsers, at implementing new web-platform features. Historically this has sometimes blocked usage of new features, for the benefit of the Firefox PDF Viewer, and it's very often meant having to include and maintain polyfills *only* for Safari.
- The current (minimum) supported Safari version lacks enough functionality that polyfills placed in the `src/shared/compatibility.js` file are unfortunately not sufficient; it also requires a bunch of special-cases in both the `gulpfile` and the `web/`-code.
- Given that the *built-in* Firefox PDF Viewer is the primary development target for the PDF.js library, and the general development pace these days, we need to limit the maintenance "overhead" caused by other browsers.
---
[1] In a few cases a work-around might be possible, however it'd negatively affect e.g. performance, readability, and/or maintainability of the code.
Originally we only used the `structuredClone` polyfill in the `LoopbackPort`-implementation, and that obviously isn't used anywhere within the various image decoders.
At this point in time we've started to use `structuredClone` a little bit more, hence it seems overall simpler to just bundle the polyfill even in the `legacy`-version of the IMAGE_DECODERS built-target.
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).
Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.
*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
This patch updates the minimum supported environments as follows:
- Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
The /Decode-implementation in our JPEG decoder, i.e. `src/core/jpg.js`, seems to only handle *inverting* of images properly. To support arbitrary /Decode-entries correctly we'll always use the `PDFImage.decodeBuffer` method, even for "simple" JPEG images, which should be fine since non-default /Decode-entries aren't a very common occurrence.
*Please note:* This patch will lead to a little bit of movement in some existing test-cases, however it should be virtually imperceptible to the naked eye.
The fallback code-path has never really been used, since the `PDFSidebar` is only used in the default viewer (and has never been exposed in e.g. the COMPONENTS-build).
This patch tries to simplify, and improve, the thumbnail styling:
- For rendered thumbnails there's one less DOM-element per thumbnail, which can't hurt in longer documents.
- Use CSS-variables to set the dimensions of all relevant DOM-elements at once.
- Simplify the visual styling of the thumbnails; e.g. remove the border, since the viewer no longer has visible borders around pages and the relevant CSS-rules are quite old code.
These changes also, at least in my opinion, makes the relevant CSS-rules much easier to understand and work with.
- Make it easier to work on e.g. [bug 1690428](https://bugzilla.mozilla.org/show_bug.cgi?id=1690428) without affecting the other sidebarViews.
This property was added in PR 12726 specifically for use in the `getFontType` function, indirectly used by the `PDFDocumentProxy.stats` getter in the API.
In PR 15880 that functionality was removed, but I forgot to remove this now unused font-property.
Now that we no longer depend on the old Babel version in SystemJS we can remove the `static get ...` work-arounds used to define constants, which leads to slightly more compact code.
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.
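Concretely, this means the development viewer can create the worker as a native module (the path shown is illustrative):

```js
const worker = new Worker("../src/pdf.worker.js", { type: "module" });
```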
When the `GlobalImageCache` implementation originally landed, back in PR 11912, the image handling was slightly more complex (with e.g. browser-decoding of some JPEG images). At this point it no longer seems necessary to manually handle pageIndexes in this way, and we should be able to simply inline that in the `GlobalImageCache.shouldCache` method.
This commit migrates this functionality away from the bots. Nowadays
it's possible to build and deploy the website to GitHub Pages directly
through GitHub Actions, which provides a nice simplification of the
process. Not only does this remove the requirement to have a `gh-pages`
branch in the repository, it also avoids the complexity of having to
configure the workflow to commit to Git branches and allows us to remove
the Git committing code from the Gulpfile.
Note that deploying directly through GitHub Actions workflows needs to be
enabled in the repository settings, but this is easy and well documented
on the link below.
The following resources are relevant for this patch:
- Enabling deployment to GitHub Pages directly through GitHub Actions:
https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#publishing-with-a-custom-github-actions-workflow
- Uploading GitHub Pages artifacts example:
https://github.com/actions/upload-pages-artifact#usage
- Deploying GitHub Pages artifacts example:
https://github.com/actions/deploy-pages#usage
This patch tries to mimic the look of the message-element in the Firefox browser-findbar, and thus makes the following changes:
- Remove the red colour, since it didn't take the light/dark themes into account.
- Display the "notFound" message in bold.
The `pdfjs-dist/lib/` directory contains a README file that explicitly advises against using those files, however based on a fairly large number of issues filed over the years users seem to be (mostly) overlooking that warning.
In particular it unfortunately seems to be somewhat common for users to attempt to "combine" proper builds from `pdfjs-dist/build/` together with individual components from the `pdfjs-dist/lib/web/` directory, which more often than not leads to subtle bugs and general problems.
When we receive bug reports about this it's often not immediately obvious what the problem is, given that many issues lack enough details (such as runnable test-cases), but after some back-and-forth it usually turns out that usage of `pdfjs-dist/lib/` is the culprit.
Considering that keeping the general PDF.js library working is challenging and time-consuming enough nowadays, this patch thus proposes that we stop including the "lib"-build in the `pdfjs-dist` repository to both reduce user confusion and the support burden.
Currently we only prevent triggering the actual text-extraction multiple times in "parallel", when using the "copy all text" feature, however the "copy"-event itself is not prevented.
The result is that if the user selects all text in a long PDF document and then uses the copy-shortcut multiple times in quick succession, we'll actually populate the clipboard with "incomplete" contents (via a `TextLayerBuilder` copy-listener) until all text-extraction finishes.
In PR #16295 one occurrence of this was changed, but a few more remained
in the codebase. This commit fixes the other occurrences so that we
don't use the deprecated way of creating custom events anywhere anymore.
According to MDN, see https://developer.mozilla.org/en-US/docs/Web/API/CustomEvent/initCustomEvent,
using the `CustomEvent.initCustomEvent` method is deprecated and the
`CustomEvent` constructor should be used instead.
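For reference, the change looks like this:

```js
const detail = { some: "payload" }; // illustrative event data
// Deprecated pattern:
let event = document.createEvent("CustomEvent");
event.initCustomEvent("example", /* bubbles */ true, /* cancelable */ true, detail);
// Preferred replacement:
event = new CustomEvent("example", { bubbles: true, cancelable: true, detail });
```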
Extends d9bf571f5c49e1cac9054cf6b7acfc0b5b719876.
In PR #16327 the `eslint-plugin-mozilla` package was updated so we no
longer have to force-install packages, and the force-install flags for
`npm install` were removed. However, the CI job was missing from this
commit, which we fix here. In general force-installing packages
shouldn't be necessary unless there are problems with dependencies,
which we would like to know about, so especially in the CI job it seems
like a good idea to not force-install packages to catch upcoming defects
early on.
Extends 19526d2322fabd4425688bb7c5504fa9ea015c5c.
This method was added in PR 4938, almost nine years ago, however it doesn't appear to ever have been used.
Given the similarities between the `PDF17` and `PDF20` classes, and how they're used, if the `PDF20.hash` method was actually necessary you'd also expect a similar method in the `PDF17` class.
The "binary" CMap-format is specific to the PDF.js library, and is used to reduce the size of the built-in CMap data-files.
By moving this code to its own file we can remove the nowadays unnecessary closures, which helps to slightly reduce the size of this code.
The latest version of `eslint-plugin-mozilla` removed the Prettier dependency, see https://bugzilla.mozilla.org/show_bug.cgi?id=1677562, which means that we no longer need to use `npm install --force` in the PDF.js library.
When permissions are enabled and the PDF document doesn't have the COPY-flag set, it shouldn't be possible for the user to trigger the "copy all text" feature.
While this slightly reduces duplication in the CSS rules, some of the auto-formatting done by Prettier is perhaps not great. (Given the overall advantage of using Prettier, we'll probably have to simply accept this.)
Hopefully these changes make sense (since this functionality is new to me), however the existing `xfa`-tests should help avoid any outright regressions.
Some Arabic chars like \ufe94 can be searched for in a pdf, hence they must be
normalized when creating the search query. So, to avoid duplicating the
normalization code, everything is moved into the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map;
it has been replaced by the use of normalize("NFKC") (which helps to reduce
the bundle size by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't
taking into account some RTL unicode ranges, the generated font wasn't
embedding the mapping for this char, and the unicode ranges in the OS/2 table
weren't up-to-date.
When normalized, some chars can be replaced by several ones, which led to some
extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, the copied string is normalized (NFKC) before being put in
the clipboard (it works like this in both Acrobat and Chrome).
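For example, in any JavaScript environment:

```js
// U+FE94 is the final presentation form of teh marbuta; NFKC maps it back
// to the canonical letter, so queries and text layer content line up:
"\ufe94".normalize("NFKC"); // -> "\u0629" (ARABIC LETTER TEH MARBUTA)
```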
After the previous patch we now have only *a single* `PRODUCTION` occurrence in the entire code-base, more specifically in the `web/viewer.html` file.
This special build-target can be replaced with any condition that always evaluates to `false`, such as e.g. a comment.
*Please note:* This patch might be considered too hacky, hence I completely understand if it's rejected.
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.
This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
To make this functionality work out-of-the-box in custom implementations, see e.g. the "viewer components" examples, it'd be slightly easier if we dynamically create/insert the "hiddenCopyElement" in the `PDFViewer` constructor.
Given that the "copy all text" feature still appears to work just as before with this patch, hopefully I'm not overlooking any reason why doing this would be a bad idea.
I was playing with the new "copy all text" feature, and stumbled upon one document where the copied text was truncated; see http://mirrors.ctan.org/info/lshort/english/lshort.pdf
The problem turns out to be that on [page 83](https://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#page=83) the textLayer contains `\u0000` and apparently copying just stops when a null char is encountered.
To fix this we can simply use an existing helper function, and with this patch we're able to successfully copy all the text in that document.
*Please note:* This patch only extends the `PDFFindController` implementation itself to support this functionality, however it's *purposely* not exposed in the default viewer.
This replaces the previous `phraseSearch`-parameter, and a `query`-string will now always be interpreted as a phrase-search.
To enable searching for individual words, the `query`-parameter must instead consist of an Array of strings. This way it's now also possible to combine phrase/word searches, with a `query`-parameter looking something like `["Lorem ipsum", "foo", "bar"]` which will search for the phrase "Lorem ipsum" *and* the words "foo" respectively "bar".
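A hedged usage sketch, following the general shape of the default viewer's find-events (the exact property set may differ in custom implementations):

```js
eventBus.dispatch("find", {
  source: null,
  type: "",
  // An Array mixes phrase- and word-searches; a plain string is always
  // treated as a phrase-search now:
  query: ["Lorem ipsum", "foo", "bar"],
  caseSensitive: false,
  highlightAll: true,
});
```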
This method was originally added in PR 1320, eleven years ago, however it doesn't appear to ever have been used (not even from the start).
Furthermore, this method also tries to access a property that doesn't exist (`this.out`) and then call a method that also doesn't exist (`writeByteArray`).
In looking at https://bugs.ghostscript.com/show_bug.cgi?id=706451 I noticed that bug2.pdf was pretty
slow to load for such a basic file.
In profiling I noticed that a lot of time was spent in Array.concat, hence this patch uses Array.push
where possible (it's now ~3 times faster).
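The pattern in question, illustrated (the variable names are made up):

```js
const chunks = [[1, 2], [3, 4]]; // stand-in for the real data
const merged = [];
for (const chunk of chunks) {
  // Before: merged = merged.concat(chunk); -- allocates a new array each time.
  for (const item of chunk) {
    merged.push(item); // appends in place instead
  }
}
```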
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.
*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
By getting the width/height of the first page initially, we can slightly reduce the amount of code needed both in the `hasEqualPageSizes`-check and when building the print-styles.
For the moment there is no real consensus on how we should download a pdf on Android,
hence we keep this solution for now but behind a pref (which will be true on
nightly only).
Currently we repeat the same code in lots of places, to update the "toggled" class and "aria-checked" attribute, when various toolbar buttons are clicked.
For the MOZCENTRAL build-target this patch reduces the size of the *built* `web/viewer.js` file by just over `1.2` kilobytes.
Apparently the `structuredClone` polyfill doesn't handle transfers correctly, and `DOMException`s may thus be thrown. This is particularly problematic in Node.js environments, where that exception (obviously) isn't available.
To work-around these issues we'll simply ignore any transfers in `legacy`-builds, since those *may* use the `structuredClone` polyfill. This will obviously lead to slightly higher memory usage in those builds, however this really only affects Node.js environments. (Browsers are only affected if workers are disabled, however that's never been an officially recommended/supported configuration.)
Currently `float: inline-start/inline-end` is only supported in Firefox, see https://developer.mozilla.org/en-US/docs/Web/CSS/float#browser_compatibility, and in order to support other browsers we're thus forced to jump through some hoops.
This leads to slightly less nice code in the *built-in* Firefox PDF Viewer, and this patch attempts to improve the current situation:
- Use Stylelint to forbid direct use of `float: inline-start/inline-end` in the CSS files, to prevent future bugs in the general PDF.js viewer.
- Do a build-time replacement, only in MOZCENTRAL builds, to replace the CSS-variables with raw `float: inline-start/inline-end` instances.
Currently we have two separate image-caches on the worker-thread:
- A local one, which is unique to each `PartialEvaluator.getOperatorList` invocation. This one caches both names *and* references, since image-resources may be accessed in either way.
- A global one, which applies to the entire PDF documents and all its pages. This one only caches references, since nothing else would work.
This patch introduces a third image-cache, which essentially sits "between" the two existing ones. The new `RegionalImageCache`[1] will be usable throughout a `PartialEvaluator` instance, and consequently it *only* caches references, which thus allows us to keep track of repeated image-resources found in e.g. different /Form and /SMask objects.
---
[1] For lack of a better word, since naming things is hard...
This effectively implements some of the changes from https://phabricator.services.mozilla.com/D170496, but using our existing "direction aware" CSS-variable to limit the amount of code changes needed.
The patch changes the minimum supported version of Google Chrome as follows:
- Chrome 88, which was released on 2021-01-19; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
This is done to allow use of modern CSS features, such as e.g. `:is()` and `:where()` in the code-base.
When installing the PDF.js project itself it's currently necessary to use `--force` in order for all packages to install correctly, see issue 15429, hence the same is also necessary when using the `gulp dist-install` command for local development/testing.
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.
This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.
Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
With the changes in PR 16153 we're no longer setting a `<base href>` in the Firefox PDF Viewer, hence it shouldn't be necessary to keep setting a `baseUrl` in the `PDFLinkService`-class.
Given that the original document URL is now kept, the browser itself will handle relative URLs and we can thus slightly reduce the amount of string parsing required when handling various links in the viewer.
Currently if you e.g. enable the `useOnlyCssZoom` option rendering may no longer finish as intended. To reproduce:
- Enable the `useOnlyCssZoom` option.
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf (in the development viewer).
- When rendering starts, *immediately* change the zoom-level.
In this case the document will never finish rendering, since the `postponeDrawing`-functionality will (here incorrectly) abort rendering and with CSS-only zooming rendering is only expected to happen once per page.
To fix this we'll simply ignore any `drawingDelay` when CSS-only zooming is used (regardless if it's triggered via the option or the zoom-level being very large).
Currently the `zoomLayer` isn't rotated correctly in all cases. To reproduce:
- Load https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf
- Let the document render.
- Rotate the document *four* times, such that the original rotation is restored.
The easiest solution, as far as I can tell, is that we always set the `transform` just as we did (for years) prior to the changes in PR 15812.
Originally we used helper functions for checking if something was a Dictionary or Stream, and then having an initial `typeof` check probably made sense.
However, given that we're using `instanceof` nowadays, the additional check no longer seems necessary.
Currently we're *virtually* duplicating the same code, for validating quotation marks, twice in this helper function.
The size decrease is quite small (107 bytes) and this makes the code slightly harder to read, hence I completely understand if this patch is rejected.
Given that this functionality only applies in the viewer, when `PDFBug` is being enabled and used, it can't hurt to slightly reduce the size of this code.
- Reduce a little bit of duplication by enforcing the max/min scale-values once, at the end, in the `increaseScale`/`decreaseScale` methods.
- Convert the "private" `PDFViewer` scale-related methods into actually private ones, now that JavaScript supports that.
Having just reviewed a patch touching this code, I couldn't help noticing that an `Object` isn't really the optimal data-structure for this and nowadays we can do better by using a `Set` instead.
Given that the viewer always set the `dir`-attribute, to either LTR or RTL, we should be able to use this logical CSS property to (very slightly) reduce the size of the CSS; please see https://developer.mozilla.org/en-US/docs/Web/CSS/inset-block
This is something that I completely overlooked in PR 16162, which in some cases causes the default viewer to incorrectly print warnings.
This can be reproduced with the PAGE scrolling-mode, and/or the PresentationMode, and this patch simply works around it by checking the visibility as well (since the warning is a best-effort solution anyway).
The `pageColors`-option was removed from the `CanvasGraphics`-constructor in PR 16075, hence the code in the API no longer needs to pass in that option; this is something that I missed during review.
The signatures of these methods were changed in PR 15886, which has now been included in a couple of releases, hence it should hopefully be OK to remove the fallback code-paths now.
Also, the methods are updated slightly to be explicit about what options are supported and we'll no longer pass along any arbitrary options to the "private" methods.
Some of these pre-processor statements are *many* years old, and could thus do with some clean-up. Note that the pre-processor originally didn't support else-if statements, and by using those the code becomes a bit less verbose.
The idea is to apply an overall filter on each page: the main advantage
is that images are filtered too, which could help to make them visible for
some users.
During review of PR 16151 this method was simplified, however I overlooked the fact that we now can (and really should) improve this by removing duplication.
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)
---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
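For embedders, setting the variable manually looks like this (assuming `container` and `viewport` as in the viewer-components examples):

```js
container.style.setProperty("--scale-factor", viewport.scale);
```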
With the previous commit this is now completely unused in the API, hence it can be removed. This is done in a separate commit to make it easier to re-instate it, should the need ever arise.
This patch extends PR 16115 to work in all browsers, regardless of their `OffscreenCanvas` support, such that transfer functions will be applied to general rendering (and not just image data).
In order to do this we introduce the `BaseFilterFactory` that is then extended in browsers/Node.js environments, similar to all the other factories used in the API, such that we always have the necessary factory available in `src/display/canvas.js`.
These changes help simplify the existing `putBinaryImageData` function, and the new method can easily be stubbed-out in the Firefox PDF Viewer.
*Please note:* This patch removes the old *partial* transfer function support, which only applied to image data, from Node.js environments since the `node-canvas` package currently doesn't support filters. However, this should hopefully be fine given that:
- Transfer functions are not very commonly used in PDF documents.
- Browsers in general, and Firefox in particular, are the *primary* development target for the PDF.js library.
- The FAQ only lists Node.js as *mostly* supported, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
The tag <base> is used to resolve relative URIs within the document.
The newly added SVG filters use a relative URI which is then resolved against
the URI in <base>, but that one mismatches the document URI, and consequently
the filters are not found in the Firefox viewer.
So this patch just removes <base> and replaces a few relative URLs with
absolute ones.
In the general PDF.js library multiple PDF documents may be opened on the same web-page, which is why we many years ago started using document-specific identifiers to prevent issues with global data such e.g. with fonts.
Hence we need to treat the identifiers generated by the `FilterFactory` in the same way, since the SVG-filters for two separate PDF documents may otherwise get identical ids.
PDF gradients do not have color stops but an arbitrary PDF function of
the type f(t) -> color. CSS gradients are only based on color stops.
Most PDF gradient functions are produced from color stop oriented
gradients.
Take advantage of this by sampling the PDF function at a higher
frequency but not converting any samples which could be interpolated to
color stops. The sampling frequency is chosen to be the least common
multiple of as many values as practical to exactly re-create the common
case of the PDF function implementing equally spaced linearly
interpolated stops in RGB color space. This also allows for better
approximation of other smooth PDF functions (non-linear, or non-equally
spaced, or in different color space).
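A condensed sketch of the sampling idea (not the actual implementation):

```js
// Sample f(t) -> [r, g, b] densely, then keep only the samples that linear
// interpolation between their neighbours cannot reproduce.
function toColorStops(f, samples = 841) {
  // 840 intervals = lcm(1..8), so equally spaced stops land exactly on samples.
  const dense = [];
  for (let i = 0; i < samples; i++) {
    const t = i / (samples - 1);
    dense.push({ t, color: f(t) });
  }
  return dense.filter((s, i) => {
    if (i === 0 || i === dense.length - 1) {
      return true; // always keep the endpoints
    }
    const prev = dense[i - 1];
    const next = dense[i + 1];
    const u = (s.t - prev.t) / (next.t - prev.t);
    // Keep the sample only if it deviates from the interpolated value:
    return s.color.some(
      (c, k) => Math.abs(c - ((1 - u) * prev.color[k] + u * next.color[k])) > 1e-5
    );
  });
}
```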
Fixes: #10572, #14165
The current value originated in PR 2317, and in the decade that has passed the amount of RAM available in (most) devices should have increased a fair bit.
Nowadays we also do a much better job of detecting repeated images at both the page- and document-level, which helps reduce overall memory-usage in many documents.
Finally the constant is also moved into the `src/shared/util.js` file, since it was implicitly used on both the main- and worker-thread previously.
Currently in PDF documents with large images we immediately cleanup once rendering has finished, in order to reduce memory-usage.
Normally that shouldn't be a big problem, however when e.g. repeated zooming happens in the viewer that could easily lead to a lot of wasted resources (and waiting).
Hence this patch, which introduces a new `PDFPageProxy` method that will slightly delay cleanup after rendering.
The dimensions still need to be fixed (from time to time they're in px)
but it doesn't have to be postponed anymore.
To test it: draw something and, when resizing, look at the dimensions of the div
in devtools; the units must be %.
This simply extends the approach in PR 10727 to also cover Patterns, which shouldn't be a common occurrence in Type3 fonts (since this is the first issue we've seen).
This patch updates the minimum supported environments as follows:
- Node.js 16, which was released on 2021-04-20; see https://en.wikipedia.org/wiki/Node.js#Releases
Note also that Node.js 14 will very soon reach EOL, and thus no longer receive any security updates.
This was deprecated in PR 15758, which has now been included in three official PDF.js releases.
While PR 15880 did limit the bundle-size impact of this functionality on e.g. the Firefox PDF Viewer, it still leads to some unnecessary "bloat" that these changes remove.
Furthermore, with this being deprecated there'd also be no effort put into e.g. extending the `UNSUPPORTED_FEATURES` list when handling future error cases.
The idea is to encode large images in BMP format (which is very simple and doesn't
require computing any checksums) and then use createImageBitmap with a BMP blob
(which doesn't suffer from the Canvas/ImageData limits).
From a performance point of view it isn't great (generating a large blob + decoding
it on the main thread is really not ideal), but at least we have something to display,
which is way better than a blank page (and one can notice that most of the time is
spent in decoding the image from the pdf stream).
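To illustrate the trick, here's a hedged sketch of such a BMP wrapper (the real encoder in the patch handles more cases; note also that decoders may ignore the alpha channel of an uncompressed 32-bit BMP):

```js
// Wrap raw RGBA pixels in a minimal 32bpp BMP (BITMAPFILEHEADER +
// BITMAPINFOHEADER, no compression) and decode it via createImageBitmap,
// thereby sidestepping the Canvas/ImageData size limits.
function rgbaToImageBitmap(pixels, width, height) {
  const headerSize = 14 + 40;
  const file = new Uint8Array(headerSize + pixels.length);
  const view = new DataView(file.buffer);
  view.setUint16(0, 0x424d); // "BM" signature
  view.setUint32(2, file.length, true); // total file size
  view.setUint32(10, headerSize, true); // offset to the pixel data
  view.setUint32(14, 40, true); // BITMAPINFOHEADER size
  view.setInt32(18, width, true);
  view.setInt32(22, -height, true); // negative height = top-down rows
  view.setUint16(26, 1, true); // color planes
  view.setUint16(28, 32, true); // bits per pixel
  view.setUint32(34, pixels.length, true); // image size in bytes
  // BMP stores BGRA, so swap the red and blue channels while copying;
  // 32bpp rows are naturally 4-byte aligned, so no row padding is needed.
  for (let i = 0; i < pixels.length; i += 4) {
    file[headerSize + i] = pixels[i + 2];
    file[headerSize + i + 1] = pixels[i + 1];
    file[headerSize + i + 2] = pixels[i];
    file[headerSize + i + 3] = pixels[i + 3];
  }
  return createImageBitmap(new Blob([file], { type: "image/bmp" }));
}
```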
PDF 32000-1:2008 7.10.5.1 "Type 4 (PostScript Calculator) Functions"
defers to the PostScript Language Reference for the description of these
functions. The PostScript Language Reference, third edition chapter 8
"Operators" defines the `angle` type as a "number of degrees". Section
8.1 defines "angle `sin` real", "angle `cos` real", and "num den `atan`
angle". The documentation for `atan` further states that it will return
an angle in degrees between 0 and 360.
Handle these operators correctly in `PostScriptEvaluator.execute`.
Convert the inputs to `sin` and `cos` from degrees to radians for use
with `Math.sin` and `Math.cos`. Correctly pop two values from the stack
for `atan`, use `Math.atan2`, and convert from radians to (positive)
degrees.
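Sketched out, the degree-based semantics look roughly like this (illustrative code, not a verbatim excerpt of `PostScriptEvaluator.execute`):

```js
const DEG2RAD = Math.PI / 180;

// `sin` and `cos` take their operand in degrees.
function psSin(angleDeg) {
  return Math.sin(angleDeg * DEG2RAD);
}
function psCos(angleDeg) {
  return Math.cos(angleDeg * DEG2RAD);
}
// `atan` pops *two* operands (num, den) and returns degrees in [0, 360).
function psAtan(num, den) {
  let deg = Math.atan2(num, den) / DEG2RAD;
  if (deg < 0) {
    deg += 360;
  }
  return deg;
}
```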
This was deprecated in PR 15943, which has now been included in two official PDF.js releases.
Given that `PDFDataRangeTransport` is somewhat unlikely to be used outside of the *built-in* Firefox PDF Viewer, it doesn't seem necessary to wait longer before removing this.
Also, removes the specific error-message for GENERIC builds to not unnecessarily "advertise" using non-objects when calling the `getDocument`-function.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
We introduced the use of OffscreenCanvas in #14754 and this patch aims
to use them for all kinds of images.
It'll slightly improve performance (and maybe slightly decrease memory use).
An image can be rendered using some transfer maps, but because of
OffscreenCanvas we don't have access to the underlying pixel array, so the
transfer-map handling is re-implemented using the SVG filter feComponentTransfer.
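A hedged sketch of that idea, with illustrative names and DOM wiring:

```js
const SVG_NS = "http://www.w3.org/2000/svg";

// Express a 256-entry transfer map as an feComponentTransfer filter, so it
// can be applied via `ctx.filter` even though the pixels themselves are
// hidden inside an OffscreenCanvas.
function makeTransferFilter(id, map /* Uint8Array of 256 values */) {
  const svg = document.createElementNS(SVG_NS, "svg");
  const filter = document.createElementNS(SVG_NS, "filter");
  filter.setAttribute("id", id);
  const transfer = document.createElementNS(SVG_NS, "feComponentTransfer");
  // A type="table" transfer function interpolates between the listed values.
  const table = Array.from(map, v => v / 255).join(",");
  for (const name of ["feFuncR", "feFuncG", "feFuncB"]) {
    const func = document.createElementNS(SVG_NS, name);
    func.setAttribute("type", "table");
    func.setAttribute("tableValues", table);
    transfer.append(func);
  }
  filter.append(transfer);
  svg.append(filter);
  document.body.append(svg);
  return `url(#${id})`; // e.g. ctx.filter = makeTransferFilter("f1", map);
}
```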
Rather than repeatedly initializing a `canvasFactory`-instance for every page, move it to the document-level instead.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
Currently we repeat the `FeatureTest.isOffscreenCanvasSupported` checks all over the worker-thread code, and with upcoming changes this will become even "worse".
Hence this patch, which changes the *worker-thread* default value for the `isOffscreenCanvasSupported`-parameter to `false` and moves the feature-testing into the `BasePdfManager`-constructor.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
Currently some `getCtx` calls will have `isOffscreenCanvasSupported === undefined` set, meaning that `OffscreenCanvas` isn't being used as intended, since no `TextLayerRenderTask._isOffscreenCanvasSupported` property exists.
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence path elements starting with a hash are just skipped.
In order to help identify a link, we add a border around it with the LinkText color.
The backdrop colors are also inverted when the mouse pointer hovers over a link, which
should help identify the link where the pointer is.
With upcoming changes we'll potentially start to cache `ImageBitmap` data at the document-level, in addition to just at the page-level.
Hence we need to ensure that such data is actually released on clean-up, and rather than duplicating the existing *manual* handling this code is instead moved into the `PDFObjects.clear` method. (In my opinion, this is an overall improvement even without globally cached `ImageBitmap` data.)
*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it's correct and makes sense.
The `Buffer`-object is Node.js specific functionality[1], thus (obviously) not found in browsers. Please note that the PDF.js library has never officially supported/documented that binary data can be passed as a `Buffer`, and that *internally* in the `src/core`-code we only work with standard `Uint8Array`s.
This means that if, in Node.js environments, a `Buffer` is passed to the API we need to wrap it into a `Uint8Array`, which essentially means creating a copy of the data and thus increasing memory usage.
---
[1] Refer to https://nodejs.org/api/buffer.html#buffer
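A minimal sketch of the wrapping (a plain view over `input.buffer` would be unsafe, since small Buffers share one pooled ArrayBuffer in Node.js):

```js
function toUint8Array(input) {
  if (typeof Buffer !== "undefined" && Buffer.isBuffer(input)) {
    // The TypedArray constructor *copies* when given another TypedArray,
    // which is exactly the (memory-increasing) behaviour described above.
    return new Uint8Array(input);
  }
  return input;
}
```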
- Pass the `URL`-object directly to `getDocument`, since that's been supported since PR 13166.
- Remove support for the `disableRange`-option in the test-manifest, since it's completely unused. Please note that it was originally added in PR 2719, however there have never actually been any reference tests using it (not even from the start).
Given that the option is `false` by default everywhere (e.g. in the Firefox PDF Viewer) and that we have unit-tests for `disableRange = true`, it doesn't seem necessary to add new reference tests for it now.
Currently we duplicate the same code more than once in the `test/driver.js` file, which we can avoid by adding a new `AnnotationStorage` helper method instead.
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years.
Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well.
*Please note:* Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.
Given that the debugging hash-parameters will only be used when the `pdfBugEnabled` option is manually set[1], we can skip a *tiny* bit of asynchronicity for "regular" users.
---
[1] Note that it's enabled by default in the development viewer, i.e. in `gulp server` mode.
Currently this helper function only has two call-sites, and both of them only pass in `ArrayBuffer` data. Given how it's implemented there's a couple of code-paths that are completely unused (e.g. the "string" one), and in particular the intended fast-paths don't actually work.
This patch re-factors and simplifies the helper function, and it'll no longer accept anything except `ArrayBuffer` data (hence why it's also re-named).
Note that at the time when `arraysToBytes` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
When printing the pdf in #12233 in Acrobat, we can see that the combo for country
is empty: it's because the V entry doesn't have to be one of the options.
We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](a49d1d1615/src/core/worker_stream.js (L90-L98)) and [`PDFWorkerStreamRangeReader.read`](a49d1d1615/src/core/worker_stream.js (L122-L128)) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function.
Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
In the Mac case we don't want to care about the scaleFactor threshold,
because if it's too big another move could start and then subsequent
events aren't considered as wheel events.
It isn't really ideal, and at some point we'll need to find a way, at
least for the Firefox case, to get the real events instead of the fake
wheel ones.
While looking at a profile, I noticed in the Marker chart that there's an animation
for loading-icon.gif even when this icon isn't visible.
This patch doesn't completely remove it but just slightly postpones it.
This further extends the web-specific import maps introduced in PR 16009, to allow removing *most* of the build-time `require` statements from the viewer. The few remaining ones are fallbacks used for the COMPONENTS respectively the `legacy` GENERIC builds.
After the compatibility updates in PR 15968 it's no longer strictly necessary to build the `viewer.css` file in order for the *development viewer* to work in Chromium-based browsers.
*Please note:* Given that Chromium-based browsers still don't support the *unprefixed* `mask-image` property the icons won't look right, however the development viewer itself works.
Given that Firefox is the *primary* development target, and that running `gulp generic` locally will generate polyfilled CSS, it seems reasonable to make this simplification here.
Currently there's no toolbar in the GV-viewer, hence invoking the pageLabels functionality isn't meaningful and just leads to unnecessary parsing on both the main- and worker-threads. (And if a toolbar is added at some point, it's not clear to me if we'd want to support pageLabels in the GV-viewer anyway.)
Currently we have a couple of pre-processor checks, specifically for the GV-viewer, spread throughout the code. This works fine when *building* the viewer, however they're obviously ignored in development mode (i.e. `gulp server`).
This leads to a situation where the GV development viewer, i.e. http://localhost:8888/web/viewer-geckoview.html, behaves subtly different from its built version. This could easily lead to bugs, hence this patch introduces a development mode constant to hopefully improve things here.
Finally, in a follow-up to PR 15842, also ignores the `pageMode`-state since there's no sidebar available.
Currently there's no UI for this functionality in the GV-viewer, however we still call the API methods. This potentially leads to a bunch of worker-thread parsing, for PDF documents with these features, despite the result being completely unused.
Given that mobile devices are usually more resource-constrained than desktop/laptop computers, not to mention battery life, we can avoid doing work that'll just be ignored anyway.
The `DownloadManager.openOrDownloadData` method is written for the default-viewer specifically, assuming a viewer able to handle e.g. URL search/hash parameters. In the viewer components there's obviously no such functionality, and we should thus trigger downloading of PDF attachments directly instead.
Given that the GV-viewer isn't using most of the UI-related components of the default-viewer, we can avoid including them in the *built* viewer to save space.[1]
The least "invasive" way of implementing this, at least that I could come up with, is to leverage import maps with suitable stubs for the GV-viewer.
The one slightly annoying thing is that we now have larger import maps across multiple html-files, and you'll need to remember to update all of them when making future changes.
---
[1] With this patch, the built `viewer.js` size is 391 kB and `viewer-geckoview.js` is 285 kB.
By default we're using worker-thread fetching (in browsers) of this data nowadays, however in Node.js environments or if the user provides custom factories we still fallback to main-thread fetching.
Hence it makes sense, as far as I'm concerned, to move this initialization into the `getDocument` function to ensure that the factories can actually be initialized *before* attempting to load the document.
Also, this further reduces the amount of `getDocument` parameters that we need to pass into the `WorkerTransport` class.
Currently we're passing all available parameters to this function respectively class, despite that not actually being necessary.
By splitting the parameters we not only improve the structure, and basically "document" the code a little bit, but we can also simplify the `_fetchDocument` function considerably.
This is very old code, where we loop through the user-provided options and build an internal parameter object. To prevent errors we also need to ensure that the parameters are correct/valid, which is especially important for the ones that are sent to the worker-thread such that structured cloning won't fail.[1]
Over the years this has led to more and more code being added in `getDocument` to validate the user-provided options, and at this point *most* of them have at least basic validation. However the way that this is implemented feels slightly backwards, since we first build the internal parameter object and only *afterwards* validate those parameters.[2]
Hence this patch changes the `getDocument` function to instead check/validate the supported options upfront, and then *explicitly* build the internal parameter object with only the needed properties.
---
[1] Note the supported types at https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types
[2] The internal parameter object may also, because of the loop, end up with lots of unnecessary properties since anything that the user provides is being copied.
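The shape of the change, as a simplified sketch (the helper name, the validated options shown here, and the default chunk size are illustrative assumptions):

```js
function buildWorkerParams(src) {
  if (typeof src !== "object" || src === null) {
    throw new Error("Invalid parameter object passed to getDocument.");
  }
  // Validate upfront, then *explicitly* copy only known properties.
  const params = Object.create(null);
  if (src.url !== undefined) {
    params.url = src.url instanceof URL ? src.url.href : String(src.url);
  }
  params.rangeChunkSize =
    Number.isInteger(src.rangeChunkSize) && src.rangeChunkSize > 0
      ? src.rangeChunkSize
      : 65536;
  // Unknown user-provided properties are never copied, so structured
  // cloning cannot fail on e.g. functions or DOM nodes.
  return params;
}
```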
These functions invoke the `PDFViewer.currentPageNumber` setter, which already checks that a `pdfDocument` is currently active. Also, given that they're event handlers for the First/Last-page buttons (in the SecondaryToolbar) they can't be invoked before the viewer has been fully initialized.
The default value of the `--scale-select-width` CSS variable has been chosen such that it should be large enough for most locales. This means that in many locales we don't even update the CSS variable at all, and for those locales where we do the update happens *one time* early during the viewer initialization (i.e. before the PDF document has loaded).
*Please note:* Compared to other recent PRs, the effect of these changes ought to be really tiny and are mostly done to promote better coding patterns.
We should be able to let Jasmine simply compare directly against an actually empty Object, rather than using a manually implemented helper function for that.
A number of methods have their Promises cached, to avoid repeated worker round-trips, since they're expected to be called more than once from the default viewer. The way that the caching is currently implemented means that we need to remember to manually clear these Promises on document cleanup/destruction, and it'd be nice to avoid that.
With this patch the relevant Promises are now instead placed in just one `Map`, which is easy to clear, and a new helper method is also introduced to reduce duplication for *simple* `WorkerTransport` methods.
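Roughly, the pattern looks like this (a simplified sketch; the real class has many more methods):

```js
class WorkerTransport {
  #methodPromises = new Map();

  constructor(messageHandler) {
    this.messageHandler = messageHandler;
  }

  #cacheSimpleMethod(name, data = null) {
    // One worker round-trip per method name; later calls reuse the Promise.
    let promise = this.#methodPromises.get(name);
    if (!promise) {
      promise = this.messageHandler.sendWithPromise(name, data);
      this.#methodPromises.set(name, promise);
    }
    return promise;
  }

  getOutline() {
    return this.#cacheSimpleMethod("GetOutline");
  }

  destroy() {
    // A single clear() replaces the old per-method manual resets.
    this.#methodPromises.clear();
  }
}
```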
The reasons for making this change are:
- There's no UI available to toggle the cursor-tools in the GeckoView-specific viewer.
- The `HandTool`-implementation basically *simulates* touch scrolling, and is thus unlikely to be helpful/useful anyway.
- PR 15831 already changed the relevant call-sites to handle `PDFViewerApplication.pdfCursorTools` being undefined.
Some of the code in this method is *very* old, and we could thus modernize it a little bit by removing a couple of the loops used to build the `getDocument` argument.
These `@media` rules were most likely just copy-pasted from the regular viewer, however none of them are currently necessary since the GeckoView-specific viewer doesn't have any toolbars.
Note that the whole purpose of these CSS rules is to make the toolbar, of the regular viewer, responsive. If we in the future add toolbars for the GeckoView-specific viewer, these rules most likely wouldn't be usable as-is anyway.
This option was added specifically for third-party users, but has never been used in the PDF.js project itself. Furthermore there's no preference that can be used to enable it, and you need to provide the `removePageBorders` option when initializing a `PDFViewer`-instance.
This patch thus gets rid of a little bit more unused code in the Firefox PDF Viewer.
*Unfortunately I missed this during testing/reviewing of PR 15992.*
With the changes in PR 15992 we're now only adding the `loadingIcon`-class when rendering is actually `RUNNING`, in order to improve overall performance.
However when resetting the page, i.e. the `INITIAL` state, we also need to remove the `loadingIcon` completely. Without this patch if you scroll through a document where the pages don't load instantaneously, see e.g. issue 2504, we'll leave the `loadingIcon`-class attached to pages that have had their rendering cancelled *and* also been evicted from the `PDFPageViewBuffer`-instance.
This way we don't have a lot of useless divs, and we let the CSS engine handle the
creation/destruction of the :after pseudo-element.
It'll slightly improve performance when zooming.
The only parameter that we actually need here is the `PDFDataRangeTransport`-instance, since the others are not necessary.
- The `url` parameter, as passed to the `getDocument` function in the API, is simply being ignored; see 2d87a2eb1c/src/display/api.js (L447-L458)
- The `length` parameter, as passed to the `getDocument` function in the API, is always being overwritten; see 2d87a2eb1c/src/display/api.js (L519-L525)
Until PR 12563 is deemed safe to land, I'd still like to be able to use worker-modules in the viewer during local development.
Hence this patch which *temporarily* adds a new `workerModules` hash-parameter, only available in non-PRODUCTION mode, that allows using worker-modules in the development viewer.
To enable this functionality, simply use http://localhost:8888/web/viewer.html#workerModules=true
The initial CMap support was added in PR 4259 using the "raw" Adobe files, however they were quickly deemed to be unnecessarily large. As a result PR 4470 introduced the more compact "binary" CMap format, with both of those PRs being included in the very same release (version `0.8.1334`).
Please note that we've thus never shipped anything *except* the "binary" CMap files with the PDF library, and furthermore note that we've not even once updated the CMap files since they were originally added almost nine years ago.
Requiring users to remember that `cMapPacked = true` is necessary, in addition to setting the `cMapUrl` parameter, in order for CMap loading to work feels like a less than ideal API.
Hence this patch, which suggests that we simply let `cMapPacked` default to `true` now.
If this method was added today, I really can't imagine that we'd support anything *except* objects. Unfortunately we cannot just remove this now, since the code has existed since "forever", however we can deprecate this and limit it to only the GENERIC build.
Furthermore, we can avoid a redundant `PDFViewerApplication.setTitleUsingUrl` call in the Firefox PDF Viewer since the title has already been set previously in that case.
Until just recently the only existing `Path2D` polyfill didn't have support for Node.js and/or the `node-canvas` package. Given that this was just fixed, in the latest version, we can now finally remove our inline-checks at the relevant call-sites; please also see https://github.com/nilzona/path2d-polyfill#usage-with-node-canvas
In PR #15757, a value is automatically converted into a number when possible,
but the case of numbers like "000123" has been overlooked and their format must
be preserved.
When a script is doing something like "foo.value + bar.value" and the values are
numbers, then "foo.value" must return a number, but the displayed value must be what
the user entered or what a script set, so this patch just adds a field
_orginalValue in order to track the value as it was defined.
Some people are used to using a comma as decimal separator, hence it must be considered
when a value is parsed into a number.
This patch fixes a regression introduced by #15757.
There's really no need for these "complicated" default value assignments, since `GlobalWorkerOptions` is a local variable at this point, and this is rather a case of too much copy-and-paste.
Note that years ago, when all options were set using a global `PDFJS` object, it's possible that options had been set (from the outside) *before* the object had been properly initialized; see e.g. a89071bdef/src/display/global.js
In general it's always recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
Unfortunately we cannot really remove this, since that code has existed since "forever", however we can limit it to only the GENERIC build to avoid completely unnecessary checks in e.g. the Firefox PDF Viewer.
Finally, note that the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
By being less specific about which *exact* JavaScript features are required for the default vs `legacy` build, we don't need to worry about keeping multiple README files up-to-date.
These README files will now refer back to the FAQ for current browser/environment support information.
This function is only used in PresentationMode these days, but we can still improve it a little bit:
- Use the existing web-platform `deltaMode` constants, rather than defining our own constants for those values.
- Access the `deltaMode` first, before the `delta{X, Y}` properties, to avoid being affected by bug 1392460 (similar to the default viewer). Both points are sketched below.
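Something along these lines (a sketch; the divisors follow the existing viewer code, where one "tick" corresponds to roughly 30 lines of 30 pixels each):

```js
function normalizeWheelEventDelta(evt) {
  const deltaMode = evt.deltaMode; // read *before* deltaX/deltaY (bug 1392460)
  let delta = Math.hypot(evt.deltaX, evt.deltaY);
  const angle = Math.atan2(evt.deltaY, evt.deltaX);
  if (-0.25 * Math.PI < angle && angle < 0.75 * Math.PI) {
    delta = -delta; // scrolling up or to the right
  }
  if (deltaMode === WheelEvent.DOM_DELTA_PIXEL) {
    delta /= 900; // ~30 lines of ~30px each
  } else if (deltaMode === WheelEvent.DOM_DELTA_LINE) {
    delta /= 30;
  }
  // WheelEvent.DOM_DELTA_PAGE: one "page" per tick, left as-is.
  return delta;
}
```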
- Use a `URL`-instance directly, since it's by definition an absolute URL.
- Actually limit the "raw" url-string handling to Node.js environments, as intended.
- Skip the warning, since we're already throwing an Error if the `url`-parameter is invalid.
It seems nicer overall, since we're exporting the `ProgressBar` in the viewer-components, to move this functionality into the `ProgressBar`-class itself rather than handling it "manually" in the default-viewer.
*Please note:* I cannot reproduce the problem reported in bug 1811668, regarding the context menu, and in any case it's not clear that that part is even a PDF Viewer bug.
Looking at bug 1811668 I couldn't help but noticing that the textLayer isn't correct, and it's unfortunately once again a problem with the `adjustType1ToUnicode` function. That's intended to help improve text-selection for fonts without a /ToUnicode-entry, and in many cases it does help (the original PR fixed lots of issues) however it's also caused some problems.
In order to improve text-selection in bug 1811668, we'll now properly ignore fonts that have a predefined *named* encoding specified since that's really the intention with PR 14050.
At the beginning of a search, an update can be triggered with 0 of 0
found matches.
In the GeckoView context, we can't update the finder whenever we want but only
when it has been required.
The JBIG2 images in this PDF document are corrupt enough that even Adobe Reader warns about it when opening the file.
*Please note:* I don't really know the JBIG2 image format at all, however from a very brief look at the specification it seems that integers should be 32-bit.
In general it's recommended to pass a *parameter object* when calling the `getDocument`-function in the API, since that's the only way to provide additional options, and the fact that it also accepts a URL or TypedArray directly is now mostly for backwards compatibility reasons.
However, the `getDocument`-function also accepts a direct `PDFDataRangeTransport`-instance which just seems unnecessary.
*Please note:* The `PDFDataRangeTransport`-implementation was added specifically for the *built-in* Firefox PDF Viewer, however it's most likely not commonly used by any third-party (given that it requires manual PDF-data loading).
Furthermore, the default-viewer always provides a *parameter object* when calling the `getDocument`-function and it's thus completely unaffected by these changes.
Rather than adding `@media (forced-colors: active) { ... }`-blocks throughout the CSS code, we should utilize CSS variables instead as in our other CSS files.
The relevant TrueType font is missing both /ToUnicode *and* /Encoding entries, either of which would have prevented the (current) broken textLayer rendering.
My first idea was that we could use the `post` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6post.html, to get the actual glyphNames and amend the fallback ToUnicode-map that way. Unfortunately that didn't work, since the `post` table only contained ".notdef" and "" (i.e. empty string) entries.
Instead we try to use the `name` table in the TrueType font, see https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6name.html, to determine if the platform is Windows and thus fallback to generate a ToUnicode-map from the `WinAnsiEncoding`.
When a CSS variable is updated on a node, all the children under this
node are updated.
In order to avoid updating the whole UI when a page is rescaled, this
patch moves the --scale-factor from the :root to the viewer container.
Note how all over the `src/core/annotation.js`-code we're assuming that if an `appearance`-entry exists it's also a Stream. However, we're not actually checking that thoroughly enough which causes issues in some badly generated PDF documents.
After the changes in PR 15812 we'll now *intermittently* display completely black canvases during zooming. To reproduce this, try switching to wrapped-scrolling and zoom in/out very quickly using either the mouse-wheel or pinching.
This patch removes the recently introduced `transferPdfData` API-option, and simply enables transferring of TypedArray data *by default* instead of copying it. This will help reduce main-thread memory usage, however it will take ownership of the TypedArrays. Currently this only applies to the following cases:
- TypedArrays passed to the `getDocument`-function in the API, in order to open PDF documents from binary data.
- TypedArrays passed to a `PDFDataRangeTransport`-instance, used to support custom PDF document fetching/loading (see e.g. the Firefox PDF Viewer).
*PLEASE NOTE:* To avoid being affected by this, please simply *copy* any TypedArray data before passing it to either of the functions/methods mentioned above.
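Usage sketch (`loadPdfBytes` is a hypothetical source of a Uint8Array):

```js
const data = await loadPdfBytes();
// Pass a copy if `data` is still needed afterwards: the API now takes
// ownership of the bytes it receives, and the original would be detached.
const loadingTask = pdfjsLib.getDocument({ data: data.slice() });
```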
Now that we transfer TypedArray data that we previously only copied, we need to be more careful with input validation. Given how the `{IPDFStreamReader, IPDFStreamRangeReader}.read` methods will always return ArrayBuffer data, which is then transferred to the worker-thread[1], the actual TypedArray data passed to the API thus need to have the same exact size as its underlying ArrayBuffer to prevent issues.
Hence we'll check for this and only allow transferring of *safe* TypedArray data, and fallback to simply copying the data just as before. This obviously shouldn't be an issue in the Firefox PDF Viewer, but for the general PDF.js library we need to be more careful here.
---
[1] See e09ad99973/src/display/api.js (L2492-L2506) respectively e09ad99973/src/display/api.js (L2578-L2590)
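The "safe to transfer" test could be sketched as follows (illustrative, not the exact validation in the patch):

```js
// Only a TypedArray that covers its *entire* underlying ArrayBuffer can be
// transferred without also shipping unrelated bytes, or detaching a buffer
// that other views still rely on.
function isTransferable(data) {
  return (
    data instanceof Uint8Array &&
    data.byteOffset === 0 &&
    data.byteLength === data.buffer.byteLength
  );
}
```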
Note how in the API we're transferring the PDF data that's fetched over the network[1]:
- f28bf23a31/src/display/api.js (L2467-L2480)
- f28bf23a31/src/display/api.js (L2553-L2564)
To support that functionality we have the `PDFDataTransportStream`, `PDFFetchStream`, `PDFNetworkStream`, and `PDFNodeStream` implementations. Here these stream-implementations vary slightly in how they handle `ArrayBuffer`s internally, w.r.t. transferring or copying the data:
- In `PDFDataTransportStream` we optionally, after PR 15908, allow transferring of the PDF data as provided externally (used e.g. in the Firefox PDF Viewer).
- In `PDFFetchStream` we're currently always copying the PDF data returned by the Fetch API, which seems unnecessary. As discussed in PR 15908, it'd seem very weird if this sort of browser API didn't allow transferring of the returned data.
- In `PDFNetworkStream` we're already, since many years, transferring the PDF data returned by the `XMLHttpRequest` functionality. Note how the `getArrayBuffer` helper function simply returns an `ArrayBuffer` response as-is.
- In `PDFNodeStream` we're currently copying the PDF data, however this is unfortunately necessary since Node.js returns data as a `Buffer` object[2].
Given that the `PDFNetworkStream` has been, indirectly, supporting transferring of PDF data for years it would seem really strange if this didn't also apply to the `PDFFetchStream`-implementation.
Hence this patch simply enables transferring of PDF data, when accessed using the Fetch API, unconditionally to help reduce main-thread memory usage, since the `PDFFetchStream`-implementation is used *by default* in browsers (for the GENERIC build).
---
[1] As opposed to PDF data being provided as e.g. a TypedArray when calling `getDocument` in the API.
[2] This is a "special" Node.js object, see https://nodejs.org/api/buffer.html#buffer, which doesn't exist in browsers.
- The scale factor is rounded to only scale by integer percentages, hence the unused
  ticks are accumulated (like we already do for zooming with the mouse wheel).
- Use the same approach for pinch-to-zoom on a touchscreen: this led to slightly
  refactoring the code, because it happened to ignore a not-so-small scale change,
  which led to not-so-smooth zooming.
Also, removes the `initialData`-parameter JSDocs for the `getDocument`-function given that this parameter has been completely unused since PR 8982 (over five years ago). Note that the `initialData`-parameter is, and always was, intended to be provided when initializing a `PDFDataRangeTransport`-instance.
Version 16 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version. For more information on the
Node.js release schedule please refer to
https://github.com/nodejs/release#release-schedule.
After the changes in PR 15850, the `background-color` of the sidebar is now unnecessarily dark in the light-theme. Hence, we can simply remove this CSS rule to improve things overall (and these changes don't affect the dark-theme much at all).
This is even an overall consistency improvement, given the existing `--sidebar-narrow-bg-color` values.
Given that this is internal functionality, not exposed in the official API, it's not entirely clear (at least to me) why we can't just initialize this directly in `src/display/api.js` instead.
When testing both the development viewer and all the ways in which we run tests, everything still appears to work just fine with this patch.
*Please note:* The reduced test-case is *not* a perfect reproduction of the original PDF document, since this one fails to open in e.g. Adobe Reader, but I do believe that it captures the most important points here.
For corrupt *and* encrypted PDF documents, it's possible that only some trailer dictionaries actually contain an /Encrypt-entry. Previously we could easily miss that, since we generally pick the first not obviously corrupt trailer dictionary, and the solution implemented here is to simply pre-parse all trailer dictionaries to see if there's any /Encrypt-entries.
In most cases, showing the loading icon is useless, because it's only
shown for a very short time and consequently doesn't bring any useful
information to the user.
After a delay (400ms), the icon is shown in order to inform the user
that the viewer isn't stuck but is doing something.
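The delayed toggle could be sketched as follows (the 400ms value is from the text above; the class name is an assumption):

```js
function showLoadingIconDelayed(element, delay = 400) {
  const timeout = setTimeout(() => {
    element.classList.add("loadingIcon"); // only shown for slow operations
  }, delay);
  return () => clearTimeout(timeout); // invoke if loading finishes early
}
```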
In GeckoView, on an event, a callback must be executed with the result of an action,
but the callback can only be used once.
So for each FindInPage event, we must trigger only one matches-count update.
This fixes a warning reported by CodeQL, and should also make general sense given that we parse the font-data to determine the *actual* `type`/`subtype` rather than trusting the PDF document.
This option/preference was disabled in GENERIC builds, see PR 15812, to avoid landing it *just before* a new release. Hence it should be fine to enable this now.
This was deprecated in PR 15758 but it's unfortunately quite difficult to tell if third-party users are depending on this, e.g. to implement custom error reporting, and if so to what extent.
However, thanks to the pre-processor we can limit *most* of this code to GENERIC builds, which still seems like a worthwhile change.
These changes reduce the bundle size of the Firefox PDF Viewer by 3.8 kB in total.
This was deprecated in PR 15758 and given that it's quite unlikely that any third-party users are relying on this functionality, since it was only ever added to support telemetry reporting in the Firefox PDF Viewer, it should hopefully be fine to remove this fairly quickly.
These changes reduce the bundle size of the Firefox PDF Viewer by 4.5 kB in total.
Given that the Fetch API only supports the http/https protocols, worker-thread fetching of CMaps and Standard-fonts may thus fail in certain cases. To improve the default behaviour we'll now also check that the `cMapUrl` and `standardFontDataUrl` options are appropriate, except in Firefox where this should always work.
This tweaks a few names that originated in PR 15812, to improve overall consistency:
- Use the `drawingDelay` parameter-name in all methods that accept a delay.
- Use the `postponeDrawing` variable-name in all relevant methods.
With upcoming background changes elsewhere in the viewer, this should be helpful in separating the styling of the loadingBar. These changes also mean that both the "regular" and the "indeterminate" loadingBar now use the same `background-color` value.
Also, shortens the related CSS variables a little bit since that can't hurt.
*This makes the same kind of changes as in the previous patch, but for the pageNumber-loadingIcon in the main toolbar.*
To display the pageNumber-loadingIcon when rendering starts, if the page is the most visible one, we'll utilize the existing "pagerender" event.
To toggle the pageNumber-loadingIcon as the user moves through the document we'll now instead utilize the "pagechanging" event, which should actually be slightly more efficient overall[1]. Note how we'd, in the old code, only consider the most visible page anyway when toggling the pageNumber-loadingIcon.
---
[1] Even in a PDF document as relatively short/simple as `tracemonkey.pdf`, scrolling through the entire document can easily trigger the "updateviewarea" event more than a thousand times.
Given that we only render one page at a time, this will lead to only *one* page-loadingIcon being displayed at a time even if multiple pages are visible in the viewer. However, this will make it clearer which page is the currently parsing/rendering one.
To simplify toggling of the page-loadingIcon visibility, the existing `PDFPageView.renderingState` is changed into a getter/setter-pair with the latter also handling the page-loadingIcon state.
An additional benefit of these changes is that the `PDFViewer` no longer needs to handle toggling of page-loadingIcon visibility during rendering, since there can only ever be *one* page rendering.
Finally, this may also simplify future changes w.r.t. page-loadingIcon visibility toggling (using e.g. a show-timeout).
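Simplified, the getter/setter pair looks something like this (a sketch; the enum values match web/ui_utils.js but the rest is illustrative):

```js
const RenderingStates = { INITIAL: 0, RUNNING: 1, PAUSED: 2, FINISHED: 3 };

class PDFPageView {
  #renderingState = RenderingStates.INITIAL;

  constructor(div) {
    this.div = div;
  }

  get renderingState() {
    return this.#renderingState;
  }

  set renderingState(state) {
    this.#renderingState = state;
    // The loadingIcon is only visible while this page is actually rendering.
    this.div.classList.toggle(
      "loadingIcon",
      state === RenderingStates.RUNNING
    );
  }
}
```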
The rotation-caching added in PR 15812 completely breaks initialization of PDF documents with varying page sizes, causing all pages to wrongly get the same size; see e.g. `sizes.pdf` from the test-suite.
To fix that without having to e.g. add a new parameter, which feels error prone, this patch changes the `PDFPageView.#setDimensions` method to completely ignore the rotation-caching until the `setPdfPage`-method has been called.
Note how, in the scripting initialization in the viewer, we only ever invoke `PDFPageProxy.getJSActions` *once* per page in order to improve overall performance; see a575aa13b9/web/pdf_scripting_manager.js (L372-L375)
Hence it really shouldn't be necessary to cache its result in the API, especially when that is done *manually* rather than using something like `shadow`.
When we're destroying a `PDFPageProxy`-instance, during full document destruction, we'll force-abort any worker-thread parsing of operatorLists. Hence we should make sure that any pending cancel timeout is always aborted, since a later `PDFPageProxy._abortOperatorList` call should always "replace" a previous one.
*Please note:* Technically this was always wrong, but with the changes in PR 15825 it became *ever so slightly* easier to trigger this thanks to the potentially longer timeout.
Right now, the visible pages are redrawn for each scale change.
Consequently, zooming with the mouse wheel or by pinching can be pretty janky
(even on a desktop machine with a HiDPI screen).
So the main idea in this patch is to draw the visible pages only once zooming
is finished.
After the changes in PR 15829 the `loadingIconDiv` is no longer always visible when it should be, specifically in the case where we cancel and re-render a partially parsed/rendered page.
To reproduce this, try opening https://github.com/mozilla/pdf.js/files/1522715/wuppertal_2012.pdf in the viewer and change the zoom level while rendering is ongoing. In this case the `loadingIconDiv` doesn't actually become visible, despite being present in the DOM, since it's no longer at the end of the page-div.
I don't know to what extent this renders PR 15829 "pointless", however we're not repeatedly re-creating and re-inserting the `loadingIconDiv` but rather just *move* the existing element in the DOM.
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.
Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility
Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
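The `lastIndex` technique, sketched with an illustrative pattern (not the actual regular expression used in the patch):

```js
// One global regexp over the raw (Latin-1) string; scanning resumes from a
// given position via lastIndex, so no substrings need to be created.
const COMMANDS = /\b(?:endobj|trailer|startxref|\d+\s+\d+\s+obj)\b/g;

function findNextCommand(str, startPos) {
  COMMANDS.lastIndex = startPos;
  const match = COMMANDS.exec(str);
  return match ? { command: match[0], index: match.index } : null;
}
```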
Given that the `ProgressBar`-constructor was updated, we need to update the "mobile-viewer" example as well; this is yet another thing I missed during review.
We'll no longer import the `SimpleLinkService` dependency unconditionally in the file, since it's only used in COMPONENTS-builds.
Furthermore, for the COMPONENTS-builds, we'll create a `SimpleLinkService`-instance only for those layers that actually need it.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `xfaLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `textLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `textHighlighterFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `structTreeLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `annotationLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Please note that this functionality has never really mattered for the Firefox PDF Viewer, the GENERIC viewer, or even the "simpleviewer"/"singlepageviewer" component-examples. Hence, in practice this means that only the "pageviewer" component-example[1] has ever really utilized this.
Using factories to initialize various layers in the viewer, rather than simply invoking the relevant code directly, seems (at least to me) like a somewhat roundabout way of doing things.
Not only does this lead to more code, both to write and maintain, but since many of the layers have common parameters (e.g. an `AnnotationStorage`-instance) there's also some duplication.
Hence this patch, which removes the `annotationEditorLayerFactory` and instead uses a lookup-function in the `PDFPageView`-class to access the external viewer-properties as necessary.
Note that this should even be an improvement for the "pageviewer" component-example, since most layers will now work by default rather than require manual configuration.
---
[1] In practice we generally suggest using the "simpleviewer", or "singlepageviewer", since they do *most* things out-of-the-box and given that a lot of functionality really requires *a viewer*, and not just a single page, in order to work.
Currently we'll only initialize and render the `annotationEditorLayer` once the regular `annotationLayer` has been rendered.
While it obviously makes sense to render the `annotationEditorLayer` *last*, the way that the code is currently written means that if a third-party user disables the `annotationLayer` then the editing-functionality indirectly becomes disabled as well.
Given that this seems like a somewhat arbitrary limitation, this patch simply decouples these two layers while still keeping the rendering order consistent.
By moving this code the "pageviewer"-component example will become slightly more usable on its own, it may simplify a future addition of XFA Foreground document support, and finally also serves as preparation for the following patches.
The container position and dimensions should be almost constant, hence
it's pretty useless to query them on each rescale.
Finally, it avoids triggering some reflows.
First of all, given the screen sizes of most mobile phones, using Spread modes is unlikely to be useful.
Secondly, and more importantly, since there's (currently) no UI available for the user to override a PDF document-specified Spread mode this would result in a bad UX otherwise.
Also, removes an outdated comment from the `apiPageLayoutToViewerModes` helper function.
Previously we'd abort all parsing if an Error was encountered, despite the fact that multiple `startXRefQueue`-entries may be available and that continued parsing could thus eventually be able to find usable data.
Note that in the referenced PDF document the `startxref`-operator, at the end of the file, points to a position in the middle of an arbitrary `stream` which is why things break.
Depending on e.g. the `textLayerMode` option it might not actually be necessary to always initialize this eagerly.
*Please note:* Unfortunately we cannot `shadow` a private field, hence why this is only made semi-"private".
This is done to support upcoming viewer-changes, and in order to prevent third-party users from outright breaking things we'll simply ignore too large values.
It's a follow-up of #14950: some format actions are run when the document is open,
but we must be sure we've got everything ready for that, hence we have to run some
named actions before running the global format.
While playing with the form, I discovered that the blur event wasn't triggered when
JS called `setFocus` (because in such a case the mouse was never down). So I removed
the mouseState thing and just use the correct commitKey when blur is triggered by the
TAB key.
In order to move the annotations in the DOM to have something which corresponds
to the visual order, we need to have their dimensions/positions which means that
the parent must have some dimensions.
While reviewing recent patches, I couldn't help noticing that we now have a lot of call-sites that manually access the `PageViewport.viewBox`-property.
Rather than repeating that verbatim all over the code-base, this patch adds a lazily computed and cached getter for this data instead.
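For reference, such a getter can be implemented with a `shadow`-style helper (PDF.js has one in `src/shared/util.js`; this is a simplified sketch of the idea):

```js
function shadow(obj, prop, value) {
  // Replace the (prototype) getter with a plain data property, so the
  // computation only ever runs once per instance.
  Object.defineProperty(obj, prop, {
    value,
    enumerable: true,
    configurable: true,
    writable: false,
  });
  return value;
}

class PageViewport {
  constructor({ viewBox }) {
    this.viewBox = viewBox; // [x1, y1, x2, y2]
  }

  get rawDims() {
    const { viewBox } = this;
    return shadow(this, "rawDims", {
      pageWidth: viewBox[2] - viewBox[0],
      pageHeight: viewBox[3] - viewBox[1],
      pageX: viewBox[0],
      pageY: viewBox[1],
    });
  }
}
```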
This was essentially done only to compensate for the viewer calling `PDFPageProxy.getAnnotations` unconditionally on every annotationLayer-rendering invocation. With the previous patch that's no longer happening, and this API-caching should thus no longer be necessary.
For pages without any annotations, applies e.g. to the `tracemonkey.pdf` document, we'll repeatedly try to re-create the `annotationLayer` on every zoom and rotation operation.
The reason that this happens is because we don't insert the `annotationLayer`-div into the DOM unless there's annotations present on the page, which thus means that we miss the existing `annotationLayer`-caching present in the `PDFPageView` implementation.
This is a very old issue, and the easiest solution is to simply always insert an *empty* (and hidden) `annotationLayer`-div such that the existing code/caching starts working for the "no annotations" case as well.
Note that this is consistent with other layers, since e.g. the `textLayer` and/or `annotationEditorLayer` may be empty. Given that only a limited, by default ten, number of pages are ever active at once, the additional DOM-elements shouldn't affect things negatively here.
This is consistent with the `render` methods of the other layers, and reduces overall indentation in the method.
Furthermore, don't "swallow" errors since the `PDFPageView._renderXfaLayer` method is already able to deal with that.
It doesn't seem necessary to have a *separate* `destroy` method given that the `cancel` method always invokes it unconditionally.
In the `PDFPageView.reset` method we currently attempt to call `destroy` directly, however that'll never actually happen since either:
- We're keeping the annotationEditorLayer, in which case we're just hiding the layer and nothing more (and the relevant branch is never entered).
- We're removing the annotationEditorLayer, in which case the `PDFPageView.cancelRendering` method has already cancelled *and* nulled it (and there's thus nothing left to `destroy` here).
*Please note:* Hopefully I'm not overlooking something obvious here, since both reading through the code *and* also adding `console.log(this.annotationEditorLayer);` [before this line](9d4aadbf7a/web/pdf_page_view.js (L438)) suggests that it's indeed unnecessary.
In PR 14877 I forgot to update the horizontal padding, used when computing the scale of the pages, for the case where SpreadModes and PresentationMode are being used together.
Steps to reproduce:
1. Open the viewer with the default `tracemonkey.pdf` document.
2. Enable any SpreadMode.
3. Rotate the document *once*, either clockwise or counterclockwise.
4. Enter PresentationMode.
5. Try switching page, e.g. by clicking on the document.
Expected result:
The visible pages change as you click.
Actual result:
The visible pages are "stuck" in the current view.
The `PDFPageProxy._pageIndex` property is a "private" one that shouldn't be accessed, since it could theoretically break tomorrow if we re-factor the relevant API code.
Also, try to clean-up and improve consistency in a couple of JSDoc comments.
This change was made in PR 5552, however I cannot tell why we needed to disable searching in PresentationMode. Furthermore, with the changes in PR 13908 which effectively moved where this code is invoked, searching has now (accidentally) been working in PresentationMode in e.g. the Firefox PDF Viewer for well over a year.
So, let's just enable searching unconditionally in PresentationMode to simplify the code.
An annotation editor layer can be destroyed when it's invisible, hence some
annotations can have a null parent, but when printing/saving, or when changing the
font size, color, ... of all added annotations (when selected with ctrl+a), we
still need to have some parent properties, especially the page dimensions, global
scale factor and global rotation angle.
This patch aims to remove all the references to the parent in the editor instances,
except in some cases where an editor should obviously have one.
It fixes #15780.
The main issue is due to the fact that an editor's parent can be null when
we want to serialize it, and that leads to an exception which breaks the whole
saving/printing process.
So this incomplete patch fixes only the saving/printing issue, but not the
underlying problem (i.e. having a null parent), and doesn't bring that much
complexity, so it should help to uplift it to the next Firefox release.
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.
Also, improves a few JSDoc comments and `typedef`-imports.
Compared to the recent PR 15722 for the `textLayer` this one should be a (comparatively) much smaller win overall, since most documents don't have any structTree-data and the required parsing should be cheaper. However, it seems to me that it cannot hurt to improve this nonetheless.
Note that by moving the `structTreeLayer` initialization we remove the need for the "textlayerrendered" event listener, which thus simplifies the code a little bit.
Also, removes the API-caching of the structTree-data since this was basically done to offset the lack of caching in the viewer.
*Please note:* I don't really expect that this will be an observable change, since virtually all PDF documents already order e.g. /MediaBox and /CropBox entries correctly.
By normalizing boundingBoxes already on the worker-thread, we can be sure that even a corrupt document won't cause issues.
Note how we're passing the `view`-getter to the `PartialEvaluator.getTextContent` method, in order to detect textContent which is outside of the page, hence it makes sense to ensure that it's formatted as expected.
Furthermore, by normalizing this once on the worker-thread we should no longer have to worry about a possibly negative width/height in the `PageViewport` constructor.
Finally, the patch also simplifies the `view`-getter a little bit.
The idea is just to reuse what we got on the first draw.
Now, we only update the scaleX of the different spans, and the other values
are dependent on --scale-factor.
Move some properties into the CSS in order to avoid any updates in JS.
The deprecation is included in the current release, i.e. version `3.1.81`, and given the edge-case nature of this option I really don't think that we need to keep it deprecated for multiple releases.
This patch has been successfully tested in a local, artifact, Firefox build.
*Please note:* The only thing that'll no longer work for PDF documents opened using "data:"-URLs is middle-clicking on internal/outline links, in order to open the destination in a new tab. This is however an extremely small loss of functionality, and as can be seen in the bug the alternative (i.e. doing nothing) is surely much worse.
Currently both the `AnnotationElement` and `KeyboardManager` classes contain *identical* `platform` getters, which seems like unnecessary duplication.
With the pre-processor we can also limit the feature-testing to only GENERIC builds, since `navigator` should always be available in browsers.
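A rough sketch of what the shared getter could look like (the class and property names are assumed here, not taken from the actual patch):

```js
class FeatureTest {
  // Shared replacement for the two duplicated getters; in non-GENERIC
  // builds the `typeof navigator` feature-test could be dropped by the
  // pre-processor, since `navigator` is always available in browsers.
  static get platform() {
    if (
      typeof navigator !== "undefined" &&
      typeof navigator.platform === "string"
    ) {
      return { isMac: navigator.platform.includes("Mac") };
    }
    return { isMac: false };
  }
}
```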
Add a deprecation notification for PDFDocumentLoadingTask.onUnsupportedFeature and PDFDocumentProxy.stats,
which are likely useless.
The unsupported-feature reporting was initially added in #4048, in order to be able to display a
warning bar and to gather some numbers on how features were used.
That data is no longer used in Firefox.
This is very old code, which is unused (by default) in browsers nowadays since the Font Loading API will always be preferred.
For Node.js environments we use the same constant as elsewhere throughout the code-base, and we can also simplify the Firefox-specific check given that the lowest supported version is `102` (as of this writing).
Finally the old TODO is removed, since the general availability of the Font Loading API has made it redundant.
The use of `Array.prototype.reduce()` is, in my opinion, hurting overall readability since it's not particularly easy to look at the relevant code and immediately understand what's going on here. Furthermore this code leads to strictly speaking unnecessary allocations and parsing, since we could just track the min/max values directly in the relevant loop instead.
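A hypothetical illustration of the argument, with example data:

```js
const values = [3, 1, 4, 1, 5, 9, 2, 6];

// Before: allocates an intermediate array on every iteration and is
// harder to read at a glance.
const [min1, max1] = values.reduce(
  ([min, max], v) => [Math.min(min, v), Math.max(max, v)],
  [Infinity, -Infinity]
);

// After: track the min/max values directly in the loop.
let min2 = Infinity,
  max2 = -Infinity;
for (const v of values) {
  if (v < min2) min2 = v;
  if (v > max2) max2 = v;
}
```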
This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant.
Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless).
While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting).
*Please note:* While unrelated here, also removes a now unused property that I forgot in PR 15259.
---
[1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.
This can't be a particularly common feature, since we've supported Optional Content for over two years and this is the very first TilingPattern-case we've seen.
The reason for the issue is that we use the generic `getFilenameFromUrl` helper function, which was originally intended for regular URLs.
For the filenames we're dealing with in FileAttachments, we really only want to strip the path when one exists[1].
---
[1] See [bug 1230933](https://bugzilla.mozilla.org/show_bug.cgi?id=1230933) for an example of such a case.
With the changes made in PR 14564 this *should* no longer be necessary now, however we still need to keep the `scrollMatches` parameter to handle textLayers with markedContent correctly when searching.
Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the `pdf.js` and `pdf.worker.js` files.
Given that these functions are virtually identical, with the latter only adding a BOM, we can combine the two. Furthermore, since both functions were only used on the worker-thread, there's no reason to duplicate this functionality in both of the `pdf.js` and `pdf.worker.js` files.
Having just played around with adding FreeText-annotations and then trying to print, there were `FreeTextAnnotation: OffscreenCanvas is not supported, annotation may not render correctly.` messages printed in the console.
The reason for this is that `FreeTextAnnotation` inherits from `MarkupAnnotation`, however only `WidgetAnnotation` actually defines the `_isOffscreenCanvasSupported` property.
Adding some logging with `console.{time, timeEnd}` around all the constant definitions at the top of the `web/pdf_find_controller.js` file, I noticed that computing `DIACRITICS_EXCEPTION_STR` took close to half the total time.
My first idea was just to try and make it slightly more efficient, by reducing the amount of iterations and intermediate allocations. However, with this constant only being used during "match diacritics" searches it thus seemed like a good candidate for lazy initialization.
*Please note:* Given that this is a micro optimization, I fully understand if the patch is rejected.
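A sketch of the lazy-initialization pattern (the constant's contents are shortened here, and the helper name is assumed):

```js
const DIACRITICS_EXCEPTION = new Set([0x3099, 0x309a]); // shortened example

let DIACRITICS_EXCEPTION_STR = null;
function getDiacriticsExceptionStr() {
  // Only computed on the first "match diacritics" search, instead of
  // eagerly at module-load time.
  if (DIACRITICS_EXCEPTION_STR === null) {
    DIACRITICS_EXCEPTION_STR = [...DIACRITICS_EXCEPTION]
      .map(x => String.fromCodePoint(x))
      .join("");
  }
  return DIACRITICS_EXCEPTION_STR;
}
```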
Given that this PDF document is an interesting test-case for performance reasons, w.r.t. inline image caching, it probably can't hurt to add it to the test-suite to make it more readily available.
Considering the contents of that PDF document I'm not sure if we can include it directly in the repository, hence why a *linked* test-case was chosen here.
Given that this `assert` is only intended to catch any implementation bugs in our code, and not actually to validate the PDF data directly[1], we can avoid making this function call unconditionally.
---
[1] In those cases, for example a `FormatError` should have been thrown instead.
With modern EcmaScript features, we can define these fields directly instead. Please note that for backwards compatibility purposes they are still public as before, however this functionality is *disabled* by default (see the `pdfBug` API option).
Also, we can (slightly) simplify the two loops used in the `toString` method.
These fields were never intended to be public, since modifying them manually would lead to inconsistent state, and with modern EcmaScript features we can now enforce this.
Also, this patch removes a couple of JSDoc comments that we generally don't use.
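Generically, the enforcement being referred to is the `#`-private field syntax; a minimal sketch (names hypothetical):

```js
class StepperState {
  #breakPoints = []; // truly private: inaccessible, hence unmodifiable, from outside

  get breakPoints() {
    return this.#breakPoints.slice(); // hand out a copy, keeping internal state consistent
  }
}
```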
- For text fields
  * when printing, we generate a fake font which contains some widths computed thanks to
    an OffscreenCanvas and its measureText method.
    In order to avoid having to lay out the glyphs ourselves, we just render all of them
    in one call in the showText method, using the system sans-serif/monospace fonts.
  * when saving, we continue to create the appearance streams if the fonts contain the chars,
    but when a char is missing, we just set the flag /NeedAppearances to true in the AcroForm
    dict and remove the appearance stream. This way, we let the different readers handle
    the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
These variables are left-over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
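For reference, the built-in replacement for such manual padding loops:

```js
// e.g. zero-padding a hexadecimal value to a fixed width:
const padded = (0x2f).toString(16).padStart(4, "0"); // "002f"
```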
This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
*Please note:* This only fixes the "wrong letter" part of bug 1799927.
It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.
One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
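A rough sketch of the caching idea (the function name is hypothetical); unlike a 32-bit checksum, a key derived from the full image data cannot collide:

```js
function getInlineImageCacheKey(bytes) {
  // Stringify the image bytes; including the length guards against any
  // ambiguity between differently-sized arrays.
  const parts = [];
  for (let i = 0, ii = bytes.length; i < ii; i++) {
    parts.push(bytes[i]);
  }
  return `${bytes.length}_${parts.join("_")}`;
}

// Usage: key lookups within the current Parser-instance only, so the
// extra memory is not persistent.
const key = getInlineImageCacheKey(new Uint8Array([137, 80, 78, 71]));
```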
The purpose of this patch is twofold:
- Initialize the unicode-category data *lazily* during text-extraction, since this is completely unused during general parsing/rendering.
- Stop exposing this data in the API, since it's unused on the main-thread and it seems like it was *accidentally* included.
Obviously these changes are API-observable, but hopefully no user is depending on this. Furthermore, it's trivial for a user to re-create this unicode-category data manually with a regular expression (from the exposed `unicode` property).
Currently, during text-extraction, we're repeatedly normalizing and (when necessary) reversing the unicode-values every time. This seems a little unnecessary, since the result won't change, hence this patch moves that into the `Glyph`-instance and makes it *lazily* initialized.
Taking the `tracemonkey.pdf` document as an example: When extracting the text-content there's a total of 69236 characters but only 595 unique `Glyph`-instances, which means a 99.1 percent cache hit-rate. Generally speaking, the longer a PDF document is the more beneficial this should be.
*Please note:* The old code is fast enough that it unfortunately seems difficult to measure a (clear) performance improvement with this patch, so I completely understand if it's deemed an unnecessary change.
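A sketch of the per-Glyph lazy caching; the actual normalization in the font code also handles e.g. RTL-reversal, so this only shows the shape of the pattern:

```js
class Glyph {
  #normalizedUnicode;

  constructor(unicode) {
    this.unicode = unicode;
  }

  get normalizedUnicode() {
    // Computed once per unique Glyph-instance, then reused for every
    // occurrence of that character during text-extraction.
    if (this.#normalizedUnicode === undefined) {
      this.#normalizedUnicode = this.unicode.normalize("NFKC");
    }
    return this.#normalizedUnicode;
  }
}
```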
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.
*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.
---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
Note that the "trailer"-case is already a fallback, since normally we're able to use the "xref"-operator even in corrupt documents. However, when a "trailer"-operator is found we still expect "startxref" to exist and be usable in order to advance the stream position. When that's not the case, as happens in the referenced issue, we use a simple fallback to find the first "obj" occurrence instead.
This *partially* fixes issue 15590, since without this patch we fail to find any objects at all during `XRef.indexObjects`. However, note that the PDF document is still corrupt and won't render since there's no actual /Pages-dictionary and the /Root-entry simply points to the /OpenAction-dictionary instead.
After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.
- Some events, which require a user interaction, will allow those functions to be called.
  But after a few seconds, if there is no more user interaction, it won't be possible
  anymore.
  The idea is to give the user an opportunity to leave the pdf.
- Disable the print function while we're printing, do the same for saving, and disallow
  saving on open events.
When a form isn't changed, we use the appearances we have in the file, but when
/NeedAppearances is true, all the appearances have to be regenerated whatever they are.
- In #15373, we implemented copy/paste actions using the system
  clipboard.
  For some reason, on Windows, the clipboard doesn't contain the expected
  data when the tests are run in parallel, hence the tests which use the
  clipboard need to be run sequentially.
- Make sure that we can paste after having copied.
*This is very old code, and it could thus do with some simplification.*
Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.
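A sketch of the combined method (the signature and the internal helper are assumed, not taken from the actual patch):

```js
class PdfManager {
  _stream = null;

  async _ensureStreamLoaded() {
    // hypothetical: fetch/await the underlying PDF data here
  }

  async requestLoadedStream(noFetch = false) {
    if (!noFetch) {
      await this._ensureStreamLoaded();
    }
    // Replaces the separate onLoadedStream() accessor: the single
    // call-site now gets the stream-data directly.
    return this._stream;
  }
}
```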
PR 13725 was only intended as a temporary work-around, and it seems that we can now revert that.
- Firefox 102 is the currently maintained ESR-branch, and the PDF.js project only supports the active one.
- Node.js now works, thanks to the `node-canvas` package, and I've confirmed locally that following the STR in issue 13724 generates a correct image.
In the referenced PDF document there are "numbers" which consist only of `-.`, and while that's obviously not valid Adobe Reader seems to handle it just fine.
Letting this method ignore more invalid "numbers" was suggested during the review of PR 14543, so let's simply relax the validation here.
It appears that PR 15593 broke `issue12402`, and we thus need to partially restore the /Count check.
I completely missed this when looking at the test-results for PR 15593, both locally and on the bots, since the `Driver._getLastPageNumber` method would "swallow" an unavailable page number.
- When we're editing some annotations, keeping the role="textbox" makes them visible
  as editable, and VoiceOver (Mac) is able to read the contents when they're focused;
- Add an "aria-activedescendant" attribute in order to make the content discoverable
  by NVDA on Windows.
After PR 14311, and follow-up patches, we no longer require that the /Count entry (in the /Pages dictionary) is either present or even valid in order to parse/render a PDF document.
Hence it seems strange to keep this requirement for *corrupt* PDF documents, when trying to find a usable `trailer` in the `XRef.indexObjects` method.
With the changes in the previous patch we can move the glyph-cache lookup to the top of the method and thus avoid a bunch of, in *almost* every case, completely unnecessary re-parsing for every `charCode`.
This method, and its class, was originally added in PR 4453 to reduce memory usage when parsing text. Then PR 13494 extended the `Glyph`-representation slightly to also include the `charCode`, which made the `matchesForCache` method *effectively* redundant since most properties on a `Glyph`-instance indirectly depends on that one. The only exception is potentially `isSpace` in multi-byte strings.
Also, something that I noticed when testing this code: The `matchesForCache` method never worked correctly for `Glyph`s containing `accent`-data, since Objects are passed by reference in JavaScript. For affected fonts, of which there's only a handful of examples in our test-suite, we'd fail to find an already existing `Glyph` because of this.
When we fail to find a usable PDF document `trailer` *and* there were errors during parsing, try and fallback to a *previous* generation as a last resort during fetching of uncompressed references.
*Please note:* This will not affect "normal" PDF documents, with valid /XRef data, and even most *corrupt* documents should be completely unaffected by these changes.
Given that the new sidebar icon is slightly shorter than the old one, it cannot hurt to ever so slightly tweak the vertical position of the notification icon.
(While the patch also changes the CSS rule used for the horizontal position, this is a no-op and was done to improve consistency between the two values.)
Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.
Note how we're currently skipping all main-thread cleanup when document destruction has started, but for some reason we're still dispatching the "Cleanup" message.
This seems like a simple oversight, since destruction will already invoke the `BasePdfManager.cleanup` method (on the worker-thread) to fully clear-out all caches.
Given the sheer number of heuristics added to this method over the years, moving the *valid* unicode found case to the top should improve readability of the code.
- Fix Field::getArray in order to collect only the fields which have a value;
- Fix AFSimple_Calculate:
  * allow a string containing a list of field names as the argument;
  * since a field can be non-terminal, use Field::getArray to collect
    the fields under it and then apply the calculation to all the descendants.
This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since:
- We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays.
- The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway.
- We're also no longer displaying a notification-bar, in the *built-in* Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.
*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
When a new PDF document is opened in the GENERIC viewer we (obviously) create a new `AnnotationEditorUIManager`-instance, since those are document-specific, and thus we need to ensure that we actually register the `editorTypes` for each one.
Note how after having found the "%PDF-" prefix we then read both the prefix and the version in the loop, only to then remove the prefix at the end.
It seems better to instead advance the stream position past the "%PDF-" prefix, and then read only the version data.
Finally the loop-condition can also be simplified slightly, to further clean-up some very old code.
*Fixes a regression from PR 15246, sorry about that!*
The return value of all `Annotation.getOperatorList` methods was changed in PR 15246, however I missed updating the error code-path in `Page.getOperatorList` which thus breaks all operatorList-parsing for pages with corrupt Annotations.
Looking at the code on the worker-thread, there doesn't appear to be any particular reason for placing *some* of the properties in a `source`-object when sending them with "GetDocRequest".
As is often the case the explanation for this structure is rather "for historical reasons", since originally we simply sent the `source`-object as-is. Doing that was obviously a bad idea, for a couple of reasons:
- It makes it less clear what is/isn't actually needed on the worker-thread.
- Sending unused properties will unnecessarily increase memory usage.
- The `source`-object may contain unclonable data, which would break the library.
Rather than sending all of these parameters individually and then grouping them together on the worker-thread, we can simply handle that in the API instead.
All of these constants have been deprecated for a while, and with the upcoming *major* version this seems like a good time to remove them.
For the string-constants we can simply remove them, but the number-constants are left commented out since we don't want to re-number the list to prevent third-party breakage.
The way that we set the width of the `dropdownToolbarButton`-select is very old, and despite some improvements over the years this is still somewhat hacky.
In particular, note how we're assigning the select-element a larger width than its containing `dropdownToolbarButton`-element. This was done to prevent displaying *two* separate icons, i.e. the native and the PDF.js one, since it's the only way to handle this in older browsers (particularly Internet Explorer).
Given the currently supported browsers, there's however a better solution available: use `appearance: none;` to disable native styling of the select-element. [According to MDN](https://developer.mozilla.org/en-US/docs/Web/CSS/appearance#browser_compatibility), this is supported in all reasonably modern browsers.
This way we're able to simplify both the CSS rules and the JS-code that's used to adjust the `dropdownToolbarButton` width in a localization aware way.
Because of https://bugzilla.mozilla.org/show_bug.cgi?id=1582545, the padding-inline is by default 0.
0 is not really enough because of the outline, so just set it to 2px (it was 4px before the patch)
in order to have something visually correct.
I noticed that 256 % 3 is equal to 1, so I slightly simplified the code.
The sum of the 16 Uint8 values doesn't exceed 2^12, hence we can just take the
sum modulo 3.
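The arithmetic, spelled out in a hypothetical sketch: since 256 ≡ 1 (mod 3), every byte of a base-256 value contributes just its own value modulo 3, and the sum of 16 Uint8 values is at most 16 × 255 = 4080 < 2^12, so there's no overflow concern:

```js
function mod3(bytes) {
  // bytes: a Uint8Array of length 16.
  let sum = 0;
  for (const b of bytes) {
    sum += b; // at most 4080 in total, well below 2 ** 12
  }
  return sum % 3; // equals the big-endian byte-concatenated value mod 3
}
```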
This method was originally added in PR 1157 (back in 2012), however its only call-site was then removed in PR 2423 (also in 2012).
Hence this method has been completely unused for nearly a decade, and it should thus be safe to remove it.
This patch first of all makes `isOffscreenCanvasSupported` configurable, defaulting to `true` in browsers and `false` in Node.js environments, with a new `getDocument` parameter. While you normally want to use this, in order to improve performance, it should still be possible for users to control it (similar to e.g. `isEvalSupported`).
The specific problem, as reported in issue 14952, is that the SVG back-end doesn't support the new ImageMask data-format that's introduced in PR 14754. In particular:
- When the SVG back-end is used in Node.js environments, this patch will "just work" without the user needing to make any code changes.
- If the SVG back-end is used in browsers, this patch will require that `isOffscreenCanvasSupported: false` is added to the `getDocument`-call.
While it can't hurt to localize the main error-messages, also localizing the error *details* has always seemed somewhat unnecessary since those are only intended for debugging/development purposes. However, I can understand why that's done since the GENERIC viewer used to expose this information in the UI; via the `errorWrapper` UI that's removed in PR 15533.
At this point, when any errors are simply logged in the console, it no longer seems necessary to keep localizing the error *details* in the default viewer.
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.
See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909
---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
This is yet another small piece of clean-up of the `FontLoader`-code, since we've not used this `id`-property for anything ever since PR 6571 (which landed almost seven years ago). Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.
*Please note:* This is tagged `[api-minor]` since it's technically observable from the outside, however no user ought to be directly interacting with these CSS font rules.
This commit fixes the "Expected null to equal '401R'" errors that
surfaced after the Puppeteer 18 upgrade. Note that even before that
this would have been an improvement because it takes some time between
scripting being reported ready (i.e., triggering the execution of any
OpenActions) and those OpenActions actually completing execution, so
it's only safe to check which element is focused if we know an element
actually became focused.
In the Firefox PDF Viewer this has never been used, with the error message simply printed in the web-console, and (somewhat) recently we've also updated the viewer code to avoid bundling the relevant code there. Furthermore, in the Firefox PDF Viewer we don't even display the *browser* fallback bar any more; see https://bugzilla.mozilla.org/show_bug.cgi?id=1705327.
Hence it seems slightly strange to keep this UI around in the GENERIC viewer, and this patch proposes that we simply remove it to simplify/unify the relevant code in the viewer. In particular this also allows us to remove a couple of l10n-strings, which have always been unused in the Firefox PDF Viewer.
Currently the compatibility-file is loaded using a standard `import`-statement and while its code is enclosed in a pre-processor block, and thus is excluded in e.g. the MOZCENTRAL build-target, it still results in the *built* `pdf.js`/`pdf.worker.js` files having an effectively empty closure as a result.
By moving the checks from `src/shared/compatibility.js` and into `src/shared/util.js` instead, we can load the file using a build-time `require`-statement and thus avoid that closure.
Note that with these changes the compatibility-file will no longer be loaded in development mode, i.e. when `gulp server` is used. However, this shouldn't be a big issue given that none of its included polyfills could be loaded then anyway (since `require`-statements are being used) and that it's really only intended for the `legacy`-builds of the library.
Rather than "manually" looking up the l10n-string and then updating the button, we can (and probably even should) just update the l10n-id and then trigger proper translation for the button DOM-element.
This extends the approach used in PresentationMode to also cover the AnnotationEditor, and tries to handle the combination of both cases correctly.
In order to simplify the overall implementation we simply track the *first* seen "previous" cursorTool, and don't allow it to be reset as long as either PresentationMode or an AnnotationEditor is being used.
Note that this PR only adds the "underscore"-variant of *actually existing* ligatures, however the referenced PDF document also uses a couple of non-standard ones (e.g. `ft`, `Th`, and `fh`) that we cannot easily support without larger changes (since they don't have official Unicode-entries).
Given that it's clearly the PDF document, and its fonts, that's the culprit here it's not entirely clear to me that we actually want to attempt a larger refactoring/rewriting of the `glyphlist.js` code, assuming it's even generally possible. Especially when this patch alone already improves our copy-paste behaviour when compared to both Adobe Reader and PDFium, and that this is only the *second* time this sort of bug has been reported.
Fewer dependencies shouldn't be a bad idea in general, and given that the `node-canvas` package already includes a `DOMMatrix` polyfill we can simply use that one instead.
Given that Firefox supports *synchronous* font loading, when the Font Loading API isn't being used, there's really no point including code which if called would just throw in the MOZCENTRAL build. (This is safe, since the `FontLoader.isSyncFontLoadingSupported`-getter always returns `true` there.)
After the changes in PR 10539 (which landed over three years ago) the `FontLoader.bind` method can only be called with *a single* font at a time, hence the `_prepareFontLoadEvent` method obviously doesn't need to support multiple fonts any more.
By having just *one* class, and using pre-processor blocks directly in the relevant methods, we reduce the size of this code in the *built* `pdf.js` file.
Originally, when the `BaseFontLoader` abstraction was added in PR 9982, the idea was probably that additional build-targets would get their own implementations. Given that this hasn't happened in the four years since that landed, it doesn't seem meaningful to keep it around.
The existing `loadingContext` class-property can be simplified slightly, since we've not been using the `id`-property on the requests ever since PR 3477 (which landed nine years ago).
Furthermore, by default we're also not even using that code-path now since the Font Loading API will always be used when available.
Currently the `viewBookmark`-button, which is actually a `href`-element, gets an inconsistent `outline`.
Similarly, the `dialog`-buttons also have an inconsistent `outline` after the changes in PR 15438.
Finally, simplifies a couple of `border` rules since setting a border-width when "none" is being used doesn't seem meaningful.
This was done all the way back in PR 8361, for a mozilla-central test that's since been removed. As can be seen in the following search results, there's no `LoopbackPort` invocation outside of the PDF.js code itself: https://searchfox.org/mozilla-central/search?q=LoopbackPort&path=
Given that the `LoopbackPort` is only used in connection with "fake workers", which is something that we don't officially recommend/support, this doesn't seem like functionality that we want to keep exposing in the public API.
The changes in PR 15438 added a `border-radius` when input-elements are focused, however there's no radius when the same elements are hovered. Having the radius change, and not just the `border-color`, when input goes from hovered to focused feels a bit inconsistent (at least to me).
OperatorList.addOp can trigger a flush if it's required, hence the values passed to it must
be correctly initialized in order to avoid wrong values in the renderer.
Because of that, a clip path was considered empty, nothing was clipped, hence the wrong
rendering in bug 1791583.
*This effectively replaces PR 15465.*
As outlined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/forEach, the argument order when iterating through a `Map` is actually `value, key`.
Ignoring the incorrect Array used in the old code, I cannot imagine that this would've worked anyway since we didn't use the actual `setTimeout`-functionRefs to clear the timeouts; please refer to the `setTimeout`/`setInterval` methods in the `SandboxSupportBase.createSandboxExternals` method.
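A minimal reproduction of the pitfall:

```js
const timeouts = new Map([["id1", "timeoutRef1"]]);

timeouts.forEach((value, key) => {
  // The callback receives (value, key) -- NOT (key, value) -- so code
  // written assuming (key, value) silently swaps the two.
  console.log(key, value); // "id1" "timeoutRef1"
});
```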
Since there is no script engine with XFA, the FormCalc parser is not used in real life.
The bug @nmtigor noticed was hidden by another one (the wrong check on `match`).
Some z-indexes have been added in the annotation layer because the elements inside are re-ordered
in order to improve accessibility.
Hence we must add a "high" z-index on the annotation editor layer in order to avoid any bad
interaction between the different layers.
Most of the `String.prototype.search` call-sites found throughout the code-base are actually not necessary, since we usually only want a *boolean*, and those can be replaced with `RegExp.prototype.test` instead.
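A typical call-site conversion:

```js
const str = "abc123";

// Before: returns an index, which is then only compared against -1.
const hasDigitBefore = str.search(/\d/) !== -1;

// After: directly returns the boolean we actually want.
const hasDigitAfter = /\d/.test(str);
```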
The default outline for a focused text input is not that bad, but for some reason, when changing
the background color, all the good default border/outline properties are lost (it's the same
behaviour in Edge).
So in order to have something consistent in HCM/non-HCM, a 2px border + 1px outline (on
@MReschenberg's advice) is added when an input is focused, with different colors depending on HCM.
While working on the above issue, I noticed and fixed a few bugs in HCM:
- input, button and select have some default properties which were created at a time when the
  annotation layer didn't exist, hence this patch removes them and sets those properties where
  they should live;
- some elements (like the main toolbar) use a box-shadow which is invisible in HCM, hence
  it's replaced by a border-bottom in HCM;
- some separators are invisible in HCM, hence use the GrayText color to render them correctly;
- the options for the zoom selection were invisible in HCM with Desert (one of the Windows 11
  themes).
By force-quitting the browser while the FullScreen API is active, we don't get a chance to exit PresentationMode *cleanly* and some of its state thus remains (via the `ViewHistory`).
To try and improve things here we can skip updating the Scroll/Spread-mode while PresentationMode is active, since they will be changed when entering PresentationMode, which seems to help and is really the best that we can do here (and what the issue describes is very much an edge-case anyway).
In the `legacy`-builds we (obviously) support the currently maintained Firefox ESR version, and looking at the [release history](https://wiki.mozilla.org/Release_Management/Calendar) those are officially supported (by Mozilla) for about 1-1.5 years.
However, for non-Firefox browsers the `legacy`-builds currently attempt to "support" browsers that are approximately *three* years old.[1] Historically, in the PDF.js project, trying to support old browsers have caused some maintenance problems and even delayed adoption of new web-platform features/functionality.
To lessen the support burden, given that the primary purpose of the PDF.js library is still to develop the *built-in* Firefox PDF Viewer, this patch proposes that the upcoming *major* release changes the minimum supported browsers/environments as follows:
- Chrome 85, which was released on 2020-08-25; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
- Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
- Safari 14, which was released on 2020-09-16; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_14
- Node.js 14 (as before), which is now explicitly listed to prevent it from accidentally breaking; see https://en.wikipedia.org/wiki/Node.js#Releases
---
[1] In older browsers some functionality may not be available and generally we'll ask users to update to a modern browser when bugs, specific to old browsers, are being reported.
The [official Chrome extension](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm) has unfortunately not been updated for *three years*, which means that it's currently missing out on years worth of bug fixes, performance improvements, and new features.
In particular, the Chrome extension suffers from a known bug with non-embedded standard fonts; see issue 13669 for details.
For the time being, this patch proposes that we *temporary* make the following changes:
- Remove the mention of the official Chrome extension from the main README, since it seems unfortunate to somewhat prominently recommend users an old and partially non-working extension.
- Don't run the `gulp lint-chromium` task as part of the CI, since in addition to the official extension not having been updated its code is also not being actively maintained.[1]
Once the official Chrome extension has been updated, and it's being actively maintained again, this patch should be simple enough to revert.
---
[1] The last commits, which aren't e.g. linting or general code-maintenance related, happened a year ago now.
There's no point in having this variable defined (implicitly) as `undefined` in e.g. the Firefox PDF Viewer.
By defining it with `var` and using an ESLint ignore, rather than `let`, we can move it into the relevant pre-processor block instead. Note that since the entire viewer-code is placed, by Webpack, in a top-level closure this variable will thus not become globally accessible.
After the changes in PR 15391 one separator may now become visible too soon when the viewer is narrow, applies e.g. to the MOZCENTRAL viewer, since the wrong CSS class is being used.
The reason that this happens is that only the GENERIC viewer includes the "openFile"-buttons, and we thus need the separator to also be conditionally defined.
This is a slightly speculative change, based on something that I happened to notice while browsing MDN, to hopefully prevent PDF.js from outright breaking in older browsers.
According to the following information on MDN, Safari didn't implement support for the necessary features until version 14:
- https://developer.mozilla.org/en-US/docs/Web/API/MediaQueryList#browser_compatibility
- https://developer.mozilla.org/en-US/docs/Web/API/MediaQueryList/change_event#browser_compatibility
Given the browsers that we currently support only older versions of Safari should be affected, hence it seems reasonable to simply disable the functionality rather than trying to polyfill it.
(It's interesting how it's very often Safari which is *much* slower than the other browsers at implementing new features.)
After the changes in PR 14112 the `PDFViewer`-class is now "identical" to the `BaseViewer`-class and the `PDFSinglePageViewer`-class is just a very thin wrapper around the `BaseViewer`-class.
Hence we can rename these files, and also remove the abstract `BaseViewer`-class, which helps reduce some unnecessary "closures" in the *built* viewer.
*Please note:* These changes are made in two separate commits, to allow GitHub to preserve `blame` for the affected files.
This patch updates a bunch of older code, that makes conditional function calls, to use optional chaining rather than `if`-blocks.
These mostly mechanical changes reduce the size of the `gulp mozcentral` build by a little over 1 kB.
*Please note:* This is only a, hopefully generally helpful, work-around rather than a proper solution to issue 15292.
There's something that's "special" about the Type1 fonts in the referenced PDF document, since we don't manage to find any actual font programs and thus cannot render anything.
Given that it shouldn't make sense for a Type1 font program to ever be empty, since that means that there's no glyph-data to render, we simply fallback to a standard font to at least try and render *something* in these rare cases.
Given that the change in PR 13393 was slightly speculative, given the lack of test-cases, let's just revert part of that to fix the referenced issue.
Based on a quick look at old issues and existing test-cases, it seems that most (if not all) PDF documents that benefit from using the font-data in this way lack any /ToUnicode maps which should mean that they're unaffected by these changes.
Given that the official Bower website, since almost five years, has been advising users to utilize other tools it doesn't seem entirely necessary to keep including the `bower.json` file in the `pdfjs-dist` repository; see e.g. https://bower.io/blog/2017/how-to-migrate-away-from-bower/
This patch proposes removing the `browserify` example for the following reasons:
- The last `browserify` release was almost two years ago, according to both https://github.com/browserify/browserify/releases and https://www.npmjs.com/package/browserify?activeTab=versions
- The project no longer seems to be actively maintained, since so far this year there's only been *a single* (seemingly trivial) patch merged; see https://github.com/browserify/browserify/commits/master
- Because of the previous points `browserify` doesn't support modern and up-to-date JavaScript features, as evident from e.g. issue 14731 and multiple issues found in https://github.com/browserify/browserify/issues
- Our `browserify` example is most likely not very commonly used, judging by the very low volume of issues/PRs related to it. Looking at the `git` history of that example the only changes have been lint- or maintenance-related.[1]
- Providing an example for a framework that's no longer actively maintained doesn't seem like a good idea in general, since we probably don't want to steer users towards using (possibly) older frameworks.
- Given that we've never used `browserify` in the PDF.js project, it's also quite difficult to provide support for the example.
---
[1] It's interesting to compare with the `webpack` example, since that's generated both issues *and* also PRs (for missing features) from users.
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.
The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.
---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
It slightly helps to reduce the code size and its complexity.
But the cool thing is that it allows copying/pasting some annotations from one pdf
to another.
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.
A number of Annotation-types are currently creating their own PopupAnnotations, since they need to use a custom `trigger`-element. However, because of where that check is currently implemented[1] we end up attaching empty/unused containers for those PopupAnnotations to the DOM[2]; see e.g. the `annotation-line.pdf` file in the test-suite for one example.
By instead moving the types-check into the `PopupAnnotationElement` constructor, we can completely skip those PopupAnnotations that are being explicitly handled elsewhere.
Note that I don't *believe* that this is a new issue, although I've not tried to bisect it, but this likely goes back quite some time (possibly even as far as PR 8228).
---
[1] In the `PopupAnnotationElement.render` method.
[2] Please note that the actual Popup-element *itself* isn't being attached/rendered here, just its container which by itself serves no purpose as far as I can tell.
There are three notable exceptions here:
- The `saveDocument` one is converted into a permanent `warn`, since it still works when the `annotationStorage` is empty although it's (obviously) less efficient than `getData`.
- The `fallbackWorkerSrc` functionality (for browsers), since just removing it would risk too much third-party breakage.
- The SVG back-end, since a final decision is yet to be made. (It might be completely removed, or left as-is in an essentially "frozen" state.)
Note that this patch prepends the document title with "* ", rather than only "*" as suggested in the bug, since there's nothing that says that a PDF document cannot specify a title[1] beginning with an asterisk. To reduce possible confusion, having a space between the "editing marker" and the actual document title thus cannot hurt as far as I'm concerned.
In order to notify the viewer when all `AnnotationEditor`s have been removed, we utilize the existing `onAnnotationEditor`-callback to allow the document title to be updated as necessary.
Finally, this patch makes the following (slightly unrelated) changes:
- Rename the `AnnotationStorage.removeKey` method to just `AnnotationStorage.remove` instead. This is consistent with e.g. the `has`-method and should suffice to explain what it does.
- Remove the `AnnotationStorage.hasAnnotationEditors` getter, since the viewer now tracks the necessary state internally. This avoids unnecessarily having to iterate through the `AnnotationStorage`-instance when saving/printing the document.
---
[1] Using either an /Info dictionary or a /Metadata stream.
This functionality has never been used anywhere in the PDF.js library/viewer itself, since it was added in 2013.
Furthermore this functionality is, and has always been, *completely untested* and also unmaintained.
Finally, there's (at least) one old issue about `appendImage` not returning the correct position; see issue 4182.
All-in-all, it seems that keeping very old, untested, unmaintained, and partially broken code around probably isn't what we want here.
(On the off-chance that any future a11y-work requires getting access to image-positions, it'd likely be much better to re-implement the necessary functionality from scratch and also make sure that it's properly tested from the beginning.)
This old method, which is only used with the `imageLayer` functionality, is essentially just a re-implementation of the existing `Util.applyTransform` method.
The password dialog can be cancelled in three different ways:
- By clicking on its "Cancel"-button.
- By pressing the Escape-key.
- By force-opening another dialog, although this shouldn't happen in practice.
Here the "Cancel"-button case is slightly special since it'll trigger `PasswordPrompt.#cancel` *twice*, first directly via the click and secondly via the "close" event on the `dialog`-element.
While this shouldn't, as far as I know, cause any bugs it's nonetheless inconsistent with the other cases outlined above. To improve this we can simply attempt to *close* the password dialog instead, and then rely on the "close" event to run the `PasswordPrompt.#cancel` method.
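A sketch of the unified flow (the DOM wiring is simplified; the real code goes through the viewer's overlay handling):

```js
const dialog = document.createElement("dialog");
const cancelButton = document.createElement("button");
dialog.append(cancelButton);
document.body.append(dialog);

// The "Cancel"-button now only *closes* the dialog...
cancelButton.addEventListener("click", () => dialog.close());

// ...and the cancel-logic runs exactly once here, regardless of whether
// the dialog was dismissed via the button, the Escape-key, or by being
// force-closed.
dialog.addEventListener("close", () => {
  /* PasswordPrompt cancel logic */
});
```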
Currently we simply use the Babel `preset-env` in the `legacy`-builds of the PDF.js library. This has the side-effect of transpiling the code for *very old* browsers/environments, including ones that we (since many years) no longer support which unnecessarily bloats the size of the `legacy`-builds.
For the CSS files we're only targeting *the supported browsers*, and it's thus possible to extend that to also apply to Babel.
One of the most significant changes, with this patch, is that we'll no longer polyfill `async`/`await` in the `legacy`-builds. However, this shouldn't be an issue given the browsers that we currently support in PDF.js; please refer to:
- https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/async_function#browser_compatibility
Currently in `disableWorker=true` mode it's possible that opening of password-protected PDF documents outright fails, if an *incorrect* password is entered. Apparently the event ordering is subtly different in the non-Worker case, which causes the password-callback to be updated *before* the dialog has been fully closed.
To avoid that we'll utilize a `PromiseCapability` to keep track of the state of the password dialog, such that we can delay both re-opening and (importantly) updating of the password-callback until doing so is safe.
This patch *may* also fix issue 15330, but it's impossible for me to tell.
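The "promise capability" used here is essentially a deferred; a generic sketch of the pattern:

```js
function createPromiseCapability() {
  const capability = { settled: false };
  capability.promise = new Promise((resolve, reject) => {
    capability.resolve = data => {
      capability.settled = true;
      resolve(data);
    };
    capability.reject = reason => {
      capability.settled = true;
      reject(reason);
    };
  });
  return capability;
}

// e.g. resolve the capability in the dialog's "close" handler, and
// `await capability.promise` before re-opening the dialog or updating
// the password-callback.
```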
*This is a follow-up to PR 14869.*
In the old code we're accidentally "swallowing" part of the event-details, which explains why the annotationLayer didn't render.
One thing that made debugging a lot harder was the lack of error messages, from the viewer, and a few `PDFPageView`-methods were updated to improve this situation.
- Remove the `typeof Worker` check, since all browsers have had `Worker` support for many years now; see https://developer.mozilla.org/en-US/docs/Web/API/Worker#browser_compatibility
Furthermore the `new Worker(...)` call is wrapped in try-catch, which means that we'll still fallback to "fake workers" if necessary.
- Limit the `fallbackWorkerSrc` handling, in the `PDFWorker.workerSrc` getter, to only GENERIC builds since that's the only place where it's defined anyway.
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
By removing this closure the file-size is decreased, even for the *built* `pdf.worker.js` file, since there's now less overall indentation in the code.
This was moved into the `src/display/`-folder in PR 15110, for the initial editor-a11y patch. However, with the changes in PR 15237 we're again only using `binarySearchFirstItem` in the `web/`-folder and it thus seem reasonable to move it back there.
The primary reason for moving it back is that `binarySearchFirstItem` is currently exposed in the public API, and we always want to avoid that unless it's either PDF-related functionality or code that simply must be shared between the `src/`- and `web/`-folders. In this case, `binarySearchFirstItem` is a general helper function that doesn't really satisfy either of those alternatives.
Currently when the `TextAccessibilityManager.enabled` method is called, we'll update `aria-owns` for any pre-existing elements. This obviously makes sense when e.g. zooming/rotating in the viewer, since the annotationLayer/annotationEditorLayer is kept in those cases.
However when the page is *fully* reset, e.g. as result of going out-of-view and thus being evicted from the cache, we still keep the `#textNodes`-Map around. This causes us to set the `aria-owns` attribute (in the textLayer) for an element that doesn't actually exist any more, which as far as I'm concerned seems incorrect. In this case the element will simply, as already implemented, be re-inserted when the annotationLayer/annotationEditorLayer renders again.
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
For e.g. the `gulp mozcentral` command the *built* `pdf.worker.js` file-size decreases `~2 kB` with this patch, and most of the improvement comes from having less overall indentation in the code.
Given that the code is written with JavaScript module-syntax, none of this functionality will "leak" outside of this file with these changes.
By removing this closure the file-size is decreased, even for the *built* `pdf.worker.js` file, since there's now less overall indentation in the code.
This patch doesn't structurally change the text layer: it just adds some aria-owns
attributes to some spans.
The aria-owns attribute expects an element id, which is why this patch adds back an
id on the element rendering an annotation; this id is built using crypto.randomUUID
to avoid any potential issues with the hash in the url (see the sketch below).
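A rough sketch of the id wiring (the element creation is simplified and the id prefix is hypothetical):

```js
const span = document.createElement("span"); // a text-layer span
const annotationSection = document.createElement("section"); // the annotation's element

// crypto.randomUUID() guarantees the id cannot clash with e.g. named
// destinations referenced via the URL hash.
annotationSection.id = `annotation_${crypto.randomUUID()}`;
span.setAttribute("aria-owns", annotationSection.id);
```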
The elements in the annotation layer are moved within the DOM in order to have them in the
same "order" as they visually appear.
The overall goal is to help screen readers present the annotations to the user as
they visually appear and as they come in the text flow.
It is clearly not perfect, but it should improve readability for some people with visual
disabilities.
According to MDN `Path2D` is available in all browsers that we currently support, see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility
Hence only Node.js is currently lagging behind here, and requires that we keep the old code as a fallback in the `compileType3Glyph` function. However, there's an open PR in the `node-canvas` repository for adding `Path2D` support.
As far as I'm concerned, there are two possible solutions here:
- We land this patch now, since it removes unnecessary code in e.g. the Firefox PDF Viewer, which means that compilation of Type3 glyphs will be disabled in Node.js until that PR is landed.[1]
If users report bugs about Type3 glyphs looking "inconsistent" in Node.js and/or being slow to render, we could perhaps encourage them to upvote and otherwise help out getting that PR landed?
- We wait for the mentioned PR to land *first*, before moving forward with this patch. Given that there's been no updates on that PR for almost two months, this alternative may possibly take a while.
---
[1] Note that Type3 fonts are first of all not very common in PDF documents, and secondly that compilation only applies specifically to Type3 glyphs that contain /ImageMask-data (i.e. not all Type3 fonts are affected).
This exports the same constants as the viewer components, but in the default viewer. To avoid bloating the global-scope the constants are added to a new `PDFViewerApplicationConstants` object[1], which also allows us to skip this in builds where it's not actually needed (e.g. the Firefox *built-in* PDF Viewer).
*Please note:* I'm not completely sold on this idea, and thus wouldn't mind the patch being rejected, since we probably don't want to export every single viewer constant this way. (And it may seem a bit arbitrary, to users, why some constants are exported and others are not.)
---
[1] Somewhat similar to the existing `PDFViewerApplicationOptions` structure.
In addition to the existing `LinkTarget` constant, used by the `PDFLinkService`-constructor, this patch exports the following constants in the viewer components:
- `ScrollMode` and `SpreadMode`, since the `BaseViewer` has getters/setters which work with those constants.
- `RenderingStates`, since that one may be helpful when using `PDFPageView` directly.
While this has always worked, as a consequence of the implementation, it's never been officially supported.
In addition to adding basic unit-tests, this patch also introduces a couple of new JSDoc `@typedef`s in the API to avoid overly long lines.
By invoking the `reset` methods *last* in the `Toolbar`/`SecondaryToolbar`-constructors, we ensure that the "toolbarreset"/"secondarytoolbarreset"-events are actually handled when the viewer loads. Note that previously those events were dispatched *before* the relevant event-listeners had been attached.
With this small change we can avoid inconsistent initial toolbar-state, specifically in the case when the viewer is *reloaded* (since Firefox keeps the HTML-state on "soft" reloads).
By doing this in the worker-thread this code will only need to run *once*, whereas currently re-rendering of a page forces this to be repeated (e.g. after it's been scrolled out-of-view and then back into view again).
When a FreeText editor is pasted it doesn't have an editorDiv yet when it's added
to the layer, hence it's empty.
So this patch just moves the call to addToAnnotationStorage to ensure we have
what we need.
An annotation doesn't have to be in the text flow, hence it's likely a bad idea
to insert its text in the text layer. But the text must be visible from a screen
reader point of view, so it must be somewhere in the DOM.
So with this patch, the text from a FreeText annotation is extracted and added in
a div in its HTML counterpart, and with patch #15237 the text should be visible
and positioned relative to the text flow.
Given that this image is intended specifically for the default viewer, we simply use the CSS preprocessor to remove the image reference in the `gulp components` build.
Considering that the issue only affects a CSS file, I don't believe that replacing the *just released* PDF.js version is actually necessary here.
It doesn't make sense to use a page-canvas that's *smaller* than the resulting thumbnail, since that causes the image to be upscaled which results in a blurry thumbnail. Note that this doesn't normally happen, unless a very small zoom-level is used in the viewer.
To improve performance of the sidebar we use the page-canvases to generate the thumbnails whenever possible, since that avoids unnecessary re-rendering when the sidebar is open. This works generally well, however there's an old problem in PDF documents that contain interactive forms (when those are enabled): Note how the thumbnails become partially (or fully) blank, since those Annotations are not included in the OperatorList.[1]
We obviously want to keep using the `PDFThumbnailView.setImage`-method for most documents, however we need a way to skip it only for those pages that contain interactive forms.
As it turns out it's unfortunately not all that simple to tell, after the fact, from looking only at the OperatorList that some Annotations were skipped. While it might have been possible to try and infer that in the viewer, it'd not have been pretty considering that at the time when rendering finishes the annotationLayer has not yet been built.
The overall simplest solution that I could come up with, was instead to include a *summary* of the interactive form-state when doing the final "flushing" of the OperatorList and expose that information in the API.
---
[1] Some examples from our test-suite: `annotation-tx2.pdf` where the thumbnail is completely blank, and `bug1737260.pdf` where the thumbnail is missing the "buttons" found on the page.
Currently all of these factories take a bunch of (randomly ordered) parameters, which first of all doesn't look that nice in the `PDFPageView`-class when some parameters are optional.
Furthermore, it also makes deprecation/removal of any existing parameter a *potentially* breaking change.
Finally, using an Object will provide a small amount of "documentation" at the call-site which isn't really the case with a bunch of "regular" parameters.
Note that all of the `viewer component` examples still work as-is with this patch, which is why I don't believe that we necessarily have to deprecate in the usual fashion.
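To illustrate the pattern, here's a hypothetical before/after sketch; the parameter names are illustrative rather than the actual factory signatures:

```js
// Before: positional parameters; optional ones force awkward call-sites.
// const textLayer = factory.createTextLayerBuilder(div, pageIndex, viewport, true);

// After: a single options object, order-independent and self-documenting.
const textLayer = factory.createTextLayerBuilder({
  textLayerDiv: div,
  pageIndex,
  viewport,
});
```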
Currently some `OPS.beginAnnotation` arguments will contain a `Number` value for the `isUsingOwnCanvas`-parameter, or in some cases an `undefined` value, which is inconsistent from an API perspective.
We can undo/redo a command which will at some point add a command to the queue: typically
it can happen when redoing an addition.
So the idea is to lock the queue when undoing/redoing.
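A minimal sketch of the idea (not the actual implementation):

```js
class CommandQueueSketch {
  #commands = [];
  #position = -1;
  #locked = false;

  add(cmd) {
    if (this.#locked) {
      return; // The command originates from an undo/redo replay; drop it.
    }
    this.#commands.length = ++this.#position; // Truncate any redo tail.
    this.#commands.push(cmd);
    cmd.do();
  }

  undo() {
    if (this.#position < 0) {
      return;
    }
    this.#locked = true;
    this.#commands[this.#position--].undo();
    this.#locked = false;
  }

  redo() {
    if (this.#position === this.#commands.length - 1) {
      return;
    }
    this.#locked = true;
    this.#commands[++this.#position].do();
    this.#locked = false;
  }
}
```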
Rather than always disabling `PDFThumbnailView.setImage` as soon as the user has changed the visibility of the Optional Content, we can utilize the new method added in the previous patch to improve thumbnail performance. Note in particular how, in the old code, even *resetting* of the Optional Content to its default state wouldn't enable `PDFThumbnailView.setImage` again.
While slightly unrelated, this patch also removes the `PDFThumbnailViewer._optionalContentConfigPromise`-property since it's completely unused.
This will allow us to improve the `PDFThumbnailView.setImage` handling in the viewer, and thanks to the added caching this should be reasonably efficient.
Given that Optional Content visibility is only intended/supported to be updated via the `OptionalContentConfig.setVisibility`-method, this patch actually enforces that now.
Note that this will be used by the next patch in the series, and will help prevent inconsistent state in the `OptionalContentConfig`-class.
*Please note:* This patch also uncovered a pre-existing bug, related to iterating through the visibility groups in the constructor, for the `baseState === "OFF"` case.
- After undoing a deletion of several editors, they appeared to be selected (they had a red border)
when in fact they were not; consequently, this patch removes the selectedEditor class when
an editor is removed;
- Add a test with some ink editors.
The previous version was maybe functional, but definitely painful to maintain
(and maybe more efficient... I don't know), so this patch aims to simplify it,
and it adds some basic unit tests.
Given that the SVG back-end is not defined anywhere except in GENERIC builds, we can remove a bit more unnecessary code in e.g. the Firefox PDF Viewer.
The elements in the annotationEditor layer are rearranged to make them
more accessible, but we must draw them in the order they have been created,
hence this patch adds a z-index to the editors.
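A sketch of the approach (the helper name is hypothetical):

```js
// Assign each editor a z-index reflecting its creation order, so that
// re-ordering the DOM for accessibility doesn't change the paint order.
let zIndexCounter = 1;

function setCreationZIndex(editorDiv) {
  editorDiv.style.zIndex = zIndexCounter++;
}
```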
- This way, the keyboard callbacks are called even if the page doesn't have
focus, hence the user doesn't have to guess that they must first click
on the page, which is a bit painful especially in Ink mode.
- Add two keyboard shortcuts to commit a Freetext editor (ctrl+enter and
escape).
In the referenced PDF document the fonts have /CIDToGIDMap-entries that cannot be loaded. Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /CIDToGIDMap-entries and fallback to simply assume that no such data is available.
Given that this is *clearly* a case of a corrupt PDF document, there's no guarantee that this will "fix" things in the general case since a /CIDToGIDMap may be *required* in order for some composite fonts to render correctly. However, attempting to render *something* is surely better than skipping a font altogether.
- In the annotationEditorLayer, reorder the editors in the DOM according to
the position of the elements on the screen;
- add an aria-owns attribute on the "nearest" element in the text layer
which points to the added editor.
- by using the global clipboard, it'll be possible to copy from one
pdf and paste into another one;
- it'll allow editing a previously created annotation;
- copy the editors in the current page.
Previously, we had to set the #allowClick property by hand, which was
a bit painful because it's easy to overlook one case or another.
So with this patch a new editor (for now the FreeText one only, because the
Ink one is a bit different) is created on the first click if none is selected
on mousedown; otherwise the first click will just commit the data and then the
second will create a new editor.
The problem is clearly visible when the thickness is at its maximum.
It's mainly because the thickness was not taken into account when
translating the div, but it was when the line was drawn on the canvas.
Note that these cases, which are all in older code, were found using the [`unicorn/no-for-loop`](https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/no-for-loop.md) ESLint plugin rule.
However, note that I've opted not to enable this rule by default since there's still *some* cases where I do think that it makes sense to allow "regular" for-loops.
Given that the SVG back-end is not defined anywhere except in GENERIC builds, we can remove a little bit more unnecessary code in e.g. the Firefox PDF Viewer.
Note how we currently throw a "raw" Error, which is problematic since all of the `PartialEvaluator.loadFont` call-sites expect a Promise to be returned. Furthermore, this also means that we don't benefit from the fallback code-path that now exists below.
*Please note:* Unfortunately I don't have a test-case that fails without this patch, since it's something I happened to notice when reading the code while working on another patch.
Previously the ink editor was created only on a mouseover event, but on a touch screen
there is no "fingerover" event...
The idea behind creating the ink editor on mouseover was to avoid having
a canvas on each visible page.
So now, when the editor is created, the canvas has dimensions 1x1 and
only when the user starts drawing are the dimensions set to the page ones (see the sketch below).
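A sketch of that flow, with hypothetical names:

```js
function createInkCanvas(pageDiv) {
  const canvas = document.createElement("canvas");
  canvas.width = canvas.height = 1; // Placeholder size: costs (almost) nothing.
  pageDiv.append(canvas);

  canvas.addEventListener(
    "pointerdown",
    () => {
      // First stroke: adopt the page dimensions before any drawing happens.
      canvas.width = pageDiv.clientWidth;
      canvas.height = pageDiv.clientHeight;
    },
    { once: true }
  );
  return canvas;
}
```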
The current version 2.5 is from September 2014, which is almost 8 years
old now. The new version 3.19.0 is from November 2017, which is still
almost 5 years old, but is a step forward towards eventually using the
most recent version. Note that we currently can't update any further;
see #11802 for the details.
Fortunately using this newer version only required a few changes:
- The `ttx` output regexes needed updating to ignore comments that `ttx`
now puts after some XML nodes (`<!-- ... -->` and `/* ... */`).
- The `ttx` invocation now explicitly uses `python2` (except on Windows
where this alias doesn't exist) since otherwise the font tests can't
be run on modern systems anymore given that `python` is nowadays an
alias for `python3`, and it now points at the new location of the
`ttx.py` file since the `Tools` folder got removed.
- The note about needing a 32-bit Python 2.6 version is dropped since
it's obsolete: this version (and also the existing one) works
just fine on a 64-bit Python 2.7 as well.
We want to avoid adding regular `id`s to xfaLayer-elements, since that means that they become "linkable" through the URL hash in a way that's not supported/intended. This could end up clashing with "named destinations", and that could easily lead to bugs; see issue 11499 and PR 11503 for some context.
Rather than using `id`s, we'll instead use a *custom* `data-element-id` attribute such that it's still possible to access the DOM-elements directly if needed. *Please note:* This is basically the xfaLayer-equivalent of PR 15057.
- because of rounding errors it led to resizing the ink container slightly,
again and again;
- when zooming, the size changes but not the ratio, so in this case we
don't need to change the dimensions of the container.
`HTMLSectionElement` is not part of the DOM, so the generated typescript definitions contain a non-existing type.
HTML Section elements have to be handled as simple `HTMLElements`.
Also fixes punctuation and lint problems.
[jsdoc] Fix failing TypeScript builds caused by a wrong type.
Rather than including all of this external code in the PDF.js repository, we should be using the npm package instead.
Unfortunately this is slightly more complicated than you'd hope, since the `fit-curve` package (which is older) isn't directly compatible with modern JavaScript modules.
In particular, the following cases needed to be considered:
- For the development viewer (i.e. `gulp server`) and the unit-tests, we thus need to build a fitCurve-bundle that can be directly `import`ed.
- For the actual PDF.js build-targets, we can slightly reduce the sizes by depending on the "raw" `fit-curve` source-code.
- For the Node.js unit-tests, the `fit-curve` package can be used as-is.
- this way the context menu in Firefox can take into account what we
have in the clipboard, whether an editor is selected, ...;
- when the user clicks on a context menu item, an action will be
triggered, hence this patch adds what is required to handle it;
- some tests will be added in the Firefox patch.
This extends PR 13461, by also building a fallback bounding box for Type3 fonts that contain a much too small /FontBBox-entry.
*Please note:* While this patch improves things overall, copy-and-pasting still doesn't work perfectly for this document. In particular the lowercase letter "c" cannot be selected/copied, however this can be reproduced in both Adobe Reader and PDFium (in Google Chrome) too, which is caused by a lack of proper /ToUnicode-data in the PDF document.
Rather than forcing the user to *manually* call `setDimensions`, which is also breaking any existing third-party code, it seems that we can simply let the `AnnotationLayer.{render, update}`-methods handle that internally.
As far as I can tell, based on testing manually in the viewer *and* running the browser-tests, everything still appears to work correctly with this patch.
After the changes in PR 15036, the trigger-element created in `FileAttachmentAnnotationElement.render` is now too small. This can be fixed by using the same approach as in PR 15065, and the patch can be tested using the `annotation-fileattachment.pdf` document in the test-suite.
- Simplify how we look-up the DOM-element, which should also be a tiny bit more efficient.
- Use private class-fields, rather than property-names prefixed with underscores.
- Inline the `#updateBar` helper-method directly in the `percent`-setter, since having a separate method doesn't seem necessary in this case.
- Set the `indeterminate`-class on the ProgressBar DOM-element, to simplify the code.
Finally, also (slightly) re-factors the `PDFViewerApplication.progress`-method to make it a bit smaller.
- for example in the Dusk theme (Windows 11), black appears to be white, so
the user will draw something in white. But if they want to print or
save, the used color must be black.
- fix a bug with the color input which only accepts hex string colors;
- adjust outline color of the selected/hovered editors in HCM.
Now that it's possible to change the font-size, this placeholder string feels a little bit long (especially for larger font-sizes).
Given that Editing is not enabled/released yet, I hope that it should be fine to update this without changing the l10n-id.
This replaces the boolean `annotationEditorEnabled` option/preference with a "proper" `annotationEditorMode` one. This way it's not only possible for the user to control if Editing is enabled/disabled, but also which *specific* Editing-mode should become enabled upon PDF document load.
Given that Editing is not enabled/released yet, I cannot imagine that changing the name and type of the option/preference should be an issue.
Note that even though Puppeteer got a major version bump the changelog
doesn't include compatibility changes that are relevant to us; please
see https://github.com/puppeteer/puppeteer/releases.
Currently we're loading the `web/annotation_layer_builder.css` and `web/xfa_layer_builder.css` files *directly* during the reference tests.
This becomes a problem if we want to reduce duplication in the CSS-files, e.g. by placing *common* rules in the `web/pdf_viewer.css` file.
Given that `gulp components` is already being utilized when running tests, we can thus use that to instead depend on the *entire* viewer-components CSS-file in the reference tests.
Similar to other recent patches, see e.g. PR 15057, we don't want to add these kind of `id`s to DOM-elements since they shouldn't become "linkable" through the URL hash.
*Please note:* This patch can be tested, in the viewer, with e.g. `bug1737260.pdf` from the test-suite.
- As in the annotation layer, use percent instead of pixels as unit;
- handle the rotation of the editor layer, allowing editing when the rotation
angle is not zero;
- the different editors are rotated counterclockwise in order to be usable
when the main page is itself rotated;
- add support for saving/printing rotated editors.
Given that Annotations can also have an `OC`-entry, we need to take that into account when generating their operatorLists.
Note that in order to simplify the patch the `getOperatorList`-methods, for the Annotation-classes, were converted to be `async`.
- the annotations must be rendered in chronological order;
- fix a bug in document.js which prevented a saved pdf from being read correctly in Acrobat:
there is no need to reset the xref state; it's done in worker.js once everything
has been saved.
Given that printing is triggered *synchronously* in browsers, it's thus possible for scripting (in PDF documents) to modify the Annotation-data while printing is currently ongoing.
To work-around that we add a new printing-specific `AnnotationStorage`, where the serializable data is *frozen* upon initialization, which the viewer can thus create/utilize during printing.
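A sketch of the idea (the class name is illustrative):

```js
class PrintAnnotationStorageSketch {
  #frozenData;

  constructor(annotationStorage) {
    // Snapshot the serializable data when printing starts, so that scripting
    // cannot mutate what's being printed.
    this.#frozenData = structuredClone(annotationStorage.serializable);
    Object.freeze(this.#frozenData);
  }

  get serializable() {
    return this.#frozenData;
  }
}
```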
For encrypted PDF documents without the required permissions set, this patch adds support for disabling of Annotation-editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true` (since PDF document permissions could be seen as user hostile).[1]
As I started looking at the issue, it soon became clear that *only* trying to fix the issue, without slightly re-factoring the surrounding code, would be somewhat difficult.
The following is an overview of the changes in this patch; sorry about the size/scope of this!
- Use a new `AnnotationEditorUIManager`-instance *for each* PDF document opened in the GENERIC viewer, to prevent user-added Annotations from "leaking" from one document into the next.
- Re-factor the `BaseViewer.#initializePermissions`-method, to simplify handling of temporarily disabled modes (e.g. for both Annotation-rendering and Annotation-editing).
- When editing is enabled, let the Editor-buttons be `disabled` until the document has loaded. This way we avoid the buttons becoming clickable temporarily, for PDF documents that use permissions.
- Slightly re-factor how the Editor-buttons are shown/hidden in the viewer, and reset the toolbar-state when a new PDF document is opened.
- Flip the order of the Editor-buttons and the pre-existing toolbarButtons in the "toolbarViewerRight"-div. (To help reduce the size, a little bit, for the PR that adds new Editor-toolbars.)
- Enable editing by default in the development viewer, i.e. `gulp server`, since having to (repeatedly) do that manually becomes annoying after a while.
- Finally, support disabling of editing when `pdfjs.enablePermissions` is set; fixes issue 15049.
---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
Note how the "page"-div, "canvasWrapper"-div, and `textLayer`-div all have *integer* dimensions (rounded down) rather than using the "raw" viewport-dimensions.
Hence it seems reasonable that the same should apply to the "annotationLayer"-div, now that it has explicit dimensions set.
- Let the `Page.save`-method filter out "empty" entries, similar to the `Page._parsedAnnotations`-getter, since that on its own already simplifies the "SaveDocument"-handler a tiny bit.
- The existing `reduce` and `concat` construction isn't exactly a wonder of readability :-)
Thanks to modern JavaScript features it should be possible to replace all of this with `Array.prototype.flat()` instead, which at least to me feels a lot easier to understand.
There are obviously cases where using `concat` makes perfect sense, since that method doesn't change any of the existing Arrays; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/concat
However, in a few cases throughout the code-base that's not an issue and using `concat` only leads to unnecessary intermediate allocations. With modern JavaScript we can thus replace those with a combination of `push` and spread-syntax, which wasn't originally possible when the code was written.
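For instance (plain JavaScript, not code from the patch):

```js
const nested = [[1, 2], [3], [4, 5]];

// Old pattern: correct, but not exactly self-explanatory.
const flatOld = nested.reduce((acc, arr) => acc.concat(arr), []);

// New pattern: same result, clearer intent.
const flatNew = nested.flat();

// Where mutating the target is fine, `push` + spread-syntax avoids the
// intermediate Array that `concat` would allocate:
const target = [0];
target.push(...flatNew); // instead of `target = target.concat(flatNew);`
```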
- each annotation has its coordinates/dimensions expressed as percentages,
hence it's correctly positioned whatever the scale factor is;
- the font sizes are expressed as percentages too, and the main font size
is scaled thanks to a css var (--scale-factor); see the sketch after this list;
- the rotation is now applied on the annotationLayer div;
- this patch improves the rendering of some strings where the glyph spacing
was not correct (it's a Firefox bug);
- it helps to simplify the code and it should slightly improve the update of
the page (on zoom or rotation).
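A rough sketch of the positioning scheme (the --scale-factor variable is from the patch; the helper and its parameters are illustrative):

```js
function positionAnnotationSection(section, rect, fontSize, pageWidth, pageHeight) {
  const { style } = section;
  // Percent-based coordinates/dimensions: no JavaScript updates are needed
  // when the scale factor changes.
  style.left = `${(100 * rect.x) / pageWidth}%`;
  style.top = `${(100 * rect.y) / pageHeight}%`;
  style.width = `${(100 * rect.width) / pageWidth}%`;
  style.height = `${(100 * rect.height) / pageHeight}%`;
  // The main font size only needs the scale factor applied once, via the CSS var:
  style.fontSize = `calc(${fontSize}px * var(--scale-factor))`;
}
```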
We want to avoid adding regular `id`s to Annotation-elements, since that means that they become "linkable" through the URL hash in a way that's not supported/intended. This could end up clashing with "named destinations", and that could easily lead to bugs; see issue 11499 and PR 11503 for some context.
Rather than using `id`s, we'll instead use a *custom* `data-element-id` attribute such that it's still possible to access the Annotation-elements directly.
Unfortunately these changes required updating most of the integration-tests, and to reduce the amount of repeated code a couple of helper functions were added.
This option is not used, nor has it ever been used, in the *built-in* Firefox PDF Viewer. Hence we can define it only for the environments where it makes sense instead.
This should really have been done as part of PR 12470, since it's now possible to directly set the `defaultUrl`-option without having to fallback to `var`-usage.
- Since the border belongs to the section containing the HTML
counterpart of an annotation, this section must be hidden when
a JS action requires it;
- it wasn't possible to hide a button using JS.
- Right now, we must select the tool, then click to select a page and
click again to start drawing, which is a bit painful;
- so just create a new ink editor when we're hovering over a page without one.
Given that the SVG back-end is not defined anywhere except in GENERIC builds, we can remove a little bit of unnecessary code in e.g. the Firefox PDF Viewer.
This appears to be a Microsoft-specific version of the regular Arial font, hence we simply map this to Helvetica in the same way that we treat many other Arial-named fonts.
Apparently the ESLint rule added in PR 15031 wasn't able to catch all cases that can be converted, which is probably not all that surprising given how some of these call-sites look.
- Use `Element.prepend()` to insert nodes before all other ones in the element, rather than using `firstChild` with `insertBefore`-calls; see https://developer.mozilla.org/en-US/docs/Web/API/Element/prepend
- Fix one *incorrect* `insertBefore` call, in the AnnotationLayer-code.
Initially the patch simply changed that to an `Element.before()`-call, however that broke one of the integration-tests. It turns out that the `index` may try to access a non-existent select-child, which triggers undefined behaviour; note the warning in https://developer.mozilla.org/en-US/docs/Web/API/Node/insertBefore#parameters
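For reference, the simplification is essentially the following (the element lookups are illustrative):

```js
const parent = document.querySelector(".annotationLayer");
const node = document.createElement("section");

// Modern, shorter form:
parent.prepend(node);

// Old equivalent:
// parent.insertBefore(node, parent.firstChild);
```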
This only adds the minimum entries required in order to render the referenced document correctly, rather than trying to support "all" Hebrew glyphs, to ensure that all lines in `getGlyphMapForStandardFonts` are covered by tests.
*Please note:* The dates below are still a little ways off, however that obviously won't affect the existing PDF.js releases. Hence I think that we can make these changes now, since by the time of the *next* official PDF.js release they'll likely match up pretty well.[1]
While we "support" some (by now) fairly old browsers, that essentially means that the library (and viewer) will load and that the basic functionality will work as intended.[2]
However, in older browsers, some functionality may not be available and generally we'll ask users to update to a modern browser when bugs (specific to old browsers) are reported.[3]
Since we've previously settled on only supporting browsers/environments that are approximately *three years old*, this patch updates the minimum supported browsers/environments as follows:
- Chrome 76, which was released on 2019-07-30; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
- Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
- Safari 13, which was released on 2019-09-19; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_13
- Node.js 14, which was released on 2020-04-21 (all older versions have reached EOL); see https://en.wikipedia.org/wiki/Node.js#Releases
---
[1] Given that the releases usually happen every two to three months.
[2] Assuming that a `legacy/`-build is being used, of course.
[3] In general it's never a good idea to use old/outdated browsers, since those may contain *known* security vulnerabilities.
Fixes two recent "Code scanning alerts" on GitHub, which likely happened because these calls originally used `parseInt` instead (during initial development).
After the changes in PR 14998, these operators are now no-ops in the `src/display/canvas.js` code and should no longer be necessary.
Given that `beginAnnotations`/`endAnnotations` are not in the PDF specification, but are rather *custom* PDF.js operators, it seems reasonable to stop using them now that they've become no-ops.
The update in https://bugzilla.mozilla.org/show_bug.cgi?id=1773794 failed, because the `editorNone` icon is identical to a pre-existing one. Given that all of the editor-icons are simply placeholders for now, we can just make a tiny change to the SVG-paths to prevent these kind of problems.
While `TextLayerRenderTask` apparently makes sense in TypeScript environments, given that it's being returned by the `renderTextLayer`-function in the API, we really don't want to extend the *public* API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually.
Hence we follow the same pattern as in PR 14013, and add some very basic unit-tests to ensure that `renderTextLayer` always returns a `TextLayerRenderTask`-instance as expected.
In PR #14717, the type was changed from an HTMLElement to a DocumentFragment.
This broke TypeScript projects that use an HTMLElement container.
To remedy this, we extend the type of container to also include HTMLElement.
- Approximate the drawn curve by a set of Bezier curves, using the
js code from https://github.com/soswow/fit-curves.
The code has been slightly modified in order to make the linter
happy.
This only applies to *corrupt* PDF documents, where Annotations are missing the required /Rect-entry. Rendering PopupAnnotations unconditionally shouldn't be a problem, since we're not using a `BaseSVGFactory`-instance in that case.
In the "no fontSize available" code-path, in the `ChoiceWidgetAnnotation._getAppearance` method, we don't provide the necessary second argument when calling the `_getTextWidth`-method which will cause errors to be thrown.
Because of a capitalization error, the MIME type wasn't actually being set correctly. However, please note that downloading of font files still worked correctly which is probably why this has gone unnoticed since 2014.
- each annotation must be rendered independently of the others. So
after having rendered each annotation, the canvas states are reset
in order to have something clean to render the next one.
This method/function was added only for the `gulp image_decoders`-builds, and is completely unused elsewhere (e.g. in the Firefox PDF Viewer).
While this only reduces the size of the *built* `pdf.worker.js` file by a little over 1 kB, it can't hurt to remove completely unused code from the "normal" builds.
While calling `JSON.stringify(...)` on a class-instance obviously "works" (as in it doesn't throw), since it's really just an Object, it doesn't really make much sense in the context of the `AnnotationStorage.hash`-getter.
Also, access the *inverse* Viewport-transform correctly in `FreeTextEditor.serialize` to prevent errors being thrown when that method is invoked.
Finally, slightly updates the `AnnotationStorage.serializable`-getter to improve consistency within the class.
*This fixes a regression from PR 14754.*
We didn't lookup the image-data correctly, with the result that we tried to render some ImageMasks using a string rather than the intended TypedArray. To make matters worse, this code-path was apparently not *properly* covered by existing test-cases.
Given the differences between XFA documents and "normal" PDF documents, we don't support editing of the former ones. Hence, when an XFA-document is opened, we temporarily disable the editor-buttons.
- Ensure that the modified-warning won't be displayed, when navigating away from the viewer, if the user has added custom Annotations and then *removed all* of them.
- Ensure that the *initial* editor-buttons state, i.e. the `toggled`-class, is correctly displayed in the toolbar when the viewer loads.
- Tweak the CSS-classes for the editor-buttons, such that they use the correct focus/hover-rules (similar to the sidebar-buttons).
- Remove a no longer accurate comment from the `BaseViewer.annotationEditorMode`-setter.
- Address a couple of *smaller* outstanding review comments, including some re-formatting changes, from PR 14976.
In PR 14710 we only included the JavaScript-part of the polyfill, however we probably need to include the CSS as well to reduce the risk of problems in older browsers.
With the recent CSS-related improvements in the `preprocess`-function we could probably have included this conditionally in the `viewer.css` file. However, considering that the `<dialog>` polyfill-code is only invoked when actually needed it seemed most appropriate/correct to lazy-load the polyfill-CSS as well.
This patch contains a small optimization specifically for the case when `getDocument` is called with TypedArray-data. In that case we'll still hold onto that data, which could obviously be large, even after the "GetDocRequest"-message has been sent to the worker-thread.
In practice this will most likely not affect memory usage in any noticeable way, since the application calling `getDocument` will probably also be keeping a reference to the TypedArray-data. However, it seems like a good idea to ensure that the PDF.js API *itself* won't unnecessarily keep this data alive.
- it's a regression from PR #14247:
- before the PR, the button was rendered on the canvas whatever its status was;
- after the PR, the button image was moved to another canvas, so when the button is not renderable
(because it has no actions) the image is not added to the HTML element;
- the buttons in the pdf in bug 1737260, or in the pdf in #14308, were not visible;
- make the button always renderable, but don't add the link element if it's useless.
- right now we're using the font size from the pdf itself, but we use another font
in the annotation layer. So this size doesn't really make sense and leads to bad
rendering (see the pdf in #14928);
- use a sans-serif font for the fields containing text (fixes issue #14736);
- remove useless padding in text-based fields (fixes issue #14301);
- text fields allow/disallow scroll bars (see bit 24 in the Ff entry), so use this
value to hide/show scrollbars in the annotation layer.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1771477;
- Hangul contains some syllables which are decomposed when using NFD, hence
the text must be correctly shifted in case it contains some of them (see the example below).
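For example (plain JavaScript):

```js
const s = "각"; // A single Hangul syllable.
console.log(s.length); // 1
console.log(s.normalize("NFD").length); // 3 (initial, vowel, and final jamo)
// Hence match positions found in the normalized string must be shifted
// back to offsets in the original string.
```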
Currently, when range-requests and/or streaming are not supported or for documents opened from `data`-URLs, we'll manually set the `contentDispositionFilename` (assuming it exists and is valid) from the `onOpenWithData`-callback in `PDFViewerApplication.initPassiveLoading`.
However, because of a small oversight in `PDFViewerApplication._initializeMetadata`, this *cached* `contentDispositionFilename` would be ignored and we'd only attempt to use the one returned by `PDFDocumentProxy.getMetadata` in the API (which in the cases outlined above will always be empty).
Also, to ensure that the document properties dialog always displays the *correct* fileName, we'll now look it up using the exact same method as in the viewer itself (via a new callback-function).
With the exception of `isThumbnailViewVisible`, these getters are completely unused. Generally speaking, using the `visibleView`-getter directly works just as well and seems (at least to me) to be overall preferable considering how our classes are usually implemented.
Currently, when non-standard `pageColors` are specified, the thumbnails will look inconsistent depending on how they're created.
The thumbnails that are created by downsizing the *page* canvases will obviously use the `pageColors` as intended, however the thumbnails which are rendered *directly* will always use the default colors.
Over time, as we've been introducing JavaScript code to modify CSS variables, we've been adding shorthand properties to various classes to reduce unnecessary repetition when accessing the document-styles.
Rather than repeating this in multiple places, it seems overall simpler to just introduce a constant and re-use that throughout the viewer instead.
In the `src/display/canvas.js` code the `d1` operator will be used to set the clipping region, and it obviously cannot be empty since that prevents the Type3-glyph from rendering.
Also, the patch removes an outdated comment; refer to PR 12718.
After the changes in https://bugzilla.mozilla.org/show_bug.cgi?id=1757771, that simplified the MOZCENTRAL downloading code, the `ChromeActions.download`-method will no longer invoke the `sendResponse`-callback.
Hence it should no longer be necessary for the `DownloadManager`, in the MOZCENTRAL viewer, to use `FirefoxCom.requestAsync` since no response is ever provided.[1] For the allocated BlobURLs, they should (hopefully) be released when navigating away from the viewer.
---
[1] Note that that was *already* the case, for one of the previous code-paths in the `ChromeActions.download`-method.
After the changes in https://bugzilla.mozilla.org/show_bug.cgi?id=1757771, that simplified the MOZCENTRAL downloading code, the `sourceEventType` is now completely unused and should thus be removed (in my opinion).
Furthermore, with these changes, we also no longer need a *separate* internal "save"-event and can instead just use the older "download"-event everywhere.
This limits the heuristics for handling of incomplete path operators, see PR 9838, to only apply to *sequences* of such operators. In practice a couple of invalid path operators are (hopefully) unlikely to completely break rendering, whereas a sequence of them will easily lead to fairly chaotic rendering artifacts.
The original `ProgressBar`-functionality is very old, and could thus do with some general clean-up.
In particular, while it currently accepts various options those have never really been used in either the default viewer or in any examples. The sort of "styling" that these options provided are *much better*, not to mention simpler, done directly with CSS rules.
As part of these changes, the "progress" is now updated using CSS variables rather than by directly modifying the `style` of DOM elements. This should hopefully simplify future changes to this code, see e.g. PR 14898.
Finally, this also fixes a couple of other small things in the "mobile viewer" example.
While zeroing the temporary `canvas` makes sense, manually clearing the canvas and its context doesn't really accomplish anything since those are tied to the scope of the method.
We need to wait for UI rendering to start *before* getting the CSS variable values, since otherwise the values will be `NaN`.
This is only an issue if the viewer is completely hidden during loading, e.g. in a `display: none` iframe-element.
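A sketch of the failure mode (the CSS variable name is hypothetical):

```js
function readCssVariable(name) {
  const styles = getComputedStyle(document.documentElement);
  // While the viewer is hidden the computed value can be the empty string,
  // and `parseFloat("")` is NaN.
  return parseFloat(styles.getPropertyValue(name));
}

const width = readCssVariable("--thumbnail-width");
if (Number.isNaN(width)) {
  // Too early: re-read the variable once UI rendering has actually started.
}
```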
When pre-processor blocks are being removed, since they don't apply to the current build target, we may currently end up with consecutive blank lines.
While this is obviously not a big issue, it's nonetheless undesirable and we can adjust the `writeLine` function to prevent that.
*This is a follow-up to PR 14886, which "broke" this.*
In addition to fixing the issue, using an Array and `join`-ing it at the end may also be a tiny bit more efficient than using a growing string.
*This patch can be tested, in the viewer, using the `annotation-fileattachment.pdf` document from the test-suite.*
It seems that the code to delay dispatching of the "attachmentsloaded"-event, when `attachmentsCount === 0`, is now effectively broken.[1]
Rather than only using an arbitrary timeout, the new code will instead wait for an "annotationlayerrendered"-event and only *fallback* to using a timeout.
---
[1] The timing of the annotationLayer-rendering changed slightly in PR 14247, and the old code in `PDFAttachmentViewer` wasn't good enough to handle that.
If the computed background/foreground colors are identical, the `canvas` would be rendered mostly blank with only images visible. Hence it seems reasonable to also ignore the `pageColors`-option in this case.
Also, the patch tries to *briefly* outline the various cases in which we ignore the `pageColors`-option in a comment.
Given that the new API-option is an Object named `pageColors`, with `background`/`foreground` keys, it occurred to me that it'd be slightly more consistent if the options/preferences names fully reflected that.
An old shortcoming of the `preprocessCSS`-function is its complete lack of support for our "normal" defines, which makes it very difficult to have build-specific CSS rules. Recently we've started using specially crafted comments to remove CSS rules from the MOZCENTRAL build, but (ab)using the `preprocessCSS`-function in this way really doesn't feel great.
However, it turns out to be surprisingly simple to instead use the "regular" `preprocess`-function for the CSS files as well. The only special-handling that's still necessary is the helper-function for dealing with CSS-imports, but apart from that everything seems to just work.
One reason, as far as I can tell, for having a separate `preprocessCSS`-function was likely that we originally used *lots* of vendor-prefixed CSS rules in our CSS files. With improvements over the years, especially thanks to Autoprefixer and PostCSS, we've been able to remove *almost* all non-standard CSS rules and the need for special-casing the CSS parsing has mostly vanished.
*Please note:* As part of testing this patch I've diffed the output of `gulp generic`, `gulp mozcentral`, and `gulp chromium` against the `master`-branch to check that there was no obvious breakage.
These methods are completely unused in the Firefox PDF Viewer, and were only added to supplement the `PDFViewerApplication.{bindEvents, bindWindowEvents}`-methods since third-party implementations may apparently need to remove event listeners (see PR 8525).
However, in the MOZCENTRAL build that's just dead code and this patch reduces the size of the *built* `web/viewer.js`-file by `~3.5` kB.
*This patch can be tested, in the viewer, using the `annotation-fileattachment.pdf` document from the test-suite.*
Note how the `FileSpec`-implementation already uses `stringToPDFString` during the filename lookup, see cfac6fa511/src/core/file_spec.js (L70)
Hence there's no reason to repeat that again in the `FileAttachmentAnnotationElement`-constructor, and we can thus simplify the "fileattachmentannotation"-event handling a little bit.
This new CSS variable will allow us to simplify a couple of different viewer components, since we no longer need to use JavaScript-based hacks and can directly set the CSS rules instead. In particular:
- The `BaseViewer`-handling, used as part of the code that will center pages *vertically* in PresentationMode, can be simplified.
By using CSS to control the height of the "dummy"-page we avoid unnecessarily invalidating the scale-value, which can reduce *some* unneeded re-rendering while PresentationMode is active.
- The `SecondaryToolbar.#setMaxHeight`-method, and its associated parameters, are no longer necessary and can be completely removed.
Note that in order for things to work correctly in general, the new `--viewer-container-height` CSS variable must potentially be updated on any window-based "resize"-event (even when there's no zooming). While this is currently only done in the default viewer, that shouldn't be an issue since neither PresentationMode nor Toolbar-functionality is included in the "viewer components".
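A sketch of keeping the variable updated (the variable name is from the patch; the element lookup is illustrative):

```js
const viewerContainer = document.getElementById("viewerContainer");
const docStyle = document.documentElement.style;

function updateContainerHeightCssVar() {
  docStyle.setProperty(
    "--viewer-container-height",
    `${Math.round(viewerContainer.clientHeight)}px`
  );
}

// Update on every window-based "resize"-event, and once on load as well.
window.addEventListener("resize", updateContainerHeightCssVar);
updateContainerHeightCssVar();
```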
- Use the Canvas & CanvasText colors, when they don't have their default
values, as background and foreground colors.
- The colors used to draw (stroke/fill) in a pdf are replaced by the bg/fg
ones according to their luminance.
After recent changes, adding *basic* Spread mode support in PresentationMode has now become reasonably straightforward.
However, documents with *varying* page sizes are non-trivial to handle and would require re-writing (or at least re-factoring) a bunch of the zooming-code.
Hence this PR *purposely* only allows Spread modes to be used, in PresentationMode, for documents where all pages have exactly the same size. While this obviously isn't a fully complete solution, it will however cover the vast majority of all documents and should hopefully be deemed good enough for now.
In PR 14112 usage of this *internal* method was reduced, and it thus can't hurt to clean-up things a little bit more.
Note in particular that we can simplify the call-sites by directly passing in the already available `PDFPageView`-instance, since the `id`-property those contain can replace the previous `pageNumber`-parameter[1].
Given that the method name has always been prefixed with an underscore it was thus never intended to be "public", hence we can now enforce that with modern ECMAScript features.
---
[1] There's already a bunch of other spots, throughout the viewer-code, where we assume that the `PDFPageView.id`-property contains proper page *numbers* (and not e.g. indices); note how we initialize the `PDFPageView`-instances in the `BaseViewer.setDocument`-method.
The current `lastModified`-getter, which only contains a time-stamp, is a fairly crude way of detecting if the stored data has actually been changed. In particular, when the `getRawValue`-method is used, the `lastModified`-getter doesn't cope with data being modified from the "outside".
To fix these issues[1], and to prevent any future bugs in this code, this patch introduces a new `AnnotationStorage.hash`-getter which computes a hash of the currently stored data. To simplify things this re-uses the existing `MurmurHash3_64`-implementation, which required moving that file into the `src/shared/`-folder, since its performance should be good enough here.
---
[1] Given how the `AnnotationStorage.lastModified`-getter was used, this would have been limited to *printing* of forms.
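A sketch of such a hash-getter (field names are illustrative; `MurmurHash3_64` is the implementation mentioned above, with the import path matching its move to `src/shared/`):

```js
import { MurmurHash3_64 } from "../shared/murmurhash3.js";

class AnnotationStorageSketch {
  _storage = new Map();

  get hash() {
    const hash = new MurmurHash3_64();
    // Hash the *currently stored* values, so that data modified "from the
    // outside" via getRawValue is detected as well.
    for (const [key, value] of this._storage) {
      hash.update(`${key}:${JSON.stringify(value)}`);
    }
    return hash.hexdigest();
  }
}
```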
- since the resetForm function resets a field value, a calculateNow is consequently triggered.
But the calculate callback can itself call resetForm, hence an infinite recursive loop.
So basically, prevent calculateNow from being triggered by itself.
- in Firefox, the letters entered in some fields were duplicated: "AaBb" instead of "AB".
It was mainly because beforeInput was triggering a Keystroke which was itself triggering
an input value update, and then the input event was triggered.
So in order to avoid that, beforeInput calls preventDefault and then it's up to the JS to
handle the event.
- fields have a property valueAsString which returns the value as a string. In the
implementation it was wrongly used to store the formatted value of a field (2€ when the user
entered 2). So this patch implements valueAsString correctly.
- non-rendered fields can be updated using JS, but when they are, they must store some properties
in the annotationStorage. It was implemented for field values, but it wasn't for
display, colors, ...
- it fixes #14862 and #14705.
Given that we're (ab)using spread-modes in order to ensure that pages are centered *vertically* in PresentationMode, this re-factoring simplifies the code slightly.
Furthermore, in the event that we *possibly* want to try and support spread-modes in PresentationMode[1] this re-factoring will also prevent future duplication.
---
[1] Note that I'm not particularly keen on doing that, since documents with varying page sizes will be annoying to handle.
Given that the `isNodeJS`-constant will, after PR 14858, *always* be `false` in non-GENERIC builds we can simplify a couple of `getDocument`-parameter default values slightly.
The old format, with inline `PDFJSDev`-checks, wasn't exactly a wonder of readability, which was my fault.
This first of all simplifies the file, since we no longer need dummy-classes and can instead *directly* define the actual classes.
Furthermore, and more importantly, this means that we no longer need to bundle this code in e.g. MOZCENTRAL-builds which reduces the size of *built* `pdf.js` file slightly.
Given that the `compileType3Glyph` function *returns* a function, see `drawOutline`, we'll thus keep the surrounding scope alive. Hence it shouldn't hurt to *explicitly* mark the temporary `Uint8Array`s, used during parsing, as no longer needed. Given the current `MAX_SIZE_TO_COMPILE`-value these `Uint8Array`s may be approximately two mega-bytes large *for every* Type3-glyph.
Interestingly enough this appears to be the very first case of *encoded* dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are *somewhat* similar.
Unfortunately newer versions either caused breakage when running the
unit tests manually in a browser or when serving the development viewer.
Given that we hope to use native import maps soon and this dependency
will then be removed anyway, let's pin it for the time being.
This moves the `COMPILE_TYPE3_GLYPHS`/`MAX_SIZE_TO_COMPILE`-checks into the `compileType3Glyph` function itself, which allows for some simplification at the call-site.
These changes also mean that the `COMPILE_TYPE3_GLYPHS`-check is now done *once* per Type3-glyph, rather than every time that the glyph is being rendered.
There are two notable changes here:
- `dommatrix` is a major version upgrade, but looking through the commit
history of their `package.json` file at https://github.com/thednp/dommatrix/commits/master/package.json
(due to the lack of a changelog) I couldn't find any breaking changes.
- `es-module-shims` is a regular update, but it was previously pinned
for causing intermittent breakage when running the unit tests in a
browser manually. Fortunately this cannot be reproduced anymore with
the most recent version, so we can also put the caret back now.
There's no point in having these variables defined (implicitly) as `undefined` in e.g. the Firefox PDF Viewer.
By defining them with `var` and using ESLint ignores, rather than `let`, we can move them into the relevant pre-processor block instead. Note that since the entire viewer-code is placed, by Webpack, in a top-level closure these variables will thus *not* become globally accessible.
Given that `webViewerOpenFileViaURL` only has a single call-site, and also isn't a particularly large/complex function, it doesn't seem necessary for this to be a separate function and hence it's simply inlined instead.
Also, changes the "no valid build-target was set"-case to throw unconditionally since the only way that it could ever be hit is if there are bugs in the `gulpfile`-code.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1264608;
- it's only a partial fix for #3351;
- some tiled images have some spurious white lines between the tiles.
When the current transform is applied, the corners of an image can have
some non-integer coordinates, leading to some extra transparency added
to handle that. So with this patch the current transform is applied on the
point and on the dimensions, in order to end up with only integer values (see the sketch below).
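A sketch of the rounding, assuming a pure scaling transform (a, d) plus translation (e, f):

```js
function drawImageAtIntegerCoords(ctx, img, x, y, w, h) {
  const { a, d, e, f } = ctx.getTransform();
  // Map the corner *and* the opposite corner into device space and round,
  // so adjacent tiles share exact integer edges.
  const tx = Math.round(a * x + e);
  const ty = Math.round(d * y + f);
  const tw = Math.round(a * (x + w) + e) - tx;
  const th = Math.round(d * (y + h) + f) - ty;
  ctx.save();
  ctx.setTransform(1, 0, 0, 1, 0, 0); // Draw directly in device space.
  ctx.drawImage(img, tx, ty, tw, th);
  ctx.restore();
}
```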
As it turns out, most of the code-paths in the `PDFImage`-class won't actually pass the TypedArray (containing the image-data) to the `ColorSpace`-code. Hence we *generally* don't need to force the image-data to be a `Uint8ClampedArray`, and can just as well directly use a `Uint8Array` instead.
In the following cases we're returning the data without any `ColorSpace`-parsing, and the exact TypedArray used shouldn't matter:
- b72a448327/src/core/image.js (L714)
- b72a448327/src/core/image.js (L751)
In the following cases the image-data is only used *internally*, and again the exact TypedArray used shouldn't matter:
- b72a448327/src/core/image.js (L762) with the actual image-data being defined (as `Uint8ClampedArray`) further below
- b72a448327/src/core/image.js (L837)
*Please note:* This is tagged `api-minor` because it's API-observable, given that *some* image/mask-data will now be returned as `Uint8Array` rather than using `Uint8ClampedArray` unconditionally. However, that seems like a small price to pay to (slightly) reduce memory usage during image-conversion.
Currently we're using *both* ids and classes when specifying the button icons, which seems completely unnecessary and only serves to bloat the CSS code.
In general you always need to be careful about CSS specificity, but in these cases that should not actually be a problem.
Also, while not a button, this patch makes a similar simplification for the `pageNumber`-input.
Note how in the `viewer.html` file we're specifying class names for most buttons, despite that not really being necessary. First of all, in many cases those class names are *identical* to the element-ids. Secondly, looking through the CSS rules they are only ever used when specifying button icons.
All-in-all, we should be able to simplify the HTML-code and use the element-ids in the CSS rules instead. Obviously ids have a higher CSS specificity than classes, but given how the old classes were being used that shouldn't be a problem here.
Also, the patch generalizes the styling for buttons (e.g. `viewBookmark`) that are *actually* link-elements.
Finally, while slightly unrelated, this patch also removes a little bit of duplication when specifying the border for `toolbarField`s.
Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees use string-keys.
Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fall back to fetching all destinations, which should be fine since this is the very first case of encoded keys that we've seen.
Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree.
Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)
Searching through all of the files in the `web/`-folder has no *other* hits for the "textButton" string. Hence it's clear that there are no DOM elements actually using this class, and it's thus dead code.
This re-factors the various toolbar separators to *explicitly* specify both their dimensions and margins. Also, for the `horizontalToolbarSeparator`-class we can just set the `background-color` rather than using `border-top`.
Note that the `splitToolbarButtonSeparator`-class currently sets a number of unnecessary CSS rules, since as mentioned by the Firefox Devtools both the `display`- and `z-index`-properties are being ignored because `float` is used.
Finally, there's also no need to set a `z-index` for some of the `:hover`-rules. It's possible that this *was* necessary before the re-design, since the buttons had borders then.
Note how both of the openFile-buttons are always hidden during viewer initialization in the MOZCENTRAL build, i.e. the *built-in* Firefox PDF Viewer. Despite that we still include HTML, CSS, and JavaScript code for these buttons in the build.
This patch *reduces* the size of the `gulp mozcentral` output by `1679` bytes, which isn't a lot but still cannot hurt.
While working on PR 14825, I couldn't help noticing that the code to increment the `count` for cached ImageMasks was repeated multiple times. Hence it makes sense, as far as I'm concerned, to move this into a helper function instead.
These rules became unnecessary with PR 7697, over five years ago, since printing is now done from a `printContainer`-element rather than "directly" using the viewer.
Note how the *entire* `outerContainer`, which contains all of the DOM elements that were being manually hidden, is now being hidden during printing. Furthermore, note also how the print-canvases/images and their containers are using custom CSS-classes[1] rather than re-using the ones from the viewer.
---
[1] See the `printedPage` respectively `xfaPrintedPage` classes.
Rather than modifying the leading/trailing `margin` on the actual toolbar buttons, to achieve appropriate spacing at the left/right edges of the toolbar(s), it seems much more appropriate (and simpler) to just specify an explicit `padding` for the relevant toolbar containers.
Also, for toolbar buttons placed in `splitToolbarButton`-classes we can reduce some complexity around setting the `margin` (since it should always be zero now).
With these changes, we're thus able to get rid of a couple more `!important`-rules.
Currently we only insert optionalContent-data into the operatorList the first time that an image is parsed, which will (in hindsight) obviously cause problems for cached images.
Hence we also need to insert the optionalContent-data in the various worker-thread image caches, such that it can be accessed in the fast-paths that are used to skip re-parsing of images.
In order to reduce the amount of repeated code, this patch also adds a new `OperatorList`-method that takes care of inserting the necessary data in the operatorList.
In the referenced PDF document the fonts have /Encoding-entries that are Streams (containing completely bogus data), which are thus obviously not valid here.
Hence, only when `ignoreErrors` is set, we'll now ignore these corrupt /Encoding-entries and fallback to the existing code to try and infer a usable encoding.
Given that this is *clearly* a case of corrupt PDF documents, there's no guarantee that this will "fix" all such cases, however it's the best that we can do here and shouldn't really be worse than ignoring an entire font.
- Remove a redundant `margin-top` rule for the `.dropdownToolbarButton`. After the (somewhat) recent UI-refresh all buttons now use `margin: 2px 1px;`, which renders the override unnecessary (and getting rid of an `!important`-rule can't hurt).
- Combine two `.toggled::before` rules, since they're identical.
Note that both the `errorWrapper` HTML and JavaScript code is being ignored in the MOZCENTRAL build, i.e. the *built-in* Firefox PDF Viewer, however the CSS rules are still being included.
That seems totally unnecessary, and while we currently don't have full build-target support in the CSS pre-processor we can actually improve things quite easily anyway. By (ab)using the existing CSS pre-processor, which will remove any non-Firefox CSS rules for the MOZCENTRAL build, it's possible to easily stop bundling any CSS rules by using comments that include a `-webkit`-string.
*Please note:* To easily test that this doesn't break the `errorWrapper` in GENERIC builds, try running e.g. `PDFViewerApplication._otherError("test");` in the web-console.
This special-case was added because the original PresentationMode-implementation used some CSS-tricks to hide everything except the current page. With the changes in PR 14112, which added a PAGE scroll-mode, many of the old PresentationMode-specific hacks could thus be removed from both the JS and CSS code.
This patch is yet another (small) clean-up step, to reduce the number of PresentationMode special-cases used throughout the `BaseViewer`. Note in particular that `BaseViewer.update` now works just fine in PresentationMode[1], and that we only need to ensure that the active page won't *accidentally* change because of the PresentationMode-specific zooming that occurs during page-switching.
---
[1] In the event that we ever want to try and support spread-modes in PresentationMode, which I'm really not keen on doing since documents with varying page sizes will be annoying to handle, these changes would be necessary as well.
The styling of the previous/next-buttons and the findInput, with the elements being "glued" together, was supposed to mimic the styling used in the Firefox *browser* findbar. However, after the most recent re-styling of the Firefox browser UI these elements are now visually separated.
Hence it makes sense, as far as I'm concerned, to try and follow this styling for the findbar used in the GENERIC viewer. One benefit of doing this is that we get more consistent styling, since the buttons now look/behave identically in both the main toolbar and the findbar. Additionally this also simplifies the CSS a bit, since a lot of the existing findbar-specific rules can be removed.
- most of the time the current transform is a scaling one (modulo translation),
hence it's possible to avoid applying the transform on each bbox, and to instead
apply it a posteriori;
- compute the bbox in the worker when it's possible.
The spread-mode code in `BaseViewer.#ensurePageViewVisible`-method was initially copied from the `BaseViewer._updateSpreadMode`-method, which means that it's slightly more complicated than actually necessary.
In particular, in the PAGE scroll-mode there can only be *one* spread active at a time and we thus don't need to handle insertion of multiple spread-divs.
Given that we're no longer, after PR 14560, bundling the `web-streams-polyfill`-code in the `legacy`-builds we shouldn't need to exclude it from Babel now.
- it's the second part of the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- some image masks can be used several times but at different positions;
- an image needs to be pre-processed before being rendered:
* rescale it;
* use the fill color/pattern.
- the two operations above are time consuming, so we can cache the generated canvas;
- the cache key is based on the current transform matrix (without the translation part)
and the current fill color, when it isn't a pattern (see the sketch after this list);
- the rendering of the pdf in the above bug is much faster than without this patch.
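A sketch of the caching (the pre-processing helper is hypothetical):

```js
const maskCanvasCache = new Map();

function getCachedMaskCanvas(mask, transform, fillColor) {
  // The translation components are excluded from the key: the same mask
  // drawn at different positions can reuse the same pre-processed canvas.
  const [a, b, c, d] = transform;
  const key = `${a}_${b}_${c}_${d}_${fillColor}`;
  let canvas = maskCanvasCache.get(key);
  if (!canvas) {
    canvas = rescaleAndFillMask(mask, transform, fillColor); // expensive step
    maskCanvasCache.set(key, canvas);
  }
  return canvas;
}
```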
Given that no HTML element has used the `loadingBox`-id for many years, we obviously don't need to try and hide a non-existent element during printing.
Furthermore, we also shouldn't need to change the `overflow`-value for the `viewerContainer`-element during printing. Originally, many years ago now, we printed "directly" using the viewer and then this apparently made sense.
Given that none of these CSS rules are used at all, unless debugging is enabled, it seems completely unnecessary to load them *unconditionally* for all users.[1]
Note that if *both* the `textLayer` and `pdfBug` debugging hash-parameters are specified simultaneously, we'll now load the `PDFBug`-file *twice* (since the code is simpler that way). However, given first of all that none of this is enabled by default and secondly that using those parameters together isn't helpful[2], potentially loading that file twice is hopefully not an issue.
For the `gulp mozcentral` target, the size of the *built* `viewer.css` file is reduced `> 3%` with this patch.
---
[1] For the Firefox built-in PDF Viewer, in order to even be able to access the `PDFBug` functionality, you need to first of all set `pdfjs.pdfBugEnabled = true` manually in `about:config`. Secondly, you then also need to append the `pdfBug=...` hash-parameter to the URL when *initially* loading the document.
[2] Note how the `textLayer`-settings are already, since essentially forever, overriding the highlighting-features of the "FontInspector"-tab.
- avoid calling normalizeRect, which clones the rectangles: it's unnecessary
and time consuming;
- when profiling the pdf in bug 1135277, the time spent in intersect drops
from ~1s to ~30ms.
Because of a bug in previous `core-js` versions, which caused an Error to be thrown if its `structuredClone` polyfill was called with an *explicit* `null`/`undefined` transfer-parameter, the `LoopbackPort`-class contained a work-around.
In the latest `core-js` version this has been fixed, and we can thus simplify our code ever so slightly; please see https://github.com/zloirock/core-js/releases/tag/v3.22.0
This CSS variable is only used together with the `annotationCanvasMap`-functionality in the canvas-code, however its value can be *trivially* computed by using the older `--zoom-factor` CSS variable together with the `PixelsPerInch`-structure.
Rather than having *two different* CSS variables that are this closely linked, it seems better to simplify things by using just one CSS variable instead.
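A minimal sketch of that computation (the container element is assumed; `PixelsPerInch.PDF_TO_CSS_UNITS` is the structure mentioned above):

```js
// Hedged sketch: derive the old CSS variable's value from `--zoom-factor`
// and PixelsPerInch, instead of maintaining a second CSS variable.
const zoomFactor = parseFloat(
  getComputedStyle(canvasContainer).getPropertyValue("--zoom-factor")
);
const viewportScale = zoomFactor * PixelsPerInch.PDF_TO_CSS_UNITS;
```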
According to the CSS, there should be a visible "divider" after the "Page Width" zoom-option. However, this is being ignored in both Mozilla Firefox[1] and Google Chrome, hence the rule is effectively useless now.
Furthermore, the "custom" zoom-option is already being `hidden` using the attribute (in the HTML code) and there should thus be no reason to duplicate this in the CSS as well.
---
[1] Support for *detailed* styling of `<select>`-elements was removed as part of the E10s project.
With just a couple of exceptions, for the `thumbnailView`, all of the sidebarViews share the same basic styling which thus allows for some simplification.
- For the findbar/secondaryToolbar case, the `min-width` rule doesn't really make sense since it's way too small to be useful. Furthermore, the findbar is already specifying its own `min-width` and the secondaryToolbar will (thanks to its buttons) receive a correct/useful width.
- The pageNumber-input already has an *explicit* `width` set, hence setting the `min-width` rule as well is completely unnecessary.
- The treeItem-links are supposed to *compute* their `min-width`, and the static value was only added as a fallback for older browsers without `calc()` support.
In a couple of spots, mostly related to the debugging tools, we're unnecessarily using a somewhat "complex" `background`-format only to specify a solid color. This can be simplified by using `background-color` instead, and the patch also removes a `color`-rule that's being ignored anyway.
After the changes in https://bugzilla.mozilla.org/show_bug.cgi?id=1761839, we no longer need this CSS work-around to prevent the entire `<dialog>` contents from becoming selected when the backdrop is clicked.
When the viewer becomes narrow enough that the sidebar is overlaying the document, which means that the `viewerContainer` is not moving when opening/closing the sidebar, we're currently not removing the `sidebarMoving` CSS class as intended.
While this doesn't cause any *visible* issues, it's nonetheless wrong and should be fixed.
With the changes in PR 8993, a number of the `@media`-related CSS rules became unnecessary. However, it appears that some of these rules were *accidentally* left behind despite being unused now.
Note that previously, when opening the sidebar shifted the position of the main toolbar, we had to take both the sidebar opened *and* closed cases into account in these `@media` rules.
- write uint32 values instead of uint8 ones to avoid the check before clamping;
- unroll the loop that writes data into the buffer (see the sketch below),
but keep a plain loop for the last element of a line: it likely doesn't hurt
that much since it's executed only once per line;
- I tested on a MacBook with an Apple chip, and on Firefox Nightly the new
code is almost 3.5x faster than before (~1.8x with Chrome).
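A minimal sketch of the technique (not the actual patch): expanding a 1-bit mask into 32-bit pixels, with the main loop unrolled per source byte and a small loop for the trailing bits of each line.

```js
// Hedged sketch: write one uint32 per pixel instead of four uint8s, so no
// per-component clamping check is needed.
function expandMaskTo32Bits(src, width, height) {
  const dest = new Uint32Array(width * height);
  // Little-endian RGBA: opaque black and opaque white.
  const BLACK = 0xff000000, WHITE = 0xffffffff;
  const rowBytes = (width + 7) >> 3, fullBytes = width >> 3;
  let destPos = 0;
  for (let i = 0; i < height; i++) {
    let srcPos = i * rowBytes;
    // Unrolled main loop: one full source byte yields eight pixels.
    for (let j = 0; j < fullBytes; j++) {
      const b = src[srcPos++];
      dest[destPos++] = b & 0x80 ? WHITE : BLACK;
      dest[destPos++] = b & 0x40 ? WHITE : BLACK;
      dest[destPos++] = b & 0x20 ? WHITE : BLACK;
      dest[destPos++] = b & 0x10 ? WHITE : BLACK;
      dest[destPos++] = b & 0x08 ? WHITE : BLACK;
      dest[destPos++] = b & 0x04 ? WHITE : BLACK;
      dest[destPos++] = b & 0x02 ? WHITE : BLACK;
      dest[destPos++] = b & 0x01 ? WHITE : BLACK;
    }
    // Keep a plain loop for the trailing bits; it runs only once per line.
    for (let k = 0; k < (width & 7); k++) {
      dest[destPos++] = src[srcPos] & (0x80 >> k) ? WHITE : BLACK;
    }
  }
  return new Uint8ClampedArray(dest.buffer);
}
```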
*This is yet another installment in a never-ending series of patches that attempt to simplify and improve old code.*
The `fileInput`-element is used to support the "Open file"-button in the `GENERIC` viewer, however this is very old code.
Rather than creating the element dynamically in JavaScript, we can simply define it conditionally in the HTML code thanks to the pre-processor. Furthermore, the `fileInput`-element currently has a number of unnecessary CSS rules, since the element is *purposely* never made visible.
Note that with these changes, the `fileInput`-element will now *always* have `display: none;` set. This shouldn't matter, since we can still trigger the `click`-event of the element just fine (via JavaScript) and this patch has been successfully tested in both Mozilla Firefox and Google Chrome.
- it aims to partially fix the performance issue reported in https://bugzilla.mozilla.org/show_bug.cgi?id=857031;
- the idea is to avoid using byte arrays and instead use ImageBitmap, which is a lot faster to draw (see the sketch after this list):
* an ImageBitmap is Transferable, which means that it can be built in the worker instead of in the main thread:
- this is achieved by using an OffscreenCanvas when it's available; there is a bug to enable them
for pdf.js: https://bugzilla.mozilla.org/show_bug.cgi?id=1763330;
- or by using createImageBitmap: in Firefox a task is sent to the main thread to build the bitmap, so
it's slightly slower than using an OffscreenCanvas.
* it's transferred from the worker to the main thread by "reference";
* the byte buffers used to create the image data have a very short lifetime, so overall memory usage is
lower than before.
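A minimal sketch of the worker-side bitmap creation described above (the helper name is made up):

```js
// Hedged sketch: prefer an OffscreenCanvas when available, otherwise fall
// back to createImageBitmap (slightly slower in Firefox, since a task is
// posted to the main thread).
async function toImageBitmap(imageData /* an ImageData instance */) {
  if (typeof OffscreenCanvas !== "undefined") {
    const canvas = new OffscreenCanvas(imageData.width, imageData.height);
    canvas.getContext("2d").putImageData(imageData, 0, 0);
    return canvas.transferToImageBitmap();
  }
  return createImageBitmap(imageData);
}

// The resulting bitmap is Transferable, so it can be moved (not copied) to
// the main thread: postMessage({ bitmap }, [bitmap]);
```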
- Use the localImageCache for the mask;
- Fix the pdf issue4436r.pdf: it was expected to have a binary stream for the image;
- Move the singlePixel trick from operator_list to image: this way we can use this trick even if it isn't in a set
as defined in operator_list.
With the changes in the previous patch, we can simplify the state-tracking by using the `PresentationModeState`-values directly in the `PDFPresentationMode` class.
We don't need to first check if the Dictionary contains the key, since trying to get a non-existent key simply returns `undefined` and we're already ensuring that the value is a boolean.
Furthermore, we shouldn't need to worry about the `Object.prototype` containing enumerable properties since the checks (in `src/core/worker.js`) done for `Array.prototype` *indirectly* also cover `Object`s. (Keep in mind that an `Array` is just a special kind of `Object` in JavaScript.)
The `pdfOpenParams` parameter has never really made sense in PresentationMode, since e.g. the zoom-value doesn't (generally) agree with the value chosen by the user prior to entering PresentationMode.
This has never mattered all that much, since the `viewBookmark`-button isn't visible in PresentationMode (nor is any other toolbar button for that matter). However, in the `PDFHistory`-implementation we're currently forced to handle this case specifically since we don't want to populate the browser history with nonsensical state.
Hence it makes overall sense, as far as I'm concerned, to tweak the "updateviewarea" event to include a *simplified* `pdfOpenParams` parameter when PresentationMode is active. Given that the `viewer components` don't include PresentationMode functionality, this change thus shouldn't matter for third-party users.
This method was originally added specifically to work-around bugs/issues related to PresentationMode in Google Chrome. Note that prior to PR 14112 we were using some CSS hacks to only show the current page in PresentationMode, and that could lead to the `getVisibleElements` function not always finding the correct elements.
However, after the changes in PR 14112 we're now using the Page-scrolling mode in PresentationMode and consequently there'll only be *a single* page visible at a time. Hence the `BaseViewer._getCurrentVisiblePage` helper method should no longer be needed, and when testing (locally) in Google Chrome everything seems to work correctly now.
The various functionality in `web/debugger.js` is currently *indirectly* added to the global scope, since that's how `var` works when it's used outside of any functions/closures.
Given how this functionality is being accessed/used, not just in the viewer but also in the API and textLayer, simply converting the entire file to a module isn't trivial[1]. However, we can at least export the `PDFBug`-part properly and then `import` that dynamically in the viewer.
Also, to improve the code a little bit, we're now *explicitly* exporting the necessary functionality globally.
According to MDN, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility, all the browsers that we now support have dynamic `imports` implementations.
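A minimal sketch of the dynamic-import approach; `hashParams` and the `init` call are assumptions for illustration only:

```js
// Hedged sketch, inside some async initialization code: only fetch the
// debugging module when the hash-parameter actually enables it.
if (hashParams.has("pdfbug")) {
  const { PDFBug } = await import("./debugger.js");
  // Hypothetical initialization call; the real signature may differ.
  PDFBug.init(document.body, hashParams.get("pdfbug").split(","));
}
```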
---
[1] We could probably pass around references to the necessary functionality, but given how this is being used I'm just not sure it's worth the effort. Also, adding *official* support for these viewer-specific debugging tools in the API feels both unnecessary and unfortunate.
*Please note:* This is possibly bad/wrong in general, but I figured that submitting it for review wouldn't hurt.
It seems that even Adobe Reader doesn't handle the non-ASCII characters that appear in some of the fields correctly, however it should be pretty easy to improve things on the PDF.js side.
- it aims to fix #14685;
- add a basic object to get values from the parsed datasets;
- these annotations don't have an appearance, so we must create one when printing or saving.
Please note that this patch is purposely quite basic, e.g. it doesn't add the polyfill-CSS in order to simplify the build process, and things such as `::backdrop` thus aren't working.
However, this patch does ensure that older browsers can at least still *access* all of the previous overlays and that things like e.g. opening of password-protected documents respectively printing still work.
This will hopefully improve the a11y a little bit in these dialogs, however there's most definitely more that can be done here (by someone more knowledgeable about a11y).
This way we're able to store the `<dialog>` elements directly, which removes the need to use manually specified name-strings thus simplifying both the `OverlayManager` itself and its calling code.
This replaces our *custom* overlays with standard `<dialog>` DOM elements, see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/dialog, thus simplifying the related CSS, HTML, and JavaScript code.
With these changes, some of the functionality of the `OverlayManager` class is now handled natively (e.g. `Esc` to close the dialog). However, since we still need to be able to prevent dialogs from overlaying one another, it still makes sense to keep this functionality (as far as I'm concerned).
PDFScriptingManager uses the `mousedown` and `mouseup` listeners to keep
track of whether the mouse pointer is pressed in the `isDown` flag.
These listeners were registered to run during the bubbling phase of the
event dispatch, which can be interrupted if any of the previous event
listeners stopped the event propagation. An example of that is
`GrabToPan` in web/grab_to_pan.js.
Since the mousedown (and mouseup) listeners of PDFScriptingManager are
free of side effects, and the intention is to always run them, it makes
most sense to register them with the capture flag.
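A minimal sketch of the capture-phase registration (the listener bodies are assumed):

```js
let isDown = false;

// With the capture flag set, these listeners run during the capturing
// phase, before any bubbling-phase listener (e.g. GrabToPan) can call
// stopPropagation().
window.addEventListener("mousedown", () => { isDown = true; }, true);
window.addEventListener("mouseup", () => { isDown = false; }, true);
```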
- it aims to fix issue #14627;
- the basic idea of the recent text refactoring was to only consider the rendered, visible whitespaces.
But sometimes the heuristics aren't correct: although some whitespaces are in the text stream,
they weren't in the text chunks because they were too small. Hence we added some exceptions; for example,
we always add a whitespace when it is between two non-whitespace chars, but only when in the same Tj.
So basically, this patch removes the constraint that the chars be in the same Tj
(by using a circular buffer to save the last two chars), but doesn't add a space when the visible space is really
too small (hence `NOT_A_SPACE_FACTOR`).
Given that the textLayer-code has been using a `DocumentFragment` ever since PR 3356 (back in 2013), simply updating the type of the `container` property should be fine.
This patch also tries to, ever so slightly, improve the grammar of a couple of other properties in the typedef.
After the recent round of patches, I figured that we'd gone as far as possible in replacing `dir`-dependent CSS rules for the viewer.
However, it occurred to me that we could actually use a bit of CSS-trickery to get rid of the remaining ones. More specifically, this was done by defining a CSS variable whose value depends on the document direction and then using that variable together with `calc()` in the affected rules.
*Please note:* I suppose that this could perhaps be seen as a bit too "magical", hence I understand if this patch is ultimately rejected, however this is probably the only simple way to get rid of the remaining `dir`-dependent CSS rules.
The old code would allow an overlay to force close *itself*, before immediately re-opening itself, which actually isn't very helpful in practice since that won't re-run any overlay-specific initialization code.
Given how the overlays are being used this really shouldn't have caused any issues, but it's a bug that we should fix nonetheless.
There's a couple of `getDocument` parameters that should be numbers, but which are currently not *fully* validated to prevent issues elsewhere in the code-base.
Also, improves validation of the `ownerDocument` parameter since we currently accept more-or-less anything here.
Note that the Prettier update made it possible to move a couple of comments after `default:`-cases back to their original/intended positions, please see https://prettier.io/blog/2022/03/16/2.6.0.html
*Please note:* This is another step in what will, time permitting, become a series of patches to simplify/modernize the viewer CSS.
Rather than having to manually specify ltr/rtl-specific float-values in the CSS, we can use logical `inline-start`/`inline-end` instead (and similar for some related left/right occurrences).
These logical properties depend on, among other things, the direction of the HTML document which we *always* specify in the viewer.
Given that most of these logical CSS properties are fairly new, and that cross-browser support is thus somewhat limited (see below), we rely on PostCSS plugins in order to support this in the GENERIC viewer.
- https://developer.mozilla.org/en-US/docs/Web/CSS/float#browser_compatibility
- https://developer.mozilla.org/en-US/docs/Web/CSS/inset-inline-end#browser_compatibility
Currently we're resolving the Promises in the `_extractTextPromises` Array with the page-index, despite that not really being necessary since the Promises in the Array are explicitly inserted in the correct order.
Furthermore, we can replace the standard `for`-loop with a `for...of`-loop which results in ever so slightly more compact code.
This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.
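A minimal sketch of the pattern, with made-up class/field names:

```js
// Hedged sketch: since the class already stores its items in a Set,
// implementing the iterator protocol is trivial and replaces forEach.
class ViewCollection {
  #views = new Set();

  add(view) {
    this.#views.add(view);
  }

  [Symbol.iterator]() {
    return this.#views.values();
  }
}

// Usage: `for (const view of collection) { ... }` now works directly.
```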
Rather than *manually* specifying a "mode", we can simply use the regular `defines` directly instead. To improve consistency, in the `external/builder/builder.js` file, a couple of parameters are also re-named.
Given that we now only use Workers when `postMessage` transfers are supported, there's really no point in trying to send a "test" message *without* transfers present.
Hence, if `postMessage` transfers are not supported by the browser, we'll now fallback to "fake" Workers immediately instead. The comment about Opera is also removed, since it was originally added back in PR 983 and mentions Opera `11.60` [which was released in 2011](https://en.wikipedia.org/wiki/History_of_the_Opera_web_browser#Version_11).
Given that none of these methods were ever intended to be accessed directly from the outside, we can use modern ECMAScript features to ensure that they are indeed private.
This patch also makes `fieldData` private, to remove the old hack used to prevent it from being modified from the outside.
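A minimal sketch of the general idea, with made-up member names:

```js
// Hedged sketch: native private fields/methods cannot be accessed from the
// outside at all, unlike the old underscore-prefix convention (or hacks).
class Scripting {
  #fieldData = null; // Previously guarded by a hack; now truly private.

  #ensureFieldData() {
    if (!this.#fieldData) {
      this.#fieldData = Object.create(null);
    }
    return this.#fieldData;
  }
}
```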
Given that we're now *building* the `web/viewer.css` file used in the development viewer, i.e. with `gulp server`, we no longer need to hard-code these `-webkit`-prefixed rules and can instead let Autoprefixer handle that for us.
To allow using modern CSS features that currently only Mozilla Firefox supports[1], while still enabling development/testing in recent Google Chrome versions, we'll have to start building the `web/viewer.css` file with `gulp server` as well.
In my testing, building the development CSS (and copying the images) takes *less than* `200 ms` on average which is hopefully an acceptable overhead for this sort of feature.
---
[1] In particular `float`, with `inline-start`/`inline-end` values.
Every single call-site has always passed in `true` for this parameter, ever since the function was first added back in PR 8023. Hence the parameter appears to be completely unnecessary, which is why it's removed and the function is updated to *unconditionally* strip out any license headers (in the middle of the file).
Apparently this unit-test works in Node.js now, hence it's *possible* that the reason it didn't work previously is that there were bugs in our old `structuredClone` polyfill.
This patch fixes an old inconsistency, when using `BasePreferences.{reset, set}`, where the internal Preference values would be kept even if writing them to storage failed.
Given that none of these fields were ever intended to be accessed directly from the *outside*, since that will lead to inconsistent/broken state, we can use modern ECMAScript features to ensure that they are indeed private.
These changes make sense for two reasons:
- Given that the parameters are potentially passed to the worker-thread, depending on the `useWorkerFetch` parameter, we need to prevent errors if the user provides values that aren't clonable.
- By ensuring that the default values are indeed `null`, we'll trigger main-thread fetching (of CMaps and Standard fonts) as intended in the `PartialEvaluator` and thus potentially provide better Error messages.
This function is currently placed in the `src/shared/util.js` file, which means that the code is duplicated in both of the *built* `pdf.js` and `pdf.worker.js` files. Furthermore, it only has a single call-site which is also specific to the `GENERIC`-build of the PDF.js library.
Hence this helper function is instead moved into the `src/display/api.js` file, in such a way that it's conditionally defined but still can be unit-tested.
Besides converting the `send` function to use the Fetch API, this patch also changes the method to return a `Promise` to get rid of the callback function. (Although, currently there's no call-site passing in a callback function.)
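A minimal sketch of such a conversion (the URL and payload shapes are assumptions):

```js
// Hedged sketch: replace the callback-based XHR helper with the Fetch API
// and return a Promise instead.
async function send(url, message) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(message),
  });
  if (!response.ok) {
    throw new Error(`send failed: ${response.status}`);
  }
  return response.text();
}
```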
This is the final part in a series of patches that try to re-implement PR 14287 in smaller steps.
Besides converting `inlineImages` to use the Fetch API, this patch also combines the `inlineImages` and `resolveImages` functions since they are always used together.
Originally the code in the `src/`-folder was shared between the main/worker-threads, and back then it probably made sense that the `PDFDocument` constructor accepted different arguments.
However, for many years we've not been passing anything *except* Streams to `PDFDocument` and we should thus be able to slightly simplify that code. Note that for e.g. unit-tests of this code, using either a `NullStream` or a `StringStream` works just fine.
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/DOMMatrix/DOMMatrix#browser_compatibility, all browsers that we support have native `DOMMatrix` implementations (and have had them for quite some time).
Hence Node.js is the only environment that lacks `DOMMatrix` support, which probably isn't that surprising given that it's browser functionality.
While the `DOMMatrix` polyfill isn't that large, it nonetheless seems completely unnecessary to bundle it in the `legacy` builds when it's not needed in browsers. However, we can avoid that by simply listing `dommatrix` as a dependency for the `pdfjs-dist` library.
This is another part in a series of patches that try to re-implement PR 14287 in smaller steps.
Besides converting `Driver._send` to use the Fetch API, this also changes the method to return a `Promise` to get rid of the callback function.
Please note that I *purposely* try to maintain the existing behaviour of re-sending the data on failure/unexpected response, including how/where the old callback function was invoked.
*Please note:* I don't really know anything about a11y, hence it's possible that this patch either doesn't work correctly or at least isn't a complete solution.
In both the SecondaryToolbar and the Sidebar we have "button groups" that functionally acts essentially like radio-buttons. Based on skimming through [this MDN article](https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Roles/radio_role) it thus appears that we should tag them as such, using `role="radiogroup"` and `role="radio"`, and then utilize the `aria-checked` attribute to indicate to a11y software which button is currently active.
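A minimal sketch of the attribute wiring (the element names are assumed):

```js
// Hedged sketch: tag the button group for a11y software and keep
// aria-checked in sync with the currently active button.
group.setAttribute("role", "radiogroup");
for (const button of group.querySelectorAll("button")) {
  button.setAttribute("role", "radio");
  button.setAttribute("aria-checked", String(button === activeButton));
}
```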
When there are *multiple* empty glyphs at the start of the data, ensure that the "first" glyph gets a correct `endOffset` to avoid skipping it during parsing in the `sanitizeGlyph` function.
The situation described in issue 14626 seems like a fairly special case, and it thus seem reasonable that we simply follow the same pattern as elsewhere in the `PartialEvaluator` when the `stopAtErrors` API-option is being used.
Unfortunately simply using `appearance: textfield;`, or even `-webkit-appearance: textfield;`, doesn't actually hide the "spinner" in number-input elements in e.g. Google Chrome.
Hence we need to use a work-around with the `-webkit-inner-spin-button` rule, however in our CSS code we also have `-webkit-outer-spin-button` currently. According to both [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/CSS/::-webkit-outer-spin-button#browser_compatibility) and also manual testing in Google Chrome Beta 99, the `-webkit-outer-spin-button` rule is no longer necessary and we can thus clean-up the CSS a tiny bit.
This was added in PR 4516 specifically for Safari on iOS devices, but according to MDN it should no longer be necessary now; see https://developer.mozilla.org/en-US/docs/Web/CSS/-webkit-overflow-scrolling#browser_compatibility
According to the MDN compatibility data, this CSS feature:
- Was never implemented anywhere *except* for Safari on iOS.
- Was never standardized and thus never existed in an *unprefixed* version.
- Has now been removed, starting with Safari version 13.
Given that [the FAQ](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) already lists Safari as "Mostly" supported, and that the default viewer is written primarily for Mozilla Firefox, it ought to be fine to remove these CSS rules now.
Since the Autoprefixer plugin indirectly depends on this, it seems like a good idea to add this as a direct dependency in the PDF.js project to hopefully avoid having to manually update `caniuse-lite` in the future; see https://github.com/browserslist/browserslist#browsers-data-updating
Also, slightly tweaks the Autoprefixer config for GENERIC-builds of the PDF.js library; note that this change doesn't affect the contents of the *built* `web/viewer.js` file.
The "External: Promise"-page in the JSDocs pre-dates the introduction of `Promise`s, as a generally available standard JS feature, by a number of years. Hence it now longer seems necessary, as far as I can tell, to include this "special" page in the documentation.
Also, while unrelated to the rest of the patch, updates the `test/`-folder description in the documentation.
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
This removes the `DocumentInfoValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
- Scroll the selected reference into view (makes it easier to tell which pdf you're looking at)
- Show the keyboard shortcuts (easier for new people)
- Keep the test/ref controls visible (if you scroll you can now tell if you're looking at a test or ref)
Sometimes I get an "Unable to find target with id XXX closeTarget..." error
when running tests, which happens when test.js tries to close all the
open pages. I haven't been able to fully verify this since it's intermittent,
but I think it's coming from us closing the window in driver.js and also
trying to close it in test.js.
- it aims to fix:
- https://bugzilla.mozilla.org/show_bug.cgi?id=1753075;
- https://bugzilla.mozilla.org/show_bug.cgi?id=1743245;
- https://bugzilla.mozilla.org/show_bug.cgi?id=1710019;
- issue #13211;
- issue #14521.
- previously we were trying to adjust lineWidth to have something correct after the current transform is applied, but this approach was wrong because in the end the pixel is rescaled with the same factor in both directions,
and sometimes those factors must be different (see bug 1753075).
- So the idea of this patch is to apply a scale matrix to the current transform just before setting lineWidth and stroking. This scale matrix is computed to ensure that, after the transform, a pixel's thickness is greater than 1 in both directions.
All call-sites that use `wrapReason` should be passing a (possibly cloned) `Error` to the helper function, hence we shouldn't need to have a fallback code-path for any other data.
Note that for the `cancel`/`error` methods on Streams, since PR 11115 we've been asserting that the argument is in fact an `Error` as intended.
When calling `wrapReason` from *rejected* Promises, we should also be guaranteed that an `Error` is provided thanks to the ESLint rules `no-throw-literal` and `prefer-promise-reject-errors`.
At this point in time, after recent rounds of clean-up, the `webkit`-prefixed Fullscreen API is the only remaining *browser-specific* compatibility hack in the `web/`-folder JavaScript code.
The standard, and thus unprefixed, Fullscreen API has been supported for *over three years* in both Mozilla Firefox and Google Chrome. [According to MDN](https://developer.mozilla.org/en-US/docs/Web/API/Fullscreen_API#browser_compatibility), the unprefixed Fullscreen API has been available since:
- Mozilla Firefox 64, released on 2018-12-11; see https://wiki.mozilla.org/Release_Management/Calendar#Past_branch_dates
- Google Chrome 71, released on 2018-12-04; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
Hence *only* Safari now requires using a prefixed Fullscreen API, and it's thus (significantly) lagging behind other browsers in this regard.
Considering that the default viewer is written *specifically* to be the UI for the Firefox PDF Viewer, and that we ask users to not just use it as-is[1], I think that we should only support the standard Fullscreen API now.
Furthermore, note also that the FAQ already lists Safari as "Mostly" supported; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
---
[1] Note e.g. http://mozilla.github.io/pdf.js/getting_started/#introduction
> The viewer is built on the display layer and is the UI for PDF viewer in Firefox and the other browser extensions within the project. It can be a good starting point for building your own viewer. *However, we do ask if you plan to embed the viewer in your own site, that it not just be an unmodified version. Please re-skin it or build upon it.*
Now that there's ECMAScript support for properly private methods on `class`es, we can use that instead and thus remove all of the `@private` JSDocs comments.
We can (and in my opinion should) use the standard `atob`/`btoa` functions, rather than manually re-implementing this functionality for the font-tests.
This will default to generating test images at the device pixel
ratio of the machine the tests are created on, unless the
test explicitly defines an output scale using the
`outputScale` setting. This makes the tests look visually
like they would on the machine they are running on. It
also allows us to test different output scales.
Currently we'll happily attempt to send any argument passed to this method over to the worker-thread, without doing any sort of validation.
That could obviously be quite bad, since there's first of all no protection against sending unclonable data. Secondly, it's also possible to pass data that will cause the `Ref.get` call in the worker-thread to fail immediately.
In order to address all of these issues, we'll now properly validate the argument passed to `PDFDocumentProxy.getPageIndex` and when necessary reject already on the main-thread instead.
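A minimal sketch of the main-thread validation (the exact checks are assumptions):

```js
// Hedged sketch: reject obviously invalid references on the main thread,
// instead of sending unclonable/bogus data to the worker.
function isValidRef(ref) {
  return (
    typeof ref === "object" &&
    ref !== null &&
    Number.isInteger(ref.num) &&
    Number.isInteger(ref.gen)
  );
}

// In PDFDocumentProxy.getPageIndex (illustrative):
//   if (!isValidRef(ref)) {
//     return Promise.reject(new Error("getPageIndex: Invalid reference."));
//   }
```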
Trying to use a non-string argument in either a `Cmd` or a `Name` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, similar to the existing one used e.g. in the `Dict.set` method.
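A minimal sketch of such a check, modeled on the `Dict.set` style mentioned above:

```js
// Hedged sketch: enforce string-only names in development builds; the
// check is compiled out of PRODUCTION builds by the pre-processor.
class Name {
  constructor(name) {
    if (
      (typeof PDFJSDev === "undefined" || PDFJSDev.test("!PRODUCTION")) &&
      typeof name !== "string"
    ) {
      throw new Error('Name: The "name" must be a string.');
    }
    this.name = name;
  }
}
```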
This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
Trying to use a non-string `key` in a `Dict` is not intended, and would basically be an implementation error. Hence we can add a non-PRODUCTION check to enforce this, complementing the existing `value` check added in PR 11672.
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls.
These changes were *mostly* done using regular expression search-and-replace, with two exceptions:
- In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary.
- In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.
Unless you actually need to check that something is both a `Name` and also of the *correct* type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.
This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.
This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
Unless you actually need to check that something is both a `Cmd` and also of the *correct* type, using `instanceof Cmd` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.
This patch uses ESLint to enforce this, since we obviously still want to keep the `isCmd` helper function for where it makes sense.
Given that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances this ought to help provide slightly better TypeScript definitions.
The manually tracked `resolved`-property is no longer necessary, since the same information is now directly available on all `PromiseCapability`-instances.
Furthermore, since the `PDFObjects.resolve` method is not documented as accepting e.g. only Object-data, we probably shouldn't resolve the `PromiseCapability` with the `data` and instead only store it on the `PDFObjects`-instance.[1]
---
[1] While Objects are passed by reference in JavaScript, other primitives such as e.g. strings are passed by value and the current implementation *could* thus lead to increased memory usage. Given how we're using `PDFObjects` in the PDF.js code-base none of this should be an issue, but it still cannot hurt to change this.
This ensures that the underlying data cannot be accessed directly, from the outside, since that's definitely not intended here.
Note that we expose `PDFObjects`-instances, via the `commonObjs` and `objs` properties, on the `PDFPageProxy`-instances hence these changes really cannot hurt.
This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.
*Please note:* I'm completely fine with this patch being rejected, and the issue instead closed as WONTFIX, since this is unfortunately a case where the TypeScript definitions dictate how we can/cannot write JavaScript code.
Apparently the TypeScript definitions generation converts the existing `PixelsPerInch` code into a `namespace` and simply ignores the getter; please see a7fc0d33a1/types/src/display/display_utils.d.ts (L223-L226)
Initially I tried tagging `PixelsPerInch` as an `@enum`, see https://jsdoc.app/tags-enum.html, however that unfortunately didn't help.
Hence the only good/simple solution, as far as I'm concerned, is to convert `PixelsPerInch` into a class with `static` properties. This patch results in the following diff, for the `gulp types` build target:
```diff
@@ -195,9 +195,10 @@
*/
static toDateObject(input: string): Date | null;
}
-export namespace PixelsPerInch {
- const CSS: number;
- const PDF: number;
+export class PixelsPerInch {
+ static CSS: number;
+ static PDF: number;
+ static PDF_TO_CSS_UNITS: number;
}
declare const RenderingCancelledException_base: any;
export class RenderingCancelledException extends RenderingCancelledException_base {
```
In some cases, in the `PDFPageView` implementation, we're modifying the `sx`/`sy` properties when CSS-only zooming is being used.
Currently this requires that you remember to *manually* update the `scaled` property to prevent issues, which doesn't feel all that nice and also seems error-prone. By replacing the `scaled` property with a getter, this is now handled automatically instead.
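A minimal sketch of the getter-based approach (the class name is assumed):

```js
// Hedged sketch: `scaled` is now derived from sx/sy, so modifying those
// properties can never leave it stale.
class OutputScale {
  sx = 1;
  sy = 1;

  get scaled() {
    return this.sx !== 1 || this.sy !== 1;
  }
}
```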
The `CanvasRenderingContext2D.backingStorePixelRatio` property was never standardized, and only Safari set (its prefixed version of) it to anything other than `1`.
Note that e.g. MDN doesn't contain any information about this property, and one of the few sources of information (at this point) is the following post: https://stackoverflow.com/questions/24332639/why-context2d-backingstorepixelratio-deprecated
Hence we can simplify the `getOutputScale` helper function, by removing some dead code, and now it no longer requires any parameters when called.
Soft masks can be enabled/disabled at any time and at different
points in the save/restore stack. This can lead to
the amount of save/restores becoming unbalanced across the
two canvases. Instead of save/restoring on the temporary canvas,
change it so we only track state on the main (suspended) canvas.
I was also getting an out-of-balance stack from patterns, so I've also
fixed that and added a warning that will at least show up in Chrome.
It would be nice to add this to Firefox at some point too.
Fixes #11328, #14297 and bug 1755507.
At this point all the various Stream-classes extends an abstract base-class, hence this helper function is no longer necessary and only adds unnecessary indirection in the code.
Unfortunately I don't have a test-case that breaks without this change, however the `stringToPDFString` helper function will fail if anything other than a string is passed to it.
The changes in this patch thus make this code more-or-less identical to that found in the `Catalog.{_collectJavaScript, parseDestDictionary}` methods.
Given that the `Navigator` interface has been available since "forever", please see https://developer.mozilla.org/en-US/docs/Web/API/Navigator#browser_compatibility, it's somewhat difficult to see why these checks are actually necessary since the viewer is only intended for usage in browsers.
Looking at the history of the code, this functionality was originally placed in the general `src/shared/compatibility.js` file which could thus run in e.g. worker-threads and Node.js environments (where the `Navigator` interface isn't available).
- it aims to fix #14562;
- 'X-\n' sequences were not correctly positioned;
- when X is a diacritic (e.g. in "sä-\n", which is decomposed into "sa¨-\n") we must handle both things:
- diacritics on the one hand;
- "-\n" on the other hand.
According to the MDN compatibility data, see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility, all browsers that we support have native `ReadableStream` implementations (and have had them for quite some time).
Hence only Node.js is now lagging behind w.r.t. `ReadableStream` support, and its experimental implementation doesn't really help us given the life-span of the LTS releases (see https://en.wikipedia.org/wiki/Node.js#Releases).
It seems quite unfortunate to bundle a `ReadableStream` polyfill in the `legacy` builds when it's unnecessary in browsers, given its overall size, but fortunately we can avoid that by simply listing `web-streams-polyfill` as a dependency for the `pdfjs-dist` library.
With recent changes, specifically PR 14515 *and* the previous patch, the `createObjectURL` helper function is now only used with the SVG back-end.
All other call-sites, throughout the code-base, are now using `URL.createObjectURL(...)` directly and it no longer seems necessary to keep exposing the helper function in the API.
Finally, the `createObjectURL` helper function is moved into the `src/display/svg.js` file to avoid unnecessarily duplicating this code on both the main- and worker-threads.
Given that most of the code-base is already using native functionality, we can update these unit-tests similarly as well.
- For the `blob:`-URL test, we simply use `URL.createObjectURL(...)` and `Blob` directly instead.
- For the `data:`-URL test, we simply use `btoa` to do the Base64 encoding and then build the final URL-string.
This is essentially a *continuation* of PR 7926, where we added support for rejecting the current `PDFDocumentLoadingTask`-promise by throwing inside of the `onPassword`-callback.
Hence the naive way to address [bug 1754421](https://bugzilla.mozilla.org/show_bug.cgi?id=1754421) would be to simply throw in the `onPassword`-callback used in the default viewer. However it unfortunately turns out to not work, since the password input/validation is asynchronous, and we thus need another approach.
The simplest solution that I can come up with here, is thus to *extend* the `onPassword`-callback to also reject the current `PDFDocumentLoadingTask`-instance if an `Error` is explicitly passed as the input to the callback function. (This doesn't feel great, but I cannot see a better solution that isn't really complicated.)
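A minimal sketch of the extended callback contract described above (`promptForPassword` is a made-up placeholder for the viewer's asynchronous prompt):

```js
// Hedged sketch: passing an Error to the update-callback now rejects the
// current PDFDocumentLoadingTask, instead of retrying with a new password.
loadingTask.onPassword = (updateCallback, reason) => {
  promptForPassword(reason).then(password => {
    if (password === null) {
      // The user cancelled the (asynchronous) password prompt.
      updateCallback(new Error("The password prompt was cancelled."));
      return;
    }
    updateCallback(password);
  });
};
```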
While it's obviously fine to use the same PDF document in different reference-tests (note how we e.g. have both `eq` and `text` tests for one document), we should always avoid adding *duplicate* files in the `test/pdfs/` folder.
The file used in this test-case is *identical* to, i.e. the md5 entry perfectly matches, the file used with the `xfa_bug1716380` test-case.
This appears to be consistent with the behaviour in both Adobe Reader and PDFium (in Google Chrome); this is essentially the same approach as used for a single decimal point in PR 9827.
Please note that while we "support" some (by now) fairly old browsers, that essentially means that the library (and viewer) will load and that the basic functionality will work as intended.[1]
However, in older browsers, some functionality may not be available and generally we'll ask users to update to a modern browser when bugs (specific to old browsers) are reported.[2]
There's always a question of just how old browsers the PDF.js contributors can realistically support, and here I'm suggesting that we place the cut-off point at approximately *three* years.
With that in mind, this patch updates the *minimum* supported browsers (and environments) as follows:
- Chrome 73, which was released on 2019-03-12; see https://en.wikipedia.org/wiki/Google_Chrome_version_history
- Firefox ESR (as before); see https://wiki.mozilla.org/Release_Management/Calendar
- Safari 12.1, which was released on 2019-03-25; see https://en.wikipedia.org/wiki/Safari_version_history#Safari_12
- Node.js 12, which was released on 2019-04-23 (and will soon reach EOL); see https://en.wikipedia.org/wiki/Node.js#Releases
---
[1] Assuming a `legacy`-build is being used, of course.
[2] In general it's never a good idea to use an old/outdated browser, since those may contain *known* security vulnerabilities.
Note that the *browser* findbar in Firefox uses "Title Case" for the labels, and it thus seems like a good idea to ensure that `PDFFindBar` is consistent with that.
Furthermore, the new label added in PR #13261 uses the "Title Case" format which means that currently the default viewer findbar looks inconsistent.
*Please note:* Based on the official Firefox localization docs, see https://firefox-source-docs.mozilla.org/l10n/overview.html#string-updates, changing only the casing should *not* require updating the key:
> 1) If the change is minor, like fixing a spelling error or case, the developer should update the en-US translation without changing the l10n-id.
When the viewer becomes narrow, the `PDFFindBar` will (forcibly) wrap its elements to prevent it from extending to the full window width.
Currently, after PR 13261, this now leads to the `findResultsCount` span taking up vertical space *unconditionally* when the findbar is wrapped. To avoid a blank space being shown in this case, before searching has begun, place the `findResultsCount` span in a "message" rather than an "options" container.
- get the original index by using a dichotomic search instead of a linear one;
- normalize the text by using NFD;
- convert the query string into a RegExp (see the sketch after this list);
- replace whitespaces in the query with \s+;
- handle hyphens at EOL used to break a word;
- add some \s* around punctuation signs.
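A minimal sketch of the query-to-RegExp conversion, under these assumptions (the function name is made up):

```js
// Hedged sketch: escape RegExp metacharacters in the raw query, then let
// any whitespace in the query match one-or-more whitespaces in the
// (NFD-normalized) page text.
function convertToRegExp(query) {
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\s+/g, "\\s+"), "gu");
}
```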
[api-minor] Remove the `normalizeWhitespace` option in the `PDFPageProxy.{getTextContent, streamTextContent}` methods (issue 14519, PR 14428 follow-up)
With these changes, we'll now *always* replace all whitespaces with standard spaces (0x20). This behaviour is already, since many years, the default in both the viewer and the browser-tests.
Either the latest Chromium update, the latest Puppeteer update, or a combination of them both are now causing the Windows bot to timeout during the browser-tests; please see PR 14392.
- it aims to fix #14502 and bug 1721335;
- Acrobat and Pdfium do the same;
- it'll avoid truncated data when printing;
- change the factor used to compute the font size from the field height: lineHeight = 1.35 * fontSize;
- this is the value used by Acrobat (see the sketch after this list);
- in order to not have truncated strings at the bottom, add a few basic metrics for standard fonts.
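A minimal sketch of the font-size computation implied by that factor (the helper name is made up, and the real computation likely accounts for padding as well):

```js
// Hedged sketch: invert lineHeight = 1.35 * fontSize to pick the largest
// font size whose line still fits in the field height.
const LINE_FACTOR = 1.35;

function computeAutoFontSize(fieldHeight) {
  return Math.floor(fieldHeight / LINE_FACTOR);
}
```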
This `disableCreateObjectURL` option was originally introduced all the way back in PR 4103 (eight years ago), in order to work-around `URL.createObjectURL()`-bugs specific to Internet Explorer.
In PR 8081 (five years ago) the `disableCreateObjectURL` option was extended to cover Google Chrome on iOS-devices as well, since that configuration apparently also suffered from `URL.createObjectURL()`-bugs.[1]
At this point in time, I thus think that it makes sense to re-evaluate if we should still keep the `disableCreateObjectURL` option.
- For Internet Explorer, support was explicitly removed in PDF.js version `2.7.570` which was released one year ago and all IE-specific compatibility code (and polyfills) have since been removed.
- For Google Chrome on iOS-devices, while we still "support" such configurations, it's *not* the focus of any development and platform-specific bugs are thus often closed as WONTFIX.
Note here that at this point in time, the `disableCreateObjectURL` option is *only* being used in the viewer and any browser/platform-specific `URL.createObjectURL()` bugs will thus not affect the main PDF.js library. Furthermore, given where the `disableCreateObjectURL` option is being used in the viewer the basic functionality should also remain unaffected by these changes.[2]
Furthermore, it's also possible that the `URL.createObjectURL()`-bugs have been fixed in the *browser* itself since PR 8081 was submitted.[3]
Obviously you could argue that this isn't a lot of code, w.r.t. number of lines, and you'd be technically correct. However, it does add additional complexity in a few different viewer components which thus add overhead when reading and working with this code.
Finally, assuming the `URL.createObjectURL()`-bugs are still present in Google Chrome on iOS-devices, I think that we should ask ourselves if it's reasonable for the PDF.js project (and its contributors) to keep attempting to support a configuration if the *browser* developers still haven't fixed these kind of bugs!?
---
[1] According to https://groups.google.com/a/chromium.org/forum/#!topic/chromium-html5/RKQ0ZJIj7c4, which is linked in PR 8081, that bug was mentioned/reported as early as 2014 (eight years ago).
[2] Viewer functionality such as e.g. downloading and printing may be affected.
[3] I don't have access to any iOS-devices to test with.
Either the latest Chromium update, the latest Puppeteer update, or a combination of them both are now causing the Windows bot to timeout during the browser-tests.
To unblock both the updates and other improvements (i.e. the `structuredClone` polyfill), let's simply disable the problematic configuration for now since this is a Mozilla project after all.
This allows us to remove the manually implemented `structuredClone` polyfill, thus reducing the maintenance burden for the `LoopbackPort` class; refer to https://github.com/zloirock/core-js#structuredclone
*Please note:* While `structuredClone` support landed already in Firefox 94, Google Chrome only added it in version 98 (currently in Beta). However, given that the `LoopbackPort` will only be used together with *fake workers* in browsers this shouldn't be too much of a problem.[1]
For Node.js environments, where *fake workers* are unfortunately necessary, using a `legacy/`-build is already required which thus guarantees that the `structuredClone` polyfill is available.
Also, the patch updates core-js to the latest version since that one includes `structuredClone` improvements; please see https://github.com/zloirock/core-js/releases/tag/v3.20.3
---
[1] Given that we only support browsers with proper worker support, if *fake workers* are being used that essentially indicates a configuration problem/error.
- it aims to fix #14497;
- previously, only rotations with an angle 0, 90, 180 or 270 were taken into account;
- so generalize to any angle but keep the fast path for 0, 90, ... because they're likely more common than anything else.
So that Firefox doesn't switch to pixel mode for compat with other
browsers.
This should fix https://github.com/mozilla/pdf.js/issues/14476, in terms
of restoring the previous behavior.
We probably want to change the pixel-based scrolling code to not scroll
so much (the deltaMode stuff normalizes to +/-1 tick for each wheel
event, perhaps the pixel-based value should do the same).
This commit fixes Bug 1743245 (gridded PDF file lines rendered too thick), which was introduced by a fix for #12868.
The lineWidth was set to round(1 * this._combinedScaleFactor) when the pixel is drawn as a parallelogram with a height < 1. This fix changes this to floor(1 * this._combinedScaleFactor).
This change shows a visual result comparable to Chrome and Acrobat.
Regarding the last PR, 3 statements in canvas.js are affected and will change with this commit (stroke and paintChar).
Rename the reference files to follow the naming convention.
Given that the regular expression has already become more complex (after the initial patch adding it), it seems to me that it probably cannot hurt to add a global cache to reduce unnecessary re-parsing.
Obviously the `Glyph`-instances are being cached *per* font, however in most documents multiple fonts are being used and in practice there's very often a fair amount of overlap between the /ToUnicode-data in different fonts[1].
Consider for example loading and rendering the entire `tracemonkey.pdf` document (from the test-suite), which isn't a particularly large document. In that case the `getCharUnicodeCategory` function is being called a total of `601` times, however there's only `106` *unique* unicode-chars being checked.
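A minimal sketch of such a global cache (the property names are assumptions):

```js
// Hedged sketch: cache the per-character result globally, since the same
// unicode-chars tend to re-occur across the different fonts in a document.
const charCategoryCache = new Map();

function getCharUnicodeCategory(char) {
  let category = charCategoryCache.get(char);
  if (!category) {
    category = {
      isWhitespace: /^\s$/u.test(char),
      isZeroWidthDiacritic: /^\p{Mn}$/u.test(char),
      isInvisibleFormatMark: /^\p{Cf}$/u.test(char),
    };
    charCategoryCache.set(char, category);
  }
  return category;
}
```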
*Please note:* In practice I suppose that this won't have a *huge* effect on overall performance, however given the relative simplicity of this patch I figured that it'd not hurt to submit it for review.
---
[1] Consider e.g. how there's usually different fonts used for regular, bold, respectively italic text.
- it aims to fix issue #14307;
- this event has been added recently in Firefox and we can now use it;
- fix a few bugs in aform.js and in annotation_layer.js;
- add some integration tests to test keystroke events (see `AFSpecial_Keystroke`);
- make dispatchEvent in the quickjs sandbox async.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1749563;
- use some helper functions to get (u|i)int** values from the buffer: it helps to have clearer code;
- in composite glyphs the translation values used with transformations are signed, so consequently read int8 instead of uint8;
- add a few TODOs.
Please refer to https://www.pdfa.org/norm-refs/Type1Fonts.pdf#page=15 for the expected format for the /CharStrings entries.
In the referenced PDF document the /CharStrings are missing the expected end-token, which causes us to swallow the start of the next glyph name.
After the changes in PR 14428 we can *directly*, and more efficiently, handle whitespace conversion in `PartialEvaluator.getTextContent` when the `normalizeWhitespace` option is being used.
This way we no longer need a separate helper function for this, and can avoid having to (again) iterate through the text and checking each character. Finally, this also removes the need for using a regular expression on e.g. all non-ASCII text.
Inlining the checks should be a *tiny bit* more efficient, since it avoids having to make *unconditional* function calls in these fairly commonly used helper functions.
*Please note:* This is a tentative patch, since I don't know if this is deemed important enough to fix.
The new event could be seen as a *supplement* to the existing "documentinit" and "documentloaded" events, but for the case when a PDF document fails to load.
To make the "documenterror" event generally useful, it'll include both the localized error message as well as the original reason for the error (when that exists).
This patch implements this by looking for the UTF-8 BOM, i.e. `\xEF\xBB\xBF`, in order to determine the encoding.[1]
The actual conversion is done using the `TextDecoder` interface, which should be available in all environments/browsers that we support; please see https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder#browser_compatibility
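A minimal sketch of the BOM-based detection (the function name is made up):

```js
// Hedged sketch: strings with a UTF-8 BOM are decoded with TextDecoder;
// everything else falls through to the existing handling.
function decodeIfUtf8(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return new TextDecoder("utf-8", { fatal: true }).decode(
      bytes.subarray(3)
    );
  }
  return null; // No UTF-8 BOM found; keep the prior behaviour.
}
```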
---
[1] Assuming that everything lacking a UTF-16 BOM would have to be UTF-8 encoded really doesn't seem correct.
In corrupt PDF documents Type3 fonts may introduce circular dependencies, thus resulting in the affected font(s) never loading and parsing/rendering never completing.
Note that I've not seen any real-world examples of this kind of font corruption, but the attached PDF document was rather found in https://github.com/pdf-association/safedocs/tree/main/Miscellaneous%20Targeted%20Test%20PDFs
*Please note:* That repository contains a number of reduced test-cases that are specifically intended to test interoperability (between PDF viewer) and parsing/rendering for various kinds of strange/corrupt PDF documents.
Some of the test-cases found there may thus not make sense to try and "fix" upfront, in my opinion, unless the problems are also found in real-world PDF documents.
While `PageViewport` apparently makes sense in TypeScript environments, given that it's being returned by the `PDFPageProxy.getViewport`-method in the API, we really don't want to extend the *public* API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually.
Hence we follow the same pattern as in PR 14013, and also extend the API unit-tests to ensure that `PDFPageProxy.getViewport` always returns a `PageViewport`-instance as expected.
This prevents the `BaseSVGFactory.create`-method from throwing, and thus preventing any remaining Annotations (on the page) from rendering in corrupt documents.
This helper function has never been used in e.g. the worker-thread, hence its placement in `src/shared/util.js` led to a *small* amount of unnecessary duplication.
After the previous patches this helper function is now *only* used in the viewer, hence it no longer seems necessary to expose it through the official API.
*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on this helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
As part of the changes/improvement in PR 14092, we're no longer using the `addLinkAttributes` directly in e.g. the AnnotationLayer-code.
Given that the helper function is now *only* used in the viewer, it no longer seems necessary to expose it through the official API.
*Please note:* It seems somewhat unlikely that third-party users were relying *directly* on the helper function, which is why it's not being exported as part of the viewer components. (If necessary, we can always change this later on.)
This structure contains *almost* exclusively references to DOM elements (and a couple of simple strings), rather than complete classes/functions. Hence the `eventBus`-option sticks out a fair bit, and I'd guess that it's *mostly* unused in e.g. third-party implementations.
Given that we, in multiple places, mention that the default viewer shouldn't be used as-is I really don't think that we need to keep this special `eventBus`-option around. Furthermore, nowadays it's also a lot easier to (safely) access the existing `EventBus`-instance in the viewer; see https://github.com/mozilla/pdf.js/wiki/Third-party-viewer-usage#initialization-promise which shows how to listen for the default viewer being initialized (and its `eventBus` thus being available).
The patch in PR 14335 *essentially* re-introduced the old code from before PR 3848, however looking at this code a bit closer it should be possible to simplify it by making the method asynchronous.
While this method is currently only used as a *fallback* in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the *entire* /Pages tree[1].
With this method now being asynchronous, we're able to handle fetching of References in a *much* easier/nicer way than before without having to throw `MissingDataException`s and re-parse anything.
These changes also let us simplify the call-site slightly, by calling the method *directly* instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s).
Furthermore, this patch contains the following other changes:
- Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead.
- Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error *immediately* afterwards.
---
[1] Imagine e.g. a thousand page document, where there's a `MissingDataException` thrown when fetching/parsing page 900.
This method is now being used a lot more, compared to when it's added, since it's now used together with scripting as part of the `PDFDocument.fieldObjects` parsing (called during viewer initialization).
For /Page Dictionaries that we've already parsed, the `pageIndex` corresponding to a particular Reference is already known and we're thus able to skip *all* parsing in the `Catalog.getPageIndex` method for those cases.
I happened to notice that we didn't have *any* unit-tests for either `getFieldObjects` or `getCalculationOrderIds`, on the `PDFDocumentProxy` class, which seems unfortunate since it's API functionality that we depend on in e.g. the viewer.
Besides converting `Catalog.getPageDict` to an `async` method, thus simplifying the code, this patch also allows us to pro-actively fix an existing issue.
Note how we're looking up References in such a way that `MissingDataException`s won't cause trouble, however it's *technically possible* that the entries (i.e. /Count, /Kids, and /Type) in a /Pages Dictionary could actually be indirect objects as well. In the existing code this could lead to *some*, or even all, pages failing to load/render as intended.
In practice that doesn't *appear* to happen in real-world PDF documents, but given all the weird things that PDF software do I'd prefer to fix this pro-actively (rather than waiting for a bug report).
With `Catalog.getPageDict` being `async` this is now really simple to address, however I didn't want to introduce a bunch more *unconditional* asynchronicity in this method if it could be avoided (since that could slow things down). Hence we'll *synchronously* look up the *raw* data in a /Pages Dictionary, and only fall back to asynchronous data lookup when a Reference was encountered.
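A minimal sketch of that "synchronous first, asynchronous fallback" pattern (names approximate the actual code):
```js
// Synchronously grab the *raw* /Count entry, and only pay the
// asynchronous cost when it turns out to be an indirect object.
let count = pagesDict.getRaw("Count");
if (count instanceof Ref) {
  count = await xref.fetchAsync(count);
}
```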
In addition to the above, this patch also makes the following notable changes:
- Let `Catalog.getPageDict` *consistently* reject with the actual error, regardless of what data we're fetching. Previously we'd "swallow" the actual errors except when looking up Dictionary entries, which is inconsistent and thus seems unfortunate. As can be seen from the updated unit-tests this change is API-observable, hence why the patch is tagged `[api-minor]`.
- Improve the consistency of the Dictionary /Type-checks in both the `Catalog.getPageDict` and `Catalog.getAllPageDicts` methods.
In `Catalog.getPageDict` there's a fallback code-path where we're *incorrectly* checking the /Page Dictionary for a /Contents-entry, which is wrong since a /Page Dictionary doesn't need to have a /Contents-entry in order to be valid.
For consistency the `Catalog.getAllPageDicts` method is also updated to handle errors in the /Type-lookup correctly.
- Reduce the `PagesCountLimit.PAUSE_EAGER_PAGE_INIT` viewer constant, to further improve loading/rendering performance of the *second* page during initialization of very long documents; PR 14359 follow-up.
*This addresses the following case missing from the previous patch:*
The viewer is loaded in an *active* window/tab, and enough time is allowed to pass in order to allow rendering to start. However, if the user then switches to another tab (or another program) *before* rendering has finished, the "load" event also needs to be unblocked.
Given that `requestAnimationFrame` is being used, see the `src/display/api.js` file, an inactive window/tab means that rendering will not run and we'll thus not fetch all pages. The latter is a requirement for the "load" event to be unblocked in the MOZCENTRAL-version of the default viewer.
This patch is a *partial* solution, since it only addresses the following situations:
- A *background* tab (containing the viewer) is reloaded, e.g. via the tab-bar context menu.
- The viewer is loaded in an *active* tab, but the user switches away from it (or switches to another program window) *before* rendering has started.
Most likely this code predates our use of Node.js, but in Node.js asking
for user confirmation is a solved problem, so we can remove the custom
logic we have for this, which overall makes things much simpler.
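For reference, a minimal sketch of what "solved problem" means here, using Node.js' built-in `readline` module (the prompt text is illustrative):
```js
const readline = require("readline");

function confirm(question) {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  return new Promise(resolve => {
    rl.question(`${question} [y/n] `, answer => {
      rl.close();
      resolve(/^\s*y/i.test(answer));
    });
  });
}
```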
This provides a work-around for badly generated PDF documents that contain *negative* /FitH parameters (in the referenced issue the value `-32768` is used).
This patch, first of all, removes circular dependencies in the TypeScript definitions. Secondly, it also moves `RenderingStates` into `web/ui_utils.js` to break another type-dependency and directly use the `XfaLayerBuilder` during XFA-printing.
Finally, note that this patch *slightly* reduces the size of the default viewer (e.g. in the `MOZCENTRAL` build) by not having to bundle code which is completely unused.
This patch circumvents the issues seen when trying to update TypeScript to version `4.5`, by "simply" fixing the broken/missing JSDocs and `typedef`s such that `gulp typestest` now passes.
As always, given that I don't really know anything about TypeScript, I cannot tell if this is a "correct" and/or proper way of doing things; we'll need TypeScript users to help out with testing!
*Please note:* I'm sorry about the size of this patch, but given how intertwined all of this unfortunately is it just didn't seem easy to split this into smaller parts.
However, one good thing about this TypeScript update is that it helped uncover a number of pre-existing bugs in our JSDocs comments.
The size of the `web/ui_utils.js` file has increased over time, as more code has been added to (or moved into) that file. To reduce its size slightly, this patch moves the event-related functionality into a separate file.
In PR 14114 this was only added to the default viewer, which means that in the viewer components the user would need to *manually* implement /Lang handling. This was (obviously) a bad choice, since the viewer components already support e.g. structTrees by default; sorry about overlooking this!
To avoid having to make *two* `getMetadata` API-calls[1] very early during initialization, in the default viewer, the API will now cache its result. This will also come in handy elsewhere in the default viewer, e.g. by reducing parsing when opening the "document properties" dialog.
---
[1] This not only includes a round-trip to the worker-thread, but also having to re-parse the /Metadata-entry when it exists.
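The caching itself can be as simple as memoizing the promise; a sketch, using a hypothetical `_metadataPromise` property (the real field name may differ):
```js
getMetadata() {
  // Re-use the pending/resolved promise on repeated calls, avoiding both
  // the worker round-trip and re-parsing of the /Metadata-entry.
  if (!this._metadataPromise) {
    this._metadataPromise = this._transport.getMetadata();
  }
  return this._metadataPromise;
}
```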
After the changes in PR 14338, specifically in the `XRef.parse`-method, the /Pages-entry will now always have been fetched/validated when the `Catalog`-instance is created.
Hence we can directly access the /Pages-entry in `Catalog.getPageDict` and thus avoid *one* asynchronous data-lookup per page in the document. (In practice this is unlikely to show up in e.g. benchmarks, but it really cannot hurt.)
Finally, make sure that the `getPageDict`/`getAllPageDicts`-methods track the /Pages-tree reference correctly to prevent circular references in corrupt documents.
We use string arguments in all other places, so these two places are a
bit inconsistent in that sense. Moreover, it's just one argument now,
which makes it a bit easier to read and see what it does because we
don't have to pass the always-empty options argument anymore. Finally,
doing it like this ensures it works in all Puppeteer versions given
https://github.com/puppeteer/puppeteer/issues/7836.
Once the upstream bug is fixed it can be enabled again because it's
causing way too much noise now. This is tracked in issue #14293. Note
that I deliberately added a new block so we can easily remove it later
on and because the other block is about another bug.
By making this API-call *unconditionally*, we introduce a (slight) delay in the initialization of *all* documents.
That seems quite unfortunate, since `pdfjs.enablePermissions` is off by default, and it thus seems better to only do the API-call when actually needed; sorry about this!
For encrypted PDF documents without the required permissions set, this patch adds support for disabling of form editing. However, please note that it also requires that the `pdfjs.enablePermissions` preference is set to `true`[1] (since PDF document permissions could be seen as user hostile).
Based on https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G6.1942134, this condition hopefully makes sense.
---
[1] Either manually with `about:config`, or using e.g. a [Group Policy](https://github.com/mozilla/policy-templates).
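In API terms, the check amounts to something along these lines (a sketch using the public `getPermissions` method; the exact `PermissionFlag`s consulted by the viewer may differ):
```js
const permissions = await pdfDocument.getPermissions();
// `null` means that permissions aren't being enforced, e.g. because the
// document is unencrypted or `pdfjs.enablePermissions` is disabled.
if (
  permissions !== null &&
  !permissions.includes(PermissionFlag.FILL_INTERACTIVE_FORMS)
) {
  // Disable form editing in the viewer.
}
```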
Version 14 that we used before is now in maintenance mode, so we should
upgrade to the most recent LTS version.
Moreover, use the most recent `setup-node` workflow version and syntax;
see https://github.com/actions/setup-node#usage.
This patch is essentially *another* continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
For most documents, unless they're *very* long, we'll eagerly initialize all of the pages in the viewer. For shorter documents having all pages loaded/initialized early provides overall better performance/UX in the viewer, however there's cases where it can instead *hurt* performance.
For documents with a couple of thousand pages[1], the parsing and pre-rendering of the *second* page of the document can be delayed (quite a bit). The reason for this is that we trigger `PDFDocumentProxy.getPage` for *all pages* early during the viewer initialization, which causes the worker-thread to be swamped with handling (potentially) thousands of `getPage`-calls and leaving very little time for other parsing (such as e.g. of operatorLists).
To address this situation, this patch thus proposes temporarily "pausing" the eager `PDFDocumentProxy.getPage`-calls once a threshold has been reached, to give the worker-thread a chance to handle other requests.[2]
Obviously this may *slightly* delay the "pagesloaded" event in longer documents, but considering that it's already the result of asynchronous parsing that'll hopefully not be seen as a blocker for these changes.[3]
---
[1] A particularly problematic example is https://github.com/mozilla/pdf.js/files/876321/kjv.pdf (16 MB large), which is a document with 2236 pages and a /Pages-tree that's only *one* level deep.
[2] Please note that I initially considered simply chaining the `PDFDocumentProxy.getPage`-calls, however that would have slowed things down for all documents, which didn't seem appropriate.
[3] This patch will *hopefully* also make it possible to re-visit PR 11312, since it seems that changing `Catalog.getPageDict` to an `async` method wasn't the problem in itself. Rather it appears that it leads to slightly different timings, thus exacerbating the already existing issues with the worker-thread being overloaded by `getPage`-calls.
Having recently worked with that method, there's a couple of (very old) issues that I'd also like to address and having `Catalog.getPageDict` be `async` would simplify things a great deal.
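Conceptually, the "pausing" looks something like this (an illustrative sketch only; names such as `_onePageRenderedCapability` approximate the viewer's internals, and the real code chains promises rather than using a plain loop):
```js
for (let pageNum = 1; pageNum <= pagesCount; pageNum++) {
  if (pageNum === PagesCountLimit.PAUSE_EAGER_PAGE_INIT) {
    // Let the worker-thread catch up on e.g. operatorList parsing for
    // the first pages, before resuming the eager `getPage`-calls.
    await this._onePageRenderedCapability.promise;
  }
  const pdfPage = await pdfDocument.getPage(pageNum);
  this._pages[pageNum - 1].setPdfPage(pdfPage);
}
```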
Rather than "swallowing" the actual Errors, when data fetching fails, ensure that they're always being propagated as intended to the call-site instead.
Note that we purposely handle `XRefEntryException` specially, to make it possible to fallback to indexing all XRef objects.
Rather than trying, and failing, to fetch the entire /Pages-tree for documents with corrupt XRef tables, let's fallback to indexing all objects *before* trying to invoke the `Catalog.getAllPageDicts` method.
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
Finally, also changes the `pagePromises` to a *private* property since it's not supposed to be accessed from the "outside".
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
For one thing, this simplifies iteration since we no longer have to worry about/check if `pageCache`-entries are undefined (which will happen for *sparse* `Array`s).
Of particular note is that we're no longer attempting to "null" the `pageCache`-entry from within the `PDFPageProxy._destroy`-method. Given that *synchronous* JavaScript will always run to completion[1] and that we're looping through all pages in `WorkerTransport.destroy` and immediately clear the cache afterwards, that code did/does not really make a lot of sense (as far as I can tell).
Finally, also changes the `pageCache` to a *private* property since it's not supposed to be accessed from the "outside".
---
[1] Unless there are errors, of course.
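A small sketch of the resulting shape (names approximate the actual implementation):
```js
class WorkerTransportLike {
  // Keyed by pageIndex; a Map has no "holes", unlike a sparse Array.
  #pageCache = new Map();
  #pagePromises = new Map();

  destroy() {
    // Iteration is straightforward, since every entry is guaranteed to
    // exist; no `undefined` checks are needed.
    for (const page of this.#pageCache.values()) {
      page._destroy();
    }
    this.#pageCache.clear();
    this.#pagePromises.clear();
  }
}
```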
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of *another* oversight on my part, we're only caching /Page references once we've found the correct page. As long as all pages are loaded *in order* this doesn't really matter (happens by default in the viewer), but when `disableAutoFetch` is used the pages may be fetched in a more random order (this patch reduces the asynchronicity of `Catalog.getPageDict` slightly in that case).
Given that [bug 1336591](https://bugzilla.mozilla.org/show_bug.cgi?id=1336591) was just closed as fixed, thus fixing issue 8022 in Firefox, let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
PR 8207 added caching to improve the performance of `Catalog.getPageDict`, by not having to repeatedly fetch the same data and also reducing the asynchronicity of that method.
However, because of annoying off-by-one errors[1] the caching became less efficient than it could/should be.[2] Note here that the /Pages-tree is zero-indexed, and that e.g. `pageIndex = 5` thus corresponds to the *sixth* page of the document.
---
[1] In particular the `currentPageIndex + count < pageIndex` part.
[2] For example, even when loading a relatively small/simple document such as `tracemonkey.pdf` in the viewer, the number of `xref.fetchAsync(currentNode)` calls are reduced from `56` to `44` with this patch.
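Spelled out: given zero-indexed pages, a /Pages-tree node starting at `currentPageIndex` with `count` pages covers the indices up to `currentPageIndex + count - 1`, hence the skip-condition presumably needs to be (sketched):
```js
// The node covers pageIndices [currentPageIndex, currentPageIndex + count - 1],
// so it can safely be skipped whenever the following holds; note `<=`
// rather than the old `<`, which needlessly descended into boundary nodes.
if (currentPageIndex + count <= pageIndex) {
  currentPageIndex += count;
  continue;
}
```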
These changes improve the consistency ever so slightly in the `PDFOutlineViewer._dispatchEvent` method, by making sure that we can tell the following two cases apart:
- The "pagesloaded" event has *not yet* been fired.
- The "pagesloaded" event has been fired, but no pages were available.
*This patch can be tested e.g. with the `poppler-85140-0.pdf` document from the test-suite.*
For some sufficiently corrupt documents the `getDocument` call will succeed, but fetching even the very first page fails. Currently we only print error messages (in the console) from the `{BaseViewer, PDFThumbnailViewer}.setDocument` methods, but don't actually provide these errors to allow the viewer to handle them properly.
In practice this means that the GENERIC viewer won't display the `errorWrapper`, and in the MOZCENTRAL viewer the *browser* loading indicator is never hidden (since we never unblock the "load" event).
Trying to shadow a non-existent property is always an implementation mistake, since it leads to the `shadow`-call not having any effect.
In PR 14152 I overlooked the fact that it's fairly easy to enforce this during development/testing, since that can help catch e.g. simple spelling bugs.
Currently the `Catalog.metadata` getter only handles errors during parsing, however in a *corrupt* PDF document fetching of the raw /Metadata can obviously fail as well.
Without this patch the `PDFDocumentProxy.getMetadata` method, in the API, can thus fail which it *never* should and this will cause the viewer to not initialize all state as expected.
Fixes one of the documents in issue 14305.
Given that [bug 1336572](https://bugzilla.mozilla.org/show_bug.cgi?id=1336572) was just closed as fixed, thus fixing issue 8019 in Firefox[1], let's add a test-case to enable us to catch any future regressions either in PDF.js or in browsers themselves.
---
[1] It also seems to be working in Google Chrome, although I'm having a slightly difficult time deciphering *exactly* what configurations were affected when looking through issue 8019.
*This patch improves handling of a couple of PDF documents from issue 14303.*
- Update `XRef.indexObjects` to actually clear *all* XRef-caches. Invalid XRef tables *usually* cause issues early enough during parsing that we've not populated the XRef-cache, however to prevent any issues we obviously need to clear that one as well.
- Improve the /Root dictionary validation in `XRef.parse` (PR 9827 follow-up). In addition to checking that a /Pages entry exists, we'll now also check that it can be successfully fetched *and* that it's of the correct type. There's really no point trying to use a /Root dictionary that e.g. `Catalog.toplevelPagesDict` will reject, and this way we'll be able to fallback to indexing the objects in corrupt documents.
- Throw an `InvalidPDFException`, rather than a general `FormatError`, in `XRef.parse` when no usable /Root dictionary could be found. That really seems more appropriate overall, since all attempts at parsing/recovery have failed. (This part of the patch is API-observable, hence the tag.)
With these changes, two existing test-cases are improved and the unit-tests are updated/re-factored to highlight that. In particular `GHOSTSCRIPT-698804-1-fuzzed.pdf` will now both load and "render" correctly, whereas `poppler-395-0-fuzzed.pdf` will now fail immediately upon loading (rather than *appearing* to work).
*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.
The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.
Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.
---
[1] The whole point of PR 14311 was obviously to *get rid of* infinite loops during document initialization, not to introduce any more of those.
Given that we're able to "render" this document, let's extend the unit-test to actually check that we're able to obtain the operatorList; although given the overall issues in the document it'll be empty.
This was added in PR 14311, but given that I completely missed to update the `PDFDocument.getPage` signature accordingly it's completely unused.
Given that things work just as fine as-is, let's simply remove that optional parameter for now; sorry about the churn here!
This only applies to severely corrupt documents, where it's possible that the `Parser` throws when we try to access e.g. a /Kids-entry in the /Pages-tree.
Fixes two of the issues listed in issue 14303, namely the `poppler-742-0.pdf...` and `poppler-937-0.pdf...` documents.
Given that Node.js doesn't support Workers, general PDF.js performance will be worse when compared to browsers. In an attempt to improve at least memory usage a little bit, update the Node.js examples to release page resources once parsing is done for that page.
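Concretely, the examples now do something along these lines (an abbreviated sketch; the exact `require` path depends on the build being used):
```js
const pdfjsLib = require("pdfjs-dist/legacy/build/pdf.js");

async function dumpText(pdfPath) {
  const doc = await pdfjsLib.getDocument(pdfPath).promise;
  for (let pageNum = 1; pageNum <= doc.numPages; pageNum++) {
    const page = await doc.getPage(pageNum);
    const textContent = await page.getTextContent();
    console.log(textContent.items.map(item => item.str).join(" "));
    page.cleanup(); // Release page resources once parsing is done.
  }
}
```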
This patch is essentially a continuation of PR 11263, which tried to improve loading/initialization performance of *very* large/long documents.
Note that browsers, in general, don't handle a huge amount of DOM-elements very well, with really poor (e.g. sluggish scrolling) performance once the number gets "large". Furthermore, at least in Firefox, it seems that DOM-elements towards the bottom of an HTML-page can effectively be ignored; for the PDF.js viewer that means that pages at the end of the document can become impossible to access.
Hence, in order to improve things for these *very* large/long documents, this patch will now enforce usage of the (recently added) PAGE-scrolling mode for these documents. As implemented, this will only happen once the number of pages *exceeds* 15000 (which is hopefully rare in practice).
While this might feel a bit jarring to users being *forced* to use PAGE-scrolling, it seems all things considered like a better idea to ensure that the entire document actually remains accessible and with (hopefully) more acceptable performance.
Fixes [bug 1588435](https://bugzilla.mozilla.org/show_bug.cgi?id=1588435), to the extent that doing so is possible since the document contains 25560 pages (and is 197 MB large).
This was added on the assumption that the viewer would (eventually) start using the `PDFSinglePageViewer` for e.g. PAGE-scrolling mode and PresentationMode. However, having both a `PDFViewer` and a `PDFSinglePageViewer` side-by-side in the viewer would've been tricky to implement well, which is why PR 14112 implemented PAGE-scrolling for the general `BaseViewer` instead.
Given that the default viewer is no longer (potentially) going to use `PDFSinglePageViewer`, there's code in the `SecondaryToolbar` (and related CSS rules) which is now unnecessary.
Since NPM 7, released over a year ago in October 2020, NPM
automatically transforms lock files from version 1 to version 2. In the
NPM 7 release notes they reported:
"One change to take note of is the new lockfile format, which is
backwards compatible with npm 6 users. The lockfile v2 unlocks the
ability to do deterministic and reproducible builds to produce a
package tree."
Not only is this change backwards compatible (so older versions of NPM
will still be able to install everything as expected), reproducibility
is also a nice property to have and modern NPM versions will otherwise
constantly do the conversion anyway, causing contributors to explicitly
have to revert the change. Therefore, I believe we should do this now
since it doesn't break backwards compatibility for consumers of this
file. It only means that producers of this file (i.e., us contributors)
need to use at least NPM 7 or higher (as of writing NPM 8 is even
available). According to https://nodejs.org/en/download/releases/ this
means contributors should at least run Node.js 15.0.0, while 17.1.0 is
the most recent as of writing, so to me that sounds reasonable to ask.
In Puppeteer 11 we noticed that Firefox no longer shuts down once the
tests are done. I tracked this down to the `page.goto` call, in
`startBrowser`, never resolving anymore. I can only assume that
something changed in Puppeteer, possibly in combination with recent
Firefox Nightly versions, that caused this, but haven't been able to
fully track it down.
However, I did find that the problem is that the `load` event no longer
triggers, so fortunately we can fix the problem by explicitly waiting
for the `domcontentloaded` event instead. In general this change might
even be better since we now wait until the test framework is fully
loaded before we continue. Note that this also still works for the
current Puppeteer version.
I did find two upstream references that appear to track this issue, both
on the Puppeteer side and on the Firefox side, making me further suspect
that the issue is partly on both sides:
- https://github.com/puppeteer/puppeteer/issues/5806
- https://bugzilla.mozilla.org/show_bug.cgi?id=1706353
If a browser cannot be started, we currently get the following log:
`Error while starting firefox: [object Object]`. This is simply an
oversight from the initial Puppeteer integration work since we never got
into this code path before. With this fix the error log becomes more
useful: `Error while starting firefox: connect ECONNREFUSED ::1:45387`
*Please note:* While this patch on its own is sufficient to prevent the worker-thread from hanging, in combination with PR 14311 these PDF documents will both load *and* render correctly.
Rather than focusing on the particular structure of these PDF documents, it seemed (at least to me) to make sense to try and prevent all circular references when fetching/looking-up data using the XRef table.
To avoid a solution that required tracking the references manually everywhere, the implementation settled on here instead handles that internally in the `XRef.fetch`-method. This should work, since that method *and* the `Parser`/`Lexer`-implementations are completely synchronous.
Note also that the existing `XRef`-caching, used for all data-types *except* Streams, should hopefully help to lessen the performance impact of these changes.
One *potential* problem with these changes could be certain *browser* exceptions, since those are generally not catchable in JavaScript code, however those would most likely "stop" worker-thread parsing anyway (at least I hope so).
Finally, note that I settled on returning dummy-data rather than throwing an exception. This was done to allow parsing, for the rest of the document, to continue such that *one* bad reference doesn't prevent an entire document from loading.
Fixes two of the issues listed in issue 14303, namely the `poppler-91414-0.zip-2.gz-53.pdf` and `poppler-91414-0.zip-2.gz-54.pdf` documents.
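Roughly, the internal guard looks like this (a sketch; the `_fetchUncached` helper is hypothetical, and the actual code is structured differently):
```js
fetch(ref, suppressEncryption = false) {
  if (this._pendingRefs.has(ref)) {
    warn(`Ignoring circular reference: ${ref}.`);
    // Dummy-data rather than an exception, so that *one* bad reference
    // doesn't prevent the entire document from loading.
    return CIRCULAR_REF;
  }
  this._pendingRefs.put(ref);
  try {
    return this._fetchUncached(ref, suppressEncryption); // hypothetical
  } finally {
    this._pendingRefs.remove(ref);
  }
}
```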
*This patch basically extends the approach from PR 10392, by also checking the last page.*
Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hang the browser).
Rather than hoping that the /Count entry is correct, similar to all other data found in PDF documents, we obviously need to validate it. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially counting the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is, otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
Unfortunately these changes will have a number of *somewhat* negative side-effects, please see a possibly incomplete list below, however I cannot see a better way to address this bug.
- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.
As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).
Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
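A much-simplified sketch of the short-cut (the real error handling is more involved, and `setActualNumPages` is a hypothetical setter):
```js
async checkLastPage() {
  const numPages = this.catalog.numPages; // Based on the raw /Count entry.
  try {
    // If the last page can be fetched, assume that /Count is correct.
    await this.getPage(numPages - 1);
  } catch {
    // Otherwise, iterate through (potentially) the *entire* /Pages-tree
    // and count the pages that actually exist.
    const allPages = await this.catalog.getAllPageDicts();
    this.catalog.setActualNumPages(allPages.size); // hypothetical
  }
}
```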
Given that not all pages necessarily are being accessed, or that the pages may be accessed out of order, using a `Map` seems like a more appropriate data-structure here.
Furthermore, this patch also adds (currently missing) caching for XFA-documents. Loading a couple of such documents in the viewer, with logging added, shows that we're currently re-creating `Page`-instances unnecessarily for XFA-documents.
For this particular PDF document, we have `/W [1 2 166666666666666666666666666]` which obviously makes no sense.
While this patch makes no attempt at actually validating the entries in the /W-array, we'll now simply abort all processing when the end of the PDF document has been reached (thus preventing hanging the browser).
Please note that this patch doesn't enable the PDF document to be loaded/rendered, but at least it fails "correctly" now.
Fixes one of the issues listed in issue 14303, namely the `REDHAT-1531897-0.pdf` document.
This bug was surprisingly difficult to track down, since it didn't just depend on range-requests being used but also on how quickly the document was loaded. To even be able to reproduce this locally, I had to use a very small `rangeChunkSize`-value (note the unit-test).
The cause of this bug is a bogus entry in the XRef-table, causing us to attempt to request data from *beyond* the actual document size and thus getting into an infinite loop.
Fixes *one* of the issues listed in issue 14303, namely the `PDFBOX-4352-0.pdf` document.
*This patch can be tested e.g. with the `sizes.pdf` document in the test-suite.*
While this patch isn't necessarily the best solution, e.g. it might be possible to solve this with *only* CSS, it's what I was able to come up with to address an old issue.
The solution here re-uses the `spread`-class in PresentationMode, since that one already takes care of centering pages *vertically*, together with a dummy-page that takes up the entire height of the window.
Finally, some PresentationMode-related CSS-rules are also simplified slightly, since the changes in PR 14112 (using Page-scrolling) allows some clean-up here.
The issue that this patch fixes has existed ever since the viewer was first re-factored into components, however it only really affects the `disableAutoFetch = true` mode.
By default we're fetching all pages in `BaseViewer.setDocument`, and as part of the parsing/initialization we're also populating the `PDFLinkService`-pagesRefCache. The purpose of that cache is to make navigating to any internal destinations faster, by not having to (asynchronously) lookup the pageNumber via the API when handling the destination.
In comparison, when the `disableAutoFetch = true` mode is being used we're instead *lazily* initializing the pages in the `BaseViewer.#ensurePdfPageLoaded`-method. For some reason, that I can only assume is a simple oversight, we're not attempting to update the `PDFLinkService`-pagesRefCache in that case.
In the `BaseViewer` this cache is mostly relevant in the `disableAutoFetch = true` mode, since the pages are being initialized *lazily* in that case.
In the `PDFThumbnailViewer` this cache is mostly used for thumbnails that are actually being rendered, as opposed to those created directly from the "regular" pages.
Please note that I'm not suggesting that we remove these caches because they're only used in some situations, but rather because they're for all intents and purposes actually *redundant*. In the API itself, we're already caching both the page-promises and the actual pages themselves on the `WorkerTransport`-instance.
Hence these viewer-caches aren't really necessary in practice, and they add what mostly seems to me like an unnecessary level of indirection.[1]
Given that the viewer now relies on caching in the API itself, this patch also adds a new unit-test to ensure that page-caching works (and keeps working) as expected.
---
[1] In the `WorkerTransport.getPage`-method the parameter is being validated on every call, but that's hardly enough code to warrant keeping the "duplicate" caches in the viewer in my opinion.
*Please note:* These changes will primarily benefit longer documents, somewhat at the expense of e.g. one-page documents.
The existing `PDFDocumentProxy.getStats` function, which in the default viewer is called for each rendered page, requires a round-trip to the worker-thread in order to obtain the current document stats. In the default viewer, we currently make one such API-call for *every rendered* page.
This patch proposes replacing that method with a *synchronous* `PDFDocumentProxy.stats` getter instead, combined with re-factoring the worker-thread code by adding a `DocStats`-class to track Stream/Font-types and *only send* them to the main-thread *the first time* that a type is encountered.
Note that in practice most PDF documents only use a fairly limited number of Stream/Font-types, which means that in longer documents most of the `PDFDocumentProxy.getStats`-calls will return the same data.[1]
This re-factoring will obviously benefit longer document the most[2], and could actually be seen as a regression for one-page documents, since in practice there'll usually be a couple of "DocStats" messages sent during the parsing of the first page. However, if the user zooms/rotates the document (which causes re-rendering), note that even a one-page document would start to benefit from these changes.
Another benefit of having the data available/cached in the API is that unless the document stats change during parsing, repeated `PDFDocumentProxy.stats`-calls will return *the same identical* object.
This is something that we can easily take advantage of in the default viewer, by now *only* reporting "documentStats" telemetry[3] when the data actually have changed rather than once per rendered page (again beneficial in longer documents).
---
[1] Furthermore, the maximum number of `StreamType`s and `FontType`s are `10` and `12` respectively, which means that regardless of the complexity and page count in a PDF document there'll never be more than twenty-two "DocStats" messages sent; see 41ac3f0c07/src/shared/util.js (L206-L232)
[2] One example is the `pdf.pdf` document in the test-suite, where rendering all of its 1310 pages only results in a total of seven "DocStats" messages being sent from the worker-thread.
[3] Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "documentStats"-case we'll then iterate through the data to avoid double-reporting telemetry; see https://searchfox.org/mozilla-central/rev/8f4c180b87e52f3345ef8a3432d6e54bd1eb18dc/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#515-549
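The worker-side bookkeeping is conceptually simple; a rough sketch (not the exact implementation):
```js
class DocStats {
  #streamTypes = new Set();
  #fontTypes = new Set();

  constructor(messageHandler) {
    this._handler = messageHandler;
  }

  addStreamType(type) {
    if (this.#streamTypes.has(type)) {
      return; // Only message the main-thread for *new* types.
    }
    this.#streamTypes.add(type);
    this.#send();
  }

  #send() {
    this._handler.send("DocStats", {
      streamTypes: [...this.#streamTypes],
      fontTypes: [...this.#fontTypes],
    });
  }
}
```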
Given that all modern browsers now support `postMessage` transfers, and have for years, it no longer seems necessary for the PDF.js library to support using Workers unless the `postMessage` transfers functionality is available.
This patch is a follow-up to PR 11123, which made it impossible to *manually* disable `postMessage` transfers for performance reasons (since it increases memory usage), which hasn't caused any bug reports as far as I know.[1]
Hence we'll now only support *proper* Worker implementations, with fully working `postMessage` transfers, and fallback to using "fake" Workers otherwise.
---
[1] At the time of that PR we still "supported" IE, which is why this code was left intact.
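The feature-test essentially boils down to checking that a transferred `ArrayBuffer` is actually detached; a sketch (given a `worker` instance; the real test covers more Worker functionality than this):
```js
const testObj = new Uint8Array(1);
worker.postMessage({ data: testObj.buffer }, [testObj.buffer]);
// If transfers are supported the buffer is now detached, i.e. its
// byteLength has dropped to zero; otherwise it was structurally cloned.
const supportsTransfers = testObj.buffer.byteLength === 0;
```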
On the main thread, call `driver.log` and the message will be output in
the terminal together with the pdf id.
I've been using this a lot when trying to find certain PDFs or logging
stats.
It shouldn't be necessary to iterate through *all* pages when using a non-default `spreadMode`, since we already know which page(s) should become visible.
This code is a left-over from the initial (local) implementation that resulted in PR 14112, however I forgot to clean-up some things such as e.g. this loop.
Also fixes an outdated comment, see PR 14204 which removed the mentioned data-structure.
This patch preserves the old behaviour of appending a `loadingIcon`-div to all pages that are not yet loaded/rendered. However, the actual `loadingIcon`-spinner (i.e. the `loading-icon.gif` image) will only be displayed on *visible* pages to improve performance.
To avoid having to iterate through all pages in the document, which doesn't seem like a good idea for a PDF document with thousands of pages, we use a combination of the currently visible *and* cached pages to toggle the `loadingIcon`-spinner.
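Toggling then becomes a single pass over the page-views, driven by the combined set of visible/cached ids (a sketch with approximate names):
```js
#toggleLoadingIconSpinner(visibleIds) {
  for (const pageView of this._pages) {
    // The `loadingIcon`-div remains on every unloaded page, but the
    // spinner-image is only shown on the currently visible ones.
    pageView.toggleLoadingIconSpinner(visibleIds.has(pageView.id));
  }
}
```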
This code is the last piece[1] of the viewer that's not using standard `class`es, and by converting this code we get rid of some now unneeded boilerplate code (slightly reducing the size of the *built* `web/viewer.js` file).
Note that while this code was originally imported from a separate repository, it was last sync-ed with upstream *five years* ago which is why this re-factoring should be OK as far as I'm concerned (and we've done some other clean-up since then as well).
---
[1] Technically the `web/debugger.js` file is left as well, however that code is first of all not bundled in the *built* `web/viewer.js` file and secondly it's not even loaded by default either.
There's obviously no guarantee that this will work in general, if the document is sufficiently corrupt, but it should hopefully be better than just throwing `InvalidPDFException` as currently happens.
Please note that, as is often the case with corrupt documents, it's somewhat difficult to know if we're rendering the document "correctly" with this patch[1]. In this case even Adobe Reader cannot open the document, which is always a good sign that it's *really* corrupt, however we're at least able to render *something* with this patch.
---
[1] Whatever "correct" even means when dealing with corrupt PDF documents, where often times different PDF viewers won't agree completely.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=931481;
- real space chars are pushed into the chunk, but when there is extra spacing, the next char position must be compared with the previous one;
- for example, extra spacing can cancel out a space so that visually there is no space.
- First step to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1737260;
- several interactive PDFs rely on the ability to hide/show buttons in order to show different icons;
- render pushbuttons on their own canvas and then insert it into the annotation layer;
- update test/driver.js in order to convert canvases for pushbuttons into images.
Reporting telemetry, in Firefox, includes using `JSON.stringify` on the data and then sending an event to the `PdfStreamConverter.jsm`-code.
In that code the event is handled and `JSON.parse` is used to retrieve the data, and in the "pageInfo"-case we'll then proceed to ignore everything except *the first* such event; see https://searchfox.org/mozilla-central/rev/24fac1ad31fb9c6e9c4c767c6a7ff45d226078f3/toolkit/components/pdfjs/content/PdfStreamConverter.jsm#509-514
All-in-all, sending the "pageInfo" telemetry for each rendered page is thus unnecessary and this patch makes the viewer send it only *once* instead.
*This is a tentative patch, since we unfortunately cannot easily test it (as far as I can tell).*
In Firefox this (obviously) works as-is, but in Google Chrome the `markedContent` spans are inserted within the regular text-content (in the DOM) and with non-zero heights.
*This is a tentative patch, since I don't have the necessary hardware to test it.*
See https://developer.mozilla.org/en-US/docs/Web/CSS/text-size-adjust, which is currently ignored in Firefox.
It seems overall safer, and more future-proof, to simply add this to the *entire* `textLayer` rather than its individual elements.
With the previous patch, this helper function is no longer used and keeping it around will simply increase the size of the builds.
This removal is purposely done *separately*, to make it easy to revert the patch in the future if this helper function would become useful again.
This relies on the fact that `Set`s preserve the insertion order[1], which means that we can utilize an iterator to access the *first* stored view.
Note that in the `resize`-method, we can now move the visible pages to the back of the buffer using a single loop (hence we don't need to use the `moveToEndOfArray` helper function any more).
---
[1] This applies to `Map`s as well, although that's not entirely relevant here.
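In other words, a tiny illustration of the `Set` property being relied upon:
```js
const buffer = new Set();
buffer.add("view1").add("view2").add("view3");

// Sets iterate in insertion order, so an iterator yields the *first*
// stored view directly, without converting to an Array:
const [firstView] = buffer; // "view1"
```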
This small change will help validate an important part of the upcoming re-factoring, regarding the *correct* iteration of the `Set` in the `PDFPageViewBuffer.resize` method in particular.
Previously, when we created a shading pattern canvas we created it
as the same size as the page. This was good for caching if the same
pattern was used over and over again, but when lots of different
shadings are created that caused us to create many full page
canvases.
Instead of creating full-page canvases, create the canvas
with the same size as the current path bounding box. This reduces memory
consumption by a lot since most paths are pretty small. Also, in real world
PDFs it's rare for a shading (non shading fill) to be reused over and over again.
Bug 1721949 is an example where the same pattern is reused and it will be slightly
slower than before.
In `beginGroup` we create a new canvas that is the size of the
bounding box and we translate it to the offset. This means we don't need to
also apply the bounding box during `paintFormXObjectBegin`.
This improves #6961 quite a bit, but it's still missing the indentation
in the ruler.
The `PDFPageViewBuffer`-code is very important for the correct function of the viewer, but it's currently not tested at all.
While the `PDFPageViewBuffer` is obviously intended to be used with `PDFPageView`-instances, it only accesses a couple of `PDFPageView` properties/methods and consequently it's fairly easy to unit-test this code with dummy-data.
These unit-tests should help improve our confidence in this code, and will also come in handy with other changes that I'm working on (regarding modernizing and re-factoring the `PDFPageViewBuffer`-code).
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1739502;
- when the target area was the current content area, everything was pushed into it instead of creating a new one (and consequently a new pageArea is created).
- the pdf shows an alignment issue on page 4:
- the hAlign is "center" but the subform was the width of its parent, so compute the real width of the subform with tb layout;
- there is an extra empty page at the end of the pdf:
- there is a subform with some hidden elements which are not rendered for now (since there is no JS engine plugged in, it isn't possible to draw them by changing their visibility);
- so in case a subform is empty and has no real dimensions (at least one is 0), we just treat it as empty.
We were incorrectly using the transform in the pattern before it had been
adjusted causing the pattern to be misplaced relative to the page.
Fixes: ShowText-ShadingPattern.pdf (already in corpus)
Fixes: #8111
Fixes: #9243
The subform `nomin` displays even though it is set to `<occur max=-1 min=0>`.
If we look through the XFA 3.3 specification (https://www.pdfa.org/norm-refs/XFA-3_3.pdf):
- The min attribute is used when processing a form that contains data. Regardless of the data at least this number of instances is included. It is permissible to set this value to zero, in which case the container is entirely excluded if there is no data for it.
However, in our case this doesn't happen, because we let our empty dataNode get through. By adding a clause to
- eliminate unmatched data with occur min=0
we check for our empty data and send it to the `uselessNode` array, where it gets removed at the end.
The way that we're currently handling the last-`id` is very old, and there's no longer any good reason to special-case things when only one thumbnail is visible.
Furthermore, we can also modernize the loop slightly by using `for...of` instead of `Array.prototype.some()` when checking for fully visible thumbnails.
Note how in `PDFPageViewBuffer.resize` we're manually iterating through the visible pages in order to build a Set of the visible page `id`s. By instead moving the building of this Set into the `getVisibleElements` helper function, as part of the existing parsing, this code becomes *ever so slightly* more efficient.
Furthermore, more direct access to the visible page `id`s also come in handy in other parts of the viewer as well.
In the `BaseViewer.isPageVisible` method we no longer need to loop through the visible pages, but can instead directly check if the pageNumber is visible.
In the `PDFRenderingQueue.getHighestPriority` method, when checking for "holes" in the page layout, we can also avoid some unnecessary look-ups this way.
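With the `ids`-Set available directly on the result of `getVisibleElements`, such checks become constant-time lookups; sketched:
```js
isPageVisible(pageNumber) {
  // No need to loop through `visible.views` any more:
  return this._getVisiblePages().ids.has(pageNumber);
}
```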
Very short strings can narrowly miss the existing Bidi-detection threshold, leading to incorrect text-selection and copying behaviour.
In my testing, neither Adobe Reader nor PDFium seems to handle copying "correctly" for this document. Hence it's not entirely clear to me that we actually want to fix this, since tweaking these heuristics can *obviously* cause regressions elsewhere (and our test coverage for RTL-text isn't exactly great).
Starting a new path will wipe out any of the current subpaths in the
current graphics state, so we should reset the min/maxes.
This makes a number of the bounding boxes smaller and reduces the number
of composed pixels. For the smask tests in the corpus, the number of
composed pixels goes from 19,872,109 to 19,676,905. The difference is much
larger on other PDFs though.
*Sometimes I'll hopefully learn to optimize my code directly when writing it, rather than having to do multiple clean-up passes; sorry about the churn here!*
For most page layouts there won't be any "holes" in the visible pages (or thumbnails), and in those cases it'd obviously be preferable not having to repeat any checks of already rendered pages.
Rather than only checking the "distance" between the first/last pages, we can instead compare the theoretical number of pages (between first/last) with the actually visible number of pages instead. This way, we're able to better detect the "holes"-case and can skip unnecessary parsing in the common case.
I missed this one spot in PR 12870, when converting the other cases in the "keydown" event handler. However, given that it only matters in PresentationMode and/or when "page-fit" zooming is enabled, this oversight shouldn't have had any user-observable impact (but we should fix it nonetheless).
This is a follow-up to PR 10675, since there I completely overlooked that we also need to handle the case where a PDF document has *failed* to load when the "supportsRangedLoading" message is sent to the viewer.
It seems that issue 10301 was fixed by PR 13424, by combining the spans, however given that we don't have a lot of test coverage for RTL-text I figured that adding a simple reference test wouldn't hurt (rather than just closing the issue as WORKSFORME).
Some links were not working in some XFA files. I realized that the anchor tag containing the link has an inline display and couldn't receive any height, and solved this by adding `position: absolute`. Tested with two different files in Firefox Nightly and Chrome; all links now work as expected. Added a reftest to avoid future regressions.
Embedded JS in a PDF kept throwing an alert regarding a specific version of Acrobat (Spanish and version 5.0 or greater).
This happens because:
- JS in the PDF is enabled;
- the PDF contains some unsupported features (e.g. XFA).
The alert comes when `app.formVersion` is `undefined` or `app.formVersion < 5.0`.
In pdf.js we were using `FORMS_VERSION = undefined`. After researching based on https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/pdfs/acrobatsdk_jsapiref.pdf#G4.1993509 and Acrobat DC, we decided to go with the larger number to avoid unnecessary popups.
Through investigation we realized that `VIEWER_VERSION` should have the same value, i.e. a number.
Due to all that, we implemented `21.00720099` as the value for both `FORMS_VERSION` and `VIEWER_VERSION`.
The only reason for using a `DocumentFragment` in the first place, originally added in PR 8724, was to prevent errors in the `PDFPageView`-constructor. However, we should be able to simply make its `container`-option *optional* instead, since it's not being used for anything else in the class.
Note that pre-rendering still works correctly in my testing, and given that the `BaseViewer` keeps references to all `PDFPageView`-instances (via its `_pages` Array) it also shouldn't be possible to "lose" any pages/canvases this way.
Unfortunately there exist PDF documents where all pageLabels are empty strings, see e.g. http://www.cs.cornell.edu/~ragarwal/pubs/blk-switch.pdf (taken from an old issue), which result in the pageNumber-input being completely blank. That doesn't seem very helpful, and this patch simply extends the approach used to ignore pageLabels that are identical to standard page numbering.
In the GENERIC viewer, e.g. when dragging-and-dropping a new PDF document which automatically opens the outline, there can now be breaking errors in the `{BaseViewer, PDFThumbnailViewer}.#getScrollAhead` methods since there's no visible pages/thumbs during loading; sorry about the breakage!
This was a stupid oversight on my part, since the first/last visible pages have obviously already been rendered at the point when we're checking for any potential "holes" in the page layout.
While this will obviously not have any measurable effect on performance, we should nonetheless avoid doing completely unnecessary checks here.
This is a very old "issue", which has existed since essentially forever, and it affects all of the available scrollModes. However, in the recently added Page-mode it's particularly noticeable since we use a *simulated* scroll direction there.
When deciding what page(s) to pre-render, we only consider the current scroll direction. This works well in most cases, but can break down at the start/end of the document by trying to pre-render a page *outside* of the existing ones. To improve this, we'll thus *force* the scroll direction at the start/end of the document.
*Steps to reproduce:*
0. Open the viewer, e.g. https://mozilla.github.io/pdf.js/web/viewer.html
1. Enable vertical scrolling.
2. Press the <kbd>End</kbd> key.
3. Open the devtools and, using the DOM Inspector, notice how page 13 is *not* being pre-rendered.
This allows us to compose much smaller regions of the soft
mask, making composition much faster. This should also allow
for further optimizations in the pattern code.
For example locally I see issue #6573 go from 55s
to 5s with this change.
Fixes #6573
The old method of handling soft masks had a number of issues where the temporary
drawing canvas and the suspended main canvas could get out of sync
(e.g. mismatched save/restores or clip state) or we could end up compositing at
the wrong time. A good example of things getting out sync is the reduced test
case in #9017.
To fix this I've changed two big things:
1) Duplicate all the needed graphics state from the temporary canvas to the
suspended main canvas. This ensure the canvases stay in sync so that when we
switch back to the main canvas the graphics state stack is the same
(e.g. transforms, clip paths).
2) Immediately composite after each drawing operation. This ensures that if
there's an active clip region that we'll still be able to composite the correct
portions of the canvas. Note: This solution could be avoided by using
getImageData and putImageData since those ignore clipping region, but this is
very, very slow. Note 2: I also think the old way of only compositing at the end
of the soft mask is incorrect and can lead to wrong colors if drawing over the
same region, but in practice this doesn't seem to matter much.
Fixes: #5781
Fixes: #5853
Fixes: #7267
Fixes: #7891
Fixes: #8403
Fixes: #8624
Fixes: #12798
Fixes: #13891
Fixes: #9017 (reduced test case)
Fixes: https://bugzilla.mozilla.org/show_bug.cgi?id=1703683
With ESLint 8 we should now finally be able to start using modern `class` features, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Public_class_fields and https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_class_fields
However, while both ESLint and Acorn now support this, it unfortunately turns out that Escodegen (which we use during building) still lack the necessary support. Looking at https://github.com/estools/escodegen there's not been any updates since last year, and there's also open PRs adding support for these new `class` features.
To avoid blocking usage of these `class` features in the PDF.js code-base, in particular *private* fields/methods, this patch thus proposes that we (hopefully temporarily) switch to an `escodegen` fork that has the necessary support; please see https://www.npmjs.com/package/@javascript-obfuscator/escodegen
While I have no reason to doubt the security of the `escodegen` fork, this patch nonetheless pins the version number. Furthermore, I've also diffed the output of the two `.js`-files in this forked package against the original files without finding anything that looks immediately "dangerous".
With ResetForm-action support added in PR 14083, there's a regression in the `issue12716` test-case. More specifically the border around the "Clear Form"-link is now rendered *twice*, once in the canvas via the appearance-stream and once in the annotationLayer via the border-data.
This looks slightly weird, and was most likely not intended, which is why this patch suggests that we ignore the border in the annotationLayer when an appearance-stream exists.
Apparently Node.js has added *global* `URL.createObjectURL` support, but not done the same thing for `Blob`. Hence we also need to check for the availability of `Blob` in the `createObjectURL` helper function, and it's probably a good idea to also update `examples/node/pdf2svg.js` to work-around this until these changes reach an official PDF.js release.
Trying to render these Annotation-types, when the borderWidth is `0`, causes a "hairline" border to appear. If these Annotations included an appearance stream, as they are supposed to, this wouldn't have happened and the simplest solution here seem to be to just ignore these particular Annotations.
- PR #13257 fixed a lot of issues, but not all, and this patch aims to fix almost all remaining ones.
- the idea in this new patch is to compare the position of a new glyph with the last position where a glyph was drawn;
- no spaces are "drawn": they just move the cursor, but they aren't added to the chunk;
- this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
- to distinguish between real spaces and tracking ones, we used a factor of the space width (from the font):
- it was a pretty good idea in general, but it fails with some fonts where the space is too big;
- in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies a tracking space).
*Please note:* This is a tentative patch, since I don't have the necessary a11y-software to actually test it.
To avoid having to add a new API-method just for a single string, I figured that adding the new property to the existing `documentInfo`-data (accessed via `PDFDocumentProxy.getMetadata` in the API) will hopefully be deemed acceptable.
In these cases there's no good reason, in my opinion, to duplicate the `shadow`-lines since that unnecessarily increases the risk of simple typos (see the previous patch).
With this typo the shadowing doesn't actually work, which causes these checks to be unnecessarily repeated. In this particular case it didn't have a significant performance impact, however we should definitely fix this nonetheless.
Looking at the code, I do have to agree with the point made in issue 12731 about it being unexpected/unhelpful that the `PDFFindController.executeCommand`-method isn't directly usable with the "find"-event.
The reason for it being this way is, as so often, for historical reasons: The `executeCommand`-method was added (just) prior to the introduction of the `EventBus` in the viewer.
Obviously we cannot simply change the existing `PDFFindController.executeCommand`-method, since that'd be a breaking change in code which has existed for over five years.
Initially I figured that we could simply add a new method in `PDFFindController` that'd accept the state from the "find"-event, however after thinking about this and looking through the use-cases in the default viewer I settled on a slightly different approach: Let the `PDFFindController` just listen for the "find"-event (on the `EventBus`-instance) directly instead, which also removes one level of (unneeded) indirection during searching in the default viewer.
For GENERIC builds of the PDF.js library, the old `PDFFindController.executeCommand`-method is still available with a deprecation warning.
Many years ago now there were some `Promise` implementations that had issues resolving with an *implicitly* `undefined` value. That should no longer be the case, and we've not been using the `Promise.resolve(undefined)` format for a long time, hence this patch fixes the few remaining cases.
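I.e. the following micro-clean-up, applied to the few remaining call-sites:
```js
// Before:
return Promise.resolve(undefined);
// After; the resolved value is implicitly `undefined` anyway:
return Promise.resolve();
```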
Having recently worked with this code, in PR 14096 (and indirectly in PR 14112), I happened to notice a pre-existing issue with spreadModes at higher zoom levels.
The `PDFRenderingQueue` code was written back when the viewer only supported "normal" vertical scrolling, and some edge-cases related to spreadModes are thus not perfectly supported. Depending on the zoom level, it's possible that there are "holes" in the currently visible page layout, and those pages will not be pre-rendered as you'd expect.
*Steps to reproduce:*
0. Open the viewer, e.g. https://mozilla.github.io/pdf.js/web/viewer.html
1. Enable vertical scrolling.
2. Enable the ODD spreadMode.
3. Scroll down, such that both pages 1 and 3 are visible.
4. Zoom-in until *only* pages 1 and 3 are visible.
5. Open the devtools and, using the DOM Inspector, notice how page 2 is *not* being pre-rendered despite all surrounding pages being rendered.
If a PDF included an embedded TrueType font whose preferred character
map (cmap) was in "format 2", the code would select that character map
and then refuse to read it because of an unsupported format, thus
causing the characters not to be rendered. This commit implements
support for format 2 as described at the link below.
https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6cmap.html
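A simplified sketch of the format 2 ("high-byte mapping through table") lookup, assuming the table fields have already been read as described in the reference manual; `glyphIndexStart` is a hypothetical pre-resolved offset standing in for the raw `idRangeOffset` pointer arithmetic:
```
function lookupCmapFormat2(charCode, subHeaderKeys, subHeaders, glyphIndexArray) {
  const highByte = (charCode >> 8) & 0xff;
  const lowByte = charCode & 0xff;
  // The subHeaderKeys entries are subHeader indices pre-multiplied by eight.
  const subHeader = subHeaders[subHeaderKeys[highByte] >> 3];
  const index = lowByte - subHeader.firstCode;
  if (index < 0 || index >= subHeader.entryCount) {
    return 0; // The missing glyph.
  }
  let glyphId = glyphIndexArray[subHeader.glyphIndexStart + index];
  if (glyphId !== 0) {
    glyphId = (glyphId + subHeader.idDelta) & 0xffff;
  }
  return glyphId;
}
```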
With the previous commit, both of the `PDFViewer` and `PDFSinglePageViewer` classes are now small/simple enough that it no longer seems necessary to keep them in separate files.
This implements a new Page scrolling mode, essentially bringing (and extending) the functionality from `PDFSinglePageViewer` into the regular `PDFViewer`-class. Compared to `PDFSinglePageViewer`, which as its name suggests will only display one page at a time, in the `PDFViewer`-implementation this new Page scrolling mode also supports spreadModes properly (somewhat similar to e.g. Adobe Reader).
Given the size and scope of these changes, I've tried to focus on implementing the basic functionality. Hence there's room for further clean-up and/or improvements, including e.g. simplifying the CSS/JS related to PresentationMode and implementing easier page-switching with the mouse-wheel/arrow-keys.
This patch (slightly) simplifies a couple of `onProgress` and `onUnsupportedFeature` call-sites.
Finally, while unrelated, this patch also removes some unnecessary `return undefined;` statements (PR 11601 follow-up).
For Circle, Square, and Polygon Annotations it's currently only possible to toggle the associated PopupAnnotation by clicking on its border. Depending on the border width, and also the current zoom-level in the viewer, that can make interacting with certain Annotations *practically* impossible (which is the case in issue 14107).
Hence, in order to improve this, change the "fill"-property of the SVG element in the annotationLayer to make the *entire* element part of the click/mouse-over target.
*Please note:* Given that this is a viewer-related issue, there's no simple way to test this as far as I can tell.
This is unfortunately *yet another* bug in the `preEvaluateFont`-implementation, and I've lost count of the number of times I've had to tweak this code over the years :-(
I really cannot help thinking that PR 4423 was way too simplistic, since it missed a bunch of cases that lead to broken font rendering in many PDF documents.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1734802
- the exact sentence from the spec:
"The token SOLIDUS (a slash followed by no regular characters) introduces a unique valid name defined by the empty sequence of characters."
- so just remove the warning.
Note that PR 14049 removed this, since https://github.com/mozilla/pdf.js/pull/14049#discussion_r716245518 claimed that it's not necessary anymore. Unfortunately that, in my testing on Windows, actually re-introduced exactly the issue described in the comment; more specifically once the *last* character has been entered in the comb-field it's now again incorrectly scrolled (with the first character being invisible) until focus is lost.
This can be tested with e.g. `f1040.pdf`, see page 2, from the test-suite.
With a recent addition to the HTML specification, the internal structured clone algorithm used in browsers is (or will be, once it's implemented) *directly* accessible to JavaScript; please see https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/structuredClone
Hence we'll *eventually* not need to maintain our own structured clone functionality in the `LoopbackPort`-class in the API, however for the time being we'll feature detect `structuredClone` and fallback to the existing PDF.js implementation.
Given that https://bugzilla.mozilla.org/show_bug.cgi?id=1722576 has landed in Firefox 94, we should no longer need the manually implemented `cloneValue`-functionality in MOZCENTRAL builds. Note also that in the Firefox built-in PDF Viewer it's not possible for users to *easily* disable workers, which should further reduce the risk of these changes.
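A minimal sketch of that feature detection (the fallback function is a stand-in for the existing PDF.js clone implementation, not its actual name):
```
function cloneValue(value) {
  if (typeof structuredClone === "function") {
    return structuredClone(value); // Native structured clone, when available.
  }
  return fallbackCloneValue(value); // The manually implemented PDF.js clone.
}
```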
This patch helps reduce some duplication, given that we now have a few essentially identical `addLinkAttributes` call-sites in the code-base.
To prevent runtime errors in the Annotation/XFA-layer code, we'll warn if a custom/incomplete `PDFLinkService` is being used (limited to GENERIC builds).
Please note that we (obviously) don't want to unconditionally pre-render more than one page all the time, since that could very easily lead to overall worse performance in some documents.[1]
However, when spreadModes are enabled it does make sense to attempt to pre-render both of the pages of the next/previous spread.
---
[1] Since it may cause pre-rendering to unnecessarily compete for parsing resources, on the worker-thread, with "regular" rendering.
Note how both the annotationLayer and the document outline will apply various URL-related options when creating the link-elements.
For consistency the `xfaLayer`-rendering should obviously use the same options, to ensure that the existing options are indeed applied to all URLs regardless of where they originate.
Given that `NodeList`s can be iterated using `for..of` we can use that instead, since it's a little bit nicer and easier to read than the `Array.prototype.forEach` format.
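For example (the selector is purely illustrative):
```
// Before: Array.prototype.forEach on the NodeList.
document.querySelectorAll(".textLayer > span").forEach(function (span) {
  span.setAttribute("role", "presentation");
});

// After: `NodeList`s are iterable, so `for..of` works directly.
for (const span of document.querySelectorAll(".textLayer > span")) {
  span.setAttribute("role", "presentation");
}
```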
- it aims to fix #12721.
- Thanks to PR #14023, we now have the fieldObjects in the annotation layer, so we can easily map field names to their IDs if needed.
- Reset values in the storage, in the JS sandbox and in the visible HTML elements.
In the referenced bug, the embedded fonts contain custom CMap-data that only include strings. Note how for embedded composite TrueType fonts we're using the CMap-data when building the glyph mapping, and currently we end up with a completely empty map because the code expects only CID *numbers*.
Furthermore, just fixing the glyph mapping alone isn't sufficient to fully address the bug, since we also need to consider this "special" kind of CMap-data when looking up glyph widths.
Having recently worked with, and reviewed patches touching, this code, it seemed that it's probably not a bad idea to move that functionality into `createValidAbsoluteUrl` as new options instead.
For the `addDefaultProtocolToUrl` functionality in particular, the existing helper function was not only moved but slightly improved as well. Looking at the code, I realized that there's a small risk that it would incorrectly match a *relative* URL-string too.
With these changes, the `createValidAbsoluteUrl` call-sites in the `src/core/`-code can be simplified a little bit.
*Please note:* This patch may, indirectly, change the format of the `unsafeUrl`-property returned with relevant Annotations and OutlineItems; hence the `api-minor` tag.
However, I'd argue that it's actually more correct this way since the whole purpose of `unsafeUrl` is/was to return the URL data as-is without any parsing done.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1716758;
- some buttons have a JS action with the pattern `app.launchURL(...)` (or similar), so when possible we extract the URL and generate an `<a>` element whose `href` equals the found URL;
- pdf.js already had some code to handle that, so this patch slightly refactors it.
- it aims to fix #14021;
- the N dict is empty here so just create a default one;
- this implies that the checked checkbox has no appearance, so create a default one too in order to print it;
- in the pdf in the issue, a checked box is not printed because it has no default appearance, so we need to guess its appearance from its state.
In order to implement this, we utilize the existing `bidi` function to infer the text-direction of /T and /Contents entries. While this may not be perfect in cases where one PopupAnnotation mixes LTR and RTL languages, it should work well enough in most cases.
To avoid having to add *two new* properties in lots of annotations, supplementing the existing `title`/`contents`-properties, this patch instead re-factors the existing code such that the properties are replaced by Objects (containing `str` and `dir`).
*Please note:* In order to avoid breaking existing third-party implementations, `GENERIC`-builds of the PDF.js library will still provide the old `title`/`contents`-properties on annotations returned by `PDFPageProxy.getAnnotations`.
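A hedged sketch of the resulting data format (the `titleObj`/`contentsObj` names are assumptions for illustration):
```
// (Inside an async function.)
const annotations = await page.getAnnotations();
for (const annotation of annotations) {
  // Each text-string is now paired with its inferred direction.
  const { str, dir } = annotation.titleObj;
  console.log(str, dir); // e.g. "Some title", "ltr" (or "rtl").
}
```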
- In the PDF in issue #14071, some select fields don't contain any values;
- the corresponding node has `bindItems` and `bind` elements, and the `_bindItems` function was just not called.
In particular the `_processStreamMessage`-method is a bit cumbersome to read, given the way that the current streamController/streamSink is accessed, which we can improve with a couple of local variables.
After PR 11601, the `paintJpegXObject` operator is no longer used for anything. While I don't think we can just remove it, and essentially leave a "hole" in the `OPS` structure, we should at least mark it as explicitly unused to aid readability/maintainability of the code.
This replaces direct `document.getElementsByName` lookups with a helper method which:
- Lets the AnnotationLayer use the data returned by the `PDFDocumentProxy.getFieldObjects` API-method, such that we can directly lookup only the necessary DOM elements.
- Falls back to using `document.getElementsByName` as before, such that e.g. the standalone viewer components still work.
Finally, to fix the problems reported in issue 14003, regardless of the code-path we now also enforce that the DOM elements found were actually created by the AnnotationLayer code.
With these changes we'll thus be able to update form elements on all visible pages just as before, but we'll additionally update the AnnotationStorage for not-yet-rendered elements thus fixing a pre-existing bug.
*This is similar to the "isSymbolicFont"-property, which is no longer exported by default after PR 11777.*
Both "isMonospace" and "isSerifFont" are internal properties, used during font parsing and building of the glyph mapping on the worker-thread.
However both of these properties are completely unused on the main-thread and/or in the API, and accessing them will now require setting the `fontExtraProperties`-option when calling `getDocument`.
Currently any AppOptions set using e.g. the "webviewerloaded" event listener can/will by default be overridden when the Preferences are read.
To avoid that happening the "disablePreferences"-option can be used, however unless it's been explicitly set all non-default AppOptions will be silently ignored. This patch thus attempts to improve the current situation somewhat, for third-party implementations, by logging a warning in the console when this happens.
With this patch we'll ensure that only valid absolute URLs can be used in XFA documents, similar to the existing validation done for "regular" PDF documents.
Furthermore, we'll also attempt to add a default protocol (i.e. `http`) to URLs beginning with "www." in XFA documents as well; this on its own is enough to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1731240
Rather than re-computing this value in a number of different places throughout the code-base[1], we can expose this in the API via the existing `PixelsPerInch`-structure instead.
There's also been feature requests asking for the old `CSS_UNITS` viewer constant to be made accessible, such that it could be used in third-party implementations.
I suppose that it could be argued that it's somewhat confusing to place a unitless property in `PixelsPerInch`, however given that the `PDF_TO_CSS_UNITS`-property is defined strictly in terms of the existing properties this is hopefully deemed reasonable.
---
[1] These include:
- The viewer, with the `CSS_UNITS` name.
- The reference-tests.
- The display-layer, when rendering images; see PR 13991.
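A sketch of the structure described above (the numeric values follow from 96 CSS pixels and 72 PDF units per inch):
```
const PixelsPerInch = {
  CSS: 96.0,
  PDF: 72.0,
  PDF_TO_CSS_UNITS: 96.0 / 72.0, // The old `CSS_UNITS` viewer constant.
};

// Example: convert a page dimension from PDF units to CSS pixels.
function pdfUnitsToCssPixels(value) {
  return value * PixelsPerInch.PDF_TO_CSS_UNITS;
}
```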
Given the simplicity of this functionality, we can move it from the default viewer and into the `BaseViewer` class instead. This way, it's possible to support more scripting functionality in the standalone viewer components; please see PR 14038.
Please note that I purposely went with `increaseScale`/`decreaseScale`-method names, rather than using "zoom", to better match the existing `currentScale`/`currentScaleValue` getters/setters that's being used in the `BaseViewer` class.
Currently we only exclude /Encoding entries that also contain a /Differences array, which is the cause of the text-selection problem in the referenced issue.
In order to address this we'll now also exclude /Encoding entries that contain one of the predefined *named* encodings, and no longer require that it also contains a /Differences array.
*Please note:* This patch causes a small "regression" in the `bug1130815-text` test-case, however this is actually an improvement when compared with Adobe Reader and PDFium (in Google Chrome).
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1719148;
- JS can set a property for a non-rendered annotation using the annotationStorage but the other unset default properties must be used when the annotation is finally rendered;
- so this patch just adds the properties already set in the annotationStorage to the default value.
This reduces some unnecessary duplication, since we currently have essentially the same code in a handful of places in the `Font.fallbackToSystemFont`-method.
Rather than forcing the "regular" `EventBus` to check and handle `isInAutomation` for every `dispatch` call, we can take advantage of subclassing instead.
Hence this PR introduces a new `AutomationEventBus` class, which extends `EventBus`, and is used by the default viewer when `isInAutomation === true`.
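A minimal sketch of that subclass (the DOM re-dispatching shown here is illustrative of what automation needs, not an exact excerpt):
```
class AutomationEventBus extends EventBus {
  dispatch(eventName, data) {
    super.dispatch(eventName, data);
    // Additionally mirror the event on the DOM, so that browser automation
    // can observe it; regular EventBus-instances skip this work entirely.
    const detail = Object.assign(Object.create(null), data);
    document.dispatchEvent(
      new CustomEvent(eventName, { bubbles: true, cancelable: true, detail })
    );
  }
}
```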
In this particular case the `CMap`-data that we create contains only numbers, but no strings, which causes `PartialEvaluator.readToUnicode` to create a ToUnicode-map with only empty strings.
*Please note:* This is yet another case where I don't know if it's necessarily the best and most correct solution, but it does fix the referenced issue.
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1722036.
- AS and V should share the same value for checkboxes: at least that's what the specification says;
- the pdf in the above bug opens correctly in Acrobat, so it likely means that AS is chosen over V.
Originally the library/viewer didn't support forms, and the hiding of the Download-buttons (when new documents are opened) didn't really matter all that much. Hence the simplest solution, at the time, was to hide the Download-buttons since we use the URL as a fallback when downloading data in the GENERIC `DownloadManager`.
Nowadays we obviously want to support saving of forms in the GENERIC viewer, regardless of how the document was opened, which could thus *potentially* lead to the fallback download-URL being wrong.
In order to be able to show the Download-buttons unconditionally, this patch slightly re-factors the viewer to track the download-URL *separately* to prevent any issues there.
*Please note:* As mentioned in the issue, the ViewBookmark-buttons are specific to the initial URL when the viewer is first opened. Hence they (still) don't make sense when a new document has been opened, since it's then impossible to obtain a usable link to the *currently* active document.
By using CSS variables to set the width of the zoom dropdown, we can simplify both the relevant CSS and JS code. This will not only improve overall maintainability of this code, but should also make it (slightly) easier for third-party users that want to customize the width.
Note in particular that by having the code in `Toolbar._adjustScaleWidth` lookup the values of the CSS variables, we no longer need to worry about keeping hard-coded values up-to-date with the CSS rules.
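A hedged sketch of that lookup (the CSS variable name is an assumption):
```
// Read the dropdown width from the stylesheet, rather than hard-coding the
// value in JS and hoping it stays in sync with the CSS rules.
const styles = getComputedStyle(document.documentElement);
const scaleSelectWidth = parseFloat(
  styles.getPropertyValue("--scale-select-width") // Hypothetical name.
);
```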
- it aims to fix #14024.
- this patch adds an attribute `acroformExportValue` to the HTML input in order to set the checked attribute, taking into account the exportValue for checkboxes with the same name.
*Please note:* All of this feels very handwavy, but at least it passes all tests locally. Hopefully we have enough tests for this part of the font code.
For non-embedded composite standard fonts with an "incomplete" /CIDToGIDMap, we'll now fallback to an *explicitly defined* /ToUnicode map even when that one happens to be an /Identity-H or /Identity-V map.
The `Font.fallbackToSystemFont` method is unfortunately getting more and more special-cases, however that might be unavoidable given all the weird non-embedded fonts found in the wild :-(
While these types apparently make sense in TypeScript environments, we really don't want to extend the *public* API by simply exporting the relevant classes directly in `src/pdf.js` (since they should never be called/initialized manually).
Please see e.g. issue 12384 where this was first requested, and note that a possible work-around was also provided there. This patch simply implements that work-around[1], which will hopefully be helpful to TypeScript users.
---
[1] Based on the discussion in PR 13957, the two previous patches appear to be necessary for this to actually work.
*This fixes something that I noticed, having recently looked at both the `Lexer.getObj` and `writeValue` code.*
Please note that I unfortunately don't have an example of a form where saving fails without this patch. However, given its overall simplicity and that unit-tests are added, it's hopefully deemed useful to fix this potential issue pro-actively rather than waiting for a bug report.
At this point one might, and rightly so, wonder if there are actually any real-world PDF documents where a `null` value is being used?
Unfortunately the answer is *yes*, and we have a couple of examples in the test-suite (although none of those are related to forms); please see: `issue1015`, `issue2642`, `issue10402`, `issue12823`, `issue13823`, and `pr12564`.
While some of the output looks worse to my eye, this behavior more
closely matches what I see when I open the PDFs in Adobe Acrobat.
Fixes: #4706, #9713, #8245, #1344
Currently it's possible to accidentally, e.g. by simply copy-and-pasting from an existing test-case, add an unnecessary `"link": true`-entry for locally available PDF files.
This leads to inconsistencies in the manifest file, and doesn't feel like a great developer experience. However we can easily fix it by having `verifyManifestFiles` fail in this situation, and doing so actually turned up a couple of existing cases.
Similar to other viewer components, e.g. the `PDFFindBar` and `PDFPresentationMode`, there's no need to create a `PDFHistory`-instance when it's not going to be used.
The `MessageHandler`-implementation already handles either of these callbacks being undefined, hence there's no particular reason (as far as I can tell) to add no-op functions here.
Also, in a couple of `MessageHandler`-methods, utilize an already existing local variable more.
Following the STR in the issue, this patch reduces the number of `PartialEvaluator.getTextContent`-related `postMessage`-calls by approximately 78 percent.[1]
Note that by enforcing a relatively low value when batching text chunks, we should thus improve worst-case scenarios while not negatively affect all `textLayer` building.
While working on these changes I noticed, thanks to our unit-tests, that the implementation of the `appendEOL` function unfortunately means that the number and content of the textItems could actually be affected by the particular chunking used.
That seems *extremely* unfortunate, since in practice this means that the particular chunking used is thus observable through the API. Obviously that should be a completely internal implementation detail, which is why this patch also modifies `appendEOL` to mitigate that.[2]
Given that this patch adds a *minimum* batch size in `enqueueChunk`, there's obviously nothing preventing it from becoming a lot larger than the limit (depending e.g. on the PDF structure and the CPU load/speed).
While sending more text chunks at once isn't an issue in itself, it could become problematic on the main-thread during `textLayer` building. Note how both the `PartialEvaluator` and `CanvasGraphics` implementations utilize `Date.now()`-checks, to prevent long-running parsing/rendering from "hanging" the respective thread. In the `textLayer` building we don't utilize such a construction[3], and streaming of textContent is thus essentially acting as a *simple* stand-in for that functionality.
Hence why we want to avoid choosing a too large minimum batch size, since that could thus indirectly affect main-thread performance negatively.
---
[1] While it'd be possible to go even lower, that'd likely require more invasive re-factoring/changes to the `PartialEvaluator.getTextContent`-code to ensure that the batches don't become too large.
[2] This should also, as far as I can tell, explain some of the regressions observed in the "enhance" text-selection tests back in PR 13257.
Looking closer at the `appendEOL` function it should potentially be changed even more, however that should probably not be done here.
[3] I'd really like to avoid implementing something like that for the `textLayer` building as well, given that it'd require adding a fair bit of complexity.
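A hedged sketch of the minimum batch size logic in `enqueueChunk` described above (the constant and buffer names are illustrative):
```
const TEXT_CHUNK_BATCH_SIZE = 10; // Minimum number of textItems per chunk.

function enqueueChunk(sink, textContent, force = false) {
  const length = textContent.items.length;
  if (length === 0) {
    return;
  }
  if (length < TEXT_CHUNK_BATCH_SIZE && !force) {
    return; // Keep batching; nothing is sent to the main-thread yet.
  }
  sink.enqueue(textContent, length);
  textContent.items = [];
  textContent.styles = Object.create(null);
}
```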
This is essentially a simplified version of the code that's used in `PDFPageView`, which will hopefully reduce the number of issues opened specifically about blurry rendering.
However, note that *ideally* users should base their implementations on the `components/` examples rather than using the API directly (the "viewer components" already support HiDPI-screens).
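For API users rendering to a canvas directly, the HiDPI handling amounts to something like the following sketch, given a `page` and a `canvas` (in line with the `components/` examples):
```
const outputScale = window.devicePixelRatio || 1;
const viewport = page.getViewport({ scale: 1 });

// Back the canvas with device pixels, but keep its CSS size in logical ones.
canvas.width = Math.floor(viewport.width * outputScale);
canvas.height = Math.floor(viewport.height * outputScale);
canvas.style.width = Math.floor(viewport.width) + "px";
canvas.style.height = Math.floor(viewport.height) + "px";

const transform =
  outputScale !== 1 ? [outputScale, 0, 0, outputScale, 0, 0] : null;
page.render({
  canvasContext: canvas.getContext("2d"),
  transform,
  viewport,
});
```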
While these changes will obviously not have a significant effect on overall memory usage, it cannot hurt as far as I'm concerned. This patch makes the following changes:
- Clear out `_textDivProperties` once rendering is done, since those properties are only necessary to keep alive when *enhanced* text-selection is being used.
- Reduce the size of the `_textDivProperties`-entries by default, since a majority of the properties are only relevant when *enhanced* text-selection is being used.
The old `update`-signature started to annoy me back when I added optional content support to the viewer, since we're (often) forced to pass in a bunch of arguments that we don't care about whenever these methods are called.
This is tagged `api-minor` since `PDFPageView` is being used in the `pageviewer` component example, and it's thus possible that these changes could affect some users; the next commit adds fallback handling for the old format.
In the referenced PDF document the /Contents stream contains MarkedContent-operators, however no optional content dictionary exists; according to [the specification](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G7.3883825):
> Null values or references to deleted objects shall be ignored. If this entry is
> not present, is an empty array, or contains references only to null or deleted
> objects, the membership dictionary shall have no effect on the visibility of
> any content.
In the PDF document some of the glyphs have bogus `differences`-entries[1] that cannot be resolved to valid glyph names, thus causing the glyph mapping to fail.
My initial idea was to use a similar approach as in the `PartialEvaluator._simpleFontToUnicode`-method, to extract the charCodes from those entries, however it turned out that that didn't actually help in this case (the mapping was still wrong).
To fix this I'm thus proposing that we fallback to the /ToUnicode map when no other useable data exists (e.g. no post-table), since it *hopefully* shouldn't make things any worse than leaving parts of the glyph map empty (which currently happens).
---
[1] As can be seen below, some of the entries are completely normal while others are non-standard:
```
Differences (array)
0 = 65
1 = /g5167
2 = /space
3 = /g11927
4 = /g17737
5 = /g11540
6 = /g2180
7 = /K
8 = /P
9 = /two
10 = /zero
11 = /one
12 = /five
13 = /four
14 = /g6932
15 = /g7246
16 = /g1691
17 = /g2343
18 = /g14792
19 = /g3325
20 = /g4280
21 = /g20383
22 = /g18166
23 = /g16988
24 = /g17943
25 = /g19223
26 = /g10830
27 = 97
28 = /g982
29 = /g1226
30 = /g5059
31 = /g2677
32 = /g1042
33 = /g11568
34 = /L
35 = /three
36 = /seven
37 = /g2364
38 = /g12063
39 = /g5356
40 = /g2173
41 = /g17877
42 = /g7273
43 = /g7647
44 = /g7224
45 = /g19327
46 = /g5054
47 = /g2342
48 = /g10136
49 = /g6856
50 = /g13381
51 = /g7257
52 = /g12093
53 = /g2359
```
- aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1720179
- in some PDFs the XFA array in the AcroForm dictionary doesn't contain an entry for 'datasets' (which contains saved data), so basically this patch allows overwriting the AcroForm dictionary with an updated XFA array when doing an incremental update.
- when some named nodes in the template don't have their counterpart in datasets, we create some nodes: the main node mustn't belong to the datasets namespace, because that doesn't make sense and Acrobat Reader isn't able to read PDFs with such nodes.
- so nodes created under a datasets node have a namespaceId set to -1, and consequently no namespace prefix will appear when they're serialized.
There's a fair number of regular expressions throughout the code-base which are slightly more verbose than strictly necessary, in particular:
- We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead.
- We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.
Despite its name, the fonts in the ItcSymbol family are "regular" fonts and not Symbol ones. However, given that the font name contains the word "Symbol", we ended up picking the wrong code-path in the `Font.fallbackToSystemFont`-method.
*Please note:* While this patch ensures that the text becomes readable, by falling back to a standard font, the rendering will obviously not be perfect. However, that's the PDF generator's "fault", since non-embedded fonts cannot be guaranteed to render correctly in all environments.
Given that the Fetch API is normally being used nowadays, these changes are probably less important than they used to be. However, given that it's simple enough to implement this, I figured why not just fix issue 9883 (better late than never I suppose).
Testing:
- delete the pdf file while the initial request is inflight
- delete the pdf file after the initial request has finished
Repeat for a small file and large file, exercising both one-off and
chunked transports.
This patch changes the `PDFDocumentLoadingTask.destroy`-method and the `_fetchDocument`-function to be `async`, which slightly simplifies the relevant code.
Furthermore, remove the catch-handler from the `WorkerTransport.getPageIndex`-method since it's no longer needed. Given that the `MessageHandler` is nowadays wrapping every possible Exception, it's no longer necessary to try and re-wrap the reason here.
While e.g. the `simpleviewer` and `singlepageviewer` examples work, since they're based on the `BaseViewer`-class, the standalone `pageviewer` example currently doesn't support either XFA- or StructTree-layers. This seems like an obvious oversight, which can be easily addressed simply by exporting the necessary functionality through `pdf_viewer.component.js`, similar to the existing Text/Annotation-layers.
While working on, and testing, these changes I happened to notice a number of smaller things that's also fixed in this patch:
- Ensure that `XfaLayerBuilder.render` always has a *consistent* return type, to prevent possible run-time failures in `PDFPageView`; PR 13908 follow-up.
- Change the order of the options in the `XfaLayerBuilder`-constructor to agree with the parameter order in the `DefaultXfaLayerFactory.createXfaLayerBuilder`-method.
- Add a missing `textHighlighterFactory`-option, in the JSDocs for the `PDFPageView`-class.
- A couple of small tweaks in the `TextLayerBuilder.render`-method: Re-use an existing Array rather than creating a new one, and replace an `if` with optional chaining instead.
*Please note:* For now XFA-support is currently disabled by default, similar to the regular viewer.
While running the unit-tests with some logging statements added to this code, I noticed that `PasswordException` was missing from the list of potential Errors that could be passed to the `wrapReason` function.
A lot of the code in this getter has existed ever since the initial PresentationMode-implementation was first added all the way back in PR 1938 (which is nine years ago now).
At this point in time however, there's now a simpler way to detect if a browser supports the FullScreen API, and we should thus be able to simplify this getter; please refer to https://developer.mozilla.org/en-US/docs/Web/API/Document/fullscreenEnabled#browser_compatibility
This command was added all the way back when basic CI-support was first introduced (using Travis at the time), however it's never really intended to be used e.g. for local development.
By having a `npm test`-command listed in the `package.json` file, there's a very real risk that someone unfamiliar with the code-base would only run that one and thus miss all the other (more important) test-suites[1].
Hence this patch which removes the `npm test`-command, and instead simply calls the relevant gulp-task[2] directly in the GitHub Actions configuration.
---
[1] Which consist of the unit-tests (run in browsers), the font-tests (potentially), the reference-tests, and the integration-tests.
[2] Which is also renamed slightly, to better fit its current usage.
Something that I *just* realized is that while PR 13854 fixed an issue as reported, it could still cause bugs in other similarly broken documents, since we'll not insert a matching endMarkedContent-operator in the operatorList.
Currently, in the `PartialEvaluator`, we only support Optional Content in Form-/XObjects. Hence this patch adds support for Image-/XObjects as well, which looks like a simple oversight in PR 12095 since the canvas-implementation already contains the necessary code to support this.
The Viewer API definitions do not compile because of missing imports and
anonymous objects are typed as `Object`. These issues were not caught
during CI because the test project was not compiling anything from the
Viewer API.
As an example of the first problem:
```
/**
* @implements MyInterface
*/
export class MyClass {
...
}
```
will generate a broken definition that doesn’t import MyInterface:
```
/**
* @implements MyInterface
*/
export class MyClass implements MyInterface {
...
}
```
This can be fixed by adding a typedef jsdoc to specify the import:
```
/** @typedef {import("./otherFile").MyInterface} MyInterface */
```
See https://github.com/jsdoc/jsdoc/issues/1537 and
https://github.com/microsoft/TypeScript/issues/22160 for more details.
As an example of the second problem:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} An Object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
*/
function getPageSizeInches({ view, userUnit, rotate }) {
...
}
```
generates the broken definition:
```
function getPageSizeInches({ view, userUnit, rotate }: Object) {
...
}
```
The jsdoc should specify the type of each nested property:
```
/**
* Gets the size of the specified page, converted from PDF units to inches.
* @param {Object} options An object containing the properties: {Array} `view`,
* {number} `userUnit`, and {number} `rotate`.
* @param {number[]} options.view
* @param {number} options.userUnit
* @param {number} options.rotate
*/
```
- Use `Node.TEXT_NODE` rather than a magical constant, in `TextHighlighter._convertMatches`, to improve readability. According to MDN, this has been supported since "forever": https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType#browser_compatibility
- Remove the `pageIdx`-property, on `TextLayerBuilder`-instances, since the re-factoring in PR 13908 meant that it's now unused.
- Remove the `matches`-property, on `TextLayerBuilder`-instances, since the re-factoring in PR 13908 meant that it's now unused.
Generalizing, and documenting, the `PDFHistory`-implementation as part of the web-interfaces doesn't seem strictly necessary, and in hindsight I'm not entirely sure why we need it, since:
- The `PDFHistory` implementation is/was written specifically for the default viewer use-case, which is why e.g. the `simpleviewer` component example isn't using it. (While it *could* be used there, it'd need to be manually created/initialized correctly.)
- There's only *one* `PDFHistory`-implementation present (and no other viewer-component will fail without it being available), as opposed to the other web-interfaces documented in this file.
- The `PDFHistory` implementation is not even usable with e.g. the `pageviewer` component example, since it (obviously) requires a complete viewer to work. (This is in contrast to e.g. `IPDFTextLayerFactory` and `IPDFAnnotationLayerFactory`.)
*This is done separately from the previous patch, to make it easier to revert these changes once they've been included in a couple of releases.*
Please note that because these two options are mutually exclusive, which is a large part of the reason for the previous patch, it's not guaranteed that the fallback-values will always be correct in every situation (but it's the best that we can do).
*This is a follow-up to PRs 13867 and 13899.*
This patch is tagged `api-minor` for the following reasons:
- It replaces the `renderInteractiveForms`/`includeAnnotationStorage`-options, in the `PDFPageProxy.render`-method, with the single `annotationMode`-option that controls which annotations are being rendered and how. Note that the old options were mutually exclusive, and setting both to `true` would result in undefined behaviour.
- For improved consistency in the API, the `annotationMode`-option will also work together with the `PDFPageProxy.getOperatorList`-method.
- It's now also possible to disable *all* annotation rendering in both the API and the Viewer, since the other changes meant that this could now be supported with a single added line on the worker-thread[1]; fixes 7282.
---
[1] Please note that in order to simplify the overall implementation, we'll purposely only support disabling of *all* annotations and that the option is being shared between the API and the Viewer. For any more "specialized" use-cases, where e.g. only some annotation-types are being rendered and/or the API and Viewer render different sets of annotations, that'll have to be handled in third-party implementations/forks of the PDF.js code-base.
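A hedged usage sketch of the replacement option described above (the `AnnotationMode` member names mirror the enable/forms/storage/disable behaviours):
```
import { AnnotationMode } from "pdfjs-dist";

// Render interactive form-annotations; use AnnotationMode.ENABLE_STORAGE to
// also include AnnotationStorage-values, or AnnotationMode.DISABLE to skip
// *all* annotation rendering.
page.render({
  canvasContext: ctx,
  viewport,
  annotationMode: AnnotationMode.ENABLE_FORMS,
});
```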
Moves the logic out of TextLayerBuilder to handle
highlighting matches into a new separate class `TextHighlighter`
that can be used with regular PDFs and XFA PDFs.
To mimic the current find functionality in XFA, two arrays
from the XFA rendering are created to get the text content
and map those to DOM nodes.
Fixes #13878
Current pdf_viewer definitions result in errors like the following when
trying to use them in a ts project:
[error] TypeScript error
node_modules/.pnpm/pdfjs-dist@2.10.377/node_modules/pdfjs-dist/web/pdf_viewer.d.ts:1:15
- error TS2691: An import path cannot end with a '.d.ts' extension.
Consider importing 'pdfjs-dist/types/web/pdf_viewer.component.js'
instead.
1 export * from "pdfjs-dist/types/web/pdf_viewer.component.d.ts";
Import/export statements in TypeScript should not include file extensions.
Currently a `TESTING = true` environment variable will *always* take precedence in the various build-tasks, and there's no way to explicitly disable it for a particular build. That's clearly an oversight on my part, however it's easy enough to fix this; sorry about breaking this!
The `loadAndEnablePDFBug` helper function, in `web/app.js`, can be simplified a little bit by making it `async`. Furthermore, given how `PDFBug` is being used, we can also (slightly) re-factor `PDFBug.init` such that the `PDFBug.enable`-call is done internally rather than having to handle that manually at the call-site.
(Finally, utilize `await` more in the `loadFakeWorker` helper function.)
This way there cannot be any *incorrect* cache hits, since Refs are guaranteed to be unique.
Please note that the reason for caching by Ref rather than doing something along the lines of the `localShadingPatternCache` (which uses a `Map` directly), is that TilingPatterns are streams and those cannot be cached on the `XRef`-instance (this way we avoid unnecessary parsing).
*This patch is very similar to the recently fixed `renderInteractiveForms`-options, see PR 13867.*
As far as I can tell, this *subtle* bug has existed ever since `AnnotationStorage`-support was first added in PR 12106 (a little over a year ago).
The value of the `includeAnnotationStorage`-option, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls where the form-data has been changed by the user in between, can thus *wrongly* return a cached operatorList.
In the viewer we're only using the `includeAnnotationStorage`-option when printing, which is probably why this has gone unnoticed for so long. Note that we, for performance reasons, don't cache printing-operatorLists in the API.
However, there's nothing stopping an API-user from using the `includeAnnotationStorage`-option during "normal" rendering, which could thus result in *subtle* (and difficult to understand) rendering bugs.
In order to handle this, we need to know if the `AnnotationStorage`-instance has been updated since the last `PDFPageProxy.render` call. The most "correct" solution would obviously be to create a hash of the `AnnotationStorage` contents, however that would require adding a bunch of code, complexity, and runtime overhead.
Given that operatorList caching in the API doesn't have to be perfect[1], but only has to avoid *false* cache-hits, we can simplify things significantly by only keeping track of the last time that the `AnnotationStorage`-data was modified.
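A minimal sketch of that modification tracking (the names are illustrative of the idea rather than the exact implementation):
```
class AnnotationStorage {
  constructor() {
    this._storage = new Map();
    this._modified = false;
  }

  setValue(key, value) {
    if (this._storage.get(key) !== value) {
      this._storage.set(key, value);
      this._modified = true; // Cheap marker; no content hashing required.
    }
  }

  get modified() {
    return this._modified;
  }

  resetModified() {
    this._modified = false; // Called once a render has consumed the data.
  }
}
```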
*Please note:* While working on this patch, I also noticed that the `renderInteractiveForms`- and `includeAnnotationStorage`-options in the `PDFPageProxy.render` method are mutually exclusive.[2]
Given that the various Annotation-related options in `PDFPageProxy.render` have been added at different times, this has unfortunately led to the current "messy" situation.[3]
---
[1] Note how we're already not caching operatorLists for pages with *huge* images, in order to save memory, hence there's no guarantee that operatorLists will always be cached.
[2] Setting both to `true` will result in undefined behaviour, since trying to insert `AnnotationStorage`-values into fields that are being excluded from the operatorList-building will obviously not work, which isn't at all clear from the documentation.
[3] My intention is to try and fix this in a follow-up PR, and I've got a WIP patch locally, however it will result in a number of API-observable changes.
At this point in time, all of the supported browsers (in the PDF.js project) have native `ReadableStream` implementations; see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence the polyfill is *only* necessary in Node.js environments now, and we shouldn't need to do any detailed feature detection either (since that was only done for the non-Chromium versions of the MS Edge browser).
Finally, we can slightly reduce the size of the Chromium-extension since the polyfill shouldn't be needed there either.
By not adding any additional non-`Dict` entries to the list of candidates for merging of sub-dictionaries, we can very slightly reduce the amount of parsing required by not having to *again* iterate through unmergeable data.
The original idea behind including the class name, when logging errors, was to improve things in the *hypothetical case* where `PDFViewer`- and `PDFSinglePageViewer`-instances would be used side-by-side.
Given that all of the relevant methods are synchronous this seems unlikely to really be necessary, and furthermore it's probably best to avoid using `this.constructor.name` since that's not guaranteed to do what you intend (we've seen repeated issues with minifiers mangling function/class names).
Once we're finally able to get rid of SystemJS, which is unfortunately still blocked on [bug 1247687](https://bugzilla.mozilla.org/show_bug.cgi?id=1247687), we might also want to clean-up (or even completely remove) the `BaseException` abstraction and simply extend `Error` directly instead.
At that point we'd need to (explicitly) set the `name` on each class anyway, so this patch is essentially preparing for future clean-up. Furthermore, after the `BaseException` abstraction was added there's been *multiple* issues filed about third-party minification breaking our code since `this.constructor.name` is not guaranteed to always do what you intended.
While hard-coding the strings indeed feels quite unfortunate, it's likely the "best" solution to avoid the problem described above.
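A sketch of the hard-coded name approach (simplified; note that `BaseException` purposely doesn't extend `Error` yet, per the SystemJS note above):
```
class BaseException {
  constructor(message, name) {
    this.message = message;
    this.name = name; // Hard-coded by each subclass; safe under minification.
  }
}

class PasswordException extends BaseException {
  constructor(message, code) {
    super(message, "PasswordException");
    this.code = code;
  }
}
```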
Rather than caching only the *last* `PDFPageProxy.getAnnotations` call, and having to handle the intent separately, we can instead implement the caching in exactly the same way as done in the `PDFPageProxy.{render, getOperatorList}` methods.
This patch removes the only remaining closure in the `src/display/api.js` file, utilizing a similar approach as used in lots of other parts of the code-base, which results in a small decrease in the size of the *built* `pdf.js` file.
Given that `PDFWorker` is exposed through the *public* API, this complicates things somewhat since there's a couple of worker-related properties that really should stay *private*. Initially, while working on PR 13813, I believed that we'd need support for private (static) class fields in order to get rid of this closure, however I've managed to come up with what's hopefully deemed an acceptable work-around here.
Furthermore, some helper functions were simply moved into the `PDFWorker` class as static methods, thus simplifying the overall implementation (e.g. we don't need to manually cache the Promise in the `PDFWorker._setupFakeWorkerGlobal`-method).
Finally, as part of this re-factoring a number of missing JSDoc-comments were added which *together* with the removal of the closure significantly improves the `gulp jsdoc` output for the `PDFWorker` class.
*Please note:* This patch is tagged with `api-minor` since it deprecates `PDFWorker.getWorkerSrc()` in favor of the shorter `PDFWorker.workerSrc`, with the fallback limited to `GENERIC` builds.
It shouldn't be necessary to assign these variables to the global scope (as far as I can tell), either explicitly with `window` or implicitly with `var`, and this way we don't need to disable the ESLint `no-undef` rule; fixes another small part of issue 13862.
*Please note:* I wasn't going to put additional work into this code after PR 13869, however these changes looked so simple that I figured trying to get rid of the few remaining "Code scanning alerts" wouldn't hurt.
However, this file would still very much benefit from additional clean-up and re-factoring work, since it's quite old and currently contains some dead code (commented out).
The value of the `renderInteractiveForms` parameter, as passed to the `PDFPageProxy.render` method, will (potentially) affect the size/content of the operatorList that's returned from the worker (for documents with forms).
Given that operatorLists will generally, unless they contain huge images, be cached in the API, repeated `PDFPageProxy.render` calls that *only* change the `renderInteractiveForms` parameter can thus return an incorrect operatorList.
As far as I can tell, this *subtle* bug has existed ever since `renderInteractiveForms`-support was first added in PR 7633 (which is almost five years ago).
With the previous patch, fixing this is now really simple by "encoding" the `renderInteractiveForms` parameter in the *internal* renderingIntent handling.
With the changes made in PR 13746 the *internal* renderingIntent handling became somewhat "messy", since we're now having to do string-matching in various spots in order to handle the "oplist"-intent correctly.
Hence this patch, which implements the idea from PR 13746 to convert the `intent`-strings, used in various API-methods, into an *internal* renderingIntent that's implemented using a bit-field instead. *Please note:* This part of the patch, in itself, does *not* change the public API (but see below).
This patch is tagged `api-minor` for the following reasons:
1. It changes the *default* value for the `intent` parameter, in the `PDFPageProxy.getAnnotations` method, to "display" in order to be consistent across the API.
2. In order to get *all* annotations, with the `PDFPageProxy.getAnnotations` method, you now need to explicitly set "any" as the `intent` parameter.
3. The `PDFPageProxy.getOperatorList` method will now also support the new "any" intent, to allow accessing the operatorList of all annotations (limited to those types that have one).
4. Finally, for consistency across the API, the `PDFPageProxy.render` method also supports the new "any" intent (although I'm not sure how useful that'll be).
Points 1 and 2 above are the significant, and thus breaking, changes in *default* behaviour here. However, unfortunately I cannot see a good way to improve the overall API while also keeping `PDFPageProxy.getAnnotations` unchanged.
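A hedged sketch of the bit-field conversion (the flag names and values are illustrative):
```
const RenderingIntentFlag = {
  ANY: 0x01,
  DISPLAY: 0x02,
  PRINT: 0x04,
  OPLIST: 0x100,
};

function toRenderingIntent(intent, isOpList = false) {
  let renderingIntent;
  switch (intent) {
    case "any":
      renderingIntent = RenderingIntentFlag.ANY;
      break;
    case "print":
      renderingIntent = RenderingIntentFlag.PRINT;
      break;
    default:
      renderingIntent = RenderingIntentFlag.DISPLAY;
  }
  if (isOpList) {
    renderingIntent += RenderingIntentFlag.OPLIST; // No string-matching needed.
  }
  return renderingIntent;
}
```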
Without this patch, when using `PDFPageView` directly[1] this CSS variable won't be updated and consequently things won't work as intended.
This is purposely implemented such that when a `PDFPageView`-instance is part of a viewer, we don't repeatedly set the CSS variable for every single page.
---
[1] See e.g. the "pageviewer" example in the `examples/components/` folder.
Given that issue 13862 tracks updating/modernizing the code, this patch purposely limits the scope of the changes. In particular, the following things are still left to address:
- The ESLint `no-undef` errors; for now the rule is simply disabled globally in this file.
- A couple of unused variables are commented out for now, but could perhaps just be removed.
The new command is a *variation* of the standard `gulp test` command and will run all unit/font/integration-tests just as normal, while *only* running ref-tests for XFA-documents to speed up development.
Given that we currently have (some) unit-tests for XFA-documents, and that we may also (in the future) want to add integration-tests, it thus makes sense to run all test-suites in my opinion.
*Please note:* Once this patch has landed, I'll submit a follow-up patch to https://github.com/mozilla/botio-files-pdfjs such that we can also run the new command on the bots.
Given how trivial the `isEOF` function is, we can simply inline the check at the various call-sites and remove the function (which ought to be ever so slightly more efficient as well).
Furthermore, this patch also changes the `EOF` primitive itself to a `Symbol` instead of an Object since that has the nice benefit of making it unclonable (thus preventing *accidentally* trying to send `EOF` from the worker-thread).
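A minimal sketch of the change:
```
// The EOF primitive is now a Symbol, which is unclonable and thus cannot
// accidentally be sent from the worker-thread.
const EOF = Symbol("EOF");

function readAllObjects(lexer) {
  let obj;
  while ((obj = lexer.getObj()) !== EOF) { // Inlined; no isEOF() helper.
    // Process `obj`...
  }
}
```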
Given that the GitHub Advanced Security workflow now covers everything that LGTM does, but generally faster and with better GitHub-integration, there's no longer much point in also running LGTM separately.
As a follow-up to this patch, we should also disable/remove the LGTM-integration from the PDF.js repository.
The `viewerCssTheme` option was not rendered because its entry in
`preferences_schema.json` did not have a `title`.
The order of keys in `preferences_schema.json` determines the order of the
rendered preferences in the options UI. Since `viewerCssTheme` affects the UI
very significantly, I have moved the option to the top.
The only purpose, according to the README and existing files, is to
parse an integer from those lines, so (\d+) is sufficient for that. This
avoids potential exponential backtracking as flagged by CodeQL. I have
compared the output of the script with and without these changes and the
resulting files are the same.
Even though the code as-is *should* be safe, given that we're using an Object with a `null` prototype, it cannot hurt to change this to a Map to prevent any issues (since we're parsing unknown and potentially unsafe data).
Overall I also think that these changes improve the `parseQueryString` call-sites, since we now have a proper way of checking for the existence of a particular key (and don't have to use `in` which stringifies the keys in the Object).
This patch also changes the default, when no `value` exists, from `null` to an empty string since the use of `decodeURIComponent` currently can modify the value in a somewhat surprising way (at least to me).
Note how `decodeURIComponent(null) === "null"` which is unlikely to be what you actually want, whereas `decodeURIComponent("") === ""` which seems much more helpful.
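A hedged sketch of the Map-based version (using `URLSearchParams` for brevity; the actual parsing details may differ):
```
function parseQueryString(query) {
  const params = new Map();
  for (const [key, value] of new URLSearchParams(query)) {
    // `value` is already "" (not null) when a key has no value.
    params.set(key.toLowerCase(), value);
  }
  return params;
}

const params = parseQueryString("page=2&zoom=auto");
if (params.has("page")) {          // Proper existence check, no `in` operator.
  console.log(params.get("page")); // "2"
}
```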
This allows us to get the quality checks that LGTM does into GitHub
Advanced Security. Since it no longer only runs security checks, the
workflow is also renamed to CodeQL to make this more explicit (and this
matches the documentation better).
This makes it consistent with the GitHub Advanced Security file and,
more importantly, ensures that all steps have a proper name for better
visibility.
While fixing issue 13794, I noticed that cancelling the `ReadableStream` returned by the `PDFPageProxy.streamTextContent`-method could lead to "Uncaught promise" messages in the console.[1]
Generally speaking, we don't really care about errors when *cancelling* a `ReadableStream` and it thus seems reasonable to simply suppress any output in those cases.
---
[1] Although, after that issue was fixed you'd now need to set the API-option `stopAtErrors = true` to actually trigger this.
This patch utilizes the same approach as used in lots of other parts of the code-base, which thus *slightly* reduces the size of this code.
By removing some of the (current) indirection, we can also simplify the JSDocs a little bit. Looking at the `gulp jsdoc` output, this actually seem to *improve* the documentation for this class.
Prior to PR 13042, when scripting wasn't really possible to use outside of the full viewer, the `enableScripting` option made sense.
However, at this point in time having to both pass in a `PDFScriptingManager`-instance *and* set the `enableScripting`-boolean when creating a `BaseViewer`-instance feels redundant and (mostly) annoying. Hence this patch, which removes the *separate* boolean and always enables scripting when `scriptingManager` is provided.
The relevant "viewer component" examples are also updated (with a comment), but in such a way that scripting support won't just break when used with the current PDF.js releases.
The PDF in bug 1721949 uses many unique pattern objects
that reference the same shading many times. This caused
a new canvas pattern to be created and cached many times,
driving up memory use.
To fix this, I've changed the cache in the worker to key off the
shading object and instead send the shading and matrix
separately. While that worked well to fix the above bug,
there could be PDFs that use many shadings, which could
cause memory issues, so I've also added an LRU cache
on the main thread for canvas patterns. This should prevent
memory use from getting too high.
- All the scale factors for the substitution font were wrong because of different glyph positions between Liberation and the other ones:
  - regenerate all the factors
- Text may contain Polish characters, for example, and in this case the glyph widths were wrong:
  - treat the substitution font as a composite one
  - add a glyph-index-to-unicode map for Liberation in order to generate the width array for the CID font
- In order to better compute text field sizes, use the line height with no gaps (and consequently guessed heights for text are slightly better in general).
- Fix the default background color in fields.
Given that we've over time been reducing the number of `compatibilityParams` in use, there's now few enough left that I think it makes sense to simply inline them directly in the `web/app_options.js` file.
Note that we recently inlined/removed the separate `src/display/api_compatibility.js` file, see PR 13525, and that it (in my opinion) thus makes sense to do the same in the `web/`-folder. This patch will also slightly reduce the size of *built* `web/viewer.js` file, which cannot hurt.
For code that's part of the core library, rather than e.g. the `web/`-folder, we should always be careful about *directly* accessing any DOM methods.
The `navigator` is one such structure, which shouldn't be assumed to always be available and we should thus check that it's actually present.[1]
Hence this patch re-factors the `navigator.platform` access, in the `AnnotationLayer`-code, to ensure that it's generally safe. Furthermore, to reduce unnecessary repeated string-matching to determine the current platform, we're now using a shadowed getter which is evaluated only once instead (at first access).
---
[1] Note e.g. the `isSyncFontLoadingSupported` getter, in the `src/display/font_loader.js` file.
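A sketch of that shadowed getter, based on the existing `shadow` helper (the import path and exact property layout are assumptions):
```
import { shadow } from "../shared/util.js";

class AnnotationElement {
  // Evaluated only once, at first access, and safe when `navigator` is
  // unavailable (e.g. in Node.js environments).
  static get platform() {
    const platform =
      typeof navigator !== "undefined" ? navigator.platform : "";
    return shadow(this, "platform", {
      isWin: platform.includes("Win"),
      isMac: platform.includes("Mac"),
    });
  }
}
```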
- an Image element was created and attached to its parent, but the $globalData property was not set and that led to an error;
- the pdf in bug 1722029 has 27 rendered rows (checked in Acrobat) when only one was displayed: this patch fixes some binding issues around the occur element.
This patch makes use of the existing `ignoreErrors` option, thus allowing a page to continue parsing/rendering even if (some of) its sub-streams are corrupt. Obviously this may cause *part* of a page to be broken/missing, however it should be better than (potentially) rendering nothing.
Also, to the best of my knowledge, this is the first bug of its kind that we've encountered.
To avoid having to pass in a bunch of, for a `BaseStream`-instance, mostly unrelated parameters when initializing a `StreamsSequenceStream`-instance, I settled on utilizing a callback function instead to allow conditional Error-suppression.
Note that the `StreamsSequenceStream`-class is a *special* stream-implementation that we only use when the `/Contents`-entry, in the `/Page`-dictionary, consists of an Array with streams.
There's no good reason, as far as I can tell, to explicitly define a bunch of methods to be `undefined`, which the current unconditional "copying" of methods will do.
Note that ~23 percent of the `OPS` entries don't, for various reasons, have an associated method on `CanvasGraphics.prototype`.
For e.g. the `gulp mozcentral` command, the *built* `pdf.js` file decreases from `304 607` to `301 295` bytes with this patch. The improvement comes mostly from having less overall indentation in the code.
Mozilla takes the security of our software seriously. If you believe you have found a security vulnerability in PDF.js, please report it to us as described below.
## Reporting security vulnerabilities
**Please don't report security vulnerabilities through public GitHub issues.**
Instead, please report security vulnerabilities in [Bugzilla](https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&component=PDF%20Viewer&groups=firefox-core-security) and make sure that the checkbox in the "Security" section is checked so the required access controls are automatically configured.
The Mozilla security team will process the bug as described in [Mozilla's security bugs policy](https://www.mozilla.org/en-US/about/governance/policies/security-group/bugs).
Feel free to stop by our [Matrix room](https://chat.mozilla.org/#/room/#pdfjs:mozilla.org) for questions or guidance.
### Online demo
Please note that the "Modern browsers" version assumes native support for the
latest JavaScript features; please also see [this wiki page](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support).
+ Modern browsers: https://mozilla.github.io/pdf.js/web/viewer.html
PDF.js is built into version 19+ of Firefox.
+ The official extension for Chrome can be installed from the [Chrome Web Store](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm).
*This extension is maintained by [@Rob--W](https://github.com/Rob--W).*
+ Build Your Own - Get the code as explained below and issue `npx gulp chromium`. Then open
Chrome, go to `Tools > Extension` and load the (unpackaged) extension from the
directory `build/chromium`.
To get a local copy of the current code, clone it using git:
$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js
Next, install Node.js via the [official package](https://nodejs.org) or via
[nvm](https://github.com/creationix/nvm). If everything worked out, install
all dependencies for PDF.js:
$ npm install
Finally, you need to start a local web server as some browsers do not allow opening
PDF files using a `file://` URL. Run:
$ npx gulp server
and then you can open:
+ http://localhost:8888/web/viewer.html
Please keep in mind that this assumes the latest version of Mozilla Firefox; refer to [Building PDF.js](https://github.com/mozilla/pdf.js/blob/master/README.md#building-pdfjs) for non-development usage of the PDF.js library.
It is also possible to view all test PDF files on the right side by opening:
In order to bundle all `src/` files into two production scripts and build the generic
viewer, run:
$ npx gulp generic
If you need to support older browsers, run:
$ npx gulp generic-legacy
This will generate `pdf.js` and `pdf.worker.js` in the `build/generic/build/` directory (respectively `build/generic-legacy/build/`).
Both scripts are needed but only `pdf.js` needs to be included since `pdf.worker.js` will
be loaded by `pdf.js`. The PDF.js files are large and should be minified for production.
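For example (a hedged sketch: the file paths and the `pdfjsLib` global are assumptions about how the generic build is deployed):

```
// Only pdf.js is referenced directly; the library fetches pdf.worker.js
// itself once it knows where the file lives.
pdfjsLib.GlobalWorkerOptions.workerSrc = "build/generic/build/pdf.worker.js";

pdfjsLib.getDocument("example.pdf").promise.then((pdf) => {
  console.log(`Loaded a document with ${pdf.numPages} page(s).`);
});
```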
## Using PDF.js in a web application
To use PDF.js in a web application you can choose to use a pre-built version of the library
or to build it from source. We supply pre-built versions for usage with NPM under
the `pdfjs-dist` name. For more information and examples please refer to the
[wiki page](https://github.com/mozilla/pdf.js/wiki/Setup-pdf.js-in-a-website) on this subject.
You can play with the PDF.js API directly from your browser using the live demos.
More examples can be found in the [examples folder](https://github.com/mozilla/pdf.js/tree/master/examples/). Some of them are using the pdfjs-dist package, which can be built and installed in this repo directory via `npx gulp dist-install` command.
For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck.
The generated API documentation, from the inline comments in [api.js](https://github.com/mozilla/pdf.js/blob/master/src/display/api.js), is available below.
<iframe src="draft/index.html" title="PDF.js API documentation"></iframe>
Before downloading PDF.js please take a moment to understand the different layers.
## Download
Please refer to [this wiki page](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support) for information about supported browsers.
<divclass="row">
<divclass="col-md-4">
<h3>Prebuilt (modern browsers)</h3>
<p>
Includes the generic build of PDF.js and the viewer.
└── package.json - package definition and dependencies
```
## Trying the Viewer
With the prebuilt or source version, open `web/viewer.html` in a browser and the test pdf should load. Note: the worker is not enabled for file:// urls, so use a server. If you're using the source build and have node, you can run `npx gulp server`.