125 Commits

Author SHA1 Message Date
Jonas Jenwald
bd05b255fa [api-major] Apply the userUnit using CSS, to fix the text/annotation layers (bug 1947248)
Rather than modifying the "raw" dimensions of the page, we'll instead apply the `userUnit` as an *additional* scale-factor via CSS.

*Please note:* It's not clear to me if this solution is fully correct either, or if there's other problems with it, but it at least *appears* to work.

---

With these changes, the following CSS variables are now assumed to be available/set as necessary: `--total-scale-factor`, `--scale-factor`, `--user-unit`, `--scale-round-x`, and `--scale-round-y`.
2025-02-11 14:36:06 +01:00
Jonas Jenwald
c8be02f2a7 [api-minor] Simplify the TextLayer.#getAscent fallback (PR 12896 follow-up)
At the time of PR 12896 the `fontBoundingBox{Ascent, Descent}` properties were not yet available by default in Firefox, however that's no longer the case since Firefox 116; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1801198.

Hence this patch which replaces the "full" fallback with a warning and uses the `ascent`/`descent` values from the fonts in the PDF document (as we did previously). Obviously the TextLayer won't look as good in that case, but it's a simpler and shorter solution.
2025-02-01 10:11:57 +01:00
Calixte Denizet
a45e4a391a Use Calibri and Lucida Console, when it's possible, in place of sans-serif and monospaced (bug 1922063)
A recent change in Firefox introduced too large a difference between the text widths computed using a Canvas
and the ones computed by the text layout engine when rendering the text layer. Consequently,
text selection can be bad on Windows with some fonts such as Arial or Consolas.
This patch is a workaround which tries to use, in the first place, fonts that don't have the problem.
2024-10-05 20:45:25 +02:00
Jonas Jenwald
5b3d3c7dd9 Ensure that textLayers can be rendered in parallel, without interfering with each other
Note that the textContent is returned in "chunks" from the API, through the use of `ReadableStream`s, and on the main-thread we're (normally) using just one temporary canvas in order to measure the size of the textLayer `span`s; see the [`#layout`](5b4c2fe1a8/src/display/text_layer.js (L396-L428)) method.

*Order of events, for parallel textLayer rendering:*
 1. Call [`render`](5b4c2fe1a8/src/display/text_layer.js (L155-L177)) of the textLayer for page A.
 2. Immediately call `render` of the textLayer for page B.
 3. The first text-chunk for page A arrives, and it's parsed and laid out, which means updating the cached [fontSize/fontFamily](5b4c2fe1a8/src/display/text_layer.js (L409-L413)) for the textLayer of page A.
 4. The first text-chunk for page B arrives, which means updating the cached fontSize/fontFamily *for the textLayer of page B* since this data is unique to each `TextLayer`-instance.
 5. The second text-chunk for page A arrives, and we don't update the canvas-font since the cached fontSize/fontFamily still apply from step 3 above.

Where this potentially breaks down is between the last steps, since we're using just one temporary canvas for all measurements but have *individual* fontSize/fontFamily caches for each textLayer.
Hence it's possible that the canvas-font has actually changed, despite the cached values suggesting otherwise, and to address this we instead cache the fontSize/fontFamily globally through a new (static) helper method.

*Note:* Includes a basic unit-test, using dummy text-content, which fails on `master` and passes with this patch.

Finally, pun intended, ensure that temporary textLayer-data is cleared *before* the `render`-promise resolves to avoid any intermittent problems in the unit-tests.
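
The caching problem described above can be illustrated with a minimal sketch. It assumes a plain object standing in for the single temporary canvas context; `sharedCtx`, `globalFontCache` and `ensureCtxFont` are hypothetical names used only for illustration, not the helper added by this patch.

```javascript
// Hypothetical stand-in for the one temporary canvas context that is
// shared by every text layer when measuring spans.
const sharedCtx = { font: "", fontAssignments: 0 };

// A single cache that belongs to the shared context, instead of one
// cache per text layer (the per-layer caches are what could go stale).
const globalFontCache = { size: null, family: null };

function ensureCtxFont(ctx, size, family) {
  if (size === globalFontCache.size && family === globalFontCache.family) {
    return; // The context already uses this font, nothing to update.
  }
  ctx.font = `${size}px ${family}`;
  ctx.fontAssignments++;
  globalFontCache.size = size;
  globalFontCache.family = family;
}

// Interleaved chunks from two pages, following the order of events above.
ensureCtxFont(sharedCtx, 12, "serif"); // page A, first chunk
ensureCtxFont(sharedCtx, 20, "sans-serif"); // page B, first chunk
ensureCtxFont(sharedCtx, 12, "serif"); // page A, second chunk

console.log(sharedCtx.font); // "12px serif"
```

With per-layer caches, the third call would have been skipped because page A's own cache still said "12px serif", leaving the context on page B's font.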
2024-09-11 15:28:51 +02:00
razh
665fff020e Fix ensureMinFontSizeComputed calculation if <body> is a flex container
Given:

```css
html,
body {
  height: 100%;
}

body {
  display: flex;
}
```

The `<div>` appended to the `<body>` will take up the full height of the
viewport due to the implicit `align-items: stretch` of flex containers.

This results in an incorrect computed `minFontSize` value.
2024-07-10 08:41:09 -04:00
Jonas Jenwald
f3d177e3e4 [api-minor] Remove the deprecated renderTextLayer and updateTextLayer functions (PR 18104 follow-up) 2024-06-30 15:16:00 +02:00
Nicolò Ribaudo
5b29e935e1 Override the minimum font size when rendering the text layer
Browsers have an accessibility option that allows users to enforce
a minimum font size for all text rendered in the page, regardless
of what the font-size CSS property says. For example, it can be
found in Firefox under `font.minimum-size.x-western`.

When rendering the <span>s in the text layer, this causes the
text layer to not be aligned anymore with the underlying canvas.
While normally accessibility features should not be worked around,
in this case it is *not* improving accessibility:
- the text is transparent, so making it bigger doesn't make it more
  readable
- the selection UX for users with that accessibility option enabled
  is worse than for other users (it's basically unusable).

While there is technically no way to ignore that minimum font size,
this commit does it by multiplying all the `font-size`s in the text
layer by minFontSize, and then scaling all the `<span>`s down by
1/minFontSize.
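
The arithmetic of that compensation can be sketched as follows. This is only an illustration of the multiply-then-scale idea; `compensateMinFontSize` and its return shape are hypothetical and not taken from the patch.

```javascript
// Hypothetical helper: given the font size a span should have and the
// browser-enforced minimum, return the inflated font size together with
// the scale that brings the span back to its intended size.
function compensateMinFontSize(fontSize, minFontSize) {
  const inflated = fontSize * minFontSize;
  const scale = 1 / minFontSize;
  return {
    fontSize: `${inflated}px`,
    transform: `scale(${scale})`,
    effectiveSize: inflated * scale,
  };
}

// A 4px span with a 16px enforced minimum: the browser would otherwise
// render it at 16px, four times too large.
const compensated = compensateMinFontSize(4, 16);
console.log(compensated.fontSize); // "64px"
console.log(compensated.transform); // "scale(0.0625)"
console.log(compensated.effectiveSize); // 4
```

For any intended size of at least 1px the inflated size is no smaller than the minimum, so the browser no longer needs to clamp it.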
2024-06-25 14:58:08 +02:00
Calixte Denizet
ff6180a4c9 Add an option to enable/disable hardware acceleration (bug 1902012) 2024-06-12 18:41:07 +02:00
Jonas Jenwald
f2e7eee00e Don't register a pending TextLayer until render is invoked (PR 18104 follow-up)
After the re-factoring in PR 18104 there's now a *theoretical* risk that a pending `TextLayer` is never removed, which we can avoid by not registering it until `render` is invoked.
Note that this doesn't affect the viewer or tests, but if a third-party user calls `new TextLayer(...)` without a following call of either the `render`- or `cancel`-method we'd block global clean-up without this patch.
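
A minimal sketch of the idea, assuming a simple `Set` as the registry of pending layers; `SketchTextLayer`, `pendingLayers` and `cleanupGlobalData` are hypothetical names, and rendering is modelled synchronously for brevity.

```javascript
// Hypothetical registry of text layers whose rendering has started.
const pendingLayers = new Set();

class SketchTextLayer {
  // The constructor deliberately leaves the registry untouched.
  render() {
    pendingLayers.add(this); // Registered only once rendering begins.
    return this;
  }

  finish() {
    pendingLayers.delete(this);
  }

  cancel() {
    pendingLayers.delete(this);
  }
}

// Global clean-up is skipped while any layer is still pending.
function cleanupGlobalData() {
  if (pendingLayers.size > 0) {
    return false;
  }
  // ... shared caches would be released here ...
  return true;
}

const neverRendered = new SketchTextLayer(); // Neither render nor cancel is called.
console.log(cleanupGlobalData()); // true

const active = new SketchTextLayer().render();
console.log(cleanupGlobalData()); // false
active.finish();
console.log(cleanupGlobalData()); // true
```

Had the constructor registered the instance, `neverRendered` would have blocked clean-up indefinitely.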
2024-05-26 18:38:40 +02:00
Aditi
9edca0a5ed Add lang attribute to canvas element
Fixes issue #16843.
In certain cases, the text layer was misaligned
due to a difference between the `lang` attribute
of the viewer and the canvas. This commit addresses
the problem by adding the `lang` attribute to the canvas.

The issue was caused because PDF.js uses serif/sans-serif
fonts to generate the text layer and relies on system fonts.
The difference in the `lang` attribute led to different fonts
being picked, causing the misalignment.
2024-05-21 19:41:24 +05:30
Jonas Jenwald
15b5808eee [api-minor] Re-factor the basic textLayer-functionality
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.

This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.

Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
2024-05-17 14:20:20 +02:00
Jonas Jenwald
d8e0fca609 Don't invoke cleanupTextLayer when there are pending textLayers
*Please note:* This doesn't really affect the viewer, but may affect the library API if multiple PDF documents are opened in parallel.

Since we clean-up "global" textLayer-data when destroying a PDF document, this means that other active PDFs could potentially break by invoking `cleanupTextLayer` unconditionally. Note that textLayer rendering is an asynchronous task, and we thus need to ensure those are all finished before running clean-up.
2024-05-17 08:52:10 +02:00
Jonas Jenwald
d5f3829f91 Actually disable TextLayerRenderTask.prototype.#processItems when MAX_TEXT_DIVS_TO_RENDER is reached (PR 18089 follow-up)
I broke this accidentally in PR 18089, sorry about that!
Note that since `#processItems` is private we can no longer just "replace" the method as was done in PR 18052.
2024-05-16 11:48:11 +02:00
Jonas Jenwald
036fd11ad7 Improve the TextLayerRenderTask implementation
- Change all possible semi-private methods into properly private ones. Note that this code is old enough to predate standard classes.

 - Move the `appendText` helper function into `TextLayerRenderTask`, as a private method, to avoid having to manually pass in the scope.

 - Simplify `#layoutText` by directly passing in all necessary data. This is possible after the changes in PR 18052.
2024-05-14 14:10:17 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
8d86e18a32 Restore the MAX_TEXT_DIVS_TO_RENDER limit in the textLayer
This limit is currently completely non-functional, since the check happens *after* the entire textLayer has been parsed and appended to the DOM. It seems that this has been *accidentally* broken ever since the introduction of `ReadableStream` support.
The reason that this hasn't caused noticeable textLayer-related performance issues in practice is probably because we nowadays manage to coalesce the textLayer into fewer overall DOM elements, whereas years ago many PDF documents ended up with one DOM element *per* glyph.

By moving this check, and thus restoring the functionality, we're also able to remove the `render` helper function and simplify the code.
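
The difference between checking the limit after the fact and checking it while items are processed can be sketched like this. `processChunks` and the tiny limit are hypothetical and chosen only for illustration; the real constant and method are not reproduced here.

```javascript
// Hypothetical sketch: enforce the limit while the chunks are processed,
// so that no additional elements are created once it has been reached.
function processChunks(chunks, maxDivs) {
  const divs = [];
  for (const chunk of chunks) {
    for (const str of chunk) {
      if (divs.length >= maxDivs) {
        return { divs, truncated: true }; // Stop before creating more.
      }
      divs.push(str);
    }
  }
  return { divs, truncated: false };
}

const limited = processChunks([["a", "b"], ["c", "d"], ["e"]], 3);
console.log(limited.divs); // ["a", "b", "c"]
console.log(limited.truncated); // true
```

A check that only runs once all chunks have arrived would see five entries here, by which point all of them already exist.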
2024-05-07 13:04:00 +02:00
Jonas Jenwald
30840e411e Ensure that the textLayer styleCache is always cleared, even on failure
By also moving it to the `TextLayerRenderTask`-instance, we can avoid a bit of manual parameter passing.
2024-05-07 13:04:00 +02:00
Jonas Jenwald
049848ba00 Unify the ReadableStream and TextContent code-paths in src/display/text_layer.js
The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.
2024-05-07 13:03:57 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
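
For illustration, a call-site that tracks the `settled`-state manually might look like the sketch below. The local `withResolvers` fallback and the `finish` helper are hypothetical additions so that the snippet also runs in environments without native support; they are not part of this patch.

```javascript
// Use the native function when available; otherwise fall back to an
// equivalent hand-written helper (an assumption made for this sketch).
function withResolvers() {
  if (typeof Promise.withResolvers === "function") {
    return Promise.withResolvers();
  }
  let resolve, reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

const { promise: donePromise, resolve: resolveDone } = withResolvers();

// The `settled`-state is tracked manually at the call-site.
let settled = false;
function finish(value) {
  settled = true;
  resolveDone(value);
}

finish("first");
finish("second"); // Harmless: the Promise keeps its first value.

donePromise.then(value => console.log(value)); // logs "first"
```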
2024-04-01 11:42:37 +02:00
Calixte Denizet
f84f48b5d0 Avoid to have the text layer mismatching the rendered text with mismatching locales (bug 1869001)
The system locale (used in OffscreenCanvas) can be different from the one guessed by Fluent,
consequently, in order to avoid any mismatch, we just use an attached canvas element.
The original issue can easily be reproduced locally by adding a lang="ja" in viewer.html
(or another language for Japanese users).
2024-01-04 19:20:20 +01:00
Calixte Denizet
7851c0da8d [Debugger] Add some info about substitution font
When pdfBug is true, the substitution font is used in the text layer, so that
the devtools can show which font is really being used.
And to be sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
2023-10-09 12:06:33 +02:00
Jonas Jenwald
f87ec67ab1 [api-major] Remove various deprecated functionality and options 2023-09-23 17:44:09 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00
Jonas Jenwald
4bf8e5c13d Tweak the --scale-factor CSS-variable warning threshold (issue 16254)
This is apparently needed to account for the rounding used in Chromium-browsers, such that the warning message isn't displayed unnecessarily.
2023-04-06 13:11:12 +02:00
Jonas Jenwald
8bf5e96af9 Only warn about missing --scale-factor CSS-variable for visible textLayers (PR 16162 follow-up)
This is something that I completely overlooked in PR 16162, which in some cases causes the default viewer to incorrectly print warnings.
This can be reproduced with the PAGE scrolling-mode, and/or the PresentationMode, and this patch simply works around it by checking the visibility as well (since the warning is a best-effort solution anyway).
2023-03-20 12:51:26 +01:00
Jonas Jenwald
0e54a3c37a Warn about missing/incorrect --scale-factor CSS-variable in renderTextLayer (issue 16139)
Unfortunately I don't believe that we can simply add a default `--scale-factor` CSS-variable to the `container`-element, since that might not be entirely appropriate/correct in all cases.[1]
However, we can at least print a console-error to hopefully make this situation more apparent to users. (This is purposely not using the `warn` helper-function, since those messages can be disabled.)

---
[1] One example is in our reference-tests, where we don't need to add it to the `container`-element itself.
2023-03-16 11:53:12 +01:00
Jonas Jenwald
5075d0495b Use OffscreenCanvas as intended for all code-paths in src/display/text_layer.js (PR 15722 follow-up)
Currently some `getCtx` calls will have `isOffscreenCanvasSupported === undefined` set, meaning that `OffscreenCanvas` isn't being used as intended, since no `TextLayerRenderTask._isOffscreenCanvasSupported` property exists.

*Please note:* This patch is written using the GitHub UI, since I'm currently without a dev machine, so hopefully it works correctly.
2023-02-24 11:29:58 +01:00
Jonas Jenwald
cafdc48147 [api-minor] Add a new PageViewport-getter to access the original, un-scaled, viewport dimensions
While reviewing recent patches, I couldn't help but notice that we now have a lot of call-sites that manually access the `PageViewport.viewBox`-property.
Rather than repeating that verbatim all over the code-base, this patch adds a lazily computed and cached getter for this data instead.
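
A lazily computed and cached getter of this kind can be sketched as follows. The class name, the getter name `unscaledDims`, the property names and the assumed `[x0, y0, x1, y1]` layout of `viewBox` are all hypothetical, chosen only to illustrate the pattern.

```javascript
class SketchViewport {
  constructor(viewBox) {
    this.viewBox = viewBox; // Assumed layout: [x0, y0, x1, y1].
    this.computeCount = 0; // Only here to make the caching observable.
  }

  get unscaledDims() {
    this.computeCount++;
    const [x0, y0, x1, y1] = this.viewBox;
    const dims = {
      pageWidth: x1 - x0,
      pageHeight: y1 - y0,
      pageX: x0,
      pageY: y0,
    };
    // Replace the getter with a plain data property on the instance, so
    // that subsequent accesses return the cached object directly.
    Object.defineProperty(this, "unscaledDims", {
      value: dims,
      enumerable: true,
      configurable: true,
      writable: false,
    });
    return dims;
  }
}

const viewport = new SketchViewport([0, 0, 612, 792]);
console.log(viewport.unscaledDims.pageWidth); // 612
console.log(viewport.unscaledDims === viewport.unscaledDims); // true
console.log(viewport.computeCount); // 1
```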
2022-12-11 18:37:35 +01:00
Calixte Denizet
a989b5a879 Set the dimensions of the various layers at their creation
- Use a unique helper function in display/display_utils.js;
- Move those dimensions to the CSS side.
2022-12-10 14:35:06 +01:00
Jonas Jenwald
0274245e90 Remove the unused TextLayerRenderTask._renderingDone property (PR 15259 follow-up)
This is yet another property that I forgot to remove in PR 15259.
2022-12-05 11:49:14 +01:00
Jonas Jenwald
fe8fded23b [api-minor] Combine the textContent/textContentStream parameters
Rather than handling these parameters separately, which is a left-over from back when streaming of textContent was originally added, we can simply pass either data directly to the `TextLayer` and let it handle things accordingly.

Also, improves a few JSDoc comments and `typedef`-imports.
2022-12-04 21:22:14 +01:00
Calixte Denizet
eed9bf71c5 Refactor the text layer code in order to avoid to recompute it on each draw
The idea is just to reuse what we got on the first draw.
Now, we only update the scaleX of the different spans and the other values
are dependent on --scale-factor.
Move some properties into the CSS in order to avoid any updates in JS.
2022-12-01 18:42:43 +01:00
Jonas Jenwald
7c25b1b455 [api-minor] Remove the TextLayer timeout parameter (PR 15742 follow-up)
The deprecation is included in the current release, i.e. version `3.1.81`, and given the edge-case nature of this option I really don't think that we need to keep it deprecated for multiple releases.
2022-11-29 19:57:38 +01:00
Jonas Jenwald
b3e161c328 [api-minor] Deprecate the TextLayer timeout parameter
This has never really been used anywhere within the PDF.js library[1], and when streaming of textContent was introduced this parameter was effectively made redundant.
Note that when streaming of textContent is used, all text-layout has already happened by the time that this `timeout`-functionality is actually invoked (thus making it pointless).
While the `timeout`-functionality may still "work" when the textContent is provided upfront, although it's never been used/tested, streaming will generally perform better (in e.g. a viewer setting).

*Please note:* While unrelated here, also removes a now unused property that I forgot in PR 15259.

---
[1] At least not since the code was moved into its current file, which happened in PR 6619 and landed seven years ago.
2022-11-24 23:08:39 +01:00
Jonas Jenwald
1e7274e9c6 [api-minor] Move the handling of unbalanced markedContent to the worker-thread (PR 15630 follow-up) 2022-10-27 11:14:54 +02:00
Jonas Jenwald
980acddbfa Prevent textLayer errors in documents with unbalanced beginMarkedContent/endMarkedContent operators (issue 15629) 2022-10-26 18:35:48 +02:00
Jonas Jenwald
60f6272ed9 Use more for...of loops in the code-base
Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.
2022-10-03 13:08:38 +02:00
Jonas Jenwald
571ce13dd6 [api-major] Remove the enhanceTextSelection functionality (PR 15145 follow-up)
For the `gulp mozcentral` command, this reduces the size of the *built* `pdf.js` file by `> 10` kB.
2022-08-28 15:04:47 +02:00
Calixte Denizet
51c8e2f3ab Fix text selection with hdpi screens (#15229) 2022-07-28 19:44:13 +02:00
Jonas Jenwald
815c28da0e [api-minor] Deprecate the enhanceTextSelection functionality 2022-07-07 16:15:31 +02:00
Jonas Jenwald
c21f4faaf8 Reduce unnecessary usage of Array.prototype.concat()
There are obviously cases where using `concat` makes perfect sense, since that method doesn't change any of the existing Arrays; see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/concat

However, in a few cases throughout the code-base that's not an issue and using `concat` only leads to unnecessary intermediate allocations. With modern JavaScript we can thus replace those with a combination of `push` and spread-syntax, which wasn't originally possible when the code was written.
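
As a small illustration of the difference (the array names are made up for this example):

```javascript
const existing = ["a", "b"];
const additional = ["c", "d"];

// `concat` leaves both inputs untouched and allocates a new Array.
const combined = existing.concat(additional);
console.log(combined); // ["a", "b", "c", "d"]
console.log(existing.length); // 2

// `push` with spread-syntax appends in place, with no intermediate Array.
existing.push(...additional);
console.log(existing); // ["a", "b", "c", "d"]
```

The in-place form is only appropriate when no other code relies on `existing` staying unchanged.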
2022-06-19 13:40:52 +02:00
Jonas Jenwald
8129815538 Enable the unicorn/prefer-dom-node-append ESLint plugin rule
This rule will help enforce slightly shorter code, especially since you can insert multiple elements at once, and according to MDN `Element.append()` is available in all browsers that we currently support.

Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/API/Element/append
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-dom-node-append.md
2022-06-12 13:07:03 +02:00
Tim van der Meij
a57a4bc6c2 Merge pull request #15018 from Snuffleupagus/issue-15016
Expose `TextLayerRenderTask` in the TypeScript definitions (issue 15016, PR 14013 follow-up)
2022-06-10 22:18:35 +02:00
Tim van der Meij
f0b5aee6b8 Merge pull request #15014 from Snuffleupagus/prefer-at
Enable the `unicorn/prefer-at` ESLint plugin rule (PR 15008 follow-up)
2022-06-10 22:12:35 +02:00
Jonas Jenwald
e046b811b7 Expose TextLayerRenderTask in the TypeScript definitions (issue 15016, PR 14013 follow-up)
While `TextLayerRenderTask` apparently makes sense in TypeScript environments, given that it's being returned by the `renderTextLayer`-function in the API, we really don't want to extend the *public* API by simply exporting the class directly in `src/pdf.js` since it should never be called/initialized manually.
Hence we follow the same pattern as in PR 14013, and add some very basic unit-tests to ensure that `renderTextLayer` always returns a `TextLayerRenderTask`-instance as expected.
2022-06-10 22:12:32 +02:00
jerry1100
b716e82d18 Extend TextLayerRenderParameters.container type to include HTMLElement.
In PR #14717, the type was changed from a HTMLElement to a DocumentFragment.
This broke TypeScript projects that use a HTMLElement container.

To remedy this, we extend the type of container to also include HTMLElement.
2022-06-10 06:50:47 -07:00
Jonas Jenwald
9ac4536693 Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)
Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md
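
For example, the kind of access that this rule rewrites (with a made-up array):

```javascript
const spans = ["first", "middle", "last"];

// Index arithmetic, as flagged by the rule.
const viaIndex = spans[spans.length - 1];

// The equivalent using `Array.prototype.at`, where negative indices
// count from the end of the Array.
const viaAt = spans.at(-1);

console.log(viaIndex === viaAt); // true
console.log([].at(-1)); // undefined
```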
2022-06-09 21:21:19 +02:00
Jonas Jenwald
af5789125f Try to remove the mozOpaque canvas-property (PR 6551 follow-up)
According to MDN, see https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/mozOpaque, the `mozOpaque` canvas-property is not only non-standard (obviously) but it's also been deprecated.
Instead it's recommended to use `alpha = false` when getting the canvas-context, see https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/getContext#contextattributes, which all of our affected code is already doing.
2022-05-09 13:03:08 +02:00
Jonas Jenwald
7f0589c74a Change the type of the container property, in the TextLayerRenderParameters typedef (issue 14716)
Given that the textLayer-code has been using a `DocumentFragment` ever since PR 3356 (back in 2013), simply updating the type of the `container` property should be fine.
This patch also tries to, ever so slightly, improve the grammar of a couple of other properties in the typedef.
2022-03-24 22:42:37 +01:00
Calixte Denizet
61d1063276 Fix issues in text selection
- PR #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues.
  - the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn;
    - no spaces are "drawn": a space just moves the cursor and isn't added to the chunk;
    - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
  - to distinguish between real spaces and tracking ones, we used a factor of the space width (from the font)
    - it was a pretty good idea in general but it fails with some fonts where space was too big:
    - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).
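
The font-size based threshold mentioned in the last point can be sketched as below. `isTrackingSpace`, its parameters and the sample values are hypothetical; only the `0.1 * fontSize` comparison comes from the description above.

```javascript
const TRACKING_SPACE_FACTOR = 0.1;

// Hypothetical helper: `advance` is the horizontal gap between the last
// drawn glyph and the new one, in the same units as `fontSize`.
function isTrackingSpace(advance, fontSize) {
  return advance <= TRACKING_SPACE_FACTOR * fontSize;
}

console.log(isTrackingSpace(0.5, 10)); // true, too small to be a real space
console.log(isTrackingSpace(3, 10)); // false, treated as a real space
```

Unlike a factor of the font's own space width, this threshold doesn't depend on how wide a particular font makes its space glyph.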
2021-10-17 16:27:05 +02:00