616 Commits

Author SHA1 Message Date
Jonas Jenwald
0edfd29a3e Improve text-selection for Type3 fonts, using d0 operators, with empty /FontBBox-entries (issue 19624)
For Type3 glyphs with `d1` operators it's easy to compute a fallback bounding box; however, for `d0` the situation is more difficult.
Given that we nowadays compute the min/max of basic path-rendering operators on the worker-thread, we can utilize that by parsing these Type3 operatorLists to guess a more suitable fallback bounding box.
2025-03-10 16:21:54 +01:00
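The fallback idea in 0edfd29a3e can be illustrated with a minimal sketch. It assumes a simplified `{ op, args }` operator format, which is hypothetical and not pdf.js's actual operatorList encoding, and it ignores any `cm` transforms inside the glyph. The approach is to take the min/max over the coordinates of the glyph's path operators and use that as the bounding box.

```javascript
// Hypothetical sketch: guess a fallback bounding box for a Type3 glyph whose
// charproc uses `d0` (which, unlike `d1`, carries no bbox) by taking the
// min/max over the coordinates of its path-construction operators.
// The { op, args } format is an assumption, not pdf.js's operatorList.
function guessType3GlyphBBox(ops) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  const include = (x, y) => {
    minX = Math.min(minX, x);
    maxX = Math.max(maxX, x);
    minY = Math.min(minY, y);
    maxY = Math.max(maxY, y);
  };
  for (const { op, args } of ops) {
    switch (op) {
      case "m": // moveto: x y
      case "l": // lineto: x y
        include(args[0], args[1]);
        break;
      case "c": // curveto: a Bezier lies inside the hull of its control points
        for (let i = 0; i < 6; i += 2) include(args[i], args[i + 1]);
        break;
      case "v": // curveto variants with two explicit points
      case "y":
        for (let i = 0; i < 4; i += 2) include(args[i], args[i + 1]);
        break;
      case "re": { // rectangle: x y width height
        const [x, y, w, h] = args;
        include(x, y);
        include(x + w, y + h);
        break;
      }
      // Other operators (d0 itself, painting, colour) don't affect the box.
    }
  }
  return minX <= maxX ? [minX, minY, maxX, maxY] : null; // null: no path data
}
```

For a glyph consisting of `d0`, a moveto to (10, 20), a lineto to (300, 400) and a rectangle at (0, -10) of size 50 by 5, this returns `[0, -10, 300, 400]`.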
Jonas Jenwald
d8d7235876 Simplify the ColorSpaceUtils.singletons handling (PR 19564 follow-up)
With the changes in PR 19564 the actual `ColorSpace`-classes were separated from the various static "helper" methods.
Hence it seems that we can now simplify/shorten this old code to instead cache the "standard" ColorSpaces directly on the `ColorSpaceUtils`-class.
2025-03-05 15:02:05 +01:00
Jonas Jenwald
fbf1f2ba15 Remove ColorSpaceUtils.parseAsync and simplify the ColorSpace "API-surface"
This patch reduces the number of `ColorSpaceUtils` static-methods, and in particular the `parseAsync` method is removed and it's now instead possible to have `parse` optionally return a Promise.
This thus removes the need to manually check if a `ColorSpace`-instance is cached, note the changes in the `src/core/evaluator.js` file.
2025-03-05 12:43:58 +01:00
Calixte Denizet
971be48b60 Support using ICC profiles with qcms (bug 860023) 2025-03-05 10:29:59 +01:00
Jonas Jenwald
4be79748c9 Add a GlobalColorSpaceCache to reduce unnecessary re-parsing
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`. This should help reduce unnecessary re-parsing, especially for `ICCBased` ColorSpaces once we properly support those.
2025-03-01 14:21:05 +01:00
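A minimal sketch of the two-level lookup described in 4be79748c9, under stated assumptions: the class names, the `lookupOrParse` helper, and the use of strings such as `"12 0 R"` for References are all hypothetical. Name-keyed entries stay in the per-invocation local cache, since a resource Name is only meaningful within the resources that define it, while Reference-keyed entries can also be shared document-wide.

```javascript
// Hypothetical document-wide cache: keyed only by Reference.
class GlobalColorSpaceCacheSketch {
  #byRef = new Map();
  get(ref) { return this.#byRef.get(ref); }
  set(ref, cs) { this.#byRef.set(ref, cs); }
}

// Hypothetical per-invocation cache: keyed by Name and/or Reference.
class LocalColorSpaceCacheSketch {
  #byName = new Map();
  #byRef = new Map();
  getByName(name) { return this.#byName.get(name); }
  getByRef(ref) { return this.#byRef.get(ref); }
  set(name, ref, cs) {
    if (name) this.#byName.set(name, cs);
    if (ref) this.#byRef.set(ref, cs);
  }
}

// Check the local cache first, then the global one, and only parse on a miss.
function lookupOrParse({ name, ref }, local, global, parse) {
  const hit =
    (name && local.getByName(name)) ||
    (ref && (local.getByRef(ref) || global.get(ref)));
  if (hit) return hit;
  const cs = parse();
  local.set(name, ref, cs);
  if (ref) global.set(ref, cs); // only Reference-keyed entries go global
  return cs;
}
```

With this split, a ColorSpace referenced as `12 0 R` on two different pages is parsed once, whereas an inline ColorSpace known only by Name is re-parsed per invocation.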
Jonas Jenwald
bdfa96878d Invoke TranslatedFont.prototype.loadType3Data only *once* per font
Currently we're first loading the font, and then for Type3 fonts we're invoking `loadType3Data` every time that the font is encountered.
That seems completely unnecessary, and it's probably connected to the age of this code, since the `loadType3Data`-method will only run once anyway (note the caching).
2025-02-26 15:17:11 +01:00
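The per-font caching that makes the repeated `loadType3Data` calls redundant can be sketched with the usual memoized-Promise pattern. `TranslatedFontSketch` and its injected loader are hypothetical stand-ins, not the actual `TranslatedFont` code.

```javascript
// Hypothetical sketch: memoize the Promise so the expensive Type3 loading work
// runs at most once per font, however many times the font is encountered.
class TranslatedFontSketch {
  #type3Loaded = null;

  constructor(loadCharProcs) {
    this.loadCharProcs = loadCharProcs; // assumed async loader, for illustration
  }

  loadType3Data() {
    // The first call starts the work; later calls reuse the same Promise.
    this.#type3Loaded ??= this.loadCharProcs();
    return this.#type3Loaded;
  }
}
```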
Jonas Jenwald
d428db63c3 Improve the "FontFallback" handling on the worker-thread
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
2025-02-24 09:34:58 +01:00
Jonas Jenwald
839e23f5c2 Send disableFontFace and fontExtraProperties as part of the exported font-data
These options are needed in the `FontFaceObject` class, and indirectly in `FontLoader` as well, which means that we currently need to pass them around manually in the API.
Given that the options are (obviously) available on the worker-thread, it's very easy to just provide them when creating `Font`-instances and then send them as part of the exported font-data. This way we're able to simplify the code (primarily on the main-thread), and note that `Font`-instances even had a `disableFontFace`-field already (but it wasn't properly initialized).
2025-02-24 09:34:48 +01:00
Jonas Jenwald
641e2f506e [api-minor] Re-factor how the useWorkerFetch option is used internally
With the recently added OpenJPEG no-wasm fallback we need to send the `wasmUrl` option to the worker-thread *regardless* of the value of the `useWorkerFetch` option, since the fallback won't work if we don't have a URL to `import` it from.
For consistency the code is re-factored to always send the factory-urls to the worker-thread, and simply check the `useWorkerFetch` option there instead.

Also, as a follow-up to PR 19525, introduce a new `useWasm` option that can be used in e.g. browser-tests to forcibly disable WebAssembly usage.
2025-02-22 09:56:53 +01:00
Jonas Jenwald
36979e9eb2 Fix all outstanding ESLint arrow-body-style warnings
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
2025-02-17 15:45:44 +01:00
Jonas Jenwald
88e5da1e37 Combine the main-thread message handlers for CMap-, StandardFontData-, and Wasm-files
Currently we have three separate and virtually identical message handlers for this data, which can easily be combined into a single message handler instead.
2025-02-07 14:33:15 +01:00
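One way to fold the three handlers from 88e5da1e37 into one is to dispatch on the kind of data requested. In this sketch the `kind` values reuse the `cMapUrl`, `standardFontDataUrl` and `wasmUrl` option names, but the message shape and the injected `fetchBinary` function are assumptions for illustration, not the actual message-handler API.

```javascript
// Hypothetical sketch: one handler, keyed by the kind of data requested,
// replacing three near-identical per-kind handlers.
function createFetchBinaryDataHandler(baseUrls, fetchBinary) {
  return function handleFetchBinaryData({ kind, filename }) {
    const baseUrl = baseUrls[kind];
    if (typeof baseUrl !== "string") {
      throw new Error(`Unsupported binary data kind: ${kind}`);
    }
    // The only per-kind difference is the base URL the file is loaded from.
    return fetchBinary(`${baseUrl}${filename}`);
  };
}
```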
Jonas Jenwald
db53320da8 Initialize the image-options, on the worker-thread, once per document
Currently we're initializing the image-options for every page, which seems unnecessary since it should suffice to do that once per document.

Also, changes the `BasePdfManager` constructor to improve readability/documentation a little bit.
2025-01-30 11:52:15 +01:00
Jonas Jenwald
6038b5a992 Handle JPX wasm fetch-response errors correctly (PR 19329 follow-up)
Currently we're not checking that the response is actually OK before getting the data, which means that rather than throwing an error we can get an empty `ArrayBuffer`.

To avoid duplicating code we can move an existing helper into `src/core/core_utils.js` and re-use it when fetching the JPX wasm-file as well.
2025-01-17 10:20:16 +01:00
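The check described in 6038b5a992 follows a common Fetch API pattern: inspect `response.ok` before reading the body, so that an HTTP error rejects instead of silently handing back whatever body the server sent. `fetch`, `Response.ok`, `Response.status` and `Response.arrayBuffer()` are standard. The helper name and the injectable `fetchImpl` parameter are hypothetical, and this is not the actual `src/core/core_utils.js` helper.

```javascript
// Hypothetical helper: fetch binary data, failing loudly on HTTP errors.
async function fetchBinaryData(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Without this check a 404 would still yield an (empty or bogus) body.
    throw new Error(
      `Unable to load binary data at "${url}": HTTP ${response.status} ${response.statusText}`
    );
  }
  return new Uint8Array(await response.arrayBuffer());
}
```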
Calixte Denizet
94b4b54ef6 [api-major] Add openjpeg.wasm to pdf.js (bug 1935076)
In order to fix bug 1935076, we'll have to add a pure JS fallback in case Wasm is disabled or SIMD isn't supported. Unfortunately, this fallback will take some space.

So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make it easier to use another wasm file (which must export `_jp2_decode`, `_malloc` and `_free`).
2025-01-16 21:09:50 +01:00
Jonas Jenwald
74c1795c9f Use Dict iteration more (PR 19051 follow-up)
There are a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually looking up the values, which after PR 19051 can be replaced with direct iteration instead.
2025-01-02 15:09:19 +01:00
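The difference between the two loop styles in 74c1795c9f can be shown with a minimal iterable dictionary. `DictSketch` is hypothetical and ignores the resolution of indirect References that a real PDF `Dict` has to deal with.

```javascript
// Hypothetical minimal Dict that is directly iterable, yielding [key, value].
class DictSketch {
  #map = new Map();
  set(key, value) { this.#map.set(key, value); }
  get(key) { return this.#map.get(key); }
  getKeys() { return [...this.#map.keys()]; }
  *[Symbol.iterator]() {
    yield* this.#map; // Map iteration already yields [key, value] entries
  }
}

// Before: look each value up manually.
function describeWithKeys(dict) {
  const out = [];
  for (const key of dict.getKeys()) {
    out.push(`${key}=${dict.get(key)}`);
  }
  return out;
}

// After: destructure the entries directly.
function describeWithIteration(dict) {
  const out = [];
  for (const [key, value] of dict) {
    out.push(`${key}=${value}`);
  }
  return out;
}
```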
Jonas Jenwald
20d5332009 For images that include SMask/Mask entries, ignore an SMask defined in the current graphics state
From section [11.6.4.3 Mask Shape and Opacity](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G10.4848628) in the PDF specification:
 - An image XObject may contain its own *soft-mask image* in the form of a subsidiary image XObject in the `SMask` entry of the image dictionary (see "Image Dictionaries"). This mask, if present, shall override any explicit or colour key mask specified by the image dictionary's `Mask` entry. Either form of mask in the image dictionary shall override the current soft mask in the graphics state.
2024-12-30 14:25:07 +01:00
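The precedence rule in the quoted specification text can be written as a small decision function. In this sketch plain objects stand in for the image dictionary and the graphics state, and the property names are assumptions for illustration: an image `/SMask` wins over an image `/Mask`, and either one overrides the soft mask from the graphics state.

```javascript
// Sketch of the precedence rule from PDF 32000-1:2008, 11.6.4.3:
// image /SMask > image /Mask > soft mask in the current graphics state.
function selectMask(imageDict, graphicsState) {
  if (imageDict.SMask) {
    return { source: "image", kind: "SMask", mask: imageDict.SMask };
  }
  if (imageDict.Mask) {
    return { source: "image", kind: "Mask", mask: imageDict.Mask };
  }
  if (graphicsState.softMask) {
    return { source: "gstate", kind: "SMask", mask: graphicsState.softMask };
  }
  return null; // no mask applies
}
```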
Tim van der Meij
e2bbcb544a
Merge pull request #19045 from Snuffleupagus/api-rm-isChrome
[api-minor] Disable `ImageDecoder` usage by default in Chromium browsers
2024-11-17 16:32:48 +01:00
Jonas Jenwald
c082169cae Enable the ESLint no-var rule in the src/core/evaluator.js file
This was previously attempted in PR 13371, but had to be reverted because of issues related to SystemJS (which has since been removed).

Also, while unrelated, shortens an existing conditional assignment.
2024-11-15 12:36:51 +01:00
Jonas Jenwald
471284f51b [api-minor] Disable ImageDecoder usage by default in Chromium browsers
Given that there are multiple issues with `ImageDecoder` in Chromium browsers, affecting both BMP and JPEG images, for now we (by default) disable that functionality there to avoid problems.

This also means that we can remove the previously added, and separate, `isChrome` API-option.
2024-11-14 12:05:15 +01:00
Jonas Jenwald
9bf9bbda0b
Merge pull request #19031 from Snuffleupagus/api-isImageDecoderSupported
[api-minor] Add a `getDocument` option to disable `ImageDecoder` usage
2024-11-13 09:19:05 +01:00
Jonas Jenwald
65eedfb0fc [api-minor] Add a getDocument option to disable ImageDecoder usage
This allows end-users to forcibly disable `ImageDecoder` usage, even if the browser appears to support it (similar to the pre-existing option for `OffscreenCanvas`).
2024-11-12 17:12:42 +01:00
Jonas Jenwald
16e86878d2 Add a PartialEvaluator helper for fetching CMap and Standard Font data
This avoids a little bit of code duplication, which cannot hurt.
2024-11-11 11:57:28 +01:00
Calixte Denizet
b649b6f8dd Use a BMP decoder when resizing an image
The image decoding won't block the main thread any more.
For now, it isn't enabled for Chrome because issue6741.pdf leads to a crash.
2024-10-28 14:09:52 +01:00
Jonas Jenwald
b048420d21 [api-minor] Remove the CMapCompressionType enumeration
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
2024-10-24 11:08:16 +02:00
Jonas Jenwald
50c291eb33 Unconditionally cache built-in CMaps on the worker-thread
Given that we've not shipped, nor used, anything except binary CMaps for years let's just cache them unconditionally (since that's a tiny bit less code).
2024-10-24 10:15:09 +02:00
Jonas Jenwald
236c8d862e Re-factor how we handle missing, corrupt, or empty font-file entries
This improves the fixes for e.g. issue 9462 and 18941 slightly and allows better fallback behaviour for non-standard fonts.
2024-10-22 17:07:12 +02:00
Jonas Jenwald
63b34114b1 Fall back to a standard font if a font-file entry doesn't contain a Stream (issue 18941)
The PDF document is clearly corrupt, since it has /FontFile2 entries that are Dictionaries, which isn't valid.
While there's obviously no guarantee that things will look perfect this way, actually rendering the text at all should be an improvement in general.
2024-10-22 11:51:28 +02:00
Calixte Denizet
e7ab8cd8c1 Fall back to a gray colorspace when there is no colorspace and no name in the scn/SCN arguments
It fixes #18894.
2024-10-13 16:02:07 +02:00
Jonas Jenwald
67af371e58 Ignore non-existing /Shading resources during parsing (issue 18765) 2024-09-19 21:55:02 +02:00
Calixte Denizet
482994cc04 Use a transparent color when setting fill/stroke colors in a pattern context but with no colorspace 2024-07-22 09:56:10 +02:00
alexcat3
1c364422a6 Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
Add unit test to check compatibility with such cmaps

In the PDF in issue 18099, the toUnicode cmap had a line mapping the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of Unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the Unicode code points are supposed to be given in UTF-16BE, the PDF's line should probably have read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding high bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does, and other PDF readers read it correctly, PDF.js should probably support this.
2024-07-06 11:29:21 -04:00
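A minimal sketch of the tolerant handling described in 1c364422a6, not the actual PDF.js implementation: the destination hex string of a bfrange entry is left-padded to a whole number of 16-bit code units (`<00>` becomes `<0000>`) before each source code in the range is mapped. The function name and the `Map` return value are hypothetical.

```javascript
// Hypothetical sketch: expand `<srcLo> <srcHi> <dst>` from a ToUnicode bfrange.
// A short destination such as <00> is left-padded to a full UTF-16 code unit
// instead of being read as the *high* byte of a code unit.
function expandBfRange(srcLoHex, srcHiHex, dstHex) {
  const lo = parseInt(srcLoHex, 16);
  const hi = parseInt(srcHiHex, 16);
  // Pad to a multiple of 4 hex digits, i.e. whole 16-bit code units.
  const width = Math.max(4, Math.ceil(dstHex.length / 4) * 4);
  const padded = dstHex.padStart(width, "0");
  const units = [];
  for (let i = 0; i < padded.length; i += 4) {
    units.push(parseInt(padded.slice(i, i + 4), 16));
  }
  const map = new Map();
  for (let code = lo; code <= hi; code++) {
    const out = units.slice();
    out[out.length - 1] += code - lo; // the final code unit is incremented
    map.set(code, String.fromCharCode(...out));
  }
  return map;
}
```

With this padding, `expandBfRange("00", "7F", "00")` maps char code 0x41 to `"A"` rather than to U+4100.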
Calixte Denizet
8c9a665728 Always use DW if it's a number for the font default width (bug 1903731) 2024-06-20 15:33:34 +02:00
Jonas Jenwald
604e8977e9 Add a helper function for handling locally cached image data (PR 18269 follow-up)
This avoids having to duplicate the same exact code multiple times.
2024-06-18 17:20:40 +02:00
Jonas Jenwald
22ca7d52d3 Ensure that dependencies are added to the operatorList for locally cached images (issue 18259) 2024-06-18 12:25:53 +02:00
Jonas Jenwald
ce52ce063e Change parsingType3Font to a getter (PR 14448 follow-up)
We can easily "compute" `parsingType3Font` from the `type3FontRefs`-value, and thus avoid having to separately track two related properties.
2024-05-25 10:46:12 +02:00
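The change in ce52ce063e boils down to deriving one property from another instead of storing both. The sketch below assumes that `type3FontRefs` is `null` except while a Type3 font is being parsed. `EvaluatorSketch` and the use of a `Set` are hypothetical.

```javascript
// Hypothetical sketch: compute the flag from the tracked references, so the
// two properties can never drift out of sync.
class EvaluatorSketch {
  constructor(type3FontRefs = null) {
    this.type3FontRefs = type3FontRefs; // assumed: null unless parsing Type3
  }

  get parsingType3Font() {
    return !!this.type3FontRefs;
  }
}
```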
Jonas Jenwald
cfcb700ecc Prevent XRef errors from breaking font loading (bug 1898802)
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file, which doesn't make sense (and isn't how a PDF document should be updated).
However, it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
2024-05-24 21:37:35 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
 - These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays 2024-05-03 22:37:50 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This primarily checks Arrays, but also some other properties that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Jonas Jenwald
6c05f8b381 Add even more validation of width-data (PR 18017 follow-up)
I missed this case in PR 18017, sorry about that.
2024-05-02 11:24:15 +02:00
Jonas Jenwald
d411a072a4 Add more validation of width-data
The current `PartialEvaluator.extractWidths` implementation only contains *partial* validation of the width-data.
2024-04-29 10:51:16 +02:00
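To show what fuller validation of width-data can involve, here is a defensive parse of a CIDFont `/W` array. Per the PDF specification, its entries take either the form `c [w1 w2 ... wn]` (consecutive CIDs starting at `c`) or `cFirst cLast w` (one width for a CID range). This is a hypothetical sketch, not `PartialEvaluator.extractWidths`: it covers only `/W`, and it skips malformed entries instead of letting them produce `NaN` or `undefined` widths.

```javascript
// Hypothetical sketch: parse a CIDFont /W array, skipping malformed entries.
function parseCIDWidths(W) {
  const widths = new Map();
  if (!Array.isArray(W)) return widths;
  for (let i = 0; i < W.length; ) {
    const start = W[i];
    const next = W[i + 1];
    if (!Number.isInteger(start)) {
      i++; // not a CID: skip it
      continue;
    }
    if (Array.isArray(next)) {
      // Form 1: c [w1 w2 ... wn]
      next.forEach((w, j) => {
        if (Number.isFinite(w)) widths.set(start + j, w);
      });
      i += 2;
    } else if (
      Number.isInteger(next) &&
      next >= start &&
      next - start <= 0xffff && // CIDs are limited to 0..65535
      Number.isFinite(W[i + 2])
    ) {
      // Form 2: cFirst cLast w
      for (let cid = start; cid <= next; cid++) widths.set(cid, W[i + 2]);
      i += 3;
    } else {
      i++; // malformed entry
    }
  }
  return widths;
}
```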
Jonas Jenwald
08eb0566f7 Validate additional font-dictionary properties 2024-04-29 08:21:28 +02:00
Calixte Denizet
551e63901c Simplify the way to pass the glyph drawing instructions from the worker to the main thread
This also removes the use of `eval` in the font loader.
2024-04-27 21:28:31 +02:00
Jonas Jenwald
91898e5923 Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
2024-04-20 11:10:09 +02:00
Calixte Denizet
52ea2333b3 Remove the subset tag from a missing font's name when trying to find a substitution
Fixes #17929.
2024-04-11 20:34:28 +02:00
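The PDF specification defines a font-subset tag as exactly six uppercase letters followed by a plus sign (its example is `EOODIA+Poetica`). Stripping that prefix leaves a name that can be matched against substitution tables. This is a minimal sketch, and the function name is hypothetical.

```javascript
// Sketch: drop a six-uppercase-letter subset tag such as "EOODIA+".
function stripSubsetTag(baseFontName) {
  return baseFontName.replace(/^[A-Z]{6}\+/, "");
}
```

Names without a well-formed tag (for example `ABCDE+Foo`, which has only five letters) are returned unchanged.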
Tim van der Meij
2e5282928f
Merge pull request #17854 from Snuffleupagus/rm-PromiseCapability
[api-minor] Replace the `PromiseCapability` with `Promise.withResolvers()`
2024-04-02 15:21:43 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in an `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
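For the call-sites that still need a `settled`-state, manual tracking on top of `Promise.withResolvers()` can look like this sketch. `trackedCapability` is a hypothetical name, and the local `withResolvers` fallback only mirrors what a polyfill would provide in runtimes without native support. It is not the code used in `web/app.js` or `test/unit/api_spec.js`.

```javascript
// Use the native method when available; otherwise build the same triple.
function withResolvers() {
  if (typeof Promise.withResolvers === "function") {
    return Promise.withResolvers();
  }
  let resolve, reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical wrapper recording that resolve/reject has been called.
function trackedCapability() {
  const { promise, resolve, reject } = withResolvers();
  const cap = { promise, settled: false };
  cap.resolve = value => {
    cap.settled = true;
    resolve(value);
  };
  cap.reject = reason => {
    cap.settled = true;
    reject(reason);
  };
  return cap;
}
```

Calling `resolve` a second time is harmless, since the value of a Promise cannot change once it is fulfilled.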
Jonas Jenwald
07a8836ab2 Ensure that Mesh /Shadings have non-zero width/height (issue 17848) 2024-03-29 22:58:25 +01:00
Calixte Denizet
9c3471dd01 Don't render corrupted inlined images
Fixes #17794.
2024-03-15 15:33:18 +01:00