3010 Commits

Author SHA1 Message Date
Calixte Denizet
196affd8e0 Fix decoding of JPX images having an alpha channel
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
2024-06-03 20:08:11 +02:00
Calixte Denizet
9654ad570a Decompress when it's possible images in using DecompressionStream
Getting images is already asynchronous, so we can use this opportunity
to use DecompressStream (which is async too) to decompress images.
2024-06-02 14:00:05 +02:00
Calixte Denizet
6fa98ac99f [api-minor] Simplify how the list of points are structured
Instead of sending to the main thread an array of Objects for a list of points (or quadpoints),
we'll send just a basic float buffer.
It should slightly improve performances (especially when cloning the data) and use slightly less memory.
2024-05-30 15:36:15 +02:00
Jonas Jenwald
27436d52b2 Reduce indentation when parsing new annotations in getOperatorList
This code has, over the years, become more complex and less indentation generally helps readability.
2024-05-25 12:00:44 +02:00
Jonas Jenwald
ce52ce063e Change parsingType3Font to a getter (PR 14448 follow-up)
We can easily "compute" `parsingType3Font` from the `type3FontRefs`-value, and thus avoid having to separately track two related properties.
2024-05-25 10:46:12 +02:00
Jonas Jenwald
c349ac3a5d Skip the temporary variable when calling #findStreamLength (PR 18125 follow-up) 2024-05-25 10:38:32 +02:00
Jonas Jenwald
cfcb700ecc Prevent XRef errors from breaking font loading (bug 1898802)
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
2024-05-24 21:37:35 +02:00
Jonas Jenwald
3afa9bfc42 Improve /Page validation for linearized documents (issue 18138)
The referenced PDF document contains corrupt linearization-data, that doesn't point to the *first* page as intended.
2024-05-22 12:04:02 +02:00
Jonas Jenwald
57014d0d13 Support corrupt PDF documents that contain "endsteam" commands (issue 18122)
This patch also re-factors the findStreamLength-helper to avoid even more code duplication.
2024-05-21 13:38:17 +02:00
Jonas Jenwald
59637c1fa8
Merge pull request #18115 from Snuffleupagus/freeze-evaluatorOptions
Freeze `evaluatorOptions` in the src/core/pdf_manager.js file
2024-05-21 12:19:04 +02:00
Jonas Jenwald
440b4b6eeb Support charCodes larger than 32-bit in adjustMapping (issue 18117)
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src\core\{cff_font, type1_font}.js` files.
2024-05-20 12:13:55 +02:00
Jonas Jenwald
3cd6c6c0e6 Freeze evaluatorOptions in the src/core/pdf_manager.js file
Given that these options are passed from the API we don't want to accidentally modify them.
2024-05-18 15:16:12 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
4aee67227e Remove the unused Font.prototype.spaceWidth getter (PR 13424 follow-up)
This getter became unused in PR 13424, well over two years ago, and apparently none of us noticed that.
2024-05-11 11:50:51 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays 2024-05-03 22:37:50 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This checks primarily Arrays, but also some other properties, that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Jonas Jenwald
1b811ac113
Merge pull request #18034 from Snuffleupagus/FileSpec-filename-stripPath
[api-minor] Improve the `FileSpec` implementation
2024-05-03 09:03:17 +02:00
Jonas Jenwald
6c05f8b381 Add even more validation of width-data (PR 18017 follow-up)
I missed this case in PR 18017, sorry about that.
2024-05-02 11:24:15 +02:00
Jonas Jenwald
2b69fb76ac [api-minor] Improve the FileSpec implementation
- Check that the `filename` is actually a string, before parsing it further.
 - Use proper "shadowing" in the `filename` getter.
 - Add a bit more validation of the data in `pickPlatformItem`.
 - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
2024-05-01 18:02:05 +02:00
Jonas Jenwald
bf4e36d1b5 [api-minor] Expose the /Desc-attribute of file attachments in the viewer (issue 18030)
In the viewer this will be displayed in the `title` of the hyperlink, which is probably the best we can do here given how the viewer is implemented.
2024-05-01 09:02:11 +02:00
Jonas Jenwald
f6cd03955b [api-minor] Move the page reference/number caching into the API
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
2024-04-29 18:54:06 +02:00
Jonas Jenwald
2b2ade7883
Merge pull request #18018 from Snuffleupagus/CompiledFont-tweak-caching
Reduce code-duplication when caching data in `CompiledFont.getPathJs`
2024-04-29 17:39:35 +02:00
Jonas Jenwald
85ff8f34e2 Reduce code-duplication when caching data in CompiledFont.getPathJs 2024-04-29 13:18:31 +02:00
Jonas Jenwald
d411a072a4 Add more validation of width-data
The current `PartialEvaluator.extractWidths` implementation only contains *partial* validation of the width-data.
2024-04-29 10:51:16 +02:00
Jonas Jenwald
08eb0566f7 Validate additional font-dictionary properties 2024-04-29 08:21:28 +02:00
Calixte Denizet
551e63901c Simplify the way to pass the glyph drawing instructions from the worker to the main thread
and remove the use of eval in the font loader.
2024-04-27 21:28:31 +02:00
calixteman
d1f494d68c
Merge pull request #17986 from calixteman/fix_struct_tree
Allow to insert several annotations under the same parent in the structure tree
2024-04-24 18:32:00 +02:00
Calixte Denizet
45fa867577 Allow to insert several annotations under the same parent in the structure tree
While testing stamp insertion with the added pdf, I noticed that the tags using a MCID
weren't considered when trying to attach an annotation to it.
2024-04-24 16:23:05 +02:00
Jonas Jenwald
7206d0a237 Validate explicit destinations on the worker-thread to prevent DataCloneError (issue 17981)
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread has access to various primitives.
2024-04-22 22:51:35 +02:00
Tim van der Meij
522af265a7
Merge pull request #17977 from Snuffleupagus/parseImageProperties-TypedArray
Update `JpxImage.parseImageProperties` to support TypedArray data in IMAGE_DECODERS builds
2024-04-22 18:31:31 +02:00
Jonas Jenwald
9e80c6d228
Merge pull request #17978 from Snuffleupagus/pr-17428-followup
Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)
2024-04-22 16:46:23 +02:00
Calixte Denizet
55f943c4fa Use the pdf.js warn when using jpx decoder
Fixes #17980.
2024-04-22 16:02:45 +02:00
Tim van der Meij
335d8394cd
Merge pull request #17979 from Snuffleupagus/image-errors-shorter-msg
[api-minor] Remove the image-related error message prefixes
2024-04-22 15:35:10 +02:00
Jonas Jenwald
912b57b95d [api-minor] Remove the image-related error message prefixes
Other custom errors, based on `BaseException`, do not use such a format.
2024-04-20 12:51:45 +02:00
Jonas Jenwald
91898e5923 Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
2024-04-20 11:10:09 +02:00
Jonas Jenwald
8970786d5b Update JpxImage.parseImageProperties to support TypedArray data in IMAGE_DECODERS builds
Given that the `decode` method only returns the actual image-data, a user would now need to invoke `parseImageProperties` to obtain e.g. the width and height.
This method only accepts `BaseStream`-instances, which are (obviously) not exposed, hence we extend it in IMAGE_DECODERS builds to wrap TypedArray data into the expected format.
2024-04-20 09:38:52 +02:00
Calixte Denizet
901d995a7e Correctly update the xref table when an annotation is deleted 2024-04-18 21:27:39 +02:00
Tim van der Meij
7290faf840
Merge pull request #17956 from calixteman/jpx_exceptions
[JPX] Throw an exception with the error messages returned by openjpeg
2024-04-16 20:48:23 +02:00
Calixte Denizet
ebcae3014c [JPX] Throw an exception with the error messages returned by openjpeg 2024-04-16 19:02:24 +02:00
Tim van der Meij
c08b09d3b9
Fix JpxImage API issues (PR 17946 follow-up)
This commit changes the `JpxImage.decode` method signature to define the
`ignoreColorSpace` argument as optional with a default value. Note that
we already set this default value in the `getBytes` method of the
`src/core/decode_stream.js` file since this option only seems useful for
certain special cases and therefore shouldn't be mandatory to provide.

Moreover, the JPX fuzzer is changed to use the new `JpxImage` API.
2024-04-16 18:02:47 +02:00
calixteman
12c4119cbd
Merge pull request #17946 from calixteman/openjpeg
[api-minor] Add a jpx decoder based on OpenJPEG 2.5.2
2024-04-16 13:41:58 +02:00
Calixte Denizet
2e83cfbbc1 [api-minor] Add a jpx decoder based on OpenJPEG 2.5.2
The decoder is compiled in WASM:
https://github.com/mozilla/pdf.js.openjpeg

Fixes #17289, #17061, #16485, #13051, #6365, #4648, #12213.
2024-04-16 12:54:36 +02:00
Jonas Jenwald
a41bb40fbb [api-minor] Update the minimum supported Safari version to 16.4
This patch updates the minimum supported browsers as follows:
 - Safari 16.4, which was released on 2023-03-27; see https://developer.apple.com/documentation/safari-release-notes/safari-16_4-release-notes

Nowadays we usually we try, where feasible and possible, to support browsers/environments that are about two years old. The reasons for limiting support to a slightly more recent Safari version include:
 - Safari has always been slower, compared to other browsers, at implementing e.g. new JavaScript features.
 - Trying to provide support for Safari is often difficult, and over the years we have seen *a lot* of bugs that are specific to Safari.
 - Safari is, and has been for many years, only listed as "mostly" supported in the FAQ.
 - This allows us to remove feature-testing, only relevant to Safari, from the main code-base.

By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the built-in Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js core contributors.

*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
2024-04-15 12:44:37 +02:00
Calixte Denizet
acc56491c9 Warn when a non-embedded font has an invalid name
It can be helpful to find out some heuristics when trying
to find a substitution font.
2024-04-12 13:59:18 +02:00
Calixte Denizet
52ea2333b3 Remove the tag for missing font subset when trying to find a substitution
Fixes #17929.
2024-04-11 20:34:28 +02:00
Calixte Denizet
41aaa083df Don't render annotations with a null dimension
Fixes #17906.
2024-04-09 16:03:49 +02:00
Tim van der Meij
d01a0bd0c8
Fix annotation border style parsing by handling empty dash arrays
The PDF specification states that empty dash arrays, i.e. arrays with
zero elements, are in fact valid. In that case the dash array simply
corresponds to a solid, unbroken line. However, this case was erroneously
being flagged as invalid and therefore the annotation was not drawn
because its width was set to zero. This commit fixes the issue by
allowing dash arrays to have a length of zero.
2024-04-08 16:34:27 +02:00
Calixte Denizet
3f2f98336e Update the current stride before composing when decoding a text region
Fixes #17871.

We do something similar to:
https://source.chromium.org/chromium/chromium/src/+/main:third_party/pdfium/core/fxcodec/jbig2/JBig2_TrdProc.cpp;l=373-379;drc=24c6be6924df3ff585bb63f6aed4e2c81e791fb2
2024-04-03 18:44:50 +02:00
Calixte Denizet
8f5d907a52 Don't translate char codes when platform,encoding isn't (3,0) 2024-04-03 16:08:11 +02:00