3044 Commits

Author SHA1 Message Date
Jonas Jenwald
8728f7f134 Support an odd number of digits in hexadecimal strings (issue 18645)
See https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1840792
2024-08-23 16:31:43 +02:00
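The rule in the linked section of the specification is that a hexadecimal string with an odd number of digits is read as if the final digit were followed by `0`. A minimal sketch of that rule follows; `decodeHexString` is a hypothetical helper written for illustration, not the actual PDF.js lexer code.

```javascript
// Hypothetical helper (not the PDF.js lexer): decodes the text between
// "<" and ">" of a PDF hexadecimal string into an array of byte values.
function decodeHexString(hex) {
  // White-space inside a hexadecimal string is ignored.
  let digits = hex.replace(/\s+/g, "");
  // With an odd digit count, the final digit is assumed to be followed by "0".
  if (digits.length % 2 === 1) {
    digits += "0";
  }
  const bytes = [];
  for (let i = 0; i < digits.length; i += 2) {
    bytes.push(parseInt(digits.slice(i, i + 2), 16));
  }
  return bytes;
}

// "901FA" is read as the three bytes 0x90, 0x1F, 0xA0.
console.log(decodeHexString("901FA"));
```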
Jonas Jenwald
908f453384
Merge pull request #18627 from richard-smith-preservica/rcs/send-page-dict-requests-in-parallel
Send fetch requests for all page dict lookups in parallel
2024-08-21 13:58:03 +02:00
Richard Smith (smir)
a67b9aec6c Send fetch requests for all page dict lookups in parallel
- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent
 - Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await on the cached Promise without making the requests pipeline
- This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately
 - PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes
- I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays
2024-08-21 11:08:14 +01:00
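The idea described above can be sketched roughly as follows: start every request first, then await the cached Promises one by one. The names `makeCachedFetcher`, `findPage` and `isWanted` are assumptions made for this illustration and do not correspond to the actual PDF.js code.

```javascript
// Hypothetical stand-in for an xref lookup that caches one Promise per ref.
function makeCachedFetcher(fetchFn) {
  const cache = new Map();
  return ref => {
    if (!cache.has(ref)) {
      // The request is sent here, not when the Promise is awaited.
      cache.set(ref, fetchFn(ref));
    }
    return cache.get(ref);
  };
}

async function findPage(kids, fetchCached, isWanted) {
  // Start every top-level request up front, so that they run in parallel.
  for (const ref of kids) {
    // Failures are reported by the `await` in the loop below.
    fetchCached(ref).catch(() => {});
  }
  // Awaiting the cached Promises no longer serialises the requests.
  for (const ref of kids) {
    const obj = await fetchCached(ref);
    if (isWanted(obj)) {
      return obj;
    }
  }
  return null;
}
```

Without the first loop, each request would only be sent once the previous `await` had completed, which is the pipelining that the commit message refers to.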
Jonas Jenwald
6dd31183be Use standard glyph mapping for non-embedded and non-composite Calibri fonts (issue 18208)
Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font.
Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see b47c7eca83/src/core/fonts.js (L1186-L1190)
2024-08-19 19:10:35 +02:00
Jonas Jenwald
aebb8534f3 Limit base-class initialization checks to development and TESTING modes
We have a number of base-classes that are only intended to be extended, but never to be used directly. To help enforce this during development these base-class constructors will check for direct usage, however that code is obviously not needed in the actual builds.

*Note:* This patch reduces the size of the `gulp mozcentral` output by `~2.7` kilo-bytes, which isn't a lot but still cannot hurt.
2024-08-12 12:26:35 +02:00
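A rough sketch of this kind of guard, assuming a hypothetical `DEVELOPMENT_OR_TESTING` constant that the build system would replace with `false` for release builds, so that the check can be removed as dead code. The class names are illustrative only.

```javascript
// Hypothetical build-time flag; not the actual PDF.js preprocessor syntax.
const DEVELOPMENT_OR_TESTING = true;

class BaseShape {
  constructor() {
    // Only guard against direct instantiation in development/testing.
    if (DEVELOPMENT_OR_TESTING && this.constructor === BaseShape) {
      throw new Error("Cannot initialize BaseShape.");
    }
  }
}

class Circle extends BaseShape {}

new Circle(); // Fine: a sub-class is being instantiated.
```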
Calixte Denizet
5f95d9b1ba [Editor] Allow Float32Array for quadpoints in annotations (bug 1907958)
Added annotations could have some quadpoints (highlight, ink).
The isNumberArray check was returning false and consequently the annotation wasn't
printable.
The tests didn't catch this issue because the quadpoints were passed as Array.
So driver.js has been updated to pass them as Float32Array, in order
to mirror the real-life situation.
2024-07-31 16:23:01 +02:00
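The failure mode is easy to reproduce: `Array.isArray` returns `false` for a `Float32Array`. Below is a minimal sketch of a check that accepts both; it is written for illustration and is not claimed to match the actual `isNumberArray` implementation.

```javascript
// Illustrative check: accepts plain Arrays of numbers and numeric typed arrays.
// Pass `null` as `len` to skip the length check.
function isNumberArray(arr, len) {
  if (Array.isArray(arr)) {
    return (
      (len === null || arr.length === len) &&
      arr.every(x => typeof x === "number")
    );
  }
  // Typed arrays are ArrayBuffer views, but so is DataView, which is excluded.
  // The element check rejects BigInt64Array/BigUint64Array.
  return (
    ArrayBuffer.isView(arr) &&
    !(arr instanceof DataView) &&
    (len === null || arr.length === len) &&
    (arr.length === 0 || typeof arr[0] === "number")
  );
}

const quadPoints = new Float32Array([0, 0, 10, 0, 0, 10, 10, 10]);
console.log(Array.isArray(quadPoints), isNumberArray(quadPoints, 8));
```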
Jonas Jenwald
c4cd405a8f Ignore non-dictionary nodes when parsing StructTree data (issue 18503) 2024-07-28 12:08:44 +02:00
Calixte Denizet
c3065629ca [Editor] Correctly save a non-ascii alt text 2024-07-24 19:13:45 +02:00
Calixte Denizet
482994cc04 Use a transparent color when setting fill/stroke colors in a pattern context but with no colorspace 2024-07-22 09:56:10 +02:00
Calixte Denizet
71bae38afb Fallback on DeviceGray when a colorspace cannot be parsed 2024-07-21 17:56:31 +02:00
Jonas Jenwald
d24a61c648 Allow /XYZ destinations without zoom parameter (issue 18408)
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-18 13:29:32 +02:00
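As a simplified illustration of such lenient validation, the sketch below accepts an /XYZ destination with either two or three parameters. It assumes the destination is given as `[pageIndex, typeName, ...args]` with the type as a plain string; `isValidXYZDest` is a hypothetical name and the real validation code differs.

```javascript
// Simplified sketch: `dest` is [pageIndex, typeName, ...args], with the type
// given as a plain string rather than a PDF Name object (an assumption).
function isValidXYZDest(dest) {
  if (!Array.isArray(dest) || dest.length < 2) {
    return false;
  }
  const [page, type, ...args] = dest;
  if (!Number.isInteger(page) || type !== "XYZ") {
    return false;
  }
  // The specification expects left, top and zoom; tolerate a missing zoom.
  if (args.length < 2 || args.length > 3) {
    return false;
  }
  // Each parameter may be a number or `null`.
  return args.every(arg => arg === null || typeof arg === "number");
}
```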
Tim van der Meij
c77b97daff
Update the JS/CSS files for the new Prettier/Stylelint versions 2024-07-13 16:29:47 +02:00
calixteman
9b1b5ff7e7
Merge pull request #18419 from calixteman/reuse_old_dict_when_updating
[Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext
2024-07-11 11:24:15 +02:00
Jonas Jenwald
e8d35c25ee
Merge pull request #18412 from Snuffleupagus/issue-18059
Also update the width/unicode data when replacing missing glyphs in non-embedded Type1 fonts (issue 18059)
2024-07-11 10:52:17 +02:00
Calixte Denizet
6711123f68 [Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext 2024-07-11 10:44:21 +02:00
Jonas Jenwald
403d023617 Allow e.g. /FitH destinations without additional parameter (bug 1907000)
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-11 10:36:44 +02:00
Jonas Jenwald
56653e5770 Also update the width/unicode data when replacing missing glyphs in non-embedded Type1 fonts (issue 18059)
*Please note:* This causes a little bit of movement in the `issue2770` test-case, however this matches the rendering in both Adobe Reader and PDFium.
2024-07-09 09:41:01 +02:00
Jonas Jenwald
f9d63201eb Revert "Remove the unused Font.prototype.spaceWidth getter (PR 13424 follow-up)"
This reverts commit 4aee67227e3c5b28f1bb4d8fa6b2ad882bc23b5a.
2024-07-09 09:28:10 +02:00
alexcat3
1c364422a6 Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
Add unit test to check compatibility with such cmaps

In the PDF in issue 18099, the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD have probably read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this.
2024-07-06 11:29:21 -04:00
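One way to tolerate such values, shown here only as a sketch under the assumption that missing digits are leading zeros, is to left-pad the hex digits to a whole number of UTF-16 code units before decoding. `hexToUnicode` is a hypothetical helper and not necessarily how the patch itself is implemented.

```javascript
// Hypothetical helper: converts the hex digits of a ToUnicode destination
// (e.g. "0041") into a string, treating the value as UTF-16BE.
function hexToUnicode(hex) {
  // Left-pad to a multiple of four digits, so that "41" is read as "0041"
  // (U+0041) rather than as the high byte of a code unit (U+4100).
  const padded = hex.padStart(Math.ceil(hex.length / 4) * 4, "0");
  let str = "";
  for (let i = 0; i < padded.length; i += 4) {
    str += String.fromCharCode(parseInt(padded.slice(i, i + 4), 16));
  }
  return str;
}

console.log(hexToUnicode("41") === hexToUnicode("0041"));
```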
Tim van der Meij
ccb141e211
Merge pull request #18393 from Snuffleupagus/mustBeViewedWhenEditing-params
Check the relevant parameters inside of the `mustBeViewedWhenEditing` method
2024-07-05 15:33:45 +02:00
Jonas Jenwald
38528d1116 Remove the renderForms parameter from the Annotation getOperatorList methods
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
2024-07-05 12:25:18 +02:00
Jonas Jenwald
5f744904ac Check the relevant parameters inside of the mustBeViewedWhenEditing method
Similar to the `mustBeViewed` method, we can check the relevant parameters within the `mustBeViewedWhenEditing` method itself since that (in my opinion) slightly helps readability of the code in the `src/core/document.js` file.
2024-07-05 11:38:55 +02:00
Jonas Jenwald
a4ffc1066c Move the internal API/Worker isEditing-state into RenderingIntentFlag
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
2024-07-04 23:34:30 +02:00
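The general pattern, folding a boolean into an existing bit-flag value instead of passing it separately, might look like the sketch below. The `IntentFlag` names and values are invented for illustration and are not the real `RenderingIntentFlag` definitions.

```javascript
// Illustrative flags only; the names and values are assumptions.
const IntentFlag = Object.freeze({
  DISPLAY: 0x01,
  PRINT: 0x02,
  IS_EDITING: 0x04,
});

// Fold the editing state into the intent, rather than passing a boolean around.
function getRenderingIntent(base, isEditing) {
  return isEditing ? base | IntentFlag.IS_EDITING : base;
}

// Consumers can recover the state from the single intent value.
function isEditingIntent(intent) {
  return (intent & IntentFlag.IS_EDITING) !== 0;
}
```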
Calixte Denizet
832fc93aa4 Use vertical variant of a char when it's in a missing vertical font (bug 1905623) 2024-07-03 09:46:54 +02:00
Calixte Denizet
64635f3b35 [api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
 - if the annotation has to be composed with the page then the canvas must be correctly
   composed with its parent. That means we should move the canvas under canvasWrapper
   and we should extract composing info from the drawing instructions...
   Currently it's the case with highlight annotations.
 - we use some extra memory for those canvases even if the user will never edit them, which
   is the case, for example, when opening a PDF in Fenix.

So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
2024-07-02 14:11:40 +02:00
Calixte Denizet
42bb2b0737 Fix the computation of unitsPerEm when the fontMatrix has some negative coefficients
It's a follow-up of #18253.
2024-06-24 16:40:07 +02:00
calixteman
a081dd25eb
Merge pull request #18306 from calixteman/bug1903731
Always use DW if it's a number for the font default width (bug 1903731)
2024-06-20 16:52:29 +02:00
Calixte Denizet
8c9a665728 Always use DW if it's a number for the font default width (bug 1903731) 2024-06-20 15:33:34 +02:00
Jonas Jenwald
3fae2d71f6 Don't throw if there's not enough data to get the header in FlateStream (issue 18298)
Following in the footsteps of PR 17340.
2024-06-20 13:03:17 +02:00
Jonas Jenwald
604e8977e9 Add a helper function for handling locally cached image data (PR 18269 follow-up)
This avoids having to duplicate the same exact code multiple times.
2024-06-18 17:20:40 +02:00
Jonas Jenwald
22ca7d52d3 Ensure that dependencies are added to the operatorList for locally cached images (issue 18259) 2024-06-18 12:25:53 +02:00
Calixte Denizet
d1452206d9 Compute correctly the unitsPerEm value from the fontMatrix when converting a font (bug 1539074) 2024-06-15 17:51:34 +02:00
Calixte Denizet
ff6180a4c9 Add an option to enable/disable hardware acceleration (bug 1902012) 2024-06-12 18:41:07 +02:00
Calixte Denizet
e3faa40f0f Don't display annotations with a PMD (barcode stuff) entry (bug 1899804)
There's no specification for that (even if it's possible to have an idea from
the XFA specs) so we just want to hide them in order to avoid displaying something
wrong.
2024-06-10 21:01:37 +02:00
Calixte Denizet
196affd8e0 Fix decoding of JPX images having an alpha channel
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
2024-06-03 20:08:11 +02:00
Calixte Denizet
9654ad570a Decompress images using DecompressionStream when possible
Getting images is already asynchronous, so we can use this opportunity
to use DecompressionStream (which is async too) to decompress images.
2024-06-02 14:00:05 +02:00
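For reference, asynchronous decompression with the standard `DecompressionStream` API can be sketched as below. The `decompress` helper and its `null` fallback are assumptions made for this example, and it further assumes an environment where `Blob`, `Response` and `DecompressionStream` are available as globals.

```javascript
// Hypothetical helper: inflates zlib-wrapped ("deflate") data asynchronously.
// Returns null when DecompressionStream is unavailable, so that a caller
// could fall back to a synchronous decoder.
async function decompress(bytes) {
  if (typeof DecompressionStream === "undefined") {
    return null;
  }
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("deflate"));
  // Collect the decompressed chunks into a single Uint8Array.
  return new Uint8Array(await new Response(stream).arrayBuffer());
}
```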
Calixte Denizet
6fa98ac99f [api-minor] Simplify how the list of points are structured
Instead of sending to the main thread an array of Objects for a list of points (or quadpoints),
we'll send just a basic float buffer.
It should slightly improve performance (especially when cloning the data) and use slightly less memory.
2024-05-30 15:36:15 +02:00
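To illustrate the difference between the two shapes, the sketch below converts an array of `{ x, y }` Objects into a flat `Float32Array` and reads a point back. The helper names and the `{ x, y }` shape are assumptions for this example, not the actual PDF.js data structures.

```javascript
// Hypothetical conversion from an "array of Objects" to a flat float buffer.
function flattenPoints(points) {
  const buffer = new Float32Array(points.length * 2);
  points.forEach(({ x, y }, i) => {
    buffer[2 * i] = x;
    buffer[2 * i + 1] = y;
  });
  return buffer;
}

// Read the i-th point back out of the flat buffer.
function pointAt(buffer, i) {
  return { x: buffer[2 * i], y: buffer[2 * i + 1] };
}
```

A single typed array is one contiguous block of memory, whereas an array of Objects has to be walked and copied element by element when it is cloned.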
Jonas Jenwald
27436d52b2 Reduce indentation when parsing new annotations in getOperatorList
This code has, over the years, become more complex and less indentation generally helps readability.
2024-05-25 12:00:44 +02:00
Jonas Jenwald
ce52ce063e Change parsingType3Font to a getter (PR 14448 follow-up)
We can easily "compute" `parsingType3Font` from the `type3FontRefs`-value, and thus avoid having to separately track two related properties.
2024-05-25 10:46:12 +02:00
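The refactoring pattern, deriving one property from another instead of tracking both, can be sketched as follows. The `EvaluatorState` class is a hypothetical stand-in and not the actual class from the PDF.js source.

```javascript
// Hypothetical class; only the getter pattern is the point here.
class EvaluatorState {
  constructor() {
    // Set while a Type3 font is being parsed, `null` otherwise.
    this.type3FontRefs = null;
  }

  // Derived from `type3FontRefs`, so the two values can never disagree.
  get parsingType3Font() {
    return !!this.type3FontRefs;
  }
}

const state = new EvaluatorState();
state.type3FontRefs = new Set();
console.log(state.parsingType3Font);
```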
Jonas Jenwald
c349ac3a5d Skip the temporary variable when calling #findStreamLength (PR 18125 follow-up) 2024-05-25 10:38:32 +02:00
Jonas Jenwald
cfcb700ecc Prevent XRef errors from breaking font loading (bug 1898802)
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
2024-05-24 21:37:35 +02:00
Jonas Jenwald
3afa9bfc42 Improve /Page validation for linearized documents (issue 18138)
The referenced PDF document contains corrupt linearization-data, that doesn't point to the *first* page as intended.
2024-05-22 12:04:02 +02:00
Jonas Jenwald
57014d0d13 Support corrupt PDF documents that contain "endsteam" commands (issue 18122)
This patch also re-factors the findStreamLength-helper to avoid even more code duplication.
2024-05-21 13:38:17 +02:00
Jonas Jenwald
59637c1fa8
Merge pull request #18115 from Snuffleupagus/freeze-evaluatorOptions
Freeze `evaluatorOptions` in the src/core/pdf_manager.js file
2024-05-21 12:19:04 +02:00
Jonas Jenwald
440b4b6eeb Support charCodes larger than 32-bit in adjustMapping (issue 18117)
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src/core/{cff_font, type1_font}.js` files.
2024-05-20 12:13:55 +02:00
Jonas Jenwald
3cd6c6c0e6 Freeze evaluatorOptions in the src/core/pdf_manager.js file
Given that these options are passed from the API we don't want to accidentally modify them.
2024-05-18 15:16:12 +02:00
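A minimal sketch of the technique, using a hypothetical `createEvaluatorOptions` factory and an illustrative option name, shows how `Object.freeze` prevents later modification:

```javascript
// Hypothetical factory; the option name used below is illustrative only.
function createEvaluatorOptions(apiOptions) {
  // Copy first so that the caller's object is left untouched, then freeze
  // the copy so that later code cannot modify it by accident.
  return Object.freeze({ ...apiOptions });
}

const evaluatorOptions = createEvaluatorOptions({ ignoreErrors: true });
// `Reflect.set` returns false for a frozen object instead of throwing.
const wasChanged = Reflect.set(evaluatorOptions, "ignoreErrors", false);
console.log(wasChanged, evaluatorOptions.ignoreErrors);
```

Note that `Object.freeze` is shallow, so nested objects would need to be frozen separately.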
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
4aee67227e Remove the unused Font.prototype.spaceWidth getter (PR 13424 follow-up)
This getter became unused in PR 13424, well over two years ago, and apparently none of us noticed that.
2024-05-11 11:50:51 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays 2024-05-03 22:37:50 +02:00