595 Commits

Author SHA1 Message Date
Jonas Jenwald
16e86878d2 Add a PartialEvaluator helper for fetching CMap and Standard Font data
This avoids a little bit of code duplication, which cannot hurt.
2024-11-11 11:57:28 +01:00
Calixte Denizet
b649b6f8dd Use a BMP decoder when resizing an image
The image decoding won't block the main thread any more.
For now, it isn't enabled for Chrome because issue6741.pdf leads to a crash.
2024-10-28 14:09:52 +01:00
Jonas Jenwald
b048420d21 [api-minor] Remove the CMapCompressionType enumeration
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
2024-10-24 11:08:16 +02:00
Jonas Jenwald
50c291eb33 Unconditionally cache built-in CMaps on the worker-thread
Given that we've not shipped, nor used, anything except binary CMaps for years let's just cache them unconditionally (since that's a tiny bit less code).
2024-10-24 10:15:09 +02:00
Jonas Jenwald
236c8d862e Re-factor how we handle missing, corrupt, or empty font-file entries
This improves the fixes for e.g. issue 9462 and 18941 slightly and allows better fallback behaviour for non-standard fonts.
2024-10-22 17:07:12 +02:00
Jonas Jenwald
63b34114b1 Fallback to a standard font if a font-file entry doesn't contain a Stream (issue 18941)
The PDF document is clearly corrupt, since it has /FontFile2 entries that are Dictionaries which obviously isn't correct.
While there's obviously no guarantee that things will look perfect this way, actually rendering the text at all should be an improvement in general.
2024-10-22 11:51:28 +02:00
Calixte Denizet
e7ab8cd8c1 Fallback on gray colorspace when there are no colorspace and no name in the scn/SCN arguments
It fixes #18894.
2024-10-13 16:02:07 +02:00
Jonas Jenwald
67af371e58 Ignore non-existing /Shading resources during parsing (issue 18765) 2024-09-19 21:55:02 +02:00
Calixte Denizet
482994cc04 Use a transparent color when setting fill/stroke colors in a pattern context but with no colorspace 2024-07-22 09:56:10 +02:00
alexcat3
1c364422a6 Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
Add unit test to check compatability with such cmaps

In the PDF in issue 18099. the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD have probably read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning latin text in the PDF into chinese when it was copied
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this
2024-07-06 11:29:21 -04:00
Calixte Denizet
8c9a665728 Always use DW if it's a number for the font default width (bug 1903731) 2024-06-20 15:33:34 +02:00
Jonas Jenwald
604e8977e9 Add a helper function for handling locally cached image data (PR 18269 follow-up)
This avoids having to duplicate the same exact code multiple times.
2024-06-18 17:20:40 +02:00
Jonas Jenwald
22ca7d52d3 Ensure that dependencies are added to the operatorList for locally cached images (issue 18259) 2024-06-18 12:25:53 +02:00
Jonas Jenwald
ce52ce063e Change parsingType3Font to a getter (PR 14448 follow-up)
We can easily "compute" `parsingType3Font` from the `type3FontRefs`-value, and thus avoid having to separately track two related properties.
2024-05-25 10:46:12 +02:00
Jonas Jenwald
cfcb700ecc Prevent XRef errors from breaking font loading (bug 1898802)
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
2024-05-24 21:37:35 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays 2024-05-03 22:37:50 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This checks primarily Arrays, but also some other properties, that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Jonas Jenwald
6c05f8b381 Add even more validation of width-data (PR 18017 follow-up)
I missed this case in PR 18017, sorry about that.
2024-05-02 11:24:15 +02:00
Jonas Jenwald
d411a072a4 Add more validation of width-data
The current `PartialEvaluator.extractWidths` implementation only contains *partial* validation of the width-data.
2024-04-29 10:51:16 +02:00
Jonas Jenwald
08eb0566f7 Validate additional font-dictionary properties 2024-04-29 08:21:28 +02:00
Calixte Denizet
551e63901c Simplify the way to pass the glyph drawing instructions from the worker to the main thread
and remove the use of eval in the font loader.
2024-04-27 21:28:31 +02:00
Jonas Jenwald
91898e5923 Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
2024-04-20 11:10:09 +02:00
Calixte Denizet
52ea2333b3 Remove the tag for missing font subset when trying to find a substitution
Fixes #17929.
2024-04-11 20:34:28 +02:00
Tim van der Meij
2e5282928f
Merge pull request #17854 from Snuffleupagus/rm-PromiseCapability
[api-minor] Replace the `PromiseCapability` with  `Promise.withResolvers()`
2024-04-02 15:21:43 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
Jonas Jenwald
07a8836ab2 Ensure that Mesh /Shadings have non-zero width/height (issue 17848) 2024-03-29 22:58:25 +01:00
Calixte Denizet
9c3471dd01 Don't render corrupted inlined images
Fixes #17794.
2024-03-15 15:33:18 +01:00
Calixte Denizet
a6eadf8150 Avoid to access to a missing cidSystemInfo property
Fixes #17689.
2024-02-19 09:55:23 +01:00
Jonas Jenwald
a7bcc81eb1 Add a dummy beginMarkedContentProps operator when optional content parsing fails (issue 17679) 2024-02-17 13:45:16 +01:00
Jonas Jenwald
363dce6744 Use a limit, in more places, when splitting strings
This should be a *tiny* bit more efficient, since it avoids parsing substrings that we don't care about.

*Please note:* I cannot find an ESLint rule to enforce this automatically.
2024-02-02 13:10:52 +01:00
Calixte Denizet
7f2428a77e Reduce memory use and improve perfs when computing the bounding box of a bezier curve (bug 1875547)
It isn't really a fix for the mentioned bug but it slightly improve things.
In reducing the memory use, the time spent in the GC is reduced either.
The algorithm to compute the bounding box is the same as before but it has just
been rewritten to be more efficient.
2024-01-24 23:41:14 +01:00
Jonas Jenwald
fa583427ef Always export the "raw" /ToUnicode-data from PartialEvaluator.preEvaluateFont (PR 13354 follow-up)
This, ever so slightly, simplifies the implementation in the `PartialEvaluator.extractDataStructures`-method.
2024-01-22 13:06:32 +01:00
Jonas Jenwald
f21a30dfb4 Convert the PartialEvaluator.readToUnicode method to be async 2024-01-22 12:47:06 +01:00
Jonas Jenwald
f5c01188dc Convert the PartialEvaluator.extractDataStructures method to be async 2024-01-22 12:47:06 +01:00
Jonas Jenwald
cf0797dfbd Use await consistently in the PartialEvaluator.setGState method 2024-01-22 12:47:06 +01:00
Jonas Jenwald
1cc83c4fdc Use await consistently in the PartialEvaluator.buildFormXObject method 2024-01-22 12:47:06 +01:00
Tim van der Meij
49b2d9b5af
Merge pull request #17556 from Snuffleupagus/issue-17554
Ensure that `EvaluatorPreprocessor.opMap` has a null-prototype (issue 17554)
2024-01-21 20:58:09 +01:00
Jonas Jenwald
d7e41d4cb6 Ensure that EvaluatorPreprocessor.opMap has a null-prototype (issue 17554)
This accidentally regressed in PR 16956, sorry about that!
2024-01-21 19:59:13 +01:00
Jonas Jenwald
3c2c0ecd88 Use the ESLint arrow-body-style rule in more spots in src/core/evaluator.js 2024-01-21 17:42:33 +01:00
Jonas Jenwald
d1bef8cb86 Use await consistently in the PartialEvaluator.translateFont method 2024-01-21 17:36:50 +01:00
Jonas Jenwald
fc62eec901 Convert the handleSetFont methods, in src/core/evaluator.js, to be async 2024-01-21 17:32:05 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Calixte Denizet
405f573d70 Take into account empty lines when extracting text content from the appearance
Fixes #17492.
2024-01-14 20:23:29 +01:00
Calixte Denizet
7839e7b495 Preserve the whitespaces when getting text from FreeText annotations (bug 1871353)
When the text of an annotation is extracted in using getTextContent, consecutive white spaces
are just replaced by one space and. So this patch add an option to make sure that white
spaces are preserved when appearance is parsed.
For the case where there's no appearance, we can have a fast path to get the correct string
from the Content entry.
When an existing FreeText is edited, space (0x20) are replaced by non-breakable (0xa0) ones
to make to see all of them on screen.
2024-01-05 10:20:32 +01:00
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
e547b198a3 Compute the length of the final image-bitmap/data on the worker-thread
Currently this is done in the API, but moving it into the worker-thread will simplify upcoming changes.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
709d89420e Re-factor how the GenericL10n class fetches localization-data
- Re-factor the existing `fetchData` helper function such that it can fetch more types of data, and it now supports "arraybuffer", "json", and "text".
   This only needed minor adjustments in the `DOMCMapReaderFactory` and `DOMStandardFontDataFactory` classes.[1]

 - Expose the `fetchData` helper function in the API, such that the viewer is able to access it.

 - Use the `fetchData` helper function in the `GenericL10n` class, since this should allow fetching of localization-data even if the default viewer is run in an environment without support for the Fetch API.

---
[1] While testing this I also noticed a minor inconsistency when handling standard font-data on the worker-thread.
2023-11-14 13:45:14 +01:00
Calixte Denizet
7851c0da8d [Debugger] Add some info about substitution font
When pdfBug is true, the substitution font is used in the text layer in order
to be able to know what is the font really used thanks to the devtools.
And to be sure that fonts are loaded, the font cache isn't cleaned up when
the debugger is active.
2023-10-09 12:06:33 +02:00