- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent
- Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await the cached Promise without the requests being pipelined one after another (see the sketch after this list)
- This yields a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top-level /Pages collection, and those pages are spread throughout the file, so every candidate needs to be fetched separately
- PDFs with many pages, where each page is a big image and all the pages are at the top level, are quite a common output of digitisation programmes
- I would have liked to do something like "if it's the top-level collection and the page count equals the number of kids, then just fetch that page without traversing the tree", but unfortunately I agree with the comments on #8088 that there is no good general solution for allowing /Pages nodes with empty /Kids arrays
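A minimal sketch of the prefetching idea, assuming `xref`, `pagesDict` and the `Ref`/`RefSet` primitives from the PDF.js source are in scope; the loop itself is illustrative rather than the actual patch:

```js
// Kick off the fetch for every top-level page-dict candidate right away,
// so the underlying XHRs go out in parallel instead of one per candidate.
const visitedRefs = new RefSet();

for (const kidRef of pagesDict.get("Kids")) {
  if (!(kidRef instanceof Ref) || visitedRefs.has(kidRef)) {
    continue;
  }
  visitedRefs.put(kidRef);
  // Start the request now; since the resulting Promise is cached, awaiting
  // it later (when this candidate is examined) won't trigger a new XHR.
  xref.fetchAsync(kidRef).catch(() => {
    // Errors are handled where the candidate is actually consumed.
  });
}
```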
Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font.
Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see b47c7eca83/src/core/fonts.js (L1186-L1190)
We have a number of base-classes that are only intended to be extended, but never to be used directly. To help enforce this during development, these base-class constructors check for direct usage; however, that code is obviously not needed in the actual builds.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `~2.7` kilobytes, which isn't a lot but still cannot hurt.
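The pattern, in sketch form, using `BaseCanvasFactory` as an example and the `unreachable` helper from `src/shared/util.js` (the exact `PDFJSDev` define-test may differ from the real code):

```js
import { unreachable } from "../shared/util.js";

class BaseCanvasFactory {
  constructor() {
    // Development/testing-only guard against direct instantiation; in the
    // actual builds the entire `PDFJSDev` branch is stripped out.
    if (
      (typeof PDFJSDev === "undefined" || PDFJSDev.test("TESTING")) &&
      this.constructor === BaseCanvasFactory
    ) {
      unreachable("Cannot initialize BaseCanvasFactory.");
    }
  }
}
```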
Added annotations can have some quadpoints (highlight, ink).
The `isNumberArray` check was returning false and consequently the annotation wasn't printable.
The tests didn't catch this issue because the quadpoints were passed as an Array, so driver.js has been updated to pass them as a Float32Array instead, matching the real-life situation.
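In sketch form, the kind of check involved (this is illustrative, not the actual `isNumberArray` helper):

```js
// Accept both plain Arrays and typed arrays such as Float32Array;
// typed arrays can only ever hold numbers, so checking the length suffices.
function isNumberArray(arr, len) {
  if (ArrayBuffer.isView(arr)) {
    return len === null || arr.length === len;
  }
  return (
    Array.isArray(arr) &&
    (len === null || arr.length === len) &&
    arr.every(x => typeof x === "number")
  );
}

// Quadpoints as they arrive in real documents:
isNumberArray(new Float32Array([0, 0, 1, 0, 0, 1, 1, 1]), 8); // true
```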
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
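A rough sketch of the lenient validation, with illustrative names (the actual worker/viewer validators are more involved, and the destination name is assumed to be a `Name`-like object with a `name` property):

```js
// Validate an /XYZ destination [pageRef, /XYZ, left, top, zoom], treating
// each trailing parameter as a number, null, or omitted entirely, to cope
// with generators that drop e.g. the zoom or a coordinate value.
function isValidXYZDest(dest) {
  if (!Array.isArray(dest) || dest.length < 2 || dest.length > 5) {
    return false;
  }
  const [, name, ...params] = dest; // params: [left, top, zoom], may be short.
  if (name?.name !== "XYZ") {
    return false;
  }
  return params.every(p => typeof p === "number" || p === null);
}
```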
Add a unit test to check compatibility with such cmaps
In the PDF in issue 18099, the toUnicode cmap had a line mapping the glyph char codes from 00 to 7F to the corresponding code points. The syntax for mapping a range of char codes to a range of Unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the Unicode code points are supposed to be given in UTF-16 BE, the PDF's line should probably have read
<00> <7F> <0000>
Instead, it omitted two leading zeros from the UTF-16 value, like this:
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does, and other PDF readers read it correctly, PDF.js should probably support this.
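Conceptually the fix amounts to left-padding a too-short destination value, as in this sketch (hex strings are used for readability; the real cmap parser works on raw bytes):

```js
// Pad a bfrange destination that is narrower than the expected UTF-16 BE
// width, so that <00> <7F> <00> behaves like <00> <7F> <0000>.
function padDestHex(hex, byteWidth = 2) {
  return hex.padStart(2 * byteWidth, "0");
}

padDestHex("00"); // "0000" -> U+0000, instead of being read as a high byte
padDestHex("41"); // "0041" -> U+0041 ("A")
```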
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
Similar to the `mustBeViewed` method, we can check the relevant parameters within the `mustBeViewedWhenEditing` method itself since that (in my opinion) slightly helps readability of the code in the `src/core/document.js` file.
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
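In sketch form, since `RenderingIntentFlag` is a bit-field, the editing state can be folded into the intent rather than threaded through as a separate boolean (the `IS_EDITING` flag and its value are shown here as assumptions, for illustration only):

```js
// A cut-down stand-in for the internal RenderingIntentFlag bit-field.
const RenderingIntentFlag = {
  ANY: 0x01,
  DISPLAY: 0x02,
  PRINT: 0x04,
  IS_EDITING: 0x08, // assumed value, for illustration only
};

const intent = RenderingIntentFlag.DISPLAY | RenderingIntentFlag.IS_EDITING;
const isEditing = !!(intent & RenderingIntentFlag.IS_EDITING); // true
```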
Right now, editable annotations use their own canvas when they're drawn, but
this induces several issues:
- if the annotation has to be composited with the page, then the canvas must be correctly
composited with its parent. That means we would have to move the canvas under the canvasWrapper
and extract compositing info from the drawing instructions...
This is currently the case with highlight annotations.
- we use some extra memory for those canvases even if the user never edits them, which
is the case, for example, when opening a PDF in Fenix.
So with this patch, all the editable annotations are drawn on the page canvas. When the
user switches to editing mode, the pages with editable annotations are redrawn, but
without them: they're replaced by their counterparts in the annotation editor layer (a rough sketch of this follows below).
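Purely illustrative pseudologic, with hypothetical method names, for the mode switch:

```js
// When entering editing mode, redraw any page that contains editable
// annotations, skipping those annotations on the page canvas; the
// annotation editor layer renders their editable counterparts instead.
function onEditingModeChanged(pageViews, isEditing) {
  for (const pageView of pageViews) {
    if (pageView.hasEditableAnnotations()) {
      pageView.redraw({ skipEditableAnnotations: isEditing });
    }
  }
}
```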
There's no specification for that (even if it's possible to get an idea from
the XFA specs), so we just want to hide them in order to avoid displaying something
wrong.
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec), so when we
don't have it the JPX decoder will handle the conversion to RGBA
format.
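In sketch form, with a hypothetical helper (`Dict.get`/`Dict.has` are as in the PDF.js primitives):

```js
// A non-zero /SMaskInData means the JPX codestream itself carries alpha,
// and /ColorSpace is optional for JPXDecode images; when it's absent the
// JPX decoder is left to handle the conversion, producing RGBA output.
function jpxProducesRGBA(imageDict) {
  const smaskInData = imageDict.get("SMaskInData") || 0;
  return smaskInData !== 0 && !imageDict.has("ColorSpace");
}
```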
Instead of sending an array of Objects for a list of points (or quadpoints) to the main thread,
we'll send just a basic float buffer.
It should slightly improve performance (especially when cloning the data) and use slightly less memory.
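A minimal sketch of the flattening:

```js
// Pack a list of points into a flat Float32Array rather than an array of
// {x, y} Objects, so that structured cloning to the main thread is cheap
// (and the underlying ArrayBuffer could even be transferred outright).
function flattenPoints(points) {
  const buffer = new Float32Array(points.length * 2);
  for (let i = 0; i < points.length; i++) {
    buffer[2 * i] = points[i].x;
    buffer[2 * i + 1] = points[i].y;
  }
  return buffer;
}

// e.g. postMessage({ quadPoints: flattenPoints(points) }, [/* transfers */]);
```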
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src/core/{cff_font, type1_font}.js` files.
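The error-handling shape, sketched as a hypothetical helper (`warn` and `Dict` as in the PDF.js source; the real `loadFont` does considerably more):

```js
import { Dict } from "./primitives.js";
import { warn } from "../shared/util.js";

// Resolving a font Reference may reject in a corrupt document; catching
// that, instead of letting it propagate, lets the existing fallback-font
// handling kick in.
async function loadFontDict(xref, fontRef) {
  try {
    const fontDict = await xref.fetchAsync(fontRef);
    if (fontDict instanceof Dict) {
      return fontDict;
    }
  } catch (ex) {
    warn(`loadFont - lookup failed: "${ex}".`);
  }
  return null; // The caller invokes the fallback-font handling.
}
```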
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
- These changes will allow a simpler way of implementing PR 17770.
- The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).
- This makes the /Lang attribute *directly available* in the `textLayer` (see the sketch below), which has the following advantages:
- We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).
- Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
*Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
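In sketch form, assuming the /Lang value is surfaced on the `getTextContent` result (the `lang` property name and the `textLayerDiv` element are assumptions here):

```js
const textContent = await page.getTextContent();
// With /Lang exposed alongside the text-content items, the textLayer can
// set the language directly, without the viewer fetching it separately.
if (textContent.lang) {
  textLayerDiv.setAttribute("lang", textContent.lang); // e.g. "en-US"
}
```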