84 Commits

Author SHA1 Message Date
Jonas Jenwald
d428db63c3 Improve the "FontFallback" handling on the worker-thread
Remove the `Catalog.prototype.fontFallback` method, and move its code into `PDFDocument.prototype.fontFallback` instead, to reduce the indirection a little bit.
Pass the `evaluatorOptions` directly to the `TranslatedFont.prototype.fallback` method, since nothing else in the `TranslatedFont`-class needs it now.
2025-02-24 09:34:58 +01:00
Jonas Jenwald
33cba30bdb Search for destinations in both /Names and /Dests dictionaries (issue 19474)
Currently we only use either one of them, preferring the NameTree when it's available.
2025-02-14 15:49:05 +01:00
Jonas Jenwald
74c1795c9f Use Dict iteration more (PR 19051 follow-up)
There's a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually look-up the values, which after PR 19051 can be replaced with direct iteration instead.
2025-01-02 15:09:19 +01:00
Jonas Jenwald
3a797317c8 Shorten some if-statements in the src/core/ folder
Thanks to modern JavaScript there's some existing code that can be shortened slightly.
2024-12-26 18:37:25 +01:00
Jonas Jenwald
2c0cc48d1b Replace the forEach method in Dict with "proper" iteration support 2024-11-17 12:45:32 +01:00
Jonas Jenwald
805f962181 Reduce duplication when collecting optional content groups
After PR 18825 we can easily "compute" the optional content groups, and can thus avoid tracking them manually.
2024-10-15 13:20:30 +02:00
Alexander Grahn
441efe456e Optional Content (OC) radiobutton (RB) groups implemented. Resolves #18823.
The code parses the /RBGroups entry in the OC configuration dict and adds the property `rbGroups' to instances of the OptionalContentGroup class. rbGroups takes an array of Sets, where each Set instance represents an RB group the OptionalContentGroup instance is a member of. Such a Set instance contains all OCG ids within the corresponding RB group. RB groups an OCG is associated with are processed when its visibility is set to true, as required by the PDF spec.
2024-10-15 11:34:45 +02:00
Richard Smith (smir)
a67b9aec6c Send fetch requests for all page dict lookups in parallel
- When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent
 - Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad
- We can then await on the cached Promise without making the requests pipeline
- This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately
 - PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes
- I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays
2024-08-21 11:08:14 +01:00
Jonas Jenwald
d24a61c648 Allow /XYZ destinations without zoom parameter (issue 18408)
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-18 13:29:32 +02:00
Jonas Jenwald
403d023617 Allow e.g. /FitH destinations without additional parameter (bug 1907000)
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-11 10:36:44 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This checks primarily Arrays, but also some other properties, that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Jonas Jenwald
2b69fb76ac [api-minor] Improve the FileSpec implementation
- Check that the `filename` is actually a string, before parsing it further.
 - Use proper "shadowing" in the `filename` getter.
 - Add a bit more validation of the data in `pickPlatformItem`.
 - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
2024-05-01 18:02:05 +02:00
Jonas Jenwald
7206d0a237 Validate explicit destinations on the worker-thread to prevent DataCloneError (issue 17981)
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread has access to various primitives.
2024-04-22 22:51:35 +02:00
Calixte Denizet
136c1faa7f Display outlines even if one has no title
Fixes #17856.
2024-03-29 21:30:24 +01:00
Jonas Jenwald
0d039937f9 Add better support for /Launch actions with /FileSpec dictionaries (issue 17846) 2024-03-26 20:15:48 +01:00
Jonas Jenwald
3c78ff5fb0 [api-minor] Implement basic support for OptionalContent Usage dicts (issue 5764, bug 1826783)
The following are some highlights of this patch:
 - In the Worker we only extract a *subset* of the potential contents of the `Usage` dictionary, to avoid having to implement/test a bunch of code that'd be completely unused in the viewer.

 - In order to still allow the user to *manually* override the default visible layers in the viewer, the viewable/printable state is purposely *not* enforced during initialization in the `OptionalContentConfig` constructor.

 - Printing will now always use the *default* visible layers, rather than using the same state as the viewer (as was the case previously).
   This ensures that the printing-output will correctly take the `Usage` dictionary into account, and in practice toggling of visible layers rarely seem to be necessary except in the viewer itself (if at all).[1]

---
[1] In the unlikely case that it'd ever be deemed necessary to support fine-grained control of optional content visibility during printing, some new (additional) UI would likely be needed to support that case.
2024-03-12 13:18:15 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Calixte Denizet
0c38c6e103 Improve performance of optional content parsing 2023-10-25 17:50:53 +02:00
Jonas Jenwald
bf9c33e60f Add support for "GoToE" actions with destinations (issue 17056)
This shouldn't be very common in practice, since "GoToE" actions themselves seem quite uncommon; see PR 15537.
2023-10-04 11:14:23 +02:00
Calixte Denizet
a8573d4e1b [Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087)
When there is no tree, the tags for the new annotions are just put under the root element.
When there is a tree, we insert the new tags at the right place in using the value
of structTreeParentId (added in PR #16916).
2023-09-16 18:34:58 +02:00
Jonas Jenwald
b5b061cdb6 Slightly re-factor the parameter handling in Catalog.parseDestDictionary
While it makes sense to check that the `destDict` parameter is indeed a Dictionary, since that data comes from the PDF document itself, the `resultObj` parameter is an internal PDF.js implementation detail that should always be correct (or tests will fail).
2023-09-08 13:27:31 +02:00
Jonas Jenwald
df9cce39c0 Slightly reduce asynchronicity when parsing Annotations
Over time the amount of "document level" data potentially needed during parsing of Annotations have increased a fair bit, which means that we currently need to ensure that a bunch of data is available for each individual Annotation.
Given that this data is "constant" for a PDF document we can instead create (and cache) it lazily, only when needed, *before* starting to parse the Annotations on a page. This way the parsing of individual Annotations should become slightly less asynchronous, which really cannot hurt.

An additional benefit of these changes is that we can reduce the number of parameters that need to be explicitly passed around in the annotation-code, which helps overall readability in my opinion.

One potential drawback of these changes is that the `AnnotationFactory.create` method no longer handles "everything" on its own, however given how few call-sites there are I don't think that's too much of a problem.
2023-09-08 13:27:27 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Jonas Jenwald
1b4a7c5965 Introduce more optional chaining in the src/core/ folder
After PR 12563 we're now free to use optional chaining in the worker-thread as well. (This patch also fixes one previously "missed" case in the `web/` folder.)

For the MOZCENTRAL build-target this patch reduces the total bundle-size by `1.6` kilobytes.
2023-05-15 12:38:28 +02:00
Calixte Denizet
cfb908c999 Add a cache to avoid to load several times a local font
On my computer, it takes few tenths of a second to load a local font.
Since a font can be used several times in a document, the cache will
improve performances.
2023-05-10 20:01:21 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
5f64621d46 Use String.prototype.replaceAll() where appropriate
This fairly new method allows replacing *multiple* occurrences within a string without having to use regular expressions.

Please refer to:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll#browser_compatibility
2023-03-22 15:31:10 +01:00
Jonas Jenwald
23930a249e [api-minor] Let Catalog.getAllPageDicts return an *empty* dictionary when loading the first /Page fails (issue 15590)
In order to support opening certain corrupt PDF documents, particularly hand-edited ones, this patch adds support for letting the `Catalog.getAllPageDicts` method fallback to returning an *empty* dictionary to replace (only) the first /Page of the document.
Given that the viewer cannot initialize/load without access to the first page, this will thus allow e.g. document-level scripting to run as expected. Note that by effectively replacing a corrupt or missing first /Page in this way[1], we'll now render nothing but a *blank* page for certain cases of broken/corrupt PDF documents which may look weird.

*Please note:* This functionality is controlled via the existing `stopAtErrors` option, that can be passed to `getDocument`, since it's easy to imagine use-cases where this sort of fallback behaviour isn't desirable.

---
[1] Currently we still require that a /Pages-dictionary is found though, however it *may* be possible to relax even that assumption if that becomes absolutely necessary in future corrupt documents.
2022-11-03 12:51:48 +01:00
Jonas Jenwald
d470010293 Re-factor the PDF version parsing in the worker-thread
Part of this is very old code, and back when support for parsing the catalog-version was added things became less clear (in my opinion).
Hence this patch tries to improve things, by e.g. validating the header- and catalog-version separately.
2022-10-15 12:06:39 +02:00
Jonas Jenwald
ce66fefbff [api-minor] Add partial support for the "GoToE" action (issue 8844)
*Please note:* The referenced issue is the only mention that I can find, in either GitHub or Bugzilla, of "GoToE" actions.
Hence why I've purposely settled for a very simple, and partial, "GoToE" implementation to avoid complicating things initially.[1] In particular, this patch only supports "GoToE" actions that references the /EmbeddedFiles-dict in the PDF document.

See https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2048909

---
[1] Usually I always prefer having *real-world* test-cases to work with, whenever I'm implementing new features.
2022-10-06 10:33:07 +02:00
Jonas Jenwald
60f6272ed9 Use more for...of loops in the code-base
Most, if not all, of this code is old enough to predate the general availability of `for...of` iteration.
2022-10-03 13:08:38 +02:00
Jonas Jenwald
cc4baa2fe9 [api-minor] Add basic support for the SetOCGState action (issue 15372)
Note that this patch implements the `SetOCGState`-handling in `PDFLinkService`, rather than as a new method in `OptionalContentConfig`[1], since this action is nothing but a series of `setVisibility`-calls and that it seems quite uncommon in real-world PDF documents.

The new functionality also required some tweaks in the `PDFLayerViewer`, to ensure that the `layersView` in the sidebar is updated correctly when the optional-content visibility changes from "outside" of `PDFLayerViewer`.

---
[1] We can obviously move this code into `OptionalContentConfig` instead, if deemed necessary, but for an initial implementation I figured that doing it this way might be acceptable.
2022-09-01 17:34:24 +02:00
Jonas Jenwald
216b86a082 [api-minor] Support Named-actions in the outline (issue 15367)
Apparently this is implemented in e.g. Adobe Reader, and the specification does support it, however it cannot be commonly used in real-world PDF documents since it took over ten years for this feature to be requested.
2022-08-30 18:47:45 +02:00
Calixte Denizet
5f0c95e70e [JS] Embedded JS scripts can have some null chars 2022-07-15 16:05:25 +02:00
Jonas Jenwald
9ac4536693 Enable the unicorn/prefer-at ESLint plugin rule (PR 15008 follow-up)
Please find additional information here:
 - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/at
 - https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/docs/rules/prefer-at.md
2022-06-09 21:21:19 +02:00
Jonas Jenwald
df5a4fd0a7 Support encoded dest-strings in /GoTo destination dictionaries (issue 14864)
Interestingly enough this appears to be the very first case of *encoded* dest-strings, in /GoTo destination dictionaries, that we've actually come across. What's really fascinating is that it's less than a week after issue 14847, given that these issues are *somewhat* similar.
2022-05-02 10:14:32 +02:00
Jonas Jenwald
71370d012b Support destinations in NameTrees with encoded keys (issue 14847)
Initially I considered updating the `NameOrNumberTree`-implementation to handle encoded keys, however that quickly became somewhat messy (especially in the `NameOrNumberTree.get`-method) since only NameTrees using string-keys.
Hence the easiest solution, as far as I'm concerned, was thus to just update the `Catalog.destinations`-getter instead. Please note that in the referenced PDF document the `Catalog.destination`-method will thus fallback to fetch all destinations, which should be fine since this is the very first case of encoded keys that we've seen.

Also changes the `NameOrNumberTree.getAll`-method to prevent a possible run-time error, although we've so far not seen such a case, for any non-Array Kids-entries found in a NameTree/NumberTree.

Finally, to improve overall consistency and to hopefully prevent future bugs, the patch also updates a couple of other `NameTree` call-sites to correctly handle encoded keys. (Note that the `Catalog.attachments`-getter was already doing this.)
2022-04-27 11:19:55 +02:00
Jonas Jenwald
5bc7339c1b Add support for the /Catalog Base-URI when resolving URLs (issue 14802)
As far as I can tell, this is actually the very first time that we've seen a PDF document with a Base-URI specified in the /Catalog; please refer to the specification:
https://web.archive.org/web/20220309040754if_/https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf#G11.2097122

To simplify the overall implementation, this new parameter is accessed via the existing `BasePdfManager.docBaseUrl`-getter and will thus override any user-specified `docBaseUrl` API-parameter.
2022-04-19 17:14:52 +02:00
Jonas Jenwald
a919959d83 Slightly simplify the Catalog._readMarkInfo method
We don't need to first check if the Dictionary contains the key, since trying to get a non-existent key simply returns `undefined` and we're already ensuring that the value is a boolean.
Furthermore, we shouldn't need to worry about the `Object.prototype` containing enumerable properties since the checks (in `src/core/worker.js`) done for `Array.prototype` *indirectly* also cover `Object`s. (Keep in mind that an `Array` is just a special kind of `Object` in JavaScript.)
2022-04-05 16:37:51 +02:00
Jonas Jenwald
addb4cb12b Use String.prototype.repeat() in a couple of spots
Rather than using a temporary Array to manually create repeated strings, we can use `String.prototype.repeat()` instead.
The reason that we didn't use this from the start is most likely because some browsers, notably IE, didn't support this; note https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat#browser_compatibility
2022-03-30 15:42:40 +02:00
Jonas Jenwald
c0736647f9 Add general iteration support in the RefSet and RefSetCache classes
This patch removes the existing `forEach` methods, in favor of making the classes properly iterable instead. Given that the classes are using a `Set` respectively a `Map` internally, implementing this is very easy/efficient and allows us to simplify some existing code.
2022-03-18 14:27:34 +01:00
Jonas Jenwald
939e6f0c4c Fix a couple of small typos in JSDoc typedef comments
While this doesn't affect the official API documentation, these cases should nonetheless be fixed.
2022-03-04 12:11:52 +01:00
Jonas Jenwald
99cd24ce3e Remove the isString helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isString`-calls.
2022-02-26 16:33:41 +01:00
Jonas Jenwald
3704283f5b Remove the isBool helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls.
2022-02-23 13:31:03 +01:00
Jonas Jenwald
82f1ee1755 Re-factor the Catalog.viewerPreferences method
This removes the `ViewerPreferencesValidators` structure, and thus (slightly) simplifies the code overall. With these changes we only have to iterate through, and validate, the actually available Dictionary entries.
2022-02-23 13:25:56 +01:00
Jonas Jenwald
05edd91bdb Remove the isNum helper function
The call-sites are replaced by direct `typeof`-checks instead, which removes unnecessary function calls. Note that in the `src/`-folder we already had more `typeof`-cases than `isNum`-calls.

These changes were *mostly* done using regular expression search-and-replace, with two exceptions:
 - In `Font._charToGlyph` we no longer unconditionally update the `width`, since that seems completely unnecessary.
 - In `PDFDocument.documentInfo`, when parsing custom entries, we now do the `typeof`-check once.
2022-02-22 11:55:34 +01:00
Jonas Jenwald
b282814e38 Prefer instanceof Name rather than calling isName() with one argument
Unless you actually need to check that something is both a `Name` and also of the *correct* type, using `instanceof Name` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isName` helper function for where it makes sense.
2022-02-21 12:45:00 +01:00
Jonas Jenwald
4df82ad31e Prefer instanceof Dict rather than calling isDict() with one argument
Unless you actually need to check that something is both a `Dict` and also of the *correct* type, using `instanceof Dict` directly should be a tiny bit more efficient since it avoids one function call and an unnecessary `undefined` check.

This patch uses ESLint to enforce this, since we obviously still want to keep the `isDict` helper function for where it makes sense.
2022-02-21 12:44:56 +01:00
Jonas Jenwald
2cb2f633ac Remove the isRef helper function
This helper function is not really needed, since it's just a wrapper around a simple `instanceof` check, and it only adds unnecessary indirection in the code.
2022-02-19 15:33:42 +01:00