pdf.js

Author	SHA1	Message	Date
Pascal Maximilian Bremer	6d7157a875	Fix Typo:XFATemplate class Para Styling paddingight => paddingRight	2024-11-06 12:04:55 +01:00
Calixte Denizet	d59f9648a9	Simplify toRomanNumerals function	2024-11-05 22:35:35 +01:00
Jonas Jenwald	fdfcfbc351	Merge pull request #19005 from Snuffleupagus/core_utils-shorten Shorten a few helper functions in `src/core/core_utils.js`	2024-11-05 21:46:44 +01:00
Jonas Jenwald	e92a929a58	Try to improve handling of missing trailer dictionaries in `XRef.indexObjects` (issue 18986) The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary. Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document. In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that might be a valid /Root dictionary. There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than immediately throwing `InvalidPDFException` as we previously did here. Please note: I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.	2024-11-05 18:19:26 +01:00
Jonas Jenwald	2c90eee5a8	Shorten a few helper functions in `src/core/core_utils.js` In a few cases we can ever so slightly shorten the code without negatively impacting the readability.	2024-11-05 13:58:00 +01:00
Tim van der Meij	e930f3030c	Merge pull request #18992 from Snuffleupagus/getPdfManager-inline-flushChunks Inline the `flushChunks` helper function, used in `getPdfManager` on the worker-thread	2024-11-02 18:58:29 +01:00
Jonas Jenwald	e5485108ec	Merge pull request #18990 from Snuffleupagus/ensure-structTree-serializable Ensure that serializing of StructTree-data cannot fail during loading	2024-11-02 15:17:10 +01:00
Jonas Jenwald	2145a7b9ca	Use the `hexNumbers` structure in the `stringToUTF16HexString` helper We can re-use the `hexNumbers` structure here, since that allows us to directly lookup the hexadecimal values and shortens the code.	2024-11-02 15:00:32 +01:00
Jonas Jenwald	196f7d7df1	Inline the `flushChunks` helper function, used in `getPdfManager` on the worker-thread - This helper function has only a single call-site, and the function is fairly short. - It'll only be invoked if range requests are disabled, or if the entire PDF manages to load before the headers are resolved (which is very unlikely). Hence, by default, this helper function is not invoked. - By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code. Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).	2024-11-02 11:06:30 +01:00
Jonas Jenwald	b26dc19392	Ensure that serializing of StructTree-data cannot fail during loading I discovered that doing skip-cache re-reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would intermittently cause (some of) the AnnotationLayers to break with errors printed in the console (see below). In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will lookup various data and all of those cases can fail during loading when streaming and/or range requests are being used. Finally, to prevent any future errors, ensure that the viewer won't break in these sort of situations. ``` Uncaught (in promise) Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" } viewer.mjs:8801:55 #renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15 Uncaught (in promise) Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" } viewer.mjs:8801:55 ```	2024-11-01 17:43:59 +01:00
Jonas Jenwald	8f47d06d07	Add helper functions to allow using new `Uint8Array` methods This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64 Please note: These are not actual polyfills, but only implements what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.	2024-10-29 10:22:35 +01:00
Jonas Jenwald	bfc645bab1	Introduce some `Uint8Array.fromBase64` and `Uint8Array.prototype.toBase64` usage in the main code-base See https://github.com/tc39/proposal-arraybuffer-base64	2024-10-29 10:22:35 +01:00
Jonas Jenwald	f9fc477080	Improve the implementation of the `PDFDocument.fingerprints`-getter - Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already implicitly assuming a particular length. - Move the constants into the `fingerprints`-getter, since they're not used anywhere else. - Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method; see https://github.com/tc39/proposal-arraybuffer-base64	2024-10-29 10:22:35 +01:00
Jonas Jenwald	48a18585f2	Allow `StreamsSequenceStream` to skip sub-streams that are not actual Streams (issue 18973) This extends PR 13796 to also handle the case where sub-streams contain invalid data, i.e. anything that isn't a Stream, however please note that in these cases there's no guarantee that we'll render the page "correctly". Note that Adobe Reader, i.e. the PDF reference implementation, cannot render the last page of the referenced PDF document.	2024-10-29 09:36:08 +01:00
Calixte Denizet	b649b6f8dd	Use a BMP decoder when resizing an image The image decoding won't block the main thread any more. For now, it isn't enabled for Chrome because issue6741.pdf leads to a crash.	2024-10-28 14:09:52 +01:00
Tim van der Meij	5418060bbc	Merge pull request #18951 from Snuffleupagus/CMap-isCompressed [api-minor] Remove the `CMapCompressionType` enumeration	2024-10-27 14:42:00 +01:00
Jonas Jenwald	8a2b95418a	Re-factor the `ImageResizer._goodSquareLength` definition Move the `ImageResizer._goodSquareLength` definition into the class itself, since the current position shouldn't be necessary, and also convert it into an actually private field.	2024-10-27 11:03:04 +01:00
Jonas Jenwald	b048420d21	[api-minor] Remove the `CMapCompressionType` enumeration After the binary CMap format had been added there were also some ideas about maybe providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182), however that was over seven years ago and we still only use binary CMaps. Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.	2024-10-24 11:08:16 +02:00
Jonas Jenwald	50c291eb33	Unconditionally cache built-in CMaps on the worker-thread Given that we've not shipped, nor used, anything except binary CMaps for years let's just cache them unconditionally (since that's a tiny bit less code).	2024-10-24 10:15:09 +02:00
calixteman	1ad09779f1	Merge pull request #18910 from calixteman/image_decoder1 Use ImageDecoder in order to decode jpeg images (bug 1901223)	2024-10-23 13:54:07 +02:00
Calixte Denizet	b6c4f0b69e	Use ImageDecoder in order to decode jpeg images (bug 1901223)	2024-10-23 10:42:01 +02:00
Jonas Jenwald	236c8d862e	Re-factor how we handle missing, corrupt, or empty font-file entries This improves the fixes for e.g. issue 9462 and 18941 slightly and allows better fallback behaviour for non-standard fonts.	2024-10-22 17:07:12 +02:00
Jonas Jenwald	63b34114b1	Fallback to a standard font if a font-file entry doesn't contain a Stream (issue 18941) The PDF document is clearly corrupt, since it has /FontFile2 entries that are Dictionaries which obviously isn't correct. While there's obviously no guarantee that things will look perfect this way, actually rendering the text at all should be an improvement in general.	2024-10-22 11:51:28 +02:00
Jonas Jenwald	805f962181	Reduce duplication when collecting optional content groups After PR 18825 we can easily "compute" the optional content groups, and can thus avoid tracking them manually.	2024-10-15 13:20:30 +02:00
Jonas Jenwald	424f81c4db	Merge pull request #18825 from agrahn/rbgroups implementing optional content radiobutton groups	2024-10-15 13:11:19 +02:00
Alexander Grahn	441efe456e	Optional Content (OC) radiobutton (RB) groups implemented. Resolves #18823. The code parses the /RBGroups entry in the OC configuration dict and adds the property `rbGroups` to instances of the OptionalContentGroup class. rbGroups takes an array of Sets, where each Set instance represents an RB group the OptionalContentGroup instance is a member of. Such a Set instance contains all OCG ids within the corresponding RB group. RB groups an OCG is associated with are processed when its visibility is set to true, as required by the PDF spec.	2024-10-15 11:34:45 +02:00
Calixte Denizet	8b7b39f5d6	Some jpx images can have a mask It fixes #18896.	2024-10-14 21:50:32 +02:00
calixteman	e1f9fa4ea5	Merge pull request #18895 from calixteman/issue18894 Fallback on gray colorspace when there are no colorspace and no name in the scn/SCN arguments	2024-10-13 17:56:52 +02:00
Calixte Denizet	e7ab8cd8c1	Fallback on gray colorspace when there are no colorspace and no name in the scn/SCN arguments It fixes #18894.	2024-10-13 16:02:07 +02:00
Calixte Denizet	4dea773c5b	Clamp the hival parameter of Indexed color space to the range [0; 255] Since this value is used to allocate an array, it makes sense to avoid to use too much memory. From the specs, this value must be in the range [0; 255] (see section 8.6.6.3). This patch removes the unused property 'highVal'.	2024-10-12 23:50:58 +02:00
Calixte Denizet	f2f56b6464	Avoid exceptions in the console with ill-formed flate streams It fixes #18876.	2024-10-10 12:07:30 +02:00
Jonas Jenwald	662bd022ce	Reduce duplication in the `PDFDocument.calculationOrderIds` getter	2024-10-08 12:24:09 +02:00
Jonas Jenwald	e3b5ed2e40	Improve the promise-caching in the `PDFDocument.fieldObjects` getter After PR 18845 we're accessing this getter more, hence it seems like a good idea to ensure that the initial `formInfo` access is covered as well. While unlikely to be a problem in practice, at least theoretically that data may not be available and the code in `fieldObjects` could thus currently be unintentionally invoked more than once.	2024-10-08 12:15:04 +02:00
Calixte Denizet	3194f3de8b	Keep the empty lines in the text fields It fixes #18036.	2024-10-05 16:19:41 +02:00
Calixte Denizet	3103deaa44	Fix missing annotation parent in using the one from the Fields entry Fixes #15096.	2024-10-04 20:00:19 +02:00
Calixte Denizet	8410252eb8	[Editor] Make stamp annotations editable (bug 1921291)	2024-10-03 21:54:08 +02:00
Calixte Denizet	c9050be863	[Editor] Add the possibility to save an updated stamp annotation (bug 1921291)	2024-10-02 11:45:16 +02:00
Calixte Denizet	2481a4bab9	Write the display flags in F entry when saving an annotation (issue 18072)	2024-10-01 17:26:39 +02:00
calixteman	c46ac3f73f	Merge pull request #18800 from calixteman/popup_deletion [Editor] When deleting an annotation with popup, then delete the popup too	2024-09-26 18:04:24 +02:00
Calixte Denizet	0382dd0e25	[Editor] When deleting an annotation with popup, then delete the popup too	2024-09-26 17:52:25 +02:00
Jonas Jenwald	7db9941e0f	Add basic support for non-embedded GillSansMT fonts (issue 18801) Given the following excerpt from the [Wikipedia article](https://en.wikipedia.org/wiki/Gill_Sans), mapping this to Helvetica should hopefully be fine: > It has been described as "the British Helvetica" because of its lasting popularity in British design.	2024-09-26 16:42:54 +02:00
Calixte Denizet	fc1564f476	Correctly compute the font size when printing a text field with an auto font size (bug 1917734)	2024-09-25 14:05:54 +02:00
Jonas Jenwald	67af371e58	Ignore non-existing /Shading resources during parsing (issue 18765)	2024-09-19 21:55:02 +02:00
Calixte Denizet	78dd35483c	Read a signed integer when using PUSHW in sanitizing a font (bug 1919513)	2024-09-18 22:09:17 +02:00
Calixte Denizet	ddba096191	Make tagged images visible for screen readers (bug 1708040) The idea is to insert a span in the text layer with an aria-role set to img and use the bounding box provided by the attribute field in the tag dict in order to have non-null dimensions for the image to make it "visible".	2024-09-05 17:59:42 +02:00
Calixte Denizet	a62ceedb69	[Editor] Make highlight annotations editable (bug 1883884) The goal of this patch is to be able to edit existing highlight annotations.	2024-09-03 15:27:55 +02:00
Jonas Jenwald	8728f7f134	Support an odd number of digits in hexadecimal strings (issue 18645) See https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1840792	2024-08-23 16:31:43 +02:00
Jonas Jenwald	908f453384	Merge pull request #18627 from richard-smith-preservica/rcs/send-page-dict-requests-in-parallel Send fetch requests for all page dict lookups in parallel	2024-08-21 13:58:03 +02:00
Richard Smith (smir)	a67b9aec6c	Send fetch requests for all page dict lookups in parallel - When adding page dict candidates to the lookup tree, also initiate fetching them from xref, so if they are not yet loaded at all, the XHR will be sent - Only at the top level - assume that if there is a /Pages tree, it is sensibly structured and the number of requests won't be too bad - We can then await on the cached Promise without making the requests pipeline - This has a significant performance improvement for load-on-demand (i.e. with auto-fetch turned off) when a PDF has a large number of pages in the top level /Pages collection, and those pages are spread through a file, so every candidate needs to be fetched separately - PDFs with many pages where each page is a big image and all the pages are at the top level are quite a common output for digitisation programmes - I would have liked to do something like "if it's the top level collection and page count = number of kids, then just fetch that page without traversing the tree" but unfortunately I agree with comments on #8088 that there is no good general solution to allow for /Pages nodes with empty /Kids arrays	2024-08-21 11:08:14 +01:00
Jonas Jenwald	6dd31183be	Use standard glyph mapping for non-embedded and non-composite Calibri fonts (issue 18208) Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font. Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see `b47c7eca83/src/core/fonts.js (L1186-L1190)`	2024-08-19 19:10:35 +02:00
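
Commit 8728f7f134 above deals with hexadecimal strings that contain an odd number of digits. The PDF specification states that in this case the final digit is assumed to be 0, so `<901FA>` is read as `<901FA0>`. The rule can be sketched as follows; note that `decodeHexString` is a hypothetical helper written for illustration only, not the code used in PDF.js, and it takes the text between `<` and `>` as its input.

```javascript
// Hypothetical helper for illustration; this is NOT the PDF.js lexer.
// `body` is the text found between "<" and ">" in a hexadecimal string.
function decodeHexString(body) {
  // White-space inside a hexadecimal string is ignored.
  let digits = body.replace(/\s+/g, "");
  if (!/^[0-9A-Fa-f]*$/.test(digits)) {
    throw new Error("Invalid character in hexadecimal string");
  }
  // With an odd number of digits, the final digit is assumed to be 0.
  if (digits.length % 2 === 1) {
    digits += "0";
  }
  // Each pair of digits encodes one byte.
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(digits.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}
```

With this sketch, `decodeHexString("901FA")` yields the bytes 0x90, 0x1F, 0xA0. Throwing on an invalid character is a simplification chosen here; a tolerant parser may prefer to skip such characters instead.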

3090 Commits