Flat config is the new configuration system used by ESLint 9.
To make the migration easier, flat config support was also
added to ESLint 8.
This commit migrates the various ESLint configs in the repository to use
the new system, **without** upgrading to ESLint 9 yet.
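For reference, a minimal flat-config file looks roughly like this (illustrative only; the repository's actual configs define far more rules, plugins and overrides):

```js
// eslint.config.mjs -- a minimal flat-config sketch (illustrative only).
export default [
  {
    files: ["**/*.js", "**/*.mjs"],
    languageOptions: {
      ecmaVersion: "latest",
      sourceType: "module",
    },
    rules: {
      "no-unused-vars": "error",
      "no-var": "error",
    },
  },
];
```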
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure are missing, you might argue that it's not really a PDF document.
In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.
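A rough sketch of that fallback, with hypothetical helper names (the real implementation lives in the worker's XRef handling and uses internal helpers not shown here):

```js
// Hypothetical sketch (not the actual worker code): walk the previously
// found XRef entries and accept the first object that looks like a valid
// /Root, i.e. Catalog, dictionary.
function findFallbackRoot(xrefEntries, fetchObject) {
  for (let num = 0; num < xrefEntries.length; num++) {
    if (!xrefEntries[num]) {
      continue; // Free or missing entry.
    }
    let obj;
    try {
      obj = fetchObject(num); // Assumed helper that parses object `num`.
    } catch {
      continue; // Ignore entries that cannot be parsed at all.
    }
    // A plausible /Root candidate is a /Catalog dictionary with /Pages.
    if (obj?.type === "Catalog" && obj?.pages) {
      return num;
    }
  }
  return null; // No usable /Root found.
}
```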
*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilobytes, which isn't a lot but still cannot hurt.
After the binary CMap format had been added, there were also some ideas about *maybe* providing other formats, see [here](https://github.com/mozilla/pdf.js/pull/8064#issuecomment-279730182); however, that was over seven years ago and we still only use binary CMaps.
Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
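As a hedged illustration of the shape change, a custom reader factory could now return a plain boolean instead of an enum value (simplified; the exact property name and the real factories in `src/display/` may differ):

```js
// Illustrative only -- a minimal CMap reader factory after the change.
class SimpleCMapReaderFactory {
  constructor({ baseUrl }) {
    this.baseUrl = baseUrl;
  }

  async fetch({ name }) {
    const url = `${this.baseUrl}${name}.bcmap`;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Unable to load CMap at: ${url}`);
    }
    return {
      cMapData: new Uint8Array(await response.arrayBuffer()),
      // Previously: `compressionType: CMapCompressionType.BINARY`.
      isCompressed: true, // Built-in CMaps are always binary.
    };
  }
}
```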
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.
As shown in the modified example, this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
Note that the textContent is returned in "chunks" from the API, through the use of `ReadableStream`s, and on the main-thread we're (normally) using just one temporary canvas in order to measure the size of the textLayer `span`s; see the [`#layout`](5b4c2fe1a8/src/display/text_layer.js (L396-L428)) method.
*Order of events, for parallel textLayer rendering:*
1. Call [`render`](5b4c2fe1a8/src/display/text_layer.js (L155-L177)) of the textLayer for page A.
2. Immediately call `render` of the textLayer for page B.
3. The first text-chunk for page A arrives, and it's parsed/laid out, which means updating the cached [fontSize/fontFamily](5b4c2fe1a8/src/display/text_layer.js (L409-L413)) for the textLayer of page A.
4. The first text-chunk for page B arrives, which means updating the cached fontSize/fontFamily *for the textLayer of page B*, since this data is unique to each `TextLayer`-instance.
5. The second text-chunk for page A arrives, and we don't update the canvas-font since the cached fontSize/fontFamily still apply from step 3 above.
Where this potentially breaks down is between the last steps, since we're using just one temporary canvas for all measurements but have *individual* fontSize/fontFamily caches for each textLayer.
Hence it's possible that the canvas-font has actually changed, despite the cached values suggesting otherwise, and to address this we instead cache the fontSize/fontFamily globally through a new (static) helper method.
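A minimal sketch of that idea, with illustrative names (the real helper is part of the text-layer code and handles more state than shown here):

```js
// Illustrative sketch: share one fontSize/fontFamily cache across *all*
// textLayer instances, since they all measure on the same temporary canvas.
class TextLayerSketch {
  static #currentFontSize = null;
  static #currentFontFamily = null;

  // Only touch `ctx.font` when the globally cached values actually change,
  // regardless of which textLayer instance is currently laying out spans.
  static ensureCtxFont(ctx, fontSize, fontFamily) {
    if (
      fontSize === this.#currentFontSize &&
      fontFamily === this.#currentFontFamily
    ) {
      return;
    }
    this.#currentFontSize = fontSize;
    this.#currentFontFamily = fontFamily;
    ctx.font = `${fontSize}px ${fontFamily}`;
  }
}
```

Each textLayer's layout step would then go through this shared helper rather than consulting its own per-instance cache.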
*Note:* Includes a basic unit-test, using dummy text-content, which fails on `master` and passes with this patch.
Finally, pun intended, ensure that temporary textLayer-data is cleared *before* the `render`-promise resolves to avoid any intermittent problems in the unit-tests.
The idea is to insert a span in the text layer with its ARIA `role` set to `img`,
and use the bounding box provided by the attribute field in the tag dictionary
in order to give the image non-zero dimensions and make it "visible".
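A simplified sketch of the approach, with hypothetical helper names (the real code integrates with the structTree/textLayer builders and their coordinate/scaling handling):

```js
// Simplified sketch: create a text-layer span that stands in for the image,
// using the bounding box from the tag dictionary so that the element gets
// non-zero dimensions.
function createImageSpan(bbox, pageHeight) {
  const [x1, y1, x2, y2] = bbox; // PDF user-space coordinates.
  const span = document.createElement("span");
  span.setAttribute("role", "img");
  span.style.position = "absolute";
  // Convert from PDF coordinates (origin bottom-left) to CSS (origin top-left).
  span.style.left = `${x1}px`;
  span.style.top = `${pageHeight - y2}px`;
  span.style.width = `${x2 - x1}px`;
  span.style.height = `${y2 - y1}px`;
  return span;
}
```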
The `Headers` functionality is now available in all browsers/environments that we support, which allows us to consolidate and simplify how the `httpHeaders` API-option is handled; see https://developer.mozilla.org/en-US/docs/Web/API/Headers#browser_compatibility
Also simplifies the old `NetworkManager`-constructor a little bit.
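As a rough illustration of the consolidation (simplified relative to the actual `NetworkManager` code):

```js
// Simplified sketch: normalize the `httpHeaders` API-option into a single
// `Headers` instance, instead of copying plain-object entries in several
// places.
function createHeaders(httpHeaders) {
  const headers = new Headers();
  for (const [key, value] of Object.entries(httpHeaders || {})) {
    if (value !== undefined && value !== null) {
      headers.append(key, value);
    }
  }
  return headers;
}

// Usage, e.g. with a fetch-based transport:
//   const response = await fetch(url, { headers: createHeaders(httpHeaders) });
```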
The Node.js url.parse API (https://nodejs.org/api/url.html#urlparseurlstring-parsequerystring-slashesdenotehost)
is deprecated because it's prone to security issues (to the point that Node.js doesn't even publish CVEs for it anymore).
The official recommendation is to instead use the global `URL` constructor, available both in Node.js and in browsers.
Node.js filesystem APIs accept `URL` objects as parameters, so this also avoids a few URL-to-filepath conversions.
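A minimal illustration of the replacement pattern (not the exact code in this patch; the file path is hypothetical):

```js
// Node.js example: parse with the WHATWG `URL` constructor and pass the
// resulting object directly to the filesystem API, which avoids a separate
// URL -> filepath conversion step.
import { readFile } from "node:fs/promises";

const fileUrl = new URL("file:///tmp/example.pdf"); // Hypothetical path.
const data = await readFile(fileUrl); // fs APIs accept file: URLs directly.
console.log(`Read ${data.length} bytes.`);
```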
This patch allows embedders of PDF.js to provide custom match
logic for searching in PDFs. This is done by subclassing the
`PDFFindController` class and overriding the `match` method.
`match` is called once per PDF page, receives as parameters the
search query, the page contents, and the page index, and returns
an array of { index, length } objects representing the search
results.
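A hedged usage sketch based on the description above; the import path and exact argument names depend on how the viewer components are consumed:

```js
// Sketch of a custom find controller with whole-word matching; assumes a
// non-empty, regex-safe query.
import { PDFFindController } from "pdfjs-dist/web/pdf_viewer.mjs";

class WordBoundaryFindController extends PDFFindController {
  // Called once per page with the search query, that page's text content
  // and the page index; returns an array of { index, length } results.
  match(query, pageContent, pageIndex) {
    const results = [];
    const re = new RegExp(`\\b${query}\\b`, "gi");
    let m;
    while ((m = re.exec(pageContent)) !== null) {
      results.push({ index: m.index, length: m[0].length });
    }
    return results;
  }
}
```

The custom controller would then presumably be passed to the viewer components in place of the default `PDFFindController`.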
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
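For illustration, the kind of pattern these features enable (generic example, not actual code from this patch):

```js
// Generic illustration of private fields and static initialization blocks.
class FeatureDetector {
  // Private instance field, invisible outside the class.
  #supported;

  // Static private state, computed once via a static initialization block.
  static #platformDefaults;
  static {
    this.#platformDefaults = {
      hasHeaders: typeof Headers !== "undefined",
    };
  }

  constructor() {
    this.#supported = FeatureDetector.#platformDefaults.hasHeaders;
  }

  get supported() {
    return this.#supported;
  }
}
```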
The current error-messages also mention internal parameters, which an end-user obviously doesn't have to care about. So let's try to avoid confusion here by only including the API parameters.
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work around bad PDF generators by making the zoom parameter optional when validating explicit destinations, in both the worker and the viewer.
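For reference, a sketch of what such destinations look like as exposed through the API (the array shape follows the specification; the broken documents simply drop the trailing zoom element):

```js
// Explicit /XYZ destination, per the specification:
//   [pageRef, { name: "XYZ" }, left, top, zoom]
const specCompliant = [{ num: 5, gen: 0 }, { name: "XYZ" }, 100, 700, null];

// What some bad PDF generators effectively produce -- the zoom element is
// missing entirely, which the validation now tolerates:
const missingZoom = [{ num: 5, gen: 0 }, { name: "XYZ" }, 100, 700];
```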
As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used.
Hence this seems like something that should be kept *internal* and thus only exposed in TESTING-builds.
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work around bad PDF generators by making the coordinate parameter optional when validating explicit destinations, in both the worker and the viewer.
Add a unit test to check compatibility with such CMaps.
In the PDF from issue 18099, the toUnicode CMap had a line mapping the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of Unicode code points is
`<start_char_code> <end_char_code> <start_unicode_codepoint>`
As the Unicode code points are supposed to be given in UTF-16 BE, the PDF's line should probably have read
`<00> <7F> <0000>`
Instead it omitted two leading zeros from the UTF-16 value, like this:
`<00> <7F> <00>`
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied.
I'm not sure if the PDF specification actually allows PDFs to do this, but since there's at least one PDF in the wild that does, and other PDF readers handle it correctly, PDF.js should probably support it as well.
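A simplified sketch of the workaround's idea (the real fix lives in the CMap/ToUnicode parsing code; the helper below is hypothetical):

```js
// Hypothetical helper illustrating the idea: pad a too-short destination
// hex string so that it's interpreted as UTF-16 BE code units, i.e. treat
// <00> as if it were <0000>.
function toUnicodeFromHex(hexStr) {
  // Each UTF-16 BE code unit needs 4 hex digits; left-pad with zeros so a
  // lone byte ends up in the LOW byte rather than the HIGH one.
  const padded = hexStr.padStart(Math.ceil(hexStr.length / 4) * 4, "0");
  let str = "";
  for (let i = 0; i < padded.length; i += 4) {
    str += String.fromCharCode(parseInt(padded.slice(i, i + 4), 16));
  }
  return str;
}

// toUnicodeFromHex("01") -> "\u0001", rather than the bogus "\u0100".
```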
This unit test fails occasionally (albeit much less than before thanks
to PR #17663), so we change the parsing time check's divisor to prevent
it from happening again. If the last page's rendering time is less than
or equal to 50% of the first page's rendering time, that should be enough
proof that no worker thread re-parsing occurred while also providing a
wide enough range to avoid intermittents.
Note that the assertion is now equal to the one we already have in the
"caches image resources at the document/page level, with main-thread
copying of complex images (issue 11518)" unit test which seems to work
reliably so far.
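Roughly, the relaxed assertion now has this form (illustrative variable names):

```js
// Illustrative form of the relaxed check: worker-thread re-parsing would
// make the last page at least as slow as the first one, so requiring it to
// finish in at most half the time is proof enough, while still leaving a
// wide margin against intermittently slow runs.
expect(lastPageRenderTime).toBeLessThanOrEqual(firstPageRenderTime / 2);
```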
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.