Beanz/pdf.js

Author	SHA1	Message	Date
Calixte Denizet	37f4712f7e	Update the named page destinations when some pdfs are combined (bug 1997379) and remove link annotations pointing to a deleted page.	2025-11-07 18:22:19 +01:00
Calixte Denizet	ad97c5b816	Update the page labels tree when a pdf is extracted (bug 1997379)	2025-11-07 15:59:57 +01:00
Calixte Denizet	bc87f4e8d6	Add the possibility to create a pdf from different ones (bug 1997379) For now it's only possible to create a single pdf by selecting some pages from different pdf sources. The merge is pretty basic for now (which is why it's still a WIP); none of the following data are merged yet: - the struct trees - the page labels - the outlines - the named destinations There are 2 new ref tests where some new pdfs are created: one with some extracted pages and another one (encrypted) which is just rewritten. The ref images are generated from the original pdfs by selecting the pages we want, and the new images are taken from the generated pdfs.	2025-11-07 14:57:48 +01:00
Calixte Denizet	19ff148163	Fix incremental saving with hybrid references This patch removes some previous fixes which are now likely fixed by #17636. Fixes #20302.	2025-10-04 18:31:55 +02:00
Calixte Denizet	4d15bfec0d	Only apply word spacing when there is a 0x20 in the text chunk Fixes #20319.	2025-10-03 22:18:02 +02:00
Calixte Denizet	af144be3ba	Don't iterate over all empty slots in the xref entries (bug 1980958)	2025-08-25 14:02:08 +02:00
Calixte Denizet	ebc3411727	Use the cached annotations when collecting them by types	2025-08-21 18:04:00 +02:00
Calixte Denizet	9e5ee1e5a7	[Editor] Add the ability to get all the editable annotations in a pdf document We want to be able to show all the comments in a pdf even if the pages where they are haven't been rendered. And it'll help to fix the issue #18915.	2025-08-18 21:31:11 +02:00
Calixte Denizet	57ce4f8f43	Use an HTML date/time input when a field requires a date or a time. The user will be able to enter a date in the format corresponding to their locale and it'll be formatted using the format provided by the pdf.	2025-07-24 22:01:45 +02:00
calixteman	1b427a3af5	Merge pull request #20016 from ryzokuken/move-getcontext [api-minor] Move getContext call to InternalRenderTask	2025-07-08 22:20:19 +02:00
Ujjwal Sharma	b1b728d47f	[api-minor] Move getContext call to InternalRenderTask This is a precursor to moving the call into a worker thread to let us use `OffscreenCanvas`. The current position wouldn't work since we make transformations to the canvas object after the getContext call, which isn't allowed for OffscreenCanvas. Also it isn't allowed to clone or `transferControlToOffscreen` the canvas after the `getContext` call.	2025-07-04 00:53:51 +02:00
Calixte Denizet	3bdc5d54fe	Get the text under highlight/squiggly/underline/strikethrough annotations (bug 1885505) and add an invisible element containing the text in the annotation layer to make it readable by a screen reader.	2025-06-22 21:47:29 +02:00
Calixte Denizet	5789afd3f8	Create the css color to use with the canvas in the worker It slightly reduces the time spent to draw and the memory used.	2025-05-19 14:52:24 +02:00
Jonas Jenwald	ab672f0b77	Replace `PDFWorker.fromPort` with a generic `PDFWorker.create` method This allows us to simply invoke `PDFWorker.create` unconditionally from the `getDocument` function, without having to manually check if a global `workerPort` is available first.	2025-05-17 16:13:41 +02:00
Jonas Jenwald	b629bafd1c	Allow to, optionally, keep Unicode escape sequences in `stringToPDFString` (PR 17331 follow-up) Currently some of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty. The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification. Hence it seems that we need a way to optionally disable that behaviour, to avoid a "badly" formatted string from becoming empty (or truncated), at least for cases where we are: - Parsing named destinations[2] and URLs. - Handling "strings" that are actually /Name-instances. - Building a lookup Object/Map based on some PDF data-structure. NOTE: The issue that prompted this patch is obviously related to destinations, however I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above. --- [1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27". [2] Unfortunately just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues 14847 and 14864.	2025-04-30 20:51:10 +02:00
Jonas Jenwald	adc9eb5a5a	Always fallback to checking all destinations, when lookup fails (issue 19835) In the referenced PDF document the keys, in the /Dests dictionary, need to account for PDFDocEncoding. To improve destination handling in general we'll now unconditionally fallback to always checking all destinations.	2025-04-20 14:53:10 +02:00
Calixte Denizet	be1f5671bb	[api-minor] Use a Path2D when doing a path operation in the canvas (bug 1946953) With this patch, all the path components are collected in the worker until a path operation is met (i.e., stroke, fill, ...). Then in the canvas a Path2D is created and will replace the path data transferred from the worker; this way, when rescaling, the Path2D can be reused. In terms of performance, using Path2D very slightly improves speed when scaling the canvas.	2025-03-22 20:35:24 +01:00
Jonas Jenwald	9e8d4e4d46	[api-minor] Attempt to support fetching the raw data of the PDF document from the `PDFDocumentLoadingTask`-instance (issue 15085) The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed. Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.	2025-03-16 10:09:44 +01:00
Jonas Jenwald	7b5cd9cddd	Use arrow functions with some `Promise.then` calls A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.	2025-03-02 19:57:38 +01:00
Jonas Jenwald	2e62f426fe	Use arrow function with various Array methods A lot of this is quite old code, which we can shorten slightly by using arrow functions instead of "regular" functions.	2025-03-02 15:19:04 +01:00
Jonas Jenwald	d5ce35f744	Move the EXIF-block replacement into `JpegStream` (PR 19356 follow-up) Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block. Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.	2025-02-20 12:41:39 +01:00
Jonas Jenwald	36979e9eb2	Fix all outstanding ESLint `arrow-body-style` warnings Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.	2025-02-17 15:45:44 +01:00
Jonas Jenwald	33cba30bdb	Search for destinations in both /Names and /Dests dictionaries (issue 19474) Currently we only use either one of them, preferring the NameTree when it's available.	2025-02-14 15:49:05 +01:00
Jonas Jenwald	db43f158dc	Inline the default Factory-definitions in `getDocument` - Most of these are only used in the `src/display/api.js` file, and this leads to slightly shorter code. - A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays. - For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.	2025-01-18 14:09:14 +01:00
Jonas Jenwald	75cba72ca6	[api-major] Replace `MissingPDFException` and `UnexpectedResponseException` with one exception These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead. Besides an error message, the new `ResponseException` instances also include: - A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`. - A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.	2025-01-16 22:51:05 +01:00
Jonas Jenwald	6f062abb76	Skip LinkAnnotations when collecting field objects (issue 19281) The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields. To improve performance we'll thus ignore those when collecting the field objects.	2025-01-04 11:54:45 +01:00
Jonas Jenwald	c6e3fc4fe6	Take the `userUnit` into account in the `PageViewport` class (issue 19176)	2024-12-08 15:51:04 +01:00
Jonas Jenwald	f8d11a3a3a	Merge pull request #19074 from Rob--W/issue-12744-test Add test cases for redirected responses	2024-12-02 19:06:55 +01:00
Rob Wu	f97b4b9a66	Add test cases for redirected responses Regression tests for issue #12744 and PR #19028	2024-12-02 17:57:49 +01:00
Rob Wu	28b0220bc2	Replace createTemporaryNodeServer with TestPdfsServer Some tests rely on the presence of a server that serves PDF files. When tests are run from a web browser, the test files and PDF files are served by the same server (WebServer), but in Node.js that server is not around. Currently, the tests that depend on it start a minimal Node.js server that re-implements part of the functionality from WebServer. To avoid code duplication when tests depend on more complex behaviors, this patch replaces createTemporaryNodeServer with the existing WebServer, wrapped in a new test utility that has the same interface in Node.js and non-Node.js environments (=TestPdfsServer). This patch has been tested by running the refactored tests in the following three configurations: 1. From the browser: - http://localhost:8888/test/unit/unit_test.html?spec=api - http://localhost:8888/test/unit/unit_test.html?spec=fetch_stream 2. Run specific tests directly with jasmine without legacy bundling: `JASMINE_CONFIG_PATH=test/unit/clitests.json ./node_modules/.bin/jasmine --filter='^api\|^fetch_stream'` 3. `gulp unittestcli`	2024-12-02 17:57:49 +01:00
Rob Wu	131d4650a5	Drop trailing whitespace from test/unit/api_spec.js test/unit/api_spec.js is the only JS file in the tree with trailing whitespace. Because `trim_trailing_whitespace = true` in .editorconfig, any editor supporting EditorConfig would trim whitespace when the file is changed, which results in test failures. This commit fixes the issue by trimming the trailing whitespace and adjusting the test expectations.	2024-11-24 23:37:16 +01:00
Jonas Jenwald	1a56b35af7	Merge pull request #19003 from Snuffleupagus/api-unittest-image-helpers Add helper functions to load image blob/bitmap data in `test/unit/api_spec.js`	2024-11-06 09:11:28 +01:00
Jonas Jenwald	e92a929a58	Try to improve handling of missing trailer dictionaries in `XRef.indexObjects` (issue 18986) The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary. Given that crucial parts of the internal document structure are missing, you might argue that it's not really a PDF document. In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that might be a valid /Root dictionary. There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than immediately throwing `InvalidPDFException` as we previously did here. Please note: I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.	2024-11-05 18:19:26 +01:00
Jonas Jenwald	f2fb3b95ce	Add helper functions to load image blob/bitmap data in `test/unit/api_spec.js` This avoids repeating the same code multiple times, and as part of the changes we'll also utilize existing PDF.js helpers more.	2024-11-04 14:09:34 +01:00
Calixte Denizet	c9050be863	[Editor] Add the possibility to save an updated stamp annotation (bug 1921291)	2024-10-02 11:45:16 +02:00
Calixte Denizet	0382dd0e25	[Editor] When deleting an annotation with popup, then delete the popup too	2024-09-26 17:52:25 +02:00
Jonas Jenwald	bb302dd993	[api-minor] Pass `CanvasFactory`/`FilterFactory`, rather than instances, to `getDocument` This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances. As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.	2024-09-23 11:26:30 +02:00
Calixte Denizet	ddba096191	Make tagged images visible for screen readers (bug 1708040) The idea is to insert a span in the text layer with an aria-role set to img and use the bounding box provided by the attribute field in the tag dict in order to have non-null dimensions for the image to make it "visible".	2024-09-05 17:59:42 +02:00
Jonas Jenwald	c4fdb28573	Remove `PDFWorkerUtil` and move its contents into `PDFWorker` instead This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.	2024-07-29 11:22:43 +02:00
Jonas Jenwald	c4cd405a8f	Ignore non-dictionary nodes when parsing StructTree data (issue 18503)	2024-07-28 12:08:44 +02:00
Calixte Denizet	c3065629ca	[Editor] Correctly save a non-ascii alt text	2024-07-24 19:13:45 +02:00
Jonas Jenwald	d24a61c648	Allow /XYZ destinations without zoom parameter (issue 18408) According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870 Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.	2024-07-18 13:29:32 +02:00
Jonas Jenwald	403d023617	Allow e.g. /FitH destinations without additional parameter (bug 1907000) According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870 Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.	2024-07-11 10:36:44 +02:00
Jonas Jenwald	5ee61690f3	Merge pull request #18390 from alexcat3/fix-issue-18099 Handle toUnicode cMaps that omit leading zeros in hex encoded UTF-16 (issue 18099)	2024-07-06 18:57:07 +02:00
alexcat3	1c364422a6	Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099) Add a unit test to check compatibility with such cmaps. In the PDF in issue 18099, the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is <start_char_code> <end_char_code> <start_unicode_codepoint>. As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD probably have read <00> <7F> <0000>. Instead it omitted two leading zeros from the UTF-16, like this: <00> <7F> <00>. This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied. I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does, and other PDF readers read it correctly, PDF.js should probably support this.	2024-07-06 11:29:21 -04:00
Tim van der Meij	2a44203d96	Fix the "caches image resources at the document/page level as expected (issue 11878)" unit test This unit test fails occasionally (albeit much less than before thanks to PR #17663), so we change the parsing time check's divisor to prevent it from happening again. If the last page's rendering time is less than or equal to 50% of the first page's rendering time that should be enough proof that no worker thread re-parsing occurred while also providing a wide enough range to avoid intermittents. Note that the assertion is now equal to the one we already have in the "caches image resources at the document/page level, with main-thread copying of complex images (issue 11518)" unit test which seems to work reliably so far.	2024-07-06 16:30:07 +02:00
Jonas Jenwald	06334c97ef	Improve the `loadingParams` functionality in the API - Move the definition of the `loadingParams` Object, to simplify the code. - Add a unit-test, since none existed and the viewer depends on this functionality.	2024-05-24 09:26:40 +02:00
Jonas Jenwald	c5f92437f7	Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up) For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.	2024-05-14 13:58:36 +02:00
Jonas Jenwald	6d523c316c	[api-minor] Include the document /Lang attribute in the textContent-data - These changes will allow a simpler way of implementing PR 17770. - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done once per PDF document (and most PDFs don't include this data). - This makes the /Lang attribute directly available in the `textLayer`, which has the following advantages: - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer). - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770. Please note: This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).	2024-05-14 12:44:41 +02:00
Jonas Jenwald	2b69fb76ac	[api-minor] Improve the `FileSpec` implementation - Check that the `filename` is actually a string, before parsing it further. - Use proper "shadowing" in the `filename` getter. - Add a bit more validation of the data in `pickPlatformItem`. - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.	2024-05-01 18:02:05 +02:00
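
Commit 1c364422a6 above describes toUnicode cmap entries whose hex strings omit leading zeros, so that `<00>` is meant as the UTF-16 BE code unit `<0000>`. A minimal sketch of that idea follows; `hexToUtf16String` is a hypothetical helper written for illustration, not the function PDF.js uses, and it assumes the input contains only hex digits.

```javascript
// Hypothetical helper, not the PDF.js implementation.
// Decodes a hex string as UTF-16 BE, tolerating omitted leading zeros.
function hexToUtf16String(hex) {
  // Left-pad to a multiple of four hex digits so that every UTF-16
  // code unit is complete, e.g. "41" -> "0041".
  const padded = hex.padStart(Math.ceil(hex.length / 4) * 4, "0");
  let result = "";
  for (let i = 0; i < padded.length; i += 4) {
    // Each group of four hex digits is one 16-bit code unit.
    result += String.fromCharCode(parseInt(padded.slice(i, i + 4), 16));
  }
  return result;
}
```

With this padding, a short entry such as `41` decodes to the same string as `0041`, rather than being treated as the high byte of a different character.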

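
Commit 75cba72ca6 above states that `ResponseException` instances carry a numeric `status` field and a boolean `missing` field. The sketch below shows how calling code might branch on those two fields. `describeLoadError` is a hypothetical helper, and it duck-types on the fields instead of importing the class, so nothing here should be read as the library's own error handling.

```javascript
// Hypothetical helper: only the `status` and `missing` fields are taken
// from the commit message; everything else is an illustrative assumption.
function describeLoadError(error) {
  const looksLikeResponseError =
    typeof error?.status === "number" && typeof error?.missing === "boolean";
  if (!looksLikeResponseError) {
    return `Unexpected error: ${error?.message ?? String(error)}`;
  }
  if (error.missing) {
    // Covers the situations where `MissingPDFException` was thrown before.
    return `PDF not found (HTTP status ${error.status}).`;
  }
  // Covers the situations previously reported via `UnexpectedResponseException`.
  return `Unexpected server response (HTTP status ${error.status}).`;
}
```

A caller would typically pass in whatever error the document loading promise rejected with, then show the returned message to the user.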