PDF - Extract
Declaration
<AMPDF SOURCE="text" SESSION="text" PASSWORD="encrypted text" EXTRACT="text (options)" RESULTVARIABLE="text" IMAGEFORMAT="text (options)" DESTINATIONFILE="text" PAGE="number" />
Description
Extracts the specified contents of an existing PDF file into a variable, another PDF file, an image file, or a text file.
Practical usage
Used to extract images or text from a PDF file and reuse them in a different PDF file, or export them in various formats.
Parameters
Resource
Property | Type | Required | Default | Markup | Description |
---|---|---|---|---|---|
Resource | --- | --- | --- | --- | Specifies the source of the PDF file. The available options are File and Session. NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes. |
Session | Text | Yes, if the Resource parameter is set to Session | PDFSession1 | SESSION="mySession" | The existing session to associate with this activity. This parameter becomes active and is required if the Resource parameter is set to Session. |
Source PDF | Text | Yes, if the Resource parameter is set to File | (Empty) | SOURCE="C:\temp\source.pdf" | The path and file name of the PDF file to extract content from. This parameter becomes active and is required if the Resource parameter is set to File. |
Password (optional) | Text | Yes, if the Resource parameter is set to File | (Empty) | PASSWORD="encrypted" | The password needed to open the existing PDF file, if the file is password-protected. |
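As a rough illustration of the two resource modes, the steps below sketch a file-based call that supplies a password and a session-based call. The file paths and the session name mySession are placeholders, and the PASSWORD value stands in for the encrypted string the Task Builder would generate. That SESSION simply takes the place of SOURCE and PASSWORD is an assumption inferred from the table above, not a confirmed rule. The comment lines are annotations only and can be dropped.

```xml
<!-- File resource: open a password-protected PDF from disk (placeholder path and password) -->
<AMPDF SOURCE="C:\temp\source.pdf" PASSWORD="encrypted" EXTRACT="page" DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1,2" />
<!-- Session resource: reuse an existing session (assumption: SESSION replaces SOURCE and PASSWORD) -->
<AMPDF SESSION="mySession" EXTRACT="page" DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1,2" />
```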
Output
Property | Type | Required | Default | Markup | Description |
---|---|---|---|---|---|
Output type | Text (options) | Yes | Text | EXTRACT="page" | Specifies the output type in which the extracted content is saved. The available options are Text, PDF, Images, Multi-page TIFF, and Text file. |
Populate variable with extracted text | Text | Yes, if the Output type parameter is set to Text | (Empty) | RESULTVARIABLE="varName" | The name of an existing variable in which to save the extracted text. This parameter becomes active and is required if the Output type parameter is set to Text. |
Save as type | Text (options) | Yes, if the Output type parameter is set to Images | JPEG | IMAGEFORMAT="text (options)" | Specifies the file format in which the extracted images are saved. This parameter becomes active and is required if the Output type parameter is set to Images. |
Destination | Text (options) | Yes, if the Output type parameter is set to PDF, Images, Multi-page TIFF, or Text file | (Empty) | DESTINATIONFILE="C:\temp\contents.pdf" | The destination folder or file in which to save the extracted content. This parameter becomes active and is required if the Output type parameter is set to PDF, Images, Multi-page TIFF, or Text file. NOTE: If Output type is set to Images, the destination must be a folder name (for example, c:\destinationFolder) so that multiple images can be output. If Output type is set to PDF, Multi-page TIFF, or Text file, the destination file must match the specified output type. For example, if set to PDF, the destination file must be a .pdf file (for example, c:\temp\contents.pdf). |
Page
Property | Type | Required | Default | Markup | Description |
---|---|---|---|---|---|
Page range | --- | Yes | All | --- | Specifies the pages to extract content from in the PDF file. The available options are All and Page(s). NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes. |
Page(s) | Text | Yes, if Page range is set to Page(s) | (Empty) | PAGE="1,3,5" | If enabled, specifies the pages to extract content from in the PDF file. For a single page, enter the page number. Use a comma (,) to specify more than one page (for example, 1,3,5). Use a dash (-) to specify a range of pages (for example, 5-10). |
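A minimal sketch of the dash syntax, reusing only attributes from the declaration: the step below would extract pages 5 through 10 into a new PDF file. The file paths are placeholders, and EXTRACT="page" is borrowed from Example 1 on the assumption that it selects PDF output.

```xml
<!-- Extract a contiguous range of pages (5 through 10) into a new PDF; paths are placeholders -->
<AMPDF SOURCE="C:\temp\sourceFile.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\pages5to10.pdf" PAGE="5-10" />
```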
Examples
- Copy and paste the sample AML code below directly into the Task Builder Steps Panel.
- To successfully run the sample code, update parameters containing user credentials, files, file paths, or other information specific to the task to match your environment.
Example 1
This sample task extracts content from multiple pages of an existing PDF file and then stores it in a new PDF file.
<AMPDF SOURCE="C:\temp\sourceFile.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1,2" />
Example 2
This sample task extracts images from page 2 of an existing PDF file and then stores them in a folder.
<AMPDF SOURCE="C:\temp\sourceForm.pdf" EXTRACT="image" DESTINATIONFILE="C:\temp" PAGE="2" />