PDF - Extract

Declaration

<AMPDF SOURCE="text" SESSION="text" PASSWORD="encrypted text" EXTRACT="text (options)" RESULTVARIABLE="text" IMAGEFORMAT="text (options)" DESTINATIONFILE="text" PAGE="number" />

Description

Extracts the specified contents of an existing PDF file into a variable, a new PDF file, one or more image files, a multi-page TIFF file, or a text file.

Practical usage

Used to extract images or text from a PDF file in order to reuse them in a different PDF file or export them to other formats.

Parameters

Resource

Property Type Required Default Markup Description
Resource --- --- --- --- Specifies the source of the PDF file. The available options are:
  • File (default) - The source derives from a PDF file located on the system. This option is normally selected if only a single activity is required to complete the operation.
  • Session - The source PDF is obtained from a pre-configured session created in an earlier step with the PDF - Create session activity. This option is normally selected if a combination of related activities is required to complete an operation. Consolidating several activities into a single session can eliminate redundancy. Moreover, a single task supports multi-session execution, which can improve efficiency and speed up production.
NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes.
Session Text Yes, if the Resource parameter is set to Session PDFSession1 SESSION="mySession" The existing session to associate with this activity. This parameter becomes active and is required if the Resource parameter is set to Session.
Source PDF Text Yes, if the Resource parameter is set to File (Empty) SOURCE="C:\temp\source.pdf" The path and file name of the PDF file to extract content from. This parameter becomes active and is required if the Resource parameter is set to File.
Password (optional) Text No (Empty) PASSWORD="encrypted" The password required to open the source PDF file, if the file is password-protected. This parameter is available if the Resource parameter is set to File.
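Because the Resource parameter has no markup of its own, the choice appears to be expressed by which attribute a step carries. This is an inference from the table above: a File-based step supplies SOURCE, and a Session-based step supplies SESSION instead. In the sketch below, the session name PDFSession1 is the default shown in the table. That session is assumed to have been created by an earlier PDF - Create session step, whose markup is not shown. The file paths are placeholders. The XML comments are reader annotations and may need to be removed before pasting into the Steps Panel.

```xml
<!-- Resource = File: the PDF is read directly from disk via SOURCE -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\fromFile.pdf" PAGE="1" />
<!-- Resource = Session: the PDF comes from a session opened in an earlier step (not shown) -->
<AMPDF SESSION="PDFSession1" EXTRACT="page" DESTINATIONFILE="C:\temp\fromSession.pdf" PAGE="1" />
```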

Output

Property Type Required Default Markup Description
Output type Text (options) Yes Text
  • EXTRACT="text"
  • EXTRACT="page"
  • EXTRACT="image"
  • EXTRACT="multipage_tiff"
  • EXTRACT="text_file"
Specifies the output type the extracted contents are saved as. The available options are:
  • Text (default) - Text is extracted and saved into an existing variable.
  • PDF - Content is extracted and saved as a PDF file.
  • Image(s) - Images are extracted and saved as individual image files.
  • Multi-page TIFF - Content is extracted and saved as a single, multi-page TIFF file.
  • Text file - Text is extracted and saved as a text file.
Populate variable with extracted text Text Yes, if the Output type parameter is set to Text (Empty) RESULTVARIABLE="varName" The name of an existing variable in which to save the extracted text. This parameter becomes active and is required if the Output type parameter is set to Text.
Save as type Text (options) Yes, if the Output type parameter is set to Image(s) JPEG
  • IMAGEFORMAT="jpeg"
  • IMAGEFORMAT="png"
  • IMAGEFORMAT="bmp"
  • IMAGEFORMAT="tiff"
  • IMAGEFORMAT="gif"
Specifies the file format the extracted images are saved as. This parameter becomes active and is required if the Output type parameter is set to Image(s). The available options are:
  • JPEG (default) - Extracted images are saved in the JPEG file format.
  • PNG - Extracted images are saved in the PNG (Portable Network Graphics) file format.
  • BMP - Extracted images are saved in the BMP (Bitmap) file format.
  • TIFF - Extracted images are saved in the TIFF (Tagged Image File Format) file format.
  • GIF - Extracted images are saved in the GIF (Graphics Interchange Format) file format.
Destination Text (options) Yes, if the Output type parameter is set to PDF, Image(s), Multi-page TIFF, or Text file (Empty)
  • DESTINATIONFILE="C:\temp"
  • DESTINATIONFILE="C:\temp\content.pdf"
  • DESTINATIONFILE="C:\temp\content.tiff"
  • DESTINATIONFILE="C:\temp\text.txt"

The destination folder or file in which to save the extracted content. This parameter becomes active and is required if the Output type parameter is set to PDF, Image(s), Multi-page TIFF, or Text file.

NOTE: If Output type is set to Image(s), the destination must be a folder path (for example, c:\destinationFolder) so that multiple images can be output.

If Output type is set to PDF, Multi-page TIFF, or Text file, the destination file must match the specified output type. For example, if set to PDF, the destination file must be a .pdf file (for example, c:\temp\contents.pdf).
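The following sketch shows one step for each output type other than PDF. Every step is built only from the markup values listed in the Output table. The file paths, the folder C:\temp\images, and the variable name extractedText are hypothetical placeholders. The sketch assumes three things: the folder already exists, extractedText was created in an earlier step that is not shown, and omitting PAGE extracts from all pages (Page range = All has no markup). The XML comments are reader annotations and may need to be removed before pasting into the Steps Panel.

```xml
<!-- Output type = Text: extracted text is stored in an existing variable -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="text" RESULTVARIABLE="extractedText" />
<!-- Output type = Image(s) saved as PNG: DESTINATIONFILE must be a folder -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="image" IMAGEFORMAT="png" DESTINATIONFILE="C:\temp\images" />
<!-- Output type = Multi-page TIFF: DESTINATIONFILE must be a .tiff file -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="multipage_tiff" DESTINATIONFILE="C:\temp\content.tiff" />
<!-- Output type = Text file: DESTINATIONFILE must be a .txt file -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="text_file" DESTINATIONFILE="C:\temp\text.txt" />
```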

Page

Property Type Required Default Markup Description
Page range --- Yes All --- Specifies the pages to extract content from in the PDF file. The available options are:
  • All - Extracts content from all pages in the PDF file.
  • Page(s) - Extracts content from one or more specific pages in the PDF file.
NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes.
Page(s) Text Yes, if Page range is set to Page(s) (Empty) PAGE="1,3,5" If enabled, specifies the pages to extract content from in the PDF file. For a single page, enter the page number. Use a comma (,) to specify more than one page (for example, 1,3,5). Use a dash (-) to specify a range of pages (for example, 5-10).
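For illustration, the three PAGE forms described in the Page(s) row might look like this in markup. The source path and destination file names are placeholders. Combining commas and dashes in a single value (such as 1,3,5-10) is not documented in the Page(s) row, so that form is not shown. The XML comments are reader annotations and may need to be removed before pasting into the Steps Panel.

```xml
<!-- Single page: page 4 only -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\page4.pdf" PAGE="4" />
<!-- Comma-separated list: pages 1, 3, and 5 -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\pages135.pdf" PAGE="1,3,5" />
<!-- Dash range: pages 5 through 10 -->
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\pages5to10.pdf" PAGE="5-10" />
```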

Examples

NOTE:
  • Copy and paste the sample AML code below directly into the Task Builder Steps Panel.
  • To successfully run the sample code, update parameters containing user credentials, files, file paths, or other information specific to the task to match your environment.

Example 1

This sample task extracts pages 1 and 2 of an existing PDF file and saves them in a new PDF file.

<AMPDF SOURCE="C:\temp\sourceFile.pdf" EXTRACT="page" DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1,2" />

Example 2

This sample task extracts images from page 2 of an existing PDF file and then saves them in a folder.

<AMPDF SOURCE="C:\temp\sourceForm.pdf" EXTRACT="image" DESTINATIONFILE="C:\temp" PAGE="2" />