PDF - Extract

Declaration

<AMPDF SOURCE="text" PASSWORD="encrypted text" EXTRACT="text (options)" 
IMAGEFORMAT="text (options)" DESTINATIONFILE="text" PAGE="number" />

Description: Extracts the specified contents of an existing PDF document into a variable, another PDF file, an image file or text file.

Practical Usage

Used to take images or text out of a PDF file and use them in another PDF file or export them in various formats.

Resource Parameters

Property

Type

Required

Default

Markup

Description

Resource

---

---

---

---

Indicates where the source PDF originates. This is a design-mode parameter used only during task construction and configuration, so it has no markup. The available options are:

  • File (default) - Specifies that the source PDF derives from a file located on the system. This option is normally chosen if only a single activity is required to complete an operation.

  • Session - Specifies that the source PDF is obtained from a pre-configured session created in an earlier step with the PDF - Create session activity. This option is normally chosen if a combination of activities within the same action group is required. Linking several activities to a single session eliminates redundancy and improves efficiency. Several sessions can exist in a single task, and multiple sessions can run simultaneously without interference.

Session

Text

Yes if Resource parameter is set to Session

PDFSession1

SESSION="mySession"

The name of an existing session to associate this activity with. This parameter is active only if the Resource parameter is set to Session.

Source PDF

Text

Yes if Resource parameter is set to File

(Empty)

SOURCE="C:\temp\source.pdf"

The path and file name of the existing PDF document from which to extract content. This parameter is active only if the Resource parameter is set to File.

Password (optional)

Text

No

(Empty)

PASSWORD="encrypted"

The password needed to open the source PDF document, if the document is password-protected. This parameter is active only if the Resource parameter is set to File.
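
As a rough sketch of how the two Resource options might look in markup, the first step below reads from a session and the second reads from a password-protected file. Both steps are illustrative combinations of attributes listed on this page, not confirmed samples: it is assumed that SESSION takes the place of SOURCE when a session is used, that a session named mySession was created earlier in the task, and that "encrypted" stands in for the encrypted password string produced by the Task Builder.

```xml
<AMPDF SESSION="mySession" EXTRACT="page"
 DESTINATIONFILE="C:\temp\content.pdf" PAGE="1,2" />
<AMPDF SOURCE="C:\temp\source.pdf" PASSWORD="encrypted" EXTRACT="page"
 DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1" />
```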

Output Parameters

Property

Type

Required

Default

Markup

Description

Output type

Text (options)

Yes

Text

  1. EXTRACT="text"
  2. EXTRACT="page"
  3. EXTRACT="image"
  4. EXTRACT="text_file"

The type of output in which extracted content should be saved. The available options are:

  • Text (default) - Text will be populated into an existing variable (EXTRACT="text").

  • PDF - Pages will be extracted to a PDF file (EXTRACT="page").

  • Image(s) - Images will be extracted to image files in a destination folder (EXTRACT="image").

  • Text file - Text will be extracted to a text file (EXTRACT="text_file").

Populate variable with extracted text

Text

Yes if Output type parameter is set to Text

(Empty)

RESULTVARIABLE="varName"

The name of an existing variable in which to save extracted text. This parameter is active only if the Output type parameter is set to Text.

Save as type

Text (options)

Yes if Output type parameter is set to Image(s)

JPEG

  1. IMAGEFORMAT="jpeg"
  2. IMAGEFORMAT="png"
  3. IMAGEFORMAT="bmp"
  4. IMAGEFORMAT="tiff"
  5. IMAGEFORMAT="gif"

The file format in which extracted images should be saved. This parameter is active only if the Output type parameter is set to Image(s). The available options are:

  • JPEG (default) - Images will be saved in JPEG file format.  

  • PNG - Images will be saved in PNG (Portable Network Graphics) file format.

  • BMP - Images will be saved in BMP (Bitmap) file format.

  • TIFF - Images will be saved in TIFF (Tagged Image File Format) format.

  • GIF - Images will be saved in GIF (Graphics Interchange Format) file format.

Destination

Text (options)

Yes if Output type parameter is set to PDF, Image(s) or Text file

(Empty)

  1. DESTINATIONFILE="C:\temp"
  2. DESTINATIONFILE="C:\temp\text.txt"
  3. DESTINATIONFILE="C:\temp\content.pdf"

The destination to which extracted content should be saved. This parameter is active only if the Output type parameter is set to PDF, Image(s) or Text file.

Note: If Output type is set to Image(s), the destination must be a folder name (e.g., c:\destinationFolder) so that multiple images can be output. If Output type is set to PDF or Text file, the destination file must match the specified output type. For example, if set to PDF, the destination file must be a .pdf file (e.g., c:\temp\contents.pdf). If set to Text file, the destination file must be a .txt file (e.g., c:\temp\textFile.txt).
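
The output options above can be sketched as three separate steps: text into a variable, images into a folder as PNG files, and text into a .txt file. These are illustrative combinations built only from the markup shown in the tables above. They assume that a variable named varName already exists in the task and that RESULTVARIABLE is used in place of DESTINATIONFILE when EXTRACT="text".

```xml
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="text"
 RESULTVARIABLE="varName" PAGE="1" />
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="image" IMAGEFORMAT="png"
 DESTINATIONFILE="C:\temp" PAGE="2" />
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="text_file"
 DESTINATIONFILE="C:\temp\text.txt" PAGE="1" />
```

Note that only the image step points DESTINATIONFILE at a folder; the text file step names a .txt file, as the note above requires.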

Page Parameters

Property

Type

Required

Default

Markup

Description

All

---

---

---

---

If enabled, extraction will be performed on all pages of the PDF document. This is a visual-mode parameter used only at design time, so it has no properties or markup.

Page(s)

Text

No

(Empty)

PAGE="1,2"

If enabled, extraction will be performed on an individual page or specific pages of the PDF document. Enter the page number to specify a single page (e.g., 2 or 4). Use a comma (,) to specify more than one page (e.g., 1,3,5). Use a dash (-) to specify a range of pages (e.g., 5-10).
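
The three page notations described above might be written as follows: a single page, a comma-separated list, and a dash-separated range. This is a sketch only; the destination file names (single.pdf, list.pdf, range.pdf) are hypothetical and chosen just to keep the three outputs apart.

```xml
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page"
 DESTINATIONFILE="C:\temp\single.pdf" PAGE="2" />
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page"
 DESTINATIONFILE="C:\temp\list.pdf" PAGE="1,3,5" />
<AMPDF SOURCE="C:\temp\source.pdf" EXTRACT="page"
 DESTINATIONFILE="C:\temp\range.pdf" PAGE="5-10" />
```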

Description tab - A custom description can be provided on the Description tab to convey additional information or share special notes about a task step.

Error Causes tab - Specify how this step should behave upon the occurrence of an error. (Refer to Task Builder > Error Causes Tab for details.)

On Error tab - Specify what AWE should do if this step encounters an error as defined on the Error Causes tab. (Refer to Task Builder > On Error Tab for details.)

Example

The sample AML code below can be copied and pasted directly into the Steps panel of the Task Builder.

Example 1

Extract pages 1 and 2 of existing PDF "C:\temp\sourceFile.pdf" and store them in "C:\temp\newFile.pdf".

<AMPDF SOURCE="C:\temp\sourceFile.pdf" EXTRACT="page" 
 DESTINATIONFILE="C:\temp\newFile.pdf" PAGE="1,2" />

Example 2

Extract images from page 2 of existing PDF "C:\temp\sourceForm.pdf" and store in folder "C:\temp".

<AMPDF SOURCE="C:\temp\sourceForm.pdf" EXTRACT="image" 
 DESTINATIONFILE="C:\temp" PAGE="2" />