PDF - Search

Declaration

<AMPDF ACTIVITY="search" SOURCE="text" PASSWORD="encrypted text" FIND="text" RESULTDATASET="text" REGEX="YES/NO" PAGE="number" />

Description: Searches for the occurrence of one or more text strings (e.g., particular characters, words, or patterns of characters) in a PDF document and populates a dataset with the results. Regular expressions can be used to provide a more concise and flexible method of finding text.

Practical Usage

Commonly used to locate a word or phrase inside a PDF document in order to retrieve relevant data (e.g., the page(s) where the text was found, font information, text position) and/or to perform other operations that reference the found text (e.g., Insert Text, Clipboard - Copy, Clipboard - Paste).

Resource Parameters

| Property | Type | Required | Default | Markup | Description |
| --- | --- | --- | --- | --- | --- |
| Resource | --- | --- | --- | --- | Indicates where the source PDF originates. This is a design mode parameter used only during task construction and configuration, so it has no markup. The available options, File (default) and Session, are described below the table. |
| Session | Text | Yes if Resource parameter is set to Session | PDFSession1 | SESSION="mySession" | The name of an existing session to associate this activity with. This parameter is active only if the Resource parameter is set to Session. |
| Source PDF | Text | Yes if Resource parameter is set to File | (Empty) | SOURCE="C:\temp\source.pdf" | The path and filename of the PDF document to search. This parameter is active only if the Resource parameter is set to File. |
| Password (optional) | Text | No | (Empty) | PASSWORD="encrypted" | The password required to open the existing PDF document (if applicable). This parameter is active only if the Resource parameter is set to File. |

Resource options:

  • File (default) - Specifies that the source PDF is a file located on the system. This option is normally chosen if only a single activity is required to complete an operation.

  • Session - Specifies that the source PDF is obtained from a pre-configured session created in an earlier step with the PDF - Create session activity. This option is normally chosen if a combination of activities within the same action group is required. Linking several activities to a single session eliminates redundancy and improves efficiency. Several sessions can exist in a single task. In addition, multiple sessions can run simultaneously without interference.
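
As a minimal sketch of the Session option, the step below searches a PDF through an existing session rather than a file path. It assumes that a session named "mySession" was created in an earlier step with the PDF - Create session activity and that SESSION takes the place of SOURCE in the markup. The session name, search text, and dataset name are placeholders.

<AMPDF ACTIVITY="search" SESSION="mySession" FIND="Network Automation" RESULTDATASET="theDataset" />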

Criteria Parameters

| Property | Type | Required | Default | Markup | Description |
| --- | --- | --- | --- | --- | --- |
| Find | Text | Yes | (Empty) | FIND="Network Automation" | The text string to search for. |
| Use regular expression | Yes/No | No | No | REGEX="YES" | If set to YES, indicates that the value entered in the Find parameter is a regular expression. If set to NO (default), the value is literal text. |
| Create and populate dataset | Text | No | (Empty) | RESULTDATASET="theDataset" | The name of the dataset to create and populate with the search results. See Datasets below for more details. |
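
To illustrate the regular expression option, the hypothetical step below treats the Find value as a pattern that matches "Invoice" followed by a space and one or more digits. The pattern and file path are placeholders. This page does not state which regular expression flavor the activity supports, so the sketch uses only basic syntax.

<AMPDF ACTIVITY="search" SOURCE="C:\temp\myDocument.pdf" FIND="Invoice [0-9]+" RESULTDATASET="theDataset" REGEX="YES" />

With REGEX="NO", the same Find value would be searched for as literal text.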

Pages Parameters

| Property | Type | Required | Default | Markup | Description |
| --- | --- | --- | --- | --- | --- |
| All | --- | --- | --- | --- | If enabled, the search is performed on all pages of the PDF document. This is a visual mode parameter used only at design time, so it has no properties or markup. |
| Page(s) | Text | No | (Empty) | PAGE="1-5" | If enabled, the search is performed on an individual page or on specific pages of the PDF document. Enter the page number to specify a single page. Use a comma (,) to specify more than one page (e.g., 1,3,5). Use a dash (-) to specify a range of pages (e.g., 5-10). |
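
The page syntax can be sketched with two hypothetical steps, one listing individual pages and one giving a range. The file path, search text, and dataset names are placeholders.

<AMPDF ACTIVITY="search" SOURCE="C:\temp\myDocument.pdf" FIND="PDF Security" RESULTDATASET="listResults" PAGE="1,3,5" />

<AMPDF ACTIVITY="search" SOURCE="C:\temp\myDocument.pdf" FIND="PDF Security" RESULTDATASET="rangeResults" PAGE="5-10" />

Omitting PAGE presumably corresponds to the All option, since All has no markup of its own.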

Datasets

Similar to a database or spreadsheet, a dataset is a multi-column, multi-row container object that holds a collection of data. Each column represents a particular variable (e.g., name, type, value). Each row corresponds to a given member of the dataset in question. This activity creates and populates a dataset with the following fields:

| Name | Type | Return Value |
| --- | --- | --- |
| theDataset.FontIsAccessible | True/False | Indicates whether the font representing the found text is present (installed) in the system. |
| theDataset.FontIsEmbedded | True/False | Indicates whether the font representing the found text is embedded. |
| theDataset.FontIsSubset | True/False | Specifies whether the font representing the found text is a subset. |
| theDataset.FontName | Text | The name of the font matching the found text. |
| theDataset.FontSize | Number | The size of the font matching the found text. |
| theDataset.ForegroundColor | Text (Color) | The HTML color value of the font matching the found text represented in |
| theDataset.Page | Number | The page number where the matching text was found. |
| theDataset.Position | Number | The position of the matching text represented in hash code. |
| theDataset.Text | Text | The text found. |
| theDataset.XIndent | Number | The X coordinate of the found text. |
| theDataset.YIndent | Number | The Y coordinate of the found text. |

Example

The sample AML code below can be copied and pasted directly into the Steps panel of the Task Builder.

Description: Search the existing PDF "C:\temp\myDocument.pdf" for the text "PDF Security" on pages 1-5. Create and populate the dataset "theDataset" with the search results.

<AMPDF ACTIVITY="search" SOURCE="C:\temp\myDocument.pdf" FIND="PDF Security" RESULTDATASET="theDataset" PAGE="1-5" />