PDF - Search

Declaration

<AMPDF ACTIVITY="search" SESSION="text" SOURCE="text" PASSWORD="text (encrypted)" FIND="text" RESULTDATASET="text" REGEX="YES/NO" PAGE="number" />

Description

Searches for the occurrence of one or more text strings (that is, particular characters, words, or patterns of characters) in a PDF file and then populates a dataset with the results. Regular expressions can be used to provide a more concise and flexible method of finding text.

Practical usage

Commonly used to locate a word or phrase inside a PDF file in order to retrieve relevant data (for example, the pages where the text was found, font information, or text position), or to perform other operations that reference the found text (for example, Text - Insert, Clipboard - Copy, Clipboard - Paste).

Parameters

Resource

Property | Type | Required | Default | Markup | Description
Resource | --- | --- | --- | --- | Specifies the source of the PDF file. The available options are:
  • File (default) - The source is a PDF file located on the system. This option is normally selected if only a single activity is required to complete the operation.
  • Session - The source PDF is obtained from a pre-configured session created in an earlier step with the PDF - Create session activity. This option is normally selected if a combination of related activities is required to complete an operation. Consolidating several activities into a single session can eliminate redundancy. Moreover, a single task supports multi-session execution, which can improve efficiency and speed up production.
NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes.
Session | Text | Yes, if the Resource parameter is set to Session | PDFSession1 | SESSION="mySession" | The existing session to associate with this activity. This parameter becomes active and is required if the Resource parameter is set to Session.
Source PDF | Text | Yes, if the Resource parameter is set to File | (Empty) | SOURCE="C:\temp\source.pdf" | The path and file name of the PDF file in which to search for text strings. This parameter becomes active and is required if the Resource parameter is set to File.
Password (optional) | Text | Yes, if the Resource parameter is set to File | (Empty) | PASSWORD="encrypted" | The password needed to open the existing PDF file, if the file is password-protected.
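
As a rough illustration of the two Resource options, the steps below use only the attributes listed in the Declaration. The first reads a password-protected file directly; the second reuses an existing session. The file path, the session name PDFSession1, and the dataset name are placeholders, the PASSWORD value stands in for whatever encrypted string the Task Builder generates, and the session is assumed to have been created earlier in the task with the PDF - Create session activity (not shown here).

```xml
<!-- Resource = File: open the PDF directly; PASSWORD is only needed for protected files -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" PASSWORD="encrypted" FIND="PDF Security" RESULTDATASET="theDataset" />

<!-- Resource = Session: search the PDF held by a previously created session -->
<AMPDF ACTIVITY="search" SESSION="PDFSession1" FIND="PDF Security" RESULTDATASET="theDataset" />
```

The session form omits SOURCE and PASSWORD because, per the table above, those parameters apply only when Resource is set to File.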

Criteria

Property | Type | Required | Default | Markup | Description
Find | Text | Yes | (Empty) | FIND="PDF Security" | The text string to search for in the PDF file.
Use regular expression | Yes/No | No | No | REGEX="YES" | If selected, indicates that the value entered in the Find parameter is a regular expression. If disabled (default), the value is treated as literal text.
Create and populate dataset | Text | Yes | (Empty) | RESULTDATASET="theDataset" | The name of the dataset to create and populate with search results. See Datasets for more information on the fields this dataset creates.
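
The difference between a literal search and a regular expression search can be sketched with two steps that differ only in the REGEX attribute. The file path, the search pattern, and the dataset names are illustrative placeholders. The exact regular expression dialect is not stated in this topic, so the pattern sticks to widely supported constructs (\d for a digit, + for one or more) and assumes they are accepted.

```xml
<!-- Literal search (default): matches the exact text "Invoice 1001" only -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="Invoice 1001" RESULTDATASET="literalResults" />

<!-- Regular expression search: assumed to match "Invoice " followed by one or more digits -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="Invoice \d+" RESULTDATASET="regexResults" REGEX="YES" />
```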

Page

Property | Type | Required | Default | Markup | Description
Page range | --- | Yes | All | --- | Specifies the pages to search for the text string in the PDF file. The available options are:
  • All - Searches for the text string on all pages in the PDF file.
  • Page(s) - Searches for the text string on one or more specific pages in the PDF file.
NOTE: This parameter does not contain markup and is only displayed in visual mode for task construction and configuration purposes.
Page(s) | Text | Yes, if Page range is set to Page(s) | (Empty) | PAGE="1,3,5" | If enabled, specifies the pages to search for the text string in the PDF file. For a single page, enter the page number. Use a comma (,) to specify more than one page (for example, 1,3,5). Use a dash (-) to specify a range of pages (for example, 5-10).
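
The page formats described above can be written out as follows. The file path and dataset name are placeholders; the PAGE values are the ones given in the table. Omitting PAGE is assumed to correspond to the All option, since All is the default and has no markup of its own.

```xml
<!-- All pages (default): no PAGE attribute -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="PDF Security" RESULTDATASET="theDataset" />

<!-- A single page -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="PDF Security" RESULTDATASET="theDataset" PAGE="3" />

<!-- Several individual pages, separated by commas -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="PDF Security" RESULTDATASET="theDataset" PAGE="1,3,5" />

<!-- A contiguous range of pages, written with a dash -->
<AMPDF ACTIVITY="search" SOURCE="C:\temp\source.pdf" FIND="PDF Security" RESULTDATASET="theDataset" PAGE="5-10" />
```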

Additional notes

Datasets

A dataset is a multi-column, multi-row container object. In addition to the standard dataset fields, this activity creates and populates a dataset containing a specific set of fields. The table below describes these fields (assuming the dataset was named theDataset).

Name | Type | Return Value
theDataset.FontIsAccessible | True/False | Indicates whether the font representing the found text is present (installed) in the system.
theDataset.FontIsEmbedded | True/False | Indicates whether the font representing the found text is embedded.
theDataset.FontIsSubset | True/False | Specifies whether the font representing the found text is a subset.
theDataset.FontName | Text | The name of the font matching the found text.
theDataset.FontSize | Number | The size of the font matching the found text.
theDataset.ForegroundColor | Text (Color) | The HTML color value of the font matching the found text represented in
theDataset.Page | Number | The page number where the matching text was found.
theDataset.Position | Number | The position of the matching text represented in hash code.
theDataset.Text | Text | The text found.
theDataset.XIndent | Number | The X coordinate of the found text.
theDataset.YIndent | Number | The Y coordinate of the found text.

Example

NOTE:
  • Copy and paste the sample AML code below directly into the Task Builder Steps Panel.
  • To successfully run the sample code, update parameters containing user credentials, files, file paths, or other information specific to the task to match your environment.

Description

This sample task searches a PDF file for a text string on multiple pages and then creates and populates a dataset with the results.

<AMPDF ACTIVITY="search" SOURCE="C:\temp\myDocument.pdf" FIND="PDF Security" RESULTDATASET="theDataset" PAGE="1-5" />