OCR - Get Lines
Declaration
<AMOCR ACTIVITY="get_lines" IMAGE="text" ALLPAGE="yes/no" TOP="number" LEFT="number" WIDTH="number" HEIGHT="number" RESULTDATASET="text" />
Description: Retrieves lines of text from an image and populates a dataset with the results.
Practical Usage
Can be used along with the OCR - Get Words, Screen Capture and Move Mouse activities to find the X, Y position of a word inside an object (e.g., a button or control) in a window that does not expose the Windows API (e.g., a Java window), and then move the mouse to the correct location.
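The last step of that workflow, turning a recognized line into a mouse target, is plain arithmetic on the dataset fields described under Datasets below. The Python sketch below is illustrative only (it is not AML and not part of the product). It assumes the dataset rows are available as dictionaries keyed by the field names (Text, Top, Left, Width, Height); the sample rows, the `capture_left`/`capture_top` offset and both helper functions are hypothetical. Because Get Lines returns whole lines, the point found is the center of the line containing the word; the word's own position would come from OCR - Get Words.

```python
from typing import Optional

# Hypothetical dataset rows, keyed by the field names documented under Datasets.
rows = [
    {"PageIndex": 1, "LineIndex": 1, "Text": "File Edit View",
     "Top": 10, "Left": 20, "Width": 200, "Height": 16},
    {"PageIndex": 1, "LineIndex": 2, "Text": "Click OK to continue",
     "Top": 40, "Left": 30, "Width": 180, "Height": 20},
]


def find_click_point(rows: list, target: str,
                     match_case: bool = False) -> Optional[tuple]:
    """Return the (x, y) center of the first line whose text contains target."""
    needle = target if match_case else target.lower()
    for row in rows:
        text = row["Text"] if match_case else row["Text"].lower()
        if needle in text:
            # Center of the line's bounding box, in image pixel coordinates.
            return (row["Left"] + row["Width"] // 2,
                    row["Top"] + row["Height"] // 2)
    return None


def to_screen(point: tuple, capture_left: int, capture_top: int) -> tuple:
    """Shift image coordinates by the (hypothetical) screen position of the capture."""
    return (point[0] + capture_left, point[1] + capture_top)


point = find_click_point(rows, "ok")                    # case-insensitive: second line
print(point)                                            # (120, 50)
print(find_click_point(rows, "ok", match_case=True))    # None
print(to_screen(point, 100, 200))                       # (220, 250)
```

The `match_case` flag only mimics the idea behind the Match case parameter; exactly how the activity's own Filter matches text is not specified here.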
General Parameters
Property | Type | Required | Default | Markup | Description |
---|---|---|---|---|---|
Image | Text | Yes | (Empty) | IMAGE="C:\temp\Image.jpg" | The path and filename of the image file from which to retrieve one or more lines of text. Supported image formats are JPG, PNG, TIFF, GIF and BMP. NOTE: Although a variety of formats are supported, image data with lossless compression, such as TIFF, is recommended. |
Entire image/page | | | | | If enabled, lines of text are searched for within the entire image/page (enabled by default). This is a visual mode parameter used only at design time; therefore, it has no markup. |
Specified region (improves accuracy) | | | | | If enabled, only a specific region of the image is searched. Press Pick Region to open a dialog that lets you select an image region. See Pick Region Dialog for more details. This is a visual mode parameter used only at design time; therefore, it has no markup. |
Top | Number | Yes if specifying region | (Empty) | TOP="223" | The top-most pixel coordinate of the region to search. This parameter is active only if the Specified region parameter is enabled. |
Left | Number | Yes if specifying region | (Empty) | LEFT="115" | The left-most pixel coordinate of the region to search. This parameter is active only if the Specified region parameter is enabled. |
Width | Number | Yes if specifying region | (Empty) | WIDTH="647" | The width of the region to search, in pixels. This parameter is active only if the Specified region parameter is enabled. |
Height | Number | Yes if specifying region | (Empty) | HEIGHT="647" | The height of the region to search, in pixels. This parameter is active only if the Specified region parameter is enabled. |
Filter (optional) | Text | No | (Empty) | FILTER="text" | The text filter to apply to the lines retrieved by this operation. |
Match case | Yes/No | No | No | MATCHCASE="yes" | If set to YES, the filter is case-sensitive. Set to NO by default. |
Create and populate dataset with line(s) information | Text | Yes | (Empty) | RESULTDATASET="theLines" | The name of the dataset to create and populate with information about the retrieved line(s). See Datasets below for more details. |
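When a region is picked, the step description reports it as a rectangle in the form {X=…, Y=…, Width=…, Height=…} (see the Example below), whereas the markup uses TOP, LEFT, WIDTH and HEIGHT. The following Python sketch converts one into the other and rejects regions that fall outside the image. It assumes X is the horizontal offset (LEFT) and Y the vertical offset (TOP), both in pixels from the image's top-left corner; the helper names and the image size are hypothetical, and the generated step uses only attributes shown in the Declaration.

```python
from typing import Dict, Optional
from xml.sax.saxutils import quoteattr


def region_attributes(x: int, y: int, width: int, height: int,
                      image_width: int, image_height: int) -> Dict[str, str]:
    """Map an {X, Y, Width, Height} rectangle to TOP/LEFT/WIDTH/HEIGHT attributes.

    Assumes X is the horizontal offset (LEFT) and Y the vertical offset (TOP).
    """
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("region needs a non-negative origin and a positive size")
    if x + width > image_width or y + height > image_height:
        raise ValueError("region extends past the image bounds")
    return {"TOP": str(y), "LEFT": str(x), "WIDTH": str(width), "HEIGHT": str(height)}


def get_lines_markup(image: str, dataset: str,
                     region: Optional[Dict[str, str]] = None) -> str:
    """Build a hypothetical get_lines step from the Declaration's attributes."""
    attrs = {"ACTIVITY": "get_lines", "IMAGE": image}
    if region:
        attrs.update(region)
    attrs["RESULTDATASET"] = dataset
    # quoteattr wraps each value in quotes and escapes &, < and >.
    body = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<AMOCR {body} />"


region = region_attributes(115, 223, 647, 647, image_width=1024, image_height=1024)
print(get_lines_markup(r"C:\temp\Image.jpg", "theLines", region))
```

With the sample values, this prints a step whose TOP, LEFT, WIDTH and HEIGHT match the Markup column of the table above.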
Advanced Parameters
Property | Type | Required | Default | Markup | Description |
---|---|---|---|---|---|
Page Range All | Yes/No | No | Yes | ALLPAGE="yes" | If set to YES, lines of text are retrieved from all pages of the image (YES by default). If this parameter is set to YES, the Pages parameter is ignored. |
Page Range Pages | Number | No | No | | If enabled, lines of text are retrieved only from the specified pages (disabled by default). Supports specification of a single page, specific pages or a sequence of pages in a range. Note that only GIF images support multiple pages. If this parameter is enabled, the All parameter is ignored. |
Languages | Yes/No | No | English | SPANISH="YES" PORTUGUESE="YES" | The language(s) of the text contained in the image file that should be read. Each language is enabled with its own Yes/No attribute, as shown in the Markup column. Available languages are: |
Use ICR (digits only) | Yes/No | No | No | ICR="YES" | If set to YES, ICR (Intelligent Character Recognition), a more advanced handwriting recognition system, is used to recognize numbers or digits. Set to NO by default. |
Invert image colors | Yes/No | No | No | INVERT="YES" | If set to YES, image colors are inverted: light areas become dark and dark areas become light. If this activity has trouble recognizing words, inverting the image may increase the contrast of the text and thus improve recognition accuracy. |
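For an 8-bit grayscale image, inverting colors means replacing each pixel value v with 255 - v, so light text on a dark background becomes dark text on a light background. The Python sketch below shows only that arithmetic on a tiny hypothetical image held as nested lists; it is not how the activity implements the option, and a color image would apply the same step to each channel.

```python
def invert(image: list, max_value: int = 255) -> list:
    """Return a new image in which every pixel value v becomes max_value - v."""
    return [[max_value - v for v in row] for row in image]


# Hypothetical 2x3 grayscale patch: light text pixels (250) on a dark background (20).
patch = [
    [20, 250, 20],
    [20, 250, 250],
]
print(invert(patch))  # [[235, 5, 235], [235, 5, 5]]
```

Applying the inversion twice returns the original image, which is why the option is safe to toggle while experimenting with recognition quality.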
Datasets
A dataset is a multiple-column, multiple-row container object. This activity creates and populates a dataset containing a specific set of fields. The table below describes these fields (assuming the dataset is named theDataset).
Name | Type | Return Value |
---|---|---|
theDataset.PageIndex | Number | Returns the index of the page containing the line. |
theDataset.LineIndex | Number | Returns the index of the line. |
theDataset.Text | Text | Returns the retrieved text of the line. |
theDataset.Top | Number | Returns the top-most pixel coordinate of the line. |
theDataset.Left | Number | Returns the left-most pixel coordinate of the line. |
theDataset.Width | Number | Returns the width of the line in pixels. |
theDataset.Height | Number | Returns the height of the line in pixels. |
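Because each row of the dataset describes one line, the rows can be regrouped by PageIndex and ordered by LineIndex to rebuild the recognized text page by page. The Python sketch below assumes the rows are available as dictionaries keyed by the field names in the table above (only the fields it needs are shown); the sample rows and the `text_by_page` helper are hypothetical, and it makes no assumption about whether the indexes start at 0 or 1.

```python
from collections import defaultdict


def text_by_page(rows: list) -> dict:
    """Group rows by PageIndex and join each page's lines in LineIndex order."""
    pages = defaultdict(list)
    for row in rows:
        pages[row["PageIndex"]].append(row)
    return {
        page: "\n".join(r["Text"] for r in sorted(lines, key=lambda r: r["LineIndex"]))
        for page, lines in sorted(pages.items())
    }


# Hypothetical rows, deliberately out of order.
rows = [
    {"PageIndex": 2, "LineIndex": 1, "Text": "Page two"},
    {"PageIndex": 1, "LineIndex": 2, "Text": "second line"},
    {"PageIndex": 1, "LineIndex": 1, "Text": "First line"},
]
print(text_by_page(rows))  # {1: 'First line\nsecond line', 2: 'Page two'}
```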
Example
The sample AML code below can be copied and pasted directly into the Steps panel of the Task Builder.
Description: Get line(s) from image "C:\Sample OCR\OCR1.TIF" and populate dataset "theLine" using OCR. Selected region is "{X=278,Y=251,Width=1911,Height=545}".
<AMOCR ACTIVITY="get_lines" IMAGE="C:\Sample OCR\OCR1.TIF" ALLPAGE="yes" TOP="251" LEFT="278" WIDTH="1911" HEIGHT="545" FILTER="text" RESULTDATASET="theLine" />