OCR - Get line(s)

Declaration

<AMOCR ACTIVITY="get_lines" IMAGE="text" ALLPAGE="yes/no" 
TOP="number" LEFT="number" WIDTH="number" 
HEIGHT="number" RESULTDATASET="text" />

Description: Retrieves line information from an image and populates a dataset with the results.

IMPORTANT: GPL GhostScript (32-bit) library is required to support PDF files. You can download the installer for the 32-bit Windows version at https://www.ghostscript.com/download/gsdnld.html.

Practical Usage

Can be used with the OCR - Get words, Screen capture and Move mouse activities to find the X, Y position of a word inside an object (e.g., a button or control) in a window that does not use the Windows API (e.g., a Java window), and then move the mouse to that location.
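The arithmetic behind this workflow can be sketched in a few lines of Python. This is only an illustration, not part of the product: line_center and capture_origin are hypothetical names, and the sketch assumes the Top, Left, Width and Height values returned for a line are pixel values relative to the captured image.

```python
def line_center(top, left, width, height, capture_origin=(0, 0)):
    """Return the (x, y) screen point at the centre of a bounding box.

    capture_origin is the (x, y) screen position of the captured image's
    top-left corner. It is a hypothetical input: how you obtain it depends
    on how the screen capture was taken.
    """
    origin_x, origin_y = capture_origin
    x = origin_x + left + width // 2
    y = origin_y + top + height // 2
    return x, y


# Made-up values: a line at Left=115, Top=223, 200 px wide and 20 px tall,
# inside a window that was captured at screen position (300, 100).
target = line_center(top=223, left=115, width=200, height=20,
                     capture_origin=(300, 100))
print(target)
```

The resulting point is what would be handed to a mouse-move step; aiming at the centre of the box rather than its corner leaves some tolerance for small OCR positioning errors.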

General Parameters

Property	Type	Required	Default	Markup	Description
Image	Text	Yes	(Empty)	IMAGE="C:\temp\Image.jpg"	The path and filename of the image file to retrieve one or more lines of text from. Supported image formats are JPG, PNG, TIFF, GIF and BMP. NOTE: Although a variety of formats are supported, image data with lossless compression, such as TIFF, is recommended.
Entire image/page					If enabled, lines of text will be searched within the entire image/page (enabled by default). This is a visual mode parameter used only at design time and therefore has no markup.
Specified region (improves accuracy)					If enabled, a specific region of the image will be searched. Press Pick Region to open a dialog allowing you to select an image region. See Pick Region Dialog for more details. This is a visual mode parameter used only at design time and therefore has no markup.
Top	Number	Yes if specifying region	(Empty)	TOP="223"	The topmost pixel coordinate of the region to search. This parameter is active only if the Specified region parameter is enabled.
Left	Number	Yes if specifying region	(Empty)	LEFT="115"	The leftmost pixel coordinate of the region to search. This parameter is active only if the Specified region parameter is enabled.
Width	Number	Yes if specifying region	(Empty)	WIDTH="647"	The total width of the region to search, in pixels. This parameter is active only if the Specified region parameter is enabled.
Height	Number	Yes if specifying region	(Empty)	HEIGHT="647"	The total height of the region to search, in pixels. This parameter is active only if the Specified region parameter is enabled.
Filter (optional)	Text	No	(Empty)	FILTER="text"	The filter to use on this operation.
Match case	Yes/No	No	No	MATCHCASE="yes"	If set to YES, the operation becomes case sensitive. Set to NO by default.
Create and populate dataset with word(s) information	Text	Yes	(Empty)	RESULTDATASET="theDataset"	The name of the dataset to create and populate with information about the retrieved line(s). See Datasets below for more details.
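This page does not say what happens when a specified region extends past the edge of the image, so a defensive check before filling in Top, Left, Width and Height may be worthwhile. The Python sketch below is an assumption-laden illustration: clamp_region is a hypothetical helper, and the image dimensions must be obtained by some other means.

```python
def clamp_region(top, left, width, height, image_width, image_height):
    """Clip a region to the image bounds.

    Returns a dict keyed by the markup attribute names, or None when the
    region lies entirely outside the image.
    """
    # Compute the far edges from the original values before clipping.
    right = min(left + width, image_width)
    bottom = min(top + height, image_height)
    left = max(left, 0)
    top = max(top, 0)
    if right <= left or bottom <= top:
        return None
    return {"TOP": top, "LEFT": left,
            "WIDTH": right - left, "HEIGHT": bottom - top}


# Made-up image size of 700 x 800 pixels; the region values are the ones
# used in the Markup column above.
print(clamp_region(223, 115, 647, 647, image_width=700, image_height=800))
```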

Advanced Parameters

Property	Type	Required	Default	Markup	Description
Page Range All	Yes/No	No	Yes	ALLPAGE="yes"	If set to YES, lines of text will be retrieved from all pages in a range (YES by default). If this parameter is set to YES, the Pages parameter is ignored.
Page Range Pages	Number	No	No	PAGE="1-3" PAGE="2,4,6" PAGE="3"	If set to YES, lines of text will be retrieved from specific pages in a range (NO by default). Supports specification of a single page, specific pages or a sequence of pages in a range (see the Markup column for examples). Note that only GIF images support multiple pages. If this parameter is set to YES, the All parameter is ignored.
Languages	Yes/No	No	English	SPANISH="YES" PORTUGUESE="YES"	The language(s) of the word(s) contained in the image file that should be read. Available languages are: English, Spanish, Portuguese, French, Russian, Dutch, German and Italian.
Use ICR (digits only)	Yes/No	No	No	ICR="YES"	If set to YES, ICR (Intelligent Character Recognition), a more advanced handwriting recognition system, will be used to recognize numbers or digits. Set to NO by default.
Invert image colors	Yes/No	No	No	INVERT="YES"	If set to YES, image colors are inverted: light becomes dark and dark becomes light. If this activity has trouble recognizing words, inverting the image may increase the contrast of the text and improve recognition accuracy.
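To make the three page specification forms shown for the Pages parameter concrete, the Python sketch below expands a specification into a list of page numbers. It is illustrative only: parse_pages is a hypothetical helper, and accepting a mix of ranges and single pages in one string is an assumption that goes beyond the three forms documented above.

```python
def parse_pages(spec):
    """Expand a page specification such as "1-3", "2,4,6" or "3"
    into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


for example in ("1-3", "2,4,6", "3"):
    print(example, parse_pages(example))
```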

Datasets

A dataset is a multiple column, multiple row container object. This activity creates and populates a dataset containing a specific set of fields. The table below describes these fields (assuming the dataset name assigned was theDataset).

Name	Type	Return Value
theDataset.PageIndex	Number	Returns the page index.
theDataset.LineIndex	Number	Returns the line index.
theDataset.Text	Text	Returns the retrieved text.
theDataset.Top	Number	Returns the topmost pixel coordinate of the retrieved line.
theDataset.Left	Number	Returns the leftmost pixel coordinate of the retrieved line.
theDataset.Width	Number	Returns the width of the retrieved line in pixels.
theDataset.Height	Number	Returns the height of the retrieved line in pixels.
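One way to picture the dataset is as a list of rows, each carrying the fields in the table above. The Python sketch below models that shape and searches it for a line containing a given string. The row values are made up, find_lines is a hypothetical helper, and the real dataset is an Automate object rather than a Python list.

```python
def find_lines(rows, needle, match_case=False):
    """Return the rows whose Text field contains needle."""
    if not match_case:
        needle = needle.lower()
    hits = []
    for row in rows:
        text = row["Text"] if match_case else row["Text"].lower()
        if needle in text:
            hits.append(row)
    return hits


# Made-up rows using the field names from the table above.
rows = [
    {"PageIndex": 1, "LineIndex": 1, "Text": "Invoice Number 1042",
     "Top": 40, "Left": 30, "Width": 310, "Height": 22},
    {"PageIndex": 1, "LineIndex": 2, "Text": "Total due 99.50",
     "Top": 70, "Left": 30, "Width": 240, "Height": 22},
]

for row in find_lines(rows, "total"):
    print(row["LineIndex"], row["Left"], row["Top"])
```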

Example

The sample AML code below can be copied and pasted directly into the Steps panel of the Task Builder.

Description: Get line(s) from image "C:\Sample OCR\OCR1.TIF" and populate dataset "theLine" using OCR. Selected region is "{X=251,Y=278,Width=1911,Height=545}".

<AMOCR ACTIVITY="get_lines" IMAGE="C:\Sample OCR\OCR1.TIF"
ALLPAGE="yes" TOP="278" LEFT="251"
WIDTH="1911" HEIGHT="545" FILTER="text"
RESULTDATASET="theLine" />
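When reviewing or auditing saved steps outside the Task Builder, a step like this can be read with a general-purpose XML parser. The Python sketch below is an illustration under one assumption: that the step is a single well-formed XML element, which holds for steps of the shape shown on this page but is not claimed for AML in general. read_region is a hypothetical helper, and the attribute values are the ones from the Markup column of the parameter tables.

```python
import xml.etree.ElementTree as ET

# A get_lines step built from the Markup column examples on this page.
STEP = (
    r'<AMOCR ACTIVITY="get_lines" IMAGE="C:\temp\Image.jpg" '
    r'ALLPAGE="yes" TOP="223" LEFT="115" WIDTH="647" HEIGHT="647" '
    r'RESULTDATASET="theDataset" />'
)


def read_region(aml):
    """Return the region attributes of a get_lines step as integers,
    or None when the step does not specify a region."""
    element = ET.fromstring(aml)
    if element.tag != "AMOCR" or element.get("ACTIVITY") != "get_lines":
        raise ValueError("not an OCR get_lines step")
    names = ("TOP", "LEFT", "WIDTH", "HEIGHT")
    if not all(name in element.attrib for name in names):
        return None
    return {name: int(element.attrib[name]) for name in names}


print(read_region(STEP))
```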

v10 | 202208121109

Copyright Help/Systems LLC and its group of companies.
All trademarks and registered trademarks are the property of their respective owners.