Pick Region dialog (Tesseract OCR Engine)
Overview
The Pick Region dialog lets you select a specific region within an image so that the OCR action retrieves text more precisely.
To access the Pick Region dialog for the Tesseract OCR Engine
- From the Task Builder, expand the OCR action, and then select the desired OCR activity.
- Click the OCR Engine box, and then select Tesseract.
- Select the Specified region (improves accuracy) parameter, and then click Pick Region.
To pick a region using the Tesseract OCR Engine
- Click the Languages box, and then select the languages that correspond with the text you want to retrieve from the image.
- (Optional) For the Get text activity, select Exact copy (do not format text) to retrieve the text exactly as it appears in the image.
- In the Image Preview pane, hold down the left mouse button, and then drag the mouse pointer to draw a box around the region you want to retrieve text from.
- The OCR Preview pane updates and displays a preview of the text retrieved from the specified region. If the desired text was not retrieved, repeat the previous step to draw a new region, or click the Page Segmentation Mode box and select a mode that better matches how the words are laid out in the image. The available modes are:
- Auto Osd - Provides automatic page segmentation with orientation and script detection (OSD).
- Single Column - Assumes the lines of text in the image are in a single column of text, varying in size.
- Single Block Vertical Text - Assumes the lines of text in the image are in a single uniform block of vertically aligned text.
- Single Block (default) - Assumes the lines of text in the image are in a single uniform block of text.
- Single Line - Assumes the entire image only contains a single line of text.
- Single Word - Assumes the image only contains a single word.
- Circle Word - Assumes the image only contains a single word contained within a circle.
- Single Char - Assumes the image only contains a single character.
- Sparse Text - Scans the entire image to retrieve as much text as possible in no particular order.
- Sparse Text Osd - Scans the entire image to retrieve as much text as possible using orientation and script detection (OSD).
- Raw Line - Assumes the image only contains a single line of text, bypassing hacks that are Tesseract-specific.
- To edit the region without using your mouse, manually adjust the Top, Left, Width, and Height pixel coordinates, and then click Refresh to update the Image Preview pane.
- If you change the Languages or Exact copy settings, or change the page in a multipage file after picking a region, click Refresh to update the Image Preview pane.
- When you are finished, click OK to set the region.
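For readers who want to reproduce a picked region outside the dialog, the settings above can be sketched in terms of the standalone Tesseract command line. This is a minimal illustration, not a description of how the OCR action calls Tesseract internally. The --psm numbers are Tesseract's own page segmentation mode values, but pairing them with the dialog's mode names is an assumption based on the matching descriptions. The helper names region_to_box and tesseract_command, the file name page.png, and the language codes eng and fra are hypothetical.

```python
# Assumed pairing of the dialog's mode names with Tesseract --psm values.
PSM_BY_MODE = {
    "Auto Osd": 1,
    "Single Column": 4,
    "Single Block Vertical Text": 5,
    "Single Block": 6,  # the dialog's default mode
    "Single Line": 7,
    "Single Word": 8,
    "Circle Word": 9,
    "Single Char": 10,
    "Sparse Text": 11,
    "Sparse Text Osd": 12,
    "Raw Line": 13,
}


def region_to_box(top, left, width, height):
    """Convert Top/Left/Width/Height pixel values to (left, upper, right, lower).

    Many imaging libraries expect this corner-based form when cropping.
    """
    if top < 0 or left < 0:
        raise ValueError("top and left must not be negative")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)


def tesseract_command(image_path, languages, mode="Single Block"):
    """Build a Tesseract CLI argument list for an already cropped image.

    Multiple languages are joined with '+', as the -l option expects.
    """
    if not languages:
        raise ValueError("select at least one language")
    return [
        "tesseract", image_path, "stdout",
        "-l", "+".join(languages),
        "--psm", str(PSM_BY_MODE[mode]),
    ]


# Example: a 100 x 50 pixel region whose top-left corner is at (20, 10).
box = region_to_box(top=10, left=20, width=100, height=50)
command = tesseract_command("page.png", ["eng", "fra"], mode="Single Line")
print(box)
print(" ".join(command))
```

The sketch only builds the crop box and the argument list. Cropping the image and running the command are left to whichever imaging library and process runner you use.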