What is Optical Character Recognition?
Optical Character Recognition, or OCR, is a technology that converts different types of documents, such as scanned paper documents or PDF files, into editable and searchable data.
In essence, OCR is technology that has been trained to take image-based files, recognise the information within them and then convert it into digital data.
How does it work?
When a document, say a PDF, is loaded into the software, it is analysed and divided into different elements such as text blocks, tables and images. The software then divides the lines of text into words and the words into characters. Finally, it applies pattern recognition to work out what each extracted character is and presents you with the recognised text.
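The segmentation step can be illustrated with a toy sketch. It assumes the page has already been reduced to a small black-and-white grid (the `PAGE` data and the `runs` and `segment` helpers are hypothetical, made up for this illustration) and it goes straight from lines to characters, skipping the word step. Real OCR engines use far more sophisticated layout analysis, so treat this as an illustration of the idea rather than how any particular product works.

```python
# Toy page: '#' is ink, '.' is background. Hypothetical data with two text lines.
PAGE = [
    "..........",
    ".##..#.#..",
    ".##..#.#..",
    "..........",
    ".#..###...",
    ".#..###...",
    "..........",
]


def runs(flags):
    """Return (start, end) index pairs for each consecutive run of True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment(page):
    """Split a page into text lines, then each line into character boxes."""
    # A row belongs to a text line if it contains any ink.
    line_spans = runs(["#" in row for row in page])
    result = []
    for top, bottom in line_spans:
        rows = page[top:bottom]
        # A column belongs to a character if any row of the line has ink there.
        col_has_ink = [
            any(row[x] == "#" for row in rows) for x in range(len(rows[0]))
        ]
        result.append(
            [(top, bottom, left, right) for left, right in runs(col_has_ink)]
        )
    return result


boxes = segment(PAGE)
for line_number, line in enumerate(boxes, start=1):
    print(f"line {line_number}: {len(line)} character boxes")
```

Each box is a `(top, bottom, left, right)` tuple; a pattern recognition step would then compare the pixels inside each box against known character shapes.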
Does this all happen automatically?
OCR technology gets smarter and more accurate as more data is fed into it. You essentially ‘train’ the software to recognise your documents and test its understanding before launch. The more data you can give it, the higher the accuracy. The software will even give you a confidence percentage showing how sure it is of its decisions.
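One common way to use the confidence figure is to accept high-confidence values automatically and send the rest to a person for review. The sketch below assumes each extracted field arrives as a value paired with a confidence between 0 and 1; the field names, the `route` helper and the 0.90 threshold are all hypothetical and would need tuning for real documents.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cut-off, tune per document type


def route(fields, threshold=REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted and needs-review by confidence."""
    accepted = {
        name: value for name, (value, conf) in fields.items() if conf >= threshold
    }
    review = {
        name: value for name, (value, conf) in fields.items() if conf < threshold
    }
    return accepted, review


# Hypothetical OCR output: field name -> (recognised text, confidence).
fields = {
    "invoice_number": ("INV-1001", 0.99),
    "total": ("1,250.00", 0.72),
}
accepted, review = route(fields)
print("accepted:", accepted)
print("needs review:", review)
```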
STP explained
STP stands for Straight Through Processing.
STP Rate = the share of documents that are processed end to end without human intervention, usually expressed as a percentage.
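A minimal sketch of how an STP rate might be calculated, assuming it is reported as a percentage of all documents received. The `stp_rate` function and the figures are hypothetical.

```python
def stp_rate(touchless, total):
    """Percentage of documents processed end to end without human intervention."""
    if total == 0:
        return 0.0  # avoid dividing by zero when nothing has been processed
    return 100.0 * touchless / total


# Hypothetical month: 1,000 documents received, 850 needed no human touch.
rate = stp_rate(touchless=850, total=1000)
print(f"STP rate: {rate:.1f}%")
```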
Most OCR software focuses only on giving accurate results. Combining OCR with RPA (Robotic Process Automation) allows documents to be digitised and the extracted information to be used automatically. For example, RPA could take the digital data and populate an ERP system, or run a match between Invoices and Purchase Orders.
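As an illustration of the Invoice and Purchase Order match, the sketch below compares the total on an extracted invoice with the total on the purchase order it refers to. The record layout, the status names and the `match_invoice` helper are assumptions made for this example; a real RPA workflow would read these values from the OCR output and the ERP system.

```python
from decimal import Decimal


def match_invoice(invoice, purchase_orders, tolerance=Decimal("0.00")):
    """Return 'matched', 'amount_mismatch' or 'no_po' for an extracted invoice."""
    po = purchase_orders.get(invoice["po_number"])
    if po is None:
        return "no_po"  # the invoice refers to an unknown purchase order
    if abs(po["total"] - invoice["total"]) <= tolerance:
        return "matched"
    return "amount_mismatch"


# Hypothetical data: purchase orders keyed by number, one OCR-extracted invoice.
purchase_orders = {"PO-42": {"total": Decimal("1250.00")}}
invoice = {"po_number": "PO-42", "total": Decimal("1250.00")}
status = match_invoice(invoice, purchase_orders)
print(status)
```

Only a "matched" invoice would continue straight through; the other two outcomes would typically be queued for a person to check.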
Frequently Asked Questions
What sorts of documents could you use OCR/RPA on?
There is a wide range of documents you could process with the combination of OCR and RPA. Think about documents such as Purchase Orders, Billing Statements, Contracts, Claims, Automobile Insurance Claims, Health Insurance Claims and Invoices.
What principles is OCR based on?
The most advanced optical character recognition systems are focused on replicating natural or “animal-like” recognition. At the heart of these systems lie three fundamental principles: Integrity, Purposefulness and Adaptability. The principle of integrity says that the observed object must always be considered as a “whole” consisting of many interrelated parts. The principle of purposefulness supposes that any interpretation of data must always serve some purpose. Lastly, the principle of adaptability means that the program must be capable of self-learning.
What type of files can be read by OCR software?
JPG or JPEG, PDF (Vector PDF, Raster PDF or Hybrid PDF), PNG, and TIF or TIFF.
Can OCR recognise digital camera images?
Images captured by a digital camera differ from scanned documents or image-only PDFs. They often have defects such as distortion at the edges and poor lighting, which make it more difficult for most OCR applications to recognise the text correctly. In their latest software updates, vendors have incorporated technology specifically designed for processing camera images.
Can OCR be used to extract data in table format?
Yes, OCR can be used to extract a virtually unlimited number of tables. Even where the same document contains multiple different table types, these can be extracted easily.
What if my scanned document is not correctly oriented?
Using its processing logic, the Learning Instance automatically rotates or orients the document to a correct vertical position.
Does OCR support handwritten documents?
It’s possible, but as a general rule of thumb we would advise that handwritten documents are not suitable for OCR extraction, and the same can even be said of cursive fonts.
How can I improve my OCR accuracy?
Effective pre-processing can greatly improve OCR results. Additionally, computer vision technology can use “context clues” to further increase accuracy.
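One simple pre-processing step is binarisation, which turns a greyscale scan into pure black and white so that the text stands out from the background. The sketch below uses a plain global threshold set to the mean pixel value, assuming the image is a small grid of 0 to 255 greyscale values. The `binarize` helper and the sample data are hypothetical, and production tools generally use more robust methods such as adaptive thresholding.

```python
def binarize(gray, threshold=None):
    """Map a greyscale grid (0-255 ints) to 1 for ink and 0 for background."""
    if threshold is None:
        # Global threshold: the mean brightness of all pixels in the image.
        pixels = [value for row in gray for value in row]
        threshold = sum(pixels) / len(pixels)
    # Pixels darker than the threshold are treated as ink.
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Hypothetical 3x3 scan: a dark diagonal stroke on a light, slightly uneven page.
gray = [
    [250, 245, 30],
    [240, 20, 235],
    [25, 248, 252],
]
binary = binarize(gray)
for row in binary:
    print(row)
```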
Can OCR detect lines, shapes, colours, etc. on a page?
No, even advanced OCR engines only retrieve text. To detect other features, you need more comprehensive data capture software.