06 Dec 2022

OCR Software For Invoices: All You Need To Know About Automated Invoice Workflows

Invoice OCR software combines object detection and text recognition techniques to transform images into processable text. Invoice recognition software delivers the best results when paired with data extraction, which focuses on pulling selected information (fields) obtained from the document automatically. Find out more about automating invoice workflows, and take a look at our comparison of OCR software products to help you determine the best choice for your business.

Read this article to learn:

  • What is an invoice workflow
  • What are automated invoices
  • What fields are commonly extracted from invoices
  • How to determine the best OCR software for your business

Prototypes of invoices date back to ancient Mesopotamia, around 2900 BCE. Back then, simple notes of transactions – often describing the terms of a barter – were carved on stone or clay slabs. The idea was simple – keeping a transaction record between two sides.

Today, sellers issue invoices to begin the procure-to-pay process. An invoice lists all goods or services, contains pricing information, and determines the companies participating in the transaction.

Needless to say, invoices are critical documents from the business perspective. They offer insight into a company's current cash flow, show all pending or escrow payments (the latter occurring as a result of a pro forma invoice), help validate the trustworthiness of vendors (whether they pay on time), and allow sold goods to be cross-checked against the inventory.

Altogether, the various tasks involved in processing invoices make up so-called invoice management – a set of frameworks that determine the whole range of responsibilities in this field of Accounts Payable.

While invoice management encapsulates the whole framework, including the division of tasks across team members, there is a specific chain of events that needs to happen to end the procure-to-pay cycle. That chain of events is called an invoice workflow.

What is an invoice workflow?

Invoice workflow, also referred to as invoice approval workflow, consists of all tasks between receiving an invoice and finalizing the payment. Handling invoices is the responsibility of AP teams, and manual invoice processing – transferring data to ERP systems – is often the go-to method.

Document automation, on the other hand, speeds up invoice workflows. AI assists users in understanding documents, classifying them, and extracting all the valuable information stored inside them.

The typical steps in an invoice workflow include the following:

  • Invoice matching
  • Invoice validation
  • Invoice approval or request for approval
  • Payment

Additional steps may be needed when discrepancies arise or when industry-specific requirements apply.

An invoice workflow consists of 6 stages. The document is received, matched, validated, approved, and paid, and its data is entered into an ERP or another internal system.

Invoice matching

Each payment of an invoice needs to be appropriately verified.

Invoice matching compares the information on an invoice with other documents, such as purchase orders. This is an essential step in an invoice workflow and is common in selling goods between suppliers and small and medium businesses.

Purchase orders are issued by the buyers and contain a detailed overview of all the goods that the buyer wishes to purchase. When the goods are sold, the seller issues an invoice. Before the payment is confirmed, the buyer needs to perform matching.

Matching confirms that the amounts on the purchase order and the invoice agree, meaning that each party pays and receives the agreed-upon amount.
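
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular AP platform performs matching: the `LineItem` record, the `sku` identifier, and the `match_invoice_to_po` helper are all hypothetical names, and real purchase orders carry many more attributes.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    sku: str              # hypothetical item identifier shared by PO and invoice
    quantity: int
    unit_price: Decimal   # Decimal avoids float rounding on money


def line_total(item: LineItem) -> Decimal:
    return item.quantity * item.unit_price


def match_invoice_to_po(po_items, invoice_items, tolerance=Decimal("0.00")):
    """Return a list of discrepancies; an empty list means the documents match."""
    po_by_sku = {item.sku: item for item in po_items}
    inv_by_sku = {item.sku: item for item in invoice_items}
    problems = []

    for sku in sorted(inv_by_sku.keys() - po_by_sku.keys()):
        problems.append(f"{sku}: invoiced but not ordered")
    for sku in sorted(po_by_sku.keys() - inv_by_sku.keys()):
        problems.append(f"{sku}: ordered but not invoiced")
    for sku in sorted(po_by_sku.keys() & inv_by_sku.keys()):
        ordered, invoiced = po_by_sku[sku], inv_by_sku[sku]
        if ordered.quantity != invoiced.quantity:
            problems.append(
                f"{sku}: quantity {invoiced.quantity} != ordered {ordered.quantity}"
            )
        if abs(line_total(ordered) - line_total(invoiced)) > tolerance:
            problems.append(
                f"{sku}: amount {line_total(invoiced)} != ordered {line_total(ordered)}"
            )
    return problems


po = [LineItem("A-100", 10, Decimal("2.50")), LineItem("B-200", 1, Decimal("40.00"))]
invoice = [LineItem("A-100", 10, Decimal("2.50")), LineItem("B-200", 1, Decimal("45.00"))]
print(match_invoice_to_po(po, invoice))
# ['B-200: amount 45.00 != ordered 40.00']
```

An empty result would let the invoice move on to validation; any entry in the list is a reason to hold the payment.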

Invoice validation

Invoice validation refers to the process of carefully checking each invoice for potential errors. While matching refers to comparing documents between themselves, validation guarantees that all required details – e.g., tax information, correct bank account number, SWIFT or IBAN code for international wires, addresses, and official business names – are in place. Validation is a security measure that prevents accounting errors. Here’s what can happen when such a check is not established:

  • You may overpay your suppliers as a result of incorrect information concerning the number of goods or services sold
  • You may delay payments if there is not enough data about either side of the trade
  • Last but not least, wrong calculation of taxes creates the danger of penalties
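
As a rough illustration of such a check, the sketch below verifies that required details are present and that an IBAN passes the standard mod-97 checksum. The field names in `REQUIRED_FIELDS` and the `validate_invoice` helper are assumptions made for this example; the checksum itself does not verify country-specific IBAN lengths or whether the account exists.

```python
# Hypothetical field names, chosen only for this example.
REQUIRED_FIELDS = ("seller_name", "seller_address", "tax_id", "iban")


def iban_checksum_ok(iban: str) -> bool:
    """Mod-97 check: a well-formed IBAN leaves a remainder of 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters map to two-digit numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def validate_invoice(fields: dict) -> list:
    """Return a list of validation errors; an empty list means all checks passed."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    if fields.get("iban") and not iban_checksum_ok(fields["iban"]):
        errors.append("iban fails checksum")
    return errors


print(validate_invoice({
    "seller_name": "Example Seller Ltd",
    "seller_address": "1 Example Street",
    "tax_id": "",
    "iban": "GB82 WEST 1234 5698 7654 32",
}))
# ['missing tax_id']
```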

Invoice approval

A self-explanatory part of the workflow is the moment when the AP team receives a confirmation that a given payment can be released. That’s the invoice approval.

Invoice data entry (also known as invoice-to-data)

Data entry starts with extracting information from documents and importing it into any given system. Thanks to intelligent document processing techniques, data entry is a highly automatable task.

While OCR software converts images – such as scanned invoices – into text, data extraction tools help Accounts Payable specialists build databases and source info for cash flow estimation purposes.

What are automated invoices?

Now that we have reviewed the main tasks in an invoice workflow, we can pinpoint the main areas where document automation helps.

Accounting automation is one of the most thriving fields in document automation. Since many documents circulating between companies are part of sales transactions, there is a pressing need to automate the most tedious and repetitive tasks in the workflow.

When invoice processing is automated, various types of software assist with it. That software may include OCR for invoices, document management platforms, General Ledger software, and more. Within the tasks we have described, invoice matching, validation, and data entry can all be supported by AI.

Note: Dig into the topic of automated invoices in Alphamoon’s guide.

OCR software helps with invoice recognition and invoice capture. The goal is to transform images – such as scanned invoices – into text, because content that exists only as pixels in an image cannot be processed further. Invoice OCR locates the various fields and is the necessary step before the AP automation software can move on to extracting information.

And here’s the best part – Alphamoon’s OCR and data extraction platform enables teams to save up to 70% of time usually lost during manual invoice processing.

Carry on to learn about other benefits.

Automated invoice processing workflow

Note: Curious about all business outcomes of the automated invoicing process? Read our take on the subject.

Benefits of automated invoices

As mentioned earlier in this article, automating invoices speeds up the work and takes the weight of tedious tasks off your team's shoulders. In particular:

  • All document-related workflows are faster and more secure
  • Paper use decreases, which helps companies save on energy and office supplies
  • Solutions based on Artificial Intelligence and Machine Learning become better over time
  • Employees have time and space to focus on more challenging tasks

What fields are commonly extracted from invoices?

One reason why invoices take so much time to process is that companies use all kinds of templates. For older OCR technology, where the software relies on a fixed position for each field, the diversity of invoice templates causes serious issues.

That’s why the most advanced OCR for invoices – such as Alphamoon – deploys machine learning and deep learning to enable the software to improve continually.

Alphamoon can extract the following information:

  • Seller/buyer name – official names of each party
  • Seller/buyer address – used for postage
  • Seller/buyer contact data – details including email address
  • Document issue date – information on the time when the seller generated the invoice
  • Document due date – deadline for the payment
  • Document number – used for tracking purposes
  • Items on the document – goods or services listed
  • Quantities of goods
  • Prices of goods
  • VAT ID – tax information
  • Summaries – total values
  • Signatures – handwritten pieces
  • Logo – elements of branding
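
To make the list above more concrete, here is one way the extracted fields could be modeled before being sent to an ERP or another internal system. It is only a sketch: the class and attribute names are invented for illustration and do not reflect Alphamoon's actual output schema, and signatures and logos are left out because they are image regions rather than text values.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Party:
    name: str
    address: Optional[str] = None
    email: Optional[str] = None
    vat_id: Optional[str] = None


@dataclass
class InvoiceLine:
    description: str
    quantity: Decimal
    unit_price: Decimal


@dataclass
class ExtractedInvoice:
    number: str
    issue_date: date
    due_date: Optional[date]
    seller: Party
    buyer: Party
    lines: List[InvoiceLine] = field(default_factory=list)
    total: Optional[Decimal] = None


def to_json(invoice: ExtractedInvoice) -> str:
    # default=str serialises dates and Decimals, which json cannot encode natively.
    return json.dumps(asdict(invoice), default=str, sort_keys=True)


example = ExtractedInvoice(
    number="INV-001",
    issue_date=date(2022, 12, 6),
    due_date=None,
    seller=Party("Example Seller Ltd"),
    buyer=Party("Example Buyer GmbH"),
    lines=[InvoiceLine("Widget", Decimal("2"), Decimal("9.99"))],
    total=Decimal("19.98"),
)
print(to_json(example))
```

Keeping optional fields as `None` rather than empty strings makes it easy to tell "not found on the document" apart from "found but blank" in later validation.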

Note: Looking for a solution to extract all of the above data from your invoices? Get in touch with us.

Alphamoon's data extraction feature, presented on a laptop screen

What is the best OCR software for invoices?

In order to provide the highest possible objectivity in testing various data extraction software for invoices, we used a publicly available dataset – The RVL-CDIP Dataset – and compared the following solutions:

  • ABBYY FlexiCapture for Invoices Cloud
  • Google Document AI Invoice Parser
  • Microsoft Azure Form Recognizer invoice model
  • Kofax AP Essentials for Invoice Automation

The RVL-CDIP dataset consists of various documents – letters, memos, emails, governmental forms, handwritten notes, advertisements, etc. – including a sample set of 188 invoices. The dataset is characterized by low quality, noise, and low resolution, typically 100 dpi. All of these issues seriously decrease the effectiveness of OCR software.

Since the documents were not prepared for evaluating information extraction tasks, the Alphamoon team annotated them manually to prepare the so-called ground truth. We have chosen seven fields for the data extraction test:

  • invoice number
  • invoice date
  • total amount
  • seller name
  • seller address
  • buyer name
  • buyer address

Some of the invoices used for this comparison test are shown below.

Examples of invoices from an outdated set of sample invoice templates
A sample set of invoices was used to compare data extraction providers. Source: own material.

After uploading the set on the platforms, the comparison results clearly indicated that Alphamoon achieved the highest accuracy score among all tested platforms.

Our findings:

  • Alphamoon’s accuracy score reached 82.5%, meaning that the vast majority of the seven defined fields were extracted correctly. In other words, fewer than one in five fields would require correction.
  • Microsoft Azure Form Recognizer invoice model achieved 75.4% accuracy. With Microsoft’s tool, roughly every fourth field would require human correction.
  • The two tools were followed by Google Document AI Invoice Parser at 68.1%, ABBYY FlexiCapture for Invoices Cloud at 51.9%, and Kofax AP Essentials for Invoice Automation at 15.8%.

Accuracy of data extraction from invoices. Graph compares performance of five brands: Kofax, Abbyy, Google, Microsoft, and Alphamoon
Accuracy score of Alphamoon, Microsoft, Google, ABBYY, and Kofax in a comparison of data extraction tools. Source: own material

F-score metric explained

As the leading metric for the test, we used an accuracy metric, the F-score:

$$F = \frac{TP}{TP + 0.5\,(FP + FN)}$$

TP = True Positives

FP = False Positives

FN = False Negatives

The F-score approximates the percentage of time each tool saves.

To be more specific, when a data extraction tool makes a mistake, it typically recognizes some other piece of information as the one that was intended. That single slip counts as two errors – one false positive (the extracted data is incorrect) and one false negative (the software did not capture the required information). Nonetheless, it still takes only one correction to amend. The F-score captures that nuance through the 0.5 factor in the denominator of the formula.
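
The arithmetic can be checked with a short sketch. It assumes the F-score takes the standard F1 form TP / (TP + 0.5 × (FP + FN)), which is consistent with the 0.5 factor described above; the `f_score` function name is ours.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 form of the F-score: TP / (TP + 0.5 * (FP + FN))."""
    denominator = tp + 0.5 * (fp + fn)
    return tp / denominator if denominator else 0.0


# Seven fields on one invoice: six extracted correctly, one replaced by a
# wrong value. The wrong value counts as one false positive AND one false
# negative, yet the score equals the share of fields needing no correction.
score = f_score(tp=6, fp=1, fn=1)
print(round(score, 4), round(6 / 7, 4))
```

Because the paired false positive and false negative are weighted by 0.5 each, the substituted field costs exactly one "unit", so 6 correct fields out of 7 scores 6/7.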

Below you can see a comparison of just one particular invoice and how all the tools managed to extract specific data pieces from it.

The green frames indicate the correctly extracted data – True Positives; red frames – False Positives; and purple ones – False Negatives.

Comparison of field extraction by various data extraction tools
Percentage of data extraction correctness between each data extraction software. Source: own material

Straight-Through Processing metric

The second metric we’ve used is Straight-Through Processing – the percentage of documents processed without human intervention.

Alphamoon’s data extraction tool has successfully processed a complete set of data fields from 36.7% of the whole collection of invoices. In other words, 36.7% of invoices did not require any amendment or correction.

Microsoft and Google tools scored 8.5%; meanwhile, ABBYY and Kofax did not process any invoice from the dataset 100% accurately.

Chart showing the Straight-Through Processing score achieved by Alphamoon, Microsoft, Google and ABBYY in data extraction comparison
Straight-Through Processing score from a set of invoices in a comparison between Alphamoon, Microsoft, Google, ABBYY, and Kofax. Source: own material
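
A Straight-Through Processing score of this kind could be computed as in the sketch below, assuming each document is represented as a dictionary of field values and that a field counts as correct only on an exact match with the ground truth. The field keys and the `straight_through_rate` helper are illustrative, and the comparison rule used in the actual test may have differed.

```python
# The seven fields used in the comparison, written as hypothetical dict keys.
FIELDS = (
    "invoice_number", "invoice_date", "total_amount",
    "seller_name", "seller_address", "buyer_name", "buyer_address",
)


def straight_through_rate(extracted_docs, ground_truth_docs, fields=FIELDS):
    """Share of documents in which every listed field matches the ground truth."""
    if not ground_truth_docs:
        return 0.0
    perfect = sum(
        all(extracted.get(name) == truth.get(name) for name in fields)
        for extracted, truth in zip(extracted_docs, ground_truth_docs)
    )
    return perfect / len(ground_truth_docs)


truth = [{name: f"value-{i}" for name in FIELDS} for i in range(3)]
extracted = [dict(doc) for doc in truth]
extracted[1]["total_amount"] = "wrong"   # one bad field spoils the whole document
del extracted[2]["buyer_name"]           # a missing field does too
print(straight_through_rate(extracted, truth))
```

The metric is deliberately strict: a document with six of seven fields correct contributes nothing, which is why STP scores sit well below field-level accuracy.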

Challenges in invoice processing

Here are the most common issues that we identified in the test:

  • Finding the correct total amount to pay. Payment documents contain many places where an amount of money appears. A common mistake platforms make is marking the net amount instead of the gross amount.
  • Sellers place logos and other graphics that often contain text. This confuses the model because that text might be mistaken for the “seller name” field. Another difficulty is that a logo is a designed element with non-standard fonts, lettering, and other shapes. There are better setups for rule-based tools.
  • Invoices are difficult because they consist of many fields. Since there is no universal template for invoices, OCR platforms not supported by AI struggle with the more complex invoice designs.
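
To illustrate the first issue, the sketch below shows one simple heuristic for telling the gross amount apart from the net amount: among all amounts found on the page, prefer the largest one that equals the sum of two others (net plus tax). This is purely an illustrative assumption, not a description of how Alphamoon or any tested platform selects the total, and `pick_gross_total` is a hypothetical name.

```python
from decimal import Decimal
from itertools import combinations


def pick_gross_total(candidates):
    """Return the largest amount equal to the sum of two other amounts, else None."""
    amounts = sorted(set(candidates), reverse=True)
    for gross in amounts:
        for a, b in combinations(amounts, 2):
            if a != gross and b != gross and a + b == gross:
                return gross
    return None


# Amounts found on a page: two line items, the net total, the tax, the gross total.
found = [Decimal("30.00"), Decimal("70.00"), Decimal("100.00"),
         Decimal("23.00"), Decimal("123.00")]
print(pick_gross_total(found))
# 123.00
```

A heuristic like this fails when the net amount and the tax happen to be equal or when no tax line is printed, which is part of why the total amount remains a hard field.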

Testing Alphamoon on a set of modern invoices

Since the RVL-CDIP dataset was challenging, we also tested Alphamoon against a more up-to-date dataset. We did not run this test on any other vendor’s platform.

Below are examples of the invoices from that second dataset.

A sample set of modern invoices was used to test Alphamoon’s accuracy. Source: own material

The internal dataset of modern invoices has yielded astonishing results. Alphamoon Invoices has achieved an accuracy of 98.4% and a Straight-Through Processing score of 81.7%.

Over 98% accuracy means that information extraction is no longer your worry. Your team can focus on data modeling and knowledge processing rather than manual tasks.

Furthermore, the tool retrains and learns as you go: the more invoices it processes, the more accurate it becomes. Alphamoon generates a systematic overview of all extracted fields, providing a better understanding of all the information stored in your documents.

Alphamoon accuracy and STP score chart in invoices processing
Alphamoon’s accuracy and Straight-Through Processing score in an additional test of data extraction from invoices. Source: own material

To better portray the strengths of our platform, we have broken down the results by the particular fields extracted across all invoices.

The accuracy of individual field extractions ranged from 96.6% to 100%.

Alphamoon field extraction breakdown presented by fields most commonly extracted from invoices, including Buyer VAT number, buyer address, Invoice number, seller address, and more
The accuracy level of data extraction by Alphamoon from each field from a set of invoices. Source: own materials.

Summarizing the comparison of Invoice data extraction, Alphamoon topped each competitor in both metrics used in the evaluation – accuracy and Straight-Through Processing.

Alphamoon is your best OCR software for invoice processing, with the highest accuracy and easy-to-use cloud platform. Contact us today and let’s get you started with AP automation!
