OCR for receipts is software that automates the conversion of images (scans of paper receipts) to processable data. OCR goes hand in hand with data extraction, which pulls out fields containing essential information specific to each document type. In this article, we explain both technologies for automated receipts in more detail and showcase a comparison of Alphamoon against the most popular providers – ABBYY, Google, and Microsoft.
Receipts contain valuable information about every transaction that a given business concludes. Interestingly, the definition from Investopedia indicates a key characteristic of a receipt – a written acknowledgment that something of value has been transferred from one party to another.
The key word is “written”.
We can distinguish several types of receipts.
- Paper receipts. Typically printed in stores, physical copies are small slips of paper that include all critical information about the seller, goods or services sold, and the payment method.
- Digital receipts. Automatically generated confirmations of transactions are delivered directly to a customer’s email inbox.
- Handwritten slips. Although hard to come by these days, handwritten slips and the so-called carbon copy slips are written by the seller on paper without any automation or digital print-out.
- Gift receipts. If the customer requests a gift receipt, the standard template changes slightly to remove the price of goods or services purchased within the transaction. As the name indicates, gift receipts are used in commerce when the buyer wants to hide the price.
- Shipping slips. Particularly useful in dropshipping and e-commerce, shipping slips (also known as packing slips or packing receipts) are notes included in a package. Shipping slips can be both digital and physical.
Before we dive deep into the technology that drives receipt automation, let’s explain the main differences in processing digital and paper slips.
Digital vs. traditional receipts
While we can distinguish several types of receipts, paper receipts are still very common in business transactions. In fact, a recent study by Green America shed some light on the split between digital and paper slips. Younger consumers, who generally tend to be more apprehensive about the negative impact of business on the environment, favor digital copies over paper ones.
On the other hand, older generations may be put off by the technological barrier – the most common concern is that digital slips can lead to data leaks.
The criticism of paper slips
While there is no fresh data concerning the split between paper and digital slips, here’s what we can deduce from various papers and studies.
Even if the trends indicate a shift to digital copies, businesses in the UK alone print over 11 billion receipts annually. This amount translates to over ten million trees cut and over one billion gallons of water used in the process.
The worst of it all? Over 85% of these slips are thrown away unused.
If you’re wondering whether the problem is, in fact, a big one, take a look at the length of receipts from CVS Pharmacy, an American company that became the receipt equivalent of Amazon’s packaging.
There are health hazards related to the everyday use of paper slips too. Over 93% of printed receipts are coated with Bisphenol-A (BPA) or Bisphenol-S (BPS) – chemical substances that may lead to neurological problems in workers who absorb these coatings through their skin.
In short, there are numerous reasons to ditch paper copies, and the same reasons favor automated receipts.
While not all businesses and consumers are ready to go fully digital, it’s not hard to see that a transformation is necessary. The good news is that there are ways to limit the processing time of receipts. What’s more, these translate to lower energy consumption.
What are automated receipts?
Automated receipts refer to physical copies of receipts that are digitized and processed automatically. While the most obvious method is to scan receipts, there are ways to automate the further processing of these transactional slips, and these methods rely on intelligent document automation workflows.
What is the receipts automation workflow?
Intelligent document automation combines advanced techniques from artificial intelligence (AI), machine learning (ML), and deep learning (DL) to perform a series of actions aimed at processing any document.
The ideal automation workflow depends on whether you’re sticking to traditional paper slips or already digitizing them. We’ll explain both scenarios.
Paper receipts automation
For paper receipts, it’s best to start with OCR software.
In general, OCR enables scanned documents to be converted to readable text. Hence, the receipt OCR component processes scanned receipts and, through machine learning algorithms, learns to increase its accuracy over time.
In other words, OCR deals with receipt image recognition – the tool recognizes the type of document by analyzing its visual components.
OCR tools deploy Natural Language Processing methods to convert and analyze text.
Depending on which OCR software for receipts you choose, you may work with a range of document formats. The most common are JPG/JPEG, PNG, and PDF. In the case of PDFs, you will need a PDF Splitting component to save even more time with automation.
Once the software recognizes all the receipt elements – rows, numbers, handwritten signatures, etc. – the next element in the sequence is data extraction. Pre-defined fields – such as seller address or the gross amount (value of the transaction) – are automatically extracted and saved. All these fields appear in a file you can download as CSV or XLSX.
The formats are not random – they are the most common formats used in ERP systems, which source this kind of digitized knowledge from documents.
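As a rough illustration of that last step, the sketch below writes extracted fields to CSV using only Python's standard library. The column names and sample values are hypothetical placeholders, not Alphamoon's actual export schema.

```python
import csv
import io

# Hypothetical column names for illustration; a real export schema may differ.
FIELDS = ["seller_name", "seller_address", "sell_date", "gross_amount"]


def receipts_to_csv(receipts):
    """Serialize a list of extracted-field dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for receipt in receipts:
        # A field that was not extracted becomes an empty cell.
        writer.writerow({name: receipt.get(name, "") for name in FIELDS})
    return buffer.getvalue()


# Made-up sample data, not taken from any real receipt.
sample = [
    {"seller_name": "Example Store", "seller_address": "1 Sample Street",
     "sell_date": "2023-01-15", "gross_amount": "12.50"},
    {"seller_name": "Corner Cafe", "gross_amount": "4.20"},
]

csv_text = receipts_to_csv(sample)
print(csv_text)
```

Keeping one row per receipt and one column per field is what makes such a file straightforward to import into a spreadsheet or an ERP system.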
Digital receipts automation
For digital receipts, the OCR part is much easier. While scans pose a challenge for image-to-text conversion, digital copies are never blurry or partially damaged. In this case, OCR for digital receipts performs the same actions – the workflow aims at delivering a readable text format.
Digital copies of receipts also make it easier to perform data extraction. The information they contain is often well structured, which lets data extraction tools perform far better.
How does OCR for receipts work?
Let’s focus on the OCR component.
OCR, which stands for Optical Character Recognition, performs two actions – object detection and text recognition using Natural Language Processing methods.
Object detection means that the OCR software understands the structure of the document. Receipts usually contain a combination of words, numbers, and special signs (dots, currency symbols). Sometimes receipts also include logos, slogans, and barcodes. Nonetheless, receipts may be relatively easy to process since they do not contain complex objects such as tables.
Natural Language Processing encapsulates all techniques in which algorithms process text to understand its context. OCR tools use various algorithms and NLP techniques to understand the text. The receipt OCR tool understands contextually that particular parts of printed text refer to, e.g., an address, while others to the goods or services transferred through the transaction.
The best OCR software for receipts performs these operations with as few errors as possible.
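To make the idea of contextual tagging concrete, here is a deliberately simplified sketch that labels lines of already-recognized receipt text with regular expressions. Production OCR tools rely on trained models rather than hand-written rules like these; the patterns, labels, and sample lines are assumptions for illustration only.

```python
import re

# Simplified, hypothetical patterns; real receipts vary far more than this.
DATE_RE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b")
AMOUNT_RE = re.compile(r"\b\d+[.,]\d{2}\b")
ADDRESS_RE = re.compile(r"\b(street|road|avenue)\b", re.IGNORECASE)


def tag_line(line):
    """Return a coarse label for one line of recognized receipt text."""
    # Dates are checked first because "15.01.2023" also contains
    # something that looks like an amount ("15.01").
    if DATE_RE.search(line):
        return "date"
    if ADDRESS_RE.search(line):
        return "address"
    if AMOUNT_RE.search(line):
        return "amount"
    return "other"


# Made-up lines standing in for OCR output.
lines = [
    "EXAMPLE STORE",
    "12 Sample Street",
    "15/01/2023 10:42",
    "TOTAL 12.50",
]
tags = [tag_line(line) for line in lines]
for line, tag in zip(lines, tags):
    print(f"{tag:8} {line}")
```

The order of the checks matters, which hints at why fixed rules scale poorly and why vendors train models on many receipt layouts instead.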
What fields are commonly extracted from receipts?
If you’re considering introducing document data extraction software, think about which pieces of information are the most valuable from your business’s perspective. Ideally, that would be the information that speeds up processes such as payroll, KYC, or complaints processing.
The most common fields extracted from receipts are:
- Seller name
- Seller address
- Date of the transaction (Sell date)
- Total amount
- Cash and change (depending on the form of payment)
- Payment confirmation details (depending on the form of payment)
- Terminal no.
- Seller ID (could be the name of the clerk who processes the transaction)
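One way to carry these fields through a pipeline is a small typed record. The sketch below uses a Python dataclass; the attribute names mirror the list above but are our own naming, not a schema defined by any vendor, and the sample values are made up.

```python
from dataclasses import dataclass, asdict
from decimal import Decimal
from typing import Optional


@dataclass
class ReceiptFields:
    """Hypothetical container for fields extracted from one receipt."""
    seller_name: Optional[str] = None
    seller_address: Optional[str] = None
    sell_date: Optional[str] = None            # kept as text, e.g. "2023-01-15"
    total_amount: Optional[Decimal] = None
    cash: Optional[Decimal] = None             # only for cash payments
    change: Optional[Decimal] = None           # only for cash payments
    payment_confirmation: Optional[str] = None
    terminal_no: Optional[str] = None
    seller_id: Optional[str] = None

    def missing_fields(self):
        """List the names of fields that were not extracted."""
        return [name for name, value in asdict(self).items() if value is None]


receipt = ReceiptFields(
    seller_name="Example Store",
    sell_date="2023-01-15",
    total_amount=Decimal("12.50"),
    cash=Decimal("20.00"),
    change=Decimal("7.50"),
)
print(receipt.missing_fields())
```

Every field is optional because, as noted in the list, some values only appear for particular forms of payment. Amounts are stored as `Decimal` to avoid floating-point rounding in monetary values.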
What is the best OCR software for receipts?
In a comparison of four OCR tools for receipt processing, Alphamoon’s IDP platform topped the solutions from Google, ABBYY, and Microsoft – all leading cloud providers.
The comparison was conducted on 347 receipts from an open dataset called SROIE.
The SROIE dataset consists of receipts of varying quality, which makes it well suited to checking the accuracy of data extraction. Only four fields are labeled in this dataset: seller name, seller address, sell date, and gross amount.
For this part of the test, we compared Alphamoon with the following tools:
- ABBYY Cloud OCR SDK for receipts
- Google Document AI Expense Parser
- Microsoft Azure Form Recognizer receipt model
Below is a set of five receipts from the dataset.
The accuracy of Alphamoon in the receipts data extraction has once again proven to be the highest among the vendors.
Alphamoon achieved 89.5% accuracy, followed by Microsoft with 87.8%. Google reached 68.3%, while ABBYY got only half of the fields correct.
Unlike the dataset we used in the invoices test, the receipts did not pose a similar challenge. Although the receipts were outdated too, the design of receipts is less prone to change and visual variation.
Judging by the second metric, Straight-Through Processing, Alphamoon correctly extracted 100% of data from 62.2% of receipts. Next in line was Microsoft’s Azure Form Recognizer receipt model – 51.6% – then Google’s Document AI Expense Parser – 14.7%. ABBYY Cloud OCR SDK for receipts has once again taken the 4th position, with only 6.3%.
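Both metrics can be expressed in a few lines. The sketch below assumes that accuracy is the share of labeled fields extracted correctly and that Straight-Through Processing (STP) is the share of receipts on which every labeled field is correct. The benchmark's exact matching rules are not stated here, so exact string equality is our assumption, and the sample data is made up.

```python
def field_accuracy(predictions, ground_truth):
    """Share of labeled fields whose predicted value matches exactly."""
    total = 0
    correct = 0
    # Assumes both lists are aligned: one prediction dict per labeled receipt.
    for pred, truth in zip(predictions, ground_truth):
        for name, expected in truth.items():
            total += 1
            if pred.get(name) == expected:
                correct += 1
    return correct / total if total else 0.0


def straight_through_rate(predictions, ground_truth):
    """Share of receipts on which every labeled field matches exactly."""
    if not ground_truth:
        return 0.0
    perfect = sum(
        all(pred.get(name) == expected for name, expected in truth.items())
        for pred, truth in zip(predictions, ground_truth)
    )
    return perfect / len(ground_truth)


# Made-up labels and predictions for two receipts with four fields each.
truth = [
    {"seller_name": "Example Store", "seller_address": "1 Sample Street",
     "sell_date": "2023-01-15", "gross_amount": "12.30"},
    {"seller_name": "Corner Cafe", "seller_address": "2 Sample Road",
     "sell_date": "2023-01-16", "gross_amount": "4.20"},
]
preds = [
    dict(truth[0]),                          # every field right
    {**truth[1], "gross_amount": "3.41"},    # one field wrong
]

accuracy = field_accuracy(preds, truth)
stp = straight_through_rate(preds, truth)
print(f"accuracy={accuracy:.3f}, stp={stp:.3f}")
```

In this toy case seven of eight fields are right, yet only one of two receipts passes straight through. That gap is why STP scores sit well below accuracy scores: a single wrong field disqualifies the whole receipt.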
Let’s look at two examples of the same receipt and all four vendors’ data extraction accuracy scores.
In the image below, ABBYY’s software misclassified a paper punch mark, recognizing it as the Seller Name field.
The compilation below shows another difficulty faced by most tools – the various fields containing monetary values.
A common mistake that data extraction software makes is marking the net amount instead of the gross amount – as is the case with Microsoft and Google in the screenshot below.
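A simple rule-based guard against this mistake is to prefer amounts on lines that mention a total, skip lines that look like net values, and fall back to the largest amount otherwise. This is a hypothetical heuristic for illustration, not how any of the compared vendors work; the keyword lists and sample lines are assumptions.

```python
import re
from decimal import Decimal

AMOUNT_RE = re.compile(r"\d+\.\d{2}")
# Assumed keyword lists; real receipts use many more labels and languages.
GROSS_HINTS = ("total", "gross")
NET_HINTS = ("net", "subtotal", "sub total", "excl")


def pick_gross_amount(lines):
    """Pick the most likely gross amount from recognized receipt lines."""
    hinted = []
    all_amounts = []
    for line in lines:
        lowered = line.lower()
        amounts = [Decimal(match) for match in AMOUNT_RE.findall(lowered)]
        all_amounts.extend(amounts)
        looks_net = any(hint in lowered for hint in NET_HINTS)
        looks_gross = any(hint in lowered for hint in GROSS_HINTS)
        if looks_gross and not looks_net:
            hinted.extend(amounts)
    # Prefer amounts from "total"-like lines; otherwise take the largest seen.
    candidates = hinted or all_amounts
    return max(candidates) if candidates else None


# Made-up lines standing in for OCR output.
sample_lines = [
    "Net amount 10.00",
    "Tax 23% 2.30",
    "TOTAL 12.30",
    "Cash 20.00",
]
gross = pick_gross_amount(sample_lines)
print(gross)
```

Note that taking the largest amount alone would have returned the cash tendered rather than the total, which shows why several monetary fields on one receipt are hard to tell apart.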
How to get started with Alphamoon’s OCR software: Online platform or API for receipts OCR
There are two ways to get started with Receipt OCR from Alphamoon.
Suppose the number of documents you process monthly is low (up to 1,000 pages). In that case, we invite you to join the early access list for Alphamoon Workspace – a document automation subscription for small and medium businesses. The platform is available in flexible pricing tiers and brings you the benefits of complete document workflows.
Click here to describe your needs, and we will add you to the waiting list.
Companies that would prefer to implement Alphamoon’s OCR for receipts through an API should contact our sales team – click here to get in touch.