11 Aug 2022

6 Common Challenges Of Image-To-Text Extraction

Modern automatic document processing technology can cope with many kinds of damage to a hard copy of a document: a few drops of mayonnaise from a morning sandwich, a signature written in the wrong part of the document, or a smudged piece of printed text. In this short article, you’ll see six common problems of image-to-text extraction that Alphamoon’s AI OCR handles easily.

Ever spilled your coffee over a document that you were about to sign?

Or received a printed copy of such poor quality that you immediately had flashbacks to watching YouTube at 240p?

Well, you’re not alone.

Even though companies are leaning towards process automation and document digitization, hard copies are far from becoming a thing of the past. In some cases, they matter far more than digitized copies. That’s the situation in the legal sector, where physical documents are often the only evidence accepted in court proceedings, while digitization serves mainly for archiving.

The legal sector is not alone. Hard copies of documents remain common practice in the finance, real estate, and medical sectors. A staggering 90% of professionals in the medical sector admit they’re still heavily reliant on paper and manual processes.

Since physical documentation isn’t going anywhere, it’s best to automate as much of its processing as possible. Now, we’ll cover the following issues with printed documents and shed some light on how Alphamoon’s engine performs when encountering these obstacles:

  • stains on documents
  • multilingual signs in text
  • blurred text
  • “text on text” problem
  • small characters
  • noisy background

Challenges Of Image-To-Text Extraction

The truth is we don’t always get documents in perfect form, scanned and ready for further processing.

Take a look at the invoice below.

invoice sample

This invoice contains several obstacles for the traditional OCR tools you may have used in the past, if you’ve worked with any kind of document automation.

Psst – if you’re unsure of the difference between AI-supported OCR and traditional OCR tools, read our comparison of data extraction tools.


If you’re working with some outdated tools for document processing, then any kind of image-to-text extraction from damaged documents – such as this invoice – is a chore.

Stains on documents

Colored liquids are among the most challenging enemies of OCR tools. Not only do they blur the printed text (liquids dissolve the ink, causing it to spread over more paper fibers), but they also form a shape that can be misidentified.

Take a look at the stain below.

stains on documents

Traditional OCR tools often fail to process a field like this, because the stain adds an unexpected visual element to the template. By combining AI and machine learning, Alphamoon’s engine treats this change of background as interference, thanks to training on similar cases.

Multilingual signs in text

A single document doesn’t always use just one language. Invoices often include foreign names and addresses, which may contain language-specific characters such as ł, ü, or ñ.

If your data extraction tool doesn’t support multiple languages, any such special character is likely to be treated as background noise.

multilingual signs in text

Blurred text

Have you ever shaken your phone while taking a photo and ended up with a blurry shot?

Of course, you have!

Nowadays that’s no big deal, since you can delete the photo and retake it in seconds. Before digital cameras, the pain was real.

Blurry text is another common problem when extracting text from images. Smudged print can be a serious difficulty for older technology. Luckily, Alphamoon’s IDP (Intelligent Document Processing) engine can process most blurred text in documents.

blurred text

P.S. You can always review the extracted information and correct it. That way, the engine recalibrates to handle similar problems better in the future.

“Text on text” issue

Text can overlap with other text, especially when handwriting is added to a printed document.

In the example below, you’ll see a check mark drawn over the street name KENNEDY. Since OCR engines first recognize the shapes in a document before interpreting them as characters, that one addition usually causes extraction to fail. While working on OCR for handwriting, we also trained our AI to handle this kind of problem.

text-on-text

Let’s consider a different type of document to analyze two more issues.

Passports and IDs are the real image-to-text extraction nightmare.

Take a look at the passport below.

passport

All these beautiful colors, shapes, lines, and intricacies can give most engines a serious (artificial) headache. Yet correctly extracted ID data can be crucial for account verification at a bank or any other financial institution. If the system is fully automated, a person could be rejected because of a faulty reading of the document.

Small characters

Documents contain many different fields, and the text in them comes in different sizes. Some text may be too small to read reliably. That’s the case with the field headings printed on the Polish ID template.

Alphamoon’s engine has been trained on hundreds of templates that included small characters.

Small characters

Noisy background

To complete the picture, here’s the final nail in the coffin: the problem of a noisy background.

noisy background

A noisy background is one of the hardest challenges of image-to-text extraction. When text is printed in various colors over gradients, patterns, and shapes, the OCR tool can easily lose parts of it.

A solution to all six? Alphamoon AI OCR feature

As long as only legacy OCR was available, documents like those shown above were a big problem to process. Even if the layout of the invoice used in this article stayed the same for years, each of the issues we’ve listed could prevent correct information extraction.

The new wave of AI technology, Intelligent Document Processing (IDP), significantly extends the capabilities of traditional OCR. That’s the case with our tool, Alphamoon: a state-of-the-art, reliable platform for document automation that features an AI-based OCR component. This component serves many use cases, including debt collection and invoice automation.

Thanks to machine learning, Alphamoon’s OCR is trained: it processes thousands of documents and learns from the data, improving with experience.

In business terms, this means the IDP platform delivers a positive ROI in the long term. Every batch of documents you upload and process improves future results.

Learning from data is the baseline for every AI tool, whether it performs optical character recognition or automated image generation.

Think of it this way: an AI doesn’t know what a cat is, at least not until it’s been shown images and told that they show a cat. Cats differ in size, color, and other features. An AI won’t automatically know that a Persian cat is the same species as a Ragdoll. Unless, you guessed it, someone teaches it.

cats & AI

The more cat images in the training data, the higher the chance that the AI model will correctly tell whether there’s a cat in a given image, or generate a convincing cat picture.

The same principle underlies data augmentation, which is used to increase the capabilities of Alphamoon’s engine. When training AI models, random perturbations are added to the training images to make the model more robust. This helps with challenges such as blurred text or fragments, rotated objects, and noise, much like the challenges we’ve explained in this article.
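To make the idea of random perturbations a bit more concrete, here is a minimal sketch in Python with NumPy. It is not Alphamoon’s pipeline: the article doesn’t say which perturbations or parameters the engine uses, so the function names (`box_blur`, `rotate`, `add_noise`, `augment`), the perturbation strengths, and the assumption of grayscale images scaled to 0..1 (0 = ink, 1 = paper) are all hypothetical choices for illustration. Each call to `augment` turns one clean document image into a slightly blurred, skewed, and noisy variant, so a single scan can yield many training examples.

```python
import numpy as np

# Assumption: images are 2-D grayscale arrays in [0, 1], where 0 = ink and 1 = paper.
# All function names and parameter ranges below are hypothetical, for illustration only.


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Blur by averaging each pixel over its k x k neighbourhood (k odd, edges padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def rotate(img: np.ndarray, degrees: float, fill: float = 1.0) -> np.ndarray:
    """Nearest-neighbour rotation about the centre; uncovered pixels become `fill` (blank paper)."""
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up the source pixel it came from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full(img.shape, fill, dtype=float)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def add_noise(img: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add Gaussian pixel noise and clip back to the valid [0, 1] range."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly perturbed copy of `img`; the input array is left untouched."""
    out = img.astype(float)  # astype makes a copy
    if rng.random() < 0.5:  # blur roughly half of the samples
        out = box_blur(out, k=3)
    out = rotate(out, rng.uniform(-5.0, 5.0))  # small skew, as on a crooked scan
    out = add_noise(out, rng.uniform(0.0, 0.1), rng)  # speckle from the scanner or paper
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    page = np.ones((32, 32))  # blank "paper"
    page[14:18, 4:28] = 0.0  # a dark horizontal stroke standing in for printed text
    variants = [augment(page, rng) for _ in range(8)]
    print(len(variants), variants[0].shape)  # prints: 8 (32, 32)
```

In a real training setup, each variant would be paired with the same ground-truth text as the clean original, which is what lets a model learn to read through blur, skew, and noise. Production pipelines typically rely on an image library rather than hand-written NumPy, and the exact mix of perturbations is a design choice tuned to the documents at hand.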

That’s a wrap. Get in touch with us and worry no more about coffee spilled on your documents.

morning coffee
