Nov 18, 2022 | Articles

The Challenges of Continual Learning in Natural Language Processing (NLP)

Continual Learning is a problem of training machine learning models incrementally. In this article, you’ll learn more about implementing this methodology in Alphamoon’s document automation platform.

Intelligent document processing revolves around the constant improvement of results yielded by the AI engine, which decreases the degree of necessary human intervention. The goal of the AI engine is to act as an intelligent assistant to the business user – one that works in the background and requires minimal supervision.

Reaching 100% extraction accuracy would mean that an NLP model is entirely error-free. Still, even then, state-of-the-art engines are built with a human-in-the-loop framework in mind.

In other words, humans cannot be removed from the equation.

Note: Here’s an overview of how robots assist humans instead of taking jobs.

Even though reaching 100% accuracy is almost impossible with today’s technology, there are methods to improve the results constantly. One way to keep improving extraction results is Continual Learning.

What is Continual Learning?

In general terms, Continual Learning is a problem of training machine learning models incrementally. Incremental, in this case, means that the model is trained multiple times with different data subsets.
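
To make "trained multiple times with different data subsets" concrete, here is a minimal sketch of incremental training. It assumes a toy perceptron and two made-up data subsets; the class name `OnlinePerceptron` and the method `partial_fit` are illustrative choices, not part of Alphamoon’s platform.

```python
# Minimal sketch: a perceptron whose weights are updated subset by subset.
# All names and data below are illustrative assumptions.

class OnlinePerceptron:
    def __init__(self, n_features):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def partial_fit(self, batch, epochs=5):
        """Update the existing weights with one new subset of (features, label) pairs."""
        for _ in range(epochs):
            for x, y in batch:
                error = y - self.predict(x)
                if error:
                    self.weights = [w + error * xi for w, xi in zip(self.weights, x)]
                    self.bias += error


# Two subsets that arrive at different times.
subset_a = [([2.0, 0.0], 1), ([-2.0, 0.0], 0)]
subset_b = [([1.5, 0.5], 1), ([-1.5, -0.5], 0)]

model = OnlinePerceptron(n_features=2)
for subset in (subset_a, subset_b):
    # The model is never rebuilt; each call continues from the current weights.
    model.partial_fit(subset)
```

The key point is that each call to `partial_fit` starts from the weights left by the previous call instead of retraining on all data at once.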

Definition of Continual Learning

Datasets are not only vital to teaching the model, but they often constitute the biggest obstacle to achieving better results.

The more comprehensive the variety of data, the more robust the engine becomes. Here’s an example: invoice automation works best when the model has a history of various templates and scans of different quality. By expanding the covered ground, the model learns to capture each field with constantly improving results.

Continual Learning assumes a scenario in which the model will be trained multiple times. At the same time, the starting assumptions about the data domain and the set of classes may change.

Let’s dig deeper.

Continual Learning – the perfect scenario vs. reality

In a perfect ML scenario:

  • Models improve through complete, balanced, and high-quality datasets. In the case of document automation, it means that the set of documents includes scans without faults, blurred elements, etc.
  • Data is independent and identically distributed (the so-called i.i.d.), which means that each document is drawn independently from the same distribution (e.g., if we classify documents, the training and test datasets contain similar templates)
  • The data domain does not change over time (e.g., if we classify documents, no new document templates occur after model deployment)
  • All classes are known during model training (only in classification problems, e.g., during training, we classify documents as one of [order, claim], and after deployment, it does not change)

Have you ever encountered any “perfect scenario” in real life?

Probably not. In reality, documents tend to be messy, arrive at your company’s doorstep in various forms, and differ even within the same subset. It’s the reality of business – expecting otherwise is pointless.

Here’s how the actual world works:

  • Most datasets are small, unbalanced, and low-quality
  • Data is not i.i.d. (some convoluted dependencies and data collection biases occur, which affect model performance)
  • Data domain changes over time (e.g., new document templates appear in the dataset after model deployment, which distorts the results of the extraction and confuses the algorithms)
  • Not all classes appear during model training (e.g., during training, we classify documents as one of [order, claim], but after some time, we want to organize documents as one of [order, claim, cheque])
A diagram explaining the characteristics of continual learning in a real-life scenario and a perfect scenario
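
A changing data domain and newly appearing classes can be spotted with very simple checks. The sketch below assumes documents are available as plain text and uses two hypothetical helpers, `unseen_token_rate` and `unseen_labels`; it is a rough heuristic for illustration, not a production drift detector.

```python
# Minimal sketch: two naive signals that the data has moved away from the training set.
# Function names and example documents are illustrative assumptions.

def unseen_token_rate(training_docs, incoming_docs):
    """Share of tokens in incoming documents that never appeared in training."""
    known = {tok for doc in training_docs for tok in doc.lower().split()}
    incoming = [tok for doc in incoming_docs for tok in doc.lower().split()]
    if not incoming:
        return 0.0
    return sum(tok not in known for tok in incoming) / len(incoming)


def unseen_labels(training_labels, incoming_labels):
    """Classes requested after deployment that the model was never trained on."""
    return sorted(set(incoming_labels) - set(training_labels))


training_docs = ["invoice number total due", "order number item quantity"]
incoming_docs = ["cheque number payee amount"]

rate = unseen_token_rate(training_docs, incoming_docs)
new_classes = unseen_labels(["order", "claim"], ["order", "claim", "cheque"])
```

A high share of unseen tokens hints at a new template or domain, while a non-empty `new_classes` list means the set of classes has grown.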

The three Continual Learning Scenarios

To better understand the above problems, think from the business perspective. Halfway through your paperwork, you may encounter a repeating pattern you’d like to segregate into a different subset. Other times, you may use a new piece of data that exists in a specific type of document. And so on.

Since document automation models constantly expand, three common Continual Learning scenarios help to better illustrate the issues and challenges of real-life cases.

  • Domain Incremental (data domain changes, domain shift)
  • Class Incremental (the set of classes grows over time)
  • Task Incremental (new tasks occur over time)
Class Incremental Continual Learning
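
The three scenarios can be told apart by looking at what changed between two consecutive training rounds. The following sketch is a deliberate simplification under assumed names (`Experience`, `describe_change`); the research literature also distinguishes the scenarios by whether the task identity is known at prediction time, which is not modeled here.

```python
# Minimal sketch: naming the kind of change between two training rounds.
# The Experience fields and the decision order are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    task: str             # what the model is asked to do
    domain: str           # where the data comes from
    classes: frozenset    # labels present in this round


def describe_change(previous, current):
    if current.task != previous.task:
        return "task-incremental"
    if not current.classes <= previous.classes:
        return "class-incremental"
    if current.domain != previous.domain:
        return "domain-incremental"
    return "no change"


base = Experience("classification", "templates from 2021", frozenset({"order", "claim"}))
new_templates = Experience("classification", "templates from 2022", frozenset({"order", "claim"}))
new_class = Experience("classification", "templates from 2021", frozenset({"order", "claim", "cheque"}))
new_task = Experience("field extraction", "templates from 2021", frozenset({"order", "claim"}))
```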

Why is Continual Learning challenging, then?

On top of the three scenarios, one also needs to understand the factors that determine the effectiveness of each model.

  • Since various changes to the dataset may emerge, the algorithms that analyze documents need to be trained multiple times
  • Stability-plasticity dilemma. We must keep previously acquired knowledge (stability) and learn new knowledge over time (plasticity)
  • Neural networks are overly plastic. They overfit what they see now and forget past knowledge. (Note: Read about neural networks here)
How Neural Network works
Source: Wikipedia

  • The architecture of the NLP model itself has an impact on the Continual Learning process. Even minor architectural differences between similar models can lead to completely different results
  • Aside from changes in the architecture, the model’s size matters too. The bigger it is, the harder it will be to train it continually
  • Models in a CL setup are very sensitive to hyperparameters
  • Limited possibilities for tuning hyperparameters in production environments
  • Data distribution often changes over time
  • Even a slight change in data distribution may impact the model’s performance
  • Model performance degradation can be tricky to detect in a production environment
  • GDPR compliance often prevents data storage if the data is not necessary from the business perspective. It is, however, a widespread practice to keep a portion of the data and replay it to the model as a reminder of past experiences
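
Keeping "a portion of the data to replay" is usually done with a small, fixed-size buffer. The sketch below assumes reservoir sampling as the selection rule and a hypothetical `ReplayBuffer` class; the source does not say which replay strategy is used in practice, so treat this as one possible illustration.

```python
# Minimal sketch: a bounded replay buffer filled by reservoir sampling.
# Class name, capacity, and sample format are illustrative assumptions.
import random


class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, sample):
        """Keep each sample seen so far with equal probability, never exceeding capacity."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            slot = self._rng.randrange(self.seen)
            if slot < self.capacity:
                self.samples[slot] = sample

    def mix_into(self, new_batch, k):
        """Return the new batch extended with up to k replayed old samples."""
        k = min(k, len(self.samples))
        return list(new_batch) + self._rng.sample(self.samples, k)


buffer = ReplayBuffer(capacity=3)
for i in range(10):
    buffer.add(("old_template", i))

new_batch = [("new_template", 0), ("new_template", 1)]
training_batch = buffer.mix_into(new_batch, k=2)
```

Because the buffer never grows beyond `capacity`, the amount of retained data stays bounded, which is the property that matters when storage is restricted.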

How do we overcome the CL challenges at Alphamoon?

Alphamoon’s engine utilizes technology based on deep neural networks for completing a range of tasks. Continual Learning methods focus on large and complex deep learning models and follow the divide-and-conquer principle.

In other words, the algorithm that constitutes the foundation of Alphamoon’s platform solves smaller automation tasks and therefore arrives at significantly better results. We decompose complex problems into smaller and more manageable subproblems. This approach to neural networks has a few benefits:

  • By using Continual Learning methods, Alphamoon’s platform can handle scenarios where domains change after deployment
  • There is no need to retrain a deployed model from scratch multiple times during its lifecycle
  • CL methods protect models from forgetting and allow them to acquire new knowledge at the same time – tackling the stability-plasticity dilemma
  • Since data arrives in a streaming fashion in production environments, CL methods allow models to be trained effectively as new data is constantly added
  • By using CL methods, we can deploy models that were pre-trained on small datasets and continually improve them as new data arrive
  • Using Class Incremental Continual Learning techniques allows training classification models with an increasing number of classes (Note: read about the research on Class Incremental Continual Learning conducted by our engineer Mateusz Wójcik)
  • Task Incremental Continual Learning techniques help the model in solving multiple tasks with high accuracy
  • Current SOTA (State Of The Art) methods allow training neural networks incrementally, almost as effectively as in the classic scenario
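
To illustrate how a classifier can accept a new class without being retrained from scratch, here is a minimal nearest-class-mean sketch. It is not Alphamoon’s method or the method from the research mentioned above; it assumes documents have already been turned into fixed-length feature vectors by some encoder, and the class name `NearestClassMean` is made up for this example.

```python
# Minimal sketch: a nearest-class-mean classifier that grows its set of classes.
# Feature vectors and labels are illustrative assumptions.

class NearestClassMean:
    def __init__(self):
        self.sums = {}     # label -> running sum of feature vectors
        self.counts = {}   # label -> number of samples seen

    def partial_fit(self, samples):
        """Update per-class statistics; unseen labels simply create a new class."""
        for features, label in samples:
            if label not in self.sums:
                self.sums[label] = [0.0] * len(features)
                self.counts[label] = 0
            self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
            self.counts[label] += 1

    def predict(self, features):
        def squared_distance(label):
            mean = [s / self.counts[label] for s in self.sums[label]]
            return sum((f - m) ** 2 for f, m in zip(features, mean))
        return min(self.sums, key=squared_distance)


clf = NearestClassMean()
clf.partial_fit([
    ([0.0, 0.0], "order"), ([0.2, 0.1], "order"),
    ([5.0, 5.0], "claim"), ([5.2, 4.9], "claim"),
])

# Later, a third class appears; only its own statistics are added.
clf.partial_fit([([0.0, 5.0], "cheque"), ([0.1, 5.2], "cheque")])
```

Adding "cheque" leaves the stored statistics for "order" and "claim" untouched, so in this toy setting the old classes are not forgotten. Deep neural networks do not have this property for free, which is why dedicated CL methods are needed.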

If the above sounds too techy, let’s look at the business dimension of Continual Learning principles in document automation.

  • Thanks to the application of Continual Learning, business users have control over the data that will be used to train the model
  • By solving the subproblems, the algorithm becomes more accurate
  • By keeping the knowledge from past templates, the model can adjust to any changes, even if you need to process documents from a few years back
  • Over time, the model becomes optimized for your business case
How Continual Learning in document automation benefits business

The article is based on input provided by Alphamoon’s ML Engineer, Mateusz Wójcik.
