Inside Alphamoon: How We Developed An Internal App For Data Annotation

When your goal is to optimize business processes through automation for others, the last thing you’d like to see is time wasted on manual work in your own backyard. Data annotation has always been a time-consuming task. So one of our engineers, Michał, wrote an app to improve the process. Here’s the story of its development – from a concept to a usable MVP presented to his teammates.

Data annotation (also known as data labeling) is a core part of the AI learning process. Labeling data in various formats enables AI algorithms to understand the context of the source they process – text, image, and so on.

This technique is widely used in AI-driven tools. Grammarly, for example, uses sentiment annotation to teach its AI the emotions associated with particular words and phrases. As a result, the tool not only suggests changes to written text but also picks up on context and proposes phrasing better suited to the goal of the text.

To get where leading AI/ML tools are, teams need to spend most of their time working on data and its quality. Data annotation occupies the pole position as the most time-consuming task in project development.

We’ve had our own share of data annotation challenges too.

Data annotation – the fuel for the IDP engine

Alphamoon’s platform for document processing deploys AI to understand documents like humans.

Wondering what happens backstage?

Users who upload documents to their accounts enable the tool’s learning process. After processing dozens of documents, Alphamoon’s engine understands the layout better, as well as the context and relations between entities in sets of files.

To make that possible, though, the engine requires an input of annotated documents.

Consider an example. Upon “seeing” an image without context, AI cannot understand that an apple is, in fact, an apple. Someone has to explain what an apple is – by showing an image of an apple.

The more images of an apple the AI analyzes, the higher the chance it will understand all possible features of an apple.

And when someone tries to trick the algorithm into thinking that an image of a llama portrays an apple, a well-trained AI will recognize the foul play.

Now, let’s apply it to our case of document processing.

The bigger the set of properly annotated documents – let’s say invoices – the higher the tool’s accuracy. And the engine needs to have that source of knowledge.

Now, that’s where Michał Hetmańczuk’s internal app came in handy for the engineering teams at Alphamoon.

We developed an app to smoothen internal processes

Before automation kicked in, Michał’s task was to manually classify documents by type – invoice, policy, or insurance document. The challenge is that a single file often contains several documents joined together, and these differ from one another. Therefore, each file has to be reviewed separately.

The initial process that Michał wanted to optimize was the following:

  • open a file,
  • scroll through it and determine its type,
  • then describe this in a separate, specific file.

Data annotation file

“When I joined Alphamoon and got my first task, which was data annotation, I manually repeated the process over and over. This was not the most efficient use of my time, so I thought – why not change it?” – explains Michał. “At first, I prepared a template with file names, and then I marked which pages were what. After some time, however, I decided that it was necessary to find a better way.”

Michał saw the opportunity to innovate.

In a matter of four days, he created a Data Annotation Automation app using the Streamlit library, with the front end written in Python.

The app solved the issue of multiple documents in a single file. It allows our team to go through the file page by page and mark where each document starts and ends. This is much quicker than marking each page separately.

In addition, Michał included a mechanism for marking the document type (since the set of formats was limited due to the use cases Alphamoon currently automates).
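To make the idea concrete, here is a minimal sketch of how "new document starts here" marks could be turned into page ranges with a type attached. It is not the code of Michał’s app: the `Segment` class, the `segments_from_starts` function, and the `DOCUMENT_TYPES` list are hypothetical names chosen for illustration, and the label set simply reuses the document types mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical label set, assumed from the document types named in the article.
DOCUMENT_TYPES = ("invoice", "policy", "insurance document")


@dataclass(frozen=True)
class Segment:
    """One document inside a multi-document file (1-based, inclusive pages)."""

    first_page: int
    last_page: int
    doc_type: str


def segments_from_starts(page_count: int, starts: dict[int, str]) -> list[Segment]:
    """Turn 'a new document starts on this page' marks into page ranges.

    `starts` maps the first page of each document to its type.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if 1 not in starts:
        raise ValueError("page 1 must start a document")

    ordered = sorted(starts)
    segments = []
    for index, first in enumerate(ordered):
        if not 1 <= first <= page_count:
            raise ValueError(f"start page {first} is outside 1..{page_count}")
        doc_type = starts[first]
        if doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {doc_type!r}")
        # A document ends on the page before the next start, or on the last page.
        is_last = index + 1 == len(ordered)
        last = page_count if is_last else ordered[index + 1] - 1
        segments.append(Segment(first, last, doc_type))
    return segments


if __name__ == "__main__":
    # A 7-page file holding three documents.
    marks = {1: "invoice", 4: "policy", 6: "invoice"}
    for segment in segments_from_starts(7, marks):
        print(segment)
```

In this sketch the annotator only marks the first page of each document. The end of a document is derived from the next mark, which is one way to explain why marking boundaries is quicker than labeling every page.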

Michał and Karolina at work

App users can view all document files in one place, without the need to click between two separate file locations.

Needless to say, we saw substantial time savings almost immediately.

“Michał has improved the way of annotating data for the whole team. This will help us avoid many errors that would have occurred due to manual work. Faster data annotation also improves the processing speed, and this translates into collaboration with our clients.” – explains Karolina Cwojdzińska, Head of Customer Support at Alphamoon.

On top of the business value, this initiative also gives an insight into the culture of Alphamoon. Instead of wasting time on a tedious process, we’ve created a tool that makes things easier.

“Michał’s initiative turned out to be an awesome example of how we collaborate as a team, exchanging observations and challenging ourselves as creators and engineers. That’s what lies at the core of Alphamoon – a team that works together and has fun doing so.” – adds Karolina.


Juggling between everyday tasks and developing other internal improvements isn’t easy. However, we’re set for the next challenges in enhancing the way we work daily.

“This is not the end of the story. I’ve already had the opportunity to demonstrate how my app works to employees on our team. Once they start using it, I’d like to collect regular feedback and implement improvements to make the app as beneficial as possible.” – concludes Michał Hetmańczuk.

Want to find out more about working at Alphamoon? Check out our open positions.
