At Alphamoon, everyone has room for growth. And it’s not just an empty slogan. We recently explained how our teammate developed an internal tool for data annotation. In this article, we share the story of two of our ML engineers – Mateusz Wójcik and Witold Kościukiewicz – who put their Ph.D.s to practice in the area of Continual Learning and Information Extraction. Their expertise was also rewarded at the most recognized ML conference in Poland – ML in PL.
Each day in the tech industry can bring about a new cutting-edge solution, a breakthrough in how things are done, or simply a new competitor to watch out for. Consequently, technology companies constantly need to study changes or trends to improve their solutions.
At the center of each of these enhancements are people: engineers who test various methodologies to arrive at the one that pushes the tech forward.
At Alphamoon, the entire team works in a collaborative way, which guarantees that our platform provides outstanding quality in document automation. Among these engineers are Mateusz Wójcik and Witold Kościukiewicz, whose academic papers will have a significant impact on our engine.
But before we dig into that topic, let’s rewind the clock a few years.
Mateusz and Witold applied to Alphamoon for an industrial doctorate at the company. With their ongoing studies, they had to combine the challenges of professional careers and their ambitions to pursue academic recognition. Back then, Alphamoon was a software house, and there was no plan of transforming into a product-driven start-up.
Only a year ago, they both started an implementation doctorate funded by the ministry. Their topics referred to Continual Learning and Information Extraction, two critical areas in document automation.
It’s vital to understand the specifics of an implementation Ph.D., which combines science and business. As Adam Gonczarek, Alphamoon’s CTO who assists in the supervision of the project, explains:
“An industrial Ph.D. is such a win-win for both sides. The guys focus on research to help develop the technology and cover research gaps in the company. By doing so, they help us develop the product. Besides, they have an excellent opportunity to deepen their knowledge in the field and learn about the practical side of the technology, which we enable them to do at Alphamoon.”
In this case, the two sides are Wroclaw University of Science and Technology and Alphamoon. Both entities support the development of research papers so that the findings can be applied to actual use cases observed in the business world.
Our engineers focus on two different topics:
1. Continual Learning
2. Extraction of knowledge from unstructured text data
Let’s dive into them.
Continual Learning
In the paper Neural Architecture for Online Ensemble Continual Learning, Mateusz took the Class Incremental Continual Learning problem under the microscope.
This problem is challenging because the model’s architecture impacts the entire Continual Learning process. Even small architectural differences between similar models can cause completely different results during CL, and the bigger the model, the harder it is to train it continually. Additionally, the algorithms that analyze documents need to be trained multiple times.
Note: Read more about why Continual Learning is challenging.
Inspired by the work of researchers in ensemble-based continual learning and Mixture of Experts approaches, Mateusz proposed improvements to the architecture put forward by those researchers.
He introduced into the model a differentiable layer that approximates the result of the k-nearest neighbors (KNN) algorithm and proposed a novel method for aggregating the ensemble’s votes. In an extensive study, he also compared various continual learning methods across different configurations of data sets. This research led to several significant observations:
- The model introduces a significant improvement in quality compared to the tested methods
- The phenomenon of forgetting was reduced in the model
- The model scores up to 18 percentage points higher than the tested methods that use a memory buffer
- Those methods store part of the data for later reuse, so they would be expected to perform best, yet our method outperformed them
- The model was better than the baseline methods in all evaluated setups
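To make the idea of a differentiable KNN approximation more concrete, here is a minimal sketch. It is not the layer from Mateusz’s paper; it only assumes a common way to smooth KNN, where a softmax over negative distances replaces the hard choice of the nearest neighbors, and the resulting weights are used to combine the votes of ensemble members. The names `soft_knn_weights`, `aggregate_votes`, and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_knn_weights(query, keys, temperature=1.0):
    """Smooth stand-in for picking the nearest keys (an assumption, not the paper's layer).

    Each key gets a weight that grows as its squared distance to the query
    shrinks. A lower temperature makes the weights closer to a hard choice.
    """
    neg_sq_dists = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    return softmax(neg_sq_dists)


def aggregate_votes(weights, member_probs):
    """Weighted average of the class probabilities predicted by each ensemble member."""
    n_classes = len(member_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, member_probs))
        for c in range(n_classes)
    ]


# Two hypothetical ensemble members, each identified by a key vector.
keys = [[0.0, 0.0], [3.0, 4.0]]
member_probs = [[0.9, 0.1], [0.2, 0.8]]

weights = soft_knn_weights([0.0, 0.0], keys)
combined = aggregate_votes(weights, member_probs)
```

Because every step is a smooth function of the inputs, gradients can flow through the weighting, which is what a hard KNN lookup does not allow.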
“The research I’m doing will have two main applications. Not only will the university draw from it, but we will be able to develop viable solutions that we will implement into our product at Alphamoon.” ~ comments Mateusz Wójcik.
Extraction of knowledge from unstructured text data
Witold’s Ph.D. thesis focuses on Knowledge Extraction from Unstructured Text Data.
His research work has so far concerned deep Machine Learning methods from the field of Natural Language Processing (NLP). The main goal was to develop a technique for extracting relations between named entities that occur in text documents and are detected by methods already implemented in our company.
One of the results is a method that extracts relations with significantly higher efficiency than methods that have existed in the literature so far. Witold extended the methodology with a memory matrix; writing to and reading from it creates a feedback loop in the model’s processing. This feedback loop is significant for improving communication and knowledge transfer.
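The read and write operations on a memory matrix can be illustrated with a small sketch. It does not reproduce Witold’s method; it assumes a generic attention-style memory, where reading returns a similarity-weighted mix of memory rows and writing blends a new vector into the rows it most resembles. The functions `read_memory`, `write_memory`, and `feedback_step`, as well as the `rate` parameter, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def address(memory, vector):
    """Attention weights over memory rows, based on dot-product similarity."""
    scores = [sum(m * v for m, v in zip(row, vector)) for row in memory]
    return softmax(scores)


def read_memory(memory, query):
    """Return a weighted average of memory rows, weighted by similarity to the query."""
    weights = address(memory, query)
    dim = len(memory[0])
    return [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(dim)]


def write_memory(memory, update, rate=0.5):
    """Return a new memory where each row moves toward `update` in proportion to its weight."""
    weights = address(memory, update)
    return [
        [(1 - rate * w) * m + rate * w * u for m, u in zip(row, update)]
        for w, row in zip(weights, memory)
    ]


def feedback_step(memory, hidden, rate=0.5):
    """One loop iteration: read from memory, update the hidden state, write it back."""
    recalled = read_memory(memory, hidden)
    new_hidden = [h + r for h, r in zip(hidden, recalled)]
    return write_memory(memory, new_hidden, rate), new_hidden


memory = [[1.0, 0.0], [0.0, 1.0]]
recalled = read_memory(memory, [10.0, 0.0])
written = write_memory(memory, [2.0, 2.0], rate=1.0)
next_memory, next_hidden = feedback_step(memory, [1.0, 0.0])
```

Calling `feedback_step` repeatedly is what forms the loop: what the model writes at one stage influences what it reads at the next.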
What does it mean for the NLP algorithm that fuels Alphamoon’s engine?
Witold’s methodology improves the relationship between four tasks: mention detection, coreference resolution, entity classification, and relation classification. The relationship between them is essential for such a hard, multi-task problem because it allows the tasks to share information and makes the exchange multidirectional. As a result, the model’s overall performance improves, as shown by higher accuracy on each of the subtasks.
However, that’s not all.
The rest of the research involves working on a proprietary method that generates a graph, stored as an adjacency matrix, directly from the similarity of the representations the model produces at each stage.
In addition, Witold analyzed our company’s customers in the context of the relation extraction task. He researched what they need, what data they have, and how their needs align with what we are working on and with the leading trends in the literature on relation extraction. This work also includes preparing datasets based on actual customer data.
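As a rough illustration of building a graph as a neighborhood (adjacency) matrix from representation similarity, consider the sketch below. The proprietary method is not described in detail here, so the sketch assumes the simplest possible variant: two items are connected when the cosine similarity of their representation vectors reaches a threshold. The function names and the `threshold` value are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacency_from_representations(representations, threshold=0.8):
    """Build a symmetric 0/1 adjacency matrix (assumed rule: similarity >= threshold).

    Entry [i][j] is 1 when items i and j are similar enough to be linked.
    The diagonal stays 0, so no item is linked to itself.
    """
    n = len(representations)
    return [
        [
            1
            if i != j
            and cosine_similarity(representations[i], representations[j]) >= threshold
            else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three hypothetical entity representations; the first two point in nearly the same direction.
representations = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
adjacency = adjacency_from_representations(representations)
```

A hard threshold is only one option; keeping the raw similarity scores as edge weights would preserve more information at the cost of a denser graph.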
Alphamoon at ML in PL
Mateusz and Witold recently had the opportunity to present the results of their work at the ML in PL conference, the largest conference in Poland covering Machine Learning.
In the poster session, 34 people presented their insights and papers in the form of posters that summed up their research. During the event, each contestant could explain their work and participate in the contest for the best poster.
Ultimately, Witold and Mateusz swept the competition, winning first and second place in the best poster award!
What’s next
This is just the beginning of the guys’ research. They will continue to work on their Ph.D. research, which they will eventually put together in a dissertation.
“Research requires an incremental and iterative approach. It is constant work, but every small advance brings us closer to using that in a product. Every job develops technology that the competition doesn’t have. That puts us forward and guarantees that Alphamoon has the cutting-edge technology running in the veins of the platform.” ~ says Witold Kościukiewicz.
In addition, Mateusz is set to present his research results during NeurIPS, another leading Machine Learning conference.