Heartex raises $25 million for its AI-focused, open source data labeling platform
Heartex, which describes itself as an "open source" platform for data labeling, has announced that it raised $25 million in a Series A funding round led by Redpoint Ventures. Unusual Ventures, Bow Capital, and Swift Ventures also participated in the round, which brings the company's total raised to $30 million.
Heartex co-founder and CEO Michael Malyuk said the funds will be used to improve the Heartex product and to grow the company's headcount from 28 to 68 employees by the end of the year.
"Coming from engineering and machine learning backgrounds, Heartex's founding team knew what value machine learning and AI could bring to the organization," Malyuk told TechCrunch via email. "At the time, we worked at a variety of organizations in different industries, but they all had the same issues with model accuracy due to poor-quality training data. We concluded that the best option was to have internal teams with domain expertise be responsible for curating and annotating training data. Who is a more reliable source of information than your own team of experts?"
Software engineers Michael Malyuk, Maxim Tkachenko, and Nikolay Lyubimov founded Heartex in 2019. Lyubimov was previously a senior engineer at Huawei before moving to Yandex, where he worked as a backend developer on speech technologies and dialogue systems.
The ties to Yandex, which has been dubbed "the Google of Russia," might give some pause, especially in light of allegations by European Union members that Yandex's news division played a significant role in spreading Kremlin propaganda. Heartex is headquartered in San Francisco, but a large portion of its engineers are based in Georgia, the former Soviet republic.
When asked about this, Heartex said that it doesn't store any customer data and that it is open-sourcing the core of its labeling system so that it can be inspected. "We've built a data architecture that keeps data private on the customer's storage, separating the data plane and control plane," Malyuk explained. "As for the team and its location, we're a global team with no members located in Russia."
Geopolitical ties aside, Heartex aims to tackle what Malyuk sees as a key challenge for businesses trying to extract value from their data with AI. A growing number of companies want to become "data-centric." Gartner has reported that AI adoption grew by 270% over the past several years. Yet many companies still struggle to use AI to its full potential.
According to Malyuk, data labeling is getting more attention from companies pursuing AI because labels are a crucial part of the AI development process. Many AI systems "learn" to make sense of images, audio, text, and other examples from annotations supplied by human labelers. The labels help the system extrapolate relationships between examples (e.g., the link between the caption "kitchen sink" and a photo of a kitchen sink) to data it hasn't seen before (e.g., photos of kitchen sinks that weren't part of the data used to "teach" the model).
The trouble is that not all labels are created equal. Labeling legal documents, medical images, and scientific literature requires domain expertise that not every annotator has. And, being human, annotators make mistakes. In an MIT study of popular AI datasets, for instance, researchers found mislabeled data such as one breed of dog being confused with another and an Ariana Grande high note being classified as a whistle.
Malyuk doesn't claim that Heartex solves every problem. But in an interview, he said Heartex was designed to support labeling workflows for a wide range of AI applications. Heartex also includes tools for data quality management as well as analytics and reporting. For example, data engineers using Heartex can see the names and email addresses of the annotators and reviewers tied to each label they created or reviewed. This lets them assess the quality of labels and, ideally, fix problems before they affect the training data.
The view from the C-suite is simple. "It's all about improving production AI model accuracy in order to achieve the project's core business goal," Malyuk said. "We're seeing a growing number of C-suite executives who are accountable for AI and machine learning research and data initiatives. They've seen the benefits of AI when they invest strategically in people, processes, technology, and data. AI can deliver extraordinary benefits to businesses in all kinds of situations. We've also seen that success breeds success. Teams that succeed early can build higher-value models faster by building on their first experiences and on the data already gathered from models in production."
In the data labeling tools market, Heartex competes with startups such as AIMMO, Labelbox, Scale AI, and Snorkel AI, as well as Google and Amazon (which offer data labeling tools through Google Cloud and SageMaker, respectively). But Malyuk believes the company's focus on software rather than services sets it apart from the competition. Unlike many of its rivals, Heartex does not sell labeling services through its platform.
"Because we've built a horizontal platform, our customers come from a variety of industries," Malyuk said. "We also work with companies that are much smaller than many Fortune 100 companies. Our platform has been used by more than 100,000 data scientists around the world." Malyuk declined to disclose revenue. "[Our customers] are setting up internal teams to label data and buying our product because their production AI models aren't performing well, and they recognize that poor-quality training data is the main reason."