Written by: Gavril Bilev, PhD, Senior Data Scientist and Mark Mace, PhD, Data Scientist
The strength of AI comes from its ability to deliver highly accurate results faster (and less expensively) than a human could. But when it comes to understanding how AI works and whether it can live up to its promise, business leaders are often left scratching their heads or, worse, stuck in a lengthy and costly implementation of a new tool with no results to show for it.
The AI developed at LinkSquares is fundamentally different from that at other companies. But unless you hold a PhD or have expertise in data science, it can be difficult to discern both how it’s different and what impact it will have on you. In this document, we break down some of the key differences in our approach to developing the AI that’s foundational to the LinkSquares product.
#1 LinkSquares Develops Algorithms That Work Iteratively to Drive Constant Improvement
To achieve high accuracy, models typically need a large number of annotations, which can be both costly and time-consuming to obtain. If the number of annotations is limited or the training data is not selected intelligently, models can perform poorly. At LinkSquares, we have developed algorithms that work iteratively: our models purposefully seek out the data that is most dissimilar to data they have previously seen. This allows us to build models that generalize to the entire dataset, even when trained on only a subset of all documents. In building these models, we have also built in a sufficient cushion to accommodate adverse scenarios in which complicated extractions necessitate more data.
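To make the idea of "seeking out dissimilar data" concrete, here is a minimal sketch of one common way to do it: greedy farthest-point selection over word sets. It is an illustration only, not the LinkSquares algorithm. The function names, the whitespace tokenizer, and the choice of Jaccard distance are all assumptions made for the example.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; 0.0 means identical word sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def select_most_dissimilar(docs, seen_indices, k):
    """Greedily pick up to k unseen documents, each time choosing the one
    whose nearest already-covered document is farthest away.

    Hypothetical helper for illustration; a production system would likely
    use learned embeddings rather than raw word sets.
    """
    token_sets = [set(d.lower().split()) for d in docs]
    covered = list(seen_indices)
    candidates = [i for i in range(len(docs)) if i not in covered]
    picked = []
    for _ in range(min(k, len(candidates))):
        best = max(
            candidates,
            key=lambda i: min(
                (jaccard_distance(token_sets[i], token_sets[j]) for j in covered),
                default=1.0,  # nothing seen yet: every candidate is equally novel
            ),
        )
        picked.append(best)
        covered.append(best)
        candidates.remove(best)
    return picked


docs = [
    "the term of this agreement is one year",
    "the term of this agreement is two years",
    "governing law shall be the state of delaware",
    "the term of this lease is one year",
]
# Document 0 is already annotated; ask for the next two to annotate.
next_to_annotate = select_most_dissimilar(docs, seen_indices=[0], k=2)
```

In this toy example the governing-law clause is chosen first because it shares the fewest words with the document already annotated, which is the behavior the paragraph above describes.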
#2 LinkSquares Provides Models as a Service
In machine learning, the concept of “data drift” refers to the tendency of a model’s accuracy to degrade over time because of changes in the input to the model. Simply put, language evolves over time. One example is what happens when a company acquires another company: the acquiring company inherits a completely new set of documents, and with them language patterns that were not seen when the original model was trained. Another example is simply the introduction of new terms (e.g. GDPR, SLA) or the evolution of standard language (e.g. act of God, force majeure). Since we provide models as a service, instead of a one-shot product, the maintenance and upkeep (refreshing them periodically with new data to reduce model drift) is our responsibility, not the customer’s. And since we are improving models across millions of documents, we are able to more easily spot trends in language and proactively address them.
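One simple signal of this kind of drift is the share of words in incoming documents that never appeared in the training data. The sketch below illustrates that check under stated assumptions: the function names, the regex tokenizer, and the 0.2 threshold are hypothetical choices for the example, not a description of how LinkSquares monitors its models.

```python
import re


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def oov_rate(training_docs, new_docs):
    """Fraction of tokens in new_docs that are absent from the training vocabulary."""
    vocab = {tok for doc in training_docs for tok in tokenize(doc)}
    new_tokens = [tok for doc in new_docs for tok in tokenize(doc)]
    if not new_tokens:
        return 0.0
    unseen = sum(1 for tok in new_tokens if tok not in vocab)
    return unseen / len(new_tokens)


def needs_refresh(rate, threshold=0.2):
    """Flag the model for retraining when too much incoming language is new.
    The threshold is an arbitrary illustrative value."""
    return rate > threshold


training = ["the supplier shall deliver the goods"]
incoming = ["the supplier shall comply with gdpr"]
rate = oov_rate(training, incoming)  # "comply", "with", "gdpr" are unseen
```

A rising out-of-vocabulary rate does not prove accuracy has dropped, but it is a cheap early warning that new terms such as "GDPR" have entered the document stream.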
#3 LinkSquares Uses Professional Annotators Rather Than Requiring Users to Annotate Themselves
Bad annotations translate to poorly performing models. We have also seen that inconsistencies in annotations done by non-professionals produce noisy, messy extractions. At LinkSquares, we have gone through the process of creating hundreds of annotation manuals. We have seen what works and what doesn’t with respect to the current state-of-the-art NLP architectures, and we are confident that our accuracy is the best on the market.
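Annotation consistency can be measured. A standard statistic is Cohen's kappa, which scores agreement between two annotators after discounting the agreement expected by chance. The sketch below is a generic illustration; the label names and data are made up for the example, and nothing here is taken from LinkSquares' internal tooling.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels assigned to four clauses by two annotators.
annotator_1 = ["term", "term", "party", "party"]
annotator_2 = ["term", "term", "party", "term"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items, but because half of that agreement would be expected by chance, kappa comes out well below the raw 75% agreement rate. Low kappa on a pilot batch is a common cue to tighten the annotation manual before labeling at scale.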
#4 LinkSquares Delivers Value, Even When AI Cannot
Some extractions cannot be handled by present-day AI technology, for a number of reasons. For example, there may not be enough patterns in the language, or an extraction might require knowledge outside of the immediate text. Our approach to custom values begins by trying to build a high-performing model. However, in the rare cases where the model underperforms, we have a highly trained team of human annotators ready to process files and deliver the metadata clients need.
#5 LinkSquares Layers on Human QA as Part of Its Proprietary OCR Process
A text extraction model can only be as good as the text it is trained on. Converting an image, like a scanned PDF, to text requires optical character recognition (OCR). However, no OCR solution is perfect. Even the best OCR on the market, when confronted with a low-resolution scan, can return unusable text. The only way to ensure that the language in a contract is accurately converted to text is to bring in humans for quality assurance. This is an expensive measure, but it ensures that we are able to build high-quality models and make high-quality predictions on all incoming contracts. It is also a major reason why other solutions are less expensive, and less accurate.
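As an illustration of how human QA can be layered on top of OCR, the sketch below routes a page to a reviewer when the OCR engine's own confidence scores look weak. Everything in it is an assumption for the example: the `OcrWord` type, the per-word confidence on a 0 to 1 scale, and the three thresholds are hypothetical and do not describe the LinkSquares pipeline.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


def needs_human_review(
    words,
    min_mean_confidence=0.9,    # illustrative threshold
    low_conf_cutoff=0.6,        # below this, a word counts as "doubtful"
    max_low_conf_fraction=0.1,  # tolerated share of doubtful words
):
    """Return True when a page should be checked by a person."""
    if not words:
        return True  # an empty page is itself suspicious for a contract scan
    mean_conf = sum(w.confidence for w in words) / len(words)
    doubtful = sum(1 for w in words if w.confidence < low_conf_cutoff) / len(words)
    return mean_conf < min_mean_confidence or doubtful > max_low_conf_fraction


clean_page = [OcrWord("Agreement", 0.99), OcrWord("dated", 0.97)]
noisy_page = [OcrWord("Agr33ment", 0.40), OcrWord("dated", 0.95)]
```

With these inputs, `needs_human_review(clean_page)` is false and `needs_human_review(noisy_page)` is true, so only the degraded scan is sent for manual correction.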
AI is core to the LinkSquares product and provides our customers with data and insights across all of their contracts. Our unique approach to developing, maintaining, and delivering AI gives our customers fast, accurate data that they can depend on.
Want to learn more? Contact LinkSquares today.