Teaching Machines the ‘Language of Law’
March 06, 2020
adoption legal technology ai machine learning technology and digital events ai and data science daniel katz legalese
I recently had the privilege of leading a roundtable where we discussed the increasing importance of training AI tools that specialize in legal domain-specific language, the “language of law”. To date, there has been significant commercial progress in the application of some forms of AI, including Natural Language Processing (NLP). Many of the AI and NLP tools now on the market are effective because they work with plain, everyday English. In the legal field, however, these same techniques often aren’t enough, because the “language of law” differs from ordinary everyday English. It has complex clauses, Latin phrases, and many terms of art that are used in virtually no other context.
While there have been some notable efforts to apply AI in law, there is still a wide gap between current AI systems and many of the domain-specific needs of the legal industry. Actually teaching machines the “language of law” is the next step forward.
Teaching Machines to Understand Legalese
Ordinary AI tools cannot very easily process the terms of art and idiosyncrasies of legal language. This is not all that surprising – many humans struggle with the complexity of legal language, so why wouldn’t machines as well? Indeed, much of lawyers’ professional training focuses on the task of using legal language in such a precise manner that we refer to it as its own language: Legalese.
Building legal-specific AI and NLP tools requires a richer semantic understanding of text, which has in many ways continued to prove elusive. Though the landscape has improved over the past few years, the average AI or NLP product that claims to be able to handle legal language may still fall short of expectations. Under the hood, many AI and NLP products are built and trained on general language. This means that it is often left to the user to train the machine on legal language.
Elevate’s LexPredict platform was built to reduce the amount of preliminary training needed. It has a legal-specific dictionary designed to process and analyze legal language. It contains a variety of key elements, including a digitized and open-sourced version of Black’s Law Dictionary. We also developed OpenEDGAR, an open source tool for pre-training machine learning models on public documents.
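To make the value of a legal-specific dictionary concrete, here is a minimal sketch of dictionary-aware tokenization in Python. It is not how the LexPredict platform is implemented. The term list, the function name tokenize_with_terms, and the sample clause are all hypothetical, chosen only to show why multi-word terms of art such as "hold harmless" need to be treated as single units rather than as separate everyday words.

```python
import re

# Hypothetical mini-dictionary of legal terms of art (an assumption for this
# sketch); a real system would load many more entries from a curated source.
LEGAL_TERMS = {
    "force majeure",
    "res judicata",
    "indemnify",
    "hold harmless",
    "consideration",
}

def tokenize_with_terms(text, terms=LEGAL_TERMS):
    """Split text into (token, is_legal_term) pairs, keeping multi-word terms intact."""
    words = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(term.split()) for term in terms)
    tokens, i = [], 0
    while i < len(words):
        # Greedily try the longest candidate phrase first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in terms:
                tokens.append((phrase, True))
                i += n
                break
        else:
            # No dictionary entry starts here, so emit an ordinary word.
            tokens.append((words[i], False))
            i += 1
    return tokens

clause = (
    "Seller shall indemnify and hold harmless the Buyer, "
    "except in cases of force majeure."
)
legal_hits = [tok for tok, is_term in tokenize_with_terms(clause) if is_term]
print(legal_hits)  # ['indemnify', 'hold harmless', 'force majeure']
```

A general-purpose tokenizer would see "hold" and "harmless" as two unrelated everyday words. Matching against a legal dictionary first is one simple way to give a downstream model a head start.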
There is an assumption that software out of the box can immediately deliver robust solutions to complex problems. This is more bluster than reality. However, the LexPredict platform provides users with a significant head start.
The Human Element
As noted above, the members of the roundtable–myself included–routinely encounter the misconception that AI can simply work well without human intervention. This view is mistaken. The reality is that AI tools are built by people, and must continue to be maintained by people, to achieve the best outcomes possible.
A great example of this is spam filters. An email spam filter analyzes the address and the text of incoming emails, and judges whether that message might be spam. Spam filters work pretty well, but they are not 100% accurate. Sometimes spam makes it through a filter. Sometimes an email you know is not spam is tagged and dropped into your spam folder anyway. Human intervention is required to make sure the spam filter is always working as well as possible.
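The feedback loop described above can be sketched in a few lines of Python. This is an illustrative toy, assuming a word-count naive Bayes classifier with add-one smoothing. Real spam filters use more signals and more data, and the class name TinySpamFilter and the training messages are invented for this example.

```python
import math
from collections import Counter

class TinySpamFilter:
    """Word-count naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record one labeled message; also used for human corrections."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def predict(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each word.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

spam_filter = TinySpamFilter()
spam_filter.learn("win free money now", "spam")
spam_filter.learn("free prize claim now", "spam")
spam_filter.learn("meeting agenda for monday", "ham")
spam_filter.learn("draft contract attached for review", "ham")

message = "free webinar now on contract law"
before = spam_filter.predict(message)  # "spam": the filter gets this one wrong

# A person marks the message as "not spam", and the filter learns from it.
spam_filter.learn(message, "ham")
after = spam_filter.predict(message)   # "ham"
print(before, after)
```

The filter improves only because a person noticed the mistake and corrected it.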
Spam filters use simple machine learning algorithms to do their job, yet the human element is vital to their operation. Legal AI tools are far more complex, and so require diligent human oversight throughout their development. This is where legal subject matter expertise adds significant value. The machine performs the identification and triage work efficiently, which enables the subject matter expert to focus on the most critical details.
We get the best results when humans and machines work together, with subject matter experts stepping in to oversee the final review of documents and to confirm or reject the results from the AI. Effective machine learning algorithms only become better through a process of evolution shepherded by subject matter experts who know what data needs to be found.
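One simple way to picture this division of labor is a review queue ordered by the machine's own confidence. The Python sketch below is a hypothetical illustration, not a description of any particular product. The Prediction fields, the 0.9 threshold, and the document IDs are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    doc_id: str
    label: str
    confidence: float  # model's estimated probability that `label` is correct

def triage(predictions, threshold=0.9):
    """Split predictions into a routine queue and a priority queue for experts."""
    routine_review = [p for p in predictions if p.confidence >= threshold]
    # Least confident first, so experts see the hardest calls early.
    priority_review = sorted(
        (p for p in predictions if p.confidence < threshold),
        key=lambda p: p.confidence,
    )
    return routine_review, priority_review

predictions = [
    Prediction("nda-001", "confidentiality clause", 0.97),
    Prediction("msa-014", "indemnification clause", 0.62),
    Prediction("lease-203", "termination clause", 0.81),
]
routine_review, priority_review = triage(predictions)
review_order = [p.doc_id for p in priority_review]
print(review_order)  # ['msa-014', 'lease-203']
```

Every result still reaches a human reviewer. The ordering just directs expert attention to the documents where the machine is least sure, and each confirmation or rejection can be fed back as new training data.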
Tech Adoption and Buy-In
Another topic central to our roundtable discussion was the difficulty with tech adoption. Lawyers are trained to assess and analyze risk. The adoption of new legal technologies can entail significant risk–the risk that taking a novel approach will produce unexpected outcomes, or outcomes that are inferior to well-worn legacy processes.
Spurring the adoption of new technology requires some creativity. It begins by examining how an organization currently operates. What tools do they use? Who at the organization may be more willing to try something new? Often, new technology needs the approval of just a small group of early adopters. Small pilot projects can go a long way toward persuasively demonstrating the impact of new, AI-powered tools. In some instances, we can address skepticism and wariness by disseminating briefing documents that explain the benefits of the application of new technology and clarify where that technology belongs within the organization.
Finding the right motivators is key to successful adoption. Attorneys and other legal professionals may lack motivation when it comes to cost containment and efficiency–they may be perfectly happy and very successful with current costs and pricing structures. A more effective approach is to emphasize increased revenues, along with the opportunity cost of being unable to fulfill the terms of an RFP that stipulates technology-based process improvements.
Tech tools have come a long way. Our roundtable focused on taking the next steps toward still more improvement. Machines need to be better trained in the language of law because language is the “coin of the realm” here in law. The roundtable addressed both practical and technical strategies for overcoming the challenges of synthesizing technology with human expertise, increasing adoption, and achieving the best possible outcomes via efficient, cost-saving, revenue-generating technology.