Note: This transcript has been adjusted to improve readability. Transcripts are generated using speech recognition software and human transcribers. The context and more than 95% of the actual transcript have been preserved. We strongly encourage our listeners to listen to the audio.
Welcome to the Elevate Together Podcast: Voices of change in the business of law.
Dan Katz: Live from Elevate studios in Los Angeles and from my house in Chicago, Illinois, it’s the Elevate Together Podcast. I’m Dan Katz, Vice President of Data Science Innovation, here at Elevate. And I’m joined by my colleague, Eric Detterman, to kick off a special series we call “Inside the Engine Room,” where we highlight key stories and people who bring you Elevate’s award-winning products and services. So, it’s great to have you here today, Eric.
Just to kick us off, we have a lot of problems in law that are similar to the problems faced in other sectors. We have large volumes of data that have grown exponentially over the past few decades. We have technology that’s beginning to better harvest and store that data. That’s been useful in other fields, and it’s beginning to come here in law. One field, in particular, that’ll be relevant to our conversation today is finance. Finance and law are intertwined if you think about it, as are law and innovation broadly. But today, we’re inside the ‘engine room’ with somebody who sits at this intersection of law, finance, and technologies: Elevate’s own Eric Detterman, who has a wide range of experience in all these fields. Hi, Eric, and welcome to Inside the Engine Room. It’s great to have you here.
Eric Detterman: Hey, Dan. Pleased to speak with you today.
DK: It’s my pleasure. So what we want to do today is highlight some of the work that you’re doing here at Elevate with respect to data engineering, machine learning, and natural language processing. I think it’d be useful for people to hear about your own background. You draw upon the experiences you’ve had professionally and the work that you’re doing today. We’ll start with a bit of a background about you, give folks a brief introduction, and we’ll talk about a few things you’ve done along the way.
ED: Sure, I’d be happy to. A bit of background on me is, I did undergrad as an economist with quite a bit of statistical training that lends itself to software development. Initially, I started as a process engineer, really working with a small boutique consulting firm in Michigan. We’d go out and help Fortune 1000 and Fortune 500 companies re-design their processes to be more efficient and more effective. After a while, I got tired of travelling every week, so I jumped into the legal industry with mortgage and foreclosure and bankruptcy processing. We were designing and building a very large, very high-volume software application for handling foreclosures, bankruptcies, and similar areas. So really, over my career, I’ve worked in different industries and areas, all with a software slant and all with some kind of a complex workflow or complex machine-learning type capability.
DK: What were some of the technical aspects associated with processing mortgages and mortgage documents? And those are legal documents of a sort. What were some of the technical issues that you confronted in that type of work?
ED: Yes, it was quite challenging because, at that time, it was 2006, and so it was getting quite busy right ahead of the foreclosure and banking crisis in 2008. It was very high volume, and some of the challenges we faced there were a very heavy document-centric workflow, intertwined with hundreds of clients – all of the large banks you would expect, different courts, different governmental entities – and a lot of complexity around rules and regulations, which were very fast-moving and changing quite rapidly. We had to engineer very strong and complex solutions that aimed for 100% accuracy on things because if there were any problems in that process, you had to start anew. And to do that at scale across hundreds of thousands of cases is quite a difficult task.
DK: A lot has been written about the financial crisis, and particularly this work around mortgage processing, and one of the things people said is organisations had difficulty just finding, collecting, and organising all of this loan information that they, in theory, were supposed to have possession of, but had a lot of difficulties doing the basic finding and collecting. Was that some of the type of work that you faced?
ED: Absolutely. Mortgages are bought and sold in bulk. You would often have the standard issues on the document that did not match what you would expect. There was a lot of work to go out to different systems to pull the information to consolidate. And unfortunately, at that time, just the state of the world with machine learning and natural language processing, there weren’t capable solutions to do some of the techniques that we have now with automated extraction, whether it’s pulling out parties or dates or currencies and the like. So really, it was a much more manual process and still technically driven and software-driven, but it definitely wasn’t as far along as it is today.
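To give a rough sense of the automated extraction Eric describes, here is a minimal sketch in Python that pulls dates and dollar amounts out of a sentence with regular expressions. It is an illustration only, not the method used at Elevate or in LexNLP: the patterns, function names, and sample sentence are all assumptions, and extracting parties generally needs trained models rather than simple patterns.

```python
import re
from datetime import date
from decimal import Decimal

# Illustrative patterns only (assumptions); real documents use many more formats.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
MONEY_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")


def extract_dates(text):
    """Return dates written as MM/DD/YYYY, skipping impossible ones."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # e.g. 13/45/2006 is not a real date
    return found


def extract_amounts(text):
    """Return US dollar amounts as Decimal values."""
    return [Decimal(m.replace(",", "")) for m in MONEY_RE.findall(text)]


# Hypothetical sample sentence, written for this example.
sample = "Note dated 03/15/2006 in the principal amount of $250,000.00, due 04/01/2036."
print(extract_dates(sample))
print(extract_amounts(sample))
```

Even this small case shows why the manual process was slow: every alternative date or currency format needs another rule, which is the gap that trained extraction models aim to close.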
DK: You did this work, and then obviously, as they worked through the backlogs, the demand for this type of work wasn’t infinite. You ended up making a move into the broader financial world at that point. I think about my own experiences. I’ve spent a lot of time applying statistical models to forecast or predict all types of different problems. One of the signature problems that everybody wants to make progress on is predicting the stock market. You can ask a person, and you can ask a machine. That was some of the work you were doing, trying to build models to predict securities, equities, what have you. Tell us a little bit about some of the challenges in algorithmic trading.
ED: Yes, as you said, it’s a very textbook problem to say, “Gee, I wish I could have a robot or a machine predict which way the market’s going to go today or next month. Or should I buy Apple? Should I buy Tesla? What should I do?” Something like that. And so, much of the work and the background that I had in statistics and economics came into play in the software development side. After I left the mortgage processing world, I moved into what’s now called quantitative trading. So that is really building machine-learning and statistical models to influence your investment or trading decisions. So really, it could be something as simple as, do I think that Apple or Google or pick your other company is going to go up tomorrow? Should I buy it? And if yes, then executing on that.
So really, it was building those models out. And the challenge within finance is just the sheer volume of data, getting that data into a shape and a format that lends itself to prediction. Because it takes a lot of massaging, we’ll say, to get it into a form that can be predicted in a timely manner so that the prediction is useful. Because, of course, the data is always changing and always coming in, so you’re not just building a model to predict something and then reading the result in a static format; it’s real-time and always changing.
DK: And, of course, those types of principles and ideas apply elsewhere. One thing with less of a real-time need for prediction is the outcome of a lawsuit. You can think of it as an asset. There’s a gain and a liability to someone, and people want to forecast that all the time. So you were in the deep there in pure finance, and then you got involved with a Fintech company called GreenKey Technologies, and you were their first employee, the CIO of the company, helping bring that company more to market. Tell us a little bit about what you were doing there at GreenKey.
ED: So, as you said, I was employee number one at GreenKey. We were building up a Voice over IP, which is basically an industrial or institutional strength telephony application that can run on your computer to allow you to trade – if you’re an institutional trader trading, say, oil or a fixed income or something of the sort – from anywhere with your counterparts. It was enabled with speech recognition and natural language processing and other techniques to extract value from that data stream: the spoken word. The uses there are just simply communicating and creating a data stream for compliance purposes and then for data purposes because there is value in more obscure or less well-known markets to know where they’re trading at and to know what they’re doing.
One of the interesting things is that if somebody is looking at equities to, say, buy 500 shares of IBM, that’s a fairly standard conversation. But when you get into some of the more esoteric products, you get into very complex jargon and terminology and things very similar to what we have in law. What was required, which was quite fascinating, is building up specialized speech recognition models and tools to extract that information accurately. And it’s the same sort of approach that we took with building LexNLP and ContraxSuite to teach it the legal domain knowledge.
DK: Yes, traders have a specific lexicon, to say the least. There’s this challenge that people deny that they put a sell or buy order in, so you want a record of all of that and automatic transcription of what’s going on, while also allowing people to flag bad behavior, as you mentioned on the compliance side. These are all the types of things that we want in other types of legal workstreams, beyond finance. So, you arrived at GreenKey when they hit a key milestone, and that’s about when you got connected up with us at LexPredict before we sold the company to Elevate. I guess I would say that we had brought the company to a certain point from a product perspective, but you brought that kind of CIO/CTO discipline to productisation. Maybe tell us a little bit about how you whipped us into shape and got some proper products out of what we were doing.
ED: I have known Mike Bommarito, one of the founders of LexPredict, now for about ten years, so quite a while. I’ve always been impressed by what you and Mike have been doing. When I came in, I saw a very capable, foundationally-strong technology product that was very well-suited for complex consulting, document analytics, text extraction, and data extraction. And really, where we wanted to take it was to make it more of a product and a platform that a wider range of people can use. Easier to use across a broader range of problems. And I’ll say more of a self-service tool rather than a tool that would require a data scientist or a developer type to operate. A lot of my experience at GreenKey was building and scaling and operating globally distributed development teams, software development teams to build technology products. I brought that same sort of approach to LexPredict and building out ContraxSuite. A lot of what we were doing has been taking it from a ‘consulting Swiss Army knife’ or a tool kit that it started as and building that out into the platform that it is today.
DK: The product was built out of doing work for individual customers on individual problems, and so we assembled what was necessary on a per problem basis. You helped us create a fully integrated proper tech stack. We can do document analytics broadly. Contracts are a particular document type. What are some of the key issues on this path; where do you need to go to the next level with a fully integrated tech stack? What are some of the issues there?
ED: We would always refer to ContraxSuite as a platform and not as a product. The delineation there is that a platform is a much more open-ended, flexible, and extensible application. Really what we see in law and practice is that many of the problems are very similar, but they’re different enough that you need to change something or extend something or do something different to achieve the best outcome. And so really, we kept it very flexible, so that a trained team, trained individuals, can go in and turn the dials and turn the knobs and do different things to get a better result. We’ve kept it very developer and technology-focused with ways to integrate it with other applications. If we look back over 2020, which has been a roller coaster of a year, you’ve seen new problems crop up just due to Covid and the implications of that. And so we don’t know what 2021 will bring or what new laws or changes are coming, so really what we know is that we need that flexibility and that way to configure on the fly and get the information out. Have a strong solution and not just a point solution.
DK: Let’s try to help the audience decompose a product. As we like to talk about it, there’s a front-end – typically a user experience – where there’s workflow that people are moving through from step to step to step. There’s an engine that’s processing perhaps a document in the background, and there’s a database where stuff is stored. Data has to come into the system, it’s moved through an engine, possibly people are touching it, maybe it’s stored in a database, and there’s a front end. These are kind of the moving parts that are out there. So, when you thought about some of the key issues, maybe we’ll start with the front end and the workflow.
ED: Just like you’ve described it, there’s the user interface – the user experience that people are accustomed to seeing. Consumer software has gotten to be so good that everybody expects it to look very clean and polished because their personal email is very easy to use. Often it’ll even try and complete the sentence for you. So they expect machine-learning or AI-type functionality to be very easy for them to use and interpret. In terms of components, the user interface is critical, that’s really what people see and experience. And then, moving backward, we’re certainly getting documents from some system. It could be a DMS; it could be a CLM system; it could be an internal site, wherever it may be, and then processing those according to some rules or some process or some approach to get information out of it. We’re feeding that document into the back end, to the engine, and extracting the information, and then saving it into the database so you can do things like reporting, analytics, workflow distribution, and all of the sort of stuff that people expect to have when they’re processing thousands of documents.
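The flow described here (documents arrive from a source system, pass through an engine, and land in a database that reporting sits on) can be sketched in a few lines of Python. This is a toy under stated assumptions, not ContraxSuite’s architecture: `fetch_documents`, `run_engine`, the `results` table, and the sample documents are hypothetical stand-ins, and the "engine" is a single keyword rule.

```python
import sqlite3


def fetch_documents():
    """Stand-in for a DMS or CLM connector (hypothetical); yields (doc_id, text)."""
    yield "doc-1", "Lease between Acme LLC and Beta Inc."
    yield "doc-2", "Master services agreement, signed copy."


def run_engine(text):
    """Stand-in for the extraction engine: one trivial rule instead of real models."""
    doc_type = "lease" if "lease" in text.lower() else "other"
    return {"doc_type": doc_type, "length": len(text)}


def process_all(conn):
    """Pull each document, run the engine, and store the extracted fields."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(doc_id TEXT PRIMARY KEY, doc_type TEXT, length INTEGER)"
    )
    for doc_id, text in fetch_documents():
        fields = run_engine(text)
        conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
            (doc_id, fields["doc_type"], fields["length"]),
        )
    conn.commit()


def report(conn):
    """Count documents per type, the kind of query a reporting screen would run."""
    rows = conn.execute(
        "SELECT doc_type, COUNT(*) FROM results GROUP BY doc_type ORDER BY doc_type"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")  # in-memory database for the example
process_all(conn)
print(report(conn))
```

Because each stage is its own function, the source connector or the engine could be swapped without touching the storage or reporting code.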
Often, what we see (that we’ve tried to avoid) is to really make it such that the parts can be split apart and built up in different ways, and so we can bring in different machine learning models and techniques that are ever-evolving and changing over time. Typically, what we’ll do is to have the machine learning model totally separated from the application and build that in a way that we can do something like, let’s say it’s a good, unsure, or bad outcome, we’ll put that with something like a stop light on the front end that’s very easy for the user to understand and interpret. The idea here is that they don’t need to know what’s happening inside the engine or within the machine learning models; they simply need to follow the patterns that they’re already accustomed to from the user experience standpoint to get value out of the tool.
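One way to picture the stop-light idea is the short Python sketch below. The model is just a function that returns a score, kept separate from the code that turns the score into a color, so a different model can be dropped in without changing what the user sees. The thresholds, function names, and the toy keyword "model" are assumptions made for illustration, not details from the conversation.

```python
from typing import Callable

# Any callable that takes text and returns a score in [0, 1] can serve as the model.
Model = Callable[[str], float]


def to_stoplight(score: float, low: float = 0.4, high: float = 0.7) -> str:
    """Map a score to a color; the 0.4 and 0.7 cut-offs are illustrative assumptions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "green"   # good outcome
    if score >= low:
        return "yellow"  # unsure, worth a human look
    return "red"         # bad outcome


def review_status(model: Model, text: str) -> str:
    """The front end only ever sees the color, never the model internals."""
    return to_stoplight(model(text))


def keyword_model(text: str) -> float:
    """Toy stand-in for a trained classifier (hypothetical)."""
    return 0.9 if "indemnif" in text.lower() else 0.2


print(review_status(keyword_model, "The supplier shall indemnify the buyer."))
print(review_status(lambda _text: 0.55, "any clause"))
```

Swapping `keyword_model` for a better classifier changes nothing in `to_stoplight` or in the interface, which is the separation being described.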
DK: I think that the average buyer today in the legal space, though, is highly enamoured with workflow and front-end, and there’s no reason they shouldn’t be, but you have this problem that sometimes exists where the product, if you were really to look at it, has a Ferrari front-end in terms of a beautiful body of a car, but there’s a lawnmower engine on the inside. Now, we took the opposite perspective when we were getting started, which was engine first. If you have a bad engine, you can put a front-end on it, but eventually, you’re going to have a problem with performance. You’re not going to get there very fast, even if the car looks pretty in the showroom.
ED: We definitely focused on the foundation of the product first and the core components. If you don’t have a strong foundation on a software product, it’s very difficult to build it well, scale it, and continue growing it and providing value. We focused on building strong techniques and approaches. We open-sourced the componentry on LexNLP and demonstrated what it is so people can go in and understand the code, understand what it’s doing, and verify what it’s doing. And if they’ve got the skill set, they can go in and extend it to meet their problems. We’re seeing new things where somebody may take techniques that are created more in the broader ecosystem (one of the Silicon Valley startups or technology companies) and bring that in within our platform and then leverage that for something to help solve their problem. Certainly, building the front-end and the end-user experience is important, but it’s the easier part. And once you have the foundation, you really can take it in a lot of different directions. This concept of micro-products or micro-modules all built on the same stack, and all of these are doing different aspects, but it’s all within the same ecosystem.