ELF webinar on Artificial Intelligence

{ dr. Homoki Péter / 2023.06.26 }

This is a transcript of my presentation for the European Lawyers Foundation. You can find the presentation slides here.

As previously mentioned, CCBE and ELF published this Guide on the use of AI for lawyers and law firms in the EU a year ago. Many things have changed since then, but my number one piece of advice is still: please read it.

The guide is available in English and in Hungarian. It gives you a lot more detail, including a much wider overview of the tools that lawyers can use. The 30 minutes we have today are not enough to go into any depth.

Today, I will only talk about a part that could not get enough emphasis in the Guide: a specific type of “foundation model” called a large language model, like GPT-4.

These large language models are increasingly dominant, but they are still not the best solution for every legal task.

What’s a large language model? It’s a very sophisticated language model that was trained on an enormous amount of text, using very complex technology built on artificial neural networks.

Language models themselves are just representations of the probabilities of specific words following each other. They give the probabilities for text sequences: given the earlier parts of the sequence, what will the next part be?
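
To make the idea of "probabilities of words following each other" concrete, here is a minimal sketch of the simplest kind of language model, a bigram model that only looks at the previous word. The function name and the one-sentence corpus are made up for illustration; real models are trained on vastly more text and look much further back.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows another, then turn counts into probabilities."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1

    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: n / total for word, n in followers.items()}
    return model


# Hypothetical toy corpus, chosen only to show the mechanics.
corpus = "the seller delivers the goods and the buyer pays the price"
model = train_bigram_model(corpus)

# Probabilities of each word that was seen following "the".
print(model["the"])
```

In this toy corpus, "the" is followed by four different words once each, so each of them gets the same probability.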

You can already find some language models from the late 19th century. That does not sound like a very exciting application. But with the assistance of today’s technology, this simple modelling can become so sophisticated that people consider it magical, as if an artificial person were writing long essays for them.

You can see here a very high-level illustration of the currently most popular large language model, “GPT” by OpenAI. These models first transform words into numbers in a way that retains as much as possible of the contextual, semantic meaning of the words used. This transformation is a very important part of the trick.

These input tokens are passed through a number of blocks, each of which is an artificial neural network.

Out of these blocks, we receive “contextualised tokens”, which carry a much more precise meaning for each token, reflecting what it means in the full input, the full context.

From these tokens, the first “response token” (output) is generated by this probability model. The next response tokens will be generated taking into account both the previous response tokens and the original input tokens.

These generation steps go on until the large language model generates a stop token or until some other parameter of the language model makes it stop. The response tokens are then turned back into text.
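
The generation loop described above can be sketched roughly as follows. This is only an illustration under strong simplifying assumptions: `TOY_MODEL` is a hand-written table of next-token probabilities, and it looks only at the last token, whereas a real large language model takes the whole input and all previous response tokens into account. The names `STOP`, `TOY_MODEL` and `generate` are hypothetical.

```python
import random

STOP = "<stop>"

# Hypothetical next-token probabilities, written by hand for illustration.
TOY_MODEL = {
    "the": {"buyer": 0.5, "seller": 0.5},
    "buyer": {"pays": 1.0},
    "seller": {"delivers": 1.0},
    "pays": {STOP: 1.0},
    "delivers": {STOP: 1.0},
}


def generate(prompt_tokens, model, max_new_tokens=10, seed=0):
    """Generate response tokens one by one until a stop token or the length limit."""
    rng = random.Random(seed)
    response = []
    for _ in range(max_new_tokens):
        # The context is the original input plus everything generated so far.
        context = prompt_tokens + response
        # Simplification: this toy model only looks at the last token of the context.
        distribution = model.get(context[-1], {STOP: 1.0})
        tokens = list(distribution)
        weights = [distribution[token] for token in tokens]
        next_token = rng.choices(tokens, weights=weights, k=1)[0]
        if next_token == STOP:
            break
        response.append(next_token)
    return response


print(generate(["the"], TOY_MODEL))
```

The `max_new_tokens` limit plays the role of the "other parameters" that can stop generation even when no stop token has been produced.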

The probability values used in the neural networks are learned by first training the networks on very large amounts of text, in what is called self-supervised training.

So first, large language models are trained during the training phase, and based on this training, they can provide the predictions (inferences) that are the essence of language models.

These large language models use a different way of computation, one that relies on machine learning rather than on explicit programming of how to reach the required end result.

Very briefly, let’s visualise what neural networks look like in these large language models. Neural networks are a different computing paradigm. There is no separate central processing unit, no memory, no separate instructions. Everything is included in this mesh. The weights and the “biases” of the network do the processing and also serve as memory, and they represent the instructions, thanks to the results of the training. They transform the input into the required output.
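
As a rough illustration of how weights and biases alone turn an input into an output, here is a tiny two-layer network in plain Python. The numbers in `W1`, `B1`, `W2` and `B2` are hand-picked placeholders, not trained values; in a real network, training sets them, and there are billions of them rather than a dozen.

```python
import math


def dense_layer(inputs, weights, biases):
    """One layer: each output is a weighted sum of the inputs plus a bias, passed through tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked parameters; training would normally learn these.
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 2 inputs -> 3 hidden units
B1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]                      # 3 hidden units -> 1 output
B2 = [0.2]


def forward(inputs):
    """Pass the input through both layers; there is no other program logic."""
    hidden = dense_layer(inputs, W1, B1)
    return dense_layer(hidden, W2, B2)


print(forward([1.0, 2.0]))
```

Changing the numbers in the weight and bias lists changes what the network computes, which is the sense in which they act as both memory and instructions.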

Let’s turn our attention back to large language models. This slide just shows you that the family of large language models is very diverse. We are currently living in a veritable “Cambrian explosion” of large language models. The most popular branch is the grey one, that of the decoder-only transformer-based language models, but that doesn’t mean that lawyers will be using only these types of large language models, or that these tools are the best for all kinds of legal jobs.

Let’s see in general, what difference will large language models make?

They actually show up as a new layer on top of existing software, or as a replacement for parts of that software.

They make it possible to automate things that were not previously possible or reasonable to automate.

In general, they provide an improvement in language-centric computer capabilities.

The next question is: why now? Why was there so recently such a huge change in the capabilities of artificial intelligence?

This is related to what is called emergent abilities in complexity theory.[1] This means that a small quantitative change results in a qualitative change: there are new and, in our case, unexpected improvements in the performance of the language models.

Just like a human is not the same as the sum of all the cells making up that human.

Or a large flock of birds behaves differently than 150 000 separate starlings. In the case of large language models, researchers identified in 2018 that as neural networks become more complex (the more parameters they have), new and exciting abilities appear.

For example, we can use the input we send to the large language model as a kind of instruction to the model for a wide range of possible tasks. Previously, we first had to pretrain the large language model, and then fine-tune it for each and every specific task that we wanted to use the model for. For example, fine-tuning for the classification of judgements in two different legal subject areas.

These fine-tuning runs were resource-intensive, and they needed some specialist expertise.

Now, it seems that thanks to something called in-context learning, also known as zero-shot or few-shot learning, we can achieve very good results just by giving different kinds of instructions to our large language model. The model is able to provide good results without fine-tuning.
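
The difference between zero-shot and few-shot use can be sketched as plain prompt construction. The helper `build_prompt`, the `Text:`/`Label:` layout and the example sentences are all assumptions made for illustration; no particular model or API prescribes this format.

```python
def build_prompt(instruction, examples, new_input):
    """Assemble an instruction, optional worked examples, and the new case into one prompt."""
    parts = [instruction.strip()]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # The model is expected to continue the text after the final "Label:".
    parts.append(f"Text: {new_input}\nLabel:")
    return "\n\n".join(parts)


instruction = "Classify the judgement summary as 'contract' or 'tort'."
new_case = "The tenant failed to pay the agreed rent."

# Zero-shot: only the instruction and the new case.
zero_shot = build_prompt(instruction, [], new_case)

# Few-shot: the same instruction plus a handful of hypothetical worked examples.
few_shot = build_prompt(
    instruction,
    [
        ("The seller delivered defective goods.", "contract"),
        ("The driver injured a pedestrian.", "tort"),
    ],
    new_case,
)

print(few_shot)
```

In both cases the model itself is unchanged; only the text sent to it differs, which is why no fine-tuning step is needed.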

Another such emergent ability was that the logical reasoning abilities of large language models became surprisingly better than in previous versions.[2]

Besides emergent abilities, another major cause for change was the specific fine-tuning of these models to respond better to instructions. This fine-tuning made these models able to complete tasks better.

Finally, a major change was that thanks to the ChatGPT product made public in November 2022, a much larger audience became aware of the capabilities of these large language models. Such capabilities were visible to developers and the ML community in late 2021, but the public visibility made these products enjoyable for almost everyone.

So what do these large language models make possible for lawyers?

Well, the most important thing is that the pool of experts that we can involve in automating our legal work has become a lot bigger.

We no longer need machine learning engineers to automate legal tasks.

It’s enough that we have access to consultants who have a generic IT skill set, which is still better than most lawyers have. These less specialised experts are more readily available, even in small countries.

Second, the implementation costs go down. We don’t need costly data preparation for fine-tuning, which was a major block to using large language models for legal work.

We can also replace some complex software with simpler software that uses the text generation capabilities of large language models, for example for handling grammar.

And in general, we can use fewer tools: we can use large language models as a more integrated tool instead of a number of diverse tools.

Also, lawyers will be able to use conversational user interfaces that in a number of cases are easier to use than traditional, complex user interfaces. For example, we can use chat to interact with software.

Now let’s see some very specific examples of using large language models.

My first example starts with automatic speech transcription. This has been available for lawyers for a long time, but not for all languages. Now, everyone can have access to free models that can handle a large number of languages to transcribe audio into text, like this one.

The quality of the transcription is pretty good even in Hungarian, as you can see from the Word Error Rate numbers on the slide. You can find a number of front-ends for this free model by OpenAI, called Whisper, like the one on the slide.

But Whisper is a speech recognition model; it is not considered a large language model. The front-end shown here uses a large language model for the next step, after the speech recognition has been completed.

The large language model edits the transcribed text into something more readable and probably more useful. The speech recognition part does not send any data to third parties, but the large language model part does.
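
For readers unfamiliar with the metric, Word Error Rate is the number of word-level substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. A minimal sketch, with a made-up function name and made-up sentences (published evaluations usually also normalise punctuation and casing first, which is skipped here):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("the buyer pays the price", "the buyer pays a price"))
```

A lower value means a better transcript; a value of 0 means the transcript matches the reference word for word.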

Let’s turn to the next page. Contract automation is the oldest field in legal automation. It has been done since the late 1980s. How will large language models affect this field?

Well, it will not work in a way that you give the large language model a generic description and it creates a contract for you. That will not be useful.

For these large language models to work reliably, they need a lot more structure and context than this.

Humans can write nice contracts from three-line descriptions, but only if we know the context sufficiently. For example, because we have already worked with the person giving the short instruction.

Unless there is a way to hand over this context to the large language model, it will not be able to create a good contract from a three-liner.

That’s why large language models are not a replacement for the structured document automation systems that we see nowadays. But they are a very useful addition to them.

This example shows you the versatility of GPT-4 in adjusting generic contract terms to a specific desired wording in a user-friendly way in Hungarian.

The Hungarian language is pretty complex. You cannot just define a generic provision and rely on standard contract automation software to handle the fine grammatical details, like how many parties there are.

It is possible to use some external grammar tools to make these grammatical abstractions possible, but they are not reliable.

But with large language models, you can see that it’s enough if you a) use generic provisions in the contract automation software, and b) give a detailed instruction to a language model on the transformation of the provision to the desired format. This way, it’s easier to automate your documents. You don’t have to tinker with the grammatical structure, and you can reuse provisions across contracts in different fields, such as a sale contract or a lease contract.

This works out of the box even for Hungarian and for similarly small languages. Now this also means that when using document automation software with these capabilities, lawyers in the EU will be able to use the same document automation software across borders.
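
The two-step approach, a generic provision plus a detailed transformation instruction, can be sketched as a prompt builder. Everything here is a hypothetical illustration: the function name, the `[PARTY_A]` placeholder style and the wording of the instruction are assumptions, not the prompt used in the example on the slide.

```python
def build_adaptation_prompt(generic_provision, parties, contract_type, language="Hungarian"):
    """Combine a reusable generic provision with instructions for adapting its wording."""
    party_lines = "\n".join(f"- {role}: {name}" for role, name in parties)
    return (
        f"Adapt the generic provision below for a {contract_type} contract, "
        f"written in {language}.\n"
        "Adjust number, case and suffixes to match the parties listed, "
        "and do not change the legal substance.\n\n"
        f"Parties:\n{party_lines}\n\n"
        f"Generic provision:\n{generic_provision}\n\n"
        "Adapted provision:"
    )


# The same generic provision could be reused for a lease by changing only the arguments.
prompt = build_adaptation_prompt(
    "[PARTY_A] shall notify [PARTY_B] in writing of any defect.",
    [("PARTY_A", "the two buyers"), ("PARTY_B", "the seller")],
    "sale",
)
print(prompt)
```

The document automation software keeps handling structure and data, while the language model is asked only for the grammatical adjustment.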

The next example is called legal open book question answering. Even now, you can ask ChatGPT legal questions, and ChatGPT will definitely give you an answer. The less you know about that area of law, like North Macedonian contract law, the more convincing ChatGPT will sound. When you go into the details, you see that the answers are usually not precise at all.

But a different approach is also possible. You can use the GPT models so that they give you an answer based on specific parts of a text that they retrieve from a legal source. In this case, the response is not based on the pre-training data of ChatGPT, but on documents the user supplied.

In my experiment, these documents were the Hungarian civil code and the Hungarian civil procedure code. I tried to measure whether the answers would be more precise this way.

These legal texts are usually much longer than the limit of the GPT model, called the “maximum context length”. So you first break up the text into parts that are small enough to fit into a single input to the given language model.
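
Breaking up a long legal text can be sketched as below. This is a simplification: the real limit is counted in tokens, not words, and the sizes, the overlap and the function name are assumptions chosen for illustration. In practice one might also prefer to split along sections or paragraphs of the law.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into word-based chunks, with some overlap so context is not cut abruptly."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


# Hypothetical stand-in for a long legal text: 450 numbered placeholder words.
long_text = " ".join(f"w{i}" for i in range(450))
chunks = split_into_chunks(long_text)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```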

Then we convert these text parts into embeddings. If you remember the slide about large language models, this is how you convert text into numbers in a way that retains the semantic meaning of the text as well. The next step is to find the most relevant text parts of the full legal source, and then retrieve these relevant text parts only. Just like you do in any kind of legal search or information retrieval task.

You then feed these most relevant text parts into the large language model, like GPT-4, and ask it to answer your question based on them.
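
The retrieval step can be sketched as ranking text parts by the similarity of their embeddings to the embedding of the question. Note the big assumption: `toy_embedding` below is just a word count, used as a stand-in so the example runs on its own. Real embeddings come from a trained model and capture meaning, not only shared words. The sample provisions are invented.

```python
import math
from collections import Counter


def toy_embedding(text):
    """Stand-in for a real embedding model: a simple word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 1.0 means the same direction."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, text_parts, top_k=2):
    """Return the top_k text parts most similar to the question."""
    question_vector = toy_embedding(question)
    ranked = sorted(
        text_parts,
        key=lambda part: cosine_similarity(question_vector, toy_embedding(part)),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical text parts, not quotations from any statute.
parts = [
    "the limitation period is five years",
    "the court shall notify the parties",
    "ownership passes upon delivery",
]
print(retrieve("what is the limitation period", parts, top_k=1))
```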
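
Handing the retrieved text parts to the model can be sketched as one more prompt-building step. The wording of the instruction, the numbering of the passages and the function name are assumptions for illustration, not the prompt used in the experiment; the passages are invented.

```python
def build_grounded_prompt(question, passages):
    """Ask the model to answer from the supplied passages rather than from its training data."""
    numbered = "\n\n".join(
        f"[{index}] {passage}" for index, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages standing in for the retrieved parts of a legal text.
grounded_prompt = build_grounded_prompt(
    "How long is the limitation period?",
    ["The limitation period is five years.", "The period starts when the claim falls due."],
)
print(grounded_prompt)
```

Numbering the passages makes it possible to ask the model to cite which passage it relied on, which helps a lawyer check the answer against the source.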

As you can see on the slide, the results became much better this way. What’s more interesting, I could take a look at what kind of questions are difficult for this method.

I’ve been using generic questions that laymen or lawyers would ask a computer regarding these legal texts. But often these answers are not readily available in the legal text. They rely on multiple layers of definitions that are spread across different text parts.

GPT-4 could provide more convincing answers this way: 75% of the questions were answered correctly, while with the ChatGPT version, which was using only the pre-training data, the correct response rate for the same questions was 33%.

Of course, this was not a very precise experiment. It was just good for getting a feeling of what kind of research could be done in this area. But this experiment has shown why lawyers need to have their own lawyer-focused benchmarks for question answering and similar tasks.

While there are already lots of benchmarks in the natural language processing field, they are not done with lawyers in mind. They are more theoretical approaches for those specialists working in natural language processing.

There are already a number of benchmarks specifically for legal use as well, but separate tasks, jurisdictions and languages need separate benchmarks. Even if we should harmonize most of these benchmarks in a certain way, neither the correct answers nor the questions will be the same across countries.

What I would like to mention last is that large language models bring about changes that can disrupt markets. First, they disrupt the software market serving lawyers, which is not a very mature market. Publishers of legaltech software have to adapt in many fields to the capabilities of these new large language models.

What is very important for us is that it’s not realistic to expect that we, as lawyers, will be able to deal with all the technical complexities in detail. Lawyers will not take over the jobs of programmers, but we have to use this opportunity provided by AI tools to support and use those types of software that will help us retain our independence from the software market a lot better.

We can avoid some of the vendor lock-in problems with the use of large language models. These are problems that we can already see with contract automation or with practice management software. Lawyers have to adapt their data and their processes to a specific piece of software. And because serving lawyers in the EU is not a big market, these providers do go out of business. So you again have to invest lots of time and effort, in a great hurry, to move to another provider and adapt your data and your processes again.

We can try to address these issues in two ways. One is to have the legal-specific benchmarks that I’ve mentioned, which will make it possible for us to compare different solutions and see which one is better for our own purposes.

The other way is to standardize the exchange formats between these technical providers. Both directions need some kind of investment from lawyers and their bars.

Let’s take a look at this last picture. We have heard that lawyers who will be able to use large language models and other AI have a competitive edge compared to lawyers who are not using them. But can they retain such a competitive edge in the long term?

Competition will not end when every lawyer is using the same type of software. Small firms really do face a considerable challenge here because their main competitive edge is in the first line: they can rely on large language models to broaden their knowledge, to boost their processes, and so on. But that’s the only thing that can set them apart from each other.

However, the AI ecosystems provide more opportunities for those firms that have access to more resources. These firms may capitalize in the long term on having access to different applications or even different models and data. That can give them an edge over each other. These factors can become long-term differentiators.

Finally, it’s important to mention that the most valuable data held by law firms is client data, which is subject to confidentiality and also subject to data protection measures. So law firms that would like to rely on such client data to gain any advantage will first need to get approval from their clients for such training.

  1. See complexity theory, e.g. https://complexityexplained.github.io/ComplexityExplained.pdf. 

  2. These emergent abilities are also the reason why the most popular large language models are generative/decoder-only large language models: it became possible to use decoder-only large language models for tasks that were previously handled by encoder-only models. But that still doesn’t mean all large language models have to be generative.
