---
datasets:
- ricdomolm/lawma-all-tasks
language:
- en
license: mit
tags:
- legal
---

# Lawma 8B

Lawma 8B is a fine-tune of Llama 3 8B Instruct on 260 legal classification tasks derived from the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](https://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Lawma was fine-tuned on over 500k task examples, totalling 2B tokens. As a result, Lawma 8B outperforms GPT-4 on 95% of these legal classification tasks, by over 17 accuracy points on average. See our [arXiv preprint](https://arxiv.org/abs/2407.16615) and [GitHub repository](https://github.com/socialfoundations/lawma) for more details.

## Evaluations

We report mean classification accuracy across the 260 legal classification tasks that we consider. We use the standard MMLU multiple-choice prompt, and evaluate models zero-shot. You can find our evaluation code [here](https://github.com/socialfoundations/lawma/tree/main/evaluation).

| Model   | All tasks | Supreme Court tasks | Court of Appeals tasks |
|---------|:---------:|:-------------:|:----------------:|
| Lawma 70B | **81.9** | **84.1** | **81.5** |
| Lawma 8B | 80.3 | 82.4 | 79.9 |
| GPT-4 | 62.9 | 59.8 | 63.4 |
| Llama 3 70B Inst | 58.4 | 47.1 | 60.3 |
| Mixtral 8x7B Inst | 43.2 | 24.4 | 46.4 |
| Llama 3 8B Inst | 42.6 | 32.8 | 44.2 |
| Majority classifier | 41.7 | 31.5 | 43.5 |
| Mistral 7B Inst | 39.9 | 19.5 | 43.4 |
| Saul 7B Inst | 34.4 | 20.2 | 36.8 |
| LegalBert | 24.6 | 13.6 | 26.4 |
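
For illustration, here is a minimal sketch of the zero-shot multiple-choice scoring described above. This is not the official evaluation code (linked above); in particular, scoring each choice by the first token of `" <letter>"` is an assumption:

```python
# Minimal zero-shot multiple-choice scoring sketch (illustrative only).
# Each answer choice is scored by the model's next-token logit for its
# letter immediately after the trailing "Answer:".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "ricdomolm/lawma-8b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)
model.eval()

def predict_letter(prompt, letters):
    """Return the candidate letter with the highest next-token logit."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Assumption: each letter is represented by the first token of " <letter>".
    letter_ids = [
        tokenizer.encode(" " + letter, add_special_tokens=False)[0]
        for letter in letters
    ]
    best = int(torch.argmax(next_token_logits[letter_ids]))
    return letters[best]

# e.g. predict_letter(some_prompt_ending_in_answer_colon, list("ABCDEFGHIJKLMN"))
```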

## FAQ

**What are the Lawma models useful for?** We recommend using the Lawma models only for the legal classification tasks that they were fine-tuned on. The models have been fine-tuned on multiple-choice questions, not on general instructions, so they only output multiple-choice letters (e.g., A, B, C) or numbers. The main take-away of our paper is that specializing models leads to large improvements in performance. We therefore strongly recommend that practitioners further fine-tune Lawma on the actual tasks that the models will be used for. Relatively few examples (i.e., dozens or hundreds) may already lead to large gains in performance; a minimal fine-tuning sketch is shown below.
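
As a rough illustration of such further fine-tuning, the sketch below runs a plain causal-LM fine-tune of Lawma 8B on a handful of (prompt, answer-letter) pairs with the Hugging Face `Trainer`. The example data, prompt format, and hyperparameters are placeholders, not the recipe used to train Lawma; adapt them to your task.

```python
# Minimal further fine-tuning sketch (illustrative; placeholder data and
# hyperparameters, not the settings used to train Lawma).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_dir = "ricdomolm/lawma-8b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers often lack a pad token

examples = [
    # Replace with dozens to hundreds of labelled examples from your own task.
    {"prompt": "Case text ...\n\nQuestion: ...?\nA. ...\nB. ...\nAnswer:", "answer": " A"},
]

def tokenize(example):
    # Simple causal-LM objective over prompt + answer (no loss masking here).
    text = example["prompt"] + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["prompt", "answer"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lawma-8b-finetuned",
        per_device_train_batch_size=1,
        num_train_epochs=3,
        learning_rate=1e-5,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```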

**What legal classification tasks is Lawma fine-tuned on?** We consider almost all of the variables of the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](https://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Our reasons for studying these legal classification tasks are both technical and substantive. From a technical machine learning perspective, these tasks provide highly non-trivial classification problems where even the best models leave much room for improvement. From a substantive legal perspective, efficient solutions to such classification problems have rich and important applications in legal research.
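
The full collection of task examples is available as the `ricdomolm/lawma-all-tasks` dataset listed in this model card's metadata. A minimal sketch for browsing it, assuming it loads directly with `datasets` (check the dataset card for its exact splits and fields):

```python
# Browse the fine-tuning task examples (illustrative; the exact split and
# field names should be checked on the dataset card).
from datasets import load_dataset

ds = load_dataset("ricdomolm/lawma-all-tasks")
print(ds)                          # available splits and their columns
first_split = next(iter(ds.values()))
print(first_split[0])              # one task example
```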

## Example use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "ricdomolm/lawma-8b"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

def generate_response(input_text):
    # Tokenize the prompt and generate greedily (do_sample=False);
    # max_length caps the total length of prompt plus completion.
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(inputs.input_ids, max_length=2048, do_sample=False)
    # The decoded output includes the prompt followed by the model's answer.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

input_text = """This case may seem at first blush too inconsequential to find its way into our bdoks, but the issue it presents is of no small constitutional significance.
Appellant Paul Robert Cohen was convicted in the Los Angeles Municipal Court of violating that part of California Penal Code § 415 which prohibits “maliciously and willfully disturb [ing] the peace or quiet of any neighborhood or person. ■.. by... offensive conduct... He was given 30 days’imprisonment. The facts upon which his conviction rests are detailed in the opinion of the Court of Appeal of California, Second Appellate District, as follows:
“On April 26, 1968, the defendant was observed in the Los Angeles County Courthouse in the corridor outside of division 20 of the municipal court wearing a jacket bearing the words ‘F the Draft’ which were plainly visible. There were women and children present in the corridor. The defendant was arrested. The defendant testified that he wore the jacket knowing that the words were on the jacket as a means of informing the public of the depth of his feelings against the Vietnam War and the draft.
“The defendant did not engage in, nor threaten to engage in, nor did anyone as the result of his conduct in fact commit or threaten to commit any act of violence. The defendant did not make any loud or unusual noise, nor was there any evidence that he uttered any sound prior to his arrest.” 1 Cal. App. 3d 94, 97-98, 81 Cal. Rptr. 503, 505 (1969).
In affirming the conviction the Court of Appeal held that “offensive conduct” means “behavior which has a tendency to provoke others to acts of violence or to in turn disturb the peace,” and that the State had proved this element because, on the facts of this case, “[i]t was certainly reasonably foreseeable that such conduct might cause others to rise up to commit a violent act against the person of the defendant or attempt to forceably remove his jacket.” 1 Cal. App. 3d, at 99-100, 81 Cal.

Question: What is the issue area of the decision?
A. Criminal Procedure
B. Civil Rights
C. First Amendment
D. Due Process
E. Privacy
F. Attorneys
G. Unions
H. Economic Activity
I. Judicial Power
J. Federalism
K. Interstate Relations
L. Federal Taxation
M. Miscellaneous
N. Private Action
Answer:"""

output = generate_response(input_text)
print(output)
```
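
Because `generate` returns the prompt tokens followed by the continuation, the decoded `output` above contains the full case text. A small variation, reusing the `model`, `tokenizer`, and `input_text` defined above, keeps only the newly generated tokens (the predicted choice letter):

```python
# Decode only the newly generated tokens, dropping the echoed prompt.
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=8, do_sample=False)
answer = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(answer.strip())  # prints the predicted choice letter
```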

## Citation

This model was trained for the project

*Lawma: The Power of Specialization for Legal Tasks. Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore. 2024*

Please cite as:

```
@misc{dominguezolmedo2024lawmapowerspecializationlegal,
      title={Lawma: The Power of Specialization for Legal Tasks}, 
      author={Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore},
      year={2024},
      eprint={2407.16615},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.16615}, 
}
```