LegalAI LLM: A Domain-Specific Legal Model

LegalAI LLM is a lightweight, legal-domain large language model (LLM) with 497M parameters. It is designed to support legal professionals, educators, and the general public with legal document analysis, document drafting, and legal question answering, with an emphasis on accuracy, transparency, and reliability.

How to use

Transformers

pip install transformers torch

from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "Muhammad2003/Llama3-LegalLM"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

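messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant turn marker so the model answers the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)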
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
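
For legal questions, the same message format can be built with a small helper. This is a minimal sketch: `build_legal_messages` and its `jurisdiction` parameter are hypothetical names, not part of the model's API, and prefixing the question with a jurisdiction is an illustrative prompting choice rather than a documented requirement of this checkpoint.

```python
def build_legal_messages(question, jurisdiction=None):
    """Return a chat-style message list in the role/content format used above.

    Hypothetical helper: the jurisdiction prefix is an assumption made for
    illustration, not something the model card specifies.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    if jurisdiction:
        # Put the jurisdiction on its own line ahead of the question.
        content = f"Jurisdiction: {jurisdiction.strip()}\n{question}"
    else:
        content = question
    return [{"role": "user", "content": content}]


messages = build_legal_messages(
    "What are the essential elements of a valid contract?",
    jurisdiction="Pakistan",
)
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the `messages` variable above.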

Features

Legal Document Analysis: Analyze legal documents for accuracy, completeness, and compliance with regulations.
Legal Document Generation: Create contracts, agreements, and notices based on user inputs.
Case Law Retrieval: Search and retrieve case laws with relevant summaries and insights.
Evidence Chain Analysis: Map relationships between facts and generate evidence chains from case documents.
Legal Query Handling: Provide accurate, context-aware answers to legal questions.
Bias Mitigation: Designed to minimize racial, gender, and other biases for fair and equitable results.
Hallucination Reduction: Enhanced training processes to minimize the generation of fabricated or inaccurate legal content.

Datasets

LegalAI LLM is trained on publicly available, licensed legal datasets:

HFforLegal/case-law: Comprehensive corpus of legal documents under CC BY 4.0.
Caselaw Access Project (CAP): U.S. state and federal case law from 1658–2020.
CourtListener: Federal and state court opinions from the Free Law Project.
Open Australian Legal Corpus: Australian legislative and judicial documents.
German Court Decisions (Gesp): German court decisions collected for legal research.
The Pakistan Codes: Official laws and constitution of Pakistan.

All datasets were collected in accordance with their ownership, intellectual property, and licensing terms, so that the data sources remain transparent and legal risk to users is minimized.

Technical Specifications

Model Size: 497M parameters
Training Framework: PyTorch with LangChain integration
Supported Hardware: Modest hardware (e.g., an NVIDIA T4 GPU or an Apple M1 MacBook) or cloud platforms
Input Format: Text queries or legal document inputs
Output Format: Structured responses, summaries, or generated documents
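
As a rough check on the hardware figures, the memory needed for the weights of a 497M-parameter model can be estimated from the bytes per parameter. This is back-of-the-envelope arithmetic only: it counts weights alone and ignores activations, the KV cache, and framework overhead. The function name and dtype table are illustrative.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float16"):
    """Estimate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


NUM_PARAMS = 497_000_000  # model size stated above

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights take roughly 1.85 GiB in float32 and 0.93 GiB in float16, which is within the memory of the hardware listed above.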

Use Cases

Legal Professionals: Streamline workflows by generating and analyzing legal documents.
Educators: Assist legal students with case law studies and research tools.
Public Users: Enable non-technical users to generate basic legal documents and understand judicial processes.
Law Firms: Integrate with case management systems for enhanced productivity.

Ethical AI Commitment

LegalAI LLM adheres to strict ethical guidelines:

Transparency: Training data sources are fully disclosed.
Compliance: No unlicensed or unlawfully sourced data is used.
Bias Mitigation: Designed to reduce discriminatory outputs in legal contexts.

Limitations

The model may require further fine-tuning for jurisdiction-specific tasks.
Certain nuanced legal interpretations may require human oversight.

Contributors

Muhammad Bin Usman
Zain Ul Abideen
Syed Hasan Abbas

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For support or inquiries, please reach out at [[email protected]].

Explore the future of legal AI with LegalAI LLM!