## Model description
This is a tuned random forest classifier, trained on a processed dataset of 2,800 German court cases (see the legalis dataset). It predicts the winner (defendant/"Verklagter" or plaintiff/"Klägerin") of a court case based on the facts provided (in German).
## Intended uses & limitations
- This model was created as part of a university project and should be considered highly experimental.
## How to get started with the model
Try out the hosted Inference UI or the Hugging Face Space.
```python
import pickle

# Path to the downloaded pickle file (example name; adjust to your download).
dtc_pkl_filename = "model.pkl"

with open(dtc_pkl_filename, "rb") as file:
    clf = pickle.load(file)
```
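Once loaded, the pipeline accepts raw German text directly, since the CountVectorizer step handles tokenization. A minimal, self-contained sketch of the same interface, fitted on invented toy sentences (the real model comes from the pickle above; texts and labels here are illustrative only):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in with the same structure as the released pipeline,
# fitted on invented example sentences (NOT real court data).
toy_clf = Pipeline([
    ("count", CountVectorizer(ngram_range=(1, 3))),
    ("clf", RandomForestClassifier(min_samples_split=5, random_state=0)),
])
texts = [
    "Die Klage wird abgewiesen.",
    "Der Klage wird stattgegeben.",
] * 5
labels = ["Verklagter", "Klägerin"] * 5
toy_clf.fit(texts, labels)

# Predict the winning party for new case facts (a raw German string).
print(toy_clf.predict(["Die Klage wird abgewiesen."])[0])
```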
## Model Hyperparameters
- The classifier was tuned with scikit-learn's cross-validated search; the pipeline uses a CountVectorizer with common German stop words.
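The "cv search" tuning step likely refers to scikit-learn's cross-validated grid search; a sketch under that assumption (the actual parameter grid is not documented, so the grid and toy data below are invented for illustration):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ("count", CountVectorizer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Invented search space; the real grid used for tuning is undocumented.
param_grid = {
    "count__ngram_range": [(1, 1), (1, 3)],
    "clf__min_samples_split": [2, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=2)

# Toy data just to make the sketch runnable.
texts = ["Die Klage wird abgewiesen.", "Der Klage wird stattgegeben."] * 6
labels = ["Verklagter", "Klägerin"] * 6
search.fit(texts, labels)
print(search.best_params_)
```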
Hyperparameter | Value |
---|---|
memory | None |
steps | [('count', CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...])), ('clf', RandomForestClassifier(min_samples_split=5, random_state=0))] |
verbose | False |
count | CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...]) |
clf | RandomForestClassifier(min_samples_split=5, random_state=0) |
count__analyzer | word |
count__binary | False |
count__decode_error | strict |
count__dtype | <class 'numpy.int64'> |
count__encoding | utf-8 |
count__input | content |
count__lowercase | True |
count__max_df | 1.0 |
count__max_features | None |
count__min_df | 1 |
count__ngram_range | (1, 3) |
count__preprocessor | None |
count__stop_words | ['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', 'der', 'den', 'des', 'dem', 'die', 'das', 'dass', 'daß', 'derselbe', 'derselben', 'denselben', 'desselben', 'demselben', 'dieselbe', 'dieselben', 'dasselbe', 'dazu', 'dein', 'deine', 'deinem', 'deinen', 'deiner', 'deines', 'denn', 'derer', 'dessen', 'dich', 'dir', 'du', 'dies', 'diese', 'diesem', 'diesen', 'dieser', 'dieses', 'doch', 'dort', 'durch', 'ein', 'eine', 'einem', 'einen', 'einer', 'eines', 'einig', 'einige', 'einigem', 'einigen', 'einiger', 'einiges', 'einmal', 'er', 'ihn', 'ihm', 'es', 'etwas', 'euer', 'eure', 'eurem', 'euren', 'eurer', 'eures', 'für', 'gegen', 'gewesen', 'hab', 'habe', 'haben', 'hat', 'hatte', 'hatten', 'hier', 'hin', 'hinter', 'ich', 'mich', 'mir', 'ihr', 'ihre', 'ihrem', 'ihren', 'ihrer', 'ihres', 'euch', 'im', 'in', 'indem', 'ins', 'ist', 'jede', 'jedem', 'jeden', 'jeder', 'jedes', 'jene', 'jenem', 'jenen', 'jener', 'jenes', 'jetzt', 'kann', 'kein', 'keine', 'keinem', 'keinen', 'keiner', 'keines', 'können', 'könnte', 'machen', 'man', 'manche', 'manchem', 'manchen', 'mancher', 'manches', 'mein', 'meine', 'meinem', 'meinen', 'meiner', 'meines', 'mit', 'muss', 'musste', 'nach', 'nicht', 'nichts', 'noch', 'nun', 'nur', 'ob', 'oder', 'ohne', 'sehr', 'sein', 'seine', 'seinem', 'seinen', 'seiner', 'seines', 'selbst', 'sich', 'sie', 'ihnen', 'sind', 'so', 'solche', 'solchem', 'solchen', 'solcher', 'solches', 'soll', 'sollte', 'sondern', 'sonst', 'über', 'um', 'und', 'uns', 'unsere', 'unserem', 'unseren', 'unser', 'unseres', 'unter', 'viel', 'vom', 'von', 'vor', 'während', 'war', 'waren', 'warst', 'was', 'weg', 'weil', 'weiter', 'welche', 'welchem', 'welchen', 'welcher', 'welches', 'wenn', 'werde', 'werden', 'wie', 'wieder', 'will', 'wir', 'wird', 'wirst', 'wo', 'wollen', 'wollte', 'würde', 'würden', 'zu', 'zum', 'zur', 'zwar', 'zwischen'] |
count__strip_accents | None |
count__token_pattern | (?u)\b\w\w+\b |
count__tokenizer | None |
count__vocabulary | None |
clf__bootstrap | True |
clf__ccp_alpha | 0.0 |
clf__class_weight | None |
clf__criterion | gini |
clf__max_depth | None |
clf__max_features | sqrt |
clf__max_leaf_nodes | None |
clf__max_samples | None |
clf__min_impurity_decrease | 0.0 |
clf__min_samples_leaf | 1 |
clf__min_samples_split | 5 |
clf__min_weight_fraction_leaf | 0.0 |
clf__n_estimators | 100 |
clf__n_jobs | None |
clf__oob_score | False |
clf__random_state | 0 |
clf__verbose | 0 |
clf__warm_start | False |
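The parameters listed above correspond to the following pipeline construction (a sketch; the stop-word list is abbreviated here, whereas the trained model uses the full German list shown in the table):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

# Abbreviated stop-word list for illustration; the trained model
# uses the complete German list from the hyperparameter table.
german_stop_words = ["aber", "alle", "allem", "allen", "aller", "alles"]

pipeline = Pipeline([
    ("count", CountVectorizer(ngram_range=(1, 3), stop_words=german_stop_words)),
    ("clf", RandomForestClassifier(min_samples_split=5, random_state=0)),
])

# The double-underscore names in the table are exactly the keys
# exposed by Pipeline.get_params().
print(pipeline.get_params()["clf__min_samples_split"])  # 5
```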
## Model Plot

```
Pipeline(steps=[('count',
                 CountVectorizer(ngram_range=(1, 3),
                                 stop_words=['aber', 'alle', 'allem', 'allen',
                                             'aller', 'alles', 'als', 'also',
                                             'am', 'an', 'ander', 'andere',
                                             'anderem', 'anderen', 'anderer',
                                             'anderes', 'anderm', 'andern',
                                             'anderr', 'anders', 'auch', 'auf',
                                             'aus', 'bei', 'bin', 'bis', 'bist',
                                             'da', 'damit', 'dann', ...])),
                ('clf',
                 RandomForestClassifier(min_samples_split=5, random_state=0))])
```
## Evaluation Results
Metric | Value |
---|---|
accuracy | 0.664286 |
F1 score | 0.664286 |
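These metrics can be reproduced with scikit-learn's metric functions on a held-out split; a sketch with invented example predictions, only to show the calls (the real evaluation data is in the legalis dataset):

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented labels/predictions for illustration only.
y_true = ["Klägerin", "Verklagter", "Klägerin", "Verklagter"]
y_pred = ["Klägerin", "Verklagter", "Verklagter", "Verklagter"]

accuracy = accuracy_score(y_true, y_pred)
# Weighted averaging is one common choice for string-labeled binary
# tasks; the card does not state which averaging was used.
f1 = f1_score(y_true, y_pred, average="weighted")
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```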
## Model Card Authors
This model card and the model itself were written by the following author:
- @LennardZuendorf (Hugging Face / GitHub)
## Citation
See the dataset card for sources and refer to GitHub for the collection of all files.