language:
  - pt
license: cc-by-4.0
datasets:
  - wiki_lingua
thumbnail: null
tags:
  - named-entity-recognition
  - Transformer
  - pytorch
  - bert
metrics:
  - f1
  - precision
  - recall
model-index:
  - name: rpunct-ptbr
    results:
      - task:
          type: named-entity-recognition
        dataset:
          type: wiki_lingua
          name: wiki_lingua
        metrics:
          - type: f1
            value: 55.7
            name: F1 Score
          - type: precision
            value: 57.72
            name: Precision
          - type: recall
            value: 53.83
            name: Recall
widget:
  - text: >-
      henrique foi no lago pescar com o pedro mais tarde foram para a casa do
      pedro fritar os peixes
  - text: >-
      cinco trabalhadores da construção civil em capacetes e coletes amarelos
      estão ocupados no trabalho
  - text: >-
      na quinta feira em visita a belo horizonte pedro sobrevoa a cidade
      atingida pelas chuvas
  - text: >-
      coube ao representante de classe contar que na avaliação de língua
      portuguesa alguns alunos se mantiveram concentrados e outros dispersos

🤗 bert-restore-punctuation-ptbr

A Python package for simpler use is coming soon.

This is a bert-base-portuguese-cased model fine-tuned for punctuation restoration on WikiLingua.

This model is intended for direct use as a punctuation restoration model for the general Portuguese language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.

The model restores the following punctuation marks: [! ? . , - : ; ' ]

The model also restores the capitalization of words.
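
As a rough illustration of direct use, the sketch below loads the model through the Transformers `pipeline` API. Two things here are assumptions rather than facts stated in this card: the repository id `dominguesm/bert-restore-punctuation-ptbr`, which is guessed from the card's title, and `aggregation_strategy="simple"`, chosen because it produces grouped entries with an `entity_group` field like the example output further down. The helper name `build_restorer` is hypothetical.

```python
# Assumed repository id (not stated in this card); adjust it if the model lives elsewhere.
MODEL_ID = "dominguesm/bert-restore-punctuation-ptbr"


def build_restorer(model_id: str = MODEL_ID):
    """Return a token-classification pipeline for the punctuation model."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        # Assumption: merge adjacent tokens that share the same predicted label.
        aggregation_strategy="simple",
    )


# Example call (needs `transformers`, a PyTorch install, and network access):
#   restorer = build_restorer()
#   groups = restorer("henrique foi no lago pescar com o pedro")
#   for group in groups:
#       print(group["entity_group"], group["word"])
```

The input is expected to be lower-cased, unpunctuated text, as in the widget examples.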


🎯 Accuracy

| label | tag | precision | recall | f1-score | support |
|---|---|---|---|---|---|
| Upper | OU | 0.89 | 0.91 | 0.90 | 69376 |
| None | OO | 0.99 | 0.98 | 0.98 | 857659 |
| Full stop/period | .O | 0.86 | 0.93 | 0.89 | 60410 |
| Comma | ,O | 0.85 | 0.83 | 0.84 | 48608 |
| Upper + comma | ,U | 0.73 | 0.76 | 0.75 | 3521 |
| Question mark | ?O | 0.68 | 0.78 | 0.73 | 1168 |
| Upper + period | .U | 0.66 | 0.72 | 0.69 | 1884 |
| Upper + colon | :U | 0.59 | 0.63 | 0.61 | 352 |
| Colon | :O | 0.70 | 0.53 | 0.60 | 2420 |
| Upper + question mark | ?U | 0.50 | 0.56 | 0.53 | 36 |
| Upper + exclamation mark | !U | 0.38 | 0.32 | 0.34 | 38 |
| Exclamation mark | !O | 0.30 | 0.05 | 0.08 | 783 |
| Semicolon | ;O | 0.35 | 0.04 | 0.08 | 1557 |
| Apostrophe | 'O | 0.00 | 0.00 | 0.00 | 3 |
| Hyphen | -O | 0.00 | 0.00 | 0.00 | 3 |
| accuracy | | | | 0.96 | 1047818 |
| macro avg | | 0.57 | 0.54 | 0.54 | 1047818 |
| weighted avg | | 0.96 | 0.96 | 0.96 | 1047818 |
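
The gap between the macro average (0.54) and the weighted average (0.96) comes from class imbalance: the `OO` label alone covers roughly 82% of the tokens, while the weakest labels have very little support. The sketch below recomputes both F1 averages from the rounded per-label figures reported above. Because the inputs are already rounded to two decimals, the recomputed macro F1 comes out near 0.53 rather than the reported 0.54.

```python
# (tag, f1, support) copied from the per-label report above.
REPORT = [
    ("OU", 0.90, 69376),
    ("OO", 0.98, 857659),
    (".O", 0.89, 60410),
    (",O", 0.84, 48608),
    (",U", 0.75, 3521),
    ("?O", 0.73, 1168),
    (".U", 0.69, 1884),
    (":U", 0.61, 352),
    (":O", 0.60, 2420),
    ("?U", 0.53, 36),
    ("!U", 0.34, 38),
    ("!O", 0.08, 783),
    (";O", 0.08, 1557),
    ("'O", 0.00, 3),
    ("-O", 0.00, 3),
]

total_support = sum(support for _, _, support in REPORT)

# Macro average: every label counts equally, however rare it is.
macro_f1 = sum(f1 for _, f1, _ in REPORT) / len(REPORT)

# Weighted average: each label counts in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in REPORT) / total_support

print(f"total support: {total_support}")
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}")
```

For rare marks such as `!`, `;`, `'` and `-`, the per-label rows are a better guide to expected quality than the overall accuracy.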

🤷 Output

Example:

[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]

This output corresponds to the following restored text:

Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
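
The mapping from the grouped output to the restored sentence can be sketched as follows. The reading of the two-character labels is inferred from the example above and is an assumption: the first character is the punctuation mark to append after the word (`O` for none), and the second is `U` when the word's first letter should be capitalized. The helper names `apply_label` and `restore` are hypothetical.

```python
from typing import Dict, List


def apply_label(word: str, label: str) -> str:
    """Apply a two-character label such as '.U' or ',O' to a single word."""
    punct, case = label[0], label[1]
    if case == "U":
        # Assumption: 'U' capitalizes only the first letter.
        word = word[:1].upper() + word[1:]
    if punct != "O":
        word += punct
    return word


def restore(groups: List[Dict]) -> str:
    """Rebuild the text from grouped pipeline output."""
    words = []
    for group in groups:
        # A group may span several words; each one carries the group's label.
        for word in group["word"].split():
            words.append(apply_label(word, group["entity_group"]))
    return " ".join(words)


# The example output above, reduced to the two fields the decoder needs.
groups = [
    {"entity_group": "OU", "word": "henrique"},
    {"entity_group": "OO", "word": "foi no lago pescar com o"},
    {"entity_group": ".U", "word": "pedro"},
    {"entity_group": "OU", "word": "mais"},
    {"entity_group": ",O", "word": "tarde"},
    {"entity_group": "OO", "word": "foram para a casa do"},
    {"entity_group": "OU", "word": "pedro"},
    {"entity_group": "OO", "word": "fritar os"},
    {"entity_group": ".O", "word": "peixes"},
]

print(restore(groups))
```

This sketch ignores the `score`, `start` and `end` fields, and it does not handle words that the tokenizer splits into pieces with different labels.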

🤙 Contact

Contact Maicon Domingues with questions, feedback, or requests for similar models.