Model Details

Model Description

  • Developed by: Alexander Nikitin
  • Model type: XLM-RoBERTa-base fine-tuned on a custom labelled dataset
  • Language(s) (NLP): Russian, English
  • License: MIT
  • Finetuned from model: FacebookAI/xlm-roberta-base

Dataset

This transformer model was fine-tuned on comments parsed from "Tinkoff Pulse".

First step: comments were preprocessed; for each stock ticker mentioned, the subcomment referring to that ticker was extracted. Example: "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. По Ростелекому не очень. Тинек идет вниз!" ("Gazprom looks fine. Rostelecom not so much. Tinkoff is going down!") -> "{$GAZP} По газрому все хорошо."
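The per-ticker extraction step can be sketched roughly as below. This is a minimal heuristic, not the card authors' actual preprocessing: it assumes tickers appear as `{$XXXX}` and pairs tickers with sentences in order of mention (real matching would need keyword rules, e.g. "Газпром" -> GAZP).

```python
import re

def split_by_ticker(comment: str) -> dict[str, str]:
    """Split a multi-ticker comment into per-ticker subcomments.

    Illustrative heuristic: pairs the i-th ticker with the i-th sentence.
    A production version would match tickers to sentences by company
    keywords instead of mention order.
    """
    tickers = re.findall(r"\{\$([A-Z]+)\}", comment)
    # Drop the leading ticker block, then split the remainder into sentences.
    body = re.sub(r"\{\$[A-Z]+\}\s*", "", comment).strip()
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]
    return {t: f"{{${t}}} {s}" for t, s in zip(tickers, sentences)}
```

On the example above, this yields "{$GAZP} По газрому все хорошо." for the GAZP ticker.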

Next step: a dataset of 10K preprocessed comments, evenly distributed across 10 Russian stocks, was labelled. The Mistral-7B LLM was used to assign one of three labels: "buy" if the author wants to buy or encourages buying (long); "sell" if the author wants to sell or short, or encourages it; "neutral" if the comment is news or the intent cannot be determined. Planned further research: label 100K comments and train on them.
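A zero-shot labelling prompt for this three-way scheme might look like the sketch below. The wording is hypothetical; the exact prompt used with Mistral-7B is not published in this card.

```python
def build_label_prompt(comment: str) -> str:
    """Build a zero-shot classification prompt for an instruction-tuned LLM.

    Hypothetical prompt wording based on the label definitions above.
    """
    return (
        "Classify the sentiment of the following stock comment.\n"
        "Answer with exactly one word: buy, sell, or neutral.\n"
        "- buy: the author wants or encourages buying (long)\n"
        "- sell: the author wants or encourages selling or shorting\n"
        "- neutral: news, or the intent cannot be determined\n\n"
        f"Comment: {comment}\n"
        "Answer:"
    )
```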

Bias, Risks, and Limitations

  1. The model is trained only on Russian and English comments;
  2. The model struggles to extract sentiment from comments with strong keywords pointing in opposite directions, e.g. "I wanna sell. But probably I should buy back later.";
  3. The model performs well on short-to-medium texts such as comments, which are usually skewed to one side (strong buy or strong sell).

How to Get Started with the Model

Download the model with the Hugging Face pipeline API and use it.

Labels:

  • LABEL_0 = SELL
  • LABEL_1 = NEUTRAL
  • LABEL_2 = BUY
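A minimal usage sketch with the label mapping above. The model id is not stated in this card, so the pipeline call is left as a commented placeholder; `MODEL_ID` and `decode` are illustrative names, not part of the released model.

```python
# from transformers import pipeline  # uncomment when running with a model id

# Raw classifier labels -> trading meaning (per the table above).
LABEL2NAME = {"LABEL_0": "SELL", "LABEL_1": "NEUTRAL", "LABEL_2": "BUY"}

def decode(prediction: dict) -> str:
    """Map a pipeline prediction like {'label': 'LABEL_2', 'score': 0.93}
    to its human-readable label."""
    return LABEL2NAME[prediction["label"]]

# Replace MODEL_ID with this repository's id on the Hugging Face Hub:
# clf = pipeline("text-classification", model=MODEL_ID)
# print(decode(clf("{$GAZP} По газрому все хорошо.")[0]))
```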

Evaluation

  • Accuracy on the validation set: 0.786
  • Note: this accuracy was measured on ~1.5k comments.
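For reference, accuracy here is plain label agreement over the held-out comments; a minimal sketch (the label lists below are made up for illustration):

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative only -- not the actual validation data.
accuracy(["buy", "sell", "neutral", "buy"],
         ["buy", "sell", "buy", "buy"])  # 3 of 4 correct
```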

Model Card Authors

https://t.me/pivo_txt
