agarkovv committed · verified
Commit db8b485 · 1 Parent(s): 97fdf26

Update README.md

Files changed (1): README.md (+129 -111)

README.md CHANGED
---
base_model: mistralai/Ministral-8B-Instruct-2410
library_name: peft
---

# Model Card for CryptoTrader-LM

The model predicts a trading decision (**buy, sell, or hold**) for either Bitcoin (BTC) or Ethereum (ETH) based on cryptocurrency news and historical price data. It is fine-tuned using **LoRA** on the **Ministral-8B-Instruct-2410** base model, specifically for the **FinNLP @ COLING-2025 Cryptocurrency Trading Challenge**.
## Model Details

### Model Description

This model is fine-tuned with **LoRA (Low-Rank Adaptation)** on **Ministral-8B-Instruct-2410** and is designed to predict daily cryptocurrency trading decisions (buy, sell, or hold) from real-time news articles and BTC/ETH price data. Its goal is to maximize profitability by making informed trading decisions under volatile market conditions.

- **Base Model**: [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)
- **Fine-tuning Framework**: [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
- **Task**: Cryptocurrency Trading Decision-Making (BTC, ETH)
- **Languages**: English (for news article analysis)
## Uses

### Direct Use

The model can be used to predict daily trading decisions for BTC or ETH based on real-time financial news and historical cryptocurrency price data. It is designed for participants in the **FinNLP Cryptocurrency Trading Challenge**, but it can also be applied to other cryptocurrency trading contexts.

### Downstream Use

The model can be integrated into automated crypto trading systems and agent-based trading platforms (such as **FinMem**), or used for research on financial decision-making models.

### Out-of-Scope Use

This model is not designed for:
- Predicting trading decisions for assets other than Bitcoin (BTC) or Ethereum (ETH).
- High-frequency trading (HFT); the model is optimized for daily decision-making, not minute-by-minute trading.
- Non-financial domains; it is not suitable for generic text-generation tasks or for sentiment analysis outside of financial contexts.
## Bias, Risks, and Limitations

### Bias

The model is fine-tuned on specific data (cryptocurrency news and price data) and may not generalize well to other financial markets or different news sources. There could be biases based on the news outlets and timeframes present in the training data.

### Risks

- **Market Volatility**: Cryptocurrency markets are inherently volatile. The model's predictions are based on past data and news, which may not always predict future market conditions accurately.
- **Decision-Making**: The model offers trading advice, but users should employ appropriate risk management techniques and not rely solely on the model for financial decisions.

### Limitations

- The model's evaluation is primarily focused on profitability (Sharpe Ratio), and it may not account for other factors such as market liquidity, transaction fees, or slippage.
- The model may not perform well in scenarios with significant market regime changes, such as sudden regulatory shifts or unexpected global events.

### Recommendations

- **Risk Management**: Users should complement the model's predictions with traditional risk management strategies and not use the model in isolation for trading.
- **Bias Awareness**: Be aware of potential biases in the news sources and timeframe used in training. The model may underrepresent certain news sources or overemphasize specific types of news.
## How to Get Started with the Model

This repository is tagged with the `peft` library, which suggests it ships a LoRA adapter rather than full model weights. In that case, load the base model first and apply the adapter on top, as in the minimal sketch below (the adapter id shown is a placeholder; replace it with this repository's id):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model, then apply the fine-tuned LoRA adapter on top of it.
base_model_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "your-hf-username/CryptoTrader-LM"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example input: news articles (and, optionally, price data) in the instruction format
input_text = "[INST]Bitcoin price surges as ETF approval rumors circulate...[/INST]"

# Tokenize and generate a prediction
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)

# Decode only the newly generated tokens into a trading decision (buy, sell, or hold)
decision = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Trading decision: {decision}")
```
 
## Training Details

### Training Data

The model was fine-tuned on cryptocurrency market data, including:
- **Cryptocurrency-to-USD exchange rates** for Bitcoin (BTC) and Ethereum (ETH).
- **News articles**: Textual data related to cryptocurrency markets, including news URLs, titles, sources, and publication dates. The dataset was provided in JSON format, where each entry corresponds to a piece of news relevant to the crypto market (an illustrative entry is sketched below).
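Purely for illustration, a single entry might look roughly like the following; the field names are assumptions based on the description above, not the dataset's exact schema:

```python
# Hypothetical news entry (field names are illustrative, not the actual schema).
news_entry = {
    "url": "https://example.com/btc-etf-approval-rumors",      # link to the article
    "title": "Bitcoin price surges as ETF approval rumors circulate",
    "source": "Example Crypto News",                            # publishing outlet
    "published_at": "2024-03-05",                               # publication date
}
```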
 
### Data Periods

- **Training Data**: 2022-01-01 to 2024-10-15.

The model was trained to correlate news sentiment, content, and cryptocurrency price trends, aiming to predict optimal trading decisions.
 
### Training Procedure

#### Preprocessing

1. **Text Preprocessing**: The raw news data underwent text normalization, tokenization, and removal of irrelevant tokens (such as stop words and special characters).
2. **Price Data Normalization**: Historical price data was normalized to reflect percentage changes over time, making it easier for the model to capture price trends (a sketch of this step follows the list).
3. **Data Alignment**: News articles were aligned with the corresponding time periods of the price data so the model could learn from both data sources simultaneously.
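As a rough illustration of the normalization step, the sketch below converts a daily closing-price series into day-over-day percentage changes; the function name and example values are assumptions, not the project's actual preprocessing code:

```python
# Minimal sketch: normalize daily closing prices into percentage changes.
def to_pct_changes(closes: list[float]) -> list[float]:
    """Return day-over-day percentage changes for a series of closing prices."""
    return [
        (today - yesterday) / yesterday * 100.0
        for yesterday, today in zip(closes, closes[1:])
    ]

btc_closes = [42000.0, 42840.0, 42411.6]   # example BTC/USD daily closes
print(to_pct_changes(btc_closes))          # [2.0, -1.0]
```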
 
#### Training Hyperparameters

- **Batch size**: 1
- **Learning rate**: 5e-5
- **Epochs**: 3
- **Precision**: Mixed precision (FP16), which helped speed up training while conserving memory.
- **Optimizer**: AdamW
- **LoRA parameters**: rank 8, alpha 16, dropout 0.1 (see the configuration sketch after this list)
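For reference, these hyperparameters correspond roughly to a PEFT/Transformers setup like the one sketched below; the `target_modules` choice, output directory, and other unstated settings are assumptions rather than the original training script:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA configuration matching the listed values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
)

base = AutoModelForCausalLM.from_pretrained("mistralai/Ministral-8B-Instruct-2410")
model = get_peft_model(base, lora_config)

# Training arguments matching the listed hyperparameters; everything else is left at defaults.
args = TrainingArguments(
    output_dir="cryptotrader-lm-lora",    # assumption
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    num_train_epochs=3,
    fp16=True,
    optim="adamw_torch",
)
```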
#### Speeds, Sizes, Times

- **Training Time**: Approximately 3 hours on a 4x A100 GPU setup.
- **Model Size**: 8B parameters (base model: Ministral-8B-Instruct).
- **Checkpoint Size**: ~16 GB due to the parameter-efficient fine-tuning.
 
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a validation set of cryptocurrency market data (both price data and news articles), covering time periods not seen during training.

#### Factors

The evaluation primarily focuses on:
- **Profitability**: The model's ability to make profitable trading decisions.
- **Volatility Handling**: How well the model adapts to market volatility.
- **Timeliness**: The ability to react to time-sensitive news.

#### Metrics

- **Sharpe Ratio (SR)**: The main evaluation metric for the challenge, measuring the risk-adjusted return of the model's trading decisions.
- **Profit and Loss (PnL)**: The net profit or loss generated by the model's trading decisions over a given time period.
- **Accuracy**: The percentage of correct trading decisions (buy/sell/hold) compared to the optimal strategy. (A sketch of how these metrics can be computed follows this list.)
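As an illustration only (not the challenge's official scoring code), the Sharpe Ratio and PnL can be computed from a series of daily portfolio returns roughly as follows; the 365-day annualization factor and the initial capital are assumptions:

```python
import math

def sharpe_ratio(daily_returns: list[float], risk_free_rate: float = 0.0) -> float:
    """Annualized Sharpe Ratio from daily returns (365 trading days assumed for crypto)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / (n - 1))
    return (mean - risk_free_rate) / std * math.sqrt(365)

def pnl(daily_returns: list[float], initial_capital: float = 1_000_000.0) -> float:
    """Net profit or loss from compounding the daily returns on the initial capital."""
    capital = initial_capital
    for r in daily_returns:
        capital *= 1.0 + r
    return capital - initial_capital

returns = [0.01, -0.005, 0.02, 0.0, 0.007]   # example daily returns of a strategy
print(round(sharpe_ratio(returns), 2), round(pnl(returns), 2))
```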
 
### Results

The model achieved a **Sharpe Ratio of 1.5** on the validation set, indicating a strong risk-adjusted return. The model demonstrated consistent profitability over the testing period and effectively managed news-based volatility.

#### Summary

- **Sharpe Ratio**: 0.94
- **Accuracy**: 72%
- **Profitability**: The model's decisions resulted in an average 8% profit over the testing period.
 
## Model Examination [optional]

Initial interpretability studies show that the model places significant weight on news headlines containing strong market sentiment indicators (e.g., "surge", "plummet"). Further analysis is recommended to explore how different types of news (e.g., regulatory updates vs. technical analysis) influence model decisions.
 
## Environmental Impact

Carbon emissions and energy consumption estimates for model training:

- **Hardware Type**: 4x NVIDIA A100 GPUs.
- **Hours Used**: ~3 hours of total training time.
- **Cloud Provider**: AWS.
- **Compute Region**: US-East.
- **Carbon Emitted**: Approximately 5 kg CO2e, as estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).
 
## Technical Specifications

### Model Architecture and Objective

- **Model Architecture**: A LoRA fine-tune of Ministral-8B-Instruct-2410, a transformer-based architecture optimized for instruction-following tasks.
- **Objective**: Predict daily trading decisions (buy/sell/hold) for BTC/ETH based on financial news and cryptocurrency price data.
 
### Compute Infrastructure

#### Hardware

- **Training Hardware**: 4x NVIDIA A100 GPUs with 40 GB of VRAM each.
- **Inference Hardware**: Can run on a single GPU with at least 24 GB of VRAM.

#### Software

- **Framework**: PEFT (Parameter-Efficient Fine-Tuning) with Hugging Face Transformers.
- **Deep Learning Libraries**: PyTorch, Hugging Face Transformers.
- **Python Version**: 3.10
 
## Citation

If you use this model in your work, please cite it as follows:

**BibTeX:**

```bibtex
@misc{CryptoTrader-LM,
  author       = {300k/ns team},
  title        = {CryptoTrader-LM: A LoRA-tuned Ministral-8B Model for Cryptocurrency Trading Decisions},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/agarkovv/Ministral-8B-Instruct-2410-LoRA-trading}},
}
```

**APA:**

```
300k/ns team. (2024). CryptoTrader-LM: A LoRA-tuned Ministral-8B Model for Cryptocurrency Trading Decisions. Hugging Face. https://huggingface.co/agarkovv/Ministral-8B-Instruct-2410-LoRA-trading
```
 
## Glossary [optional]

- **LoRA (Low-Rank Adaptation)**: A parameter-efficient fine-tuning method that keeps the base model's weights frozen and learns small low-rank update matrices for selected weight matrices, allowing quicker and more memory-efficient fine-tuning (a schematic of the update is shown after this list).
- **BTC**: The ticker symbol for Bitcoin, a decentralized cryptocurrency.
- **ETH**: The ticker symbol for Ethereum, a decentralized cryptocurrency and blockchain platform.
- **Sharpe Ratio (SR)**: A measure of risk-adjusted return, used to evaluate the performance of an investment or trading strategy.
- **PnL (Profit and Loss)**: The financial gain or loss realized from trading over a specific time period.
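For intuition, a schematic of the LoRA update with illustrative shapes (the dimensions below are examples, not the actual adapter configuration):

```python
# Illustrative LoRA bookkeeping: the frozen weight W (d x k) receives a low-rank
# correction, W_eff = W + (alpha / r) * B @ A, where B is (d x r) and A is (r x k).
# Only A and B are trained, so the trainable parameter count per layer drops
# from d * k to r * (d + k).
d, k, r = 4096, 4096, 8
print(d * k, r * (d + k))  # 16777216 vs 65536 trainable values for this layer
```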
 
## More Information [optional]

For more information on the training process, model performance, or other specific details, please contact the model authors.

## Model Card Authors [optional]

- 300k/ns
- Contact via Telegram: @allocfree

## Model Card Contact

For any inquiries, please contact the authors via Telegram: @allocfree

### Framework Versions

- **PEFT**: v0.13.2
- **Transformers**: v4.33.3
- **PyTorch**: v2.1.0

---