Atefeh Sohrabizadeh committed on
Commit 4b8390f · 1 Parent(s): a949e5a

Uploading model files
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 4096,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
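This pooling config enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of the 4096-dimensional token vectors, with padding positions excluded through the attention mask; `include_prompt: true` means any prompt tokens stay in that average. A minimal plain-Python sketch of that masked mean, assuming a single sequence given as nested lists (the function name `masked_mean_pool` is hypothetical, not part of the Sentence Transformers API):

```python
from typing import List, Sequence


def masked_mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            kept += 1
    # Avoid dividing by zero for an all-padding sequence (a small clamp on the token count).
    denominator = max(kept, 1e-9)
    return [total / denominator for total in summed]


# Two real tokens and one padding token in a toy 2-dimensional space.
print(masked_mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0]))  # [2.0, 3.0]
```

The padding vector `[100.0, 100.0]` has no influence on the result, which is why right-padded batches produce the same embedding per sentence as unpadded inputs.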
README.md ADDED
@@ -0,0 +1,142 @@
+ ---
+ language: []
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ widget: []
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 4096-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 4096 tokens
+ - **Output Dimensionality:** 4096 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 4096, 'do_lower_case': False}) with Transformer model: MistralBiDirectionalModel
+   (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub (trust_remote_code=True is needed to run the custom MistralBiDirectionalModel code in bidirectional_models.py)
+ model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)
+ # Run inference
+ sentences = [
+     'The weather is lovely today.',
+     "It's so sunny outside!",
+     'He drove to the stadium.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 4096)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.0.0
+ - Transformers: 4.37.2
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.15.2
+
+ ## Citation
+
+ ### BibTeX
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
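The model card lists cosine similarity as the similarity function behind `model.similarity`. As a rough plain-Python sketch of what that call computes for a list of embeddings (the helper names `cosine` and `cosine_matrix` are hypothetical, and the real library works on tensors rather than lists):

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_matrix(rows: Sequence[Sequence[float]], cols: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise scores: entry [i][j] compares rows[i] with cols[j]."""
    return [[cosine(r, c) for c in cols] for r in rows]


# Three toy 2-dimensional "embeddings" standing in for the 4096-dimensional ones.
embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
scores = cosine_matrix(embeddings, embeddings)
print(len(scores), len(scores[0]))  # 3 3
```

Comparing a batch with itself yields a square matrix whose diagonal is 1.0, the same 3 × 3 layout as the usage example above.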
bidirectional_models.py ADDED
@@ -0,0 +1,171 @@
+ from transformers import AutoConfig, AutoModel
+ from transformers.models.mistral.configuration_mistral import MistralConfig
+ from transformers.models.mistral.modeling_mistral import MistralModel
+
+ from typing import List, Optional, Tuple, Union
+ import torch
+ import torch.nn.functional as F
+ from torch import nn
+ from torch.nn import CrossEntropyLoss
+
+ #from transformers.models.mistral.modeling_mistral import MistralConfig, MistralModel,
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask_for_sdpa, _prepare_4d_attention_mask
+ from transformers.cache_utils import Cache, DynamicCache
+ from transformers.modeling_outputs import BaseModelOutputWithPast
+ from transformers.utils import (
+     logging
+ )
+
+ logger = logging.get_logger(__name__)
+
+ class MistralBiDirectionalConfig(MistralConfig):
+     model_type = 'mistralbidirectional'
+
+ class MistralBiDirectionalModel(MistralModel):
+     config_class = MistralBiDirectionalConfig
+
+     def __init__(self, config: MistralConfig):
+         super().__init__(config)
+         for layer in self.layers:
+             layer.self_attn.is_causal = False  # every token may attend to tokens on both sides
+         self._attn_implementation = "eager"  # forward() then builds a padding-only 4D mask with _prepare_4d_attention_mask
+
+     def forward(
+         self,
+         input_ids: torch.LongTensor = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.LongTensor] = None,
+         past_key_values: Optional[List[torch.FloatTensor]] = None,
+         inputs_embeds: Optional[torch.FloatTensor] = None,
+         use_cache: Optional[bool] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[Tuple, BaseModelOutputWithPast]:
+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+         output_hidden_states = (
+             output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+         )
+         use_cache = use_cache if use_cache is not None else self.config.use_cache
+
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         # retrieve input_ids and inputs_embeds
+         if input_ids is not None and inputs_embeds is not None:
+             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
+         elif input_ids is not None:
+             batch_size, seq_length = input_ids.shape
+         elif inputs_embeds is not None:
+             batch_size, seq_length, _ = inputs_embeds.shape
+         else:
+             raise ValueError("You have to specify either input_ids or inputs_embeds")
+
+         if self.gradient_checkpointing and self.training:
+             if use_cache:
+                 logger.warning_once(
+                     "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
+                 )
+                 use_cache = False
+
+         past_key_values_length = 0
+
+         if use_cache:
+             use_legacy_cache = not isinstance(past_key_values, Cache)
+             if use_legacy_cache:
+                 past_key_values = DynamicCache.from_legacy_cache(past_key_values)
+             past_key_values_length = past_key_values.get_usable_length(seq_length)
+
+         if position_ids is None:
+             device = input_ids.device if input_ids is not None else inputs_embeds.device
+             position_ids = torch.arange(
+                 past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
+             )
+             position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
+         else:
+             position_ids = position_ids.view(-1, seq_length).long()
+
+         if inputs_embeds is None:
+             inputs_embeds = self.embed_tokens(input_ids)
+
+         # TODO: make sure to pass correct attention_mask if you use cache and flash_attention_2
+         if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
+             is_padding_right = attention_mask[:, -1].sum().item() != batch_size
+             if is_padding_right:
+                 raise ValueError(
+                     "You are attempting to perform batched generation with padding_side='right'"
+                     " this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to "
+                     " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
+                 )
+         if attention_mask is None:  # no padding given: every position may attend to every position
+             attention_mask = torch.ones((batch_size, seq_length + past_key_values_length), dtype=torch.long, device=inputs_embeds.device)
+         original_attention_mask = attention_mask
+         if self._attn_implementation == "flash_attention_2":
+             raise NotImplementedError("A bidirectional attention mask is not implemented for flash_attention_2")
+         elif self._attn_implementation == "sdpa" and not output_attentions:
+             bidirectional_attention_mask = _prepare_4d_attention_mask_for_sdpa(
+                 original_attention_mask,
+                 inputs_embeds.dtype
+             )
+         else:
+             bidirectional_attention_mask = _prepare_4d_attention_mask(
+                 original_attention_mask,
+                 inputs_embeds.dtype
+             )
+
+         hidden_states = inputs_embeds
+
+         # decoder layers
+         all_hidden_states = () if output_hidden_states else None
+         all_self_attns = () if output_attentions else None
+         next_decoder_cache = None
+
+         for decoder_layer in self.layers:
+             if output_hidden_states:
+                 all_hidden_states += (hidden_states,)
+
+             if self.gradient_checkpointing and self.training:
+                 layer_outputs = self._gradient_checkpointing_func(
+                     decoder_layer.__call__,
+                     hidden_states,
+                     bidirectional_attention_mask,
+                     position_ids,
+                     past_key_values,
+                     output_attentions,
+                     use_cache,
+                 )
+             else:
+                 layer_outputs = decoder_layer(
+                     hidden_states,
+                     attention_mask=bidirectional_attention_mask,
+                     position_ids=position_ids,
+                     past_key_value=past_key_values,
+                     output_attentions=output_attentions,
+                     use_cache=use_cache,
+                 )
+
+             hidden_states = layer_outputs[0]
+
+             if use_cache:
+                 next_decoder_cache = layer_outputs[2 if output_attentions else 1]
+
+             if output_attentions:
+                 all_self_attns += (layer_outputs[1],)
+
+         hidden_states = self.norm(hidden_states)
+
+         # add hidden states from the last decoder layer
+         if output_hidden_states:
+             all_hidden_states += (hidden_states,)
+
+         next_cache = None
+         if use_cache:
+             next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
+
+         if not return_dict:
+             return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
+         return BaseModelOutputWithPast(
+             last_hidden_state=hidden_states,
+             past_key_values=next_cache,
+             hidden_states=all_hidden_states,
+             attentions=all_self_attns,
+         )
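What makes `MistralBiDirectionalModel` bidirectional is that `is_causal` is switched off and the mask comes from `_prepare_4d_attention_mask`, which only blocks padding keys, instead of Mistral's usual causal mask. The plain-Python sketch below contrasts the two additive masks for a single unbatched sequence; it is an illustration with hypothetical helper names, not the transformers implementation, which builds tensors of shape `[batch, 1, query_len, key_len]`.

```python
from typing import List, Sequence

# Additive value for a blocked position; equals torch.finfo(torch.float32).min.
FLOAT32_MIN = -3.4028234663852886e38


def bidirectional_mask(padding_mask: Sequence[int], min_value: float = FLOAT32_MIN) -> List[List[float]]:
    """Row q, column k: 0.0 if query q may attend to key k, else min_value.

    Only padding keys are blocked, so every token sees the tokens before and after it.
    """
    n = len(padding_mask)
    return [[0.0 if padding_mask[k] else min_value for k in range(n)] for _ in range(n)]


def causal_mask(padding_mask: Sequence[int], min_value: float = FLOAT32_MIN) -> List[List[float]]:
    """Same layout, but query q is also blocked from every later key k > q (decoder-style)."""
    n = len(padding_mask)
    return [
        [0.0 if padding_mask[k] and k <= q else min_value for k in range(n)]
        for q in range(n)
    ]


tokens_mask = [1, 1, 1, 0]  # three real tokens followed by one padding token
for name, mask in (("bidirectional", bidirectional_mask(tokens_mask)), ("causal", causal_mask(tokens_mask))):
    allowed = [[int(value == 0.0) for value in row] for row in mask]
    print(name, allowed)
```

In the bidirectional case every row is identical (all real tokens visible), whereas the causal rows form a lower triangle; the mean pooling step then averages representations that were each computed with full left and right context.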
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "nvidia/NV-EmbedCode-7b-v1",
+   "architectures": [
+     "MistralBiDirectionalModel"
+   ],
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "bidirectional_models.MistralBiDirectionalConfig",
+     "AutoModel": "bidirectional_models.MistralBiDirectionalModel"
+   },
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistralbidirectional",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.37.2",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
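This config describes a Mistral-7B-shaped stack with grouped-query attention (32 query heads sharing 8 key/value heads, so each head has 4096 / 32 = 128 dimensions) and no language-model head, since the architecture is a `MistralModel` subclass. The sketch below counts parameters from these fields, assuming the standard bias-free Mistral layout (q/k/v/o projections, gate/up/down MLP, two RMSNorm weights per layer, one final norm), which is consistent with the tensor names in `model.safetensors.index.json`. The result, 7,110,660,096 parameters at 4 bytes each for `float32`, equals the `total_size` of 28,442,640,384 bytes recorded in that index.

```python
from typing import Mapping

# Fields copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
    "vocab_size": 32000,
}
TORCH_DTYPE = "float32"
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def count_parameters(cfg: Mapping[str, int]) -> int:
    """Parameter count of a bias-free Mistral decoder stack without an LM head."""
    hidden = cfg["hidden_size"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention: smaller K/V projections

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * hidden * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj
    norms = 2 * hidden  # input_layernorm + post_attention_layernorm
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * hidden  # embed_tokens
    final_norm = hidden  # norm applied after the last layer
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm


params = count_parameters(CONFIG)
size_bytes = params * BYTES_PER_PARAM[TORCH_DTYPE]
print(f"{params:,} parameters, {size_bytes:,} bytes")  # 7,110,660,096 parameters, 28,442,640,384 bytes
```

Storing the same weights in `bfloat16` would halve the footprint to about 14.2 GB, which is one reason such checkpoints are often converted after download.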
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.0",
+     "transformers": "4.37.2",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
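`config_sentence_transformers.json` registers no prompts (`"prompts": {}`) and no `default_prompt_name`, so texts are embedded exactly as given unless the caller passes a prompt explicitly. The sketch below mirrors, as we understand it, how Sentence Transformers 3.x resolves a prompt before encoding (an explicit prompt string wins, then a named prompt, then the default name); the function `apply_prompt` is hypothetical and written in plain Python for illustration. Because the pooling config sets `include_prompt: true`, any prepended prompt tokens would also count toward the mean-pooled embedding.

```python
from typing import Dict, List, Optional


def apply_prompt(
    sentences: List[str],
    prompts: Dict[str, str],
    default_prompt_name: Optional[str] = None,
    prompt_name: Optional[str] = None,
    prompt: Optional[str] = None,
) -> List[str]:
    """Return the texts that would actually be tokenized after prompt resolution."""
    if prompt is None:
        if prompt_name is not None:
            if prompt_name not in prompts:
                raise ValueError(f"Prompt name {prompt_name!r} not found; available: {list(prompts)}")
            prompt = prompts[prompt_name]
        elif default_prompt_name is not None:
            prompt = prompts.get(default_prompt_name)
    if prompt is None:
        return list(sentences)
    # The prompt is prepended as plain text, so its tokens pass through the model too.
    return [prompt + sentence for sentence in sentences]


# With this repository's settings (no prompts, no default) the input is unchanged.
print(apply_prompt(["def add(a, b): return a + b"], prompts={}))
```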
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01fa665441f5d5146229e6ba30c9900060e652d6c5fc3549778ac7b017066a69
+ size 4987196648
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0725c1fd085ef6834d57850266b4a5bad3fe20c23b7c6e96d20fdc674976d1d9
+ size 4899116152
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06a46f14ef742551f9186862260eb5fe7be1c5f4d263ea3dfbd061730d6c9249
+ size 4999812808
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d113a3619813f0b2f198c5794c2d4a51a3fb59d22664d8e5558ccddb6f3bb8cb
+ size 4999812808
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a44b7d205a94f9de16fa59acfe5b6b6470912304fd88dcf40143486fd7fffdba
+ size 4832007216
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:624ea48c4a385ddafd3420d558bb2071b1cb54e6951c6e6d36414bb9056adf52
+ size 3724726552
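Each `.safetensors` entry in this commit is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch, assuming the pointer text is available as a string and the real shard has been downloaded locally (the function names are hypothetical), that parses a pointer and checks a file against it. Summing the six `size` fields gives 28,442,672,184 bytes, 31,800 bytes more than the `total_size` in `model.safetensors.index.json`; the difference is presumably the per-file safetensors headers, since `total_size` appears to count tensor data only.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_lfs_object(path: str, pointer: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Stream the file once and compare its size and SHA-256 with the pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == int(pointer["size"]) and digest.hexdigest() == expected_hex


shard_1 = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:01fa665441f5d5146229e6ba30c9900060e652d6c5fc3549778ac7b017066a69
size 4987196648"""
)
shard_sizes = [4987196648, 4899116152, 4999812808, 4999812808, 4832007216, 3724726552]
print(shard_1["size"], sum(shard_sizes))  # 4987196648 28442672184
```

After downloading, a call such as `verify_lfs_object("model-00001-of-00006.safetensors", shard_1)` would return `True` only if the local shard matches both the recorded size and hash; streaming in 1 MiB chunks keeps memory flat even for the ~5 GB shards.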
model.safetensors.index.json ADDED
@@ -0,0 +1,297 @@
+ {
+   "metadata": {
+     "total_size": 28442640384
+   },
+   "weight_map": {
+     "embed_tokens.weight": "model-00001-of-00006.safetensors",
+     "layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.10.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.10.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.10.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "layers.11.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.11.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.11.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.11.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.12.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.12.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.16.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.16.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.16.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.16.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "layers.17.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.17.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.17.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.17.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.18.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.18.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.18.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.19.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.20.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.20.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.22.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.22.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.22.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.22.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.22.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "layers.23.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.23.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.23.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.23.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.24.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.24.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.24.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.25.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.25.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.25.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.26.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "layers.26.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.26.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.27.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.27.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.27.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.27.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "layers.28.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.28.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.28.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.28.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.29.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.29.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.29.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "layers.30.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.30.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.30.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.30.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.31.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "layers.31.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.31.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
242
+ "layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
243
+ "layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
244
+ "layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
245
+ "layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
246
+ "layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
247
+ "layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
248
+ "layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
249
+ "layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
250
+ "layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
251
+ "layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
252
+ "layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
253
+ "layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
254
+ "layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
255
+ "layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
256
+ "layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
257
+ "layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
258
+ "layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
259
+ "layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
260
+ "layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
261
+ "layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
262
+ "layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
263
+ "layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
264
+ "layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
265
+ "layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
266
+ "layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
267
+ "layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
268
+ "layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
269
+ "layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
270
+ "layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
271
+ "layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
272
+ "layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
273
+ "layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
274
+ "layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
275
+ "layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
276
+ "layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
277
+ "layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
278
+ "layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
279
+ "layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
280
+ "layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
281
+ "layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
282
+ "layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
283
+ "layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
284
+ "layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
285
+ "layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
286
+ "layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
287
+ "layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
288
+ "layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
289
+ "layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
290
+ "layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
291
+ "layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
292
+ "layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
293
+ "layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
294
+ "layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
295
+ "norm.weight": "model-00006-of-00006.safetensors"
296
+ }
297
+ }
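The index above maps each parameter name to the shard file that stores it, so a loader only has to open the shards it needs. The sketch below is illustrative and uses only the standard library. It works on a hand-copied excerpt of the layer-5 entries. The `{"weight_map": ...}` wrapper around them is an assumption based on the usual Hugging Face sharded-checkpoint layout, because the outer key is not visible in this part of the diff. The sketch groups parameters by shard and shows that one decoder layer can span two files.

```python
import json
from collections import defaultdict

# Layer-5 entries copied from the index above; the {"weight_map": ...}
# wrapper is assumed from the usual Hugging Face sharded-index layout.
INDEX_EXCERPT = """
{
  "weight_map": {
    "layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
    "layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
    "layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
    "layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
    "layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
    "layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
    "layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
    "layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
    "layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors"
  }
}
"""


def group_by_shard(weight_map):
    """Return {shard_file: sorted list of parameter names stored in it}."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of `layer`."""
    prefix = f"layers.{layer}."  # trailing dot keeps layer 5 from matching layer 50
    return sorted({s for n, s in weight_map.items() if n.startswith(prefix)})


weight_map = json.loads(INDEX_EXCERPT)["weight_map"]
print(shards_for_layer(weight_map, 5))
# ['model-00001-of-00006.safetensors', 'model-00002-of-00006.safetensors']
```

In this index, layer 5 has its q/k/v projections in shard 1 and its remaining tensors in shard 2. A loader therefore needs to look up each tensor in the index and cannot assume that a whole layer lives in one file.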
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
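`modules.json` chains two modules in `idx` order. Module 0 is a Transformer that produces per-token embeddings, and module 1 is a Pooling layer (configured in `1_Pooling/`) that reduces those embeddings to one vector. The sketch below illustrates that data flow in plain Python; it is not the sentence-transformers code. The toy transformer is hypothetical and only turns token ids into 2-d vectors. The pooling step follows the `pooling_mode_mean_tokens: true` setting in this commit's `1_Pooling/config.json`. The feature keys echo the ones sentence-transformers uses but should be read as illustrative.

```python
from typing import Dict, List


# Hypothetical stand-in for module 0 (sentence_transformers.models.Transformer):
# instead of running the real 4096-dim model it maps each token id t to the
# 2-d vector [t, 1.0], just so the pooling step has something to average.
def toy_transformer(features: Dict) -> Dict:
    features["token_embeddings"] = [[float(t), 1.0] for t in features["input_ids"]]
    return features


# Stand-in for module 1 (Pooling) with mean-token pooling:
# average only the positions whose attention_mask entry is 1.
def mean_pooling(features: Dict) -> Dict:
    kept = [vec for vec, m in zip(features["token_embeddings"], features["attention_mask"]) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    features["sentence_embedding"] = [sum(v[i] for v in kept) / len(kept) for i in range(dim)]
    return features


# Modules run in ascending "idx" order, as listed in modules.json.
PIPELINE = [toy_transformer, mean_pooling]


def encode(input_ids: List[int], attention_mask: List[int]) -> List[float]:
    features = {"input_ids": input_ids, "attention_mask": attention_mask}
    for module in PIPELINE:
        features = module(features)
    return features["sentence_embedding"]


# A left-padded sequence: pad id 2, then BOS id 1, then two content tokens.
print(encode([2, 1, 5, 9], [0, 1, 1, 1]))  # [5.0, 1.0]
```

The masked pad position contributes nothing, so the result is the mean of token ids 1, 5, and 9 only.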
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 4096,
+ "do_lower_case": false
+ }
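These two settings cap inputs at 4096 tokens, which matches `model_max_length` in `tokenizer_config.json`, and leave text casing unchanged. The sketch below shows what the settings mean; it is not how sentence-transformers applies them internally. The character-level tokenizer is hypothetical and exists only to make the truncation visible.

```python
import json
from typing import Callable, Dict, List

# Contents of sentence_bert_config.json as added in this commit.
SBERT_CONFIG: Dict = json.loads('{"max_seq_length": 4096, "do_lower_case": false}')


def prepare(text: str, tokenize: Callable[[str], List[int]], config: Dict) -> List[int]:
    """Lower-case only if do_lower_case is set, tokenize, then cap at max_seq_length."""
    if config["do_lower_case"]:
        text = text.lower()
    return tokenize(text)[: config["max_seq_length"]]


# Hypothetical tokenizer: one "token" per character, id = Unicode code point.
def char_tokenize(text: str) -> List[int]:
    return [ord(c) for c in text]


print(prepare("Hi", char_tokenize, SBERT_CONFIG))             # [72, 105]: case is kept
print(len(prepare("a" * 5000, char_tokenize, SBERT_CONFIG)))  # 4096
```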
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
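The padding token here is the same string as the EOS token (`</s>`). The sketch below resolves each role to its vocabulary id through the `added_tokens_decoder` table in `tokenizer_config.json` from this commit. Only the `content` strings are copied; the other fields are omitted. The result shows that padding reuses id 2.

```python
from typing import Dict

# "content" fields copied from special_tokens_map.json above.
SPECIAL_TOKENS_MAP: Dict[str, str] = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "pad_token": "</s>",
    "unk_token": "<unk>",
}

# id -> content, copied from "added_tokens_decoder" in tokenizer_config.json.
ADDED_TOKENS_DECODER: Dict[str, str] = {"0": "<unk>", "1": "<s>", "2": "</s>"}


def role_ids(special_map: Dict[str, str], decoder: Dict[str, str]) -> Dict[str, int]:
    """Resolve each special-token role to its vocabulary id via its content string."""
    content_to_id = {content: int(idx) for idx, content in decoder.items()}
    return {role: content_to_id[content] for role, content in special_map.items()}


ids = role_ids(SPECIAL_TOKENS_MAP, ADDED_TOKENS_DECODER)
print(ids)  # {'bos_token': 1, 'eos_token': 2, 'pad_token': 2, 'unk_token': 0}
```

Padding shares the EOS id, so the attention mask is the only reliable way to tell a pad position from a real `</s>`.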
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
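This entry is a Git LFS pointer rather than the SentencePiece model itself. The actual 493,443-byte file is stored out of band and identified by its SHA-256 digest. The standard-library sketch below parses the three `key value` lines and checks whether a given byte string matches the recorded size and digest.

```python
import hashlib
from typing import Dict, Union

# The three-line pointer committed in place of tokenizer.model.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: Dict[str, Union[str, int]]) -> bool:
    """True if `data` has the recorded size and SHA-256 digest."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algorithm']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"], pointer["algorithm"])  # 493443 sha256
```

To confirm that a downloaded copy is the real file and not the pointer, pass its bytes to `matches_pointer`. A file of only a few hundred bytes fails the size check immediately.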
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": true,
+ "model_max_length": 4096,
+ "pad_token": "</s>",
+ "padding_side": "left",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
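Taken together, these settings prepend `<s>` (id 1), append no `</s>`, and pad on the left with `</s>` (id 2) up to 4096 tokens. The sketch below builds a batch by hand to show the resulting layout; it is not the `LlamaTokenizer` implementation. The content token ids are made up. Truncation from the right is an assumption, since this file sets no `truncation_side`.

```python
from typing import List, Tuple

# Values read off tokenizer_config.json above.
BOS_ID = 1        # "<s>"; prepended because add_bos_token is true
PAD_ID = 2        # pad_token is "</s>", id 2 in added_tokens_decoder
MAX_LEN = 4096    # model_max_length
# add_eos_token is false, so no "</s>" is appended.


def pad_batch_left(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Prepend BOS, truncate on the right to MAX_LEN, then left-pad to a common width."""
    seqs = [([BOS_ID] + ids)[:MAX_LEN] for ids in batch]
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = width - len(s)
        input_ids.append([PAD_ID] * pad + s)  # padding goes on the left
        attention_mask.append([0] * pad + [1] * len(s))
    return input_ids, attention_mask


ids, mask = pad_batch_left([[5, 9], [7]])
print(ids)   # [[1, 5, 9], [2, 1, 7]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

With left padding, the last position of every row is a real token. That would matter for last-token pooling, but the pooling config in this commit uses mean pooling, where only the mask determines which positions are averaged.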