Commit bd8b270 by Helw150 (1 parent: 3999209)
Files changed (2)
  1. README.md +17 -17
  2. modeling_diva.py +2 -1
README.md CHANGED
@@ -12,6 +12,22 @@ This is an end-to-end Voice Assistant Model which can handle speech and text as
 
 See the model in action at [diva-audio.github.io](https://diva-audio.github.io).
 
+## Citation
+**BibTeX:**
+
+```
+@misc{DiVA,
+title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
+author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
+year={2024},
+eprint={2410.02678},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2410.02678},
+}
+
+```
+
 ### Inference Example
 ```python
 from transformers import AutoModel
@@ -44,22 +60,6 @@ print(
 )
 ```
 
-## Citation
-**BibTeX:**
-
-```
-@misc{DiVA,
-title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
-author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
-year={2024},
-eprint={2410.02678},
-archivePrefix={arXiv},
-primaryClass={cs.CL},
-url={https://arxiv.org/abs/2410.02678},
-}
-
-```
-
 ## Table of Contents
 
 - [Model Card for DiVA Llama 3](#model-card-for-DiVA-Llama-3)
@@ -114,4 +114,4 @@ Will Held
 
 ## Model Card Contact
 
modeling_diva.py CHANGED
@@ -263,7 +263,8 @@ class DiVAModel(PreTrainedModel):
 else:
     greedy = next_token_logits.argmax(dim=-1)
     for token_index, out in enumerate(greedy.flatten().tolist()):
-        outs[token_index].append(out)
+        if not complete[token_index]:
+            outs[token_index].append(out)
         if out == 128009:
             complete[token_index] = True
 
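
The effect of the `modeling_diva.py` change can be illustrated outside the model. The following is a minimal sketch, not the repository's code: `collect_greedy_outputs` and the `steps` input are hypothetical stand-ins for the decoding loop and for successive `greedy.flatten().tolist()` results, and only the `outs` / `complete` bookkeeping and the `128009` comparison are taken from the diff. It shows that, with the added guard, a sequence that has already emitted the end token no longer accumulates tokens while the rest of the batch keeps generating.

```python
EOT_TOKEN_ID = 128009  # the id the diff compares against


def collect_greedy_outputs(steps, eot_id=EOT_TOKEN_ID):
    """Accumulate per-sequence tokens from batched greedy steps.

    `steps` is a hypothetical input: one list of token ids per decoding
    step, with one entry per sequence in the batch.
    """
    batch_size = len(steps[0]) if steps else 0
    outs = [[] for _ in range(batch_size)]
    complete = [False] * batch_size

    for greedy in steps:
        for token_index, out in enumerate(greedy):
            # The guard added in the commit: skip finished sequences.
            if not complete[token_index]:
                outs[token_index].append(out)
            if out == eot_id:
                complete[token_index] = True
    return outs


# Made-up token ids for a batch of two sequences.
steps = [
    [11, 21],
    [EOT_TOKEN_ID, 22],   # sequence 0 finishes here
    [99, 23],             # sequence 0 is complete, so 99 is not recorded
    [98, EOT_TOKEN_ID],   # sequence 1 finishes here
]
result = collect_greedy_outputs(steps)
print(result)
```

Without the `if not complete[token_index]` guard, sequence 0 would also collect `99` and `98`, which is the trailing output the commit appears to remove.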