VNTL Settings Guide for ST
Alright, I think I've mess around with this model enough to create a guide, so here we go...
SillyTavern
Prerequisite
- lmg-anon/vntl-llama3-8b-v2-hf (Obviously)
- Imatrix GGUFs by yours truly (Uses a multilingual fork of Bartowski's imatrix dataset. GGUF requests are welcomed.)
- Snowflake/snowflake-arctic-embed-l-v2.0 (Optional. If you're using RAG and your pc can handle a bigger embedding model, use this instead of default)
- GGUFs by yours truly (GGUF requests are welcomed.)
- SillyTavern
- Extensions
- Required:
- LALib
https://github.com/LenAnderson/SillyTavern-LALib
- Message Actions
https://github.com/LenAnderson/SillyTavern-MessageActions
- Send Button
https://github.com/LenAnderson/SillyTavern-SendButton
- LALib
- Recommended:
- Input History
https://github.com/LenAnderson/SillyTavern-InputHistory
- Keyboard
https://github.com/LenAnderson/SillyTavern-Keyboard
- Notebook
https://github.com/SillyTavern/Extension-Notebook
- Chat Top Bar
https://github.com/SillyTavern/Extension-TopInfoBar
- Backup Browser
https://github.com/LenAnderson/SillyTavern-BackupsBrowser
- Message Limit
https://github.com/SillyTavern/Extension-MessageLimit
- Auto Focus
https://github.com/LenAnderson/SillyTavern-AutoFocus
- Input History
- Required:
- My ST Master Preset
- AutoHotkey
- Luna Translator
- an LLM Backend(KoboldCpp recommended. If you want to host a Quanted embedding model, use llama.cpp with the
--embedding
flag) - PC specs that allow you to run VNTL at atleast
10 t/s
(5 t/s
if you're desperate enough)
Backend Settings
Due to overfitting with 1k token only dataset, translation quality diminishes past the 1k point, but with the help of RoPE, this isn't too much of an issue, so if you use SillyTavern instead of Luna Translator as your LLM frontend, then use these backend configs to fix the quality drop.
Context: 4096
Custom RoPE:
- RoPE Base: 6315084.4
TextGen
While the model card says to use neutral samplers with 0 temp, I've found these settings to work rather well, increasing writing quality, and even translation accuracy. Keep in mind the golden garbage in, garbage out
rule and fix any mistakes you find when using these settings.
QR's
Note: Import my QR set instead. It's more up-to-date.
Eventually you'll end up with some translation weirdness that's hard to figure out and fix, in cases like that you can start the context anew by hiding all messages from the prompt. Here's the QR to do so.
/hide 0-{{lastMessageId}}
There will also be cases where you notice lag. SillyTavern isn't made to handle large amounts of messages loaded at once, so make sure to enable a message load limit when opening a chat in User Settings
> Chat/Message Handling
> # Msg. to Load
Once done that, create a QR to reload the chat whenever things start to get a bit laggy:
/chat-reload
QoL
Document Mode: setting chat style tp document mode allows to quickly enter message edit mode with any chat message by simply double clicking.
Auto-save message edits: Enabling auto-saving for message edits allows to quickly leave message edit mode with Esc
and pairs well with document mode.
Setting Up RAG/Vector Storage for Better Translation Quality/Consistency
1. Vector Storage
Use the following images to set up Vector Storage:
1. Main STscript
Note: You can also download and import my Message Actions QR Set and skip to the final paragraph of this part.
Create a new QRset called MTL MesAct
and create a QR called Chat-Pair RAG
. Paste the following script:
/message-get {{mes::id}} |
/= pipe.is_user |
/let isuser {{pipe}} |
/if left={{var::isuser}} else={:
/abort quiet=false QR must be used on a user role message. Aborting. |
:}
{:
/if left={{mes::id}} right={{lastmessageid}} rule=eq {:
/abort quiet=false No message pair. Aborting. |
:}|
:}|
/add {{mes::id}} 1 |
/let engid {{pipe}} |
/message-get {{var::engid}} |
/= pipe.is_user |
/let isassist {{pipe}} |
/if left=isassist {:
/abort quiet=false Improper pair. Aborting. |
:}|
/message-get {{mes::id}} |
/= pipe.name |
/let name_user {{pipe}} |
/message-get {{var::engid}} |
/= pipe.name |
/let name_assist {{pipe}} |
/messages names=on {{mes::id}}-{{var::engid}} |
/let mesGrab {{pipe}} |
/re-replace find="/{{var::name_user}}: /" replace="<\|start_header_id\|>Japanese<\|end_header_id\|>{{newline}}{{newline}}" {{var::mesGrab}} |
/re-replace find="/\n\n{{var::name_assist}}: /" replace="<\|eot_id\|><\|start_header_id\|>English<\|end_header_id\|>{{newline}}{{newline}}" {{pipe}} |
/let mesInst "{{pipe}}<\|eot_id\|>" |
/let entName "" |
/input rows=1 Write data-back entry name(Recommended to use the name of the character speaking) |
/if left="{{pipe}}" right="" rule=eq else={:
/var key=entName as=string "{{pipe}}" |
:}
{:
/var key=entName as=string "Translation-Snip" |
:}|
/databank-add name="{{var::entName}}_{{mes::id}}-{{var::engid}}" "{{var::mesInst}}" |
Set to the following Icon:
Once done, send the cmd /messageactions "MTL MesAct" |
to add it as a button to messages in the expandable menu.
3. Usage
The best way to use RAG to increase quality is to turn every chat pair where you've fixed the translation, and only your fixed translations.
Make sure to only add fixes you are absolutely positive is correct, else RAG will have the opposite effect on translation quality
Luna
ToDo
AutoHotkey Setup
Finally, The AutoHotkey script to tie it togetther. When the Japanese text is sent to the clipboard, AutoHotkey will send the text in SillyTavern to be translated before switching back to the set target window.
Hotkeys
Ctrl + Alt + T
: Target window for switching back to after sending the Japanese text.Ctrl + Alt + E
: Enable/Disable Auto-pasting to SillyTavern
The Script
#SingleInstance Force
#Requires AutoHotKey v2.0+
previousClipboard := A_Clipboard
targetWindow := ""
grabToggle := true
CheckClipboard() {
global previousClipboard
currentClipboard := A_Clipboard
if (currentClipboard != previousClipboard) {
previousClipboard := currentClipboard
HandleClipboardChange(currentClipboard)
}
}
HandleClipboardChange(currentClipboard) {
win := WinExist("SillyTavern")
if (win AND grabToggle) {
WinActivate(win)
WinWaitActive(win)
Send(currentClipboard)
Sleep(500)
Send("{Enter}")
Sleep(500)
WindowSwitchBack()
}
}
WindowSwitchBack() {
global targetWindow
win := WinExist(targetWindow)
if (win) {
WinActivate(win)
WinWaitActive(win)
}
}
SetTimer(CheckClipboard, 500)
^!T:: {
global targetWindow
targetWindow := WinGetTitle("A")
MsgBox("Target window set to: " targetWindow,,"T2")
}
^!E:: {
global grabToggle
if (grabToggle) {
grabToggle := false
MsgBox("Auto-paste: Disabled",,"T2")
} else {
grabToggle := true
MsgBox("Auto-paste: Enabled",,"T2")
}
}
Update 1:
- Added
QoL
section and updated sampler settings inTextGen
Update 2:
- Added RAG setup guide
- Added
Prerequisite
section - Updated AutoHotkey script
Update 3:
- Update Vector Storage settings
Update 4:
- Update
Prerequisite
section- Added Auto Focus extension
- Added embedding model recommendation and GGUFs
- Added VNTL and GGUFs
- Update recommended backends
Update 5:
- Update QR STscripts
- Added links to QR set exports
- Edit and reformat AutoHotkey setup