
The answer is not terminated correctly

#58 by captainst - opened

I am using the 8-bit quantized version:

import torch
import transformers

# model_path points at the local MPT checkpoint (or its Hub id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=True,
)
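
The inference snippet below calls pipe, which the post never shows being built; presumably it is a standard text-generation pipeline. A minimal sketch of that missing step (the tokenizer name is an assumption; the MPT models use the EleutherAI/gpt-neox-20b tokenizer):

import transformers

# Tokenizer is loaded separately because the MPT repo reuses the
# gpt-neox-20b tokenizer rather than shipping its own.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

pipe = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)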

And inference:

fmt_ex = """

Instruction: Please write a poem with less than 200 words.

Response:

"""

with torch.autocast('cuda', dtype=torch.float16):
    print(
        pipe(fmt_ex,
             max_new_tokens=256,
             do_sample=True,
             use_cache=True))

I can see that the answer ends with a "#", but then some random characters follow:

He will know I am here and I am meant to be by his side\n#d-link dl6000 # dlink dl6000 \n# dl6000\n- #n\n#d-link #dlinkdl6000 # dl60\n#dlink # dlinkdl6000 # dl6000 #dl\n- #dlk\n#d-link dl6\n#dlk\n##dlink dl6\n\n#d-link #dlink #dlinkdl6000 #dl\n#d

It seems to be a problem with the special tokens.
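
For context, the stray "#" runs line up with the "###" markers of the instruct prompt format. A sketch of the Alpaca-style template that the mosaicml/mpt-7b-instruct card uses (format_prompt is a hypothetical helper name, not part of the repo):

# Alpaca-style template used by the MPT instruct models
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_prompt(instruction: str) -> str:
    return PROMPT_TEMPLATE.format(instruction=instruction)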

It turns out that I needed to set "stopping_criteria" when building the pipeline. I had not realized this, since many Hugging Face models already implement it in their custom code.
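
A minimal sketch of that fix, assuming the tokenizer and pipe from the snippets above (StopOnTokens is a name chosen here, not a transformers class; extra keyword arguments in the pipeline call are forwarded to model.generate):

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop generation once the last emitted token is one of the stop ids."""
    def __init__(self, stop_token_ids):
        self.stop_token_ids = set(stop_token_ids)

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() in self.stop_token_ids

# <|endoftext|> is the eos token of the gpt-neox tokenizer MPT uses
stop_ids = [tokenizer.eos_token_id]
criteria = StoppingCriteriaList([StopOnTokens(stop_ids)])

with torch.autocast('cuda', dtype=torch.float16):
    print(
        pipe(fmt_ex,
             max_new_tokens=256,
             do_sample=True,
             use_cache=True,
             stopping_criteria=criteria))

Depending on how the model misbehaves, the stop list can also include the token ids of the "###" marker, so generation halts as soon as the model starts a new instruction block.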

captainst changed discussion status to closed