Spaces: khoatran94/cv_ocr_gradio
khoatran94 committed on Nov 21, 2024

Commit c0cbe6c
1 Parent(s): d461b3a

test cv extraction

Files changed (1)

app.py

@@ -95,13 +95,15 @@ def LLM_Inference(cv_text):
     6. Language
     - List the languages mentioned in the CV along with proficiency levels (if specified).
-    Do not explain, comment or make up any more information that is not relative to the list of Information extraction. Respond in Vietnamese. Let's work this out in a step by step way to ensure the correct answer. [END].
+    Do not explain, comment or make up any more information that is not relative to the list of Information extraction. Respond in the CV language. Let's work this out in a step by step way to ensure the correct answer. Do not repeat the step
     '''
-    text = 'who is Lê Duẩn'
     inputs = tokenizer(text, return_tensors='pt', max_length=2048,truncation=True).to(device)
     with torch.no_grad():
         outputs = model.generate(
-            **inputs, max_new_tokens=1024, pad_token_id = tokenizer.eos_token_id
+            **inputs, max_new_tokens=1024, pad_token_id = tokenizer.eos_token_id,
+            top_p=0.99,                     # Nucleus sampling - only consider the top 99% probability mass
+            top_k=1,                      # Top-k sampling - keep only the single most likely token
+            temperature=0.0
         )
     return tokenizer.decode(outputs[0], skip_special_tokens=True)
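
Note on the added decoding arguments: in Hugging Face transformers, `top_p`, `top_k` and `temperature` only take effect when sampling is enabled (`do_sample=True`). This hunk does not pass `do_sample`, so unless the model's generation config turns sampling on, the call most likely still decodes greedily and these values are ignored. If sampling were enabled, `top_k=1` would already leave a single candidate token, so `top_p=0.99` would have nothing left to trim, and a temperature of exactly 0 is not a valid divisor for the logits. The sketch below illustrates that interaction in plain Python. It is a simplified illustration, not the transformers implementation, and `filter_logits` is a hypothetical helper name.

```python
import math


def _softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def filter_logits(logits, top_k=None, top_p=None, temperature=1.0):
    """Return {token_index: probability} for the tokens that survive filtering.

    Hypothetical helper: applies temperature, then top-k, then top-p,
    renormalising after each filter.
    """
    if temperature <= 0:
        # Dividing logits by zero is undefined, so a sampler cannot use 0.
        raise ValueError("temperature must be strictly positive when sampling")

    probs = _softmax([x / temperature for x in logits])

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        order = order[:top_k]

    # Renormalise over the tokens kept so far.
    mass = sum(probs[i] for i in order)
    kept = {i: probs[i] / mass for i in order}

    if top_p is not None:
        nucleus, cumulative = {}, 0.0
        for i in order:
            nucleus[i] = kept[i]
            cumulative += kept[i]
            if cumulative >= top_p:
                break
        mass = sum(nucleus.values())
        kept = {i: p / mass for i, p in nucleus.items()}

    return kept


# With top_k=1 only the arg-max token (index 1 here) survives,
# whatever value top_p has.
survivors = filter_logits([1.0, 3.0, 2.0], top_k=1, top_p=0.99)
print(survivors)
```

If deterministic extraction is the goal, leaving sampling off and dropping the three sampling arguments expresses the same intent more directly. If sampling is wanted, it would need `do_sample=True` and a small positive temperature.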