---
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

### Inference Functions
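
The functions below assume that `model`, `tokenizer`, and `re` are already in scope. A minimal setup sketch, assuming the PEFT adapter in this repository is loaded on top of the base model (the adapter id below is a placeholder, since the repo id is not stated in this card):

        import re
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer
        from peft import PeftModel

        base_model_id = "meta-llama/Llama-2-7b-chat-hf"
        adapter_id = "<this-repo-id>"  # placeholder: replace with this adapter's Hub id

        tokenizer = AutoTokenizer.from_pretrained(base_model_id)
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model_id,
            torch_dtype=torch.float16,
            device_map="cuda:0",
        )
        model = PeftModel.from_pretrained(base_model, adapter_id)
        model.eval()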

#### For Keyword Brand


        def generate_brand(keyword):
            # Instruction and keyword markers used in the fine-tuning prompt format
            B_INST, E_INST = "[INST]", "[/INST]"
            B_KW, E_KW = "[KW]", "[/KW]"
            # Format the prompt template
            prompt = f"""{B_INST} Extract the brand from keyword related to brand loyalty intent.{E_INST}\n
            {B_KW} {keyword} {E_KW}
            """
            encoding = tokenizer(prompt, return_tensors="pt").to("cuda:0")
            output = model.generate(
                input_ids=encoding.input_ids,
                attention_mask=encoding.attention_mask,
                max_new_tokens=20,
                do_sample=True,
                temperature=0.01,  # near-greedy sampling for stable outputs
                eos_token_id=tokenizer.eos_token_id,
                top_k=0,
            )
            # Skip the prompt tokens so only the model's response is decoded
            output_text = tokenizer.decode(output[0, len(encoding.input_ids[0]):], skip_special_tokens=False)
            output_text = re.sub(r"\n+", "\n", output_text)  # collapse repeated newlines
            return output_text
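
A hypothetical call (the keyword is illustrative, not taken from the card's training data):

        brand = generate_brand("nike running shoes")
        print(brand)  # expected to contain the extracted brand, e.g. "nike"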

#### For Keyword Category

        def generate_cat(list_cat, keyword):
            # Instruction and keyword markers used in the fine-tuning prompt format
            B_INST, E_INST = "[INST]", "[/INST]"
            B_KW, E_KW = "[KW]", "[/KW]"
            # Format the prompt template
            prompt = f"""{B_INST} Analyze the following keyword searched on amazon with intent of shopping. Identify the product category from the list {list_cat} {E_INST}\n
            {B_KW} {keyword} {E_KW}
            """
            encoding = tokenizer(prompt, return_tensors="pt").to("cuda:0")
            output = model.generate(
                input_ids=encoding.input_ids,
                attention_mask=encoding.attention_mask,
                max_new_tokens=20,
                do_sample=True,
                temperature=0.01,  # near-greedy sampling for stable outputs
                eos_token_id=tokenizer.eos_token_id,
                top_k=0,
            )
            # Skip the prompt tokens so only the model's response is decoded
            output_text = tokenizer.decode(output[0, len(encoding.input_ids[0]):], skip_special_tokens=False)
            output_text = re.sub(r"\n+", "\n", output_text)  # collapse repeated newlines
            return output_text
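
A hypothetical call, with an illustrative category list:

        categories = ["shoes", "electronics", "grocery"]  # hypothetical list of product categories
        print(generate_cat(categories, "wireless earbuds"))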

#### For Keyword Category and Brand

        def generate_cat_brand(list_cat, keyword):  # renamed to avoid shadowing generate_cat above
            # Instruction and keyword markers used in the fine-tuning prompt format
            B_INST, E_INST = "[INST]", "[/INST]"
            B_KW, E_KW = "[KW]", "[/KW]"
            # Format the prompt template
            prompt = f"""{B_INST} Analyze the following keyword searched on amazon with intent of shopping. Identify the product category from the list {list_cat}.
            Extract the brand from keyword related to brand loyalty intent. Output in JSON with keyword, product category, brand as keys.{E_INST}\n
            {B_KW} {keyword} {E_KW}
            """
            encoding = tokenizer(prompt, return_tensors="pt").to("cuda:0")
            output = model.generate(
                input_ids=encoding.input_ids,
                attention_mask=encoding.attention_mask,
                max_new_tokens=20,  # may truncate the JSON object; raise if outputs are cut off
                do_sample=True,
                temperature=0.01,  # near-greedy sampling for stable outputs
                eos_token_id=tokenizer.eos_token_id,
                top_k=0,
            )
            # Skip the prompt tokens so only the model's response is decoded
            output_text = tokenizer.decode(output[0, len(encoding.input_ids[0]):], skip_special_tokens=False)
            output_text = re.sub(r"\n+", "\n", output_text)  # collapse repeated newlines
            return output_text
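
Because this variant is prompted to answer in JSON, the decoded text can be parsed; a sketch, assuming a well-formed object is emitted (special tokens such as the EOS marker may need stripping first, since decoding keeps them):

        import json

        raw = generate_cat_brand(categories, "nike running shoes")
        raw = raw.replace(tokenizer.eos_token, "").strip()  # drop trailing special tokens
        try:
            result = json.loads(raw)
        except json.JSONDecodeError:
            result = None  # output was truncated or not valid JSON
        print(result)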