gtamkaleidoscope committed on
Commit 5e4ac86
Parent: 2ab23fa

Upload folder using huggingface_hub

hf-space-upload.py ADDED
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python
+ # coding: utf-8
+
+ # In[6]:
+
+
+ from huggingface_hub import HfApi
+ api = HfApi()
+
+ api.upload_folder(
+     folder_path="",
+     repo_id="kaleidoscope-data/data-cleaning-llm",
+     repo_type="space",
+ )
+
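The added script passes `folder_path=""` to `HfApi.upload_folder`, which huggingface_hub normalizes to the current working directory, so the result depends on where the script is launched. A minimal sketch of the same upload with the directory resolved explicitly, assuming the script runs from the Space's root and a write token is already configured. The helper `upload_space`, its `dry_run` flag, and the ignore pattern are hypothetical additions, not part of this commit; `commit_message` simply repeats the message shown on this commit.

```python
from pathlib import Path


def upload_space(folder: str = ".", dry_run: bool = False) -> dict:
    """Upload `folder` to the Space repo and return the arguments used.

    Hypothetical helper: with dry_run=True it only builds the arguments,
    so they can be inspected without network access or a token.
    """
    kwargs = {
        # Resolve the directory explicitly instead of relying on folder_path="".
        "folder_path": str(Path(folder).resolve()),
        "repo_id": "kaleidoscope-data/data-cleaning-llm",
        "repo_type": "space",
        "commit_message": "Upload folder using huggingface_hub",
        # Assumption: notebook checkpoint files should not be pushed.
        "ignore_patterns": [".ipynb_checkpoints/*"],
    }
    if not dry_run:
        # Imported lazily so a dry run works even without huggingface_hub installed.
        from huggingface_hub import HfApi

        HfApi().upload_folder(**kwargs)
    return kwargs


if __name__ == "__main__":
    print(upload_space(dry_run=True))
```

Building the arguments separately from the network call makes it easy to check which directory and repo a run would target before pushing anything.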
prompts/gpt4-system-message.txt CHANGED
@@ -1,81 +1,12 @@
- I am going to provide a data set of marijuana products and their metadata. Using the information I provide, I want you to provide me with the following information about the products.
+ I am going to provide marijuana product information. Using the information I provide, I want you to provide me with the following information about the product.
 
- sku
- product_name
- Brand (brand)
- product category (product_category)
- sub product category (sub_product_category)
- strain name (strain_name)
- the product’s weight in grams (product_weight_grams)
-
- The only acceptable values for product category are below. Only respond with a product category in the list below.
-
- Grow Products
- Concentrate
- Preroll
- Vape
- Edible
- Accessory
- Wellness
- Flower
-
- The only acceptable values for sub product category are below. Only respond with a sub product category in the list below.
-
- Cookies Dough
- Packwoods Blunt
- Promo/ Sample
- Natural Terp Series
- Capsule
- Mushroom Caps
- Beverage
- Cookies
- Live Flower Series
- Cured Resin
- Mint
- Napalm
- CBD Tincture/Caps/etc
- Liquid Flower
- Cookie Dough
- Badder
- 510 cart
- Gpen 0.5
- Blunt
- Shatter
- Solventless Rosin
- Diamonds
- Raw Garden
- Diamonds and Sauce
- Sugar
- Dry Flower Series
- Cubano
- Chocolate
- Flan
- Infused Blunt
- Terp Sauce
- Bud
- Disposable
- Gummies
- Infused Joint
- Dart Pod 0.5
- Rosin
- Joint
+ - Brand (brand)
+ - product category (product_category)
+ - sub product category (sub_product_category)
+ - strain name (strain_name)
 
  Additional requirements:
 
- Do not automatically assume that the information in the data set I provide is accurate.
- Break out the response into multiple messages if necessary, do not give me an incomplete response.
- Format the response in a csv codeblock
- Take note to convert units into grams when necessary.
- Product weights and strain names are only applicable for the following product categories: concentrate, preroll, vape, flower
- Only provide product weights and strain nam
- Break out the response into multiple messages if necessary, do not give an incomplete response.
- Give preference to the “Bud” sub product category instead of “Dry flower series” unless you are confident.
- Take note there are some products with multiple units, make sure to multiply the amount by the weight to calculate product weight.
- Look for clues in the product name to determine what brand/ product category/ sub product category/ and strain name the product should fall under. For Vape products, consider the words before 'Cartridge' or 'Cart' in the product name as potential strain names.
-
-
-
- Return clean dataset in csv format with the following columns
-
- product_name, brand, product_category, strain_name, product_weight_grams
+ - DO NOT EXPLAIN YOUR SELF
 
+ Product data below
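The revised prompt now describes a single product and ends with "Product data below", which suggests the product's metadata is appended to the prompt text before it is sent. The commit does not show that step, so the following is only a minimal sketch under that assumption: a product is taken to be a dict of metadata fields, and the helper `build_prompt` and the example field names are hypothetical.

```python
def build_prompt(system_text: str, product: dict) -> str:
    """Append one product's metadata after the 'Product data below' line.

    Hypothetical helper: the commit does not show how product data is attached.
    """
    # One "key: value" line per field keeps the record easy for the model to read.
    data = "\n".join(f"{key}: {value}" for key, value in product.items())
    # Strip trailing whitespace so the data starts directly on the next line.
    return f"{system_text.rstrip()}\n{data}\n"


if __name__ == "__main__":
    # Assumption: in practice the template would be read from
    # prompts/gpt4-system-message.txt; a shortened stand-in is used here.
    template = "Additional requirements:\n\n- DO NOT EXPLAIN YOUR SELF\n\nProduct data below\n"
    print(build_prompt(template, {"product_name": "Example Cart 1g", "sku": "EXAMPLE-1"}))
```

Sending one product per request matches the move from "a data set of marijuana products" to "marijuana product information", and it removes the need for the old "break out the response into multiple messages" instruction.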