add dolphin model to benchmark 2
Benchmark2/dolphin-2-2-1-mistral-7b.ipynb
ADDED
@@ -0,0 +1 @@
+
{"metadata":{"accelerator":"GPU","colab":{"gpuType":"T4","provenance":[]},"gpuClass":"standard","kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"nvidiaTeslaT4","dataSources":[{"sourceId":7571253,"sourceType":"datasetVersion","datasetId":4407676},{"sourceId":7678915,"sourceType":"datasetVersion","datasetId":4479814},{"sourceId":7713636,"sourceType":"datasetVersion","datasetId":4504654},{"sourceId":7964016,"sourceType":"datasetVersion","datasetId":4685329},{"sourceId":8017122,"sourceType":"datasetVersion","datasetId":4723613}],"dockerImageVersionId":30683,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":true}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"<h2><center><font color=#D40004><u>Benchmark 2: dolphin-2.2.1-mistral-7b </u></font></center></h2>\n","metadata":{}},{"cell_type":"markdown","source":"<div style=\"padding: 40px; background: linear-gradient(135deg, #f5f7fa, #cdd2d8); border: 3px groove #d1d8e0; border-radius: 30px; box-shadow: 0 10px 25px rgba(0,0,0,0.1); font-size: 120%; line-height: 1.9; color: #333; font-family: 'Georgia', serif; text-align: justify; position: relative;\">\n <h2 style=\"color: #2c3e50; font-size: 150%; border-bottom: 3px solid #3498db; display: inline-block; padding-bottom: 10px; margin-bottom: 20px;\">\n Notebook Goal\n </h2>\n <p style=\"font-size: 140%; color: #34495e; letter-spacing: 1px;\">\nThe objective of this notebook is to evaluate the performance of dolphin-2.2.1-mistral-7b and OpenHermes using the Table-Extraction benchmark dataset available at <a href=\"https://huggingface.co/datasets/Effyis/Table-Extraction\" style=\"color: #fffff; text-decoration: none;\">Hugging 
Face.</a></p>\n</div>\n","metadata":{}},{"cell_type":"markdown","source":"# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:center; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'></span></b> <b>Table of Contents</b></div>\n\n* [I. Loading and Importing Libraries](#1)\n* [II. Definition and Implementation of Metrics](#2)\n* [III. Clean Response Obtained by LLM](#3)\n* [IV. Data Preparation](#5)\n* [V. Benchmark](#6)\n * [Prompt](#61)\n * [dolphin-2.2.1-mistral-7b](#62)","metadata":{}},{"cell_type":"markdown","source":"<a id='1'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>I |</span></b> <b>Loading and Importing Libraries</b></div>\n","metadata":{}},{"cell_type":"code","source":"%%capture\n!pip install google-generativeai\n!pip install --upgrade pip\n!pip install bitsandbytes\n!pip install transformers","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:24:27.841516Z","iopub.execute_input":"2024-04-17T13:24:27.842129Z","iopub.status.idle":"2024-04-17T13:25:43.177008Z","shell.execute_reply.started":"2024-04-17T13:24:27.842064Z","shell.execute_reply":"2024-04-17T13:25:43.175675Z"},"trusted":true},"execution_count":1,"outputs":[]},{"cell_type":"code","source":"import re\nimport json\nfrom tqdm import tqdm\nimport pandas as pd\nfrom datasets import load_dataset, Dataset\nfrom wand.image import Image as WImage\nimport torch\nfrom transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\nimport time \nimport random\nimport numpy as 
np","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:25:43.179621Z","iopub.execute_input":"2024-04-17T13:25:43.180056Z","iopub.status.idle":"2024-04-17T13:25:52.333635Z","shell.execute_reply.started":"2024-04-17T13:25:43.180013Z","shell.execute_reply":"2024-04-17T13:25:52.332414Z"},"trusted":true},"execution_count":2,"outputs":[]},{"cell_type":"code","source":"import os\nimport google.generativeai as genai\n\n# Read the Gemini API key from an environment variable (GOOGLE_API_KEY is an assumed name) instead of hardcoding it\ngenai.configure(api_key=os.environ.get(\"GOOGLE_API_KEY\"))","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:25:52.335375Z","iopub.execute_input":"2024-04-17T13:25:52.335913Z","iopub.status.idle":"2024-04-17T13:25:53.003314Z","shell.execute_reply.started":"2024-04-17T13:25:52.335882Z","shell.execute_reply":"2024-04-17T13:25:53.002372Z"},"trusted":true},"execution_count":3,"outputs":[]},{"cell_type":"code","source":"# Set random seed for reproducibility\nrandom.seed(42)\nnp.random.seed(42)\ntorch.manual_seed(42)\ntorch.cuda.manual_seed_all(42)\ntorch.backends.cudnn.deterministic = True\ntorch.backends.cudnn.benchmark = False","metadata":{"execution":{"iopub.status.busy":"2024-04-17T11:25:06.673288Z","iopub.execute_input":"2024-04-17T11:25:06.674130Z","iopub.status.idle":"2024-04-17T11:25:06.681886Z","shell.execute_reply.started":"2024-04-17T11:25:06.674097Z","shell.execute_reply":"2024-04-17T11:25:06.681086Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"<a id='2'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>II |</span></b> <b>Definition and Implementation of Metrics</b></div>\nLet's begin with an example of the desired output.","metadata":{}},{"cell_type":"code","source":"desired_output = [{'aircraft': 'robinson r - 22',\n 'description': 'light utility helicopter',\n 'max gross weight': '1370 lb (635 kg)',\n 'total disk area': '497 ft square 
(46.2 m square)',\n 'max disk loading': '2.6 lb / ft square (14 kg / m square)'},\n {'aircraft': 'bell 206b3 jetranger',\n 'description': 'turboshaft utility helicopter',\n 'max gross weight': '3200 lb (1451 kg)',\n 'total disk area': '872 ft square (81.1 m square)',\n 'max disk loading': '3.7 lb / ft square (18 kg / m square)'},\n {'aircraft': 'ch - 47d chinook',\n 'description': 'tandem rotor helicopter',\n 'max gross weight': '50000 lb (22680 kg)',\n 'total disk area': '5655 ft square (526 m square)',\n 'max disk loading': '8.8 lb / ft square (43 kg / m square)'},\n {'aircraft': 'mil mi - 26',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '123500 lb (56000 kg)',\n 'total disk area': '8495 ft square (789 m square)',\n 'max disk loading': '14.5 lb / ft square (71 kg / m square)'},\n {'aircraft': 'ch - 53e super stallion',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '73500 lb (33300 kg)',\n 'total disk area': '4900 ft square (460 m square)',\n 'max disk loading': '15 lb / ft square (72 kg / m square)'}]\n","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:10.848038Z","iopub.execute_input":"2024-04-17T13:26:10.849306Z","iopub.status.idle":"2024-04-17T13:26:10.856880Z","shell.execute_reply.started":"2024-04-17T13:26:10.849271Z","shell.execute_reply":"2024-04-17T13:26:10.855830Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"To compare the expected list of records with the predicted one, we first measure the percentage of the desired keys that appear in the prediction.","metadata":{}},{"cell_type":"markdown","source":">## Percentage of predicted keys","metadata":{}},{"cell_type":"markdown","source":"Let's begin by defining a function that retrieves all keys of a record, including the keys of any nested dictionaries.","metadata":{}},{"cell_type":"code","source":"def get_keys(d):\n # Iterate over each key-value pair in the dictionary\n for k, v in d.items():\n # Append the key to the list of all_keys\n 
all_keys.append(k)\n # If the value is a dictionary, recursively call get_keys\n if isinstance(v, dict):\n get_keys(v)\n # If the value is a list, iterate over each item\n elif isinstance(v, list):\n for item in v:\n # If the item is a dictionary, recursively call get_keys\n if isinstance(item, dict):\n get_keys(item)\n# Define a function to retrieve all unique keys from a nested dictionary\ndef get_all_keys(d):\n # Declare all_keys as a global variable\n global all_keys\n # Initialize all_keys as an empty list\n all_keys = []\n # Call the helper function get_keys to populate all_keys\n get_keys(d)\n # Return a list containing the unique keys by converting all_keys to a set and then back to a list\n return list(set(all_keys))","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:13.106092Z","iopub.execute_input":"2024-04-17T13:26:13.106999Z","iopub.status.idle":"2024-04-17T13:26:13.117009Z","shell.execute_reply.started":"2024-04-17T13:26:13.106957Z","shell.execute_reply":"2024-04-17T13:26:13.115623Z"},"trusted":true},"execution_count":5,"outputs":[]},{"cell_type":"code","source":"# Testing our function\nget_all_keys(desired_output[0])","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:15.883427Z","iopub.execute_input":"2024-04-17T13:26:15.883817Z","iopub.status.idle":"2024-04-17T13:26:15.891542Z","shell.execute_reply.started":"2024-04-17T13:26:15.883783Z","shell.execute_reply":"2024-04-17T13:26:15.890415Z"},"trusted":true},"execution_count":6,"outputs":[{"execution_count":6,"output_type":"execute_result","data":{"text/plain":"['max gross weight',\n 'aircraft',\n 'description',\n 'total disk area',\n 'max disk loading']"},"metadata":{}}]},{"cell_type":"markdown","source":"Now, we define the percentage of predicted keys as follows:\n\n$$\\Large \\text{Percentage of predicted keys} = \\frac{\\text{Number of correctly predicted keys}}{\\text{Total number of true keys}}$$\nThis percentage is calculated for every record in the list, then summed and 
divided by the number of records in the list.","metadata":{}},{"cell_type":"code","source":"def process_dict(data):\n if isinstance(data, dict):\n for key, value in data.items():\n if isinstance(value, str):\n data[key] = value.strip().lower()\n elif isinstance(value, list):\n data[key] = [process_dict(item) for item in value]\n elif isinstance(value, dict):\n data[key] = process_dict(value)\n return data","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:18.834048Z","iopub.execute_input":"2024-04-17T13:26:18.834988Z","iopub.status.idle":"2024-04-17T13:26:18.841331Z","shell.execute_reply.started":"2024-04-17T13:26:18.834951Z","shell.execute_reply":"2024-04-17T13:26:18.840255Z"},"trusted":true},"execution_count":7,"outputs":[]},{"cell_type":"code","source":"def percentage_of_predicted_keys(true_dic, pred_dic):\n true_dic=process_dict(true_dic)\n pred_dic=process_dict(pred_dic)\n # Get all keys of the true dictionary\n all_keys_of_true_dic = get_all_keys(true_dic)\n # Get all keys of the predicted dictionary\n all_keys_of_pred_dic = get_all_keys(pred_dic)\n \n # Check if there are no keys in the true dictionary to avoid division by zero\n if len(all_keys_of_true_dic) == 0:\n return 0 # Avoid division by zero\n \n # Initialize count of predicted keys\n p_keys = 0\n # Iterate through all keys in the predicted dictionary\n for key in all_keys_of_pred_dic:\n # Check if the key is also present in the true dictionary\n if key in all_keys_of_true_dic:\n # Increment count if the key is found in both dictionaries\n p_keys += 1\n \n # Calculate the percentage of predicted keys compared to true keys\n p_keys /= len(all_keys_of_true_dic)\n # Return the percentage of predicted keys\n return 
p_keys","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:20.390744Z","iopub.execute_input":"2024-04-17T13:26:20.391436Z","iopub.status.idle":"2024-04-17T13:26:20.398265Z","shell.execute_reply.started":"2024-04-17T13:26:20.391403Z","shell.execute_reply":"2024-04-17T13:26:20.397237Z"},"trusted":true},"execution_count":8,"outputs":[]},{"cell_type":"code","source":"def average_percentage_key(true_list, pred_list):\n min_length = min(len(true_list), len(pred_list)) # Find the minimum length of the two lists\n score = 0\n for i in range(min_length):\n score += percentage_of_predicted_keys(true_list[i], pred_list[i])\n return score / len(true_list)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:22.657056Z","iopub.execute_input":"2024-04-17T13:26:22.657871Z","iopub.status.idle":"2024-04-17T13:26:22.663217Z","shell.execute_reply.started":"2024-04-17T13:26:22.657838Z","shell.execute_reply":"2024-04-17T13:26:22.662241Z"},"trusted":true},"execution_count":9,"outputs":[]},{"cell_type":"code","source":"# Example true and predicted lists\ntrue_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 6}, {'key1': 7, 'key2': 8, 'key3': 9}]\npred_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 7}, {'key1': 7, 'key2': 8, 'key3': 9}]\n\n# Test the function\nresult = average_percentage_key(true_list, pred_list)\nprint(\"Average percentage of keys:\", result)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:26:23.851951Z","iopub.execute_input":"2024-04-17T13:26:23.852443Z","iopub.status.idle":"2024-04-17T13:26:23.859938Z","shell.execute_reply.started":"2024-04-17T13:26:23.852412Z","shell.execute_reply":"2024-04-17T13:26:23.858744Z"},"trusted":true},"execution_count":10,"outputs":[{"name":"stdout","text":"Average percentage of keys: 1.0\n","output_type":"stream"}]},{"cell_type":"markdown","source":"Next, we define the main metric, which compares the values of two lists of 
records.","metadata":{}},{"cell_type":"markdown","source":">## Percentage of predicted values\n\nThe function below compares each true record with the predicted record at the same position and computes the fraction of true values that the prediction reproduces exactly (both values are converted to strings before the comparison).\n\nThe average percentage of values is computed as follows:\n\n$$\n\\text{Average percentage of values} = \\frac{\\sum_{i=1}^{\\text{Total number of records}} p_i }{\\text{Total number of records}}\n$$\n\nHere, $p_i$ is the percentage of correctly predicted values for record $i$ (records missing from the prediction count as $p_i = 0$):\n\n$$p_i = \\frac{\\text{Number of correctly predicted values of record } i}{\\text{Total number of true values of record } i}$$","metadata":{}},{"cell_type":"code","source":"def calculate_percentage_of_values(true_dic, pred_dic):\n # Avoid division by zero when the true record is empty\n if len(true_dic) == 0:\n return 0\n total_percentage = 0 # Number of correctly predicted values\n for key, true_value in true_dic.items(): # Loop through key-value pairs in true_dic\n \n # A value is correct only if the key exists in pred_dic and both values are equal once converted to strings\n if key in pred_dic and str(pred_dic[key]) == str(true_value):\n match = 1 # Assign perfect match\n else:\n match = 0 # Assign no match\n total_percentage += match\n return total_percentage / len(true_dic) # Fraction of true values that were predicted correctly","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:07.067268Z","iopub.execute_input":"2024-04-17T13:27:07.068049Z","iopub.status.idle":"2024-04-17T13:27:07.074330Z","shell.execute_reply.started":"2024-04-17T13:27:07.068016Z","shell.execute_reply":"2024-04-17T13:27:07.073145Z"},"trusted":true},"execution_count":11,"outputs":[]},{"cell_type":"code","source":"def average_percentage_value(true_list, pred_list):\n min_length = min(len(true_list), len(pred_list)) # Find the minimum length of the two 
lists\n score = 0\n for i in range(min_length):\n score += calculate_percentage_of_values(true_list[i], pred_list[i])\n return score / 
len(true_list)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:09.111778Z","iopub.execute_input":"2024-04-17T13:27:09.112434Z","iopub.status.idle":"2024-04-17T13:27:09.118149Z","shell.execute_reply.started":"2024-04-17T13:27:09.112400Z","shell.execute_reply":"2024-04-17T13:27:09.117126Z"},"trusted":true},"execution_count":12,"outputs":[]},{"cell_type":"code","source":"# Example true and predicted lists\ntrue_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 6}, {'key1': 7, 'key2': 8, 'key3': 9}]\npred_list = [{'key1': 1, 'key2': 2, 'key3': 3}, {'key1': 4, 'key2': 5, 'key3': 7}, {'key1': 7, 'key2': 8, 'key3': 9}]\n\n# Test the function\nresult = average_percentage_value(true_list, pred_list)\nprint(\"Average percentage of values:\", result)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:10.867077Z","iopub.execute_input":"2024-04-17T13:27:10.867514Z","iopub.status.idle":"2024-04-17T13:27:10.874729Z","shell.execute_reply.started":"2024-04-17T13:27:10.867480Z","shell.execute_reply":"2024-04-17T13:27:10.873519Z"},"trusted":true},"execution_count":13,"outputs":[{"name":"stdout","text":"Average percentage of values: 0.8888888888888888\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='3'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>III |</span></b> <b>Clean Response Obtained by LLM</b></div>\n","metadata":{}},{"cell_type":"code","source":"import json\n\ndef parse_json(data_str):\n # Keep only the span from the first '{' to the last '}' and wrap it in brackets so it parses as a list of records\n i = data_str.find('{')\n j = data_str.rfind('}')\n data_str = '['+data_str[i:j+1]+']'\n data_str = data_str.strip()\n\n # Leftover safeguard: the slicing above already drops any surrounding \"```json\" and \"```\" fence\n if data_str.startswith(\"```json\"):\n # Remove the leading/trailing \"```json\" and \"```\"\n data_str = 
data_str[len(\"```json\"):]\n if data_str.endswith(\"```\"):\n data_str = data_str[:-len(\"```\")]\n \n try:\n # Parse JSON\n data = json.loads(data_str)\n return data\n except json.JSONDecodeError as e:\n print(\"JSON parsing error:\", e)\n return None","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:13.246599Z","iopub.execute_input":"2024-04-17T13:27:13.247416Z","iopub.status.idle":"2024-04-17T13:27:13.254997Z","shell.execute_reply.started":"2024-04-17T13:27:13.247380Z","shell.execute_reply":"2024-04-17T13:27:13.253820Z"},"trusted":true},"execution_count":14,"outputs":[]},{"cell_type":"code","source":"response_str = \"\"\"[{\"aircraft\": \"robinson r - 22\",\n \"description\": \"light utility helicopter\",\n \"max gross weight\": \"1370 lb (635 kg)\",\n \"total disk area\": \"497 ft square (46.2 m square)\",\n \"max disk loading\": \"2.6 lb / ft square (14 kg / m square)\"},\n{\"aircraft\": \"bell 206b3 jetranger\",\n \"description\": \"turboshaft utility helicopter\",\n \"max gross weight\": \"3200 lb (1451 kg)\",\n \"total disk area\": \"872 ft square (81.1 m square)\",\n \"max disk loading\": \"3.7 lb / ft square (18 kg / m square)\"},\n{\"aircraft\": \"ch - 47d chinook\",\n \"description\": \"tandem rotor helicopter\",\n \"max gross weight\": \"50000 lb (22680 kg)\",\n \"total disk area\": \"5655 ft square (526 m square)\",\n \"max disk loading\": \"8.8 lb / ft square (43 kg / m square)\"},\n{\"aircraft\": \"mil mi - 26\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"123500 lb (56000 kg)\",\n \"total disk area\": \"8495 ft square (789 m square)\",\n \"max disk loading\": \"14.5 lb / ft square (71 kg / m square)\"},\n{\"aircraft\": \"ch - 53e super stallion\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"73500 lb (33300 kg)\",\n \"total disk area\": \"4900 ft square (460 m square)\",\n \"max disk loading\": \"15 lb / ft square (72 kg / m square)\"}]\"\"\"\n\n# Convert the string 
representation to a list of dictionaries\nparse_json(response_str)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:15.047758Z","iopub.execute_input":"2024-04-17T13:27:15.048585Z","iopub.status.idle":"2024-04-17T13:27:15.057956Z","shell.execute_reply.started":"2024-04-17T13:27:15.048551Z","shell.execute_reply":"2024-04-17T13:27:15.056852Z"},"trusted":true},"execution_count":15,"outputs":[{"execution_count":15,"output_type":"execute_result","data":{"text/plain":"[{'aircraft': 'robinson r - 22',\n 'description': 'light utility helicopter',\n 'max gross weight': '1370 lb (635 kg)',\n 'total disk area': '497 ft square (46.2 m square)',\n 'max disk loading': '2.6 lb / ft square (14 kg / m square)'},\n {'aircraft': 'bell 206b3 jetranger',\n 'description': 'turboshaft utility helicopter',\n 'max gross weight': '3200 lb (1451 kg)',\n 'total disk area': '872 ft square (81.1 m square)',\n 'max disk loading': '3.7 lb / ft square (18 kg / m square)'},\n {'aircraft': 'ch - 47d chinook',\n 'description': 'tandem rotor helicopter',\n 'max gross weight': '50000 lb (22680 kg)',\n 'total disk area': '5655 ft square (526 m square)',\n 'max disk loading': '8.8 lb / ft square (43 kg / m square)'},\n {'aircraft': 'mil mi - 26',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '123500 lb (56000 kg)',\n 'total disk area': '8495 ft square (789 m square)',\n 'max disk loading': '14.5 lb / ft square (71 kg / m square)'},\n {'aircraft': 'ch - 53e super stallion',\n 'description': 'heavy - lift helicopter',\n 'max gross weight': '73500 lb (33300 kg)',\n 'total disk area': '4900 ft square (460 m square)',\n 'max disk loading': '15 lb / ft square (72 kg / m square)'}]"},"metadata":{}}]},{"cell_type":"markdown","source":"<a id='5'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>IV |</span></b> <b> Data 
Preparation</b></div>\n","metadata":{}},{"cell_type":"markdown","source":"I'll remove the tables whose context contains Arabic characters, keep the remaining rows with index labels 0 to 50 (at most 51 tables), and then convert each answer into a list of records that matches the expected output format.","metadata":{}},{"cell_type":"code","source":"df = pd.read_csv(\"/kaggle/input/table-extraction/table_extract.csv\")\ndf.head(5)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:27.993072Z","iopub.execute_input":"2024-04-17T13:27:27.993995Z","iopub.status.idle":"2024-04-17T13:27:29.765433Z","shell.execute_reply.started":"2024-04-17T13:27:27.993958Z","shell.execute_reply":"2024-04-17T13:27:29.764362Z"},"trusted":true},"execution_count":16,"outputs":[{"execution_count":16,"output_type":"execute_result","data":{"text/plain":" context \\\n0 aircraft ... \n1 order year manufacturer mod... \n2 player no nationality ... \n3 player no nationali... \n4 player no nationality ... \n\n answer \n0 {\"aircraft\":{\"0\":\"robinson r - 22\",\"1\":\"bell 2... \n1 {\"order year\":{\"0\":\"1992 - 93\",\"1\":\"1996\",\"2\":... \n2 {\"player\":{\"0\":\"quincy acy\",\"1\":\"hassan adams\"... \n3 {\"player\":{\"0\":\"patrick o'bryant\",\"1\":\"jermain... \n4 {\"player\":{\"0\":\"mark baker\",\"1\":\"marcus banks\"... 
","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>context</th>\n <th>answer</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>aircraft ...</td>\n <td>{\"aircraft\":{\"0\":\"robinson r - 22\",\"1\":\"bell 2...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>order year manufacturer mod...</td>\n <td>{\"order year\":{\"0\":\"1992 - 93\",\"1\":\"1996\",\"2\":...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>player no nationality ...</td>\n <td>{\"player\":{\"0\":\"quincy acy\",\"1\":\"hassan adams\"...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>player no nationali...</td>\n <td>{\"player\":{\"0\":\"patrick o'bryant\",\"1\":\"jermain...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>player no nationality ...</td>\n <td>{\"player\":{\"0\":\"mark baker\",\"1\":\"marcus banks\"...</td>\n </tr>\n </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"def is_arabic_name(name):\n \"\"\"\n Checks if a name contains Arabic characters.\n\n Args:\n name: The name string to check.\n\n Returns:\n True if Arabic characters are found, False otherwise.\n \"\"\"\n # Regular expression to match Arabic characters\n arabic_pattern = re.compile(\"[\\u0600-\\u06FF]+\")\n\n # Search for Arabic characters in the name\n match = arabic_pattern.search(name)\n\n # Return True if a match is found, False otherwise\n return bool(match)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:30.016345Z","iopub.execute_input":"2024-04-17T13:27:30.017012Z","iopub.status.idle":"2024-04-17T13:27:30.022491Z","shell.execute_reply.started":"2024-04-17T13:27:30.016979Z","shell.execute_reply":"2024-04-17T13:27:30.021413Z"},"trusted":true},"execution_count":17,"outputs":[]},{"cell_type":"code","source":"df = 
df[~df['context'].apply(lambda x: is_arabic_name(x))]","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:36.964897Z","iopub.execute_input":"2024-04-17T13:27:36.965801Z","iopub.status.idle":"2024-04-17T13:27:37.979661Z","shell.execute_reply.started":"2024-04-17T13:27:36.965766Z","shell.execute_reply":"2024-04-17T13:27:37.978773Z"},"trusted":true},"execution_count":18,"outputs":[]},{"cell_type":"code","source":"df_sample =df.loc[:50]","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:38.656780Z","iopub.execute_input":"2024-04-17T13:27:38.657148Z","iopub.status.idle":"2024-04-17T13:27:38.664302Z","shell.execute_reply.started":"2024-04-17T13:27:38.657120Z","shell.execute_reply":"2024-04-17T13:27:38.663227Z"},"trusted":true},"execution_count":19,"outputs":[]},{"cell_type":"code","source":"def transform_json_to_records(json_data):\n \"\"\"\n Transforms a structured JSON object into a list of records.\n\n The function assumes the structure of the JSON object is a dictionary of dictionaries,\n where each top-level key is a field name, and its value is a dictionary mapping indices\n to field values. 
All sub-dictionaries must have the same keys.\n\n Parameters:\n - json_data: A JSON string encoding the structured object to transform (it is parsed with json.loads).\n\n Returns:\n - A list of dictionaries, where each dictionary represents a record with fields and values\n derived from the input JSON.\n \"\"\"\n json_data = json.loads(json_data)\n # Extract keys from the first dictionary item to use as indices\n indices = list(next(iter(json_data.values())).keys())\n # Initialize the list to store transformed records\n records = []\n\n # Loop over each index to create a record\n for index in indices:\n record = {field: values[index] for field, values in json_data.items()}\n records.append(record)\n\n return records","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:43.963685Z","iopub.execute_input":"2024-04-17T13:27:43.964431Z","iopub.status.idle":"2024-04-17T13:27:43.971661Z","shell.execute_reply.started":"2024-04-17T13:27:43.964395Z","shell.execute_reply":"2024-04-17T13:27:43.970462Z"},"trusted":true},"execution_count":20,"outputs":[]},{"cell_type":"code","source":"df_sample.loc[:, 'answer'] = df_sample['answer'].map(transform_json_to_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:45.817229Z","iopub.execute_input":"2024-04-17T13:27:45.817713Z","iopub.status.idle":"2024-04-17T13:27:45.828486Z","shell.execute_reply.started":"2024-04-17T13:27:45.817673Z","shell.execute_reply":"2024-04-17T13:27:45.827241Z"},"trusted":true},"execution_count":21,"outputs":[]},{"cell_type":"code","source":"df_sample.head()","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:47.330348Z","iopub.execute_input":"2024-04-17T13:27:47.331245Z","iopub.status.idle":"2024-04-17T13:27:47.371504Z","shell.execute_reply.started":"2024-04-17T13:27:47.331187Z","shell.execute_reply":"2024-04-17T13:27:47.370353Z"},"trusted":true},"execution_count":22,"outputs":[{"execution_count":22,"output_type":"execute_result","data":{"text/plain":" context \\\n0 aircraft ... 
\n1 order year manufacturer mod... \n2 player no nationality ... \n3 player no nationali... \n4 player no nationality ... \n\n answer \n0 [{'aircraft': 'robinson r - 22', 'description'... \n1 [{'order year': '1992 - 93', 'manufacturer': '... \n2 [{'player': 'quincy acy', 'no': '4', 'national... \n3 [{'player': 'patrick o'bryant', 'no': 13, 'nat... \n4 [{'player': 'mark baker', 'no': '3', 'national... ","text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>context</th>\n <th>answer</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>aircraft ...</td>\n <td>[{'aircraft': 'robinson r - 22', 'description'...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>order year manufacturer mod...</td>\n <td>[{'order year': '1992 - 93', 'manufacturer': '...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>player no nationality ...</td>\n <td>[{'player': 'quincy acy', 'no': '4', 'national...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>player no nationali...</td>\n <td>[{'player': 'patrick o'bryant', 'no': 13, 'nat...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>player no nationality ...</td>\n <td>[{'player': 'mark baker', 'no': '3', 'national...</td>\n </tr>\n </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"<a id='6'></a>\n# <div style=\"padding: 30px; color:white; margin:10; font-size:75%; text-align:left; display:fill; border-radius:10px; background-color:#3b3745\"><b><span style='color:#F1A424'>V |</span></b> <b>Benchmark</b></div>\n","metadata":{}},{"cell_type":"markdown","source":"<a id='61'></a>\n>## Prompt","metadata":{}},{"cell_type":"code","source":"prompt = \"\"\"Your task is to extract relevant information from the provided context and format it into a list of records, following the template 
below.\n A JSON object representing the extracted table structure. The list of records follows this format: \n [ { \"column_1\": \"val1\",\"column_2\": \"val1\",\"column_3\": \"val1\",...},\n { \"column_1\": \"val2\",\"column_2\": \"val2\",\"column_3\": \"val3\",...},\n ...\n ]\n Each key in the records represents a column header, and the corresponding value is another object containing key-value pairs for each row in that column.\n\nINPUT example:\n# do not use the data from the examples & template; they are just for reference only. The following data contains actual information. If a value is not found, leave it empty. \n\n aircraft description max gross weight total disk area max disk loading\n0 robinson r - 22 light utility helicopter 1370 lb (635 kg) 497 ft square (46.2 m square) 2.6 lb / ft square (14 kg / m square)\n1 bell 206b3 jetranger turboshaft utility helicopter 3200 lb (1451 kg) 872 ft square (81.1 m square) 3.7 lb / ft square (18 kg / m square)\n2 ch - 47d chinook tandem rotor helicopter 50000 lb (22680 kg) 5655 ft square (526 m square) 8.8 lb / ft square (43 kg / m square)\n3 mil mi - 26 heavy - lift helicopter 123500 lb (56000 kg) 8495 ft square (789 m square) 14.5 lb / ft square (71 kg / m square)\n4 ch - 53e super stallion heavy - lift helicopter 73500 lb (33300 kg) 4900 ft square (460 m square) 15 lb / ft square (72 kg / m square)\n\nOUTPUT example:\n# do not use the data from the examples & template; they are just for reference only. The following data contains actual information. If a value is not found, leave it empty. 
\n[{\"aircraft\": \"robinson r - 22\",\n \"description\": \"light utility helicopter\",\n \"max gross weight\": \"1370 lb (635 kg)\",\n \"total disk area\": \"497 ft square (46.2 m square)\",\n \"max disk loading\": \"2.6 lb / ft square (14 kg / m square)\"},\n{\"aircraft\": \"bell 206b3 jetranger\",\n \"description\": \"turboshaft utility helicopter\",\n \"max gross weight\": \"3200 lb (1451 kg)\",\n \"total disk area\": \"872 ft square (81.1 m square)\",\n \"max disk loading\": \"3.7 lb / ft square (18 kg / m square)\"},\n{\"aircraft\": \"ch - 47d chinook\",\n \"description\": \"tandem rotor helicopter\",\n \"max gross weight\": \"50000 lb (22680 kg)\",\n \"total disk area\": \"5655 ft square (526 m square)\",\n \"max disk loading\": \"8.8 lb / ft square (43 kg / m square)\"},\n{\"aircraft\": \"mil mi - 26\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"123500 lb (56000 kg)\",\n \"total disk area\": \"8495 ft square (789 m square)\",\n \"max disk loading\": \"14.5 lb / ft square (71 kg / m square)\"},\n{\"aircraft\": \"ch - 53e super stallion\",\n \"description\": \"heavy - lift helicopter\",\n \"max gross weight\": \"73500 lb (33300 kg)\",\n \"total disk area\": \"4900 ft square (460 m square)\",\n \"max disk loading\": \"15 lb / ft square (72 kg / m square)\"}]\n\"\"\"","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:27:54.601480Z","iopub.execute_input":"2024-04-17T13:27:54.602278Z","iopub.status.idle":"2024-04-17T13:27:54.609988Z","shell.execute_reply.started":"2024-04-17T13:27:54.602243Z","shell.execute_reply":"2024-04-17T13:27:54.608951Z"},"trusted":true},"execution_count":23,"outputs":[]},{"cell_type":"markdown","source":"<a id='62'></a>\n>## dolphin-2.2.1-mistral-7b","metadata":{}},{"cell_type":"code","source":"base_model_id = \"cognitivecomputations/dolphin-2.2.1-mistral-7b\"\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n 
bnb_4bit_compute_dtype=torch.bfloat16,\n #weights=\"int8\"\n)\n\nmodel = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map=\"auto\",trust_remote_code=True)\ntokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=False,device_map=\"auto\")","metadata":{"execution":{"iopub.status.busy":"2024-04-17T09:11:30.976151Z","iopub.execute_input":"2024-04-17T09:11:30.976388Z","iopub.status.idle":"2024-04-17T09:14:10.696683Z","shell.execute_reply.started":"2024-04-17T09:11:30.976368Z","shell.execute_reply":"2024-04-17T09:14:10.695768Z"},"trusted":true},"execution_count":41,"outputs":[{"output_type":"display_data","data":{"text/plain":"config.json: 0%| | 0.00/618 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"e8364d4413404adb918df6bae43181ba"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model.safetensors.index.json: 0%| | 0.00/25.1k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"eb00cdf3a3644dd4865f18a4bf247eb8"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Downloading shards: 0%| | 0/2 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7ec81c513b82444f9bd042684cb06b84"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00001-of-00002.safetensors: 0%| | 0.00/9.94G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"f64fc59cbdc94d6284647d2874852a75"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00002-of-00002.safetensors: 0%| | 0.00/4.54G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"08d02250244a47bf980d07a31b7581bf"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Loading checkpoint shards: 0%| | 0/2 
[00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7ff8a2a9a6ed45fe907f2c5396dd39d6"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"generation_config.json: 0%| | 0.00/114 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7d32ba2acdec48938e0c5ba1e66a116a"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer_config.json: 0%| | 0.00/1.70k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"92dbf7de21a74f89aa4fff93290b9df5"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.model: 0%| | 0.00/493k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7bb57b38dc1a46718bcc5c37ccd8c5fb"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"added_tokens.json: 0%| | 0.00/51.0 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"610c5aa842f64dbcae03d151a2fe65ca"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"special_tokens_map.json: 0%| | 0.00/443 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"4cb31a6a80b84f80b46212a167ce1e01"}},"metadata":{}},{"name":"stderr","text":"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n","output_type":"stream"}]},{"cell_type":"code","source":"# Create a copy of the DataFrame\ndf_copy1 = df_sample.copy()\ndf_copy1['pred_response'] = None\n\n# Iterate through each row in the DataFrame with tqdm for progress visualization\nfor i in tqdm(df_copy1.index, desc=\"Generating Predictions\", total=len(df_copy1)):\n \n template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{df_copy1.loc[i,'context']}\\nResponse:\\n\"\n inputs = 
tokenizer(template, return_tensors=\"pt\").to(model.device) \n outputs = model.generate(**inputs, use_cache=True,max_length=4096)\n output_text = tokenizer.decode(outputs[0]) \n df_copy1.loc[i,'pred_response'] = output_text.replace(template,'')","metadata":{"execution":{"iopub.status.busy":"2024-04-17T09:14:10.698058Z","iopub.execute_input":"2024-04-17T09:14:10.698347Z","iopub.status.idle":"2024-04-17T10:29:50.635005Z","shell.execute_reply.started":"2024-04-17T09:14:10.698322Z","shell.execute_reply":"2024-04-17T10:29:50.634000Z"},"trusted":true},"execution_count":42,"outputs":[{"name":"stderr","text":"Generating Predictions: 0%| | 0/50 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\n2024-04-17 09:14:22.979943: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2024-04-17 09:14:22.980097: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2024-04-17 09:14:23.233873: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\nGenerating Predictions: 2%|β | 1/50 [04:13<3:26:49, 253.25s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 4%|β | 2/50 [05:18<1:54:06, 142.63s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 6%|β | 3/50 [06:29<1:26:03, 109.85s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 8%|β | 4/50 [07:05<1:01:54, 80.75s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 10%|β | 5/50 [08:23<59:50, 79.78s/it] Setting `pad_token_id` 
to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 12%|ββ | 6/50 [10:07<1:04:40, 88.19s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 14%|ββ | 7/50 [11:23<1:00:13, 84.04s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 16%|ββ | 8/50 [12:04<49:10, 70.24s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 18%|ββ | 9/50 [14:46<1:07:35, 98.91s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 20%|ββ | 10/50 [16:10<1:02:52, 94.30s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 22%|βββ | 11/50 [18:10<1:06:28, 102.26s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 24%|βββ | 12/50 [19:28<1:00:07, 94.94s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 26%|βββ | 13/50 [20:06<47:49, 77.55s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 28%|βββ | 14/50 [20:58<42:01, 70.03s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 30%|βββ | 15/50 [21:39<35:38, 61.09s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 32%|ββββ | 16/50 [22:37<34:03, 60.11s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 34%|ββββ | 17/50 [23:16<29:38, 53.89s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 36%|ββββ | 18/50 [23:50<25:35, 47.99s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 38%|ββββ | 19/50 [24:47<26:07, 50.56s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating 
Predictions: 40%|ββββ | 20/50 [27:35<42:52, 85.75s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 42%|βββββ | 21/50 [29:45<47:52, 99.06s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 44%|βββββ | 22/50 [31:24<46:18, 99.23s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 46%|βββββ | 23/50 [32:12<37:39, 83.67s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 48%|βββββ | 24/50 [34:46<45:24, 104.78s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 50%|βββββ | 25/50 [37:25<50:31, 121.25s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 52%|ββββββ | 26/50 [40:05<53:06, 132.77s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 54%|ββββββ | 27/50 [42:40<53:28, 139.50s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 56%|ββββββ | 28/50 [43:52<43:45, 119.32s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 58%|ββββββ | 29/50 [45:45<41:04, 117.36s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 60%|ββββββ | 30/50 [46:52<34:01, 102.07s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 62%|βββββββ | 31/50 [48:19<30:56, 97.71s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 64%|βββββββ | 32/50 [50:08<30:18, 101.02s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 66%|βββββββ | 33/50 [50:42<22:57, 81.05s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 68%|βββββββ | 34/50 
[51:31<19:03, 71.47s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 70%|βββββββ | 35/50 [52:12<15:33, 62.21s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 72%|ββββββββ | 36/50 [52:56<13:15, 56.85s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 74%|ββββββββ | 37/50 [53:32<10:54, 50.34s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 76%|ββββββββ | 38/50 [54:29<10:31, 52.58s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 78%|ββββββββ | 39/50 [56:52<14:34, 79.47s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 80%|ββββββββ | 40/50 [1:00:54<21:23, 128.39s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 82%|βββββββββ | 41/50 [1:01:44<15:43, 104.89s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 84%|βββββββββ | 42/50 [1:02:40<12:00, 90.12s/it] Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 86%|βββββββββ | 43/50 [1:04:07<10:25, 89.30s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 88%|βββββββββ | 44/50 [1:05:11<08:09, 81.55s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 90%|βββββββββ | 45/50 [1:06:38<06:56, 83.38s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 92%|ββββββββββ| 46/50 [1:08:06<05:38, 84.73s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 94%|ββββββββββ| 47/50 [1:09:25<04:09, 83.05s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 
96%|ββββββββββ| 48/50 [1:13:12<04:12, 126.16s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 98%|ββββββββββ| 49/50 [1:14:04<01:44, 104.03s/it]Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.\nGenerating Predictions: 100%|ββββββββββ| 50/50 [1:15:39<00:00, 90.80s/it] \n","output_type":"stream"}]},{"cell_type":"code","source":"df_copy1.dropna(inplace=True)\ndf_copy1.shape","metadata":{"execution":{"iopub.status.busy":"2024-04-17T10:32:17.785861Z","iopub.execute_input":"2024-04-17T10:32:17.786581Z","iopub.status.idle":"2024-04-17T10:32:17.794586Z","shell.execute_reply.started":"2024-04-17T10:32:17.786548Z","shell.execute_reply":"2024-04-17T10:32:17.793668Z"},"trusted":true},"execution_count":44,"outputs":[{"execution_count":44,"output_type":"execute_result","data":{"text/plain":"(50, 3)"},"metadata":{}}]},{"cell_type":"code","source":"sum_key=0\nsum_val = 0\ncount_errors = 0\nfor i in df_copy1.index:\n\n pred_records = parse_json(df_copy1.loc[i,'pred_response'])\n if pred_records==None:\n count_errors +=1\n continue\n true_records = df_copy1.loc[i,'answer']\n \n sum_key += average_percentage_key(true_records,pred_records)\n \n sum_val += average_percentage_value(true_records,pred_records)","metadata":{"execution":{"iopub.status.busy":"2024-04-17T10:37:11.276371Z","iopub.execute_input":"2024-04-17T10:37:11.277407Z","iopub.status.idle":"2024-04-17T10:37:11.300493Z","shell.execute_reply.started":"2024-04-17T10:37:11.277361Z","shell.execute_reply":"2024-04-17T10:37:11.299476Z"},"trusted":true},"execution_count":57,"outputs":[{"name":"stdout","text":"JSON parsing error: Expecting value: line 15 column 1 (char 424)\nJSON parsing error: Expecting ',' delimiter: line 1 column 127 (char 126)\n","output_type":"stream"}]},{"cell_type":"code","source":"print(\"Average Percentage of Predicted Keys:\", sum_key/(len(df_copy1)-count_errors))\nprint(\"Average Percentage of Predicted values:\", 
sum_val/(len(df_copy1)-count_errors))","metadata":{"execution":{"iopub.status.busy":"2024-04-17T10:37:14.080286Z","iopub.execute_input":"2024-04-17T10:37:14.080646Z","iopub.status.idle":"2024-04-17T10:37:14.086091Z","shell.execute_reply.started":"2024-04-17T10:37:14.080617Z","shell.execute_reply":"2024-04-17T10:37:14.085221Z"},"trusted":true},"execution_count":58,"outputs":[{"name":"stdout","text":"Average Percentage of Predicted Keys: 0.8676423031900722\nAverage Percentage of Predicted values: 0.7963119656740493\n","output_type":"stream"}]},{"cell_type":"markdown","source":"<a id='61'></a>\n>## Mistral-22B-v0.1","metadata":{}},{"cell_type":"code","source":"base_model_id = \"Vezora/Mistral-22B-v0.1\"\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_compute_dtype=torch.bfloat16,\n #weights=\"int8\"\n)\n\nmodel = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map=\"auto\",trust_remote_code=True)\ntokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=False,device_map=\"auto\")","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:28:02.170950Z","iopub.execute_input":"2024-04-17T13:28:02.171380Z","iopub.status.idle":"2024-04-17T13:41:38.909545Z","shell.execute_reply.started":"2024-04-17T13:28:02.171347Z","shell.execute_reply":"2024-04-17T13:41:38.908583Z"},"trusted":true},"execution_count":24,"outputs":[{"output_type":"display_data","data":{"text/plain":"config.json: 0%| | 0.00/662 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"74ce71f2ce68431c9c59bf18324a129c"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model.safetensors.index.json: 0%| | 0.00/41.8k [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"9ef399ca1df7438489b47a5ac96eb447"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Downloading shards: 0%| | 0/9 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"002a2528e5ca4eae8cc3cb6c53eb817d"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00001-of-00009.safetensors: 0%| | 0.00/4.87G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ad049fb417dd4d058283304017857aad"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00002-of-00009.safetensors: 0%| | 0.00/4.98G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"164aceb0dbf747bfab1d14fb654bef41"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00003-of-00009.safetensors: 0%| | 0.00/4.96G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"1845261576794ade9d750547b4b44b2a"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00004-of-00009.safetensors: 0%| | 0.00/4.88G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"cad9f72041c840969c55311b73f5b7fc"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00005-of-00009.safetensors: 0%| | 0.00/4.98G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"6b72d9beba204d3cb8ee74eedce7dec0"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00006-of-00009.safetensors: 0%| | 0.00/4.96G [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7c5df9143a33430b8ffbee812b43bdbd"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00007-of-00009.safetensors: 0%| | 0.00/4.88G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ff0800f015474fbb85310b017eb8e663"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00008-of-00009.safetensors: 0%| | 0.00/4.98G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"3f26ed693ee04dd8a9eceb647dfdff25"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"model-00009-of-00009.safetensors: 0%| | 0.00/4.97G [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"00f6007e8ac84cf89b68469a1e04d764"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Loading checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"df18884458b74d289465ba6dc318c47e"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"generation_config.json: 0%| | 0.00/111 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"965890c6e7a04a50bf4610820166505d"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer_config.json: 0%| | 0.00/1.81k [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"555e043e302b4e03a83613fb60559f70"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.model: 0%| | 0.00/493k [00:00<?, 
?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"acbac5c9cedb4638b8bf31653786e42d"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"special_tokens_map.json: 0%| | 0.00/414 [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"7e5a558552ce4825b1c8f4f34d6b16da"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"tokenizer.json: 0%| | 0.00/1.80M [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"a1c5896e83e046d2beeba0bdb11a5d14"}},"metadata":{}}]},{"cell_type":"code","source":"template = f\"Instruction:\\n{prompt}\\nINPUTDATA:{df_sample.loc[1,'context']}\\nResponse:\\n\"\ninputs = tokenizer(template, return_tensors=\"pt\").to(model.device) \noutputs = model.generate(**inputs, use_cache=True,max_length=3000,do_sample=True,temperature=0.001)\noutput_text = tokenizer.decode(outputs[0]) \noutput_text.replace(template,'')","metadata":{"execution":{"iopub.status.busy":"2024-04-17T13:50:06.192278Z","iopub.execute_input":"2024-04-17T13:50:06.193616Z","iopub.status.idle":"2024-04-17T13:56:39.465118Z","shell.execute_reply.started":"2024-04-17T13:50:06.193577Z","shell.execute_reply":"2024-04-17T13:56:39.464046Z"},"trusted":true},"execution_count":26,"outputs":[{"name":"stderr","text":"Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n","output_type":"stream"},{"execution_count":26,"output_type":"execute_result","data":{"text/plain":"'<s>Here is an example of how you can extract the information and format it into a list of records using Python:\\n```python\\nimport json\\n\\n# Define the table structure\\ntable_structure = {\\n \"column_1\": \"aircraft\",\\n \"column_2\": \"description\",\\n \"column_3\": \"max gross weight\",\\n \"column_4\": \"total disk area\",\\n \"column_5\": \"max disk loading\",\\n}\\n\\n# Define the list of records\\nlist_of_records = 
[]\\n\\n# Extract the information from the table\\ntable_information = \"0 robotin r - 22 light utility helicopter 1370 lb (635 kg) 497 ft square (46.2 m square) 2.6 lb / ft square (14.3 kg / m square)\\n\\n# Split the table into rows\\ntable_rows = table_information.split(\"\\\\n\")\\n\\n# Iterate over each row\\nfor row in table_rows:\\n # Split the row into columns\\n row_parts = row.split()\\n\\n # Create a dictionary for each column\\n column_dict = {}\\n\\n # Iterate over each column\\n for column in column_dict.keys(), enumerate(row_parts):\\n # Add the value to the dictionary\\n column_dict[column] = column_value\\n\\n # Add the row dictionary to the list of records\\n list_of_records.append(column_dict)\\n\\n# Convert the list of records into a JSON object\\njson_table = json.dumps(list_of_records)\\n\\n# Print the JSON table\\nprint(json_table)\\n```\\nOutput:\\n```\\n[{\"column_1\": \"robin\", \"column_2\": \"r-22\", \"column_3\": \"light\", \"column_4\": \"utility\", \"column_5\": \"helicopter\", \"column_6\": \"1370\", \"column_7\": \"lb\", \"column_8\": \"635\", \"column_9\": \"kg\", \"column_10\": \"497\", \"column_11\": \"ft\", \"column_12\": \"square\", \"column_13\": \"46.2\", \"column_14\": \"m\", \"column_15\": \"2.6\", \"column_16\": \"lb\", \"column_17\": \"ft\", \"column_18\": \"14.3\", \"column_19\": \"kg\", \"column_20\": \"m\", \"column_21\": \"square\", \"column_22\": \"2.6\", \"column_23\": \"lb\", \"column_24\": \"ft\", \"column_25\": \"14.3\", \"column_26\": \"kg\", \"column_27\": \"m\", \"column_28\": \"square\", \"column_29\": \"2.6\", \"column_30\": \"lb\", \"column_31\": \"ft\", \"column_32\": \"14.3\", \"column_33\": \"kg\", \"column_34\": \"m\", \"column_35\": \"square\", \"column_36\": \"2.6\", \"column_37\": \"lb\", \"column_38\": \"ft\", \"column_39\": \"14.3\", \"column_40\": \"kg\", \"column_41\": \"m\", \"column_42\": \"square\", \"column_43\": \"2.6\", \"column_44\": \"lb\", \"column_45\": \"ft\", \"column_46\": \"14.3\", 
\"column_47\": \"kg\", \"column_48\": \"m\", \"column_49\": \"square\", \"column_50\": \"2.6\", \"column_51\": \"lb\", \"column_52\": \"ft\", \"column_53\": \"14.3\", \"column_54\": \"kg\", \"column_55\": \"m\", \"column_56\": \"square\", \"column_57\": \"2.6\", \"column_58\": \"lb\", \"column_59\": \"ft\", \"column_60\": \"14.3\", \"column_61\": \"kg\", \"column_62\": \"m\", \"column_63\": \"square\", \"column_64\": \"2.6\", \"column_65\": \"lb\", \"column_66\": \"ft\", \"column_67\": \"14.3\", \"column_68\": \"kg\", \"column_69\": \"m\", \"column_70\": \"square\", \"column_71\": \"2.6\", \"column_72\": \"lb\", \"column_73\": \"ft\", \"column_74\": \"14.3\", \"column_75\": \"kg\", \"column_76\": \"m\", \"column_77\": \"square\", \"column_78\": \"2.6\", \"column_79\": \"lb\", \"column_80\": \"ft\", \"column_81\": \"14.3\", \"column_82\": \"kg\", \"column_83\": \"m\", \"column_84\": \"square\", \"column_85\": \"2.6\", \"column_86\": \"lb\", \"column_87\": \"ft\", \"column_88\": \"14.3\", \"column_89\": \"kg\", \"column_10\": \"m\", \"column_11\": \"2.6\", \"column_12\": \"lb\", \"column_13\": \"ft\", \"column_14\": \"14.3\", \"column_15\": \"kg\", \"column_16\": \"m\", \"column_17\": \"ft\", \"column_18\": \"2.6\", \"column_19\": \"lb\", \"column_20\": \"ft\", \"column_21\": \"14.3\", \"column_22\": \"kg\", \"column_23\": \"m\", \"column_24\": \"ft\", \"column_25\": \"2.6\", \"column_26\": \"lb\", \"column_27\": \"ft\", \"column_28\": \"14.3\", \"column_29\": \"kg\", \"column_30\": \"m\", \"column_31\": \"ft\", \"column_32\": \"2.6\", \"column_33\": \"lb\", \"column_34\": \"ft\", \"column_35\": \"14.3\", \"column_36\": \"kg\", \"column_37\": \"m\", \"column_38\": \"ft\", \"column_39\": \"2.6\", \"column_40\": \"lb\", \"column_41\": \"ft\", \"column_42\": \"14.3\", \"column'"},"metadata":{}}]}]}
Benchmark2/leaderboard.csv
CHANGED
@@ -3,4 +3,4 @@ OpenHermes-2.5-Mistral-7B,83.64,78.05,120.12,https://huggingface.co/spaces/Effyi
 Gemini-1.5-Pro-latest,98.18,97.34,37.07,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Google,https://blog.google/technology/ai/gemini-api-developers-cloud/
 Gemini Pro,96.15,93.08,18.23,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Google,https://blog.google/technology/ai/gemini-api-developers-cloud/
 Mistral-7B-Instruct-v0.2,77.18,67.24,101.99,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/gemini-pro-openhermes-mistral-and-mistral-7b.ipynb,Apache-2.0,https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
-Dolphin-2.2.1-Mistral-7b,
+Dolphin-2.2.1-Mistral-7b,86.76,79.63,90.80,https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/dolphin-2-2-1-mistral-7b.ipynb,Apache-2.0,https://huggingface.co/cognitivecomputations/dolphin-2.2.1-mistral-7b
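The new Dolphin row's second and third fields appear to be the notebook's printed averages (0.8676423031900722 and 0.7963119656740493) scaled to percentages and rounded to two decimals, and its fourth field matches the final tqdm rate of 90.80 s/it from the prediction loop. The CSV header is outside this diff, so that reading of the columns, and the field names used below, are assumptions. A small check under those assumptions:

```python
import csv
import io

# The row added to Benchmark2/leaderboard.csv by this commit.
DOLPHIN_ROW = (
    "Dolphin-2.2.1-Mistral-7b,86.76,79.63,90.80,"
    "https://huggingface.co/spaces/Effyis/LLms-Benchmark/blob/main/Benchmark2/"
    "dolphin-2-2-1-mistral-7b.ipynb,Apache-2.0,"
    "https://huggingface.co/cognitivecomputations/dolphin-2.2.1-mistral-7b"
)

# Averages printed by the notebook's last scoring cell (fractions in [0, 1]).
PRINTED_KEY_AVG = 0.8676423031900722
PRINTED_VALUE_AVG = 0.7963119656740493


def parse_row(row: str) -> dict:
    """Split one leaderboard row; these field names are assumed, not read from the CSV header."""
    fields = next(csv.reader(io.StringIO(row)))
    return {
        "model": fields[0],
        "keys_pct": float(fields[1]),
        "values_pct": float(fields[2]),
        "sec_per_sample": float(fields[3]),
        "notebook": fields[4],
        "license": fields[5],
        "model_url": fields[6],
    }


def matches_notebook(entry: dict) -> bool:
    """True if both percentage fields equal the printed averages x 100, rounded to 2 decimals."""
    return (
        entry["keys_pct"] == round(PRINTED_KEY_AVG * 100, 2)
        and entry["values_pct"] == round(PRINTED_VALUE_AVG * 100, 2)
    )


entry = parse_row(DOLPHIN_ROW)
print(entry["model"], matches_notebook(entry))  # prints: Dolphin-2.2.1-Mistral-7b True
```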