Update examples
Files changed:
- app.py +2 -2
- updated_context.txt +0 -22
app.py
CHANGED
@@ -37,7 +37,7 @@ def get_model_answer(question, state=[[],[]]):
     end = time.perf_counter()
     print(f"Build input: {end-start}")
     start = time.perf_counter()
-    encoded_inputs = tokenizer(model_input, max_length=
+    encoded_inputs = tokenizer(model_input, max_length=7000, truncation=True, return_tensors="pt")
     input_ids, attention_mask = (
         encoded_inputs.input_ids,
         encoded_inputs.attention_mask
@@ -59,7 +59,7 @@ with gr.Blocks() as demo:
     chatbot = gr.Chatbot()
     text = gr.Textbox(label="Ask a question (press enter to submit)", default_value="How are you?")
     gr.Examples(
-        ["What
+        ["What's the name of the dataset that was built?", "what task does it focus on?", "Describe that task"],
         text
     )
 
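For readers following the change to get_model_answer, here is a minimal, hypothetical sketch of how the updated tokenizer call typically feeds generation. The checkpoint ID is the one mentioned in the context file (tryolabs/long-t5-tglobal-base-blogpost-cqa); the placeholder input string and the generation arguments are assumptions, not code from this Space.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tryolabs/long-t5-tglobal-base-blogpost-cqa")
model = AutoModelForSeq2SeqLM.from_pretrained("tryolabs/long-t5-tglobal-base-blogpost-cqa")

# Placeholder for the prompt that app.py builds from the question, chat history and blog context
model_input = "question: How are you? context: ..."

# The line changed in this commit: truncate long prompts at 7000 tokens and return PyTorch tensors
encoded_inputs = tokenizer(model_input, max_length=7000, truncation=True, return_tensors="pt")
output_ids = model.generate(
    input_ids=encoded_inputs.input_ids,
    attention_mask=encoded_inputs.attention_mask,
    max_new_tokens=64,  # illustrative generation setting
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

In the second hunk, gr.Examples receives the list of example questions plus the text Textbox as its inputs, so the new questions should show up as clickable examples under the chat box.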
updated_context.txt
CHANGED
@@ -4,16 +4,6 @@ Large Language Models (LLMs) are powerful tools in the field of artificial intel
 
 While these large models are great for general tasks, they may need to be fine-tuned to achieve optimal performance on specific downstream tasks or when dealing with new, unseen data. Fine-tuning is when a model's parameters are adjusted to fit a particular dataset better. This can help companies use the model for tasks specific to their business.
 
-As a treat to our amazing readers, we fine-tuned an LLM to build a chatbot that can answer natural questions about the rich content published on our blog. This article will walk you through the fine-tuning process, from building our own dataset to optimizing inference time for the fine-tuned model!
-
-Say Hi!
-Before diving into the details of our fine-tuning pipeline, we invite you to try out our chatbot for yourself! We’ve set up a 🤗 Hugging Face Space where you can ask the chatbot any questions you have about this blog post. Simply follow this link to access the space and start chatting! In addition, we’ve uploaded the fine-tuned model and its ONNX export to 🤗 Hugging Face so that you can explore them at will.
-
-The model powering the chatbot was trained on the task of Conversational Question Answering, which means it can answer questions regarding a specific context (in this case, the content of this blog) or previous questions or answers from the current conversation thread. Try chatting with the bot in a conversational style, just as you would with one of Tryolabs’ tech experts!
-
-Caveats
-Remember that this is just a small demo, so the answers may not be as accurate as expected; in the Improvements section, we’ll discuss how we can enhance the model’s responses! Also, the content of this blog was not included in the training set, which means that the chatbot will give you answers about new, unseen data!
-Response time may take around 10 seconds due to the use of 🤗 Hugging Face’s free-tier space, which only has access to two CPU cores. This response time is slow for a chatbot being used in production, but it's fast for a CPU deployment and for such large inputs (since the model needs to process the entire blog to generate an answer). We'll discuss optimizing inference time in the Optimizing with ONNX section.
 The Boom of Foundation Models
 Have you ever wondered how computers can understand and respond to human language? The answer lies in a new concept called ‘Foundation Models’. Popularized under this name by the Stanford University, these are machine learning models trained on immense amounts of data (over tens of terabytes and growing) that can be adapted to a wide range of downstream tasks in fields such as image, text, and audio, among others.
 
@@ -29,8 +19,6 @@ GPT-J, the open-source version of GPT-3 developed by OpenAI
 OPT, developed by Meta
 T5, developed by Google
 
-A comprehensive list of open-source language models can be found on 🤗 Hugging Face’s list of supported models. Thanks to 🤗 Hugging Face and the ML community for providing this huge open-source repository!
-
 One recent example of the impressive capabilities of Foundation Models is ChatGPT, arguably one of the most incredible advances in AI history. This model, developed by OpenAI, is a fine-tuned version of GPT-3.5 (one of the latest versions of the GPT-3 model family). ChatGPT can be used through a simple chat interface to perform various tasks, including summarization, text generation, code generation, and question-answering on virtually any topic. What sets ChatGPT apart is its ability to produce highly detailed, complete, and human-like responses, admit mistakes and adjust responses based on the user's needs.
 
 As the name suggests, Foundation Models can serve as the foundation for many applications, but they require fine-tuning and adjustments to get the most out of them. So keep reading to learn how to use them to their full potential!
@@ -72,8 +60,6 @@ How well the model fits your specific task. You can dig deeper into the datasets
 The size of your inputs and outputs. Some models may not scale well for large inputs and outputs due to high memory consumption or a substantial number of operations.
 The size of your dataset. If you have a small dataset, choose a more powerful model capable of zero-shot or few-shot learning. Nevertheless, the more data you have, the better results you will achieve.
 How much computing power is available for training and inference. This can be a significant factor in determining which models you can use, as larger models may not fit in your memory, or training may become painfully slow. You can use libraries like 🤗 Accelerate for distributed training to make the most out of your hardware.
-All this can seem overwhelming, but don’t worry; here at Tryolabs, we can help you choose the right model for your needs! Contact us and get started on building your own fine-tuning pipeline.
-
 
 
 3. Fine-tuning strategy
@@ -131,7 +117,6 @@ Since we had two different training steps, we also had two additional evaluation
 More than analyzing the quantitative metrics is required to evaluate these results and conversational models in general. It is essential to consider the qualitative aspect of the model's answers, like their grammatical correctness and coherence within the conversation context. Sometimes it is preferable to have better answers (qualitatively speaking) than a better F1. So we looked at some answers from the validation set to ensure that the model was correctly generating what we were looking for. Our analysis revealed that higher F1 scores were generally associated with greater-quality answers. As a result, we selected the checkpoint with the highest F1 score to use in constructing our demonstration chatbot.
 
 
-If you want to play around with our fine-tuned model, you can find it on 🤗 Hugging Face with the ID tryolabs/long-t5-tglobal-base-blogpost-cqa!
 Faster inference with 🤗 Optimum and ONNX
 After fine-tuning our model, we wanted to make it available to our awesome readers, so we deployed it on 🤗 Hugging Face Spaces, which offers a free tier with two CPU cores for running inference on the model. However, this setup can lead to slow inference times, and processing significant inputs like ours doesn’t make it any better. And a chatbot that takes a few minutes to answer a question doesn't strike anyone as being particularly chatty, does it? So, to improve the speed of our chatbot, we turned to 🤗 Optimum and the ONNX Runtime!
 
@@ -139,15 +124,10 @@ After fine-tuning our model, we wanted to make it available to our awesome reade
 In our previous blog post, A guide to optimizing Transformer-based models for faster inference, we used 🤗 Optimum and ONNX to achieve an x8 speed-up on inference for a Transformer model. Be sure to check it out!
 Using 🤗 Optimum’s recently released exporters feature, we were able to convert our PyTorch model to the ONNX format. This feature is handy for encoder-decoder models like the LongT5 model we trained, as it exports the three main components separately: the encoder, the decoder with the Language Modeling head, and the same decoder with pre-computed hidden states as additional inputs. According to 🤗 Optimum’s documentation, combining these three components can speed up sequential decoding, which results in faster text generation.
 
-
-Our fine-tuned model, exported to ONNX into these three components, is also available on 🤗 Hugging Face with the ID tryolabs/long-t5-tglobal-base-blogpost-cqa-onnx!
 Once our model was exported to ONNX, we used 🤗 Optimum’s integration with the ONNX Runtime to optimize our model and run inference on it by using the ORTModelForSeq2SeqLM class. This class can optimize and downcast the model using ONNX’s tools and then use ONNX Runtime to run inference with this new, faster model! You can even take it one step further and quantize the model for even shorter inference time on CPU and lower memory consumption.
 
 With these improvements, we could achieve an x2 speed-up on inference time! Although the model still takes around 10 seconds to answer, this is a reasonable speed for a CPU-only deployment and processing such large inputs.
 
-
-It’s worth noting that members of the ONNX community are actively working on tools for exporting and optimizing the latest ML architectures. The features mentioned in this section are fresh from the oven, thanks to the invaluable support of the ONNX team. Their work is crucial for unifying deep learning frameworks and optimizing models for faster and more cost-effective training and inference.
-
 Improvements
 We’re just scratching the surface of what’s possible. There are numerous potential improvements to keep working on to enhance the user's overall experience and the chatbot's performance. Here are some ideas to keep working on:
 
@@ -158,5 +138,3 @@ Takeaways
 With the ever-increasing popularity of LLMs, it can seem almost impossible to train these models without having access to millions of dollars in resources and tons of data. However, with the right skills and knowledge about Foundation Models, Deep Learning, and the Transformer architecture, we showed you that fine-tuning these huge models is possible, even with few resources and a small dataset!
 
 Fine-tuning is the key to unlocking the full potential of Foundation Models for your business. It allows you to take a pre-trained model and adapt it to your specific needs without breaking the bank.
-
-And the best part? We're here to help you every step of the way. If you're ready to see how fine-tuning can benefit your business, don't hesitate to reach out, and let's get started on your fine-tuning journey today!
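As a rough illustration of the exporters workflow described in the context file above, here is a hedged sketch of converting the fine-tuned checkpoint to ONNX with 🤗 Optimum. The output directory name is a placeholder, and depending on the Optimum version the flag may be spelled from_transformers=True rather than export=True.

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the fine-tuned PyTorch checkpoint and export it to ONNX on the fly.
# This yields the three components described above: the encoder, the decoder with
# the LM head, and the decoder that takes pre-computed (past) hidden states.
ort_model = ORTModelForSeq2SeqLM.from_pretrained(
    "tryolabs/long-t5-tglobal-base-blogpost-cqa",
    export=True,
)
ort_model.save_pretrained("long-t5-blogpost-cqa-onnx")  # placeholder output directory

Recent Optimum releases also expose the same conversion as a command-line tool (optimum-cli export onnx), which can be more convenient for one-off exports.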
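Finally, a hedged sketch of the inference-side setup the context file describes: loading the published ONNX export with ORTModelForSeq2SeqLM and, optionally, applying dynamic quantization for CPU. The repo ID comes from the text; the prompt format, component file names, quantization config, and output paths are assumptions.

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "tryolabs/long-t5-tglobal-base-blogpost-cqa-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_id)  # runs on ONNX Runtime

# Same preprocessing idea as in app.py: long prompts are truncated to fit the model.
inputs = tokenizer("question: What task does it focus on? context: ...",
                   max_length=7000, truncation=True, return_tensors="pt")
output_ids = ort_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Optional: dynamic int8 quantization of each exported component for faster CPU inference.
ort_model.save_pretrained("onnx_export")  # quantize from a local copy of the export
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
for onnx_file in ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("onnx_export", file_name=onnx_file)
    quantizer.quantize(save_dir="onnx_export_quantized", quantization_config=qconfig)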