[FEATURE] Tools
Tools on HuggingChat
Learn more about the available tools in this YouTube video: https://www.youtube.com/watch?v=jRcheebdU5U
Today, we are excited to announce the beta release of Tools on HuggingChat! Tools open up a wide range of new possibilities, allowing the model to determine when a tool is needed, which tool to use, and what arguments to pass (via function calling).
- For now, tools are only available on the default HuggingChat model: Cohere Command R+ because it's optimized for using tools and has performed well in our tests.
- Tools use ZeroGPU spaces as endpoints, making it super convenient to add and test new tools!
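As a rough illustration of what function calling involves, here is a minimal Python sketch. The tool registry, the toy `calculator` and `url_fetcher` stand-ins, and the JSON shape of the model's tool call are all assumptions made for illustration; they are not HuggingChat's actual implementation or Command R+'s real output format.

```python
import json

# Hypothetical tools: names and behavior are illustrative only.
def calculator(expression: str) -> str:
    # Toy stand-in that only handles "a + b" style additions.
    left, right = expression.split("+")
    return str(float(left) + float(right))

def url_fetcher(url: str) -> str:
    # Placeholder that does not touch the network.
    return f"<contents of {url}>"

TOOLS = {"calculator": calculator, "url_fetcher": url_fetcher}

def dispatch(model_output: str) -> str:
    """Run the tool requested by the model, or return its text unchanged."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain answer, no tool needed
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return model_output  # not a recognized tool call
    # The model chose the tool and its arguments; we only execute them.
    return TOOLS[call["tool"]](**call.get("arguments", {}))
```

In this sketch the model's reply either is ordinary text or names one tool plus its arguments, which mirrors the three decisions described above: whether a tool is needed, which one, and with what arguments.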
Available tools
Tool name | Description | Host |
---|---|---|
Web Search | Query the web and do some RAG on retrieved content against the user query | HuggingChat internal tool |
URL Fetcher | Fetch text content from a given URL | HuggingChat internal tool |
Document Parser | Parse content from PDF, text, csv, json and more | ZeroGPU Space |
Image Generation | Generate images based on a given text prompt | ZeroGPU Space |
Image Editing | Edit images based on a given text prompt | ZeroGPU Space |
Calculator | A simple calculator for evaluating mathematical expressions | HuggingChat internal tool |
How we choose tools
- A tool must be a ZeroGPU Space that comes by default with exposed API endpoints.
- Tools need to be fast (~25 seconds max) to ensure a good user experience.
- In general, we prefer simple and fun tools (like a new model) over complex workflows that are harder to test and more likely to fail.
Do you have an idea for a tool to add, or want to update one directly on HuggingChat? Share your thoughts in this community discussion.
Next Steps
- Use previously generated files with tools (probably)
- Add tools to Community Assistants: Making it possible for users to add their own ZeroGPU Spaces as tools in their Assistants.
- Add more official tools on a regular basis.
- Improve existing tools.
- Support more models (maybe starting with Llama-3)
- Add multi-step Tool Use (aka Agents)
- Add ability to reference previous files from the conversation.
- Add extra tools at runtime via OpenAPI specification.
chat ui pauses
https://huggingface.co/chat/ access it from here
@Stefan171 Thanks for the report! Both issues should be fixed now, thanks to your screenshots!
@nsarrazin Pleasure. It's working now. Thanks for developing these tools.
I think there's an issue with PDFs that are too big. I'll try to fix it, but try to keep them under 1MB for now, otherwise the upload might fail.
Error 413 even with PDFs under 500kb!
My smallest PDF on my phone is 4.5MB
Is the Calculator tool able to do randomness? For example, could it help generate random numbers in a range, like 1-6?
Great tools with a lot of potential. Where or how can we get this as an API?
Suggestion: Add "memories" to tools. "Memories" is just a function call which will be called by the model if it decides to save some memory which can be preserved across chats similar to chatgpt's memory feature.
Thank you for these tools, I personally appreciate them a lot. I wanted to say that I struggled with the document parser till I unchecked the other tools from the list of six. I don't know if it's a coincidence or if it's actually a fix though.
Weird we'll look at it with @Saghen
Kudos to @victor and the whole team! Great way to start adding Tools and Function Calling to HuggingChat.
One question: I love the "Download prompt and parameters" feature in HuggingChat. It gives transparency about the actual text prompt going to the LLM. However, I see that the available tools (all 6 functions, schema, description, etc.) are missing from the prompt. It seems the prompt is incomplete: it doesn't have the "System Preamble" or "Available Tools" parts. If we could see the entire result of tokenizer.apply_tool_use_template, that would be amazing!
Also add private prompt settings to make it safer
Hi HuggingChat. Are there any plans to create detailed documentation covering the full range of functions (documentation including descriptions of various functions, parameters, limitations and use cases)?
Congrats! This is a great step. I especially find the document parser feature really useful.
As you suggested I would love to see llama3 support for these tools since the licensing of command-r+ can be a bit restrictive.
One suggestion I would like to make, is to give users the flexibility to plug in their own document parser, similar to how you can configure your own LLM endpoints.
I'm having an issue where I can't upload a document from my iPhone for the parser. Images will still upload, but no .txt .pdf or .docx
It still works fine from my laptop, but I use my phone far more frequently.
Also, none of the tools appear when using the HuggingChat app from the Apple App Store.
Update: .rtf also not selectable.
We recently upgraded our image generation tool to use Stable Diffusion 3! Feel free to try it out and let us know how it works for you.
I check in here from time to time, and it's good to see that the service is developing. However, its usefulness is still almost negligible. Searching often doesn't work, errors appear, and the chat stops frequently, etc. At the moment, when it comes to free services it's better to use the free Copilot than HuggingChat (or Command R+ directly from the Cohere website). However, I'm keeping my fingers crossed.
I have found that the web search and image generation tools work really well and the document parser (when functional) is also really good. I would like to see an image viewing tool, which could emulate multi modal models by returning an image description from a different space like the current Florence 2.
Yes @Smorty100, I agree. The problem I see is that it won't be super useful without multi-step tool calling. For this reason, maybe it's better to wait for a true multimodal open model.
I'm noticing the SD 3 tool struggles with prompt alignment. Not sure if it's an SD 3 limitation, or the tool parsing the prompt differently. Is there a workaround to improve prompt alignment? Below is an example:
Kirby is not visible in the image, and is not white from Level 37. There are no "negative" aspects in the prompt.
Oh also, Command R+ states that it cannot generate images, despite clearly being able to do so.
It often pauses and the only solution is to start a new chat. It would also be good to translate the prompt for image generation to English by default. Otherwise, when you ask for an image in another language, you get something you didn't want. You can bypass this by setting an appropriate system prompt, but automatic translation would be better, I guess.
Error calling tool calculator
Please elaborate
I believe it can only do basic arithmetic (+,-,*,/, sqrt, that stuff) and not comparisons, which is likely just a limitation of the calculator tool. It should be replaced with something better like Sagemath or similar.
Best is to use python interpreter in this case.
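For anyone curious what a calculator that also handles comparisons might look like, here is a minimal Python sketch built on the standard `ast` module. It is not the HuggingChat calculator tool; the `evaluate` function and its operator whitelist are assumptions for illustration, and it deliberately avoids `eval()` so that only arithmetic and comparisons are accepted.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
           ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}

def evaluate(expression: str):
    """Evaluate arithmetic and comparisons without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Compare):
            # Supports chained comparisons such as 1 < 2 < 3.
            left = walk(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in CMP_OPS:
                    raise ValueError("unsupported comparison")
                right = walk(comparator)
                if not CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))
```

For example, `evaluate("10 / 4 > 2")` returns `True`, while anything outside the whitelist, such as a function call, raises `ValueError`.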
Document Parser is not working for me, always getting an error
Try using tools with the 70B model for now, or the 405B without tools.
We're working through issues with our API which is a bit overloaded on the 405B and when you use tools you need to call the API twice so you have twice as many chances of getting an error. The 70B should work fine for now though!
If you still have issues with document parsing with the 70B let me know and I can take a look.
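To make the "twice as many chances" point concrete, here is a small worked estimate. It assumes each API call fails independently with the same probability p; that is a simplifying assumption for illustration, and no actual failure rate is given in this thread.

```latex
P(\text{error with tools}) = 1 - (1 - p)^2 = 2p - p^2 \approx 2p \quad \text{for small } p
```

With a hypothetical p = 0.05, a single call fails 5% of the time, while a tool-assisted reply that needs two calls fails about 1 - 0.95^2 = 9.75% of the time.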
@nsarrazin may I ask if the image generation tool has been updated to Flux.1? If so, Schnell or Dev? Also where can I stay updated about the changes on HuggingChat and its tools? Thanks
@sneedingface It's still Stable Diffusion, but we were thinking of upgrading to Schnell! We'll keep you posted in this thread.
We just updated to Flux.1 Schnell! Let us know if it works well
@nsarrazin why not use flux dev?
@flexagontnt it's a bigger model, and with the load on HuggingChat, response times weren't great. We thought a smaller but faster model would be a better fit for a chat workflow :)
I already want daily updates. I would like to be able to add sharing of tools. Everyone can build and use each other's tools.
And I want to be able to use AI to create tools too.
Ye, it's as expected. Just curious!
How can I use the system prompt to tell the model to call (or not call) specific tools? And how can I make the model specify the input to the functions? E.g., how can I use the system prompt to tell the AI to never search for the user's question but search "haha" instead?
@nsarrazin can we please have Flux Dev? Schnell can only do so much; it doesn't give a solid result, so it's kinda pointless imho
@sneedingface it was a bit too slow in our testing and was a bit frustrating to use in a chat format so we chose schnell as a default but you'll be able to create your own tools with some upcoming features :)
@nsarrazin can you share how exactly the web search works? Does the LLM generate the search term and "decide" to call the search tool? Or does the web search tool use a separate model (or a separate instance of the model) to automatically search the web and feed the result to the LLM?
The document parser (or the model) doesn't work as well as it should. E.g., if I upload an image or PDF of a table, it is not able to accurately convert it into text, while gpt-4o-mini or gemini flash 1.5 easily convert the image into table format. Can that be improved?
@toximod120 The current tools available in HuggingChat do not make the model able to interpret images. This would require either multimodal models, or passing the image to a multimodal model first and then passing an image description to the main model. I have already proposed that second idea to victor, and he said that they'd rather have actual multimodal functionality than fake it with this combination approach.
Uploading images currently only allows for image editing.
Can you update Command R+ to the latest version? (https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
I love community tools. I created a very simple tool.
Will assistants support tools? It would be good to be able to call tools while using custom model parameters
So do these models use up the Hugging Face GPU quota of the logged-in account? It seems only premium members can use the new community tools after a few tries.
What's the image
I am developing a Telegram bot that uses the Hugging Face API to provide global responses for an interactive game. I need to know if the API has access to the "Tools Beta" feature, as this is critical to the functionality of our game. Otherwise, please tell me which open-source code is available so that this can be implemented locally.
How do the tools work internally with prompting? I'd like to create something similar with an LLM assistant.
@handfuloftitty wth did you just send
Everything is open source: https://github.com/huggingface/chat-ui
I tried looking around but it's hard to find. Do you know where in the codebase the prompts are located?
I really enjoy using QwQ for some complex JSON content generation. However, I really do not like how it is handled here in HuggingChat, which is why I still use Qwen's HF Space. There they treat QwQ just like any other LLM, by simply showing the output, without any fancy content handling.
In HuggingChat, this cool lil "reasoning" element shows up and it tries to summarize what the model is currently writing, similar to how we see it with... closed source models. We do have the ability to click and see what it's writing, but at the end of the generation the content is seemingly being summarized by some other AI. This is a really bad way to handle QwQ. It would have been different with Marco-o1, which separates thought from final output. QwQ does not do that.
I would MUCH prefer if it was handled like in DeepSeek's R1 interface, where they show the thought output as smaller text. OR! Just generate it as usual! Just let us see what the LLM writes like with any other model, without any fancy UI to cover up its output!
As it stands, when trying to interact with QwQ like with o1, it simply does not work as expected. Saying "hi" generates a short response by QwQ, but the summarisation LLM doesn't know what to summarize and gets confused:
It starts generating some tips on how to improve writing.
This is not a good way to handle a reasoning model with the style of QwQ.
If you REALLY want to keep using this o1-like interface, sure, go ahead, but please consider applying some fancy prompting to make it respond with a "final message". This way, you can parse the output, and the moment it types "# Final Answer" you can start displaying the rest as the final message, as a replacement for the current summarisation.
Here is a custom system prompt I like to use on QwQ to make it respond that way:
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.
You must provide your final answer under the "# Final Answer" header. So your response MUST look like this:
'''
<Your thought process here>
# Final Answer
<Your final answer here>
'''
This is just the one I use, I'm sure y'all are better at prompting than I am.
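The parsing step suggested above could be sketched like this in Python. This is only an illustration of the idea, not how HuggingChat handles QwQ; the `split_reasoning` helper is a hypothetical name, and it assumes the model puts the "# Final Answer" header on its own line as the system prompt requests.

```python
FINAL_MARKER = "# Final Answer"

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (thoughts, final_answer); final_answer is empty if the marker is absent."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        # Only a line consisting of the marker counts, not a mid-sentence mention.
        if line.strip() == FINAL_MARKER:
            thoughts = "\n".join(lines[:index]).strip()
            answer = "\n".join(lines[index + 1:]).strip()
            return thoughts, answer
    # Marker not seen (yet): everything so far is still thought process.
    return output.strip(), ""
```

While streaming, a UI could call this on the text received so far: as long as the second element is empty the output is shown as thoughts, and once the marker appears the remainder is displayed as the final message.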
The best way to handle QwQ I think is to just handle it like any other LLM on the platform. No fancy handling, no "realtime thought process interpretation" for the fun little titles in the UI, just plain text output.
Also, I really like the new way tool calls are activated and deactivated now. Way easier to use than before, thank you for that one!