HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Abstract
Solving complicated AI tasks that span different domains and modalities is a key step toward artificial general intelligence (AGI). While abundant AI models are available for different domains and modalities, they cannot handle complicated AI tasks on their own. Considering that large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a system that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., HuggingFace) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in HuggingFace, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and the abundant AI models in HuggingFace, HuggingGPT is able to cover numerous sophisticated AI tasks across modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards AGI.
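As a rough illustration of the four stages described in the abstract (task planning, model selection, task execution, response generation), the sketch below wires ChatGPT to the Hugging Face Hub. It is a minimal sketch, not the paper's implementation: the prompts, helper names, and the restriction to two text-only tasks are assumptions made for brevity, whereas the real system plans multi-step, multimodal workflows.

```python
# Illustrative HuggingGPT-style controller loop (simplified assumption: text tasks only).
# Requires OPENAI_API_KEY and HF_TOKEN in the environment.
import os
from openai import OpenAI
from huggingface_hub import HfApi, InferenceClient

llm = OpenAI()                                      # ChatGPT acts as the controller
hub = HfApi()
runner = InferenceClient(token=os.environ.get("HF_TOKEN"))

def ask_controller(prompt: str) -> str:
    resp = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def hugginggpt(request: str) -> str:
    # 1. Task planning: map the user request onto a Hub pipeline tag
    #    (constrained to two text tasks here to keep the sketch runnable).
    task = ask_controller(
        "Reply with exactly 'summarization' or 'text-generation', whichever "
        f"better fits this request: {request}"
    )
    # 2. Model selection: pick a popular model carrying that tag.
    model = next(iter(hub.list_models(filter=task, sort="downloads", limit=1)))
    # 3. Task execution: run the subtask on the selected model.
    if task == "summarization":
        result = runner.summarization(request, model=model.id)
    else:
        result = runner.text_generation(request, model=model.id)
    # 4. Response generation: summarize the execution result for the user.
    return ask_controller(
        f"User request: {request}\nOutput of model {model.id}: {result}\n"
        "Write a short answer for the user based on this output."
    )
```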
Community
Very interesting paper. I like how the Hugging Face Hub is being used as an agent/plugin, thereby giving an LLM (in this case ChatGPT) specific abilities/features it doesn't otherwise have.
After quickly skimming the paper, I found that Table 5 pretty much answers all the questions I had. (The authors use prompting & CoT to make ChatGPT call the Hugging Face Inference API.)
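For anyone unfamiliar with the Inference API the comment above refers to, here is a minimal sketch of the kind of call such a prompted controller would issue. The endpoint and payload shape are the standard serverless Inference API; the example model and input are placeholders, and HF_TOKEN is assumed to be set.

```python
# Minimal serverless Inference API call (model id and inputs are illustrative).
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def call_inference_api(model_id: str, inputs) -> object:
    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    response = requests.post(
        API_URL.format(model_id=model_id),
        headers=headers,
        json={"inputs": inputs},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# e.g. a summarization subtask planned by the controller:
# call_inference_api("facebook/bart-large-cnn", "Long article text ...")
```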
Would be interested to see if the community can fine-tune EleutherAI/gpt-j-6B in a Toolformer manner to teach it how to use the Hugging Face Hub as well.
This is a very cool paper! I'm curious to what extent this approach could also be used to help people identify appropriate existing models for a particular task: you ask "I want to do blah" and get back "you should try task x; these models are a good starting point". This would probably do less than what's done in the paper, but it might also be a little more predictable because of that constraint.
I think one of the big appeals of chat-based models is that there is an easy entry point to a task that doesn't require you to know how to map what you want to achieve onto an existing ML task. I think this is particularly helpful/appealing for people without an ML background.
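The discovery-assistant idea described in the comment above could be approximated with just the planning and selection stages. The sketch below is an assumption about how that might look (the prompt wording and the popularity-based ranking are illustrative choices, not anything from the paper).

```python
# Illustrative "which task / which models?" helper: plan a task, then list candidates.
from openai import OpenAI
from huggingface_hub import HfApi

def suggest_models(request: str, top_k: int = 3) -> str:
    # Let the LLM map free-form intent onto a Hub pipeline tag.
    task = OpenAI().chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
                   f"Reply with only the Hugging Face pipeline tag for: {request}"}],
    ).choices[0].message.content.strip()
    # Recommend the most-downloaded models carrying that tag, without running them.
    models = HfApi().list_models(filter=task, sort="downloads", limit=top_k)
    return (f"You should try the '{task}' task; these models are a good starting point: "
            + ", ".join(m.id for m in models))
```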