provide context to instruct_pipeline

#5
by xy-covey - opened

How can I provide context to the instruct_pipeline function?
For example, I want to provide a paragraph as context and ask questions like "how many years has Jon lived in Miami?"

Databricks org

You just put it in the input however you like, same as with any text-generation model. You can send a string like "Jon likes to fish and has red hair. He has lived in Miami since 1998. He moved from Tallahassee. How many years has Jon lived in Miami?"

LangChain can help you build this on top of an LLM: it looks up context relevant to the question and stuffs it into a prompt for you automatically before passing it to the model.
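For example, a minimal sketch following the pattern on the model card (the checkpoint name databricks/dolly-v2-3b is illustrative; use whichever Dolly model you are running):

```python
import torch
from transformers import pipeline

# Load the instruct pipeline. trust_remote_code pulls in the custom
# InstructionTextGenerationPipeline from the model repo.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",  # illustrative; any Dolly checkpoint
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Context and question go together in a single input string.
prompt = (
    "Jon likes to fish and has red hair. He has lived in Miami since 1998. "
    "He moved from Tallahassee. How many years has Jon lived in Miami?"
)
res = generate_text(prompt)
print(res[0]["generated_text"])
```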

Databricks org

You can also try using an instruction of the form below. Some of the data the model was trained on had this form. The context can either go in the instruction itself, as @srowen mentioned, or separately as input. A longer context probably works better as an input, as below.

How many years has Jon lived in Miami?

Input:
Context about Jon.
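In code, that might look like the following sketch (build_prompt is a hypothetical helper; the exact delimiters used in the training data may differ):

```python
def build_prompt(instruction: str, context: str) -> str:
    # The question goes in the instruction; the longer context goes
    # under "Input:", matching the shape shown above.
    return f"{instruction}\n\nInput:\n{context}"

prompt = build_prompt(
    "How many years has Jon lived in Miami?",
    "Jon likes to fish and has red hair. He has lived in Miami since 1998.",
)
res = generate_text(prompt)  # the pipeline from the earlier snippet
```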

@srowen Thanks. What if the paragraph is semi-structured data like a resume and I want to ask questions like "how many years has this candidate worked at company ABC?"

Databricks org

The input to these types of LLMs needs to be text. You would have to extract the text from a document first in order to feed it into a prompt. LangChain has some related tools for extracting text chunks from PDFs, etc., that might come in handy, alongside everything else it does. But you can extract the text however you like.
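As an illustration, pulling the text out of a PDF resume with the pypdf package (any extraction library would do) might look like:

```python
from pypdf import PdfReader

# Extract raw text from every page of the resume.
reader = PdfReader("resume.pdf")
resume_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Then stuff the extracted text into the prompt as context, as above.
prompt = f"{resume_text}\n\nHow many years has this candidate worked at company ABC?"
res = generate_text(prompt)
```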

There are many reasons it could be slow, but it should take more like seconds on a large GPU. It's hard to say without knowing how you are using it and on what hardware. If you are running on CPU only, yes, it will take forever. See the GitHub repo for more information: https://github.com/databrickslabs/dolly
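A quick sanity check that PyTorch can actually see a GPU:

```python
import torch

# False means generation runs on CPU and will be extremely slow.
print(torch.cuda.is_available())
```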

Hi @srowen, I looked at https://github.com/databrickslabs/dolly for a code snippet that provides context for question answering, but I couldn't find anything. The nearest example I was able to find is at https://huggingface.co./databricks/dolly-v1-6b, which is used for a generation task. Could you please point me to any code snippet that provides context and performs closed QA? Thanks
@xy-covey It would be amazing if you could share your code snippet for resume question answering. Thanks

Databricks org

The code isn't any different; the input is. You put the context in the string you supply, as I mentioned above. This is how all similar LLMs take context as input. Have you looked at LangChain? It may be more what you're looking for to put on top of this model.
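A sketch of what that could look like with the classic LangChain API, wrapping the transformers pipeline from earlier (the template wording is illustrative; see the updated model card for the exact pipeline settings):

```python
from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFacePipeline

# Wrap the Hugging Face pipeline from the earlier snippet as a LangChain LLM.
llm = HuggingFacePipeline(pipeline=generate_text)

# Stuff the supplied context into the prompt automatically.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="{context}\n\n{question}",
)
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(
    context="Jon likes to fish and has red hair. He has lived in Miami since 1998.",
    question="How many years has Jon lived in Miami?",
))
```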

Databricks org

Please see the updated model card for examples on how to provide context. It should now be pretty easy to do this with LangChain given the updated pipeline code.

matthayes changed discussion status to closed
