Advice on inference over a large-ish dataset in Databricks?

#102

by archonlith - opened Aug 15, 2023

For a research project, I would like to apply a single prompt to each entry in a text column of a dataset and collect the results in a new column.
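To make the task concrete, here is a minimal sketch of the loop in plain Python. It assumes the model is wrapped in a function that takes a list of prompts and returns one string per prompt. The names `apply_prompt`, `fake_generate_batch`, the `{text}` placeholder, and the sample rows are all hypothetical; the stub only uppercases its input and stands in for a real model call.

```python
from typing import Callable, Dict, List


def apply_prompt(
    texts: List[str],
    template: str,
    generate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Fill the template with each text and run the prompts in batches."""
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        prompts = [template.format(text=t) for t in chunk]
        outputs = generate_batch(prompts)
        # Guard against a model wrapper that drops or duplicates outputs.
        if len(outputs) != len(prompts):
            raise ValueError("generate_batch must return one output per prompt")
        results.extend(outputs)
    return results


def fake_generate_batch(prompts: List[str]) -> List[str]:
    """Hypothetical stand-in for the model: uppercases each prompt."""
    return [p.upper() for p in prompts]


# Hypothetical sample data standing in for the text column.
rows: List[Dict[str, str]] = [
    {"text": "first entry"},
    {"text": "second entry"},
    {"text": "third entry"},
]

outputs = apply_prompt(
    [row["text"] for row in rows],
    "Summarize: {text}",
    fake_generate_batch,
    batch_size=2,
)

# Collect the results into a new column alongside the original text.
for row, out in zip(rows, outputs):
    row["result"] = out
```

Sending several prompts per call, instead of one at a time, is the part that would matter for run time, assuming the real model wrapper can process a batch in a single forward pass.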

I tried the 7B model in Databricks with a list of 80 prompts as a first test, and it took a few hours to finish. I expect the 40B model to be even slower, even on a larger cluster.

What is the recommended procedure here? What kind of run time and results can I expect in the best case?
