Garbage output ?

#30
by danielus - opened

Is it just me or is there something wrong with this model? I can't even translate simple sentences from Italian to English, he seems constantly hallucinating. Both the version on Groq, the local quantized q4_0 version, and the fp16 version via vLLM, this model can't follow the instructions and goes on its own

Here an example
image.png

And when he actually does manage to translate it, he mistranslates it.

The strange thing is that I'm searching around but I only find people talking about the amazing benchmarks and no one is complaining, so at this point I guess I'm the problem 😂😂😂

Another example just for completeness, with vLLM fp16 version running on a VM istance with L4 on Google Cloud:

image.png

The translation is really bad

Another example just for completeness, with vLLM fp16 version running on a VM istance with L4 on Google Cloud:

image.png

The translation is really bad
You tell it to forget everything it knows, it forgets languages :)

I don't know Italian, but I tested English to Chinese translation. While it is obviously worse than Claude 3.5, it is not as bad as "garbage".

I don't know Italian, but I tested English to Chinese translation. While it is obviously worse than Claude 3.5, it is not as bad as "garbage".

Yes, I would understand if it is a little wrong, but the translation is just an example, it fails even in the simplest tasks such as a dialogue or a summary, the feeling is that it is a base model and not instruct, because sometimes it just goes off on its own in its reasoning and dialogues, I would say almost unusable, I would specify that the same prompt on models such as gem 9b or Mistral 7b work quite well in comparison

why would you make it forget everything it knows haha? maybe try to just ask it to translate haha. do not lobotomize the poor fella

why would you make it forget everything it knows haha? maybe try to just ask it to translate haha. do not lobotomize the poor fella

hahahah this prompt works particularly well because often models add to the answer useless details and considerations, in fact works well even with model that are not specifically multilangual. (I admit I found this prompt lying around on huggingface and kept it because it worked particularly well)

I use Llama3.1-8B-Instruct to do some multilingual translation task, it DO GENERATE GARBAGE OUTPUT like :

截屏2024-07-25 14.42.38.png

So sad can't fine-tune this model to be our newest evaluation model due to its instability.

I am having the same problem with non-quantized, default settings running on 48GB VRAM on runpod.com.

I asked it to generate a story idea given a setting of Los Angeles, 1943 in the style of Steven King.

It started out fine and then...
The central conflict of the story revolves around a series of seemingly unrelated murders, all of which take place in the dead of night, during the brutal blackouts that have become a necessary evil in the war-torn city. Hawk and Rachel team up to investigate a small crew of los angeles police department officers who park a black dodge and dodge the alleyway where there latest victim "Lola Lee craving iPad label consisting irridient floor and drink liquor ble lockcase"

A major twist in the plot comes when the unlikely pair stumble upon an oily former circus performer known only as Cazzo Vallance, whose uncanny likeness to the main witness - Roger Arthurll EMO ATT fix hairstyle Given Besar stop captain faculty investigating scoop es unknown positioning stabilization

Sign up or log in to comment