Real usage query

#4
by asidaddy - opened

Hi guys, I tried using this model, and it seems you need to get it right in one shot; otherwise it doesn't respond to feedback. For example, it produces some simple code but forgets an import. I tell it so, and it reproduces the code still without the import...

It also suffers from repetition issues.
Can you say whether it was tested beyond benchmarks?

PRIME org

Hi, our model is primarily designed to solve math problems. You can find the evaluation scripts here. During our testing we observed a few instances of repetition, but it is not very common.
