ScandEval benchmarking error

by sarthu - opened Oct 3

Oct 3

I tried benchmarking the model on the Dutch-language benchmarks using the ScandEval library and got the following error:
Benchmarking library: https://github.com/ScandEval/ScandEval

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

edited about 1 month ago

Hello,

I'm trying to work out exactly what the problem is, so I need to reproduce the error. What exact command are you running? I tried running

 scandeval -m path/to/salamandra-2b-instruct/

But I get the following error:

Traceback (most recent call last):
  File "/venv/bin/scandeval", line 8, in <module>
    sys.exit(benchmark())

  File "python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)

  File "python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)

  File "python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)

  File "python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)

  File "venv/lib/python3.11/site-packages/scandeval/cli.py", line 342, in benchmark
    benchmarker(model=models)
  File "venv/lib/python3.11/site-packages/scandeval/benchmarker.py", line 781, in __call__
    return self.benchmark(*args, **kwargs)

  File "venv/lib/python3.11/site-packages/scandeval/benchmarker.py", line 597, in benchmark
    benchmark_output = self._benchmark_single(

  File "venv/lib/python3.11/site-packages/scandeval/benchmarker.py", line 730, in _benchmark_single
    dataset = dataset_factory.build_dataset(dataset_config)

  File "venv/lib/python3.11/site-packages/scandeval/dataset_factory.py", line 57, in build_dataset
    raise ValueError(
ValueError: Could not find a benchmark class for any of the following potential names: swerec, sentiment-classification, sequence-classification.

sarthu

about 1 month ago

edited about 1 month ago

The command I used was

scandeval --model <model-id> --language nl

Also, the error occurs inside the outlines library. The odd part is that all other models run fine; it only happens with this model.

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

Hello again,

I tried this, but now I get a similar error in the same place:

ValueError: Could not find a benchmark class for any of the following potential names: dutch-social, sentiment-classification, sequence-classification.

Do you know what could be the problem?

Meanwhile, I found this discussion on a similar issue:
https://github.com/dottxt-ai/outlines/issues/820

Can you try doing what the solution there suggests on your end and see what happens?

sarthu

about 1 month ago

Which version of scandeval are you using?
Mine is 13.0.0

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

I'm using 13.0.0 as well

sarthu

about 1 month ago

Can you try this:
https://github.com/ScandEval/ScandEval/issues/408

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

Hi, I seem to have fixed the import error manually. I'm running the tests now.

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

Hello, sorry for the delay.

I have experienced the same error on the conll-nl benchmark.

Please try running

pip install outlines==0.0.36

before executing the evaluation. I have done so and the evaluation seems to be running fine.
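
To confirm that the pin took effect before launching a long evaluation, the installed versions can be checked from Python. This is only a minimal sketch: the helper names `installed_version` and `matches_pin` are made up for illustration, and the expected versions are simply the ones mentioned in this thread (scandeval 13.0.0, outlines 0.0.36), so adjust them if your setup differs.

```python
from importlib import metadata
from typing import Callable, Optional

# Versions reported as working in this thread (an assumption for other setups).
EXPECTED = {"scandeval": "13.0.0", "outlines": "0.0.36"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(
    package: str,
    expected: str,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    return lookup(package) == expected


# Report each package; this never raises, it only prints the status.
for name, wanted in EXPECTED.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "mismatch"
    print(f"{name}: installed={found}, expected={wanted} ({status})")
```

If outlines shows up as a mismatch, rerunning the `pip install` command above in the same virtual environment that scandeval uses is the first thing to try.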

sarthu

about 1 month ago

Thanks a lot, it will be great to see the results on the benchmark table. Nice work!

ferran-espuna

Language Technologies Unit @ Barcelona Supercomputing Center org about 1 month ago

You're welcome. I'm closing this discussion for now. Good luck!

ferran-espuna changed discussion status to closed about 1 month ago
