Getting DatasetGenerationError: An error occurred while generating the dataset

#2
by anilpatelia - opened

Please help me solve the following error, which occurs while downloading the dataset:

/root/.cache/huggingface/modules/datasets_modules/datasets/common_voice/220833898d6a60c50f621126e51fb22eb2dfe5244392c70dccd8e6e2f055f4bf/common_voice.py:634: FutureWarning:
This version of the Common Voice dataset is deprecated.
You can download the latest one with
>>> load_dataset("mozilla-foundation/common_voice_11_0", "en")

warnings.warn(

Generating train split:   0% 0/2009 [00:00<?, ? examples/s]


ReadError Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)
1749 _time = time.time()
-> 1750 for key, record in generator:
1751 if max_shard_size is not None and writer._num_bytes > max_shard_size:

13 frames

ReadError: truncated header

The above exception was the direct cause of the following exception:

DatasetGenerationError Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _prepare_split_single(self, gen_kwargs, fpath, file_format, max_shard_size, split_info, check_duplicate_keys, job_id)
1784 if isinstance(e, SchemaInferenceError) and e.context is not None:
1785 e = e.context
-> 1786 raise DatasetGenerationError("An error occurred while generating the dataset") from e
1787
1788 yield job_id, True, (total_num_examples, total_num_bytes, writer._features, num_shards, shard_lengths)

DatasetGenerationError: An error occurred while generating the dataset
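
A note on narrowing this down: `ReadError: truncated header` is the message Python's standard `tarfile` module raises when an archive ends before a complete header block has been read. One plausible cause, which is an assumption and not something the traceback confirms, is an incomplete or corrupted download of one of the dataset archives. The sketch below checks whether a given archive can be read end to end. The helper name `tar_is_readable` is made up for illustration, and the path passed to it is a placeholder for whichever downloaded archive you want to test.

```python
import tarfile


def tar_is_readable(path):
    """Return True if every member of the tar archive at `path` can be read in full."""
    try:
        # "r:*" lets tarfile detect gzip/bz2/xz compression automatically.
        with tarfile.open(path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    extracted = tar.extractfile(member)
                    # Read the member in 1 MiB chunks so truncated data is detected
                    # without holding the whole file in memory.
                    while extracted.read(1 << 20):
                        pass
    except (tarfile.ReadError, EOFError, OSError):
        # ReadError covers truncated or invalid tar headers and data;
        # EOFError and OSError cover truncated or invalid compressed streams.
        return False
    return True
```

If the check returns False for a cached archive, re-running `load_dataset` with `download_mode="force_redownload"` may be worth trying, since it discards the cached files and downloads them again. Whether that resolves this particular error is not certain.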
