Training failure
Hello, good afternoon. I have tried training several times today, but I keep getting this error. I restart it, and in the end it is canceled, yet the system keeps charging me for it.
Exception: Training failed.
INFO | 2024-09-08 23:40:52 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO | 2024-09-08 23:40:52 | autotrain.trainers.common:pause_space:77 - Pausing space...
AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.. Did you mean: 'strings'?
Traceback (most recent call last):
File "/app/omrozvn/script.py", line 129, in <module>
main()
File "/app/omrozvn/script.py", line 123, in main
do_train(script_args)
File "/app/omrozvn/script.py", line 26, in do_train
raise Exception("Training failed.")
Exception: Training failed.
INFO | 2024-09-08 23:46:03 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO | 2024-09-08 23:46:03 | autotrain.trainers.common:pause_space:77 - Pausing space...
+1. I just got this same error right after latent caching. Logs:
Caching latents: 97%|█████████▋| 36/37 [00:22<00:00, 1.59it/s]
Caching latents: 100%|██████████| 37/37 [00:23<00:00, 1.50it/s]
Caching latents: 100%|██████████| 37/37 [00:23<00:00, 1.59it/s]
Traceback (most recent call last):
File "/app/aerial-photography/trainer.py", line 2136, in <module>
main(args)
File "/app/aerial-photography/trainer.py", line 1673, in main
accelerator.init_trackers("dreambooth-lora-sd-xl", config=vars(args))
File "/app/env/lib/python3.10/site-packages/accelerate/accelerator.py", line 619, in _inner
return PartialState().on_main_process(function)(*args, **kwargs)
File "/app/env/lib/python3.10/site-packages/accelerate/accelerator.py", line 2337, in init_trackers
tracker.store_init_configuration(config)
File "/app/env/lib/python3.10/site-packages/accelerate/tracking.py", line 79, in execute_on_main_process
return PartialState().on_main_process(function)(self, *args, **kwargs)
File "/app/env/lib/python3.10/site-packages/accelerate/tracking.py", line 211, in store_init_configuration
self.writer.add_hparams(values, metric_dict={})
File "/app/env/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 330, in add_hparams
exp, ssi, sei = hparams(hparam_dict, metric_dict, hparam_domain_discrete)
File "/app/env/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 194, in hparams
from tensorboard.plugins.hparams.metadata import (
File "/app/env/lib/python3.10/site-packages/tensorboard/plugins/hparams/metadata.py", line 32, in <module>
NULL_TENSOR = tensor_util.make_tensor_proto(
File "/app/env/lib/python3.10/site-packages/tensorboard/util/tensor_util.py", line 405, in make_tensor_proto
numpy_dtype = dtypes.as_dtype(nparray.dtype)
File "/app/env/lib/python3.10/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py", line 677, in as_dtype
if type_value.type == np.string_ or type_value.type == np.unicode_:
File "/app/env/lib/python3.10/site-packages/numpy/__init__.py", line 397, in __getattr__
raise AttributeError(
AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.. Did you mean: 'strings'?
Traceback (most recent call last):
File "/app/aerial-photography/script.py", line 129, in <module>
main()
File "/app/aerial-photography/script.py", line 123, in main
do_train(script_args)
File "/app/aerial-photography/script.py", line 26, in do_train
raise Exception("Training failed.")
Exception: Training failed.
INFO | 2024-09-10 14:14:31 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO | 2024-09-10 14:14:31 | autotrain.trainers.common:pause_space:77 - Pausing space...
This should be fixed now.
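For anyone who hits the same traceback in an environment they manage themselves, here is a minimal sketch for checking whether the installed NumPy is the cause. The helper name `string_alias_removed` is hypothetical, and this only illustrates the incompatibility reported in the logs above (`np.string_` missing on NumPy 2.x); it is not necessarily what was changed on the AutoTrain side.

```python
import numpy as np


def string_alias_removed() -> bool:
    """Return True if this NumPy no longer exposes the np.string_ alias.

    Hypothetical helper for illustration: on NumPy 2.x, accessing
    np.string_ raises AttributeError, so hasattr() returns False.
    """
    return not hasattr(np, "string_")


# Major version of the installed NumPy, e.g. 1 or 2.
numpy_major = int(np.__version__.split(".")[0])

if string_alias_removed():
    # Libraries that still reference np.string_ (as the tensorboard stub in
    # the traceback does) will fail on import with this NumPy.
    print(f"NumPy {np.__version__}: np.string_ is gone, use np.bytes_ instead")
else:
    print(f"NumPy {np.__version__}: np.string_ is still available")

# np.bytes_ exists on both NumPy 1.x and 2.x, so it is the portable spelling.
is_bytes_dtype = np.array(b"abc").dtype.type is np.bytes_
```

If the check reports that the alias is gone and a dependency still uses it, a commonly used workaround is to install a NumPy version below 2.0 or to upgrade the dependency to a release that supports NumPy 2.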