# Training

From the previous tutorials, you may now have a custom model and a data loader.
To run training, users typically prefer one of the following two styles:

### Custom Training Loop

With a model and a data loader ready, everything else needed to write a training loop can
be found in PyTorch, and you are free to write the training loop yourself.
This style allows researchers to manage the entire training logic more clearly and have full control.
One such example is provided in [tools/plain_train_net.py](../../tools/plain_train_net.py).

Any customization of the training logic is then easily controlled by the user.
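
As a rough illustration of this style, the sketch below runs a hand-written loop using only PyTorch. It is not a copy of `tools/plain_train_net.py`: `ToyModel`, `infinite`, and `train` are hypothetical names, the data is synthetic, and the only convention assumed from detectron2 is that the model returns a dict of losses in training mode.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ToyModel(nn.Module):
    """Hypothetical stand-in for a real model: returns a dict of losses."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, batch):
        inputs, targets = batch
        prediction = self.linear(inputs)
        return {"loss_mse": nn.functional.mse_loss(prediction, targets)}


def infinite(loader):
    """Cycle over a finite data loader forever."""
    while True:
        yield from loader


def train(model, data_loader, optimizer, max_iter):
    """Plain training loop; returns the total loss of every iteration."""
    model.train()
    history = []
    for _, batch in zip(range(max_iter), data_loader):
        loss_dict = model(batch)
        losses = sum(loss_dict.values())  # combine all loss terms
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
        history.append(losses.item())
    return history


torch.manual_seed(0)
inputs = torch.randn(64, 4)
targets = inputs.sum(dim=1, keepdim=True)  # synthetic regression target
data_loader = DataLoader(TensorDataset(inputs, targets), batch_size=64)

model = ToyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
history = train(model, infinite(data_loader), optimizer, max_iter=200)
print(f"first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
```

Because the loop is ordinary Python, adding gradient clipping, a second optimizer, or custom logging means editing `train` directly.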

### Trainer Abstraction

We also provide a standardized "trainer" abstraction with a
hook system that helps simplify the standard training behavior.
It includes the following two instantiations:

* [SimpleTrainer](../modules/engine.html#detectron2.engine.SimpleTrainer)
  provides a minimal training loop for single-cost single-optimizer single-data-source training, with nothing else.
  Other tasks (checkpointing, logging, etc.) can be implemented using
  [the hook system](../modules/engine.html#detectron2.engine.HookBase).
* [DefaultTrainer](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) is a `SimpleTrainer` initialized from a
  yacs config, used by
  [tools/train_net.py](../../tools/train_net.py) and many scripts.
  It includes more standard default behaviors that one might want to opt into,
  including default configurations for optimizer, learning rate schedule,
  logging, evaluation, checkpointing, etc.
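
To make the split between a minimal loop and its hooks concrete, here is a deliberately simplified, pure-Python sketch of the pattern. It is not detectron2's implementation: `ToyHook`, `ToyTrainer`, and `EveryNHook` are hypothetical names and the step body is a placeholder. Only the callback names (`before_train`, `before_step`, `after_step`, `after_train`) are borrowed from `HookBase`.

```python
class ToyHook:
    """Base hook: every callback is a no-op unless a subclass overrides it."""

    trainer = None  # set by ToyTrainer.register_hooks

    def before_train(self):
        pass

    def after_train(self):
        pass

    def before_step(self):
        pass

    def after_step(self):
        pass


class ToyTrainer:
    """Minimal loop that only knows how to step and when to call hooks."""

    def __init__(self, max_iter):
        self.iter = 0
        self.max_iter = max_iter
        self._hooks = []

    def register_hooks(self, hooks):
        for hook in hooks:
            hook.trainer = self  # give the hook access to trainer state
            self._hooks.append(hook)

    def run_step(self):
        pass  # placeholder for forward, backward, and optimizer step

    def train(self):
        for hook in self._hooks:
            hook.before_train()
        for self.iter in range(self.max_iter):
            for hook in self._hooks:
                hook.before_step()
            self.run_step()
            for hook in self._hooks:
                hook.after_step()
        for hook in self._hooks:
            hook.after_train()


class EveryNHook(ToyHook):
    """Records the iterations at which it fired, once every `period` steps."""

    def __init__(self, period):
        self.period = period
        self.seen = []

    def after_step(self):
        if self.trainer.iter % self.period == 0:
            self.seen.append(self.trainer.iter)


trainer = ToyTrainer(max_iter=10)
hook = EveryNHook(period=4)
trainer.register_hooks([hook])
trainer.train()
print(hook.seen)
```

The loop itself stays small, and everything periodic (checkpointing, evaluation, logging) lives in hooks that can be added or removed independently.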

To customize a `DefaultTrainer`:

1. For simple customizations (e.g. change optimizer, evaluator, LR scheduler, data loader, etc.), override [its methods](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) in a subclass, just like [tools/train_net.py](../../tools/train_net.py).
2. For extra tasks during training, check the
   [hook system](../modules/engine.html#detectron2.engine.HookBase) to see if it's supported.

   As an example, to print hello during training:
   ```python
   from detectron2.engine import HookBase

   class HelloHook(HookBase):
       def after_step(self):
           # self.trainer is set when the hook is registered with a trainer
           if self.trainer.iter % 100 == 0:
               print(f"Hello at iteration {self.trainer.iter}!")
   ```
3. Using a trainer+hook system means there will always be some non-standard behaviors that cannot be supported, especially in research.
   For this reason, we intentionally keep the trainer & hook system minimal, rather than powerful.
   If anything cannot be achieved by such a system, it's easier to start from [tools/plain_train_net.py](../../tools/plain_train_net.py) and implement custom training logic manually.

### Logging of Metrics

During training, detectron2 models and trainers put metrics into a centralized [EventStorage](../modules/utils.html#detectron2.utils.events.EventStorage).
You can use the following code to access it and log metrics to it:
```python
from detectron2.utils.events import get_event_storage

# inside the model:
if self.training:
    value = ...  # compute the value from inputs
    storage = get_event_storage()
    storage.put_scalar("some_accuracy", value)
```

Refer to its documentation for more details.

Metrics are then written to various destinations with [EventWriter](../modules/utils.html#module-detectron2.utils.events).
`DefaultTrainer` enables a few `EventWriter`s with default configurations.
See the section above on customizing a `DefaultTrainer` for how to change them.