Image Processor
An image processor is in charge of preparing input features for vision models and post processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax and Numpy tensors. It may also include model specific post-processing such as converting logits to segmentation masks.
ImageProcessingMixin
This is an image processor mixin used to provide saving/loading functionality for sequential and image feature extractors.
from_pretrained
< source >( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )
Parameters
- pretrained_model_name_or_path (
str
oros.PathLike
) — This can be either:- a string, the model id of a pretrained image_processor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like
bert-base-uncased
, or namespaced under a user or organization name, likedbmdz/bert-base-german-cased
. - a path to a directory containing a image processor file saved using the
save_pretrained() method, e.g.,
./my_model_directory/
. - a path or url to a saved image processor JSON file, e.g.,
./my_model_directory/preprocessor_config.json
.
- a string, the model id of a pretrained image_processor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like
- cache_dir (
str
oros.PathLike
, optional) — Path to a directory in which a downloaded pretrained model image processor should be cached if the standard cache should not be used. - force_download (
bool
, optional, defaults toFalse
) — Whether or not to force to (re-)download the image processor files and override the cached versions if they exist. - resume_download (
bool
, optional, defaults toFalse
) — Whether or not to delete incompletely received file. Attempts to resume the download if such a file exists. - proxies (
Dict[str, str]
, optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g.,{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.
The proxies are used on each request. - token (
str
orbool
, optional) — The token to use as HTTP bearer authorization for remote files. IfTrue
, or not specified, will use the token generated when runninghuggingface-cli login
(stored in~/.huggingface
). - revision (
str
, optional, defaults to"main"
) — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, sorevision
can be any identifier allowed by git.
Instantiate a type of ImageProcessingMixin from an image processor.
Examples:
# We can't instantiate directly the base class *ImageProcessingMixin* so let's show the examples on a
# derived class: *CLIPImageProcessor*
image_processor = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32"
) # Download image_processing_config from huggingface.co and cache.
image_processor = CLIPImageProcessor.from_pretrained(
"./test/saved_model/"
) # E.g. image processor (or model) was saved using *save_pretrained('./test/saved_model/')*
image_processor = CLIPImageProcessor.from_pretrained("./test/saved_model/preprocessor_config.json")
image_processor = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32", do_normalize=False, foo=False
)
assert image_processor.do_normalize is False
image_processor, unused_kwargs = CLIPImageProcessor.from_pretrained(
"openai/clip-vit-base-patch32", do_normalize=False, foo=False, return_unused_kwargs=True
)
assert image_processor.do_normalize is False
assert unused_kwargs == {"foo": False}
save_pretrained
< source >( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )
Parameters
- save_directory (
str
oros.PathLike
) — Directory where the image processor JSON file will be saved (will be created if it does not exist). - push_to_hub (
bool
, optional, defaults toFalse
) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to withrepo_id
(will default to the name ofsave_directory
in your namespace). - kwargs (
Dict[str, Any]
, optional) — Additional key word arguments passed along to the push_to_hub() method.
Save an image processor object to the directory save_directory
, so that it can be re-loaded using the
from_pretrained() class method.
BatchFeature
class transformers.BatchFeature
< source >( data: typing.Union[typing.Dict[str, typing.Any], NoneType] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )
Parameters
Holds the output of the pad() and feature extractor specific __call__
methods.
This class is derived from a python dictionary and can be used as a dictionary.
convert_to_tensors
< source >( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
- tensor_type (
str
or TensorType, optional) — The type of tensors to use. Ifstr
, should be one of the values of the enum TensorType. IfNone
, no modification is done.
Convert the inner content to tensors.
to
< source >( *args **kwargs ) → BatchFeature
Send all values to device by calling v.to(*args, **kwargs)
(PyTorch only). This should support casting in
different dtypes
and sending the BatchFeature
to a different device
.
BaseImageProcessor
center_crop
< source >( image: ndarray size: typing.Dict[str, int] data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs )
Parameters
- image (
np.ndarray
) — Image to center crop. - size (
Dict[str, int]
) — Size of the output image. - data_format (
str
orChannelDimension
, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
- input_data_format (
ChannelDimension
orstr
, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
Center crop an image to (size["height"], size["width"])
. If the input size is smaller than crop_size
along
any edge, the image is padded with 0’s and then center cropped.
normalize
< source >( image: ndarray mean: typing.Union[float, typing.Iterable[float]] std: typing.Union[float, typing.Iterable[float]] data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs ) → np.ndarray
Parameters
- image (
np.ndarray
) — Image to normalize. - mean (
float
orIterable[float]
) — Image mean to use for normalization. - std (
float
orIterable[float]
) — Image standard deviation to use for normalization. - data_format (
str
orChannelDimension
, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
- input_data_format (
ChannelDimension
orstr
, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
Returns
np.ndarray
The normalized image.
Normalize an image. image = (image - image_mean) / image_std.
rescale
< source >( image: ndarray scale: float data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None input_data_format: typing.Union[transformers.image_utils.ChannelDimension, str, NoneType] = None **kwargs ) → np.ndarray
Parameters
- image (
np.ndarray
) — Image to rescale. - scale (
float
) — The scaling factor to rescale pixel values by. - data_format (
str
orChannelDimension
, optional) — The channel dimension format for the output image. If unset, the channel dimension format of the input image is used. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
- input_data_format (
ChannelDimension
orstr
, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:"channels_first"
orChannelDimension.FIRST
: image in (num_channels, height, width) format."channels_last"
orChannelDimension.LAST
: image in (height, width, num_channels) format.
Returns
np.ndarray
The rescaled image.
Rescale an image by a scale factor. image = image * scale.