---
language: en
library_name: transformers
tags:
  - vision
  - image-segmentation
  - nvidia/mit-b5
  - transformers.js
  - onnx
datasets:
  - celebamaskhq
---

# Face Parsing

![example image and output](demo.png)

[Semantic segmentation](https://huggingface.co./docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co./nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co./docs/transformers/model_doc/segformer).

> ONNX model for web inference contributed by [Xenova](https://huggingface.co./Xenova).

## Usage in Python

```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

from PIL import Image
import matplotlib.pyplot as plt
import requests

# convenience expression for automatically determining device
device = (
    "cuda"
    # Device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # Device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)

# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)

# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)

# run inference on image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)

# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # H x W
    mode="bilinear",
    align_corners=False,
)

# get label masks
labels = upsampled_logits.argmax(dim=1)[0]

# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```

## Usage in the browser (Transformers.js)

```js
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0";

// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;

// instantiate image segmentation pipeline with pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

// image to segment (same example image as in the Python section)
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";

// async inference since it could take a few seconds
const output = await model(url);

// each label is a separate mask object
// [
//   { score: null, label: 'background', mask: transformers.js RawImage { ... }}
//   { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
//    ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```

### p5.js

Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions.

```js
// ...

// asynchronously load transformers.js and instantiate model
async function preload() {
  // load transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.14.0"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");

  print("face-parsing model loaded");
}

// ...
```

[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)

### Model Description

- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation image model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co./docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).

## Limitations and Bias

### Bias

While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative. It also consists solely of images of celebrities.
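
Because the training data skews toward celebrity portraits, it can be worth checking how the predicted labels are distributed on your own images before relying on the output. The following is a minimal sketch, not part of the original model card: it assumes a 2D label map like `labels_viz` from the Python usage example, and it uses a hypothetical, shortened `id2label` mapping purely for illustration (the real mapping can typically be read from `model.config.id2label`). The helper name `class_pixel_share` is also made up for this example.

```python
import numpy as np

# Hypothetical, shortened id -> label mapping for illustration only;
# in practice, read the real mapping from model.config.id2label.
id2label = {0: "background", 1: "skin", 2: "hair"}


def class_pixel_share(label_map, id2label):
    """Return the fraction of pixels assigned to each class name."""
    ids, counts = np.unique(label_map, return_counts=True)
    total = label_map.size
    return {
        id2label.get(int(i), f"unknown_{int(i)}"): int(c) / total
        for i, c in zip(ids, counts)
    }


# Toy 4x4 label map standing in for `labels_viz` from the usage example
label_map = np.array(
    [
        [0, 0, 2, 2],
        [0, 1, 1, 2],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
)

shares = class_pixel_share(label_map, id2label)

# Boolean mask selecting a single class (here the hypothetical id for "hair")
hair_mask = label_map == 2
```

A class that is consistently missing or tiny on images where it should be present (for example, hair or skin on faces unlike those in the training set) is a hint that the model may not generalize well to that input.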