---
language: en
library_name: transformers
tags:
- vision
- image-segmentation
- nvidia/mit-b5
- transformers.js
- onnx
datasets:
- celebamaskhq
---
# Face Parsing
![example image and output](demo.png)
[Semantic segmentation](https://huggingface.co./docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co./nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co./docs/transformers/model_doc/segformer).
> ONNX model for web inference contributed by [Xenova](https://huggingface.co./Xenova).
## Usage in Python
```python
import torch
from torch import nn
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import matplotlib.pyplot as plt
import requests
# convenience expression for automatically determining device
device = (
    "cuda"
    # device for NVIDIA or AMD GPUs
    if torch.cuda.is_available()
    else "mps"
    # device for Apple Silicon (Metal Performance Shaders)
    if torch.backends.mps.is_available()
    else "cpu"
)
# load models
image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
model.to(device)
# expects a PIL.Image or torch.Tensor
url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
image = Image.open(requests.get(url, stream=True).raw)
# run inference on image
inputs = image_processor(images=image, return_tensors="pt").to(device)
outputs = model(**inputs)
logits = outputs.logits # shape (batch_size, num_labels, ~height/4, ~width/4)
# resize output to match input image dimensions
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],  # H x W (PIL .size is W x H)
    mode="bilinear",
    align_corners=False,
)
# get label masks
labels = upsampled_logits.argmax(dim=1)[0]
# move to CPU to visualize in matplotlib
labels_viz = labels.cpu().numpy()
plt.imshow(labels_viz)
plt.show()
```
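The integer ids in `labels` map to face regions through the checkpoint's config. As a minimal follow-on sketch (assuming this checkpoint ships the standard `id2label`/`label2id` mappings and includes a `hair` entry, as the browser example below suggests), you can pull out a single region as a binary mask:

```python
# inspect the id -> label mapping shipped with the checkpoint
print(model.config.id2label)

# binary mask for one region, e.g. hair (assumes a "hair" entry exists)
hair_id = model.config.label2id["hair"]
hair_mask = (labels == hair_id).cpu().numpy()

plt.imshow(hair_mask, cmap="gray")
plt.show()
```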
## Usage in the browser (Transformers.js)
```js
import {
pipeline,
env,
} from "https://cdn.jsdelivr.net/npm/@xenova/[email protected]";
// important to prevent errors since the model files are likely remote on HF hub
env.allowLocalModels = false;
// instantiate image segmentation pipeline with pretrained face parsing model
const model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
// async inference since it could take a few seconds
const url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6";
const output = await model(url);
// each label is a separate mask object
// [
// { score: null, label: 'background', mask: transformers.js RawImage { ... }}
// { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
// ...
// ]
for (const m of output) {
  console.log(`Found ${m.label}`);
  m.mask.save(`${m.label}.png`);
}
```
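Beyond saving PNGs, the raw mask pixels are accessible on each returned `RawImage`. A rough sketch for computing per-label pixel coverage (assuming single-channel masks where a value of 255 marks pixels belonging to the label, which is how this transformers.js pipeline returns them):

```js
// fraction of the image covered by each label
for (const m of output) {
  const { data, width, height } = m.mask;
  let count = 0;
  for (let i = 0; i < data.length; ++i) {
    if (data[i] === 255) count++;
  }
  const pct = ((100 * count) / (width * height)).toFixed(1);
  console.log(`${m.label}: ${pct}% of pixels`);
}
```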
### p5.js
Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care when loading the model and making predictions so that neither blocks the sketch.
```js
// ...
// global handle for the segmentation pipeline (assigned in preload)
let model;

// asynchronously load transformers.js and instantiate the model
async function preload() {
  // load the transformers.js library with a dynamic import
  const { pipeline, env } = await import(
    "https://cdn.jsdelivr.net/npm/@xenova/[email protected]"
  );

  // important to prevent errors since the model files are remote on HF hub
  env.allowLocalModels = false;

  // instantiate image segmentation pipeline with pretrained face parsing model
  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
  print("face-parsing model loaded");
}
// ...
```
[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)
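Inference itself should also stay off the blocking path of `draw()`. One minimal pattern (a sketch, not the linked example verbatim; `img` is assumed to be a p5.Image loaded elsewhere in the sketch) is to fire the async call once and render whenever results arrive:

```js
let segments; // filled in once inference finishes

function draw() {
  image(img, 0, 0);

  // kick off inference once the model is ready, without blocking the loop
  if (model && segments === undefined) {
    segments = null; // mark as in-flight
    model(img.canvas.toDataURL()).then((output) => {
      segments = output;
      print(`found ${segments.length} segments`);
    });
  }
}
```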
## Model Description
- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
- **Model type:** Transformer-based semantic segmentation image model
- **License:** non-commercial research and educational purposes
- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co./docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).
## Limitations and Bias
### Bias
While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative of the general population. Also, the images are exclusively of celebrities.