jonathandinu
/

face-parsing

@@ -1,15 +1,133 @@
 ---
 language: en
-license: cc0-1.0
 library_name: transformers
 tags:
-- vision
-- image-segmentation
-- nvidia/mit-b5
-- transformers.js
-- onnx
 datasets:
-- celebamaskhq
 ---
-## Face Parsing

 ---
 language: en
 library_name: transformers
 tags:
+  - vision
+  - image-segmentation
+  - nvidia/mit-b5
+  - transformers.js
+  - onnx
 datasets:
+  - celebamaskhq
 ---
+# Face Parsing
+[Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).
+> ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).
+## Usage in Python
+```python
+import torch
+from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
+from PIL import Image
+import requests
+# convenience expression for automatically determining device
+device = (
+    "cuda"
+    # Device for NVIDIA or AMD GPUs
+    if torch.cuda.is_available()
+    else "mps"
+    # Device for Apple Silicon (Metal Performance Shaders)
+    if torch.backends.mps.is_available()
+    else "cpu"
+)
+# load models
+image_processor = SegformerImageProcessor.from_pretrained("jonathandinu/face-parsing")
+model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-parsing")
+model.to(device)
+# expects a PIL.Image or torch.Tensor
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+pixel_values = F.resize(image, (512, 512)).unsqueeze(0)
+# run inference on image
+inputs = image_processor(images=image, return_tensors="pt")
+outputs = model(**inputs)
+logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
+# resize output to match input image dimensions
+upsampled_logits = nn.functional.interpolate(logits,
+                size=image.shape[1:], # H x W
+                mode='bilinear',
+                align_corners=False)
+# get label masks
+masks = upsampled_logits.argmax(dim=1)[0]
+```
+## Usage in the browser (Transformers.js)
+```js
+import {
+  pipeline,
+  env,
+} from "https://cdn.jsdelivr.net/npm/@xenova/[email protected]";
+// important to prevent errors since the model files are likely remote on HF hub
+env.allowLocalModels = false;
+// instantiate image segmentation pipeline with pretrained face parsing model
+model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
+// async inference since it could take a few seconds
+const output = await model(url);
+// each label is a separate mask object
+// [
+//   { score: null, label: 'background', mask: transformers.js RawImage { ... }}
+//   { score: null, label: 'hair', mask: transformers.js RawImage { ... }}
+//    ...
+// ]
+for (const m of output) {
+  print(`Found ${m.label}`);
+  m.mask.save(`${m.label}.png`);
+}
+```
+### p5.js
+Since [p5.js](https://p5js.org/) uses an animation loop abstraction, we need to take care loading the model and making predictions.
+```js
+// ...
+// asynchronously load transformers.js and instantiate model
+async function preload() {
+  // load transformers.js library with a dynamic import
+  const { pipeline, env } = await import(
+    "https://cdn.jsdelivr.net/npm/@xenova/[email protected]"
+  );
+  // important to prevent errors since the model files are remote on HF hub
+  env.allowLocalModels = false;
+  // instantiate image segmentation pipeline with pretrained face parsing model
+  model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
+  print("face-parsing model loaded");
+  loading = false;
+}
+// ...
+```
+[full p5.js example](https://editor.p5js.org/jonathan.ai/sketches/wZn15Dvgh)
+### Model Description
+- **Developed by:** [Jonathan Dinu](https://twitter.com/jonathandinu)
+- **Model type:** Transformer-based semantic segmentation image model
+- **License:** non-commercial research and educational purposes
+- **Resources for more information:** Transformers docs on [Segformer](https://huggingface.co/docs/transformers/model_doc/segformer) and/or the [original research paper](https://arxiv.org/abs/2105.15203).
+## Limitations and Bias
+### Bias
+While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily perfectly diverse.