jonathandinu/face-parsing

@@ -13,6 +13,8 @@ datasets:
 # Face Parsing
 [Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).
 > ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).
@@ -21,8 +23,11 @@ datasets:
 ```python
 import torch
 from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
 from PIL import Image
 import requests
 # convenience expression for automatically determining device
@@ -42,23 +47,27 @@ model = SegformerForSemanticSegmentation.from_pretrained("jonathandinu/face-pars
 model.to(device)
 # expects a PIL.Image or torch.Tensor
-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
 image = Image.open(requests.get(url, stream=True).raw)
-pixel_values = F.resize(image, (512, 512)).unsqueeze(0)
 # run inference on image
-inputs = image_processor(images=image, return_tensors="pt")
 outputs = model(**inputs)
-logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
 # resize output to match input image dimensions
 upsampled_logits = nn.functional.interpolate(logits,
-                size=image.shape[1:], # H x W
                 mode='bilinear',
                 align_corners=False)
 # get label masks
-masks = upsampled_logits.argmax(dim=1)[0]
 ```
 ## Usage in the browser (Transformers.js)
@@ -111,7 +120,6 @@ async function preload() {
   model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
   print("face-parsing model loaded");
-  loading = false;
 }
 // ...
@@ -130,4 +138,4 @@ async function preload() {
 ### Bias
-While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily perfectly diverse.

 # Face Parsing
+![example image and output](demo.png)
 [Semantic segmentation](https://huggingface.co/docs/transformers/tasks/semantic_segmentation) model fine-tuned from [nvidia/mit-b5](https://huggingface.co/nvidia/mit-b5) with [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) for face parsing. For additional options, see the Transformers [Segformer docs](https://huggingface.co/docs/transformers/model_doc/segformer).
 > ONNX model for web inference contributed by [Xenova](https://huggingface.co/Xenova).
 ```python
 import torch
+from torch import nn
 from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
 from PIL import Image
+import matplotlib.pyplot as plt
 import requests
 # convenience expression for automatically determining device
 model.to(device)
 # expects a PIL.Image or torch.Tensor
+url = "https://images.unsplash.com/photo-1539571696357-5a69c17a67c6"
 image = Image.open(requests.get(url, stream=True).raw)
 # run inference on image
+inputs = image_processor(images=image, return_tensors="pt").to(device)
 outputs = model(**inputs)
+logits = outputs.logits  # shape (batch_size, num_labels, ~height/4, ~width/4)
 # resize output to match input image dimensions
 upsampled_logits = nn.functional.interpolate(logits,
+                size=image.size[::-1], # H x W
                 mode='bilinear',
                 align_corners=False)
 # get label masks
+labels = upsampled_logits.argmax(dim=1)[0]
+# move to CPU to visualize in matplotlib
+labels_viz = labels.cpu().numpy()
+plt.imshow(labels_viz)
+plt.show()
 ```
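
The upsampling and `argmax` steps above can be checked in isolation, without downloading the model. The following is a minimal sketch under stated assumptions: the random `logits` tensor, the 19-class count, and the 512 x 384 image size are hypothetical stand-ins chosen for illustration, not values taken from this repository. It also shows why the PIL size is reversed: `Image.size` is `(width, height)`, while `interpolate` expects `(height, width)`.

```python
import torch
from torch import nn

# Hypothetical stand-ins so the sketch runs without the real model:
# the class count and image size below are assumptions for illustration.
num_labels = 19
pil_size = (512, 384)  # PIL's Image.size is (width, height)

# Fake logits at a reduced resolution, standing in for outputs.logits
logits = torch.randn(1, num_labels, 128, 128)

# interpolate expects (height, width), hence the reversed PIL size
upsampled = nn.functional.interpolate(
    logits, size=pil_size[::-1], mode="bilinear", align_corners=False
)

# per-pixel class ids, shape (height, width)
labels = upsampled.argmax(dim=1)[0]

# one boolean mask per class id that actually appears in the label map
masks = {int(i): labels == i for i in labels.unique()}
```

With the real model, `model.config.id2label` should map each key of `masks` to a class name, assuming the checkpoint's config defines that mapping.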
 ## Usage in the browser (Transformers.js)
   model = await pipeline("image-segmentation", "jonathandinu/face-parsing");
   print("face-parsing model loaded");
 }
 // ...
 ### Bias
+While the capabilities of computer vision models are impressive, they can also reinforce or exacerbate social biases. The [CelebAMask-HQ](https://github.com/switchablenorms/CelebAMask-HQ) dataset used for fine-tuning is large but not necessarily diverse or representative. It also consists only of images of celebrities.

demo.png ADDED