Spaces: Vertaix/vendiscore


danf0 committed on Aug 30, 2022

Commit e094451 (1 parent: b2dcfc6)

Update README


Files changed (1)

README.md +52 -60


@@ -80,75 +80,67 @@ Given n samples, the value of the Vendi Score ranges between 1 and n, with highe
 ### Examples
-```python
-import numpy as np
-vendiscore = evaluate.load("danf0/vendiscore")
-samples = [0, 0, 10, 10, 20, 20]
-k = lambda a, b: np.exp(-np.abs(a - b))
-vendiscore.compute(samples, k)
-# 2.9999
 ```
 If you already have precomputed a similarity matrix:
-```python
-K = np.array([[1.0, 0.9, 0.0],
-              [0.9, 1.0, 0.0],
-              [0.0, 0.0, 1.0]])
-vendiscore.compute(K, score_K=True)
-# 2.1573
 ```
-If your similarity function is a dot product between normalized
-embeddings $X\in\mathbb{R}^{n\times d}$, and $d < n$, it is faster
-to compute the Vendi Score using the covariance matrix,
-$\frac{1}{n} \sum_i x_i x_i^{\top}$:
-```python
-vendiscore.compute(X, score_dual=True)
 ```
-If the rows of $X$ are not normalized, set `normalize = True`.
-Images:
-```python
-from torchvision import datasets
-mnist = datasets.MNIST("data/mnist", train=False, download=True)
-digits = [[x for x, y in mnist if y == c] for c in range(10)]
-pixel_vs = [vendiscore.compute(imgs, k="pixels") for imgs in digits]
-# The default embeddings are from the pool-2048 layer of the torchvision
-# Inception v3 model.
-inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda") for imgs in digits]
-for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)): print(f"{y}\t{pvs:.02f}\t{ivs:02f}")
-# Output:
-# 0       7.68    3.45
-# 1       5.31    3.50
-# 2       12.18   3.62
-# 3       9.97    2.97
-# 4       11.10   3.75
-# 5       13.51   3.16
-# 6       9.06    3.63
-# 7       9.58    4.07
-# 8       9.69    3.74
-# 9       8.56    3.43
 ```
-Text:
-```python
-sents = ["Look, Jane.",
-         "See Spot.",
-         "See Spot run.",
-         "Run, Spot, run.",
-	 "Jane sees Spot run."]
-ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])
-bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")
-simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")
-print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}, SimCSE: {simcse_vs:.02f})
-# N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
 ```
 ## Limitations and Bias

 ### Examples
+```
+>>> import numpy as np
+>>> vendiscore = evaluate.load("danf0/vendiscore")
+>>> samples = [0, 0, 10, 10, 20, 20]
+>>> k = lambda a, b: np.exp(-np.abs(a - b))
+>>> vendiscore.compute(samples, k)
+2.9999
 ```
 If you already have precomputed a similarity matrix:
+```
+>>> K = np.array([[1.0, 0.9, 0.0],
+...               [0.9, 1.0, 0.0],
+...               [0.0, 0.0, 1.0]])
+>>> vendiscore.compute(K, score_K=True)
+2.1573
+```
+If your similarity function is a dot product between `n` normalized
+`d`-dimensional embeddings `X`, and `d` < `n`, it is faster
+to compute the Vendi Score using the covariance matrix, `X.T @ X / n`.
+(If the rows of `X` are not normalized, set `normalize=True`.)
+```
+>>> X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]])
+>>> vendiscore.compute(X, score_dual=True, normalize=True)
+1.9989...
 ```
+Image similarity can be calculated using inner products between pixel vectors or between embeddings from a neural network.
+The default embeddings are from the pool-2048 layer of the torchvision version of the Inception v3 model; other embedding functions can be passed to the `model` argument.
 ```
+>>> from torchvision import datasets
+>>> mnist = datasets.MNIST("data/mnist", train=False, download=True)
+>>> digits = [[x for x, y in mnist if y == c] for c in range(10)]
+>>> pixel_vs = [vendiscore.compute(imgs, k="pixels") for imgs in digits]
+>>> inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda") for imgs in digits]
+>>> for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)): print(f"{y}\t{pvs:.02f}\t{ivs:.02f}")
+0       7.68    3.45
+1       5.31    3.50
+2       12.18   3.62
+3       9.97    2.97
+4       11.10   3.75
+5       13.51   3.16
+6       9.06    3.63
+7       9.58    4.07
+8       9.69    3.74
+9       8.56    3.43
 ```
+Text similarity can be calculated using n-gram overlap or using inner products between embeddings from a neural network.
+```
+>>> sents = ["Look, Jane.",
+...          "See Spot.",
+...          "See Spot run.",
+...          "Run, Spot, run.",
+...          "Jane sees Spot run."]
+>>> ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])
+>>> bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")
+>>> simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")
+>>> print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}, SimCSE: {simcse_vs:.02f}")
+N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
 ```
 ## Limitations and Bias
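
Note on the examples in this hunk: the precomputed-matrix and covariance ("dual") paths can be sanity-checked without loading the metric. The sketch below is not the Space's implementation. It assumes the standard definition of the Vendi Score as the exponential of the Shannon entropy of the eigenvalues of `K / n`, and the helper names `vendi_score_from_K` and `vendi_score_dual` are hypothetical.

```python
import numpy as np


def _exp_entropy(eigenvalues, eps=1e-12):
    """Exponential of the Shannon entropy of a set of eigenvalues."""
    w = np.asarray(eigenvalues, dtype=float)
    w = w[w > eps]  # drop zero (or slightly negative) eigenvalues: 0 * log(0) := 0
    return float(np.exp(-np.sum(w * np.log(w))))


def vendi_score_from_K(K):
    """Hypothetical helper: score from an n x n similarity matrix with unit diagonal."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    return _exp_entropy(np.linalg.eigvalsh(K / n))


def vendi_score_dual(X, normalize=True):
    """Hypothetical helper: same score from the d x d matrix X.T @ X / n.

    X holds one embedding per row (shape n x d). X @ X.T / n and X.T @ X / n
    share their nonzero eigenvalues, so this is cheaper when d < n.
    """
    X = np.asarray(X, dtype=float)
    if normalize:
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = X.shape[0]
    return _exp_entropy(np.linalg.eigvalsh(X.T @ X / n))


# Precomputed similarity matrix from the README example.
K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(round(vendi_score_from_K(K), 4))  # 2.1573, matching the README

# Scalar samples with the kernel from the first README example.
samples = np.array([0, 0, 10, 10, 20, 20], dtype=float)
K_samples = np.exp(-np.abs(samples[:, None] - samples[None, :]))
print(vendi_score_from_K(K_samples))  # very close to 3

# Dual path versus the full n x n path on the embeddings example.
X = np.array([[100, 0], [99, 1], [1, 99], [0, 100]], dtype=float)
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
print(vendi_score_dual(X), vendi_score_from_K(Xn @ Xn.T))  # the two agree
```

The agreement between the last two values is the reason the covariance shortcut works: both matrices are built from the same `X`, so only the size of the eigenvalue problem changes.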