Space: GIZ/embedding_visualisation

peter2000 committed on Nov 7, 2022

Commit 90dfdac (1 parent: 65131af)

Update apps/intro.py

Files changed (1)

apps/intro.py CHANGED

@@ -29,7 +29,7 @@ def app():
         st.write(
             """
             Information cartography - Get your word/phrase/sentence/paragraph embedded and visualized.
-            The (English) sentence-transformers model "all-MiniLM-L6-v2" maps sentences & paragraphs to a 384 dimensional dense vector space This is normally used for tasks like clustering or semantic search, but in this case, we use it to place your text to a 3D map. Before plotting, the dimension needs to be reduced to three so we can actually plot it, but preserve as much information as possible. For this, we use a technology called umap.
+            The (English) sentence-transformers model "all-MiniLM-L6-v2" maps sentences & paragraphs to a 384 dimensional dense vector space This is normally used for tasks like clustering or semantic search, but in this case, we use it to place your text to a 3D map. Before plotting, the dimension needs to be reduced to three so we can actually plot it, but preserve as much information as possible. For this, we use a technology called umap. The sentence transformer is context sensitive and works best with whole sentences, to account for that we extend your text with "The book is about <text>" if its less than 15 characters.
             Simply put in your text and press EMBED, your examples will add up. You can use the category for different coloring.
             """)
@@ -59,7 +59,7 @@ def app():
                 cat_list .append(cat)
                 st.session_state['cat_list '] = cat_list
-                phrase_to_embed = ["The book is about "+ wte for wte in word_to_embed_list]
+                phrase_to_embed = ["The book is about "+ wte for wte in word_to_embed_list if len(wte) <15]
                 examples_embeddings = model.encode(phrase_to_embed)
                 examples_umap = umap_model.transform(examples_embeddings)
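The description added in this commit says short inputs are extended with "The book is about <text>", but the trailing `if len(wte) <15` in the new comprehension is a filter: whenever an input has 15 or more characters, it is dropped from `phrase_to_embed` rather than embedded unchanged, so the rows of `examples_embeddings` no longer line up one-to-one with `word_to_embed_list`. Below is a minimal sketch of the behaviour the description seems to intend, using a conditional expression instead of a filter. The prefix and the 15-character threshold come from the diff; the helper names `build_phrases` and `committed_behaviour` are hypothetical and not part of `apps/intro.py`.

```python
# Prefix and threshold taken from the commit; everything else is illustrative.
PREFIX = "The book is about "
MIN_LEN = 15


def build_phrases(texts, prefix=PREFIX, min_len=MIN_LEN):
    """Hypothetical helper: wrap short inputs in a carrier sentence and pass
    longer ones through unchanged, so there is exactly one phrase per input."""
    return [prefix + t if len(t) < min_len else t for t in texts]


def committed_behaviour(texts):
    """Mirrors the '+' line of the diff: the `if` clause filters the list,
    so inputs of 15+ characters disappear instead of being kept as-is."""
    return ["The book is about " + t for t in texts if len(t) < 15]


if __name__ == "__main__":
    texts = ["climate", "Renewable energy subsidies reduce emissions."]
    print(build_phrases(texts))
    # ['The book is about climate', 'Renewable energy subsidies reduce emissions.']
    print(committed_behaviour(texts))
    # ['The book is about climate']
```

Passing `build_phrases(word_to_embed_list)` to `model.encode` would keep one embedding per input. Whether the rest of `apps/intro.py` depends on that alignment (for example when colouring by category) is not visible in this diff.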