julep-ai
/

dfe-base-en

@@ -7,11 +7,58 @@ tags:
 ---
-# {MODEL_NAME}
-This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 1536 dimensional dense vector space and can be used for tasks like clustering or semantic search.
-<!--- Describe your model here -->
 ## Usage (Sentence-Transformers)
@@ -33,24 +80,9 @@ print(embeddings)
 ```
-## Evaluation Results
-<!--- Describe how your model was evaluated -->
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
 ## Training
 The model was trained with the parameters:
-**DataLoader**:
-`torch.utils.data.dataloader.DataLoader` of length 1321 with parameters:
-```
-{'batch_size': 1280, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
-```
 **Loss**:
 `sentence_transformers.losses.TripletLoss.TripletLoss` with parameters:
@@ -77,6 +109,10 @@ Parameters of the fit()-Method:
 }
 ```
 ## Full Model Architecture
 ```
@@ -86,10 +122,18 @@ SentenceTransformer(
   (2): Asym(
     (dialog-0): Dense({'in_features': 768, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (dialog-1): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
-    (dialog-2): Dense({'in_features': 1536, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (fact-0): Dense({'in_features': 768, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (fact-1): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
-    (fact-2): Dense({'in_features': 1536, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
   )
 )
 ```

 ---
+# DFE (Dialog Fact Encoder)
+This is a [sentence-transformers](https://www.SBERT.net) model: It maps "dialog" & "facts" to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+The Dialog Fact Encoder (DFE) is an embedding model trained to capture semantic relevance between conversational dialog turns and factual statements. It builds upon the BGE embedding model by adding a merge layer that transforms the embeddings based on whether the input text is a "dialog" or "fact".
+Specifically, dialog inputs pass through additional dense layers to project the embeddings into a space optimized for comparing dialog turns. Similarly, factual inputs pass through separate dense layers to transform the embeddings for relevance matching against dialogs.
+This allows DFE to embed dialogs and facts into a joint relevance space without needing to generate explicit search queries. DFE enables low-latency approximate matching of relevant facts to dialog turns, while avoiding the high computational cost of query generation models.
+The model was trained using a triplet loss to pull dialog embeddings closer to relevant fact embeddings, while pushing non-relevant pairs further apart. This helps the model learn the nuanced semantics needed to assess relevance between dialogs and facts.
+DFE provides an efficient way to embed variable conversational dialog into a relevance space with factual statements. This enables real-time semantic search over knowledge without expensive query generation.
+## Background
+So what's _Dialog-Fact Encoder_?
+It is a new model trained by the Julep AI team to match (possibly) relevant facts to a dialog. It is an embedding model which means that it takes text as an input and outputs an embedding vector (a list of float numbers). In DFE's case, it takes an extra parameter "type" which can be either "dialog" or "fact".
+A regular embedding model like openai's text-embedding-ada-2 does not distinguish between different types and gives vectors that can then be used for calculating similarity. These models are great for building search because comparing vectors is relatively cheap so for a database (of say product descriptions), you can compute vectors for every row beforehand and then when a query comes in (like: "Winter coats for women"), calculate the query's embeddings and find items using vector similarity.
+Unfortunately, this does not work for dialog because conversational statements and turns within a dialog are typically not in the format of a "query". Take this case for example:
+**Database**:
+1. Diwank likes Sushi.
+2. Ishita does not like unnecessarily-pricey places restaurants
+3. Diwank likes cooking.
+4. Ishita is terrible at cooking.
+5. Diwank loves to eat Ishita's head.
+**Dialog**:
+> Diwank: Hey, what are we eating for dinner today?
+> Ishita: Already? I thought we just ate lol
+> Diwank: Yeah, some of us work hard and get hungy
+> Ishita: Okay, what do you want to eat then?
+> Diwank: I want to eat out but I am thinking of something light.
+Now, a text/vector/hybrid search would probably match all 5 facts to this conversation but, as you can see, only facts 1 and 2 are relevant. The only way to get the correct fact, right now, is to ask an LLM like gpt-3.5 to "generate a query" for querying the database and then using that for similarity. Unfortunately, there are three big problems with that:
+- It adds latency and cost.
+- You have to figure out "when" to run this process and retrieve (which is hard).
+- The prompt for generating the query will have to be customized for every use case because the prompt has to "know" what is "query-able". So for example, in this case above, we would have to specify that you can write a query to search preferences of Diwank and Ishita from the database.
+Here's where DFE comes in. The insight here is that embeddings for a dialog have meaningful information to distinguish whether a fact is relevant or not (that is exactly how we can tell that Fact 1 and 2 are relevant and others are not in the example above because we can "see" the meaning of the dialog). Normal embedding models are only interested in "overall similarity" and they'd miss this nuance, especially for details that were NOT present in the dialog directly (for example, fact 1 mentions Sushi whereas no food items are specifically mentioned in the dialog).
+So, if this information is already there in theory, how can we learn to connect embeddings of "facts" and "dialogues" based on relevance? DFE is trained to do exactly that. DFE is about learning this "relevance" transformation of a dialog so it matches similar facts.
+DFE is a built upon BGE (currently the best state-of-the-art model for embeddings). It has the base embeddings from the original BGE model and then an added "merge" model layer. The base BGE model is actually left completely unchanged (because it already knows how to turn a passage into an embedding very well. We add the new layers to learn how to "transform" the input depending on what kind of passage it is (a dialog or a fact) in a way that "relevant" stuff is closer together in the embedding space.
+This solves all of the three problems from the "query generation" method from earlier. Instead of generating a query using an LLM, you can store facts with their DFE embeddings in the database beforehand and then just embed the dialog using DFE and match. Since this operation is so much faster, you can basically do this on every turn without much hassle.
+The "query generation" method is still far superior in quality but is too prohibitive (costly + slow) in normal circumstances and DFE solves that. :)
 ## Usage (Sentence-Transformers)
 ```
 ## Training
 The model was trained with the parameters:
 **Loss**:
 `sentence_transformers.losses.TripletLoss.TripletLoss` with parameters:
 }
 ```
+## Evaluation Results
+<!--- Describe how your model was evaluated -->
 ## Full Model Architecture
 ```
   (2): Asym(
     (dialog-0): Dense({'in_features': 768, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (dialog-1): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+    (dialog-2): Dropout(
+        (dropout_layer): Dropout(p=0.1, inplace=False)
+    )
+    (dialog-3): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+    (dialog-4): Dense({'in_features': 1536, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (fact-0): Dense({'in_features': 768, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
     (fact-1): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+    (fact-2): Dropout(
+        (dropout_layer): Dropout(p=0.1, inplace=False)
+    )
+    (fact-3): Dense({'in_features': 1536, 'out_features': 1536, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+    (fact-4): Dense({'in_features': 1536, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
   )
 )
 ```