julep-ai/dfe-base-en

@@ -21,6 +21,41 @@ The model was trained using a triplet loss to pull dialog embeddings closer to r
 DFE provides an efficient way to embed varied conversational dialog into the same relevance space as factual statements. This enables real-time semantic search over knowledge without expensive query generation.
 ## Background
 So what's _Dialog-Fact Encoder_?
@@ -54,7 +89,7 @@ Here's where DFE comes in. The insight here is that embeddings for a dialog have
 So, if this information is already there in theory, how can we learn to connect embeddings of "facts" and "dialogs" based on relevance? DFE is trained to do exactly that: it learns a "relevance" transformation of a dialog so that its embedding lands close to the embeddings of relevant facts.
-DFE is a built upon BGE (currently the best state-of-the-art model for embeddings). It has the base embeddings from the original BGE model and then an added "merge" model layer. The base BGE model is actually left completely unchanged (because it already knows how to turn a passage into an embedding very well. We add the new layers to learn how to "transform" the input depending on what kind of passage it is (a dialog or a fact) in a way that "relevant" stuff is closer together in the embedding space.
 This solves all three problems of the "query generation" method described earlier. Instead of generating a query with an LLM, you can store facts with their DFE embeddings in the database beforehand, then embed only the dialog with DFE and match it against them. Because embedding is much faster than LLM query generation, you can do this on every turn.
@@ -66,50 +101,14 @@ It inherits the base BERT model and pooling layer from BGE to generate 768-dimen
 DFE then adds an asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
-Dialog inputs pass through 2x1536D tanh layers, a dropout layer, and another 1536D tanh layer before projecting back to 768 dimensions.
-Fact inputs pass through similar 1536D tanh layers with dropout before projecting back to 768D.
-This asymmetric architecture allows specialization of the embeddings for relevance matching between dialogs and facts.
 DFE is trained with a triplet loss using the `TripletDistanceMetric.EUCLIDEAN` distance function and a margin of 5. It pulls each dialog embedding toward the embedding of a relevant fact and pushes non-relevant facts away until they are at least the margin farther from the dialog than the relevant fact.
-The model was trained for 12 epochs using the Lion optimizer with 100 warmup steps and a learning rate of 0.0001. No evaluation steps were used during training.
 This approach teaches DFE to map dialogs and facts into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximate retrieval of relevant facts for each conversational turn.
-## Usage (Sentence-Transformers)
-Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-```
-pip install -U sentence-transformers
-```
-Then you can use the model like this:
-```python
-from sentence_transformers import SentenceTransformer
-dialog = """
-Diwank: Hey, what are we eating for dinner today?
-Ishita: Already? I thought we just ate lol
-Diwank: Yeah, some of us work hard and get hungy
-Ishita: Okay, what do you want to eat then?
-Diwank: I want to eat out but I am thinking of something light.
-""".strip()
-facts = [
-  "Diwank likes Sushi.",
-  "Ishita does not like unnecessarily-pricey places restaurants",
-  "Diwank likes cooking.",
-  "Ishita is terrible at cooking.",
-  "Diwank loves to eat Ishita's head.",
-]
-model = SentenceTransformer("julep-ai/dfe-base-en")
-dialog_embeddings = model.encode({"dialog": dialog})
-fact_embeddings = model.encode([{"fact": fact} for fact in facts])
-```
 ## Dataset
 The model was trained on a custom dataset [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum) that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by:

 DFE provides an efficient way to embed varied conversational dialog into the same relevance space as factual statements. This enables real-time semantic search over knowledge without expensive query generation.
+## Usage (Sentence-Transformers)
+Using this model is easy once you have [sentence-transformers](https://www.SBERT.net) installed:
+```
+pip install -U sentence-transformers
+```
+Then you can use the model like this:
+```python
+from sentence_transformers import SentenceTransformer
+dialog = """
+Diwank: Hey, what are we eating for dinner today?
+Ishita: Already? I thought we just ate lol
+Diwank: Yeah, some of us work hard and get hungy
+Ishita: Okay, what do you want to eat then?
+Diwank: I want to eat out but I am thinking of something light.
+""".strip()
+facts = [
+  "Diwank likes sushi.",
+  "Ishita does not like unnecessarily pricey restaurants.",
+  "Diwank likes cooking.",
+  "Ishita is terrible at cooking.",
+  "Diwank loves to eat Ishita's head.",
+]
+model = SentenceTransformer("julep-ai/dfe-base-en")
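+# Each input is a dict keyed by "dialog" or "fact" so it is routed to the matching projection layers.
+# encode() expects a list of inputs, so the single dialog is wrapped in a list.
+dialog_embeddings = model.encode([{"dialog": dialog}])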
+fact_embeddings = model.encode([{"fact": fact} for fact in facts])
+```
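DFE is trained with a Euclidean triplet loss. One reasonable way to use these vectors is therefore to rank facts by their Euclidean distance to the dialog embedding, smallest first.

The sketch below is a minimal, dependency-free illustration. It assumes the embeddings have been converted to plain lists of floats, for example with `.tolist()` on the NumPy arrays that `encode` returns by default. `euclidean` and `rank_facts` are hypothetical helpers, not part of this model or of sentence-transformers. The 3-D vectors are toy stand-ins for real 768-D embeddings.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_facts(
    dialog_vec: Sequence[float],
    fact_vecs: Sequence[Sequence[float]],
    facts: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (fact, distance) pairs, nearest (most relevant) first."""
    scored = [(fact, euclidean(dialog_vec, vec)) for fact, vec in zip(facts, fact_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Toy 3-D vectors standing in for real 768-D DFE embeddings.
dialog_vec = [0.9, 0.1, 0.0]
fact_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.1]]
toy_facts = ["fact A", "fact B", "fact C"]
for fact, dist in rank_facts(dialog_vec, fact_vecs, toy_facts, top_k=2):
    print(f"{dist:.3f}  {fact}")
```

With real embeddings you would pass the dialog's vector and the list of fact vectors in place of the toy values. Cosine similarity is also common for retrieval, but Euclidean distance matches the metric used in training.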
 ## Background
 So what's _Dialog-Fact Encoder_?
 So, if this information is already there in theory, how can we learn to connect embeddings of "facts" and "dialogs" based on relevance? DFE is trained to do exactly that: it learns a "relevance" transformation of a dialog so that its embedding lands close to the embeddings of relevant facts.
+DFE is built upon BGE, a state-of-the-art embedding model at the time of writing. It reuses the base embeddings from the original BGE model and adds dense layers on top. The base BGE model is frozen and left completely unchanged because it already turns a passage into a good embedding. The new layers learn how to "transform" the embedding depending on what kind of passage it is (a dialog or a fact) so that relevant items end up closer together in the embedding space.
 This solves all three problems of the "query generation" method described earlier. Instead of generating a query with an LLM, you can store facts with their DFE embeddings in the database beforehand, then embed only the dialog with DFE and match it against them. Because embedding is much faster than LLM query generation, you can do this on every turn.
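The precompute-then-match flow above can be sketched as a small in-memory store. Facts are embedded once when they are added, and each turn embeds only the dialog before ranking the stored facts by Euclidean distance.

`FactStore`, `Encoder`, and `toy_encode` are hypothetical names used for illustration. In practice the encoder would wrap DFE, for instance by calling `model.encode` with `{"fact": ...}` or `{"dialog": ...}` inputs, and the store would usually be a vector database. The toy encoder here just counts vowels so the example runs without downloading the model; its vectors carry no meaning.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encoder signature: encode(kind, text) -> vector, kind is "dialog" or "fact".
Encoder = Callable[[str, str], List[float]]


class FactStore:
    """Keeps fact embeddings computed once so each dialog turn needs one encode call."""

    def __init__(self, encode: Encoder) -> None:
        self.encode = encode
        self.facts: List[str] = []
        self.vectors: List[List[float]] = []

    def add_facts(self, facts: Sequence[str]) -> None:
        # Done ahead of time, e.g. when facts are written to the database.
        for fact in facts:
            self.facts.append(fact)
            self.vectors.append(self.encode("fact", fact))

    def query(self, dialog: str, top_k: int = 2) -> List[Tuple[str, float]]:
        # Done on every turn: embed only the dialog, reuse the stored fact vectors.
        d = self.encode("dialog", dialog)
        scored = [(f, math.dist(d, v)) for f, v in zip(self.facts, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1])[:top_k]


calls: Dict[str, int] = {"dialog": 0, "fact": 0}


def toy_encode(kind: str, text: str) -> List[float]:
    """Stand-in for DFE: vowel counts, NOT a meaningful embedding."""
    calls[kind] += 1
    return [float(text.lower().count(ch)) for ch in "aeiou"]


store = FactStore(toy_encode)
store.add_facts(["fact one", "another fact", "a third fact"])
for turn in ["first turn", "second turn"]:
    print(store.query(turn))
print(calls)  # {'dialog': 2, 'fact': 3}
```

The `calls` counter shows the point of the design. After three facts and two turns, the encoder has run three times for facts and twice for dialogs. Each turn therefore costs one embedding call regardless of how many facts are stored, although the distance ranking itself still scales with the number of facts.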
 DFE then adds an asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
+1. Dialog inputs pass through two 1536-dimensional tanh layers, a dropout layer, and another 1536-dimensional tanh layer before being projected back to 768 dimensions.
+2. Fact inputs pass through similar 1536-dimensional tanh layers with dropout before being projected back to 768 dimensions.
+
+This asymmetric architecture lets the dialog and fact embeddings specialize for relevance matching between the two.
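To make the asymmetric design more concrete, here is a minimal pure-Python sketch. It routes a frozen base embedding through either a dialog-specific or a fact-specific stack of tanh dense layers.

This is an illustration under stated assumptions, not DFE's implementation:
- Dimensions are shrunk (4 stands in for 768, 8 for 1536).
- Weights are random placeholders instead of trained values.
- The final projection is assumed to be linear.
- The fact stack is assumed to have a single hidden layer.
- Dropout is left out because it is inactive at inference time.

In sentence-transformers this kind of routing is typically done with the `Asym` module, which picks a sub-model based on the dict key of each input. The code below does not use that library.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


class Dense:
    """Fully connected layer: y = tanh(W x + b), or y = W x + b when activation is off."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random, activation: bool = True) -> None:
        # Random placeholder weights; a trained model would load learned values instead.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.activation = activation

    def __call__(self, x: Vector) -> Vector:
        out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]
        return [math.tanh(v) for v in out] if self.activation else out


def build_stacks(base_dim: int = 4, hidden: int = 8, seed: int = 0) -> Dict[str, List[Dense]]:
    """Separate layer stacks per input type (shrunk dims, assumed layer counts)."""
    rng = random.Random(seed)
    return {
        # Dialog: two tanh layers, (dropout omitted), another tanh layer, linear projection back.
        "dialog": [
            Dense(base_dim, hidden, rng),
            Dense(hidden, hidden, rng),
            Dense(hidden, hidden, rng),
            Dense(hidden, base_dim, rng, activation=False),
        ],
        # Fact: assumed one tanh layer, (dropout omitted), linear projection back.
        "fact": [
            Dense(base_dim, hidden, rng),
            Dense(hidden, base_dim, rng, activation=False),
        ],
    }


def project(stacks: Dict[str, List[Dense]], item: Dict[str, Vector]) -> Vector:
    """Route {"dialog": vec} or {"fact": vec} through the stack named by its key."""
    if len(item) != 1:
        raise ValueError("expected exactly one key: 'dialog' or 'fact'")
    kind, vec = next(iter(item.items()))
    for layer in stacks[kind]:
        vec = layer(vec)
    return vec


stacks = build_stacks()
base = [0.1, -0.2, 0.3, 0.05]  # stands in for a frozen BGE embedding
dialog_vec = project(stacks, {"dialog": base})
fact_vec = project(stacks, {"fact": base})
print(dialog_vec)
print(fact_vec)
```

The same base vector yields different outputs depending on its key. That is the property the asymmetric head relies on: dialogs and facts can be reshaped differently before they are compared in the shared space.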
 DFE is trained with a triplet loss using the `TripletDistanceMetric.EUCLIDEAN` distance function and a margin of 5. It pulls each dialog embedding toward the embedding of a relevant fact and pushes non-relevant facts away until they are at least the margin farther from the dialog than the relevant fact.
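Concretely, take a dialog embedding d, a relevant fact f_pos and a non-relevant fact f_neg. The per-triplet loss is max(||d - f_pos|| - ||d - f_neg|| + margin, 0).

This is the standard triplet formulation; sentence-transformers' `TripletLoss` averages it over a batch. The pure-Python function below is only an illustration on toy 2-D vectors, not the training code, and `triplet_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def triplet_loss(
    anchor: Sequence[float],
    pos_fact: Sequence[float],
    neg_fact: Sequence[float],
    margin: float = 5.0,
) -> float:
    """Hinge loss that reaches zero once the negative is at least `margin` farther than the positive."""
    d_pos = math.dist(anchor, pos_fact)  # Euclidean distance to the relevant fact
    d_neg = math.dist(anchor, neg_fact)  # Euclidean distance to the non-relevant fact
    return max(d_pos - d_neg + margin, 0.0)


anchor = [0.0, 0.0]    # toy dialog embedding
relevant = [3.0, 4.0]  # 5 units from the dialog
print(triplet_loss(anchor, relevant, [0.0, 6.0]))   # 4.0: negative only 1 unit farther, margin violated
print(triplet_loss(anchor, relevant, [0.0, 12.0]))  # 0.0: negative 7 units farther, margin satisfied
```

The second call shows why the margin matters. Once the non-relevant fact is more than 5 units farther away than the relevant one, the triplet's loss is zero and contributes no gradient, so training concentrates on triplets that still violate the margin.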
 This approach teaches DFE to map dialogs and facts into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximate retrieval of relevant facts for each conversational turn.
 ## Dataset
 The model was trained on a custom dataset [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum) that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by: