diwank committed on
Commit
a880117
·
1 Parent(s): dfe1e58

Update README.md

Files changed (1)
  1. README.md +39 -40
README.md CHANGED
@@ -21,6 +21,41 @@ The model was trained using a triplet loss to pull dialog embeddings closer to r
21
 
22
  DFE provides an efficient way to embed variable conversational dialog into a relevance space with factual statements. This enables real-time semantic search over knowledge without expensive query generation.
23
 
24
  ## Background
25
 
26
  So what's _Dialog-Fact Encoder_?
@@ -54,7 +89,7 @@ Here's where DFE comes in. The insight here is that embeddings for a dialog have
54
 
55
  So, if this information is already there in theory, how can we learn to connect embeddings of "facts" and "dialogues" based on relevance? DFE is trained to do exactly that. DFE is about learning this "relevance" transformation of a dialog so it matches similar facts.
56
 
57
- DFE is a built upon BGE (currently the best state-of-the-art model for embeddings). It has the base embeddings from the original BGE model and then an added "merge" model layer. The base BGE model is actually left completely unchanged (because it already knows how to turn a passage into an embedding very well. We add the new layers to learn how to "transform" the input depending on what kind of passage it is (a dialog or a fact) in a way that "relevant" stuff is closer together in the embedding space.
58
 
59
  This solves all of the three problems from the "query generation" method from earlier. Instead of generating a query using an LLM, you can store facts with their DFE embeddings in the database beforehand and then just embed the dialog using DFE and match. Since this operation is so much faster, you can basically do this on every turn without much hassle.
60
 
@@ -66,50 +101,14 @@ It inherits the base BERT model and pooling layer from BGE to generate 768-dimen
66
 
67
  DFE then adds an Asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
68
 
69
- Dialog inputs pass through 2x1536D tanh layers, a dropout layer, and another 1536D tanh layer before projecting back to 768 dimensions.
70
- Fact inputs pass through similar 1536D tanh layers with dropout before projecting back to 768D.
71
- This asymmetric architecture allows specialization of the embeddings for relevance matching between dialogs and facts.
72
 
73
  DFE is trained with a triplet loss using the TripletDistanceMetric.EUCLIDEAN distance function and a margin of 5. It pulls dialog embeddings closer to positively matched fact embeddings, while pushing non-relevant pairs beyond the margin.
74
 
75
- The model was trained for 12 epochs using the Lion optimizer with 100 warmup steps and a learning rate of 0.0001. No evaluation steps were used during training.
76
-
77
  This approach teaches DFE to transform dialog and fact embeddings into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximation of relevant facts for conversational dialog turns.
78
 
79
- ## Usage (Sentence-Transformers)
80
-
81
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
82
-
83
- ```
84
- pip install -U sentence-transformers
85
- ```
86
-
87
- Then you can use the model like this:
88
-
89
- ```python
90
- from sentence_transformers import SentenceTransformer
91
-
92
- dialog = """
93
- Diwank: Hey, what are we eating for dinner today?
94
- Ishita: Already? I thought we just ate lol
95
- Diwank: Yeah, some of us work hard and get hungy
96
- Ishita: Okay, what do you want to eat then?
97
- Diwank: I want to eat out but I am thinking of something light.
98
- """.strip()
99
-
100
- facts = [
101
- "Diwank likes Sushi.",
102
- "Ishita does not like unnecessarily-pricey places restaurants",
103
- "Diwank likes cooking.",
104
- "Ishita is terrible at cooking.",
105
- "Diwank loves to eat Ishita's head.",
106
- ]
107
-
108
- model = SentenceTransformer("julep-ai/dfe-base-en")
109
- dialog_embeddings = model.encode({"dialog": dialog})
110
- fact_embeddings = model.encode([{"fact": fact} for fact in facts])
111
- ```
112
-
113
  ## Dataset
114
 
115
  The model was trained on a custom dataset [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum) that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by:
 
21
 
22
  DFE provides an efficient way to embed variable conversational dialog into a relevance space with factual statements. This enables real-time semantic search over knowledge without expensive query generation.
23
 
24
+
25
+ ## Usage (Sentence-Transformers)
26
+
27
+ Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
28
+
29
+ ```
30
+ pip install -U sentence-transformers
31
+ ```
32
+
33
+ Then you can use the model like this:
34
+
35
+ ```python
36
+ from sentence_transformers import SentenceTransformer
37
+
38
+ dialog = """
39
+ Diwank: Hey, what are we eating for dinner today?
40
+ Ishita: Already? I thought we just ate lol
41
+ Diwank: Yeah, some of us work hard and get hungy
42
+ Ishita: Okay, what do you want to eat then?
43
+ Diwank: I want to eat out but I am thinking of something light.
44
+ """.strip()
45
+
46
+ facts = [
47
+ "Diwank likes Sushi.",
48
+ "Ishita does not like unnecessarily-pricey places restaurants",
49
+ "Diwank likes cooking.",
50
+ "Ishita is terrible at cooking.",
51
+ "Diwank loves to eat Ishita's head.",
52
+ ]
53
+
54
+ model = SentenceTransformer("julep-ai/dfe-base-en")
55
+ dialog_embeddings = model.encode({"dialog": dialog})
56
+ fact_embeddings = model.encode([{"fact": fact} for fact in facts])
57
+ ```
58
+
59
  ## Background
60
 
61
  So what's _Dialog-Fact Encoder_?
 
89
 
90
  So, if this information is already there in theory, how can we learn to connect embeddings of "facts" and "dialogues" based on relevance? DFE is trained to do exactly that. DFE is about learning this "relevance" transformation of a dialog so it matches similar facts.
91
 
92
+ DFE is built upon BGE (currently the best state-of-the-art model for embeddings). It has the base embeddings from the original BGE model and added dense layers. The base BGE model is actually frozen and left completely unchanged because it already knows how to turn a passage into an embedding very well. We add the new layers to learn how to "transform" the input depending on what kind of passage it is (a dialog or a fact) in a way that "relevant" stuff is closer together in the embedding space.
93
 
94
  This solves all of the three problems from the "query generation" method from earlier. Instead of generating a query using an LLM, you can store facts with their DFE embeddings in the database beforehand and then just embed the dialog using DFE and match. Since this operation is so much faster, you can basically do this on every turn without much hassle.
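The "store facts beforehand, then embed the dialog and match" flow described above could look roughly like the following. This is a minimal sketch, not the project's retrieval code: the 3-D vectors are hand-written stand-ins for real 768-dimensional DFE embeddings, `fact_store` and `rank_facts` are hypothetical names, and ranking by Euclidean distance is an assumption based on the distance metric used in training.

```python
import math

# Hypothetical, hand-written 3-D vectors standing in for precomputed DFE
# fact embeddings (the real model produces 768-dimensional vectors).
fact_store = {
    "Diwank likes Sushi.": [0.9, 0.1, 0.0],
    "Diwank likes cooking.": [0.2, 0.8, 0.1],
    "Ishita is terrible at cooking.": [0.0, 0.7, 0.6],
}


def rank_facts(dialog_embedding, store, top_k=2):
    """Return the top_k facts whose embeddings are closest to the dialog embedding."""
    scored = [(math.dist(dialog_embedding, emb), fact) for fact, emb in store.items()]
    scored.sort()  # smallest Euclidean distance first
    return [fact for _, fact in scored[:top_k]]


# Hypothetical embedding of the current dialog turn.
dialog_embedding = [0.8, 0.2, 0.1]
closest = rank_facts(dialog_embedding, fact_store)
```

Because the fact embeddings are computed once and reused, each turn only costs one dialog embedding plus a distance lookup, which a vector database would normally handle instead of the linear scan shown here.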
95
 
 
101
 
102
  DFE then adds an Asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
103
 
104
+ 1. Dialog inputs pass through 2x1536D tanh layers, a dropout layer, and another 1536D tanh layer before projecting back to 768 dimensions.
105
+ 2. Fact inputs pass through similar 1536D tanh layers with dropout before projecting back to 768D.
106
+ 3. This asymmetric architecture allows specialization of the embeddings for relevance matching between dialogs and facts.
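To make the routing idea concrete, here is a toy sketch of an asymmetric projection in plain Python. It is not the model's implementation: the dimensions are scaled down from 768/1536 to 4/8, the weights are random and untrained, the final projection is assumed to be linear, and the fact branch is assumed to have the same shape as the dialog branch (the description above only says "similar"). All function names are hypothetical.

```python
import math
import random

EMBED_DIM, HIDDEN_DIM = 4, 8  # scaled down from 768 and 1536 for readability


def make_dense(in_dim, out_dim, rng, activation=None):
    """Build a dense layer with small random weights (untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def layer(x):
        out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        return [activation(o) for o in out] if activation else out

    return layer


def make_branch(rng):
    """Three tanh layers, then a projection back to EMBED_DIM.

    Dropout is left out because it does nothing at inference time.
    """
    layers = [
        make_dense(EMBED_DIM, HIDDEN_DIM, rng, math.tanh),
        make_dense(HIDDEN_DIM, HIDDEN_DIM, rng, math.tanh),
        make_dense(HIDDEN_DIM, HIDDEN_DIM, rng, math.tanh),
        make_dense(HIDDEN_DIM, EMBED_DIM, rng),  # assumed linear projection
    ]

    def branch(x):
        for layer in layers:
            x = layer(x)
        return x

    return branch


rng = random.Random(0)
# One independent set of weights per input type: this is the "asymmetric" part.
branches = {"dialog": make_branch(rng), "fact": make_branch(rng)}


def project(kind, base_embedding):
    """Route a base (BGE-style) embedding through the branch for its input type."""
    return branches[kind](base_embedding)


base = [0.5, -0.2, 0.1, 0.3]  # hypothetical base embedding
dialog_vec = project("dialog", base)
fact_vec = project("fact", base)
```

The same base embedding ends up at different points depending on whether it is treated as a dialog or a fact, which is what lets training move relevant dialog/fact pairs together without touching the frozen base model.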
107
 
108
  DFE is trained with a triplet loss using the TripletDistanceMetric.EUCLIDEAN distance function and a margin of 5. It pulls dialog embeddings closer to positively matched fact embeddings, while pushing non-relevant pairs beyond the margin.
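For a single (dialog, relevant fact, non-relevant fact) triplet, the objective described above can be written out as below. This is a plain-Python sketch of the standard triplet loss with Euclidean distance and a margin of 5; the 2-D vectors are made-up values chosen so the distances are easy to check by hand, and `triplet_loss` is a hypothetical helper rather than the training code.

```python
import math


def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) with Euclidean d."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


dialog = [0.0, 0.0]
relevant_fact = [3.0, 4.0]  # distance 5 from the dialog

# Non-relevant fact at distance 10: it is already a full margin further away
# than the relevant fact, so the loss is 5 - 10 + 5 = 0.
loss_far = triplet_loss(dialog, relevant_fact, [6.0, 8.0])

# Non-relevant fact at distance 7: only 2 further away than the relevant
# fact, so the loss is 5 - 7 + 5 = 3.
loss_near = triplet_loss(dialog, relevant_fact, [0.0, 7.0])
```

The loss reaches zero only once the non-relevant fact sits at least `margin` further from the dialog than the relevant one, which is the "pushing beyond the margin" behaviour described in the paragraph above.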
109
 
 
 
110
  This approach teaches DFE to transform dialog and fact embeddings into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximation of relevant facts for conversational dialog turns.
111
 
112
  ## Dataset
113
 
114
  The model was trained on a custom dataset [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum) that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by: