shahrukhx01
committed on
Commit
·
80868e7
1
Parent(s):
7f0856f
Update README.md
README.md
CHANGED
@@ -0,0 +1,39 @@
# Fine-Tuned BERT for Question Classification

| Train Loss | Validation Acc. | Test Acc. |
| ---------- |:---------------:| ---------:|
| 0.000806   | 0.99            | 0.992     |

# USAGE
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("shahrukhx01/bert-mini-finetune-question-detection")

model = AutoModelForSequenceClassification.from_pretrained("shahrukhx01/bert-mini-finetune-question-detection")
```
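
Once the tokenizer and model are loaded, inference could look roughly like the sketch below. The helper names `load_model` and `predict_labels` are hypothetical and not part of this repository. The returned label strings are read from `model.config.id2label`; this README does not say which label corresponds to "question", so check the model config or the Kaggle notebook before relying on them.

```python
MODEL_NAME = "shahrukhx01/bert-mini-finetune-question-detection"


def load_model(model_name=MODEL_NAME):
    """Load tokenizer and model (hypothetical helper; downloads on first use)."""
    # Imported lazily so the rest of this sketch can be read and reused
    # without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return tokenizer, model


def predict_labels(texts, tokenizer, model):
    """Return one label string per input text (hypothetical helper)."""
    # Tokenize the whole batch; "pt" asks for PyTorch tensors.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)
    class_ids = logits.argmax(dim=-1).tolist()
    # Assumption: the label meaning is whatever the model config defines.
    return [model.config.id2label[i] for i in class_ids]


# Example usage (requires transformers, torch, and network access):
# tokenizer, model = load_model()
# print(predict_labels(["how do I install haystack", "haystack install"], tokenizer, model))
```

For larger batches it is usually worth wrapping the forward pass in `torch.no_grad()` to avoid tracking gradients.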

Trained to add Question vs. Statement classification as a feature in [Haystack](https://github.com/deepset-ai/haystack/issues/611).

Problem Statement:
One common challenge we saw in deployments is the need to distinguish real questions from incoming keyword queries. Only questions should be routed to the Reader branch, which maximizes the accuracy of results and minimizes computation effort and cost.

Describe the solution you'd like

A new class, `QueryClassifier`, that takes a query as input and determines whether it is a question or a keyword query.
We could start with a very basic version (maybe even rule-based) and later extend it to use a classification model.
The `run` method would need to return `query, "output_1"` for a question and `query, "output_2"` for a keyword query, in order to allow branching in the DAG.
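
The basic rule-based starting point described above could be sketched as follows. This is a hypothetical illustration, not Haystack's actual implementation: the word list, the `is_question` helper, and the heuristics are all assumptions. Only the class name and the `"output_1"` / `"output_2"` return convention come from the description.

```python
# Assumed list of words that typically open a question (illustrative only).
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which", "whose",
    "is", "are", "do", "does", "did", "can", "could", "should", "would", "will",
}


class QueryClassifier:
    """Routes questions to "output_1" and keyword queries to "output_2"."""

    def is_question(self, query: str) -> bool:
        text = query.strip().lower()
        if not text:
            return False
        # Rule 1: an explicit question mark at the end.
        if text.endswith("?"):
            return True
        # Rule 2 (assumed heuristic): the first word is an interrogative
        # or auxiliary word.
        return text.split()[0] in QUESTION_WORDS

    def run(self, query: str):
        # Return the query plus the name of the branch it should follow.
        output = "output_1" if self.is_question(query) else "output_2"
        return query, output


classifier = QueryClassifier()
print(classifier.run("How do I fine-tune BERT?"))
print(classifier.run("bert fine-tuning tutorial"))
```

A model-based version could keep the same `run` signature and replace `is_question` with a call to the fine-tuned classifier.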

Describe alternatives you've considered
Later it might also make sense to distinguish more query types (e.g. a full sentence that is not a question).

Additional context
We could use it like this in a pipeline

Baseline:
https://www.kaggle.com/shahrukhkhan/question-v-statement-detection

Dataset:
https://www.kaggle.com/stefanondisponibile/quora-question-keyword-pairs

Kaggle Notebook:
https://www.kaggle.com/shahrukhkhan/question-vs-statement-classification-mini-bert/