Commit cf75e58 · 1 Parent(s): d2fd2f1
add evaluation
README.md
CHANGED
@@ -54,14 +54,15 @@ The classification task for v1 is split into two stages:
 1. URL features model
 - **96.5%+ accurate** on training and validation data
 - 2,436,727 rows of labelled URLs
-- evaluation from v2: slightly overfitted
+- evaluation from v2: slightly overfitted, by perhaps around 0.8%
 2. Website features model
 - **98.4% accurate** on training data, and **98.9% accurate** on validation data
 - 911,180 rows of 42 features
+- evaluation from v2: biased towards the URL feature (bert_confidence) column

 ## Training Features
 I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
-Here's the dict passed to `GridSearchCV
+Here's the dict passed to `sklearn`'s '`GridSearchCV` function:
 ```python
 params = {
 'objective': 'binary',