lixsh6/wsdm23_pretrain


WSDM Cup 2023 BERT Checkpoints:

This repo contains the checkpoints from our entries in the two WSDM Cup 2023 tasks: Pre-training for Web Search and Unbiased Learning for Web Search.

Paper released

Please refer to our papers for details on this competition:

Task 1 (Unbiased Learning to Rank): Multi-Feature Integration for Perception-Dependent Examination-Bias Estimation
Task 2 (Pre-training for Web Search): Pretraining De-Biased Language Model with Large-scale Click Logs for Document Ranking

Method Overview

Pre-train BERT with MLM and CTR prediction losses (or a multi-task CTR prediction loss).
Fine-tune BERT with a pairwise ranking loss.
Obtain prediction scores from the different BERT models.
Combine the BERT scores with sparse features through ensemble learning.

Details will be updated in the submission paper.
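
The first two steps can be illustrated with a minimal sketch of the loss terms. This is not the training code behind the checkpoints. The README does not give the exact loss formulas, so the sketch assumes a plain sum of the MLM cross-entropy and a binary cross-entropy CTR term, weighted by a hypothetical `ctr_weight`, and a logistic form for the pairwise ranking loss. All function names are illustrative.

```python
import math

EPS = 1e-12  # guards against log(0)


def mlm_loss(true_token_probs):
    """Mean negative log-probability assigned to the true token at each masked position."""
    return -sum(math.log(max(p, EPS)) for p in true_token_probs) / len(true_token_probs)


def ctr_loss(click_prob, clicked):
    """Binary cross-entropy between the predicted click probability and the observed click (0 or 1)."""
    p = min(max(click_prob, EPS), 1.0 - EPS)
    return -(clicked * math.log(p) + (1 - clicked) * math.log(1.0 - p))


def pretrain_loss(true_token_probs, click_prob, clicked, ctr_weight=1.0):
    """Assumed pre-training objective: MLM loss plus a weighted CTR prediction loss."""
    return mlm_loss(true_token_probs) + ctr_weight * ctr_loss(click_prob, clicked)


def pairwise_ranking_loss(score_pos, score_neg):
    """Assumed logistic pairwise loss: log(1 + exp(-(score_pos - score_neg))).

    Written as a numerically stable softplus of (score_neg - score_pos).
    """
    diff = score_pos - score_neg
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

The pairwise loss shrinks as the more relevant document is scored further above the less relevant one, and equals log 2 when the two scores tie.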

BERT features:

1) Model details: Checkpoints Download Here

Index	Model Flag	Method	Pretrain step	Finetune step	DCG on leaderboard
1	large_group2_wwm_from_unw4625K	M1	1700K	5130	11.96214
2	large_group2_wwm_from_unw4625K	M1	1700K	5130	NAN
3	base_group2_wwm	M2	2150K	5130	~11.32363
4	large_group2_wwm_from_unw4625K	M1	590K	5130	11.94845
5	large_group2_wwm_from_unw4625K	M1	1700K	4180	NAN
6	large_group2_mt_pretrain	M3	1940K	5130	NAN
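
For readers unfamiliar with the leaderboard metric, the sketch below shows one common way to compute DCG for a single query. The README does not state the cutoff or the gain function, so the cutoff of 10 and the exponential gain `2**rel - 1` are assumptions, and the function names are illustrative.

```python
import math


def rank_by_score(scores, labels):
    """Return the relevance labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def dcg_at_k(labels_in_rank_order, k=10):
    """Discounted cumulative gain over the top-k results (assumed exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(labels_in_rank_order[:k], start=1)
    )


def mean_dcg(queries, k=10):
    """Average DCG@k over a list of (scores, labels) pairs, one pair per query."""
    return sum(dcg_at_k(rank_by_score(s, l), k) for s, l in queries) / len(queries)
```

A leaderboard value would then correspond to `mean_dcg` over all test queries, assuming the organizers average per-query DCG.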

2) Method details

Method	Model Layers	Details
M1	24	WWM & CTR prediction as pretraining tasks
M2	12	WWM & CTR prediction as pretraining tasks
M3	24	WWM & Multi-task CTR prediction as pretraining tasks
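
The two tables above can be read as a small registry: each checkpoint yields one score per query-document pair, and those scores become the BERT features for the ensemble step. The sketch below only illustrates how such a feature vector might be assembled. The README does not name the sparse features or the ensemble learner, so `sparse_features` is a placeholder list and `build_feature_vector` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    index: int
    model_flag: str
    method: str
    layers: int


# Index, model flag, method and layer count copied from the tables above.
CHECKPOINTS = [
    Checkpoint(1, "large_group2_wwm_from_unw4625K", "M1", 24),
    Checkpoint(2, "large_group2_wwm_from_unw4625K", "M1", 24),
    Checkpoint(3, "base_group2_wwm", "M2", 12),
    Checkpoint(4, "large_group2_wwm_from_unw4625K", "M1", 24),
    Checkpoint(5, "large_group2_wwm_from_unw4625K", "M1", 24),
    Checkpoint(6, "large_group2_mt_pretrain", "M3", 24),
]


def build_feature_vector(bert_scores, sparse_features):
    """Concatenate one score per checkpoint (keyed by table index) with the sparse features."""
    missing = [c.index for c in CHECKPOINTS if c.index not in bert_scores]
    if missing:
        raise ValueError(f"missing scores for checkpoints: {missing}")
    return [bert_scores[c.index] for c in CHECKPOINTS] + list(sparse_features)
```

The resulting vector has six BERT scores followed by however many sparse features are supplied, and can be passed to whichever ensemble model is used.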

Contacts

Xiangsheng Li: [email protected]
Xiaoshu Chen: [email protected]