arxiv:2407.02837

Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction

Published on Jul 3, 2024

Authors:

Xinying Qiu

Abstract

Protecting Personal Identifiable Information (PII) in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness. To address these issues, we propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that considers the broader context and semantic relationships between the original text and generalized candidates. The context-aware approach employs Multilingual-BERT for text representation, functional transformations, and mean squared error scoring to evaluate candidates. Experiments on the WikiReplace dataset demonstrate the effectiveness of both methods, with the context-aware approach outperforming the feature-based one across different scales. This work contributes to advancing PII generalization techniques by highlighting the importance of feature selection, ensemble learning, and incorporating contextual information for better privacy protection in text anonymization.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2407.02837 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2407.02837 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2407.02837 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.