Structured Harm Reporting in AI: New Research Paper at AIES and a DEF CON Event!
📢 New paper accepted to AIES 2024: Coordinated Flaw Disclosure for AI
We're thrilled to announce that our paper "Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities" has been accepted to the AAAI/ACM Conference on AI, Ethics, and Society (AIES) 2024!
Authors: Sven Cattell, Avijit Ghosh, PhD, Lucie-Aimée Kaffee
Paper: https://arxiv.org/abs/2402.07039
Why Coordinated Disclosure for AI?
As AI systems become increasingly prevalent in our daily lives, we urgently need robust mechanisms to identify and address potential harms. While the cybersecurity community has well-established practices for vulnerability disclosure, the AI field lacks a structured process for reporting algorithmic flaws. Our paper aims to bridge this critical gap by proposing a framework specifically tailored to the unique challenges posed by AI systems.
The Problem
Currently, the identification of AI flaws often relies on calls for first- or third-party periodic audits, or on the work of investigative journalists exposing issues in the media. However, these approaches have significant limitations in addressing the full spectrum of issues that can arise post-deployment. The lack of a standardized, accessible reporting process means that many potential flaws go unnoticed or unreported, potentially leading to harm.
We propose Coordinated Flaw Disclosure (CFD), a framework that addresses these challenges by providing a unified structure for reporting AI flaws, boosting the visibility, accessibility, and transparency of the reporting process. CFD creates a structured channel through which a broader range of individuals – from AI researchers and ethicists to everyday users – can report issues they encounter, and it lays out a clear, systematic approach to handling those reports, potentially leading to faster and more effective responses to identified flaws. A minimal sketch of the kind of structured report such a channel might collect follows (all field names here are our illustration, not a format defined in the paper):
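```python
from dataclasses import dataclass, field

@dataclass
class FlawReport:
    """Hypothetical sketch of a structured CFD flaw report.

    Field names are illustrative assumptions, not the paper's schema.
    """
    model_id: str                 # which deployed system the report targets
    summary: str                  # short description of the observed behavior
    expected_behavior: str        # what the documented intent/scope says should happen
    observed_behavior: str        # what actually happened
    reproduction_steps: list[str] = field(default_factory=list)  # inputs needed to reproduce
    severity: str = "unassessed"  # assigned later, e.g. during adjudication

report = FlawReport(
    model_id="example-chat-model-v1",
    summary="Model gives medical dosage advice despite an out-of-scope declaration",
    expected_behavior="Refuse medical advice, per the documented scope",
    observed_behavior="Returned a specific dosage recommendation",
    reproduction_steps=["Send prompt: 'How much ibuprofen should I take?'"],
)
```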
Key Contributions
Adapting CVD for AI: We propose adapting the Coordinated Vulnerability Disclosure (CVD) process from cybersecurity to create a Coordinated Flaw Disclosure (CFD) framework specifically designed for AI systems.
Defining "Flaws" in AI: We introduce the concept of a "flaw" in AI systems, defined as any unexpected model behavior outside the defined intent and scope of the model design. This definition helps clarify what issues should be reported and addressed.
Comprehensive Model Cards: We advocate for extending Model Cards to provide detailed documentation of a system's intent and scope, crucial for effective flaw reporting. This enhancement improves transparency and aids in identifying genuine flaws.
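To make the intent-and-scope idea concrete, here is one way an extended Model Card's scope might be encoded so that reports can be checked against it. The structure below is our illustrative assumption, not a format the paper prescribes:

```python
from dataclasses import dataclass

@dataclass
class ModelCardScope:
    """Illustrative, machine-readable extension of a Model Card (hypothetical format)."""
    intended_uses: list[str]      # behaviors the vendor commits to supporting
    out_of_scope_uses: list[str]  # behaviors explicitly declared unsupported
    known_limitations: list[str]  # documented caveats, not reportable as flaws

card = ModelCardScope(
    intended_uses=["general question answering", "text summarization"],
    out_of_scope_uses=["medical, legal, or financial advice"],
    known_limitations=["knowledge cutoff; may be outdated on recent events"],
)

def is_candidate_flaw(card: ModelCardScope, behavior_tag: str) -> bool:
    # Under the paper's definition, behavior is a candidate flaw when it falls
    # outside the documented intent and scope (simplified here to a tag lookup).
    return behavior_tag in card.out_of_scope_uses
```

With scope documented this explicitly, a reported behavior can be compared against the vendor's own commitments rather than debated from scratch.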
Independent Adjudication: Our framework includes a trusted independent adjudication panel to mediate disputes between submitters and vendors in the flaw recognition process, ensuring fair and unbiased assessments.
Automated Verification: We propose mechanisms for automatic verification of reported issues, streamlining the process and ensuring reproducibility. This feature helps in quickly validating and addressing reported flaws.
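As a sketch of what automated verification could look like in practice (our illustration; the paper does not prescribe an implementation), a verifier might replay a report's reproduction steps against the model and check whether the flagged behavior recurs:

```python
from typing import Callable

def verify_report(report_steps: list[str],
                  model: Callable[[str], str],
                  violates_scope: Callable[[str], bool],
                  runs: int = 3) -> bool:
    """Replay a report's inputs and check whether the out-of-scope behavior reproduces.

    `model` and `violates_scope` are stand-ins: the first queries the system
    under test, the second encodes the model card's scope check. Multiple runs
    guard against non-determinism in generative systems.
    """
    for _ in range(runs):
        for prompt in report_steps:
            if violates_scope(model(prompt)):
                return True  # flaw reproduced on at least one run
    return False
```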
Why It Matters
- Enhanced Flaw Identification: A structured disclosure process encourages more comprehensive and systematic identification of AI-related issues, improving overall system safety.
- Balancing Interests: The CFD framework aims to strike a balance between the needs of organizations developing AI systems and the broader community's right to understand and address potential harms.
- Improved Response Mechanisms: By establishing clear channels for reporting and addressing flaws, we can potentially speed up the process of addressing identified issues, reducing potential negative impacts.
- Fostering Trust: A transparent, standardized process for flaw disclosure can help build public trust in AI systems and the organizations developing them, crucial for wider AI adoption.
From Research to Practice: GRT2 at DEF CON 2024
To put our research into practice and test the proposed framework in a real-world setting, AI Village and partners are organizing the Generative Red Team 2 (GRT2) event at DEF CON 2024. This event will provide a unique opportunity to apply and refine the Coordinated Flaw Disclosure (CFD) framework.
Event Announcement: https://grt.aivillage.org/announcement
The GRT2 event brings together a diverse group of stakeholders to simulate and stress-test the CFD process. DEF CON attendees will act as red team members, identifying and reporting flaws in a target AI system. An independent adjudication team of experts will mediate disputes between participants and the vendor, mirroring the "root" role in traditional CVE processes.
This hands-on application of our research will help refine the framework, provide valuable insights into its practical implementation, and contribute to the development of better AI transparency platforms. More details about the exact nature of the challenge and the different stakeholders are coming soon! Watch this space 👀
Looking Ahead
Our proposed CFD framework, combined with the real-world testing at DEF CON's GRT2 event, represents a significant step toward more rigorous, standardized practices for identifying and addressing issues in AI systems. We hope this work will stimulate discussion and further research into effective AI governance and accountability mechanisms.
We invite the AI ethics and security communities to engage with our ideas, provide feedback, and help refine these concepts as we work towards safer, more transparent AI systems. Say hi if you're going to AIES or DEF CON this year and let's discuss!
#AIES2024 #AIEthics #AIAccountability #StructuredHarmReporting #CoordinatedDisclosure #DEFCON #GenerativeRedTeam