{ "cells": [ { "cell_type": "markdown", "id": "604ab692-a51f-4093-bf94-45b503c68d33", "metadata": {}, "source": [ "# AgentReview\n", "\n", "\n", "\n", "In this tutorial, you will explore customizing the AgentReview experiment.\n", "\n", "📑 Venue: EMNLP 2024 (Oral)\n", "\n", "🔗 arXiv: [https://arxiv.org/abs/2406.12708](https://arxiv.org/abs/2406.12708)\n", "\n", "🌐 Website: [https://agentreview.github.io/](https://agentreview.github.io/)\n", "\n", "```bibtex\n", "@inproceedings{jin2024agentreview,\n", " title={AgentReview: Exploring Peer Review Dynamics with LLM Agents},\n", " author={Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},\n", " booktitle={EMNLP},\n", " year={2024}\n", "}\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bdb3190e-09cf-44e7-b539-f531dfc68446", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\"" ] }, { "cell_type": "markdown", "id": "09de5377-25f3-4363-923a-c597ec6e52d0", "metadata": {}, "source": [ "## Specify OpenAI Keys\n", "\n", "### OpenAI\n", "\n", "If you use OpenAI client, specify your OpenAI key here" ] }, { "cell_type": "code", "execution_count": null, "id": "62906f8a-6aef-4d48-8a3e-ba0b9c3d5b4b", "metadata": {}, "outputs": [], "source": [ "# If you use either OpenAI or AzureOpenAI, specify the API key here\n", "os.environ['OPENAI_API_KEY'] = ... # Your OpenAI key here" ] }, { "cell_type": "markdown", "id": "7c2c9418-c67f-4824-b40f-40b5a7eea781", "metadata": {}, "source": [ "### AzureOpenAI\n", "\n", "If you use AzureOpenAI, specify these environment variables" ] }, { "cell_type": "code", "execution_count": null, "id": "5f85ee6f-49f0-419b-89a0-5f02e0f96200", "metadata": {}, "outputs": [], "source": [ "os.environ['AZURE_ENDPOINT'] = ... # Format: f\"https://YOUR_ENDPOINT.openai.azure.com\"\n", "os.environ['AZURE_DEPLOYMENT'] = ... 
# Your Azure OpenAI deployment here\n", "os.environ['OPENAI_API_VERSION'] = ...\n", "os.environ[\"AZURE_OPENAI_KEY\"] = ... # Your Azure OpenAI key here" ] }, { "cell_type": "markdown", "id": "2043b4f5-81a8-4886-ab9b-2ff6a794813a", "metadata": {}, "source": [ "## Overview\n", "\n", "AgentReview features a range of customizable variables, such as characteristics of reviewers, authors, area chairs (ACs), as well as the reviewing mechanisms " ] }, { "cell_type": "code", "execution_count": null, "id": "fed41214-73da-4c45-8760-55cb36f5ab9f", "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(filename=\"../static/img/Overview.png\")" ] }, { "cell_type": "markdown", "id": "64a27407-67d6-4506-9f84-c0d1f6c752eb", "metadata": {}, "source": [ "## Review Pipeline\n", "\n", "The simulation adopts a structured, 5-phase pipeline (Section 2 in the [paper](https://arxiv.org/abs/2406.12708)):\n", "\n", "* **I. Reviewer Assessment.** Each manuscript is evaluated by three reviewers independently.\n", "* **II. Author-Reviewer Discussion.** Authors submit rebuttals to address reviewers' concerns;\n", "* **III. Reviewer-AC Discussion.** The AC facilitates discussions among reviewers, prompting updates to their initial assessments.\n", "* **IV. Meta-Review Compilation.** The AC synthesizes the discussions into a meta-review.\n", "* **V. Paper Decision.** The AC makes the final decision on whether to accept or reject the paper, based on all gathered inputs." 
] }, { "cell_type": "code", "execution_count": null, "id": "f579fe52-2ced-408b-88a1-1b0c5da880f5", "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(filename=\"../static/img/ReviewPipeline.png\")" ] }, { "cell_type": "code", "execution_count": 1, "id": "274cc233-051a-444a-8170-a8b3acd30c80", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Changing the current working directory to AgentReview\n" ] } ], "source": [ "import os\n", "\n", "if os.path.basename(os.getcwd()) == \"notebooks\":\n", " os.chdir(\"..\")\n", "# Change the working directory to AgentReview\n", "print(f\"Changing the current working directory to {os.path.basename(os.getcwd())}\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "664d2ade-94cb-44cc-9460-ba4092b8f311", "metadata": {}, "outputs": [], "source": [ "from argparse import Namespace\n", "\n", "args = Namespace(openai_key=None, \n", " deployment=None, \n", " openai_client_type='azure_openai', \n", " endpoint=None, \n", " api_version='2023-03-15-preview', \n", " ac_scoring_method='ranking', \n", " conference='ICLR2024', \n", " num_reviewers_per_paper=3, \n", " ignore_missing_metareviews=False, \n", " overwrite=False, \n", " num_papers_per_area_chair=10, \n", " model_name='gpt-4o', \n", " output_dir='outputs', \n", " max_num_words=16384, \n", " visual_dir='outputs/visual', \n", " device='cuda', \n", " data_dir='./data', # Directory to all paper PDF\n", " acceptance_rate=0.32, \n", " skip_logging=True, # If set, we do not log the messages in the console.\n", " task='paper_review')" ] }, { "cell_type": "code", "execution_count": 3, "id": "114d4525-3f47-4e2e-b91e-f7513ec4fa0e", "metadata": {}, "outputs": [], "source": [ "malicious_Rx1_setting = {\n", " \"AC\": [\n", " \"BASELINE\"\n", " ],\n", "\n", " \"reviewer\": [\n", " \"malicious\",\n", " \"BASELINE\",\n", " \"BASELINE\"\n", " ],\n", "\n", " \"author\": [\n", " \"BASELINE\"\n", " ],\n", " 
\"global_settings\":{\n", " \"provides_numeric_rating\": ['reviewer', 'ac'],\n", " \"persons_aware_of_authors_identities\": []\n", " }\n", "}\n", "\n", "all_settings = {\"malicious_Rx1\": malicious_Rx1_setting}\n", "args.experiment_name = \"malicious_Rx1\"" ] }, { "cell_type": "markdown", "id": "9e706786-4e0c-48f8-8d71-e1bbefeb1d8f", "metadata": {}, "source": [ "\n", "`malicious_Rx1` means 1 reviewer is a malicious reviewer, and the other reviewers are default (i.e. `BASELINE`) reviewers.\n", "\n" ] }, { "cell_type": "markdown", "id": "15ffecd4-4718-492e-b897-a5cceb6f3b6e", "metadata": {}, "source": [ "## Reviews\n", "\n", "Define the review pipeline" ] }, { "cell_type": "code", "execution_count": 4, "id": "4e22ff91-d72a-412f-8c8d-52b9251ff566", "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "import numpy as np\n", "\n", "sys.path.append(os.path.abspath(os.path.join(os.getcwd(), \"agentreview\")))\n", "\n", "from agentreview.environments import PaperReview\n", "from agentreview.paper_review_arena import PaperReviewArena\n", "from agentreview.paper_review_settings import get_experiment_settings\n", "from agentreview.utility.experiment_utils import initialize_players\n", "from agentreview.utility.utils import project_setup, get_paper_decision_mapping\n", " \n", "from agentreview import const" ] }, { "cell_type": "code", "execution_count": 5, "id": "e0b7658b-742f-46d7-858a-684f3d8ce8ad", "metadata": {}, "outputs": [], "source": [ "def review_one_paper(paper_id, setting):\n", " args.task = \"paper_review\"\n", " paper_decision = paper_id2decision[paper_id]\n", "\n", " experiment_setting = get_experiment_settings(paper_id=paper_id,\n", " paper_decision=paper_decision,\n", " setting=setting)\n", " print(f\"Paper ID: {paper_id} (Decision in {args.conference}: {paper_decision})\")\n", "\n", " players = initialize_players(experiment_setting=experiment_setting, args=args)\n", "\n", " player_names = [player.name for player in players]\n", "\n", " 
env = PaperReview(player_names=player_names, paper_decision=paper_decision, paper_id=paper_id,\n", " args=args, experiment_setting=experiment_setting)\n", "\n", " arena = PaperReviewArena(players=players, environment=env, args=args)\n", " arena.launch_cli(interactive=False)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ed7cd3a8-bd7a-4c21-bfc7-fc7982a13a0c", "metadata": { "scrolled": true }, "outputs": [], "source": [ "sampled_paper_ids = [39]\n", "sampled_paper_ids = [39, 247, 289, 400]\n", "\n", "paper_id2decision, paper_decision2ids = get_paper_decision_mapping(args.data_dir, args.conference)\n", "\n", "for paper_id in sampled_paper_ids:\n", " review_one_paper(paper_id, malicious_Rx1_setting)" ] }, { "cell_type": "markdown", "id": "de642af7-af85-46a8-9570-3dd599223d00", "metadata": {}, "source": [ "Note: Sometimes metareview fails to load due to content filtering. We thus use `experimental_paper_ids` to track the paper IDs that were actually used in the experiment." ] }, { "cell_type": "code", "execution_count": 6, "id": "f1814583-3221-4a2a-a141-f20f5aae5906", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shuffling paper IDs\n", "[1 3 0 2]\n", "[247 400 39 289]\n", "247\n", "400\n", "39\n", "289\n", "TODO\n", "[247, 400, 39, 289] 2\n" ] }, { "data": { "text/html": [ "
\n", " _ _____ _ \n", " /\\ | | | __ \\ (_) \n", " / \\ __ _ ___ _ __ | |_| |__) |_____ ___ _____ __\n", " / /\\ \\ / _` |/ _ \\ '_ \\| __| _ // _ \\ \\ / / |/ _ \\ \\ /\\ / /\n", " / ____ \\ (_| | __/ | | | |_| | \\ \\ __/\\ V /| | __/\\ V V / \n", " /_/ \\_\\__, |\\___|_| |_|\\__|_| \\_\\___| \\_/ |_|\\___| \\_/\\_/ \n", " __/ | \n", " |___/ \n", "\n", "\n" ], "text/plain": [ "\n", "\u001b[1;38;5;166m _ _____ _ \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ | | | __ \\ \u001b[0m\u001b[1;38;5;166m(\u001b[0m\u001b[1;38;5;166m_\u001b[0m\u001b[1;38;5;166m)\u001b[0m\u001b[1;38;5;166m \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \\ __ _ ___ _ __ | |_| |__\u001b[0m\u001b[1;38;5;166m)\u001b[0m\u001b[1;38;5;166m |_____ ___ _____ __\u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _` |\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ '_ \\| __| _ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m |\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m ____ \\ \u001b[0m\u001b[1;38;5;166m(\u001b[0m\u001b[1;38;5;166m_| | __/ | | | |_| | \\ \\ __/\\ V \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m| | __/\\ V V \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/_/\u001b[0m\u001b[1;38;5;166m \\_\\__, |\\___|_| |_|\\__|_| \\_\\___| \\_/ |_|\\___| \\_/\\_/ \u001b[0m\n", "\u001b[1;38;5;166m __/ | \u001b[0m\n", "\u001b[1;38;5;166m |___/ \u001b[0m\n", "\n" ] }, "metadata": 
{}, "output_type": "display_data" }, { "data": { "text/html": [ "
🎓AgentReview Initialized!\n",
"
\n"
],
"text/plain": [
"\u001b[1;32m🎓AgentReview Initialized!\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"name_to_color: {'AC': 'blue'}\n"
]
},
{
"data": {
"text/html": [
"Environment (paper_decision) description:\n",
"This is a realistic simulation of academic peer review.\n",
"
\n"
],
"text/plain": [
"\u001b[1;4;32mEnvironment \u001b[0m\u001b[1;4;32m(\u001b[0m\u001b[1;4;32mpaper_decision\u001b[0m\u001b[1;4;32m)\u001b[0m\u001b[1;4;32m description:\u001b[0m\n",
"This is a realistic simulation of academic peer review.\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\u001b[32m\n", "========= Arena Start! ==========\n", "\u001b[0m\n", "\n" ], "text/plain": [ "\u001b\u001b[1m[\u001b[0m32m\n", "========= Arena Start! ==========\n", "\u001b\u001b[1m[\u001b[0m0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\u001b[34m[AC->all]: Paper ID: 400\n", "Willingness to accept: 1\n", "Paper ID: 247\n", "Willingness to accept: 2\u001b[0m\n", "\n" ], "text/plain": [ "\u001b\u001b[1m[\u001b[0m34m\u001b[1m[\u001b[0mAC->all\u001b[1m]\u001b[0m: Paper ID: \u001b[1;36m400\u001b[0m\n", "Willingness to accept: \u001b[1;36m1\u001b[0m\n", "Paper ID: \u001b[1;36m247\u001b[0m\n", "Willingness to accept: \u001b[1;36m2\u001b[0m\u001b\u001b[1m[\u001b[0m0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
"========= Arena Ended! ==========\n",
"\n",
"
\n"
],
"text/plain": [
"\n",
"\u001b[1;31m========= Arena Ended! ==========\u001b[0m\n",
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 4 batches of existing AC decisions from outputs/decisions/ICLR2024/gpt-4o/decisions_thru_ranking/decision_malicious_Rx1.json\n"
]
},
{
"data": {
"text/html": [
"\n", " _ _____ _ \n", " /\\ | | | __ \\ (_) \n", " / \\ __ _ ___ _ __ | |_| |__) |_____ ___ _____ __\n", " / /\\ \\ / _` |/ _ \\ '_ \\| __| _ // _ \\ \\ / / |/ _ \\ \\ /\\ / /\n", " / ____ \\ (_| | __/ | | | |_| | \\ \\ __/\\ V /| | __/\\ V V / \n", " /_/ \\_\\__, |\\___|_| |_|\\__|_| \\_\\___| \\_/ |_|\\___| \\_/\\_/ \n", " __/ | \n", " |___/ \n", "\n", "\n" ], "text/plain": [ "\n", "\u001b[1;38;5;166m _ _____ _ \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ | | | __ \\ \u001b[0m\u001b[1;38;5;166m(\u001b[0m\u001b[1;38;5;166m_\u001b[0m\u001b[1;38;5;166m)\u001b[0m\u001b[1;38;5;166m \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \\ __ _ ___ _ __ | |_| |__\u001b[0m\u001b[1;38;5;166m)\u001b[0m\u001b[1;38;5;166m |_____ ___ _____ __\u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _` |\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ '_ \\| __| _ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m |\u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m _ \\ \\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m\\ \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m ____ \\ \u001b[0m\u001b[1;38;5;166m(\u001b[0m\u001b[1;38;5;166m_| | __/ | | | |_| | \\ \\ __/\\ V \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m| | __/\\ V V \u001b[0m\u001b[1;35m/\u001b[0m\u001b[1;38;5;166m \u001b[0m\n", "\u001b[1;38;5;166m \u001b[0m\u001b[1;35m/_/\u001b[0m\u001b[1;38;5;166m \\_\\__, |\\___|_| |_|\\__|_| \\_\\___| \\_/ |_|\\___| \\_/\\_/ \u001b[0m\n", "\u001b[1;38;5;166m __/ | \u001b[0m\n", "\u001b[1;38;5;166m |___/ \u001b[0m\n", "\n" ] }, "metadata": 
{}, "output_type": "display_data" }, { "data": { "text/html": [ "
🎓AgentReview Initialized!\n",
"
\n"
],
"text/plain": [
"\u001b[1;32m🎓AgentReview Initialized!\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"name_to_color: {'AC': 'blue'}\n"
]
},
{
"data": {
"text/html": [
"Environment (paper_decision) description:\n",
"This is a realistic simulation of academic peer review.\n",
"
\n"
],
"text/plain": [
"\u001b[1;4;32mEnvironment \u001b[0m\u001b[1;4;32m(\u001b[0m\u001b[1;4;32mpaper_decision\u001b[0m\u001b[1;4;32m)\u001b[0m\u001b[1;4;32m description:\u001b[0m\n",
"This is a realistic simulation of academic peer review.\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\u001b[32m\n", "========= Arena Start! ==========\n", "\u001b[0m\n", "\n" ], "text/plain": [ "\u001b\u001b[1m[\u001b[0m32m\n", "========= Arena Start! ==========\n", "\u001b\u001b[1m[\u001b[0m0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\u001b[34m[AC->all]: Paper ID: 289\n", "Willingness to accept: 1\n", "Paper ID: 39\n", "Willingness to accept: 2\u001b[0m\n", "\n" ], "text/plain": [ "\u001b\u001b[1m[\u001b[0m34m\u001b[1m[\u001b[0mAC->all\u001b[1m]\u001b[0m: Paper ID: \u001b[1;36m289\u001b[0m\n", "Willingness to accept: \u001b[1;36m1\u001b[0m\n", "Paper ID: \u001b[1;36m39\u001b[0m\n", "Willingness to accept: \u001b[1;36m2\u001b[0m\u001b\u001b[1m[\u001b[0m0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
"========= Arena Ended! ==========\n",
"\n",
"
\n"
],
"text/plain": [
"\n",
"\u001b[1;31m========= Arena Ended! ==========\u001b[0m\n",
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 5 batches of existing AC decisions from outputs/decisions/ICLR2024/gpt-4o/decisions_thru_ranking/decision_malicious_Rx1.json\n"
]
}
],
"source": [
"from agentreview.environments import PaperDecision\n",
"from agentreview.utility.utils import project_setup, get_paper_decision_mapping, \\\n",
" load_metareview, load_llm_ac_decisions_as_array\n",
"\n",
"args.task = \"paper_decision\"\n",
"\n",
"sampled_paper_ids = [39, 247, 289, 400]\n",
"\n",
"# Make sure the same set of papers always go through the same AC no matter which setting we choose\n",
"NUM_PAPERS = len(sampled_paper_ids)\n",
"order = np.random.choice(range(NUM_PAPERS), size=NUM_PAPERS, replace=False)\n",
"\n",
"\n",
"# Paper IDs we actually used in experiments\n",
"experimental_paper_ids = []\n",
"\n",
"# For papers that have not been decided yet, load their metareviews\n",
"metareviews = []\n",
"print(\"Shuffling paper IDs\")\n",
"print(order)\n",
"sampled_paper_ids = np.array(sampled_paper_ids)[order]\n",
"\n",
"print(sampled_paper_ids)\n",
"for paper_id in sampled_paper_ids:\n",
" print(paper_id)\n",
" # Since we are feeding a batch of paper, the paper_id and paper_decision fields \n",
" # are not specific to one paper, thus left None\n",
" experiment_setting = get_experiment_settings(paper_id=None,\n",
" paper_decision=None,\n",
" setting=all_settings[args.experiment_name])\n",
"\n",
" # Load meta-reviews\n",
" metareview = load_metareview(output_dir=args.output_dir, paper_id=paper_id,\n",
" experiment_name=args.experiment_name,\n",
" model_name=args.model_name, conference=args.conference)\n",
"\n",
" if metareview is None:\n",
"\n",
" print(f\"Metareview for {paper_id} does not exist. This may happen because the conversation is \"\n",
" f\"completely filtered out due to content policy. \"\n",
" f\"Loading the BASELINE metareview...\")\n",
"\n",
" metareview = load_metareview(paper_id=paper_id, experiment_name=\"BASELINE\",\n",
" model_name=args.model_name, conference=args.conference)\n",
" print(metareview)\n",
"\n",
" if metareview is not None:\n",
" metareviews += [metareview]\n",
" experimental_paper_ids += [paper_id]\n",
"\n",
"print(\"TODO\")\n",
"args.num_papers_per_area_chair = 2\n",
"num_batches = len(experimental_paper_ids) // args.num_papers_per_area_chair\n",
"print(experimental_paper_ids, num_batches)\n",
"\n",
"for batch_index in range(num_batches):\n",
"\n",
" players = initialize_players(experiment_setting=experiment_setting, args=args)\n",
" player_names = [player.name for player in players]\n",
"\n",
" if batch_index >= num_batches - 1: # Last batch. Include all remaining papers\n",
" batch_paper_ids = experimental_paper_ids[batch_index * args.num_papers_per_area_chair:]\n",
"\n",
" else:\n",
" batch_paper_ids = experimental_paper_ids[batch_index * args.num_papers_per_area_chair: (batch_index + 1) *\n",
" args.num_papers_per_area_chair]\n",
"\n",
" env = PaperDecision(player_names=player_names, paper_ids=batch_paper_ids,\n",
" metareviews=metareviews,\n",
" experiment_setting=experiment_setting, ac_scoring_method=args.ac_scoring_method)\n",
"\n",
" arena = PaperReviewArena(players=players, environment=env, args=args, global_prompt=const.GLOBAL_PROMPT)\n",
" arena.launch_cli(interactive=False)\n"
]
},
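{
"cell_type": "markdown",
"id": "c0ffee00-0000-4000-8000-000000000001",
"metadata": {},
"source": [
"The loop above splits `experimental_paper_ids` into batches of `args.num_papers_per_area_chair`, with the last batch absorbing any remaining papers. A minimal sketch of that partitioning logic (the helper name `partition_into_batches` is hypothetical, not part of AgentReview):\n",
"\n",
"```python\n",
"def partition_into_batches(paper_ids, papers_per_chair):\n",
"    num_batches = len(paper_ids) // papers_per_chair\n",
"    batches = []\n",
"    for i in range(num_batches):\n",
"        start = i * papers_per_chair\n",
"        # Last batch: include all remaining papers\n",
"        end = None if i >= num_batches - 1 else start + papers_per_chair\n",
"        batches.append(paper_ids[start:end])\n",
"    return batches\n",
"```\n",
"\n",
"For the four sampled papers with 2 papers per area chair, this yields two batches, matching the `num_batches = 2` printed above.\n"
]
},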
{
"cell_type": "code",
"execution_count": 8,
"id": "0a3eb359-2814-49ac-bf2b-b13219ddb3e3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"==============================\n",
"Experiment Name: malicious_Rx1\n",
"Loaded 2 batches of existing AC decisions from outputs/decisions/ICLR2024/gpt-4o/decisions_thru_ranking/decision_malicious_Rx1.json\n"
]
}
],
"source": [
"decisions, paper_ids = load_llm_ac_decisions_as_array(output_dir=args.output_dir, conference=args.conference, \n",
" model_name=args.model_name,\n",
" ac_scoring_method=args.ac_scoring_method,\n",
" experiment_name=args.experiment_name,\n",
" acceptance_rate=args.acceptance_rate,\n",
" num_papers_per_area_chair=args.num_papers_per_area_chair)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "72fe60ab-8324-4632-b84b-ac2a7b560a5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"39\tReject\n",
"247\tReject\n",
"289\tReject\n",
"400\tAccept\n"
]
}
],
"source": [
"for paper_id, decision in zip(paper_ids, decisions):\n",
" print(f\"{paper_id}\\t{'Accept' if decision else 'Reject'}\")"
]
},
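{
"cell_type": "markdown",
"id": "c0ffee00-0000-4000-8000-000000000002",
"metadata": {},
"source": [
"With `ac_scoring_method='ranking'`, the AC ranks papers within each batch rather than scoring them independently, and the `acceptance_rate` argument (0.32 here) presumably controls what fraction of the ranked papers convert to `Accept`. A quick sanity check on the decisions above (assumes `decisions` behaves like a boolean array):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# In the run above, 1 of the 4 sampled papers was accepted\n",
"print(f\"Empirical acceptance rate: {np.mean(decisions):.2f}\")\n",
"```\n"
]
},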
{
"cell_type": "code",
"execution_count": null,
"id": "3da97ae2-33fa-4399-a1bc-8356ac65f243",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}