{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Foong_Coding Challenge for Fatima Fellowship",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "eBpjBBZc6IvA"
},
"source": [
"# Fatima Fellowship Quick Coding Challenge (Pick 1)\n",
"\n",
"Thank you for applying to the Fatima Fellowship. To help us select the Fellows and assess your ability to do machine learning research, we are asking that you complete a short coding challenge. Please pick **1 of these 5** coding challenges, whichever is most aligned with your interests. \n",
"\n",
"**Due date: 1 week**\n",
"\n",
"**How to submit**: Please make a copy of this colab notebook, add your code and results, and submit your colab notebook to the submission link below. If you have never used a colab notebook, [check out this video](https://www.youtube.com/watch?v=i-HnvsehuSw).\n",
"\n",
"**Submission link**: https://airtable.com/shrXy3QKSsO2yALd3"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sFU9LTOyMiMj"
},
"source": [
"# 2. Deep Learning for NLP\n",
"\n",
"**Fake news classifier**: Train a text classification model to detect fake news articles!\n",
"\n",
"* Download the dataset here: https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset\n",
"* Develop an NLP model for classification that uses a pretrained language model\n",
"* Finetune your model on the dataset, and generate an AUC curve of your model on the test set of your choice. \n",
"* [Upload the model to the Hugging Face Hub](https://huggingface.co./docs/hub/adding-a-model), and add a link to your model below.\n",
"* *Answer the following question*: Look at some of the news articles that were classified incorrectly. Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to implement these suggestions)."
]
},
{
"cell_type": "code",
"source": [
"### WRITE YOUR CODE TO TRAIN THE MODEL HERE\n",
"import numpy as np\n",
"import pandas as pd\n",
"import csv\n",
"from sklearn.metrics import accuracy_score, precision_recall_fscore_support\n",
"\n"
],
"metadata": {
"id": "E90i018KyJH3"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Data Loading"
],
"metadata": {
"id": "HUDOBz2tRivY"
}
},
{
"cell_type": "code",
"source": [
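"# on_bad_lines='skip' silently drops rows the parser cannot read, so check the shapes printed below\n",
"real_news = pd.read_csv(\"True.csv\", sep=',', engine='python', encoding='utf8', on_bad_lines='skip')\n",
"fake_news = pd.read_csv(\"Fake.csv\", sep=',', engine='python', encoding='utf8', on_bad_lines='skip')\n",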
"\n",
"print(\"real_news: \" + str(real_news.shape))\n",
"print(\"fake_news: \" + str(fake_news.shape))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "d60sCvRjOSWa",
"outputId": "99813f74-971d-41e2-8597-4913ca131fe1"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"real_news: (21417, 4)\n",
"fake_news: (14568, 4)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"fake_news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "ywYW2xTuOVGy",
"outputId": "2e442a61-4634-4965-a6f7-822896f45dbb"
},
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Donald Trump Sends Out Embarrassing New Year’... \n",
"1 Drunk Bragging Trump Staffer Started Russian ... \n",
"2 Sheriff David Clarke Becomes An Internet Joke... \n",
"3 Trump Is So Obsessed He Even Has Obama’s Name... \n",
"4 Pope Francis Just Called Out Donald Trump Dur... \n",
"\n",
" text subject \\\n",
"0 Donald Trump just couldn t wish all Americans ... News \n",
"1 House Intelligence Committee Chairman Devin Nu... News \n",
"2 On Friday, it was revealed that former Milwauk... News \n",
"3 On Christmas day, Donald Trump announced that ... News \n",
"4 Pope Francis used his annual Christmas Day mes... News \n",
"\n",
" date \n",
"0 December 31, 2017 \n",
"1 December 31, 2017 \n",
"2 December 30, 2017 \n",
"3 December 29, 2017 \n",
"4 December 25, 2017 "
]
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"source": [
"## Add labels"
],
"metadata": {
"id": "ZghmfpC2SIVC"
}
},
{
"cell_type": "code",
"source": [
"# Label convention: 0 = fake, 1 = real\n",
"fake_news['label'] = 0\n",
"real_news['label'] = 1"
],
"metadata": {
"id": "rZ8pF-RtSJ6_"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"source": [
"fake_news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "CR_yBlbRR6R4",
"outputId": "f2eff41d-8cfc-44cf-d68c-313cb692fb45"
},
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Donald Trump Sends Out Embarrassing New Year’... \n",
"1 Drunk Bragging Trump Staffer Started Russian ... \n",
"2 Sheriff David Clarke Becomes An Internet Joke... \n",
"3 Trump Is So Obsessed He Even Has Obama’s Name... \n",
"4 Pope Francis Just Called Out Donald Trump Dur... \n",
"\n",
" text subject \\\n",
"0 Donald Trump just couldn t wish all Americans ... News \n",
"1 House Intelligence Committee Chairman Devin Nu... News \n",
"2 On Friday, it was revealed that former Milwauk... News \n",
"3 On Christmas day, Donald Trump announced that ... News \n",
"4 Pope Francis used his annual Christmas Day mes... News \n",
"\n",
" date label \n",
"0 December 31, 2017 0 \n",
"1 December 31, 2017 0 \n",
"2 December 30, 2017 0 \n",
"3 December 29, 2017 0 \n",
"4 December 25, 2017 0 "
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"source": [
"## Combine Real & Fake News into one dataframe"
],
"metadata": {
"id": "ZB2C1ImfSUUg"
}
},
{
"cell_type": "code",
"source": [
"# Stack both frames, then shuffle the rows so the two classes are interleaved\n",
"news = pd.concat([real_news, fake_news], axis=0, ignore_index=True)\n",
"news = news.sample(frac=1).reset_index(drop=True)\n",
"news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "RTifEXcHSQJ0",
"outputId": "d2e996c9-9068-4cfb-dbe0-1fb84c4b0b2f"
},
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Trump’s Involvement In Houston Chemical Plant... \n",
"1 OOPS! Media Forgot Ted Kennedy Asked Russia To... \n",
"2 OBAMA GIVES FINAL THOUGHTS On Trump Presidency... \n",
"3 CNN ANCHOR DON LEMON: A Republican Winning in ... \n",
"4 Trump Confirms He Thinks GOP Healthcare Bill ... \n",
"\n",
" text subject \\\n",
"0 In the aftermath of the historic flooding that... News \n",
"1 In 1991 a reporter for the London Times found ... politics \n",
"2 The Obama family ended their eight-year reside... politics \n",
"3 CNN anchor Don Lemon got snarky during reporti... politics \n",
"4 Trump got into a bizarre pissing match with fo... News \n",
"\n",
" date label \n",
"0 September 1, 2017 0 \n",
"1 Feb 16, 2017 0 \n",
"2 Jan 20, 2017 0 \n",
"3 Jun 21, 2017 0 \n",
"4 June 25, 2017 0 "
]
},
"metadata": {},
"execution_count": 6
}
]
},
{
"cell_type": "code",
"source": [
"# Join the title and body into a single text field for the classifier\n",
"news['combine'] = news['title'] + ' ' + news['text']\n",
"news.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 337
},
"id": "N7QZ7Zk5VvDk",
"outputId": "1abb083b-33d5-4e82-a14f-7bd943231d9e"
},
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Trump’s Involvement In Houston Chemical Plant... \n",
"1 OOPS! Media Forgot Ted Kennedy Asked Russia To... \n",
"2 OBAMA GIVES FINAL THOUGHTS On Trump Presidency... \n",
"3 CNN ANCHOR DON LEMON: A Republican Winning in ... \n",
"4 Trump Confirms He Thinks GOP Healthcare Bill ... \n",
"\n",
" text subject \\\n",
"0 In the aftermath of the historic flooding that... News \n",
"1 In 1991 a reporter for the London Times found ... politics \n",
"2 The Obama family ended their eight-year reside... politics \n",
"3 CNN anchor Don Lemon got snarky during reporti... politics \n",
"4 Trump got into a bizarre pissing match with fo... News \n",
"\n",
" date label combine \n",
"0 September 1, 2017 0 Trump’s Involvement In Houston Chemical Plant... \n",
"1 Feb 16, 2017 0 OOPS! Media Forgot Ted Kennedy Asked Russia To... \n",
"2 Jan 20, 2017 0 OBAMA GIVES FINAL THOUGHTS On Trump Presidency... \n",
"3 Jun 21, 2017 0 CNN ANCHOR DON LEMON: A Republican Winning in ... \n",
"4 June 25, 2017 0 Trump Confirms He Thinks GOP Healthcare Bill ... "
],
"text/html": [
"\n",
"