{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Summarize YouTube videos into text"
],
"metadata": {
"id": "Jhu3KrV9Ru1t"
}
},
{
"cell_type": "markdown",
"source": [
"### Installing necessary libraries\n"
],
"metadata": {
"id": "PD3P82YJRIzH"
}
},
{
"cell_type": "code",
"source": [
"%pip install git+https://github.com/openai/whisper.git"
],
"metadata": {
"id": "JfPAYngLYiJP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%pip install pytube\n",
"%pip install openai\n",
"%pip install numpy"
],
"metadata": {
"id": "FX8cTWYpAWCQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Downloading audio clip from the YouTube video"
],
"metadata": {
"id": "kPp1Gx_GGVmv"
}
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "jbeQ3nGRABqH"
},
"outputs": [],
"source": [
"from pytube import YouTube\n",
"import whisper\n",
"import os\n",
"import subprocess"
]
},
{
"cell_type": "code",
"source": [
"def download_youtube_audio(url, destination):\n",
" # Create a YouTube object\n",
" yt = YouTube(url)\n",
"\n",
" # Select the audio stream\n",
" audio_stream = yt.streams.filter(only_audio=True).first()\n",
"\n",
" # Download the audio stream\n",
" out_file = audio_stream.download(output_path=destination)\n",
"\n",
" # Set up new filename\n",
" base, ext = os.path.splitext(out_file)\n",
" new_file = base + '.mp3'\n",
"\n",
" # Convert file to mp3; -y overwrites an existing output, check=True raises if ffmpeg fails\n",
" subprocess.run(['ffmpeg', '-y', '-i', out_file, new_file], check=True)\n",
"\n",
" # Remove the original file\n",
" os.remove(out_file)\n",
"\n",
" print(f\"Downloaded and converted to MP3: {new_file}\")\n",
" return new_file"
],
"metadata": {
"id": "TIpMJJshAR7m"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Input YouTube link"
],
"metadata": {
"id": "F-uidWN5RP0R"
}
},
{
"cell_type": "code",
"source": [
"url = input(\"Enter the YouTube URL: \")\n",
"\n",
"#url = 'https://www.youtube.com/watch?v=reUZRyXxUs4' # as test"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9nCkN_X_As1l",
"outputId": "db1e5e14-0ef9-4ed1-fa85-7cced9b42eab"
},
"execution_count": 72,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter the YouTube URL: https://youtu.be/4o5hSxvN_-s?si=6ZcSvt69baVOjYBn\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Set the destination path for the download\n",
"destination = \"audiofiles/\"\n",
"\n",
"file_path = download_youtube_audio(url, destination)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "iNdsqUCXA8nf",
"outputId": "a401b201-3710-4465-d4ab-cc6f8df28265"
},
"execution_count": 73,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloaded and converted to MP3: /content/audiofiles/This is what happens when you reply to spam email l TED.mp3\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### Converting the audio into text using **Whisper** (`base` model)"
],
"metadata": {
"id": "90eokVCvGgGX"
}
},
{
"cell_type": "code",
"source": [
"def write_text_to_file(text, filename=\"transcribed_text.txt\"):\n",
" # Write the text to the file as UTF-8 so non-ASCII characters are preserved\n",
" with open(filename, \"w\", encoding=\"utf-8\") as file:\n",
" file.write(text)"
],
"metadata": {
"id": "6fzySYbSMLgT"
},
"execution_count": 74,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import whisper\n",
"\n",
"model = whisper.load_model(\"base\")\n",
"result = model.transcribe(file_path)\n",
"print(result[\"text\"])\n",
"\n",
"transcription = result[\"text\"]"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oJlcn4NJa3uR",
"outputId": "a34c89eb-3d86-409b-d6a6-b16c40195a77"
},
"execution_count": 75,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" I Three years ago I got one of those spam emails and It managed to get through my spam filter not quite sure how but it's telling me my inbox and it was from a guy called Solomon Odonka I know And one like this said hello James Veech. I have an interesting business proposal. I want to share with you Solomon Now my hand was kind of hovering on the delete button. I see why I was looking at my phone. I thought I could just delete this Or I could do what I think we've all Always wanted to do And I said Solomon your email intrigues me And the game was a foot He said dear James Veech we shall be shipping gold to you You will earn 10% of any gold you distribute So I knew I was dealing with a professional I said how much is it worth He said we will start with a smaller quantity. I was like oh, and then he said of 25 kilograms The worst should be about 2.5 million I just saw him and if you're gonna do it, let's go big I can handle it How much gold do you have He said it's not a matter of how much gold how much is your cable if handling? What we can start with 50 kilograms as a trial shipment. I said 50 kilograms There's no point doing this at all. No such as you're being at least a metric ton So what do you do for a living? I Said I'm a hedge fund executive manager This isn't the first time I've shit booty in my friend no no no Then I started the panic I was down. I look where are you based? I don't know about you, but I think we're going by the postal service. It ought to be sign for Right, that's a lot of gold He said oh not be easily convinced my company to do a larger quantity shipment My said Solomon I completely with you on this one. 
I'm putting together a visual For you's taken for the board meeting whole time This is what I sent Solomon I don't know if we have any statisticians in in the house, but there's definitely something going on I Said a song on attach a email you'll find a helpful chart I've had one of my assistants run the numbers We need to be as much gold as possible There's always a moment where they try to take your heartstrings and this was it for Solomon He said I will be so much happy if the deal goes well because I'm going to get a very good commission as well And I said that's amazing. What are you gonna spend your cut on? And he said on real estate. What about you? I Thought about it for a long time I Said one word It's going place I Was in Saint-Mizard, they know like 30 different varieties also you can call up carrots and you can dip them Have you ever done that? You said I've to go to bed now Till morrow Have sweet dream I didn't know what to say I Said boss war my golden nugget boss war Guys you have to understand this will be going for like weeks or be a hitherto the greatest weeks of my life But I had to knock it on the head it was getting about a hand friends were saying to me James You want to come out for a drink I was like I can't me. I'm expecting email about some gold So I figured I had to knock it on the head. I take it to a ridiculous conclusion. So I thought I can cut to the plan. 
I said look Solomon Solomon I'm concerned about security when we email each other we need to use a code And you agree Nice it's Solomon I spend all night coming up with this code we need to use an all-father correspondence lawyer Gummy bear Bank cream egg Legal physicala bottle came peanut amends documents jelly beans western union guys a giant gummy lizard I knew these were all words they used right I said Please gonna kick out an all-father corresponded I Didn't hit back after I've gone too far I've gone too far so I had to I had to back pedal a little bit I said look Solomon is the deal still on Kit-cat Because you have to be consistent Then I did get any more back from him. He said the business all I'm traveling a bird of that I said dude you have to use the code What Follow this the greatest email I've ever received I'm not joking. This is what turn up in my inbox. This is a good day The business is on And I'm trying to raise the balance for the gummy bear So So he could submit all the needed physicala bottle jelly beans to the cream egg For the peanut amends process to start 7500 pounds via a giant Gummy living And that was so much fun right that it got me thinking like what would happen if I just felt as much as I could Replying to as many scam emails as I could And that's what I've been doing For three years on your behalf Let me tell you crazy stuff happens when you start replying to scam emails It's really difficult and I highly recommend we do I don't think what I'm doing is me right? You know there are a lot of people out there who do mean things to scams I don't think what I'm doing all I think it's all I'm doing is wasting their time And I think anytime they're spending with me is time they're not spending scamming vulnerable adults out of their savings right And if you're gonna do this and I highly recommend you do Get yourself a pseudonym of email address. 
Don't use your own email address because that's exactly what I was doing at the start It was a nightmare because you know, I'd wake up in the morning and have like thousand emails about penis enlargement You know only one of which was a legitimate Response to a medical question I'd have I tell you what though guys I tell you what any days of good day Any days of good day if you receive an email the begins like this I am Winnie Mandela The second wife of Nelson Mandela the former South African president. I was like oh that Winnie Mandela I Know so many I need to transfer 45 million dollars out of the country because of my husband Nelson's health condition Let that sink in She sent me this which is hysterical And this and this is fairly legitimate. This is a letter authorization But to be honest if there's nothing written on it. It's just a shape I say Winnie I'm really sorry to hear of this given that Nelson died three months ago. I describe his health conditioners That's a worse health condition you can have not being alive She said kindly comply with my bankers instructions one love I Said of course no woman no cry She said my bank would eat transfer of three thousand dollars one love I said no problem I should share it Thank you\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# Save the Whisper transcription to transcribed_text.txt\n",
"write_text_to_file(transcription)"
],
"metadata": {
"id": "R3ttiOCdI94t"
},
"execution_count": 76,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Converting transcript into summarized text"
],
"metadata": {
"id": "_vQ7VqW1RgR7"
}
},
{
"cell_type": "code",
"source": [
"preprompt = 'You are a model that receives a transcription of a YouTube video. Your task is to correct any words that may be incorrect based on the context, and transform it into a well-structured summary of the entire video. Your summary should highlight important details and provide additional context when necessary. Aim to be detailed, particularly when addressing non-trivial aspects of the content. The summary should encompass at least 20-30% of the original text length.'\n",
"# The instructions are sent as the system message, so the user message carries only the transcript\n",
"prompt = transcription"
],
"metadata": {
"id": "FtrdIAbhDZSr"
},
"execution_count": 77,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from openai import OpenAI\n",
"from google.colab import userdata\n",
"\n",
"client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))\n",
"\n",
"response = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": preprompt},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ]\n",
")\n",
"\n",
"# The 'response' will contain the completion from the model\n",
"print(response.choices[0].message.content)\n",
"write_text_to_file(response.choices[0].message.content, filename='summary.txt')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pYXkw7qdMqZI",
"outputId": "2d6b3edb-47ef-4930-d544-5b26e8d1b555"
},
"execution_count": 78,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Three years ago, the creator received a spam email from someone named Solomon Odonka, offering a business proposal involving shipping gold with a 10% distribution earnings. Initially skeptical, the creator decided to engage with Solomon out of curiosity. Their conversation escalated from discussing smaller quantities of gold to trial shipments of 50 kilograms, with Solomon expressing the intent to conduct larger transactions. The creator, posing as a hedge fund executive, continued the correspondence, even creating elaborate codes for security. The interaction with Solomon led the creator to reflect on the consequences of scamming vulnerable individuals and the value of wasting scammers' time. The creator recommended using a pseudonymous email address to avoid inundation with spam emails. They humorously shared experiences of engaging with various scam emails, including one claiming to be from Winnie Mandela seeking assistance in transferring funds. Despite the absurdity of the interactions, the creator maintained a light-hearted and playful approach throughout the discussion.\n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "jqVM1469JdNB"
},
"execution_count": null,
"outputs": []
}
]
}