# Synacknetwork.com's web crawler!

This is a web crawler that scans websites for specified keywords and saves the found URLs in real-time. The crawler simulates both a Googlebot user agent and random user agents for a more human-like browsing experience.
(20% of the time it uses Googlebot, the other 80% it will use a random UA.)
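
A minimal sketch of how that split could be implemented (the `pick_user_agent` helper is hypothetical, not the Space's exact code):

```python
import random

from fake_useragent import UserAgent

# Static Googlebot UA string; the 20/80 split below mirrors the behavior
# described above, not necessarily the app's exact implementation.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def pick_user_agent() -> str:
    """Return the Googlebot UA ~20% of the time, a random UA otherwise."""
    if random.random() < 0.2:
        return GOOGLEBOT_UA
    return UserAgent().random
```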

## Features
- Crawl websites for specified keywords
- Use a Googlebot user agent and random user agents to mimic real user behavior
- Display found URLs and matched keywords in real-time in the UI
- Save crawled data to a local file and database, as sketched below (local only; I'm not sure about HF and Gradio. Maybe I'll add a feature later to download `found.txt` from the UI; if you want this feature, msg me or visit synacknetwork and let me know)
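
A rough sketch of what that save step could look like (the `found.db` / `found.txt` names and the `save_hit` helper are assumptions for illustration):

```python
import sqlite3  # standard library; no separate install needed

conn = sqlite3.connect("found.db")
conn.execute("CREATE TABLE IF NOT EXISTS found (url TEXT, keyword TEXT)")

def save_hit(url: str, keyword: str) -> None:
    """Mirror each hit to the local database and to found.txt."""
    with conn:  # commits the insert automatically
        conn.execute("INSERT INTO found VALUES (?, ?)", (url, keyword))
    with open("found.txt", "a") as f:
        f.write(f"{url}\t{keyword}\n")
```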

## Usage
1. Enter the starting URL.
2. Enter keywords (comma-separated) that the crawler should look for.
3. Choose the crawl depth (the number of levels deep to crawl; the max is 5 for safety, but you can raise it by editing the code in your own Space).
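
Those three inputs map onto a simple Gradio interface. A minimal sketch, assuming `gr.Interface` and a `crawl` entry point (labels and names are illustrative, not the Space's exact code):

```python
import gradio as gr

def crawl(start_url: str, keywords: str, depth: int) -> str:
    # Hypothetical entry point: parse the keywords, walk pages up to
    # `depth` levels, and return the found URLs. The real app shows
    # results in real-time as it crawls.
    ...

demo = gr.Interface(
    fn=crawl,
    inputs=[
        gr.Textbox(label="Start URL"),
        gr.Textbox(label="Keywords (comma-separated)"),
        gr.Slider(1, 5, step=1, label="Depth"),  # capped at 5 for safety
    ],
    outputs=gr.Textbox(label="Found URLs"),
)
demo.launch()
```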

## Installation
Make sure the following Python packages are installed:
- `requests`
- `beautifulsoup4`
- `fake-useragent`
- `gradio`

These are listed in `requirements.txt`, which will be installed automatically when you run the Space. (DO NOT add SQLite to it; Python handles this on its own via the standard-library `sqlite3` module, and installing it separately will throw errors.)
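
For reference, the `requirements.txt` is just the four packages above:

```text
requests
beautifulsoup4
fake-useragent
gradio
```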

## How It Works
The crawler sends each request with either the Googlebot user agent or a random user agent, as described above. It checks the content of crawled pages for the specified keywords and displays the URLs where the keywords were found.
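
In sketch form, the per-page check could look like this (`page_matches` is an illustrative name, and it assumes a simple case-insensitive substring match):

```python
import requests
from bs4 import BeautifulSoup

def page_matches(url: str, keywords: list[str], headers: dict) -> list[str]:
    """Fetch a page and return the keywords found in its visible text."""
    resp = requests.get(url, headers=headers, timeout=10)
    text = BeautifulSoup(resp.text, "html.parser").get_text().lower()
    return [kw for kw in keywords if kw.lower() in text]
```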

## Example
- **Start URL**: `https://let's-find-some-cool-shit.common`
- **Keywords**: `GTA 6 videos, ufo, private videos, my webcam, my files, etc.`
- **Depth**: `3`

The results will be shown in real-time as the crawler scans through the pages.

## Troubleshooting
- If you encounter any issues with missing dependencies, make sure your `requirements.txt` is up-to-date.
- For any errors related to the crawling process, check the logs (accessible in the Space environment).

# Thanks for checking my crawler out!
I made this mainly to look for Bigfoot, paranormal, and UFO/UAP videos for my YouTube channel. I also used it to find some ring cam videos people have on their personal servers, and they ARE out there. Play around with keywords and have fun!

Contact us @ synacknetwork.com
Have any ideas you would like added? Please let us know....