Synacknetwork committed on
Commit
1b203be
·
verified ·
1 Parent(s): 1c43b20

Upload readme.txt

Read the readme.txt or something!

Files changed (1)
  1. readme.txt +47 -0
readme.txt ADDED
@@ -0,0 +1,47 @@
+ # Synacknetwork.com's web crawler!
+
+ This is a web crawler that scans websites for specified keywords and saves the URLs it finds in real time. The crawler sends a Googlebot user agent on some requests and a random user agent on the rest, so its traffic looks more like ordinary browsing.
+ (About 20% of the time it uses the Googlebot user agent; the other 80% of the time it uses a random one.)
+
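The 20/80 split described above could be sketched roughly as follows. This is only an illustration, not the Space's actual code: the names `pick_user_agent`, `GOOGLEBOT_UA`, and `BROWSER_UAS` are hypothetical, and the short hard-coded list stands in for whatever the real crawler gets from `fake-useragent`.

```python
import random

# A commonly published Googlebot user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Assumption: a tiny stand-in pool; the real crawler presumably draws
# random strings from the fake-useragent package instead.
BROWSER_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def pick_user_agent(rng=random, googlebot_share=0.2):
    """Return the Googlebot UA about `googlebot_share` of the time, else a random browser UA."""
    if rng.random() < googlebot_share:
        return GOOGLEBOT_UA
    return rng.choice(BROWSER_UAS)


# The chosen string would then go into the request headers.
headers = {"User-Agent": pick_user_agent()}
```

Picking the user agent per request, rather than once per crawl, is what makes the split hold across a long run.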
+ ## Features
+ - Crawl websites for specified keywords
+ - Use a Googlebot user agent and random user agents to mimic real user behavior
+ - Display found URLs and matched keywords in real time in the UI
+ - Save crawled data to a local file and database (local only; I'm not sure how this behaves on HF and Gradio. I may add a feature later to download found.txt from the UI. If you want this feature, message me or visit synacknetwork and let me know.)
+
+ ## Usage
+ 1. Enter the starting URL.
+ 2. Enter keywords (comma-separated) that the crawler should look for.
+ 3. Choose the crawl depth, i.e. the number of levels deep to crawl. The maximum is 5 for safety; to go deeper, edit the code in your own Space.
+
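As a rough sketch of how the inputs from the steps above might be normalized before crawling: the helper names `parse_keywords` and `clamp_depth` are hypothetical, and lowercasing the keywords and enforcing a minimum depth of 1 are assumptions rather than documented behavior. Only the maximum of 5 comes from the README.

```python
MAX_DEPTH = 5  # the safety limit stated in the README


def parse_keywords(raw):
    """Split a comma-separated string into keywords, dropping blanks.

    Assumption: keywords are lowercased so matching can be case-insensitive.
    """
    return [kw.strip().lower() for kw in raw.split(",") if kw.strip()]


def clamp_depth(depth, max_depth=MAX_DEPTH):
    """Keep the requested depth between 1 (assumed minimum) and max_depth."""
    return max(1, min(int(depth), max_depth))
```

For example, `parse_keywords("ufo, , My Files")` yields `["ufo", "my files"]`, and `clamp_depth(9)` yields `5`.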
+ ## Installation
+ Make sure the following Python packages are installed:
+ - `requests`
+ - `beautifulsoup4`
+ - `fake-useragent`
+ - `gradio`
+
+ These are listed in `requirements.txt` and are installed automatically when you run the Space. (Do NOT try to install SQLite as a package: Python's standard library already includes the `sqlite3` module, and adding it to the requirements will throw errors.)
+
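Because `sqlite3` ships with Python, saving matches needs no extra dependency. The sketch below shows one way that could look; the table name `found`, its columns, and the `save_match` helper are hypothetical and not taken from the Space's code, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3


def save_match(conn, url, keyword):
    """Record a (url, keyword) hit; repeated hits for the same pair are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS found ("
        "url TEXT, keyword TEXT, UNIQUE(url, keyword))"
    )
    conn.execute(
        "INSERT OR IGNORE INTO found (url, keyword) VALUES (?, ?)",
        (url, keyword),
    )
    conn.commit()


# In-memory database for illustration; a real run would pass a file path.
conn = sqlite3.connect(":memory:")
save_match(conn, "https://example.com/a", "ufo")
save_match(conn, "https://example.com/a", "ufo")  # duplicate, silently skipped
rows = conn.execute("SELECT url, keyword FROM found").fetchall()
```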
+ ## How It Works
+ For each request, the crawler sends either a Googlebot user agent or a random user agent. It checks the content of every crawled page for the specified keywords and displays the URLs where keywords were found.
+
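The keyword check and the depth limit might fit together roughly like this. It is a sketch under stated assumptions, not the crawler's real implementation: `find_keywords`, `crawl`, and the `fetch` callback are hypothetical names, matching is assumed to be a case-insensitive substring test, and the start page is assumed to sit at depth 0. In the real crawler, `fetch` would presumably download the page with `requests` and extract text and links with `beautifulsoup4`; here it is injected so the logic can run without network access.

```python
from collections import deque


def find_keywords(page_text, keywords):
    """Return the keywords that occur in page_text (case-insensitive substring match)."""
    haystack = page_text.lower()
    return [kw for kw in keywords if kw.lower() in haystack]


def crawl(start_url, keywords, max_depth, fetch):
    """Breadth-first crawl; fetch(url) must return (page_text, list_of_links)."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # assumption: the start page is depth 0
    hits = []
    while queue:
        url, depth = queue.popleft()
        text, links = fetch(url)
        matched = find_keywords(text, keywords)
        if matched:
            hits.append((url, matched))
        if depth < max_depth:  # only follow links while under the limit
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return hits


# A tiny fake site standing in for real HTTP responses.
pages = {
    "a": ("UFO footage here", ["b"]),
    "b": ("nothing to see", ["c", "a"]),
    "c": ("bigfoot and ufo", []),
}
hits = crawl("a", ["ufo", "bigfoot"], 2, lambda url: pages[url])
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, which matters more than the depth limit on heavily cross-linked sites.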
+ ## Example
+ - **Start URL**: `https://let's-find-some-cool-shit.common`
+ - **Keywords**: `GTA 6 videos, ufo, private videos, my webcam, my files, etc.`
+ - **Depth**: `3`
+
+ The results are shown in real time as the crawler scans through the pages.
+
+ ## Troubleshooting
+ - If you encounter any issues with missing dependencies, make sure your `requirements.txt` is up to date.
+ - For any errors related to the crawling process, check the logs (accessible in the Space environment).
+
+ # Thanks for checking my crawler out!
+ I made this mainly to look for Bigfoot, paranormal, and UFO/UAP videos for my YouTube channel. I also used it to find some ring cam videos people have on their personal servers, and they ARE out there. Play around with keywords and have fun!
+ Contact us at synacknetwork.com.
+ Have any ideas you would like added?
+ Please let us know.
+
+
+