Tinkers Projects

Imagine | Develop | Create

Spider

Categories: Software

For years I have been interested in how trends can appear on the internet and then disappear overnight. I have always wanted to build a tool that scans websites to see what is trending on the web and where that would take me.

I made a tool called Spider to try to do just this. It is only the start of a bigger project: a web crawler that looks for trends and targeted context/keywords.

How it works

This tool is written in Python, with a MySQL database and a web framework front end, all installed on a Raspberry Pi. The Raspberry Pi is the best machine to run this tool on: compared to my other computers, it uses little power and is easy to set up. The MySQL database stores the data gathered from each crawl. The web framework gives quick access to the gathered data and displays it in an easily readable format, and it also lets the user change the settings, keywords and starting URLs.
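The post does not show Spider's database layout, so the following is only a minimal sketch of how crawl results could be stored and queried. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table names (pages, keyword_hits, links) and the functions save_page, keyword_totals and crawl_trail are hypothetical. On MySQL the id column would use AUTO_INCREMENT, INSERT OR IGNORE would become INSERT IGNORE, and drivers such as PyMySQL use %s placeholders instead of ?.

```python
import sqlite3

# Hypothetical schema (SQLite stand-in for MySQL).
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS keyword_hits (
    page_id INTEGER NOT NULL REFERENCES pages(id),
    keyword TEXT NOT NULL,
    hits INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS links (
    from_page INTEGER NOT NULL REFERENCES pages(id),
    to_url TEXT NOT NULL
);
"""


def save_page(conn, url, keyword_counts, outgoing_links):
    """Store one crawled page, its keyword counts and its outgoing links."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    page_id = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()[0]
    conn.executemany(
        "INSERT INTO keyword_hits (page_id, keyword, hits) VALUES (?, ?, ?)",
        [(page_id, kw, n) for kw, n in keyword_counts.items()],
    )
    conn.executemany(
        "INSERT INTO links (from_page, to_url) VALUES (?, ?)",
        [(page_id, link) for link in outgoing_links],
    )
    conn.commit()
    return page_id


def keyword_totals(conn):
    """Total hits per keyword across all saved pages, most frequent first."""
    return conn.execute(
        "SELECT keyword, SUM(hits) AS total FROM keyword_hits "
        "GROUP BY keyword ORDER BY total DESC, keyword"
    ).fetchall()


def crawl_trail(conn):
    """(from_url, to_url) pairs in the order they were saved."""
    return conn.execute(
        "SELECT p.url, l.to_url FROM links AS l "
        "JOIN pages AS p ON p.id = l.from_page ORDER BY l.rowid"
    ).fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, each crawled page is passed to save_page, and a front end could then read keyword_totals(conn) for trends and crawl_trail(conn) for the path the crawler took.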

The Python script begins the crawl at a starting URL and downloads the content of that page. It then searches the page for keywords and for links to other pages, documents or websites. Once the search is finished, it saves the gathered data and links and follows those links to gather more data. After the script exceeds a search limit, or every link has led to a dead end, the database can be analyzed to show the trail of where the crawler started and where it went from each URL.
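The loop described above can be sketched as a breadth-first crawl using only the Python standard library. This is an illustration under assumptions, not Spider's actual code: the names crawl, fetch and LinkAndTextParser are hypothetical, max_pages stands in for the unspecified "search limit", keyword matching is a simple case-insensitive substring count, and robots.txt is not checked. The fetch_page parameter lets the download step be swapped out, for example for testing without a network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collect the href of every <a> tag and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def fetch(url):
    """Download a page using only the standard library."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, keywords, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl starting at start_url.

    Returns (keyword_hits, trail): keyword_hits maps each visited URL to
    {keyword: count}, and trail lists (from_url, to_url) link edges.
    Stops after max_pages successful downloads or when no unvisited links remain.
    """
    queue = deque([start_url])
    seen = {start_url}
    keyword_hits = {}
    trail = []
    while queue and len(keyword_hits) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # dead end: network error, HTTP error or bad URL
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        text = " ".join(parser.text_parts).lower()
        keyword_hits[url] = {kw: text.count(kw.lower()) for kw in keywords}
        for href in parser.links:
            target, _fragment = urldefrag(urljoin(url, href))
            if urlparse(target).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript: and similar links
            trail.append((url, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return keyword_hits, trail
```

A queue gives breadth-first order, so pages closest to the starting URL are visited first, and the seen set keeps the crawler from revisiting pages when sites link back to each other.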

Unfortunately, this tool could be used to carry out a denial-of-service attack on a server, and I plan to add safeguards against this without slowing the crawler down. The software still needs a bit more work, but it is currently functional.
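One common safeguard, shown here only as a possible approach and not as something Spider is known to do, is a minimum delay between requests to the same host. The class name PerHostRateLimiter and the min_delay value are hypothetical. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from urllib.parse import urlparse


class PerHostRateLimiter:
    """Enforce a minimum delay between requests to the same host.

    Requests to different hosts are not delayed relative to each other,
    so a crawl spread over many sites loses little overall speed.
    """

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until url's host may be contacted again; return seconds slept."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
        if delay:
            self.sleep(delay)
        self.last_request[host] = now + delay
        return delay
```

A crawler would call `limiter.wait(url)` just before each download. A strictly sequential crawler still pauses during the delay; a scheduler that picks a URL on a different host while one host is cooling down would avoid most of that cost.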