I develop a lot of websites for myself and for clients. I need a simple app that crawls a website, examines the code on every page, and documents the links found on each page.
The goal is to document 1) which web pages link to which web pages, 2) whether each link is a follow or nofollow link, 3) how the link opens (in the same window or in a new window), 4) the link text used, and 5) whether it's an internal or external link.
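To make the five data points above concrete, here is a rough Python sketch of how they might be read from a single page's HTML. It uses only the standard library. The names `LinkCollector` and `extract_links`, the field names, and the rule that "internal" means "same host as the page" are all illustrative assumptions, not requirements of this project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects one record per <a href="..."> found in a page (hypothetical helper)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.links = []
        self._current = None  # record for the <a> tag that is currently open

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return  # anchors without an href are not links
        target_url = urljoin(self.page_url, href)  # resolve relative links
        rel_tokens = (attrs.get("rel") or "").lower().split()
        host = urlparse(target_url).netloc.lower()
        self._current = {
            "source_page": self.page_url,
            "target_url": target_url,
            "follow": "nofollow" if "nofollow" in rel_tokens else "follow",
            # Assumption: a missing target attribute means "same window".
            "opens_in": attrs.get("target") or "_self",
            "link_text": "",
            # Assumption: internal means the same host as the page itself.
            "scope": "internal" if host == self.site_host else "external",
        }

    def handle_data(self, data):
        if self._current is not None:
            self._current["link_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the visible link text.
            text = " ".join(self._current["link_text"].split())
            self._current["link_text"] = text
            self.links.append(self._current)
            self._current = None


def extract_links(page_url, html):
    """Return a list of link records for one page."""
    parser = LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each record holds the source page, the resolved target URL, follow/nofollow status, the window target, the link text, and the internal/external flag. Image-only links would come out with empty link text in this sketch.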
I need a desktop application/script with the following features:
--- Crawl WordPress (PHP) sites and HTML sites. If it can spider other platforms, that would be a bonus.
--- Produce a CSV file that I can format to look like the attached file.
--- The app needs to be a desktop app that runs on Windows XP/7/8. If it can also run on Macs, that would be great.
--- Spider/crawl at least 100 pages. I would prefer that it handle more pages.
--- What programming language will you use to develop the script?
--- Will users of this script have to download any software (like .NET or Adobe AIR) to use your program? I strongly prefer that they not have to download any special software, if possible.
--- Will you make any changes needed until it works 100%?
--- Have you written web crawlers before?
See attached file for the kind of data that needs to be captured in a CSV file.
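As a rough illustration of the crawl-and-export part, the Python sketch below walks one site breadth-first up to a page limit and writes the links it finds to CSV. Everything named here (`crawl`, `write_csv`, `HrefParser`, the `fetch` callback, and the three column names) is a hypothetical placeholder. The real columns would follow the attached file, and the sketch records only source, target, and internal/external status to keep it short.

```python
import csv
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefParser(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site.

    `fetch(url)` is supplied by the caller and returns the page's HTML
    as text, or None if the page could not be loaded.
    """
    site_host = urlparse(start_url).netloc.lower()
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so no page is crawled twice
    rows = []
    pages_done = 0
    while queue and pages_done < max_pages:
        page_url = queue.popleft()
        html = fetch(page_url)
        if html is None:
            continue
        pages_done += 1
        parser = HrefParser()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(page_url, href))
            internal = urlparse(target).netloc.lower() == site_host
            rows.append({
                "source_page": page_url,
                "target_url": target,
                "scope": "internal" if internal else "external",
            })
            # Only internal pages are followed; external links are just logged.
            if internal and target not in seen:
                seen.add(target)
                queue.append(target)
    return rows


def write_csv(rows, out_file):
    """Write link rows to an open text file as CSV with a header row."""
    fieldnames = ["source_page", "target_url", "scope"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

In real use, `fetch` could wrap `urllib.request.urlopen`, and the output file would be opened with `open(path, "w", newline="")` so that the `csv` module controls line endings. Raising `max_pages` is how a sketch like this would go beyond the 100-page minimum.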
I will retain all rights to the script when completed.
This app is not intended for anything illegal or unethical. I specialize in SEO and just want to quickly document how a site does its linking, because doing this manually takes a lot of time.