Validation Graphs
Validation Graphs are a method for visualizing a website's structure and the HTML validation status of its pages.
Based on my PageGraphGUI, I wrote a new tool that spiders a website and validates its HTML pages. The results are visualized as a graph, which is laid out by a real physics particle engine.
Again it uses some free Java libraries and some code snippets from the "website as graph" applet by aharef.info. The HTML content is validated with the publicly available HTML validator from the W3C.
Pages that are delivered with content-type text/html are validated by the W3C validator, and the validation result is shown as a red or green outer circle around the node: red means that the page contains validation errors, green means that it is valid. Yellow circles denote URLs that returned an HTTP error status, e.g. 404: Page not found. This is usually the result of a broken link. The console output of the program lists all invalid pages and all URLs that returned an error status.
Nodes can be clicked and dragged around.
How it works
Beginning with the URL given by the user, a HEAD request is sent for each new URL. The response to this request contains the content-type and the HTTP status of the URL. If the content-type is text/html and the status code does not indicate an error (no 404 etc.), the URL is fetched with a GET request and the received HTML file is parsed. All links found in the HTML file then undergo the same procedure. If the status code indicates an error (e.g. 404) or the content-type is not text/html, the page is not fetched with GET. After all links have been extracted, the current page is passed to the validator thread. By default the link parser skips URLs whose server name differs from the server name in the start URL, to avoid spidering the whole web. The default search depth is 3. These settings can be changed in the ValidationGraph.properties file.
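The procedure above can be sketched in a few lines of Java. This is not the tool's actual source: the class `CrawlSketch`, its `Page` type, and the in-memory `sampleWeb()` that stands in for real HEAD and GET requests are all hypothetical, and so are two assumptions about details the text leaves open, namely that the start page counts as depth 0 and that every status of 400 or above is treated as an error.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CrawlSketch {
    /** Hypothetical stand-in for one URL: what HEAD (status, type) and GET (links) would return. */
    static final class Page {
        final int status;
        final String contentType;
        final List<String> links;
        Page(int status, String contentType, List<String> links) {
            this.status = status;
            this.contentType = contentType;
            this.links = links;
        }
    }

    static final Page NOT_FOUND = new Page(404, "", List.of());

    /** Only error-free text/html responses are fetched with GET (assumption: error means status >= 400). */
    static boolean isFetchable(Page head) {
        return head.status < 400 && head.contentType.startsWith("text/html");
    }

    /** True if both URLs name the same server; real-world links would need more careful parsing. */
    static boolean sameHost(String a, String b) {
        String hostA = URI.create(a).getHost();
        String hostB = URI.create(b).getHost();
        return hostA != null && hostA.equalsIgnoreCase(hostB);
    }

    /** Returns the URLs that would be handed to the validator, in visiting order. */
    static List<String> crawl(Map<String, Page> web, String start, int maxDepth, boolean stayInHost) {
        List<String> toValidate = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>();   // also serves as the "already seen" set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            Page page = web.getOrDefault(url, NOT_FOUND); // simulated HEAD request
            if (!isFetchable(page)) {
                continue;                                 // error status or not HTML: no GET
            }
            int depth = depthOf.get(url);
            if (depth < maxDepth) {
                for (String link : page.links) {          // simulated GET plus link extraction
                    if (depthOf.containsKey(link)) continue;
                    if (stayInHost && !sameHost(start, link)) continue;
                    depthOf.put(link, depth + 1);
                    queue.add(link);
                }
            }
            toValidate.add(url);                          // links extracted, page goes to the validator
        }
        return toValidate;
    }

    /** A tiny made-up website used only for demonstration. */
    static Map<String, Page> sampleWeb() {
        Map<String, Page> web = new HashMap<>();
        web.put("http://a.example/", new Page(200, "text/html", List.of(
                "http://a.example/about", "http://a.example/logo.png",
                "http://a.example/missing", "http://b.example/")));
        web.put("http://a.example/about", new Page(200, "text/html; charset=utf-8",
                List.of("http://a.example/", "http://a.example/deep")));
        web.put("http://a.example/logo.png", new Page(200, "image/png", List.of()));
        web.put("http://a.example/deep", new Page(200, "text/html", List.of()));
        web.put("http://b.example/", new Page(200, "text/html", List.of()));
        return web;
    }

    public static void main(String[] args) {
        System.out.println(crawl(sampleWeb(), "http://a.example/", 3, true));
    }
}
```

With the sample data, the image, the missing page and the foreign host b.example are never fetched with GET, so only the three HTML pages on a.example reach the validator.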
It is not a graph, it is a TREE
Yes, it is a tree, but trees are also graphs, just without cycles. ;-)
Actually it is a rooted DAG (Directed Acyclic Graph) that paints the spanning tree of the website, where the tree's root is the page given by the user and the parent of each node is the first node seen that contains a link to it.
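The "first node seen" rule is what turns the link graph, which usually does contain cycles, into a tree. A minimal sketch, assuming the pages are visited breadth-first (the text does not state the traversal order) and using the hypothetical names `SpanningTreeSketch` and `spanningTree`:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpanningTreeSketch {
    /**
     * Maps every page reachable from root to its parent in the spanning tree.
     * The root itself maps to null.
     */
    static Map<String, String> spanningTree(Map<String, List<String>> links, String root) {
        Map<String, String> parent = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(root, null);
        queue.add(root);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String target : links.getOrDefault(page, List.of())) {
                // Only the first page seen linking to target becomes its parent;
                // later links to the same target (including back links) are ignored.
                if (!parent.containsKey(target)) {
                    parent.put(target, page);
                    queue.add(target);
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        // A made-up site whose pages link to each other in cycles.
        Map<String, List<String>> links = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/b", "/"),
                "/b", List.of("/a", "/c"));
        System.out.println(spanningTree(links, "/"));
    }
}
```

Every page except the root gets exactly one parent, so the result has one edge fewer than it has nodes and cannot contain a cycle, however the pages link to each other.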
Download
I just gzipped my complete Eclipse project directory with all sources etc. It already contains the file validationgraph.jar, which can be run directly with java -jar validationgraph.jar. If you want to run it from anywhere else, do not forget to put the jar files contained in the lib directory on the classpath. You need the Java 1.5 Runtime installed.
All in One Jar
For ease of use, I created a jar file that already contains all required libraries. It can be run anywhere with java -jar validationgraph.jar.
Download jar file.
How To ...
... run it.
When running the program, you can override several default properties that affect the program's behaviour.
These are the defaults:
validationgraph.validationEnabled=true
validationgraph.maxDepth=3
validationgraph.maxValidators=3
validationgraph.stayInHost=true
If you want to modify these, just add a system property to the java call:
java -Dvalidationgraph.maxDepth=5 -jar validationgraph.jar
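For readers unfamiliar with -D options, here is a rough sketch of how a Java program can read such system properties and fall back to the defaults listed above. The class `SettingsSketch` is hypothetical; how the tool itself reads these values, and how it combines them with the ValidationGraph.properties file, is not shown here.

```java
class SettingsSketch {
    final boolean validationEnabled;
    final int maxDepth;
    final int maxValidators;
    final boolean stayInHost;

    /** Reads each setting from the JVM system properties, using the documented default if it is absent. */
    SettingsSketch() {
        validationEnabled = Boolean.parseBoolean(
                System.getProperty("validationgraph.validationEnabled", "true"));
        // Integer.getInteger also falls back to the default if the value is not a number.
        maxDepth = Integer.getInteger("validationgraph.maxDepth", 3);
        maxValidators = Integer.getInteger("validationgraph.maxValidators", 3);
        stayInHost = Boolean.parseBoolean(
                System.getProperty("validationgraph.stayInHost", "true"));
    }

    public static void main(String[] args) {
        SettingsSketch s = new SettingsSketch();
        System.out.println("validationEnabled=" + s.validationEnabled
                + " maxDepth=" + s.maxDepth
                + " maxValidators=" + s.maxValidators
                + " stayInHost=" + s.stayInHost);
    }
}
```

Started with -Dvalidationgraph.maxDepth=5, this sketch would report a depth of 5 and keep the defaults for the other three settings.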