Web-Based Link Checker
The other day a client asked me whether there was an easy way to check a page of his that lists more than one hundred web sites. Clicking each link one by one is very tedious, so I offered to try to come up with some code that could do it for him. While it's fairly trivial to write a Windows app that would do this, I wanted to make it web-based so it would be easy to distribute and reuse within our organization.
Well, as it turns out, there's no "quick and easy" built-in way to parse HTML with server-side ASP.NET code! In a Windows app it's easy to use a WebBrowser control and get an HtmlDocument object, but there's no server-side equivalent. After much searching I ran across the Html Agility Pack (HtmlAgilityPack), which made it super easy to get a list of the A elements in a web page and pull out each one's text and href.
Here's some simple code if you want to try something like this yourself:
linkChecker.aspx
<form id="form1" runat="server">
    <div>
        URL to check: <asp:TextBox ID="TextBoxURL" runat="server" Width="550px" /> <asp:Button ID="ButtonGo" runat="server" Text="Go" />
    </div>
    <div>
        <asp:literal ID="LinkTable" runat="server" />
    </div>
</form>
linkChecker.aspx.vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Runtime.InteropServices
Imports HtmlAgilityPack

Partial Class linkChecker
    Inherits System.Web.UI.Page

    Protected Sub ButtonGo_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonGo.Click
        Dim hw As HtmlWeb = New HtmlWeb
        Dim request As WebRequest = WebRequest.Create(TextBoxURL.Text)
        'use this line if you need to authenticate
        request.Credentials = New NetworkCredential("username", "password", "domain")
        Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        Dim doc As HtmlDocument = New HtmlDocument
        doc.Load(response.GetResponseStream())
        'release the connection once the page has been parsed
        response.Close()
        'select all the Anchor elements
        Dim hrefs As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        Dim links As String = "<table style='width:100%;font-family:verdana;font-size:10pt;'>" & vbCrLf
        'SelectNodes returns Nothing when the page has no matching anchors
        If hrefs IsNot Nothing Then
            For Each href As HtmlNode In hrefs
                Dim linkColor As String = "blue"
                Dim uri As String = href.Attributes("href").Value
                'only test links whose href contains "http"
                If InStr(uri, "http", CompareMethod.Text) > 0 Then
                    links += "<tr><td>"
                    Try
                        Dim testReq As WebRequest = WebRequest.Create(uri)
                        Dim myProxy As New WebProxy()
                        myProxy.Address = New Uri("http://proxy.mydomain.com:8000")
                        'if you go through a proxy for external sites but bypass it for internal sites, put your domain here
                        If (InStr(uri, "mydomain.com/", CompareMethod.Text) > 0) Then
                            testReq.Proxy = Nothing
                        Else
                            testReq.Proxy = myProxy
                        End If
                        Dim testRes As HttpWebResponse = CType(testReq.GetResponse(), HttpWebResponse)
                        links += testRes.StatusDescription
                        'close each response so connections are not exhausted on long link lists
                        testRes.Close()
                    Catch ex As Exception
                        links += "<span style='color:red;'>Err</span>"
                        linkColor = "red"
                    End Try
                    links += "</td><td>" & href.InnerHtml & "<br><a style='color:" & linkColor & ";' href='" & uri & "'>" & uri & "</a></td></tr>" & vbCrLf
                End If
            Next
        End If
        links += "</table>" & vbCrLf
        LinkTable.Text = links
    End Sub
End Class
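With a hundred-plus links, one slow site can stall the whole page, and downloading every body just to read a status is wasteful. Below is a rough sketch of how the per-link test could be pulled out into its own helper that sends a HEAD request with a timeout. The module name LinkProbe, the function CheckLink, and the timeoutMs parameter are all made up for illustration and are not part of the page above. It also assumes the target servers answer HEAD requests, which not all of them do, so treat it as a starting point rather than a drop-in replacement.

```vbnet
Imports System
Imports System.Net

' Hypothetical helper (not part of the original page).
Public Module LinkProbe

    ' Returns a short status string such as "200 OK" for a single URL.
    ' timeoutMs is an assumed parameter: how long to wait before giving up.
    Public Function CheckLink(ByVal url As String, ByVal timeoutMs As Integer) As String
        Try
            Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
            req.Method = "HEAD"      'ask for headers only, no response body
            req.Timeout = timeoutMs  'milliseconds
            'Using closes the response even if something below throws
            Using res As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
                Return CInt(res.StatusCode).ToString() & " " & res.StatusDescription
            End Using
        Catch ex As WebException
            'error statuses (404, 500, ...) arrive as a WebException that carries the response
            Dim errRes As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errRes IsNot Nothing Then
                Dim status As String = CInt(errRes.StatusCode).ToString() & " " & errRes.StatusDescription
                errRes.Close()
                Return status
            End If
            'no response at all: timeout, DNS failure, connection refused, ...
            Return "Err: " & ex.Status.ToString()
        Catch ex As Exception
            'malformed URL or a non-HTTP scheme
            Return "Err"
        End Try
    End Function

End Module
```

Inside the For Each loop you could then write something like links += LinkProbe.CheckLink(uri, 5000) and color the link red whenever the result does not start with "2". The proxy handling from the original loop is left out here to keep the sketch short; it would go right after the request is created.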