Dailycode.info

Short solution for short problems

Read HTML from a URL using VB.Net

 

If you plan to read HTML from a given URL, you may find that the system hangs when the URL cannot be found, especially when it is a local URL on the network. To resolve this, you can use HttpWebRequest with a timeout. I created a simple implementation that checks whether the URL exists and reads the HTML into a text field:

        'Requires "Imports System.Net" at the top of the file.
        If String.IsNullOrEmpty(txtUrl.Text) Then
            'Check for an empty url first: New Uri("") would throw.
            MessageBox.Show("Url is required!")
        Else
            Dim urlCheck As Uri = New Uri(txtUrl.Text)
            Dim request As HttpWebRequest = CType(WebRequest.Create(urlCheck), HttpWebRequest)
            'Timeout is set to 2 seconds.
            request.Timeout = 2000
            Try
                'If the url doesn't respond, an exception will be thrown after 2 seconds.
                'This will prevent your system from hanging.
                Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                    If response.StatusCode = HttpStatusCode.Found OrElse response.StatusCode = HttpStatusCode.OK Then
                        Try
                            'Using disposes the WebClient even if the download fails.
                            Using webClient As New System.Net.WebClient()
                                Dim result As String = webClient.DownloadString(urlCheck)
                                txtResult.Text = result.Replace(vbCrLf, "")
                            End Using
                        Catch ex As Exception
                            MessageBox.Show("Error: " & ex.Message)
                        End Try
                    End If
                End Using
            Catch ex As Exception
                'If the site is not found, an exception will be thrown.
            End Try
        End If

After some testing and debugging I came up with an even more stable solution, since the DownloadString method can give trouble. I decided to use a StreamReader directly on the response, which also saves an extra web request. I implemented it here as an extension method for a string (C# this time):

// Requires: using System.IO; using System.Net;
public static class Extensions
{
    public static string GetHTMLForURL(this String s)
    {
        // A direct cast throws for non-HTTP urls instead of leaving webRequest null.
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(s);
        webRequest.Timeout = 2000;
        // Dispose the response so the connection is released.
        using (HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.Found || response.StatusCode == HttpStatusCode.OK)
            {
                using (StreamReader sr = new StreamReader(response.GetResponseStream()))
                {
                    // Read the whole body at once. Reading fixed-size chunks and
                    // appending the full buffer would add padding characters
                    // whenever a read returns fewer characters than requested.
                    return sr.ReadToEnd();
                }
            }
            else
            {
                return s;
            }
        }
    }
}


Then call the method like this:

txtResult.Text = txtUrl.Text.GetHTMLForURL();
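
Keep in mind that GetResponse throws a WebException when the request times out, the connection is refused, or the server answers with an error status, so the call above can still fail at runtime. The following is a minimal sketch of one way to guard against that, written against the same HttpWebRequest API. The class name SafeFetch, the method TryGetHtml and its parameters are hypothetical names made up for this illustration, not part of the original solution.

```csharp
using System;
using System.IO;
using System.Net;

public static class SafeFetch
{
    // Hypothetical helper: returns true and the page body on success,
    // or false and an error description when the request fails.
    public static bool TryGetHtml(string url, int timeoutMs, out string html, out string error)
    {
        html = null;
        error = null;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
                return true;
            }
        }
        catch (WebException ex)
        {
            // Covers timeouts, refused connections and HTTP error statuses.
            error = ex.Status + ": " + ex.Message;
            return false;
        }
        catch (UriFormatException ex)
        {
            // The text was not a valid absolute url.
            error = ex.Message;
            return false;
        }
    }
}
```

With a helper like this, the form code can call SafeFetch.TryGetHtml(txtUrl.Text, 2000, out html, out error), put html into txtResult.Text on success, and show error in a message box otherwise.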

More information on this can be found here: http://nevmehta.blogspot.com/2006/11/controlsclear-doesnt-dispose-in-win.html

A simple extension example can be found here:  http://dotnetbyexample.blogspot.com/2007/11/string-extension-methods.html