I was developing an application that reads the HTML from a given URL and extracts some text from it. The application was for temperature meters (reporting in degrees Celsius) that post their readings on a simple HTML page. Since the structure of the page is configurable, we had to look for a dynamic solution. So we fetch the HTML and look for the string that sits between two given tags.
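The "string between two given tags" lookup can be sketched roughly as follows. This is only an illustration, not the code from the application: the helper name `GetTextBetween`, the sample page, and the tag strings are all hypothetical, and it simply takes the first match it finds.

```vbnet
Imports System

Module TagExtractDemo
    ' Hypothetical helper: returns the text between the first occurrence of
    ' startTag and the next endTag after it, or Nothing if either is missing.
    Function GetTextBetween(html As String, startTag As String, endTag As String) As String
        Dim startIndex As Integer = html.IndexOf(startTag, StringComparison.OrdinalIgnoreCase)
        If startIndex < 0 Then Return Nothing
        ' Move past the opening tag so only the content is returned.
        startIndex += startTag.Length
        Dim endIndex As Integer = html.IndexOf(endTag, startIndex, StringComparison.OrdinalIgnoreCase)
        If endIndex < 0 Then Return Nothing
        Return html.Substring(startIndex, endIndex - startIndex).Trim()
    End Function

    Sub Main()
        ' Made-up sample page; a real meter page will look different.
        Dim page As String = "<html><body><b id=""temp"">21.5 °C</b></body></html>"
        Console.WriteLine(GetTextBetween(page, "<b id=""temp"">", "</b>"))
    End Sub
End Module
```

Because the start and end tags are plain strings, they can come from configuration, which is what makes the approach usable for pages with different layouts.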
All went well until I ran into trouble with the WebClient.DownloadString method. It worked the first time, but hung the second time. So I decided to use a StreamReader on the WebResponse instead.
This works well: you can close the connections and read as much as you like. But then I noticed that the degree sign (°) was converted into some weird symbol.
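The reason is that if you do not pass a specific encoding to the StreamReader, it uses UTF-8. The page was evidently not sent as UTF-8, so the byte for the degree sign could not be decoded and was replaced. The solution is simple, but it took me a while to find out which encoding did the trick. In the end it was Encoding.Default that worked for me! (On the .NET Framework, Encoding.Default is the system's ANSI code page. On .NET Core and later it is UTF-8, so there you would have to name the page's encoding explicitly.)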
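To see the effect in isolation, here is a small sketch. It assumes the meter page sends the degree sign as the single byte &HB0, as ISO-8859-1 and Windows-1252 do; the sample bytes are made up, and I use ISO-8859-1 here rather than Encoding.Default so that the result does not depend on the machine's code page.

```vbnet
Imports System
Imports System.Text

Module DegreeSignDemo
    Sub Main()
        ' Assumption: the page encodes "21°C" with one byte per character,
        ' so the degree sign arrives as the single byte &HB0.
        Dim raw As Byte() = New Byte() {&H32, &H31, &HB0, &H43}

        Dim asUtf8 As String = Encoding.UTF8.GetString(raw)
        Dim asLatin1 As String = Encoding.GetEncoding("iso-8859-1").GetString(raw)

        ' A lone &HB0 is not a valid UTF-8 sequence, so the UTF-8 decoder
        ' substitutes the replacement character U+FFFD.
        Console.WriteLine(AscW(asUtf8(2)).ToString("X4"))   ' FFFD
        ' The single-byte decoder maps &HB0 straight to U+00B0, the degree sign.
        Console.WriteLine(AscW(asLatin1(2)).ToString("X4")) ' 00B0
    End Sub
End Module
```

The code points are printed as hex instead of the characters themselves, so the output does not depend on the console's own encoding.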
Here is the code example:
Imports System.IO
Imports System.Net
Imports System.Text

Private Function GetHTML() As String
    Try
        Dim urlCheck As Uri = New Uri(txtUrl.Text)
        Dim request As HttpWebRequest = CType(WebRequest.Create(urlCheck), HttpWebRequest)
        'Timeout is set to 3 seconds.
        request.Timeout = 3000
        Dim sb As StringBuilder = New StringBuilder
        Dim response As HttpWebResponse
        'If the url doesn't respond, the exception will be thrown after 3 seconds.
        'This will prevent your system from hanging.
        response = CType(request.GetResponse(), HttpWebResponse)
        If response.StatusCode = HttpStatusCode.Found OrElse response.StatusCode = HttpStatusCode.OK Then
            Dim receiveStream As Stream = response.GetResponseStream()
            Dim encode As Encoding = System.Text.Encoding.Default
            ' Pipes the response stream to a higher level stream reader with the required encoding format.
            Dim readStream As New StreamReader(receiveStream, encode)
            Dim read(256) As [Char]
            ' Reads up to 256 characters at a time.
            Dim count As Integer = readStream.Read(read, 0, 256)
            While count > 0
                ' Copies the characters just read to a string and appends it to the result.
                Dim str As New [String](read, 0, count)
                sb.Append(str)
                count = readStream.Read(read, 0, 256)
            End While
            ' Releases the resources of the Stream.
            readStream.Close()
        End If
        ' Releases the resources of the response.
        response.Close()
        Return sb.ToString()
    Catch ex As Exception
        'If the site is not found, then an exception will be thrown.
        Return "No response"
    End Try
End Function