Capture browser content from a C# Windows application

a1jit · Mar 9, 2006

Hi guys,

How do I capture the content of a page in a browser, given a specific URL?

Let's say I navigate to google.com. How do I capture the whole content into a variable?

For various reasons I'm generating an XML file on the server side, so I want
my users to be able to access that data from the Windows application.

I'd appreciate any guidelines or references to get me started with this.

Thanks

Cags · Mar 9, 2006

Do you mean something like this? It has nothing to do with a browser, but it should capture the contents. If you wanted to grab the URL from a browser, that should be possible too.

C#:

string html;
System.Net.WebClient wc;
System.IO.Stream myStream;
System.IO.StreamReader myReader;

wc = new System.Net.WebClient();
myStream = wc.OpenRead(@"http://www.google.com");
html = myReader.ReadToEnd();

myReader.Close();
wc.Dispose();

By the way, this post shouldn't really be in the C# syntax section, as it isn't syntax-specific. For future reference, you should only post in this section if you have a syntax issue, not just because you're using C#.

a1jit · Mar 9, 2006

Oh OK, sorry for posting in the wrong section.

Yeah, that's the code I was looking for, thanks.

But I got a small error:
"Use of unassigned local variable 'myReader'"

I'm not sure what I should assign it to. Any idea?

Thanks a lot

dynamic_sysop · Mar 10, 2006

You needed to create myReader with new, wrapping the stream you opened: myReader = new System.IO.StreamReader(myStream);
Here's a way that uses fewer lines of code and fewer classes:

C#:

System.Net.WebClient wClient = new System.Net.WebClient();
byte[] buffer = wClient.DownloadData("http://www.google.co.uk");
string html = System.Text.Encoding.Default.GetString(buffer, 0, buffer.Length);
Console.WriteLine(html);
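
For reference, here is a minimal sketch of the StreamReader approach from earlier in the thread with the missing assignment filled in. The class and method names (FetchPage, Fetch) are made up for illustration, and the using blocks are my own addition so that the client, stream, and reader are closed even if the download throws.

```csharp
using System;
using System.IO;
using System.Net;

public static class FetchPage
{
    // Downloads whatever the URL returns (the raw HTML source) as a string.
    public static string Fetch(string url)
    {
        using (WebClient wc = new WebClient())
        using (Stream stream = wc.OpenRead(url))
        using (StreamReader reader = new StreamReader(stream))
        {
            // The reader wraps the stream; this assignment is what the
            // "unassigned local variable" error was complaining about.
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string html = Fetch("http://www.google.com");
        Console.WriteLine(html);
    }
}
```

Note that StreamReader assumes UTF-8 unless told otherwise, so a page served in another encoding may need the encoding passed to the StreamReader constructor.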

a1jit · Mar 10, 2006

I see, thanks a lot for all the help. Is there any way to convert your code to read
the content of the web page rather than the source code itself? Thanks.

PlausiblyDamp · Mar 10, 2006

Not sure what you mean by "read the content of the webpage", as that is exactly what dynamic_sysop's code does.

a1jit · Mar 10, 2006

No, I mean the code above reads the source, so it includes the HTML tags as well as the data in the web page. What I plan to do is read just the body content. Take this page, for example: I only want to read what I see here, not the code that builds the page, just the values (data) shown on it. Hope I didn't confuse you.

PlausiblyDamp · Mar 10, 2006

That's what HTML is... A mixture of tags and content, there is no real concept of 'values' when dealing with HTML.

If you want to get to the text content then you will need to parse the tags to get at the content. If the HTML has a lot of content then you are probably going to have to invest some time in learning Regular Expressions as a way to parse out the tags etc.
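
As a rough sketch of the regular-expression approach, something like the following would reduce downloaded HTML to its visible text. The names HtmlText and StripTags are made up for illustration, and this is a crude filter rather than a real HTML parser: it will trip over things like a ">" inside an attribute value. It also assumes System.Net.WebUtility is available (.NET 4.0 or later); on older frameworks System.Web.HttpUtility.HtmlDecode does the same job.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Crude tag stripper: adequate for simple pages, not a full HTML parser.
    public static string StripTags(string html)
    {
        // Drop script and style blocks entirely, including their contents.
        string noScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space.
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");

        // Decode entities such as &amp; and then collapse runs of whitespace.
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string html = "<html><body><h1>Hello</h1><p>Tom &amp; Jerry</p></body></html>";
        Console.WriteLine(StripTags(html)); // prints "Hello Tom & Jerry"
    }
}
```

If the server is already producing XML, as mentioned in the first post, loading the downloaded string into an XML reader is likely to be more reliable than stripping tags with patterns like these.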

a1jit · Mar 10, 2006

Thanks for the support guys, I really appreciate it.
