Parse HTML String


Mar 10, 2008
I am using VB.NET 2008 with .NET Framework 2.0.
Right now I have a string taken from an HTML page that I have to parse.
But I am running into serious problems and really need expert help :(
The string is an IMG tag: it gives the location and other properties of an image in an HTML page.
It can be in any of these formats:
<P><IMG alt="" hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>

<P><IMG alt=sometext hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>

<P><IMG alt="some text" hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>

<P><IMG alt="some text" hspace=2 src="E:\untitled.bmp" align=baseline vspace=2 border=1></P>

And so on...
I need to parse the following attributes:

1. Image source : src="%path%" - the path is always enclosed in double quotes.
2. Image alt : alt - this attribute has 3 states:
a. there is no alt text, it will be alt=""
b. the alt text contains no spaces, it will be alt=sometext
c. the alt text contains spaces, it will be alt="some text"
3. Alignment : align=alignment - the value is never enclosed in double quotes.
4. hspace and vspace - it will be like hspace=2 or hspace=22 or hspace=222 - at most 3 digits.
It's so hard to parse it and I really don't know what to do!
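
For what it's worth, here is a minimal sketch of one possible approach: find the IMG tag, then read every name=value pair with a single regular expression that accepts either a double-quoted value or a bare value. It is written in Python only to keep it short. The pattern uses just character classes, alternation, and plain groups, which .NET's Regex class also supports, but a VB.NET port is untested here. The names IMG_RE, ATTR_RE, and parse_img are made up for illustration. It assumes one IMG tag per string and no ">" inside a quoted value.

```python
import re

# The whole <IMG ...> tag; group 1 is everything between "<IMG" and ">".
# Assumption: no quoted attribute value contains a ">" character.
IMG_RE = re.compile(r'<img\b([^>]*)>', re.IGNORECASE)

# One attribute: name=value, where the value is either "double quoted"
# (group 2, may be empty or contain spaces) or a bare token (group 3).
ATTR_RE = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s>]+))')


def parse_img(html):
    """Return the attributes of the first IMG tag as a dict, or None."""
    match = IMG_RE.search(html)
    if match is None:
        return None
    attrs = {}
    for name, quoted, bare in ATTR_RE.findall(match.group(1)):
        # Exactly one of quoted/bare took part in the match; the other is "".
        # For alt="" both are empty, so the value is the empty string.
        attrs[name.lower()] = quoted or bare
    return attrs


if __name__ == "__main__":
    sample = (r'<P><IMG alt="some text" hspace=2 src="E:\untitled.bmp" '
              r'align=baseline vspace=2 border=1></P>')
    attrs = parse_img(sample)
    print(attrs["src"])    # image path without the quotes
    print(attrs["alt"])    # alt text, empty string when alt=""
    print(attrs["align"])  # alignment value
    # hspace/vspace may be missing; treating a missing one as 0 is an assumption.
    print(int(attrs.get("hspace", "0")), int(attrs.get("vspace", "0")))
```

Reading all attributes into a dictionary first means the order of the attributes in the tag does not matter, and the three alt cases need no special handling.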