SIMIN Posted August 9, 2008

Hello,

I am using VB.NET 2008 with .NET Framework 2.0. I have a string taken from an HTML page that I need to parse, but I am running into serious problems and really need expert help :(

The string is an IMG tag; it gives the location and other properties of an image in an HTML page. It can be in any of these formats:

<P><IMG alt="" hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>
<P><IMG alt=sometext hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>
<P><IMG alt="some text" hspace=0 src="E:\untitled.bmp" align=baseline border=0></P>
<P><IMG alt="some text" hspace=2 src="E:\untitled.bmp" align=baseline vspace=2 border=1></P>

And so on...

I need to parse the following parameters:

1. Image source: src="%path%" - the path is always enclosed in double quotes.
2. Image alt: alt - this attribute has 3 states:
   a. there is no text: alt=""
   b. there is text without spaces: alt=sometext
   c. there is text with spaces: alt="some text"
3. Alignment: align=alignment - the value is not enclosed in double quotes.
4. hspace and vspace - the value looks like hspace=2, hspace=22 or hspace=222, with at most 3 digits.

It's so hard to parse and I really don't know what to do!
OMID SOFT Posted August 9, 2008

Regular expressions will do the trick for you, but writing them is a time-consuming task. If a ready-made component is acceptable to you, here is an open source COM component that extracts the different parts of HTML documents:

http://www.miken.com/htmlzap/index.htm
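
For the regular-expression route mentioned above, here is a minimal sketch rather than a finished parser. It assumes the tag always looks like the four samples in the question: attribute values are either double-quoted or a single unquoted token, and no quoted value contains a ">" character or text that looks like another attribute (such as alt=). The module name ImgTagDemo and the helper GetAttr are made up for this illustration; only the System.Text.RegularExpressions classes come from the .NET Framework.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ImgTagDemo

    ' Hypothetical helper: returns the value of one attribute,
    ' or Nothing when the attribute is not present in the tag.
    Function GetAttr(ByVal tag As String, ByVal name As String) As String
        ' Group "q": a double-quoted value (may be empty or contain spaces).
        ' Group "u": an unquoted value that runs up to whitespace, a quote or ">".
        Dim pattern As String = "\b" & Regex.Escape(name) & _
            "\s*=\s*(?:""(?<q>[^""]*)""|(?<u>[^\s"">]+))"
        Dim m As Match = Regex.Match(tag, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        If m.Groups("q").Success Then Return m.Groups("q").Value
        Return m.Groups("u").Value
    End Function

    Sub Main()
        ' Fourth sample from the question.
        Dim html As String = "<P><IMG alt=""some text"" hspace=2 " & _
            "src=""E:\untitled.bmp"" align=baseline vspace=2 border=1></P>"

        ' Isolate the IMG tag first so attributes of other tags are ignored.
        ' Assumption: no attribute value contains ">".
        Dim img As Match = Regex.Match(html, "<img\b[^>]*>", RegexOptions.IgnoreCase)
        If img.Success Then
            Console.WriteLine("src    = " & GetAttr(img.Value, "src"))
            Console.WriteLine("alt    = " & GetAttr(img.Value, "alt"))
            Console.WriteLine("align  = " & GetAttr(img.Value, "align"))
            Console.WriteLine("hspace = " & GetAttr(img.Value, "hspace"))
            Console.WriteLine("vspace = " & GetAttr(img.Value, "vspace"))
        End If
    End Sub

End Module
```

Looking up each attribute separately means the order of the attributes in the tag does not matter, and a missing attribute (vspace in the first three samples) simply comes back as Nothing, while alt="" comes back as an empty string. If the 3-digit limit on hspace and vspace needs to be enforced, the returned value could be checked afterwards with a pattern such as ^\d{1,3}$. For HTML that does not follow the sample layout, a real HTML parser or a component like the one linked above is likely the safer choice.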