NiallWaller
Posted April 11, 2003

I need a regular expression to strip out the text inside an anchor tag (<a> here </a>). It sounds simple, but the simple answer won't allow for nested tags inside the <a> tag. Something like this:

<a [^>]*>(.*)</a>

where the .* obviously stands for what I'm looking for. But it needs to allow for nested tags like:

<a href=blah> <b>text1</b> </a>

picking up "<b>text1</b>", and it needs to find adjacent <a> tags separately:

<a href=blah>text1</a>text2<a href=blah>text3</a>

picking up "text1" and "text3" separately.

This is all being used in .NET, so lookahead assertions are a possibility if anyone knows a solution using them.

Any help appreciated.

Thanks,
Niall
philprice
Posted April 12, 2003

I would do this: first make the regex object global, multiline and ignore case, then use an expression like this:

<a.*?>(.+?)</a>

That will pick out the blocks. The ? on the .+ means "non-greedy", so it won't try to be clever and match up to the last instance of a </a>. I'm not sure how the regex object in .NET works, but it should pick up content that isn't all on one line; if not, check for options you might want to use.

To be honest it's really not hard; you've made it sound worse than it is. Oh, and don't use * too frequently: it will match nothing, which is usually not what you want. Use + instead.

Phil Price
Visual Studio .NET 2003 Enterprise Edition
Microsoft Student Partner 2004
Microsoft Redmond, EMEA Intern 2004
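As a rough illustration of the non-greedy approach described above, here is a minimal sketch run against the two samples from the question. It uses Python's re module rather than the .NET Regex class, purely to have something runnable; the pattern syntax is the same for this case. Two details are my own assumptions and not part of the post: the opening tag is written as <a\b[^>]*> so that it cannot match tags such as <abbr>, and the "dot matches newline" option is switched on (re.DOTALL here; in .NET the counterpart would be RegexOptions.Singleline, since the multiline option only changes how ^ and $ behave). The names ANCHOR_RE and anchor_contents are made up for the example.

```python
import re

# Non-greedy capture between <a ...> and the nearest </a>.
# Assumptions not in the original post: \b stops "<a" from matching "<abbr>",
# and DOTALL lets "." cross line breaks inside the anchor.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.+?)</a>", re.IGNORECASE | re.DOTALL)


def anchor_contents(html):
    """Return the inner content of every <a>...</a> block, in order."""
    return ANCHOR_RE.findall(html)


nested = "<a href=blah> <b>text1</b> </a>"
adjacent = "<a href=blah>text1</a>text2<a href=blah>text3</a>"

print(anchor_contents(nested))    # inner markup, surrounding spaces included
print(anchor_contents(adjacent))  # one entry per anchor
```

Note that the capture keeps any whitespace around the nested markup, so the first sample yields " <b>text1</b> " rather than a trimmed string. Like any regex approach to HTML, this is only a sketch: it does not cope with anchors nested inside anchors or with a > inside an attribute value.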
*Experts* Bucky
Posted April 12, 2003

I think what he's looking for is a way to strip out HTML tags from a string. There is an article here about how to accomplish the task. Granted, the article is for classic ASP, but the RegExp class works similarly in the .NET Framework.

"Being grown up isn't half as fun as growing up
These are the best days of our lives"
-The Ataris, In This Diary
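The linked article is not reproduced in this thread, so the following is not its method. It is just a common minimal sketch of regex-based tag stripping, written in Python for illustration, with TAG_RE and strip_tags as hypothetical names. It assumes simple markup where no attribute value contains a > character and there are no comments or script blocks to worry about.

```python
import re

# Matches anything that looks like a tag: "<", one or more non-">" chars, ">".
# Assumption: attribute values never contain ">" themselves.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html):
    """Remove every tag from the string and keep the text between them."""
    return TAG_RE.sub("", html)


print(strip_tags("<a href=blah> <b>text1</b> </a>"))
print(strip_tags("<a href=blah>text1</a>text2<a href=blah>text3</a>"))
```

This differs from what the original question asks for: it removes the markup and keeps all the text, including the "text2" between the two anchors, instead of capturing each anchor's contents separately.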