aewarnick Posted May 1, 2003

How to get a regex match without characters before and after counted

I am making a method that will get exact matches from a string. When you double-click a word in VS to highlight it, the selection does not include the dot before or after it. Likewise, this method searches for matches that are delimited by various symbols or whitespace.

    Regex.Matches(text, @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]"+find+@"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");

When a match is made, it includes the special characters in the Index and Length properties. My question is: how do I search for those characters for matches but not have them counted as part of the match value?

Also, I would really like to use the C# code tags instead of the plain code tags to put my C# code in. How do I do that?

C#
*Experts* Nerseus Posted May 1, 2003

I think you're going to want to use Groups with the regular expression. The problem is that a regular expression by itself just tells you whether or not there's a match, and optionally returns a list of matches. Since you really want to find a match based on delimiters, you need a way to specify which parts of the match you want.

Here's a new expression that includes a group named MyMatch. It's nearly the same expression as yours, but instead of just inserting your "find" string in the middle, I've put it inside the named group, MyMatch. The foreach loop then loops through the matches and pulls out the Group by name.

    MatchCollection mc = Regex.Matches(text, @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]*(?<MyMatch>" + find + @")[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");
    foreach(Match m in mc)
        Debug.WriteLine(m.Groups["MyMatch"].Value);

-Nerseus

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut
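To make the Index/Length point concrete, here is a minimal sketch that reads the position of the named group rather than of the whole match. The names GroupSpanDemo and FindSpans are made up for illustration, a delimiter is required on both sides as in the original question, and the Regex.Escape call is an added precaution that nobody in the thread uses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupSpanDemo
{
    // Delimiter class copied from the thread; the constant name is hypothetical.
    private const string Delims = @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]";

    // Returns the index, length and value of the named group only,
    // so the surrounding delimiter characters are not counted.
    public static List<(int Index, int Length, string Value)> FindSpans(string text, string find)
    {
        // Regex.Escape (an addition) makes metacharacters in `find` literal.
        string pattern = Delims + "(?<MyMatch>" + Regex.Escape(find) + ")" + Delims;
        var spans = new List<(int Index, int Length, string Value)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            Group g = m.Groups["MyMatch"];
            spans.Add((g.Index, g.Length, g.Value));
        }
        return spans;
    }

    public static void Main()
    {
        // "xjoex" is skipped because x is not in the delimiter class.
        foreach (var s in FindSpans("a.joe.b (joe) xjoex", "joe"))
            Console.WriteLine($"{s.Index} {s.Length} {s.Value}");
        // prints "2 3 joe" then "9 3 joe"
    }
}
```

Because the delimiters are consumed as part of each match, two hits that share a single delimiter (as in "a.joe.joe.b") cannot both be found with this pattern.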
aewarnick Posted May 1, 2003

That is good, but I don't really understand how it works. Do you feel like explaining why these things are done:

1. Why is the * quantifier placed before the grouping?
2. The question mark?
3. Why is the grouping's closing parenthesis placed inside the start of the 2nd set of special characters?

And does anyone know of a good e-book or web page on advanced regular expressions? My book covers very basic stuff.

Also, please, someone tell me how to put code inside C# code tags and not just the regular code tags for this forum.
Leaders John Posted May 1, 2003

    ...Also, please, someone tell me how to put code inside C# code and not just regular code for this forum.

Is this what you mean: http://www.xtremedotnettalk.com/misc.php?s=&action=bbcode ?

"These Patriot playoff wins are like Ray Charles songs, Nantucket sunsets, and hot fudge sundaes. Each one is better than the last." - Dan Shaughnessy
aewarnick Posted May 1, 2003

Thank you very much. Now I have a good reason to use code blocks like that!
*Experts* Nerseus Posted May 2, 2003

1. Why is the * quantifier placed before the grouping?

The * indicates that the previous character (or character set) should repeat 0 or more times. If you have the expression a[c]*d, it would match (0 or more of the character 'c'):

    ad
    acd
    accd
    acccd

You could use the expression a[cd]*e to match 0 or more of either the character 'c' or 'd'. The following would match:

    ae
    ace
    ade
    acce
    adde
    acde
    accddccdce

The + is used to indicate 1 or more. So if the expression were "a[c]+" then "ad" would not match, but "ac", "acc", and "accc" would match.

2. The question mark?

The question mark, where I put it, is part of the grouping syntax. You make a named group like this:

    (?<MatchName>expression to match)

3. Why is the grouping's closing parenthesis placed inside the start of the 2nd set of special characters?

Look at 2. above. The ending paren is just part of the group syntax and must come after the regular expression that represents what the group should be. Essentially you're trying to find a word (just a set of characters in a particular order) and assign that word to a group name. So if you're looking for the word "Item", the group should look like:

    (?<MatchName>Item)

If you wanted to make it more generic, you could use any regular expression syntax in place of "Item", such as [\w]+, which would find one or more letters, digits, or underscores. Here's the new group:

    (?<MatchName>[\w]+)

I find MS's help on regular expressions VERY thorough, though you may have to jump through a number of links in the help to find a good sample of what you need. I'd use the help's search option for regular expression examples. The normal F1-linked help from Visual Studio will show mostly the Regex object help, and I can never remember which links to click to get to the "good stuff".

-Nerseus
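The quantifier examples in this reply can be checked with a short sketch like the one below. QuantifierDemo, FullMatch and FirstWord are hypothetical names, and the ^ and $ anchors are an addition so that the whole input has to fit the pattern rather than just a substring of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Wraps the pattern in ^(?:...)$ so the entire input must match.
    public static bool FullMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    // Uses the named group from the post to pull out the first run of word characters.
    public static string FirstWord(string input)
    {
        Match m = Regex.Match(input, @"(?<MatchName>[\w]+)");
        return m.Success ? m.Groups["MatchName"].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(FullMatch("ad", "a[c]*d"));          // True: zero c's is allowed
        Console.WriteLine(FullMatch("acccd", "a[c]*d"));       // True
        Console.WriteLine(FullMatch("accddccdce", "a[cd]*e")); // True
        Console.WriteLine(FullMatch("ad", "a[c]+"));           // False: + needs at least one c
        Console.WriteLine(FullMatch("acc", "a[c]+"));          // True
        Console.WriteLine(FirstWord("...Item 42"));            // Item
    }
}
```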
aewarnick Posted May 2, 2003

That was very helpful, Nerseus. Thank you.
aewarnick Posted May 2, 2003

I changed the code so that there is much less writing:

    MatchCollection mc = Regex.Matches(text, @"[\D & \W]" + "(?<F>" + find + ")" + @"[\D & \W]");

This is what I send to it:

    a.Regexes.FindExact("gi2joe jgljkasjoejlasdfu*+-/joe....joe(&& kjlaksdgk)()*&%joe--++++halgu", "joe");

The joe in here: jgljkasjoejlasdfu should not be matched, but it is! I think that the & is not working in the expression. Did I do that right? I also tried many other things to get it to work, even using &&, but nothing works.
aewarnick Posted May 2, 2003

I did some testing and found out that a digit is also a word character. The method works now. Did I use the correct syntax to say "if char is not a digit and not a word character" with this: \D & \W?
*Experts* Nerseus Posted May 3, 2003

I don't think so. \W by itself says match anything *except* digits, lower/upper case letters, and the underscore. By using \W you don't need to also use \D; it's redundant. I have no idea what you're doing with the & character... if you're trying to do "this AND this", the ampersand won't do that; it's more of a C# syntax. Inside square brackets it is just a literal & (and the spaces are literal spaces).

I think all you want/need is:

    @"[\W]*(?<F>" + find + ")[\W]*"

This says match 0 or more non-letters and non-digits right next to your "find" letters, then 0 or more non-letters and non-digits on the right-hand side. So the intent is:

    ".joe." will match
    "#$%(*(joe@$^" will match
    "joe" will match
    ".joe" will match
    "joe." will match
    "joejoe" should not match
    "ajoe" should not match
    "joea" should not match

One caveat: because * also accepts zero delimiters and the pattern is not anchored, the search can still find "joe" inside "joejoe", "ajoe", or "joea". Using + instead of * requires at least one delimiter, but then a bare "joe" no longer matches.

This is all a guess - I'm not testing any of this :)

-Ner
aewarnick Posted May 3, 2003

I found that when I used the *, "2joe" matched when it should not have. But maybe I am mistaken. My question is, how can I say "if char is not whitespace AND not a word character"?
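The "2joe" observation can be reproduced with a small sketch. It also shows one possible alternative, lookarounds, which is a different technique from the group-based patterns discussed in the thread: (?<!\w) and (?!\w) check the neighbouring characters without consuming them, so Match.Index already points at the word. BoundaryDemo, CountWithStar and FindExact are hypothetical names, and Regex.Escape is an added precaution.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // The starred pattern from the thread: zero delimiters are allowed on each side,
    // so it also finds the word when it is glued to letters or digits.
    public static int CountWithStar(string text, string find)
    {
        string pattern = @"[\W]*(?<F>" + Regex.Escape(find) + @")[\W]*";
        return Regex.Matches(text, pattern).Count;
    }

    // Lookaround variant (an assumption, not from the thread): succeeds only when
    // no word character sits directly before or after the word.
    public static List<int> FindExact(string text, string find)
    {
        string pattern = @"(?<!\w)" + Regex.Escape(find) + @"(?!\w)";
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
            indexes.Add(m.Index);
        return indexes;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithStar("2joe", "joe"));   // prints 1
        Console.WriteLine(FindExact("2joe", "joe").Count); // prints 0

        string sample = "gi2joe jgljkasjoejlasdfu*+-/joe....joe(&& kjlaksdgk)()*&%joe--++++halgu";
        Console.WriteLine(string.Join(", ", FindExact(sample, "joe"))); // prints 28, 35, 57
    }
}
```

The lookaround version also accepts the word at the very start or end of the string, since "no word character here" is true when there is no character at all. Note that \w includes the underscore, so "_joe" would not count as an exact match.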
*Experts* Nerseus Posted May 5, 2003

You could use "[^\s\w]" for non-whitespace and non-word (alpha+numeric). The ^ negates the whole set, so a character has to be neither whitespace nor a word character. If you want 0 or more to match, use "[^\s\w]*". If you want to match at least one but maybe many more, use "[^\s\w]+". If you want to match 1 to 10, use "[^\s\w]{1,10}".

For the record, "*" is short for "{0,}" and "+" is short for "{1,}". If you want a specific number of characters, you can put the count in curly braces (or use a range like 2,5).

-Ner
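A character class is a union of its members, so it is worth checking what a given class actually accepts. The sketch below compares [\S\W] (non-whitespace OR non-word, which turns out to be every character) with the negated class [^\s\w] (neither whitespace nor word). ClassDemo and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassDemo
{
    // Union: true if the character is non-whitespace OR non-word.
    public static bool InUnionClass(char c)
    {
        return Regex.IsMatch(c.ToString(), @"\A[\S\W]\z");
    }

    // Negated set: true only if the character is neither whitespace nor a word character.
    public static bool IsNeitherSpaceNorWord(char c)
    {
        return Regex.IsMatch(c.ToString(), @"\A[^\s\w]\z");
    }

    public static void Main()
    {
        foreach (char c in new[] { 'a', '2', '_', ' ', '.', '*' })
        {
            Console.WriteLine("'" + c + "' union=" + InUnionClass(c)
                + " neither=" + IsNeitherSpaceNorWord(c));
        }
        // union is True for all six characters;
        // neither is True only for '.' and '*'
    }
}
```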
aewarnick Posted May 5, 2003

Thanks Nerseus. One thing that does not make sense to me is how you can match 0 matches.
*Experts* Nerseus Posted May 5, 2003

If it's not there, that's 0 matches. For example, say you might have this:

    ...Dan.Jones
    or
    .Dan.Jones
    or
    Dan.Jones

To get the first name, you could use:

    "^[^\s\w]*[\w]+[^\s\w]+..."

So, find 0 or more non-whitespace, non-alpha, non-digit chars, followed by one or more word characters. You want 0 or more because you don't know if the string starts with a non-word or a word character (and it might be more than one non-word character).

-ner
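The three inputs from this reply make a handy test of "zero or more". The following is a sketch only: it keeps just the leading part of the pattern (the original trails off with "..."), writes the prefix class as [^\s\w], and adds a named group called First, which is an invented name, as are FirstNameDemo and FirstName.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstNameDemo
{
    // Skips zero or more leading punctuation characters, then captures the first word.
    public static string FirstName(string input)
    {
        Match m = Regex.Match(input, @"^[^\s\w]*(?<First>[\w]+)");
        return m.Success ? m.Groups["First"].Value : "";
    }

    public static void Main()
    {
        // The prefix matches three dots, one dot, and nothing at all, respectively.
        Console.WriteLine(FirstName("...Dan.Jones")); // Dan
        Console.WriteLine(FirstName(".Dan.Jones"));   // Dan
        Console.WriteLine(FirstName("Dan.Jones"));    // Dan
    }
}
```

With + in place of * on the prefix, the third input would fail, because at least one leading punctuation character would then be required.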