How to get a regex match without characters before and after counted (C#)

aewarnick · May 1, 2003

I am making a method that will get exact matches from a string. When you double-click a word in VS to highlight it, the selection does not include the dot before or after it. Likewise, this method searches for matches that are delimited by symbols or whitespace.

Code:

Regex.Matches(text, @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]"+find+@"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");

When a match is made, it includes the special characters in the Index and Length properties. My question is, how do I search for those characters for matches but not have them counted as part of the match value?

Also, I would really like to use the forum's C# block instead of the plain code block for my C# code. How do I do that?

Nerseus · May 1, 2003

I think you're going to want to use Groups with the regular expression. The problem is that a Regular expression by itself just tells you whether or not there's a match, and optionally returns a list of matches. Since you really want to find a match based on delimiters, you need a way to specify which parts of the match you want.

Here's a new expression that includes a group name of MyMatch. It's the same expression as yours, but instead of just inserting your "find" string in the middle, I've put it inside of the named group, MyMatch. The foreach loop then loops through the matches and pulls out the Group by name.

Code:

MatchCollection mc = Regex.Matches(text, 
    @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]*(?<MyMatch>" + 
    find + 
    @")[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");
foreach(Match m in mc)
	Debug.WriteLine(m.Groups["MyMatch"].Value);

-Nerseus
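
For the Index and Length part of the question, here is a minimal sketch of reading them from the named group instead of from the whole match. GroupDemo, FindExact, and the sample text are made-up names for illustration, and Regex.Escape is added on the assumption that the search term should be taken literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Same delimiter set as in the post above, written with only the escapes a class needs.
    const string Delims = @"[\s.<>?\\!*()/\-+=]";

    // Returns the MyMatch group of every hit; its Index and Length exclude the delimiters.
    public static List<Group> FindExact(string text, string find)
    {
        string pattern = Delims + "*(?<MyMatch>" + Regex.Escape(find) + ")" + Delims;
        var hits = new List<Group>();
        foreach (Match m in Regex.Matches(text, pattern))
            hits.Add(m.Groups["MyMatch"]);
        return hits;
    }

    static void Main()
    {
        // Prints "2, 3, joe" and then "9, 3, joe".
        foreach (Group g in FindExact("a.joe.b (joe) ", "joe"))
            Console.WriteLine(g.Index + ", " + g.Length + ", " + g.Value);
    }
}
```

The whole match (m.Index, m.Length) still covers the delimiters; only the group is limited to the word itself.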

aewarnick · May 1, 2003

That is good but I don't really understand how it works. Do you feel like explaining why these things are done:

1. Why is the * quantifier placed before the grouping?
2. The question mark?
3. Why is the grouping end parenthesis placed inside the start of the 2nd set of special characters?

And does anyone know of a good e-book or web page on advanced regular expressions? My book covers only very basic stuff.

Also, please, someone tell me how to put code inside C# code and not just regular code for this forum.

John · May 1, 2003

aewarnick said:
...Also, please, someone tell me how to put code inside C# code and not just regular code for this forum.

Is this what you mean: http://www.xtremedotnettalk.com/misc.php?s=&action=bbcode ?

aewarnick · May 1, 2003

Thank you very much. Now I have a good reason to use code blocks like that!

Nerseus · May 2, 2003

1. Why is the * quantifier placed before the grouping?
The * indicates that the preceding element (a single character, a [...] class, or a group) may repeat 0 or more times. If you have the expression:
a[c]*d
it would match (0 or more of the character 'c'):
ad
acd
accd
acccd

You could use the expression:
a[cd]*e to match 0 or more of either the character 'c' or 'd'. The following would match:
ae
ace
ade
acce
adde
acde
accddccdce

The + is used to indicate 1 or more. So if the expression were "a[c]+" then "ad" would not match, but "ac", "acc", and "accc" would match.
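
A small sketch to try the * and + examples above. QuantifierDemo and WholeMatch are made-up names; the ^ and $ anchors are added so that the whole input has to fit the pattern, which is an assumption about what "match" means in the lists above.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Anchors the pattern at both ends so a partial hit does not count.
    public static bool WholeMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^" + pattern + "$");
    }

    static void Main()
    {
        Console.WriteLine(WholeMatch("ad", "a[c]*d"));          // True: zero 'c' is allowed
        Console.WriteLine(WholeMatch("acccd", "a[c]*d"));       // True
        Console.WriteLine(WholeMatch("accddccdce", "a[cd]*e")); // True
        Console.WriteLine(WholeMatch("ad", "a[c]+"));           // False: + needs at least one 'c'
        Console.WriteLine(WholeMatch("acc", "a[c]+"));          // True
    }
}
```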

2. The question mark?
The question mark, where I put it, is part of the Grouping syntax. You make a group like this:
(?<MatchName>expression to match)

3. Why is the grouping end parenthesis placed inside the start of the 2nd set of special characters?
Look at 2. above. The closing paren is just part of the group syntax and must come after the regular expression that defines what the group should capture. Essentially you're trying to find a word (just a set of characters in a particular order) and assign that word to a group name. So if you're looking for the word "Item", the group should look like:
(?<MatchName>Item)

If you wanted to make it more generic, you could use any regular expression syntax in place of "Item", such as [\w]+, which would find one or more word characters (letters, digits, or underscore). Here's the new group:
(?<MatchName>[\w]+)
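
As a rough sketch of that generic group in use, the snippet below collects every run of word characters from a string. NamedGroupDemo, Words, and the sample sentence are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Collects the text captured by the MatchName group for every match.
    public static List<string> Words(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(?<MatchName>[\w]+)"))
            words.Add(m.Groups["MatchName"].Value);
        return words;
    }

    static void Main()
    {
        // Prints "Item|one|item_2": digits and the underscore count as word characters.
        Console.WriteLine(string.Join("|", Words("Item one, item_2!")));
    }
}
```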

I find MS's help on regular expressions VERY thorough, though you may have to jump through a number of links in the help to find a good sample of what you need. I'd use the help's search option for regular expression examples. The normal F1-linked help from Visual Studio will show mostly the Regex object help and I can never remember which links to click to get to the "good stuff".

-Nerseus

aewarnick · May 2, 2003

That was very helpful, Nerseus. Thank you.

aewarnick · May 2, 2003

I changed the code so that there is much less writing:

C#:

MatchCollection mc = Regex.Matches(text, 
	@"[\D & \W]" + "(?<F>" + find + ")" + 
	@"[\D & \W]");

This is what I send to it:

C#:

a.Regexes.FindExact("gi2joe jgljkasjoejlasdfu*+-/joe....joe(&& kjlaksdgk)()*&%joe--++++halgu", "joe");

The joe in here: jgljkasjoejlasdfu should not be matched but it is!! I think that the & is not working in the expression. Did I do that right? I also tried many other things to get it to work even using && but nothing works.

aewarnick · May 2, 2003

I did some testing and found out that a digit is also a word character. The method works now. Did I use the correct syntax to say "If char is not a digit and not a word" with this: \D & \W?

Nerseus · May 2, 2003

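I don't think so. \W by itself says match anything *except* letters, digits, and underscore. By using \W you don't need to also use \D; it's redundant. The & doesn't do what you want either: inside square brackets it is just a literal ampersand (and the spaces are literal spaces), and everything listed in the brackets is OR'd together, not AND'ed. So [\D & \W] accepts any non-digit, letters included, which is why the joe inside jgljkasjoejlasdfu matched. Using an ampersand for "this AND this" is more of a C# thing.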
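
For what it's worth, here is a small check of how such a class is read, assuming standard .NET behavior: the members of a character class are alternatives, and a space or & inside it is a literal character. ClassDemo and IsOneChar are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

static class ClassDemo
{
    // True when the one-character string c is accepted by the character class cls.
    public static bool IsOneChar(string c, string cls)
    {
        return Regex.IsMatch(c, "^" + cls + "$");
    }

    static void Main()
    {
        Console.WriteLine(IsOneChar("s", @"[\D & \W]")); // True: 's' is a non-digit, so \D accepts it
        Console.WriteLine(IsOneChar("&", @"[\D & \W]")); // True
        Console.WriteLine(IsOneChar("2", @"[\D & \W]")); // False: a digit is also a word character
        Console.WriteLine(IsOneChar("s", @"[\W]"));      // False: letters are word characters
        Console.WriteLine(IsOneChar(".", @"[\W]"));      // True
    }
}
```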

I think all you want/need is:
@"[\W]*(?<F>" + find + ")[\W]*"

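This says match 0 or more non-word characters (anything other than letters, digits, and underscore) right before your "find" letters, then 0 or more of them on the right-hand side. Because 0 is allowed on both sides, the delimiters are optional, so a letter or digit next to "joe" does not stop the match:
".joe." will match
"#$%(*(joe@$^" will match
"joe" will match
"joejoe" will match (twice)
".joe" will match
"joe." will match
"ajoe" will match (just the "joe" part)
"joea" will match (just the "joe" part)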

This is all a guess - I'm not testing any of this

-Ner
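
A hedged sketch for checking the pattern from the post above against the test string posted earlier in the thread. It also tries an alternative that the thread itself does not propose: lookarounds, (?<!\w) and (?!\w), which only assert that no word character touches the term and consume nothing. BoundaryDemo, CountLoose, and CountExact are hypothetical names, and Regex.Escape is an added assumption.

```csharp
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    // Pattern from the post above: delimiters are optional on both sides.
    public static int CountLoose(string text, string find)
    {
        return Regex.Matches(text, @"[\W]*(?<F>" + Regex.Escape(find) + @")[\W]*").Count;
    }

    // Assumed alternative: the term must not have a word character on either side.
    public static int CountExact(string text, string find)
    {
        return Regex.Matches(text, @"(?<!\w)" + Regex.Escape(find) + @"(?!\w)").Count;
    }

    static void Main()
    {
        string text = "gi2joe jgljkasjoejlasdfu*+-/joe....joe(&& kjlaksdgk)()*&%joe--++++halgu";
        Console.WriteLine(CountLoose(text, "joe")); // 5: "gi2joe" and "asjoejl" are counted too
        Console.WriteLine(CountExact(text, "joe")); // 3: only the delimited occurrences
    }
}
```

With the lookaround version, Match.Index and Match.Length cover only the term itself, so no group is needed to exclude the neighbouring characters.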

aewarnick · May 2, 2003

I found that when I used the * before, 2joe matched when it should not have. But maybe I am mistaken.

My question is, how can I say
if char is not whitespace AND not word?

Nerseus · May 5, 2003

You could use "[^\s\w]" for non-whitespace and non-word (alpha+numeric). A class like "[\S\W]" won't do it, because the items in a class are OR'd together and every character is either non-whitespace or non-word. If you want 0 or more to match, use "[^\s\w]*". If you want to match at least one but maybe many more, use "[^\s\w]+". If you want to match 1 to 10, use "[^\s\w]{1,10}".

For the record, "*" is short for "{0,}" and "+" is short for "{1,}". If you want a specific number of repetitions, put the count in curly braces, either exact like {3} or as a range like {2,5}.

-Ner
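
A quick sketch comparing a negated class with a class that merely lists negated shorthands, assuming standard .NET semantics. NegatedClassDemo, FirstRun, and the sample text are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class NegatedClassDemo
{
    // Returns the text of the first match, or an empty string when there is none.
    public static string FirstRun(string text, string pattern)
    {
        return Regex.Match(text, pattern).Value;
    }

    static void Main()
    {
        string text = "ab --++ cd";
        Console.WriteLine(FirstRun(text, @"[^\s\w]+"));     // --++  (neither whitespace nor word)
        Console.WriteLine(FirstRun(text, @"[\S\W]+"));      // ab --++ cd  (every character qualifies)
        Console.WriteLine(FirstRun(text, @"[^\s\w]{1,3}")); // --+  (at most three repetitions)
    }
}
```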

aewarnick · May 5, 2003

Thanks Nerseus. One thing that does not make sense to me is how you can match 0 matches.

Nerseus · May 5, 2003

If it's not there, that's 0 matches. For example, say you might have this:
...Dan.Jones
or
.Dan.Jones
or
Dan.Jones

To get the first name, you could use:
"^[^\s\w]*[\w]+[^\s\w]+..."

So, find 0 or more non-whitespace non-alpha non-digit chars, followed by a word character. You want 0 or more because you don't know if the string starts with a non word or a word character (and it might be more than one non-word character).

-ner
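
A minimal sketch of the "0 or more" idea on the three sample strings above. The pattern here stops right after the separator and wraps the word in a group named First; both are assumptions for illustration, since the original expression is elided. FirstNameDemo and FirstName are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstNameDemo
{
    // Skips any leading punctuation (possibly none), then captures the first word.
    public static string FirstName(string s)
    {
        Match m = Regex.Match(s, @"^[^\s\w]*(?<First>[\w]+)[^\s\w]+");
        return m.Success ? m.Groups["First"].Value : "";
    }

    static void Main()
    {
        // Prints "Dan" three times: the leading class matches three, one, and zero dots.
        foreach (string s in new[] { "...Dan.Jones", ".Dan.Jones", "Dan.Jones" })
            Console.WriteLine(FirstName(s));
    }
}
```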

aewarnick · May 5, 2003

Ok, I get it. Thank you.
