Xtreme .Net Talk


Posted

How to get a regex match without characters before and after counted

 

I am making a method that will get exact matches from a string. When you double-click a word in VS to highlight it, the selection does not include the dot before or after it. Likewise, this method should find matches that are delimited by symbols or whitespace.

 

Regex.Matches(text, @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]"+find+@"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");

 

When a match is made, it includes the special characters in its Index and Length properties. My question is, how do I require those characters around a match without having them counted as part of the match value?

 

Also, I would really like to use the forum's C# tag instead of the generic code tag when posting my C# code. How do I do that?

C#
  • *Experts*
Posted

I think you're going to want to use Groups with the regular expression. The problem is that a Regular expression by itself just tells you whether or not there's a match, and optionally returns a list of matches. Since you really want to find a match based on delimiters, you need a way to specify which parts of the match you want.

 

Here's a new expression that includes a group named MyMatch. It's the same expression as yours, but instead of just inserting your "find" string in the middle, I've put it inside the named group, MyMatch. The foreach loop then goes through the matches and pulls out the group by name.

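// Requires: using System.Diagnostics; using System.Text.RegularExpressions;
MatchCollection mc = Regex.Matches(text,
   @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]*(?<MyMatch>" +
   find +
   @")[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]");
foreach (Match m in mc)
{
   // The named group holds only the search text, not the delimiters around it.
   Debug.WriteLine(m.Groups["MyMatch"].Value);
}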
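
Since the original question was about the Index and Length properties, here is a minimal sketch of reading them from the group instead of from the whole match. The class name GroupSpanDemo and the method FindSpans are made up for illustration, the delimiter is required on both sides as in the first post, and the Regex.Escape call is my own addition so that symbols in the search text are taken literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupSpanDemo
{
    // Hypothetical helper: returns the (index, length) of each "MyMatch" group.
    public static List<(int Index, int Length)> FindSpans(string text, string find)
    {
        // Same delimiter class as in the thread, required on both sides.
        const string delimiters = @"[\s\.\<\>\?\\\!\*\(\)\/\-\+\=]";
        // Regex.Escape is an assumption added here, not part of the posted code.
        string pattern = delimiters + "(?<MyMatch>" + Regex.Escape(find) + ")" + delimiters;

        var spans = new List<(int Index, int Length)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // m.Index and m.Length cover the delimiters too; the group's do not.
            Group g = m.Groups["MyMatch"];
            spans.Add((g.Index, g.Length));
        }
        return spans;
    }
}
```

Calling GroupSpanDemo.FindSpans("a .joe. b", "joe") should report the position of "joe" itself rather than the position of the dot in front of it.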

 

-Nerseus

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut
Posted

That is good but I don't really understand how it works. Do you feel like explaining why these things are done:

 

1. Why is the * quantifier placed before the grouping?

2. The question mark?

3. Why is the group's closing parenthesis placed inside the start of the 2nd set of special characters?

 

 

And, does anyone know of a good e-book or web page on advanced regular expressions. My book covers very basic stuff.

 

Also, please, someone tell me how to post code with the forum's C# tag and not just the regular code tag.

C#
  • *Experts*
Posted

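1. Why is the * quantifier placed before the grouping?

The * indicates that the preceding element (a character, a bracketed class, or a group) may repeat 0 or more times. In my expression the preceding element is the bracketed set of delimiter characters, so 0 or more delimiters may come before the group. If you have the expression: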

a[c]*d

it would match (0 or more of the character 'c'):

ad

acd

accd

acccd

 

You could use the expression:

a[cd]*e to match 0 or more of either the character 'c' or 'd'. The following would match:

ae

ace

ade

acce

adde

acde

accddccdce

 

The + is used to indicate 1 or more. So if the expression were "a[c]+" then "ad" would not match, but "ac", "acc", and "accc" would match.
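
To try the two quantifiers side by side, here is a small sketch. The class and method names are invented for the example, and I've added ^ and $ so that the whole input has to fit the pattern.

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // "*" allows the bracketed character zero or more times.
    public static bool ZeroOrMore(string input)
    {
        return Regex.IsMatch(input, "^a[c]*d$");
    }

    // "+" requires the bracketed character at least once.
    public static bool OneOrMore(string input)
    {
        return Regex.IsMatch(input, "^a[c]+$");
    }
}
```

ZeroOrMore accepts "ad" as well as "acccd", while OneOrMore rejects "ad" and accepts "ac" and "accc".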

 

2. The question mark?

The question mark, where I put it, is part of the Grouping syntax. You make a group like this:

(?<MatchName>expression to match)

 

3. Why is the group's closing parenthesis placed inside the start of the 2nd set of special characters?

Look at 2. above. The ending paren simply closes the group and must come after the expression that describes what the group should contain. Essentially you're trying to find a word (just a set of characters in a particular order) and assign that word to a group name. So if you're looking for the word "Item", the group should look like:

(?<MatchName>Item)

 

If you wanted to make it more generic, you could use any regular expression syntax in place of "Item", such as [\w]+, which would find one or more word characters (letters, digits, or the underscore). Here's the new group:

(?<MatchName>[\w]+)
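
As a rough sketch of how that generic group could be read back in code (NamedGroupDemo and Words are names I made up for the example):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Collects every run of word characters captured by the MatchName group.
    public static List<string> Words(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(?<MatchName>[\w]+)"))
        {
            // Look the group up by the name given inside (?<...>).
            words.Add(m.Groups["MatchName"].Value);
        }
        return words;
    }
}
```

For an input such as "..Item_1, two" this should collect "Item_1" and "two", since the underscore and the digit both count as word characters.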

 

I find MS's help on regular expressions VERY thorough, though you may have to jump through a number of links in the help to find a good sample of what you need. I'd use the help's search option to look for regular expression examples. The normal F1-linked help from Visual Studio mostly shows the Regex object help, and I can never remember which links to click to get to the "good stuff".

 

-Nerseus
Posted

I changed the code so that there is much less writing:

 

MatchCollection mc = Regex.Matches(text, 
@"[\D & \W]" + "(?<F>" + find + ")" + 
@"[\D & \W]");

This is what I send to it:

a.Regexes.FindExact("gi2joe jgljkasjoejlasdfu*+-/joe....joe(&& kjlaksdgk)()*&%joe--++++halgu", "joe");

The joe in here: jgljkasjoejlasdfu should not be matched but it is!! I think that the & is not working in the expression. Did I do that right? I also tried many other things to get it to work even using && but nothing works.

C#
Posted
I did some testing and found out that a digit is also a word character. The method works now. Did I use the correct syntax to say "If char is not a digit and not a word" with this: \D & \W?
C#
  • *Experts*
Posted

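I don't think so. \W by itself says match anything *except* letters, digits, and the underscore, so you don't need \D as well; it's redundant. I have no idea what you're doing with the & character... if you're trying to do "this AND this", the ampersand won't do it; that's more of C# syntax. Inside square brackets every item is an alternative, so [\D & \W] means "a non-digit, or a space, or an ampersand, or a non-word character", which accepts any character that is not a digit. That's why the letters on either side of "joe" in jgljkasjoejlasdfu were accepted as delimiters.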
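
A quick way to check what a bracketed set really accepts is to test single characters against it. This is only a sketch with invented names (ClassUnionDemo, OriginalClass, NonWord); the ^ and $ are there so exactly one character is tested.

```csharp
using System.Text.RegularExpressions;

public static class ClassUnionDemo
{
    // The bracketed set from the earlier post: items inside [] are alternatives,
    // so the space and the ampersand are just two more allowed characters.
    public static bool OriginalClass(string ch)
    {
        return Regex.IsMatch(ch, @"^[\D & \W]$");
    }

    // \W alone: anything that is not a letter, digit, or underscore.
    public static bool NonWord(string ch)
    {
        return Regex.IsMatch(ch, @"^\W$");
    }
}
```

OriginalClass("s") comes back true because a letter is a non-digit, while NonWord("s") is false and NonWord(".") is true.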

 

I think all you want/need is:

@"[\W]*(?<F>" + find + ")[\W]*"

 

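This says match 0 or more non-word characters (anything other than letters, digits, and the underscore) right next to your "find" letters, then 0 or more of them on the right-hand side. Because "0 or more" also accepts none at all, the results below assume the whole string is tested against the pattern (anchored with ^ at the start and $ at the end); in an unanchored search, "joe" inside a longer word would still be found. So: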

".joe." will match

"#$%(*(joe@$^" will match

"joe" will match

"joejoe" will not match

".joe" will match

"joe." will match

"ajoe" will not match

"joea" will not match

 

This is all a guess - I'm not testing any of this :)
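
A different technique from the one discussed so far, offered only as a sketch: lookbehind and lookahead assertions check the neighbouring characters without consuming them, so the match's own Index and Length cover just the search text. The names ExactWordDemo and FindExact are illustrative, and treating "not a word character" as the delimiter rule is my assumption about what counts as an exact match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExactWordDemo
{
    // Returns the start index of every occurrence of 'find' that has
    // no word character immediately before or after it.
    public static List<int> FindExact(string text, string find)
    {
        // (?<!\w) = not preceded by a word character (also true at the start).
        // (?!\w)  = not followed by a word character (also true at the end).
        string pattern = @"(?<!\w)" + Regex.Escape(find) + @"(?!\w)";

        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }
}
```

With the test string posted earlier in the thread, this should skip the "joe" in 2joe and in jgljkasjoejlasdfu and keep the three that are surrounded by symbols. Because the delimiters are not consumed, two hits that share a single delimiter, as in "joe.joe", are both found.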

 

-Ner
Posted

I found that when I used the * before the group, 2joe matched when it should not have. But maybe I am mistaken.

 

My question is, how can I say

if char is not whitespace AND not word?

C#
  • *Experts*
Posted

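You could use "[^\s\w]" for a character that is neither whitespace nor a word character (letter, digit, or underscore). Note that "[\S\W]" would not do it: the items in a bracketed set are alternatives, so it means non-whitespace OR non-word, which is every character. If you want 0 or more to match, use "[^\s\w]*". If you want to match at least one but maybe many more, use "[^\s\w]+". If you want to match 1 to 10, use "[^\s\w]{1,10}".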

 

For the record, "*" is short for "{0,}" and "+" is short for "{1,}". If you want a specific number of repetitions, put the count in curly braces, such as "{3}" (or use a range such as "{2,5}").
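
Here is a small sketch of "not whitespace AND not word" written as a negated set, combined with a curly-brace range. The names NegatedClassDemo, IsSymbol, and OneToTenSymbols are made up, and the ^ and $ anchors are added so the whole input is tested.

```csharp
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // [^\s\w] = one character that is neither whitespace nor a word character.
    public static bool IsSymbol(string ch)
    {
        return Regex.IsMatch(ch, @"^[^\s\w]$");
    }

    // {1,10} = between one and ten repetitions of the preceding set.
    public static bool OneToTenSymbols(string s)
    {
        return Regex.IsMatch(s, @"^[^\s\w]{1,10}$");
    }
}
```

IsSymbol(".") is true, IsSymbol(" ") and IsSymbol("a") are false, and OneToTenSymbols rejects both an empty string and a run of eleven dots.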

 

-Ner
  • *Experts*
Posted

If it's not there, that counts as 0 occurrences, which "*" still accepts. For example, say you might have this:

...Dan.Jones

or

.Dan.Jones

or

Dan.Jones

 

To get the first name, you could use:

"^[^\s\w]*[\w]+[^\s\w]+..."

 

So, find 0 or more characters that are neither whitespace nor word characters, followed by one or more word characters. You want 0 or more because you don't know if the string starts with a non-word or a word character (and there might be more than one non-word character).
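
To make the first-name idea concrete, here is a sketch that wraps the word part in a named group. FirstNameDemo, FirstName, and the group name First are all invented for the example, and the pattern stops after the first separator rather than guessing at the elided remainder.

```csharp
using System.Text.RegularExpressions;

public static class FirstNameDemo
{
    // Returns the first run of word characters that is followed by a separator,
    // or an empty string when the input does not have that shape.
    public static string FirstName(string input)
    {
        // Optional leading symbols, the name itself, then at least one symbol.
        Match m = Regex.Match(input, @"^[^\s\w]*(?<First>\w+)[^\s\w]+");
        return m.Success ? m.Groups["First"].Value : string.Empty;
    }
}
```

All three sample strings above ("...Dan.Jones", ".Dan.Jones", and "Dan.Jones") should give "Dan", because the leading set is allowed to match nothing.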

 

-ner
