negative matches?

seve7_wa (Newcomer, joined Feb 23, 2005, 14 messages)

Hi,

This is my first post, so please forgive any user ignorance if that turns out to be the issue.
I'm a low-rent Perl programmer trying out, and really enjoying, the C# Express beta.

I've always managed to get a little turned around with complicated regexen. Luckily, Perl's regex engine is (unsafely) powerful, so with it you can usually avoid having to write a "proper" regex.

I'm trying to create a regex that strips everything but dollar amounts and whitespace from a file. I have some values like "$1,200-$3,500", but in no other case is there text I want to strip immediately following a dollar sign. Some lines, however, have multiple dollar amounts at different places on the line (up to four, I think).

In perl, I'd do a replacement match, something like s/(!~(?:\$\S+)|(?:\s+))//g;
-- but I can't do this in C#.

Can anyone help me write the proper replacement regex? Since I have to run this over hundreds of thousands of lines in a catalog we produce, it needs to be reasonably efficient.
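Since the same pattern runs over every line, a common .NET approach is to construct one Regex object up front and reuse it for each line. A minimal sketch, assuming amounts look like $1,234 or $1,234.56; the class name AmountScanner and the exact pattern are illustrative assumptions, not something from this thread:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls dollar amounts out of a single line of text.
public static class AmountScanner
{
    // Built once and shared by every call. RegexOptions.Compiled trades a
    // slower start-up for faster matching, which may pay off over many lines.
    private static readonly Regex Amount =
        new Regex(@"\$\d{1,3}(?:,\d{3})*(?:\.\d+)?", RegexOptions.Compiled);

    // Returns the amounts found in the line separated by single spaces,
    // or an empty string if the line contains none.
    public static string ExtractAmounts(string line)
    {
        List<string> found = new List<string>();
        foreach (Match m in Amount.Matches(line))
            found.Add(m.Value);
        return string.Join(" ", found.ToArray());
    }
}
```

Whether RegexOptions.Compiled actually helps depends on how much text is processed, so it is worth timing both with and without it on a sample of the catalog.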
 
Example(s)

seve7_wa said:
I'm trying to create a regex that strips everything but dollar amounts and whitespace from a file. I have some values like "$1,200-$3,500", but in no other case is there text I want to strip immediately following a dollar sign. Some lines, however, have multiple dollar amounts at different places on the line (up to four, I think).

Can you post one or more example lines for reference? For each one please show the line before change and the line after change to give us an idea of what you are dealing with. Sometimes just doing this will help you realize the solution yourself. :)
 
example lines

Thanks Richard.

Richard Crist said:
Can you post one or more example lines for reference? For each one please show the line before change and the line after change to give us an idea of what you are dealing with. Sometimes just doing this will help you realize the solution yourself. :)

Here are some examples:
Code:
Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost.
->
$2,600

Cost: Approximately $3,100-$5,500
->
$3,100 $5,500 (or $3,100-$5,500 would be fine too)

is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical
->
$1,950 $50 $25

The problem I have is that I've gotten into the habit of thinking about regexen in terms of Perl, so the obvious solution that comes to mind is a negative match. I'm not sure of the proper way to effect a negative match without actually using one, though. :)
 
Code:
using System;
using System.Text.RegularExpressions;

class Program
{
	static void Main()
	{
		// "$", 1-3 digits, optional ",ddd" groups, optional "." followed by digits.
		// Caveat: (\.\d*)? also accepts a bare period, so "$50." at the end of a sentence matches as "$50.".
		Regex regex = new Regex(@"\$\d{1,3}((,\d{3})+)?(\.\d*)?");

		string input = "Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost.";
		foreach (Match match in regex.Matches(input))
			Console.Write("{0} ", match.Value);

		Console.WriteLine();
		input = "Cost: Approximately $3,100-$5,500";
		foreach (Match match in regex.Matches(input))
			Console.Write("{0} ", match.Value);

		Console.WriteLine();
		input = "is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical";
		foreach (Match match in regex.Matches(input))
			Console.Write("{0} ", match.Value);
		Console.WriteLine();
	}
}


output said:
$2,600
$3,100 $5,500
$1,950 $50 $25

I originally thought you wanted to replace anything that wasn't currency with whitespace, e.g.:

before
Code:
"Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost."

after
Code:
"              $2,600                                                         "
 
That's awesome in that it matches precisely my values. Thank you. :)

I flubbed my description a little: I need to write out a \cM for every line, regardless of whether it contains a match. Also, I would prefer to use the regex with a Replace call. Is that doable too?
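One way to get both a single Replace call and one output line per input line is to process the file line by line and let a MatchEvaluator decide what each match becomes. A minimal sketch under those assumptions; AmountFilter, FilterLine and FilterAll are hypothetical names, and the amount pattern is a simplified one:

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps only dollar amounts, one output line per input line.
public static class AmountFilter
{
    // Three alternatives: group 1 is an amount; [^$]+ is a run of other text;
    // \$ is a lone dollar sign that does not start an amount.
    private static readonly Regex Token =
        new Regex(@"(\$\d{1,3}(?:,\d{3})*(?:\.\d+)?)|[^$]+|\$", RegexOptions.Compiled);

    // An amount is kept and followed by a space; anything else is dropped.
    private static string KeepAmount(Match m)
    {
        return m.Groups[1].Success ? m.Groups[1].Value + " " : "";
    }

    // One Replace call per line; TrimEnd removes the space after the last amount.
    public static string FilterLine(string line)
    {
        return Token.Replace(line, new MatchEvaluator(KeepAmount)).TrimEnd();
    }

    // Writes exactly one line (possibly empty) for every line read.
    public static void FilterAll(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
            output.WriteLine(FilterLine(line));
    }
}
```

Because every line read produces exactly one WriteLine call, lines without amounts come out empty instead of disappearing. WriteLine ends each line with the writer's NewLine setting, which defaults to "\r\n" on Windows.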
 
The Replace call is doable, but you'll most likely need to use the negation of the regex "\$\d{1,3}((,\d{3})+)?(\.\d*)?".

Doing this "\cM for every line" depends on which path you take (using the regex above, or the negation of it).
 
HJB417 said:
The Replace call is doable, but you'll most likely need to use the negation of the regex "\$\d{1,3}((,\d{3})+)?(\.\d*)?".

Doing this "\cM for every line" depends on which path you take (using the regex above, or the negation of it).

The negation of a regex? You can do negative matches in .NET?? AWESOME :cool:

(*meekly*) how?

PS> Thank you for treating me with kid gloves; the standard Regular Expression Syntax document apparently just doesn't do justice to all the functionality! Is there a document that thoroughly describes .NET regex syntax?
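.NET regular expressions do support Perl-style lookaround, including negative lookahead (?!...) and negative lookbehind (?<!...). A rough sketch of the original Perl idea (delete every whitespace-separated word that does not start with $, keep the whitespace), assuming a "word" is simply a run of non-whitespace characters; DropNonDollarWords is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating negative lookaround in .NET.
public static class DropNonDollarWords
{
    // (?<!\S)  negative lookbehind: we are at the start of a token
    // (?!\$)   negative lookahead: the token does not begin with '$'
    // \S+      the token itself, which gets removed
    private static readonly Regex NonDollarWord = new Regex(@"(?<!\S)(?!\$)\S+");

    // Removes every token that doesn't start with '$'; all whitespace is kept.
    public static string Apply(string line)
    {
        return NonDollarWord.Replace(line, "");
    }
}
```

Because whole tokens are kept, trailing punctuation survives: the third example line comes out with "$1,950," (comma included), "$50" and "$25", separated by runs of spaces.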
 
so very close...

Code:
(?(.*\$.*)(?:[^\$]*)(\$\d{1,3}(?:\,\d{1,3})?(?:\.\d{2})?)?|(?:\S*))
with
$1
This replaces almost perfectly, leaving the carriage return at the end of each line.

But now there are no spaces between the dollar amounts, which to me is mostly cosmetic...

I'm testing this in a test app I've written that has input boxes for the find and replace expressions, plus two text boxes acting as panes (the first shows the loaded file, the second shows the results).

First, I load the file, reading it with a stream until EOF and writing it into the input pane.

Then I type the find and replace expressions into their boxes. They don't need Perl's / delimiters, so I just write the match and replace expressions themselves.

The input goes into the regex like:
Replace(inFile.Text, inputFind.Text, inputReplace.Text)

If I change $1 to "$1 " (with a trailing space), the line endings turn into square blocks. If I save the results to a file and open it in WordPad, those square blocks remain. Interestingly, if I copy a section of that text and paste it into WordPad, the lines end properly instead of showing square blocks.

Might anyone help me understand what is going on?
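One plausible cause, offered as an assumption since the test app's code isn't shown: the pattern ends in (?:\S*), which can match the empty string. Regex.Replace then also finds empty matches at the \r and the \n of each Windows line break, so a replacement of "$1 " inserts a space between them, leaving a lone \r and a lone \n that a TextBox typically cannot render as a line break. The demonstration below uses a simplified stand-in pattern, (\S*), not the full expression from the post; EmptyMatchDemo is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: how empty matches can split a "\r\n" line break.
public static class EmptyMatchDemo
{
    public static string Run(string text, string replacement)
    {
        // \S* can match zero characters, so Regex.Replace also produces
        // empty matches at whitespace positions, including '\r' and '\n'.
        return Regex.Replace(text, @"(\S*)", replacement);
    }
}
```

With "$1 " the call on "$25\r\n$50" no longer contains an intact "\r\n"; with "$1" the empty matches are replaced by nothing and the text comes back unchanged, which would fit the line breaks surviving in the first attempt.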
 
Try this

Using a MatchEvaluator, I replace the 'txt' group of each match with whitespace. That's easier than creating a regex that's the negation of the one that finds currency. The downside is that more string manipulation is done for each match found, but I still think this is the easiest solution.

Code:
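using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
	static void Main(string[] args)
	{
		MatchEvaluator me = new MatchEvaluator(ReplaceWithWhiteSpace);
		// "txt" lazily captures the text in front of each amount (or in front of the end of the input).
		// The decimal point is escaped (\.) so a ',' or '-' right after an amount isn't treated as part of it.
		Regex regex = new Regex(@"(?<txt>.*?)((?:\$\d{1,3}((,\d{3})+)?(\.\d*)?)|$)", RegexOptions.ExplicitCapture);

		string input = "Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost.";
		input = regex.Replace(input, me);
		Console.WriteLine("\"{0}\"", input);

		Console.WriteLine();
		input = "Cost: Approximately $3,100-$5,500";
		input = regex.Replace(input, me);
		Console.WriteLine("\"{0}\"", input);

		Console.WriteLine();
		input = "is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical";
		input = regex.Replace(input, me);
		Console.WriteLine("\"{0}\"", input);
	}

	// Overwrites the "txt" part of a match with spaces and leaves the currency part untouched,
	// so each output line keeps the same length and column positions as its input line.
	static string ReplaceWithWhiteSpace(Match match)
	{
		Group txt = match.Groups["txt"];
		int start = txt.Index - match.Index; // offset of "txt" within match.Value

		StringBuilder sb = new StringBuilder(match.Value);
		for (int i = start; i < start + txt.Length; i++)
			sb[i] = ' ';
		return sb.ToString();
	}
}

output
Code:
"              $2,600                                                         "

"                    $3,100 $5,500"

"   $1,950                       $50                        $25         "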
 