seve7_wa Posted February 23, 2005

Hi, this is my first post. Please forgive any user ignorance if that turns out to be the issue. I'm a low-rent Perl programmer, trying out and really enjoying the C# Express beta. I've always managed to get a little turned around with complicated regexen. Luckily Perl's regex engine is (unsafely) powerful, and you can usually avoid having to write a "proper" regex with it.

I'm trying to create a regex to strip everything but dollar amounts and white space from a file. I have some values like "$1,200-$3,500", but in no other case is there text I want to strip immediately following a dollar sign. Some lines have multiple dollar amounts at different places on the line, however (up to four, I think).

In Perl, I'd do a replacement match, something like s/(!~(?:\$\S+)|(?:\s+))//g; but I can't do this in C#. Can anyone help me write the proper replacement regex? Since I have to run this through hundreds of thousands of lines in a catalog we produce, I need it to be somewhat efficient.
Richard Crist Posted February 23, 2005

Example(s)

seve7_wa said: "I'm trying to create a regex to strip everything but dollar amounts and white space from a file. I have some values like "$1,200-$3,500", but in no other case is there text I want to strip immediately following a dollar sign. Some lines have multiple dollar amounts at different places on the line, however (up to four, I think)."

Can you post one or more example lines for reference? For each one, please show the line before the change and the line after the change to give us an idea of what you are dealing with. Sometimes just doing this will help you realize the solution yourself. :)
HJB417 Posted February 24, 2005

Use Regex.Replace. Everything else should be trivial from there.
seve7_wa Posted February 24, 2005

example lines

Thanks, Richard.

Richard Crist said: "Can you post one or more example lines for reference? For each one, please show the line before the change and the line after the change to give us an idea of what you are dealing with. Sometimes just doing this will help you realize the solution yourself. :)"

Here are some examples:

Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost. -> $2,600
Cost: Approximately $3,100-$5,500 -> $3,100 $5,500 (or $3,100-$5,500 would be fine too)
is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical -> $1,950 $50 $25

The problem I have is that I've gotten into the habit of thinking about regexen in terms of Perl, so the obvious solution that comes to mind is a negative match. I'm not sure of the proper way to effect a negative match without actually using one, though. :)
HJB417 Posted February 24, 2005

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // "$", 1-3 digits, optional ",ddd" groups, optional decimal part.
            Regex regex = new Regex(@"\$\d{1,3}((,\d{3})+)?(\.\d*)?");
            string[] inputs =
            {
                "Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost.",
                "Cost: Approximately $3,100-$5,500",
                "is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical"
            };
            // Print every amount found on a line, separated by spaces.
            foreach (string input in inputs)
            {
                foreach (Match match in regex.Matches(input))
                    Console.Write("{0} ", match.Value);
                Console.WriteLine();
            }
        }
    }

Output:

    $2,600
    $3,100 $5,500
    $1,950 $50 $25

I originally thought you wanted to replace anything that wasn't currency with whitespace, e.g.:

before: "Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost."
after: " $2,600 "
seve7_wa Posted February 24, 2005

That's awesome in that it matches precisely my values. Thank you. :) I flubbed a little in my description, in that I need to write out a \cM for every line, regardless of whether it contains a match or not. Also, I would prefer to use the regex with a Replace call. Is that doable too?
HJB417 Posted February 24, 2005

The Replace call is doable, but you'll most likely need to use the negation of the regex "\$\d{1,3}((,\d{3})+)?(\.\d*)?". Doing this "\cM for every line" depends on which path you take (using the regex above, or the negation of it).
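For the first path (keeping the matching regex), one way to get a line ending for every input line, match or not, is to walk the text line by line. The following is only a sketch and not code from the thread: `AmountExtractor` and `ExtractPerLine` are hypothetical names, and it assumes the desired terminator is `\r\n` (the thread only says "\cM").

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class AmountExtractor
{
    // Same currency pattern as posted earlier in the thread.
    static readonly Regex Currency =
        new Regex(@"\$\d{1,3}((,\d{3})+)?(\.\d*)?", RegexOptions.Compiled);

    // Hypothetical helper: emits exactly one output line per input line.
    public static string ExtractPerLine(string text)
    {
        StringBuilder output = new StringBuilder();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                bool first = true;
                foreach (Match match in Currency.Matches(line))
                {
                    if (!first) output.Append(' ');   // separate amounts with one space
                    output.Append(match.Value);
                    first = false;
                }
                // Assumption: "\r\n" is the wanted terminator; written even
                // when the line contained no amount at all.
                output.Append("\r\n");
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        string sample = "Cost: Approximately $3,100-$5,500\nno price here\n";
        Console.Write(ExtractPerLine(sample));
    }
}
```

Reading with `ReadLine` also sidesteps any question of which line-ending style the input file uses, since it accepts `\r\n`, `\r`, and `\n` alike.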
seve7_wa Posted February 24, 2005

HJB417 said: "The Replace call is doable, but you'll most likely need to use the negation of the regex "\$\d{1,3}((,\d{3})+)?(\.\d*)?". Doing this "\cM for every line" depends on which path you take (using the regex above, or the negation of it)."

The negation of a regex? You can do negative matches in .NET?? AWESOME :cool: (*meekly*) How?

PS: Thank you for treating me with kid gloves. The standard Regular Expression Syntax document apparently just doesn't do justice to all the functionality! Is there a document that thoroughly describes .NET regex syntax?
HJB417 Posted February 25, 2005

seve7_wa said: "The negation of a regex? You can do negative matches in .NET??"

No; you should add that to the .NET wishlist, though. Sorry if I gassed you.

See: .NET Framework General Reference - Regular Expression Language Elements
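A Replace-only approach does not strictly need a negated regex. One common workaround, sketched here as an illustration and not taken from the thread, is an alternation: the first branch captures an amount, the second matches any other single non-whitespace character, and the replacement `$1` keeps only what the first branch captured. `AmountFilter` and `KeepAmounts` are hypothetical names, and the currency branch is a slightly tightened variant of the pattern posted above.

```csharp
using System;
using System.Text.RegularExpressions;

static class AmountFilter
{
    // Branch 1 captures a dollar amount into group 1.
    // Branch 2 matches any other non-whitespace character; group 1 then
    // did not participate, so "$1" substitutes an empty string.
    const string Pattern = @"(\$\d{1,3}(?:,\d{3})*(?:\.\d+)?)|\S";

    // Hypothetical helper: keeps amounts and whitespace, drops the rest.
    public static string KeepAmounts(string input)
    {
        return Regex.Replace(input, Pattern, "$1");
    }

    static void Main()
    {
        // Whitespace (including line breaks) is never matched, so it survives.
        Console.WriteLine("\"{0}\"", KeepAmounts("a $25 fee, or $1,950.50"));
    }
}
```

Because the hyphen in a range such as "$3,100-$5,500" is also a non-whitespace character, this sketch removes it and leaves the two amounts adjacent; excluding `-` from the second branch (for example `[^\s-]`) would be one way to keep it.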
seve7_wa Posted February 25, 2005

So very close...

(?(.*\$.*)(?:[^\$]*)(\$\d{1,3}(?:\,\d{1,3})?(?:\.\d{2})?)?|(?:\S*))

Replacing with $1, this works almost perfectly, leaving carriage returns at each line. But now there are no spaces between the dollar amounts; to me this is just about cosmetic...

I'm testing this in a test app I've written that has input boxes for find and replace, and two text boxes acting as panes (the first shows the loaded file, the second shows the results). First, I load the file, which is stream-written until EOF into the input file pane. Then I type the find and replace expressions into the text boxes. They don't need / delimiters, so I just write the match and replace expressions. The input goes into the regex like this: Replace(inFile.Text, inputFind.Text, inputReplace.Text)

If I change the replacement from "$1" to "$1 " (with a trailing space), the ends of lines become square blocks. Saving the results to a file and opening it with WordPad, those square blocks remain. I think it's interesting that if I copy a section of that text and paste it into WordPad, it properly ends the lines instead of showing me a square block. Might anyone help me understand what is going on?
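One possible explanation, which the thread does not confirm: in .NET, `.` matches `\r` but not `\n`, and `[^\$]*` can run across line breaks, so a replace can split a `\r\n` pair and leave a lone `\r` or `\n` behind. A Windows multi-line text box may draw such a lone character as a box. If that is the cause, normalizing line endings after the replace should make the blocks disappear. The sketch below assumes that diagnosis; `LineEndings` and `ToCrLf` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineEndings
{
    // Hypothetical helper: rewrites every line break as "\r\n".
    // "\r\n" is listed first in the alternation so that an existing
    // pair is matched as a unit and never doubled.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"\r\n|\r|\n", "\r\n");
    }

    static void Main()
    {
        string mixed = "$2,600\n$3,100 $5,500\r$1,950 $50 $25\r\n";
        // Show the result with the control characters made visible.
        Console.WriteLine(ToCrLf(mixed).Replace("\r", "\\r").Replace("\n", "\\n"));
    }
}
```

In the test app this would wrap the existing call, for example `ToCrLf(Regex.Replace(inFile.Text, inputFind.Text, inputReplace.Text))`, before the result is assigned to the output pane or saved.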
HJB417 Posted February 25, 2005

Try this using a MatchEvaluator: I replace the 'txt' part of each match with whitespace. This is easier than creating a regex that is the negation of the one that finds currency. The downside is that more string manipulation is done for each match found, but regardless of that, I think this is the easiest solution.

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            MatchEvaluator me = new MatchEvaluator(ReplaceWithWhiteSpace);
            // 'txt' lazily captures the text before each amount (or before end of input).
            // Note: the '.' in (.\d*)? is unescaped, so it matches any one character
            // directly after an amount, and that character is kept in the output.
            Regex regex = new Regex(@"(?<txt>.*?)((?:\$\d{1,3}((,\d{3})+)?(.\d*)?)|$)", RegexOptions.ExplicitCapture);
            string[] inputs =
            {
                "Cost: Approx. $2,600 for a 3-credit course, noncredit workshops vary in cost.",
                "Cost: Approximately $3,100-$5,500",
                "is $1,950, plus a nonrefundable $50 registration fee and a $25 Physical"
            };
            foreach (string input in inputs)
                Console.WriteLine("\"{0}\"", regex.Replace(input, me));
        }

        // Blank out the 'txt' capture and keep the rest of the match (the amount).
        static string ReplaceWithWhiteSpace(Match match)
        {
            string text = match.Groups["txt"].Value;
            StringBuilder sb = new StringBuilder(match.Value);
            // 'txt' always sits at the start of the match.
            for (int i = 0; i < text.Length; i++)
                sb[i] = ' ';
            return sb.ToString();
        }
    }

Output:

    " $2,600 "
    " $3,100-$5,500"
    " $1,950, $50 $25 "