Xtreme .Net Talk


Posted

I was hoping someone could lend a hand with a problem I have hit while parsing strings in C#. It is not necessarily a hard task, just something I have been struggling with (I am not the best at string parsing yet).

 

I have a string (line) that needs to be broken into several sections (columns), with delimiters (each strDelim in the array Delims[]) marking where the breaks occur.

 

Code:

 

int iPos = 0;
int iLast = 0;
int iCol = 0;

while (iPos < line.Length)
{
    foreach (string strDelim in Delims)
    {
        if (iPos + strDelim.Length >= line.Length)
            break;
        if (line.Substring(iPos, strDelim.Length) == strDelim)
        {
            row[iCol] = line.Substring(iLast, iPos - iLast);
            iPos += strDelim.Length;
            iLast = iPos + 1;
            iCol++;
        }
    }
    iPos++;
    if (iPos == line.Length)
        row[iCol] = line.Substring(iLast - 1, iPos - iLast + 1);
}

 

This code causes a few problems and I was hoping someone could see where I am going wrong or propose something more efficient:

- If the last column of the string is empty the code above will fail

- If the String has 3 Columns, the first character of the second column is trimmed

- Anything else that jumps out to you as being a mistake

 

I am sure there are other issues I have yet to encounter. Hopefully a better method exists and I am just not seeing it.
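For comparison, here is a minimal sketch of the same idea that jumps from delimiter to delimiter with IndexOf instead of stepping one character at a time. It assumes each delimiter occurs in the line in the same order as in the array; the names OrderedSplitter and SplitInOrder, and the choice to return null when a delimiter is missing, are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedSplitter
{
    // Splits line on each delimiter in turn, in the order given.
    // Returns null if a delimiter cannot be found (illustrative convention).
    public static string[] SplitInOrder(string line, string[] delims)
    {
        var row = new List<string>();
        int start = 0;
        foreach (string delim in delims)
        {
            int pos = line.IndexOf(delim, start, StringComparison.Ordinal);
            if (pos < 0)
                return null; // the line does not follow the expected layout
            row.Add(line.Substring(start, pos - start));
            start = pos + delim.Length; // resume right after the delimiter
        }
        // Whatever is left is the last column; it may be empty.
        row.Add(line.Substring(start));
        return row.ToArray();
    }

    public static void Main()
    {
        string[] row = SplitInOrder("Name :- CODE - SUBLET", new[] { ":-", "-" });
        foreach (string col in row)
            Console.WriteLine("[" + col.Trim() + "]");
    }
}
```

Because the start index always lands exactly one past the delimiter, the first character of the next column is not lost, and an empty last column simply yields an empty string.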

Posted (edited)

You are reinventing the wheel, I think...

Doesn't String.Split do this already?

Though I didn't look at your code too closely.

Edited by Joe Mamma

Joe Mamma

Amendment 4: The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

Amendment 9: The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.

  • Leaders
Posted

As Joe said, you may wish to use string.Split. You can also use Regex.Split (which I tend to find much better and faster)...

// example using string.Split ...
string[] str = "some stuff with spaces in it".Split(' ');
foreach (string s in str)
{
    Console.WriteLine(s);
}

// example using Regex ...
string str = "some stuff with spaces in it";
foreach (string s in System.Text.RegularExpressions.Regex.Split(str, " "))
{
    Console.WriteLine(s);
}

Posted

The problem is that I don't think I can use either of those methods (string.Split or Regex.Split); the code must be dynamic enough to handle many different string formats.

For example, some strings may have several different delimiters (stored in the Delims array).

Specifically:

Line = CODE : DESCRIPTION AREA_NAME

where (:) and (AREA_) are the delimiters and CODE / DESCRIPTION / NAME are the fields.

 

or

Line = Name :- CODE - SUBLET

where (:-) and (-) are the delimiters and NAME / CODE / SUBLET are the fields.

 

See what I mean?
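One caveat to the concern above: string.Split also has an overload that takes an array of string delimiters (String.Split(string[], StringSplitOptions), available from .NET Framework 2.0 onward), so multi-character delimiters are not a blocker by themselves. A small sketch, assuming it is acceptable for any delimiter to match anywhere in the line rather than in a fixed order; SplitDemo and SplitFields are illustrative names only.

```csharp
using System;

public static class SplitDemo
{
    // Splits on any of the given string delimiters and trims each field.
    public static string[] SplitFields(string line, string[] delims)
    {
        string[] parts = line.Split(delims, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim();
        return parts;
    }

    public static void Main()
    {
        string[] fields = SplitFields("CODE : DESCRIPTION AREA_NAME",
                                      new[] { ":", "AREA_" });
        foreach (string field in fields)
            Console.WriteLine(field);
    }
}
```

This does not check that the delimiters appear in the expected sequence, so it only fits when a field can never contain one of the delimiters.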

Posted
Sounds like you will have to do a little data massaging before you split

Joe Mamma


  • *Experts*
Posted

Shaitan00, even with your "dynamic" code you still need to figure out Delim. I assume you "know" this from either a file or some other method. Once you have that list of delimiters, you can use Regex.Split.

 

Regex allows multiple delimiters, each of any length. For example:

// Somewhere at the top:
// using System.Diagnostics;              (for Debug.WriteLine)
// using System.Text.RegularExpressions;

string s = "Name :- CODE - SUBLET";
Regex r = new Regex(":-|-");
string[] s2 = r.Split(s);
foreach (string s3 in s2)
{
    Debug.WriteLine(s3);
}

 

The regular expression is ":-|-". This says find a match on ":-" or on "-" (the pipe is an OR).

 

-Nerseus

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut
Posted

I do know my Delimiters, they are stored in array [Delims] in the proper order as they should (and have to) appear in the actual string.

 

Thus the (foreach (string strDelim in Delims))

Is there any way to implement your idea (using RegEx) with this constraint?

 

I assume the only changes needed would be to "Regex r = new Regex(":-|-");" where I need to input the Array information from Delims into that Regex r.

Any clues?

 

Will this work?
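One way this could look, as a sketch rather than a definitive answer: escape each entry of the delimiter array with Regex.Escape (so characters such as "." or "|" are taken literally) and join the results with the OR pipe. DelimRegex and FromDelims are hypothetical names introduced here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimRegex
{
    // Builds a pattern such as ":-|-" from the delimiter array.
    public static Regex FromDelims(string[] delims)
    {
        string[] escaped = new string[delims.Length];
        for (int i = 0; i < delims.Length; i++)
            escaped[i] = Regex.Escape(delims[i]); // treat metacharacters literally
        return new Regex(string.Join("|", escaped));
    }

    public static void Main()
    {
        Regex r = FromDelims(new[] { ":-", "-" });
        foreach (string part in r.Split("Name :- CODE - SUBLET"))
            Console.WriteLine(part.Trim());
    }
}
```

Alternatives are tried left to right, so if one delimiter is a prefix of another (say "-" and "--"), the longer one should come first in the array. Note that this still splits on any delimiter anywhere; it does not enforce the order in which they must appear.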

  • *Experts*
Posted

It sounds like what you want is more than a simple split - I think you'd do best with regular expression groups. Here's a code snippet to extract what you want (I guess):

// "Name :- CODE - SUBLET";
string expression = @"^(?<Name>.*):-(?<Code>.*)-(?<Sublet>.*)$";
string data = "Nerseus Bob :- ABC123 - 05f";
Regex regex = new Regex(expression);
Match match = regex.Match(data);
if (match.Success)
{
    string name = match.Groups["Name"].Value;
    string code = match.Groups["Code"].Value;
    string sublet = match.Groups["Sublet"].Value;
}

 

The string "expression" is the regular expression. It defines three named groups: Name, Code and Sublet. Each uses ".*", which matches any run of characters (greedily, so it takes as much as it can). Between the groups are the other pieces of the match, which you don't care about except as delimiters.

 

If you know that there will always be spaces around each delimiter, you could put spaces in your non-group matches. Something like this:

string expression = @"^(?<Name>.*) :- (?<Code>.*) - (?<Sublet>.*)$";

 

-Nerseus
Posted

So, to adapt this to meet my requirements I just need a way to create the string "expression" dynamically from my delims[] array and field names.

 

So something like:

For (int j = 0, j < delims.length ; j++)

string expression = delims[j] + @"^(?<Field[j]>.*)

// Where delims[0] = ""; (no delimiter at the beginning of the string)

// Where field[j] is an array with the name of all the corresponding fields

// For this example Content of delims[0, :-, -] and Contents of Fields[NAME, CODE, SUBLET]

 

Note: Name / Code / Sublet are fields, the values inside those fields is what I am interested in.

 

Would something like that work?

Is the syntax of my string "expression" correct? (I don't know much about RegEx yet).

 

Thanks for the help.

  • *Experts*
Posted

All you have is an array of delimiters? Is this an array that's given to you or one you build yourself (or read from a config)? Ideally, you'd store the regular expressions instead of the delimiters, if you have the control.

 

If you just need a general purpose "split" library that splits on a set of delimiters in a specific sequence and you don't know how many or how big each delim will be, you may want a custom regular expression like the one your pseudocode was showing.
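As a rough sketch of such a custom expression: the loop below interleaves one named group per field with the escaped delimiter that follows it. It assumes the delimiter array holds only the separators between fields (so it is one shorter than the field array, unlike the pseudocode's empty delims[0]); PatternBuilder and Build are illustrative names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // fields.Length is assumed to be delims.Length + 1.
    public static string Build(string[] fields, string[] delims)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < fields.Length; i++)
        {
            sb.Append("(?<").Append(fields[i]).Append(">.*)");
            if (i < delims.Length)
                sb.Append(Regex.Escape(delims[i])); // literal delimiter
        }
        return sb.Append("$").ToString();
    }

    public static void Main()
    {
        string pattern = Build(new[] { "Name", "Code", "Sublet" },
                               new[] { ":-", "-" });
        Console.WriteLine(pattern);
        Match match = Regex.Match("Nerseus Bob :- ABC123 - 05f", pattern);
        if (match.Success)
            Console.WriteLine(match.Groups["Code"].Value.Trim());
    }
}
```

The anchors "^" and "$" are added once around the whole pattern, not once per field.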

 

If you know there are only two or three sets of delims to look for, I'd go with the more readable named groups. If/when you have to modify this code a month or more from now it will be a lot simpler.

 

-Nerseus
Posted

I generate the delimiter array myself.

 

Simply put, first I identify which kind of line I need to read (this is given) and then I start off with a FORMAT string that looks like the following:

string format = "[NAME]:-[CODE]-[SUBLET]"    (this string is GIVEN)

I take the string and identify all Delimiters (between ] and [) and save each (in the specific order) in the delims[] array.

 

Then I get the actual line (call it LINE) that can look something like this

string line = "CHIPS:-124566-Grocery" (taken from a text file).

 

I need to run through LINE and isolate the terms CHIPS, 124566, and Grocery.

This must work given any kind of FORMAT string and the associated information in LINE.

 

Thus, it must all be Dynamic, I can put anything in FORMAT and as long as the LINE I pass in matches I need it to work.

 

For example, string FORMAT = "[TYPE]/[LOCATION]--.-[AREA]_[code]"

or FORMAT = "[SONG].[NAME].[TITLE]avc[STYLE]"

etc......

 

Is RegEx still possible?

And if so, does the syntax used in my Pseudo code work?

  • *Experts*
Posted

I'd say work off the Format, not the parsed Delimiters.

For example:

// Given some generic format...
string format = "[Name] :- [Code] - [Sublet]";

// Replace the "[" and "]" characters with regular expression chars
format = format.Replace("[", "(?<");
format = format.Replace("]", ">.*)");
string regExpression = "^" + format + "$";

// The rest of the code is the same
string data = "Nerseus Bob :- ABC123 - 05f";
Regex regex = new Regex(regExpression);
Match match = regex.Match(data);
if (match.Success)
{
    string name = match.Groups["Name"].Value;
    string code = match.Groups["Code"].Value;
    string sublet = match.Groups["Sublet"].Value;
}
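One thing the plain Replace approach does not cover is a format whose delimiters contain regex metacharacters (for example "." or "--.-"). A hedged sketch of a variant that escapes the literal text between fields is shown below; FormatRegex and FromFormat are made-up names, field names are assumed to consist of letters, digits or underscores, and the lazy ".*?" is a substitution for the greedy ".*" used above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class FormatRegex
{
    // Turns "[NAME]:-[CODE]-[SUBLET]" into an anchored pattern with one
    // named group per [FIELD] and every literal delimiter escaped.
    public static Regex FromFormat(string format)
    {
        var pattern = new StringBuilder("^");
        int last = 0;
        foreach (Match field in Regex.Matches(format, @"\[(\w+)\]"))
        {
            // Literal text between the previous field and this one.
            pattern.Append(Regex.Escape(format.Substring(last, field.Index - last)));
            // Lazy ".*?" prefers the earliest occurrence of the next delimiter.
            pattern.Append("(?<").Append(field.Groups[1].Value).Append(">.*?)");
            last = field.Index + field.Length;
        }
        pattern.Append(Regex.Escape(format.Substring(last))).Append("$");
        return new Regex(pattern.ToString());
    }

    public static void Main()
    {
        Regex regex = FromFormat("[NAME]:-[CODE]-[SUBLET]");
        Match match = regex.Match("CHIPS:-124566-Grocery");
        if (match.Success)
        {
            Console.WriteLine(match.Groups["NAME"].Value);
            Console.WriteLine(match.Groups["CODE"].Value);
            Console.WriteLine(match.Groups["SUBLET"].Value);
        }
    }
}
```

Group names are case-sensitive, so the name used in the lookup has to match the name in the format string exactly.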

 

If you don't want to "know" the names of each group, you can use ordinal positions. Group 0 is always the entire match, so the named groups start at 1. So change:

string name = match.Groups["Name"].Value;

to

string name = match.Groups[1].Value;

You could, if you needed it, get the group name from the ordinal. To see that the groups are "Name", "Code", and "Sublet", call regex.GroupNameFromNumber with the position 1, 2, or 3.

 

-Nerseus
Posted (edited)

I attempted to implement the parser using RegEx as you described, but it is not giving the expected results. I set up the following test (almost exactly what I need) to demonstrate:

 

Code:

// Other method of Parsing using RegEx

format = format.Replace("[", "(?<");
format = format.Replace("]", ">.*)");
string regExpression = "^" + format + "$";

Regex regex = new Regex(regExpression);
Match match = regex.Match("");

while (sR.Peek() != -1)
{
    line = sR.ReadLine();

    if (line.Length != 0)
    {
        match = regex.Match(line);

        if (match.Success)
        {
            string name = match.Groups[0].Value;
        }
    }
}
// End of Other Method

 

The problem is that string name ends up equal to string line; the "match" doesn't seem to be doing anything. Unless I am mistaken, the string name should be "1552"!

Any clues?

 

To help debugging here are the contents of some variables:

Line = "1552 : Markdown Coupon"

Format = "[Code]:[Description]"

regExpression = "^(?<Code>.*):(?<Description>.*)$"

Name = "1552 : Markdown Coupon"

match.Groups.Count = 3 [but there should only be 2 groups, Code and Description]

Edited by Shaitan00
  • *Experts*
Posted

Look in the help, but this is by design: group 0 is not one of your named groups, it always holds the entire match. Your named groups start at index 1. Look at the help topic under Grouping Constructs.

 

Here's a snippet:

Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted. For instance, the pattern ((?<One>abc)\d+)?(?<Two>xyz)(.*) produces the following capturing groups by number and name. (The first capture (number 0) always refers to the entire pattern).
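To make the numbering concrete, here is a small sketch that walks the group names reported by Regex.GetGroupNames and skips group "0" (the entire match), collecting only the named fields. GroupDump and NamedValues are hypothetical helper names, and trimming the values is an added convenience rather than something the thread requires.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDump
{
    // Returns field name -> captured value, leaving out group 0.
    public static Dictionary<string, string> NamedValues(Regex regex, string line)
    {
        var result = new Dictionary<string, string>();
        Match match = regex.Match(line);
        if (!match.Success)
            return result; // empty when the line does not match
        foreach (string name in regex.GetGroupNames())
        {
            if (name == "0")
                continue; // group 0 is the whole match, not a field
            result[name] = match.Groups[name].Value.Trim();
        }
        return result;
    }

    public static void Main()
    {
        var regex = new Regex("^(?<Code>.*):(?<Description>.*)$");
        foreach (var pair in NamedValues(regex, "1552 : Markdown Coupon"))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

With the two-field pattern above, match.Groups.Count is 3 because the whole match is counted alongside Code and Description.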

 

-Nerseus
