New to RegEx...Need some help/advice.

BWolf · April 19, 2005

Howdy all,

Great forum here.

I need some help with regex.

I am trying to use regex to extract client names from text that I have extracted from scanned pages.

The zone text varies in its format, as shown in the examples below.

I need to extract the client name(s), both husband and wife if present.

FBT&T CUST FOR THE ROLLOVER IRA OF

JOHN Q PUBLIC

1234 MAIN ST

ANYTOWN, CA 55555-4444

JOHN Q PUBLIC

AND JANE M PUBLIC

JTTEN

1234 MAIN ST

ANYTOWN, CA 55555-4444

JOHN Q PUBLIC

1234 MAIN ST

ANYTOWN, CA 55555-4444

JOHN Q PUBLIC &

JANE M PUBLIC JTWROS

1234 MAIN ST

ANYTOWN, CA 55555-4444

I need to pull out JOHN Q PUBLIC and JANE M PUBLIC (if present) from these examples.

Can I use a single regex to extract this data from these variations?

Thanks,

Brian

HJB417 · April 19, 2005

Probably not, because the data is ambiguous.

e.g.:

1) If the address is 4 lines, line 1 is the company name and line 2 is the person's name, or vice versa.

2) If the address is 3 lines, line 1 can be either the company name or the person's name.

How can one distinguish a company name from one or two people's names? Once you have that worked out, you can move on to parsing and extracting the data using regex.

BWolf · April 20, 2005

Thanks HJB417.

That is the problem I'm having: too many variations in the data.

I think I have listed all the possibilities.

If these are the only variations, can I use regex to:

Extract all but the last two lines (these will always be the address and are not needed)

Then I will be left with only the client names and possibly some IRA reference. From here, can I use regex to remove the IRA reference if it exists, assuming it always ends with "IRA OF"?

Thanks again.
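
For what it's worth, the two steps described above (drop the last two lines, then strip anything up to "IRA OF") could be sketched roughly like this in Python. This is only a sketch under assumptions: the function name extract_names is made up for illustration, the address is assumed to always be exactly the last two non-blank lines, and JTTEN and JTWROS are assumed to be the only tenancy codes because they are the only ones in the samples. It uses a few small regexes rather than a single one, and it does not address the company-name ambiguity raised earlier.

```python
import re

# Assumption: JTTEN and JTWROS (from the samples) are the only tenancy codes.
TENANCY = re.compile(r"\b(?:JTTEN|JTWROS)\b")
# Everything from the start of the text through a custodian line ending in "IRA OF".
IRA_PREFIX = re.compile(r"^.*\bIRA OF\b\s*", re.DOTALL)
# Joiners between two names: a standalone AND, or an ampersand.
JOINER = re.compile(r"\s*(?:\bAND\b|&)\s*")


def extract_names(zone_text):
    """Return the client name(s) found in one zone of OCR text (hypothetical helper)."""
    lines = [ln.strip() for ln in zone_text.splitlines() if ln.strip()]
    # Assumption: the last two non-blank lines are always the address.
    body = " ".join(lines[:-2])
    # Remove the IRA reference first, so the "&" in "FBT&T" is gone before splitting.
    body = IRA_PREFIX.sub("", body)
    body = TENANCY.sub("", body)
    parts = JOINER.split(body)
    # Collapse repeated whitespace and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]


if __name__ == "__main__":
    sample = "JOHN Q PUBLIC &\nJANE M PUBLIC JTWROS\n1234 MAIN ST\nANYTOWN, CA 55555-4444"
    print(extract_names(sample))
```

Splitting on a standalone AND (with word boundaries) rather than the bare letters means a name such as ANDREW should not be cut in half, but any zone that contains a company name instead of a person would still come back as if it were a client name.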
