New to RegEx...Need some help/advice.

BWolf

Newcomer
Joined
Apr 19, 2005
Messages
2
Howdy all,

Great forum here.

I need some help with regex.

I am trying to use regex to extract client names from text that I have extracted from scanned pages.

The zone text varies in its format, as shown in the examples below.

I need to extract the client name(s), both husband and wife if present.



FBT&T CUST FOR THE ROLLOVER IRA OF
JOHN Q PUBLIC
1234 MAIN ST
ANYTOWN, CA 55555-4444


JOHN Q PUBLIC
AND JANE M PUBLIC
JTTEN
1234 MAIN ST
ANYTOWN, CA 55555-4444


JOHN Q PUBLIC
1234 MAIN ST
ANYTOWN, CA 55555-4444


JOHN Q PUBLIC &
JANE M PUBLIC JTWROS
1234 MAIN ST
ANYTOWN, CA 55555-4444


I need to pull out JOHN Q PUBLIC and JANE M PUBLIC (if present) from these examples.

Can I use a single regex to extract this data from these variations?

Thanks,
Brian
 
Probably not, because the data is ambiguous, e.g.:
1) If the address is 4 lines, line 1 is the company name and line 2 is the person's name, or vice versa.
2) If the address is 3 lines, line 1 can be either the company name or the person's name.

How can one distinguish a company name from one or two people's names? Once you get that done, you can move on to parsing and extracting the data using regex.
 
Thanks HJB417.

That is the problem I'm having: too many variations in the data.

I think I have listed all the possibilities.

If these are the only variations, can I use regex to:

Extract all but the last two lines (these will always be the address and are not needed)

Then I will be left with only the client names and possibly some IRA reference. From here, can I use regex to remove the IRA reference if it exists, given that it always ends with "IRA OF"?

Thanks again.
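
For what it's worth, here is a rough sketch of the two-step idea described above, written in Python purely for illustration (the thread does not say which language is in use). It assumes the four sample layouts are the only variations, that the last two lines are always the address, that any custodian text ends with "IRA OF", and that JTTEN and JTWROS are the only ownership tags that need stripping. The function name extract_names and the pattern names are made up for this example.

```python
import re

# Assumption: custodian text, when present, ends with "IRA OF".
IRA_PREFIX = re.compile(r"^.*?\bIRA OF\b\s*", re.DOTALL)
# Assumption: these are the only ownership tags in the data.
SUFFIXES = re.compile(r"\b(?:JTTEN|JTWROS)\b")
# Two names are joined by the word AND or by an ampersand.
JOINER = re.compile(r"\s*(?:\bAND\b|&)\s*")


def extract_names(block):
    """Return the client name(s) found in one address block."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    # Step 1: drop the last two lines (street and city/state/ZIP).
    text = " ".join(lines[:-2])
    # Step 2: remove the IRA reference first, so the "&" in a
    # custodian name such as FBT&T is gone before splitting.
    text = IRA_PREFIX.sub("", text)
    text = SUFFIXES.sub("", text)
    # Step 3: split on the joiner and tidy up whitespace.
    return [" ".join(p.split()) for p in JOINER.split(text) if p.strip()]
```

Calling extract_names on the second sample (the one with "AND JANE M PUBLIC" and "JTTEN") gives a list with both names. This does nothing about the company-name ambiguity raised earlier in the thread; it only works if the listed variations really are exhaustive.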
 