BWolf Posted April 19, 2005

Howdy all,

Great forum here. I need some help with regex. I am (trying to) use regex to extract client names from text that I have extracted from scanned pages. The zone text varies in its format, as shown in the examples below. I need to extract the client name(s), both husband and wife if present.

FBT&T CUST FOR THE ROLLOVER IRA OF JOHN Q PUBLIC 1234 MAIN ST ANYTOWN, CA 55555-4444
JOHN Q PUBLIC AND JANE M PUBLIC JTTEN 1234 MAIN ST ANYTOWN, CA 55555-4444
JOHN Q PUBLIC 1234 MAIN ST ANYTOWN, CA 55555-4444
JOHN Q PUBLIC & JANE M PUBLIC JTWROS 1234 MAIN ST ANYTOWN, CA 55555-4444

I need to pull out JOHN Q PUBLIC and JANE M PUBLIC (if present) from these examples. Can I use a single regex to extract this data from these variations?

Thanks,
Brian
HJB417 Posted April 19, 2005

Probably not, because the data is ambiguous. For example:

1) If the address is 4 lines, line 1 is the company name and line 2 is the person's name, or vice versa.
2) If the address is 3 lines, line 1 can be either the company name or the person's name.

How can one distinguish a company name from one or two people's names? Once you have that sorted out, you can move on to parsing and extracting the data using regex.
BWolf Posted April 20, 2005

Thanks HJB417. That is the problem I'm having: too many variations in the data. I think I have listed all the possibilities. If these are the only variations, can I use regex to extract all but the last two lines (these will always be the address and are not needed)? Then I will be left with only the client names and possibly some IRA reference. From there, can I use regex to remove the IRA reference, if it exists, given that it always ends with "IRA OF"?

Thanks again.
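For what it's worth, the two steps described above (drop the last two lines, then strip anything ending in "IRA OF") could be sketched roughly like this. The thread never names a language, so Python is used purely for illustration. The function name `extract_client_names` is made up, and the sketch assumes several things the thread does not confirm: each record arrives as separate lines, the last two lines are always the address, JTTEN and JTWROS are the only ownership suffixes, and joint holders are joined by "AND" or "&". It does not solve the company-versus-person ambiguity raised earlier.

```python
import re

# Assumption: an optional custodian prefix always ends with "IRA OF".
IRA_PREFIX = re.compile(r"^.*\bIRA OF\b\s*", re.IGNORECASE)
# Assumption: JTTEN and JTWROS are the only ownership suffixes that occur.
OWNERSHIP_SUFFIX = re.compile(r"\s+(?:JTTEN|JTWROS)\s*$", re.IGNORECASE)
# Assumption: joint holders are separated by "AND" or "&" with spaces around it.
NAME_SEPARATOR = re.compile(r"\s+(?:AND|&)\s+", re.IGNORECASE)


def extract_client_names(zone_text):
    """Return the client names found in one block of zone text (hypothetical helper)."""
    lines = [line.strip() for line in zone_text.splitlines() if line.strip()]
    # Step 1: drop the last two lines, which are assumed to be the address.
    name_part = " ".join(lines[:-2])
    # Step 2: remove everything up to and including "IRA OF", if present.
    name_part = IRA_PREFIX.sub("", name_part)
    # Step 3: remove a trailing ownership designation such as JTTEN.
    name_part = OWNERSHIP_SUFFIX.sub("", name_part)
    # Step 4: split joint holders into separate names.
    return [name.strip() for name in NAME_SEPARATOR.split(name_part) if name.strip()]


if __name__ == "__main__":
    # The line breaks in this sample are a guess at the original layout.
    sample = "JOHN Q PUBLIC & JANE M PUBLIC JTWROS\n1234 MAIN ST\nANYTOWN, CA 55555-4444"
    print(extract_client_names(sample))
```

So a single pattern is not really needed: a few small, ordered substitutions are easier to adjust when a new variation turns up. A name that itself contains a standalone "AND" would still be split incorrectly, so the output is worth spot-checking against the scans.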