Xtreme .Net Talk

Trimming a Text File with String Manipulation: How Do I Trim 93,000 Lines?



Posted

I'm making a program that will help me trim characters off the beginning of each line of text.

I need this because there are over 93,000 lines if I count them all. Fortunately there aren't 93,000 in just one text file; that's the total for a group of files.

Anyway, the text in each one looks like this:

1:1: Line of text
1:2: Line of text
1:3: Line of text
2:1: Line of text
2:2: Line of text
2:3: Line of text

The text is put into an array.

What I need is to trim it so that the results are as follows:

1:1 Line of text
  2 Line of text
  3 Line of text
2:1 Line of text
  2 Line of text
  3 Line of text
3:1 Line of text
  2 Line of text
  3 Line of text

NOTE: some of the lines of text themselves contain ":" characters. I thought that might be important.

Normally, to trim I might use itm.StartsWith("i:i")

and then itm.Substring(2) or something like that,

but I can't remember exactly how to tackle this now.

In this case it's different, since the ":" after the numbers needs to be taken out too.

Any thoughts appreciated.
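A minimal sketch of the per-line rule, assuming every line starts with `chapter:verse:` as in the sample above. Instead of splitting, it looks up only the first two colons with `String.IndexOf`, so any `:` inside the verse text is left alone. `FormatVerseLine` is a hypothetical helper name, and it produces the unindented form (`2 Line of text`) rather than the indented layout shown above.

```vbnet
Imports System

Module VerseTrim
    ' Hypothetical helper: "1:1: text" -> "1:1 text" (chapter start), "1:2: text" -> "2 text".
    ' Only the first two colons are examined, so colons inside the text survive.
    Function FormatVerseLine(line As String) As String
        Dim firstColon As Integer = line.IndexOf(":"c)
        If firstColon < 0 Then Return line ' no "chapter:verse:" prefix; leave it alone
        Dim secondColon As Integer = line.IndexOf(":"c, firstColon + 1)
        If secondColon < 0 Then Return line

        Dim chapter As String = line.Substring(0, firstColon)
        Dim verse As String = line.Substring(firstColon + 1, secondColon - firstColon - 1)
        Dim rest As String = line.Substring(secondColon + 1) ' keeps the leading space

        If verse = "1" Then
            Return chapter & ":" & verse & rest ' chapter start keeps "chapter:1"
        End If
        Return verse & rest                     ' other verses keep only the verse number
    End Function

    Sub Main()
        Dim sample() As String = {"1:1: Line of text", "1:2: He said: listen", "2:1: Line of text"}
        For Each line As String In sample
            Console.WriteLine(FormatVerseLine(line))
        Next
    End Sub
End Module
```

Run as-is, `Main` prints `1:1 Line of text`, `2 He said: listen` and `2:1 Line of text`.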

 

vbMarkO

Visual Basic 2008 Express Edition!
Posted

UPDATE:

 

OK, I threw some code together. It doesn't do exactly what I need, but I think it shows how, or where, I am going.

 

Dim myChap() As String

       For i As Integer = 0 To UBound(myArr)
           Dim itm As String
           For Each itm In myArr
               If itm.StartsWith(i & ":1:") Then
                   ' Do nothing to them
                   Dim strC As String
                   strC = strC & itm & vbCrLf
                   myChap = Split(strC, vbCrLf)
               End If
           Next

       Next
       chapText.Text = UBound(myChap)

       For i As Integer = 0 To UBound(myChap) - 1
           Dim strItm As String
           For Each strItm In myArr
               If strItm.StartsWith(i & ":") Then
                   ListBox1.Items.Add(strItm.Substring(2))
               End If
           Next
       Next

 

For the text file, this code turns this:

1:1: Line of Text

1:2: Line of Text

1:3: Line of Text

into this:

1: Line of Text

2: Line of Text

3: Line of Text

 

This won't work. I need to retain the chapter start, meaning this:

1:1 Line of Text ' Note: I did, however, remove the ":" after the 1

2 Line of Text

3 Line of Text

2:1 Line of Text

2 Line of Text

3 Line of Text

Help! How do I get this done?

 

vbMarkO

Posted
You need to increment integer variable "i" by 1 if you expect this line to work:

 

If itm.StartsWith(i & ":1:") Then

 

The first time your code executes you are checking for "0:1:".

 

I recommend removing the first for loop and adding a counter for the starting integers in the file.

 

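Dim itm As String
Dim i As Integer = 1        ' number of the next chapter to look for
Dim strC As String = ""     ' collects the chapter-start lines

For Each itm In myArr
    If itm.StartsWith(i & ":1:") Then
        ' Do nothing to them, just collect the chapter start
        strC = strC & itm & vbCrLf
        i += 1              ' move on to the next chapter only after finding this one
    End If
Next

myChap = Split(strC, vbCrLf)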

Posted

hmmm,

 

I will take that into consideration and even test it to see the difference.

 

However, that first loop works; it gives me an accurate count of the chapters every time!

It's the second loop I can't figure out: how to remove this

1:1: Line of Text <--- This line I need like this: 1:1 Line of Text <-- NOTE the ":" removed from the end of the chapter start

1:2: Line of Text <---- This line I need like this: 2 Line of Text <--- NOTE "1:" and ":" are removed

The result should be this:

1:1 Line of Text <------ chapter one, verse one

2 Line of Text <----- verse 2 of chapter one, etc., etc.

3 Line of Text

2:1 Line of Text

2 Line of Text

3 Line of Text

I have over 93,000 lines like this.

ALSO, it might be important to NOTE that ":" characters can be found in the lines of text, so whatever code is used must not affect them, only the chapter starts and verses.

I hope someone has an idea how I might do this; the idea of having to go through 93,000 lines one by one really doesn't thrill me :(

 

vbMarkO

Posted (edited)
Your first block of code may work fine but it isn't as efficient as it could be.

 

As for your second block of code, detect the first line of the chapter, where the verse is 1. Treat the lines that follow as verses in that chapter, removing the chapter number and the ":" symbols from them, until the chapter changes. I would do all of this, including the first code block, in one for loop.

This assumes you are reading the file with the chapters grouped together. If they are not in order, you may want to create a dataset with the 93k rows so you can sort them in ascending chapter order.
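One way to read that advice as code, sketched under the same assumption the reply makes (lines arrive grouped by chapter): a single loop remembers the current chapter and writes the `chapter:verse` prefix only when the chapter number changes. It keys on the chapter changing rather than on verse 1, which gives the same result for the sample data, and it collects the output in a `StringBuilder` instead of `&=`. `ReformatChapters` is a hypothetical name.

```vbnet
Imports System
Imports System.Text

Module OnePass
    ' Hypothetical helper: one loop over all lines. The "chapter:verse" prefix is written
    ' only when the chapter number changes; every other line keeps just its verse number.
    Function ReformatChapters(lines() As String) As String
        Dim sb As New StringBuilder()
        Dim currentChapter As String = Nothing ' chapter of the previous line
        For Each line As String In lines
            Dim c1 As Integer = line.IndexOf(":"c)
            Dim c2 As Integer = If(c1 < 0, -1, line.IndexOf(":"c, c1 + 1))
            If c2 < 0 Then
                sb.AppendLine(line) ' no "chapter:verse:" prefix; copy as-is
                Continue For
            End If
            Dim chapter As String = line.Substring(0, c1)
            Dim verse As String = line.Substring(c1 + 1, c2 - c1 - 1)
            Dim rest As String = line.Substring(c2 + 1) ' text, colons included
            If chapter <> currentChapter Then
                sb.AppendLine(chapter & ":" & verse & rest) ' first line of a new chapter
                currentChapter = chapter
            Else
                sb.AppendLine(verse & rest)                 ' same chapter: verse number only
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim lines() As String = {"1:1: Line of text", "1:2: Line of text", "2:1: Line of text", "2:2: Line of text"}
        Console.Write(ReformatChapters(lines))
    End Sub
End Module
```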

Edited by JTDPublix
Posted

A couple of thoughts...

You mention my first block of code is not as efficient as yours, yet they both accomplish the same thing. So my question is this (keeping in mind I'm not challenging what you are saying; I am simply trying to understand): why?

So, why is yours more efficient and mine less efficient?

I certainly want to do it as efficiently as possible!!!

OK, about the rest of what you said..... uh, I did mention I am new at this, didn't I? LOL

I have no idea what you meant by that.... I am not good at theory or verbal explanations of code; I learn by example..... it helps me to see the logic.... Perhaps one day I will be able to do the other too... I hope so!!!

Anyway, could you post an example of what you mean?

Also, I might be misunderstanding you, or my original code example misled you.

But what I need my second block of code to do is strip away, or trim off, parts of the beginning of each verse.

1:1: Line of Text ' /// On chapter starts I need them to go from this to 1:1 Line of Text. NOTICE the ":" removed from the end

Then, on all verses following each chapter start, I need to go from this to this, as below:

1:2: Line of Text to 2 Line of Text

Now, why do I need this? I have text files that together total over 93,000 lines of text.

Instead of recreating the entire text files in this new format, I thought I would take what I already have and build a tool to automatically strip out the parts I don't need, then load the newly trimmed 93,000 lines of text into an RTB, which I will then save as a new file.
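A rough sketch of such a tool that skips the RTB and writes the converted files directly, assuming the source files sit in one folder as `*.txt` with one verse per line and are UTF-8 or plain ASCII (`File.ReadAllLines` and `File.WriteAllLines` also have overloads that take an `Encoding` if they are not). `ConvertFolder`, the folder names and the file pattern are placeholders. Working line by line also avoids building one huge string with `&=`, which copies the whole string again on every line.

```vbnet
Imports System
Imports System.IO

Module BatchTrim
    ' Per-line rule from the thread: "1:1: text" -> "1:1 text", "1:2: text" -> "2 text".
    Function FormatVerseLine(line As String) As String
        Dim c1 As Integer = line.IndexOf(":"c)
        Dim c2 As Integer = If(c1 < 0, -1, line.IndexOf(":"c, c1 + 1))
        If c2 < 0 Then Return line ' not a "chapter:verse:" line
        Dim verse As String = line.Substring(c1 + 1, c2 - c1 - 1)
        Dim rest As String = line.Substring(c2 + 1)
        If verse = "1" Then Return line.Substring(0, c2) & rest ' keep "chapter:1"
        Return verse & rest
    End Function

    ' Hypothetical helper: converts every *.txt file in inputDir and writes it
    ' under the same name in outputDir. Folder names are placeholders.
    Sub ConvertFolder(inputDir As String, outputDir As String)
        Directory.CreateDirectory(outputDir)
        For Each inFile As String In Directory.GetFiles(inputDir, "*.txt")
            Dim lines() As String = File.ReadAllLines(inFile)
            For k As Integer = 0 To lines.Length - 1
                lines(k) = FormatVerseLine(lines(k))
            Next
            File.WriteAllLines(Path.Combine(outputDir, Path.GetFileName(inFile)), lines)
        Next
    End Sub

    Sub Main()
        ' Demo with a throwaway folder under the temp directory.
        Dim root As String = Path.Combine(Path.GetTempPath(), "verse_trim_demo")
        Dim src As String = Path.Combine(root, "in")
        Dim dest As String = Path.Combine(root, "out")
        Directory.CreateDirectory(src)
        File.WriteAllLines(Path.Combine(src, "book.txt"), New String() {"1:1: Line of text", "1:2: Line of text"})
        ConvertFolder(src, dest)
        Console.Write(File.ReadAllText(Path.Combine(dest, "book.txt")))
    End Sub
End Module
```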

 

vbMarKO

Posted
As for why: if two blocks of code do the same thing, but the first one uses 1 for loop and the second uses 2 nested for loops, which do you think is more efficient?
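To put numbers on that: the first block wraps `For Each itm In myArr` inside `For i As Integer = 0 To UBound(myArr)`, so a file of n lines costs n × n `StartsWith` calls, while one loop costs n. The counting functions below are hypothetical, written only to make the difference visible; for a 1,000-line file they report 1,000,000 versus 1,000.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LoopCost
    ' Same shape as the first block in the thread: an outer loop over 0 To UBound(lines)
    ' wrapped around a full scan of the lines, so n lines cost n * n StartsWith checks.
    Function NestedChecks(lines() As String) As Long
        Dim checks As Long = 0
        For i As Integer = 0 To UBound(lines)
            For Each itm As String In lines
                checks += 1
                If itm.StartsWith(i & ":1:") Then
                    ' (the original code collected the chapter start here)
                End If
            Next
        Next
        Return checks
    End Function

    ' Single pass: every line is examined once, and chapter starts are still found.
    Function SinglePassChecks(lines() As String) As Long
        Dim checks As Long = 0
        For Each itm As String In lines
            checks += 1
            Dim c1 As Integer = itm.IndexOf(":"c)
            If c1 >= 0 AndAlso itm.Substring(c1 + 1).StartsWith("1:") Then
                ' chapter start found in the same pass
            End If
        Next
        Return checks
    End Function

    Sub Main()
        Dim lines(999) As String ' a 1,000-line file
        For k As Integer = 0 To 999
            lines(k) = "1:" & (k + 1) & ": Line of text"
        Next
        Console.WriteLine(NestedChecks(lines))     ' 1000000
        Console.WriteLine(SinglePassChecks(lines)) ' 1000
    End Sub
End Module
```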

 

I understand what you need to do. Your file looks like this:

 

1:1: Line of Text

1:2: Line of Text

2:1: Line of Text

 

Here is how you can approach this:

 

Dim lineParts() As String
Dim newFile As String = ""

For Each arrLine As String In arrLines
    'Break the line up into 3 pieces
    lineParts = Split(arrLine, ":")

    'If the second piece is a 1
    If lineParts(1) = "1" Then 'We found the start of the chapter
        arrLine = lineParts(0) & ":" & lineParts(1) & lineParts(2)
    Else 'This line must be a verse in the chapter
        arrLine = lineParts(1) & lineParts(2)
    End If

    newFile &= arrLine & vbNewLine
Next

 

Please note that I did not test this at all. It is just intended to help you get going in the right direction.

Posted

Thank you, it does give me an idea or two.

I haven't tested it yet either, but I will as soon as I am done...

My first concern, looking at this right off, is using ":" to split it.

The reason is that the lines of text are scripture, and scripture uses ":" often, so wouldn't that pose a problem?

I will give this a practice run and see what comes out of it.
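The concern is valid: `Split(arrLine, ":")` cuts at every colon, so any text after a colon inside the verse ends up in `lineParts(3)` and beyond and is never copied back. A minimal fix, assuming each line starts with `chapter:verse:`, is to pass the optional `Limit` argument of VB's `Split` so it stops after three pieces and leaves the rest of the text, colons included, in `lineParts(2)`. `TrimLine` is a hypothetical wrapper around the loop body from the reply.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LimitedSplit
    ' Hypothetical wrapper around the loop body from the reply.
    Function TrimLine(arrLine As String) As String
        ' Limit = 3: at most three pieces, so colons inside the text stay in lineParts(2)
        Dim lineParts() As String = Split(arrLine, ":", 3)
        If lineParts.Length < 3 Then Return arrLine ' no "chapter:verse:" prefix
        If lineParts(1) = "1" Then
            Return lineParts(0) & ":" & lineParts(1) & lineParts(2) ' chapter start
        Else
            Return lineParts(1) & lineParts(2) ' verse within the chapter
        End If
    End Function

    Sub Main()
        Console.WriteLine(TrimLine("1:1: Line of Text"))       ' 1:1 Line of Text
        Console.WriteLine(TrimLine("1:2: He said: hear this")) ' 2 He said: hear this
    End Sub
End Module
```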

vbMarkO

Posted

OK, I ran your code and I have to say WOW.

It worked, based on how I tested it below... code first, then I will discuss the results and my idea.

 

       Dim lineParts() As String
       Dim arrLines() As String
       Dim strText As String
       Dim newFile As String

       strText = "1:1: Line of Text" & vbCrLf & "1:2: Line of Text" & vbCrLf & "1:3: Line of Text" & vbCrLf & _
       "2:1: Line of Text" & vbCrLf & "2:2: Line of Text" & vbCrLf & "2:3: Line of Text"
       arrLines = Split(strText, vbCrLf)
       
       For Each arrLine As String In arrLines
           'Break the line up into 3 pieces
           lineParts = Split(arrLine, ":")

           'If the second piece contains a 1
           If lineParts(1) = "1" Then 'We found the start of the chapter
               arrLine = lineParts(0) & ":" & lineParts(1) & lineParts(2)
           Else 'This line must be a verse in the chapter
               arrLine = lineParts(1) & lineParts(2)
           End If

           newFile &= arrLine & vbNewLine
       Next
       TextBox1.Text = newFile

 

OK, based on the string I used above, your code works perfectly :)

However, I added a ":" to the lines of text and the results were what I suspected they might be... missing lines of text....

You might know a better way around this than I do, but I do have an idea, one you may feel is not needed since you may know a way to simplify this.

But still, here is my idea:

Create 2 more arrays:

 

Dim Arr1() As string

Dim Arr2() As String

 

' Step One

For Each itm As String in arrLines

Arr1 = ' itm.Substring ' I know this isn't right; the idea, though, is to load only the line of text, not the numbers (i.e. 1:1: or 1:2: etc.), into this array

Next

' STEP TWO

For Each itm As String in arrLines

Arr2 = itm.Remove (substring) ' Again, I know this isn't right code-wise, but the idea here is to remove the line of text, leaving only the numbers

' as you can see, I'm not sure how to do either of these

Next

' STEP THREE

Run your code, only modified (I think) so that it deals with only 2 parts, not 3, RIGHT? Anyway, it will only be dealing with the numbers left by step two.

Then, when it concatenates the strings back together, it would combine

the newly stripped numbers of step three with the elements of Arr1().

OK, is the idea sound?

Even if not, I know the idea is on track, because we need to keep your code from splitting the lines of text up because of the ":" in them.
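A minimal sketch of steps one to three, assuming every line starts with `chapter:verse:`: the second colon marks where the numbers end, so `Substring` can split each line into numbers (`Arr2`) and text (`Arr1`) without touching colons in the text, and step three then only has to split the short `chapter:verse` string into 2 parts. `Arr1` and `Arr2` are the names from the post; `SplitNumbersAndText` and `Recombine` are hypothetical.

```vbnet
Imports System

Module TwoArrays
    ' Hypothetical steps one and two: Arr1 receives the text, Arr2 the "chapter:verse" numbers.
    ' Assumes every line starts with "chapter:verse:"; the second colon marks the cut.
    Sub SplitNumbersAndText(arrLines() As String, ByRef Arr1() As String, ByRef Arr2() As String)
        ReDim Arr1(arrLines.Length - 1)
        ReDim Arr2(arrLines.Length - 1)
        For k As Integer = 0 To arrLines.Length - 1
            Dim line As String = arrLines(k)
            Dim cut As Integer = line.IndexOf(":"c, line.IndexOf(":"c) + 1)
            Arr2(k) = line.Substring(0, cut)  ' e.g. "1:2"
            Arr1(k) = line.Substring(cut + 1) ' e.g. " Line of text" (its colons are untouched)
        Next
    End Sub

    ' Hypothetical step three: only the numbers get split, so there are just 2 parts.
    Function Recombine(Arr1() As String, Arr2() As String) As String()
        Dim result(Arr1.Length - 1) As String
        For k As Integer = 0 To Arr1.Length - 1
            Dim nums() As String = Arr2(k).Split(":"c) ' nums(0) = chapter, nums(1) = verse
            If nums(1) = "1" Then
                result(k) = nums(0) & ":" & nums(1) & Arr1(k) ' chapter start
            Else
                result(k) = nums(1) & Arr1(k)                  ' verse only
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim arrLines() As String = {"1:1: Line of text", "1:2: He said: listen", "2:1: Line of text"}
        Dim Arr1() As String = Nothing
        Dim Arr2() As String = Nothing
        SplitNumbersAndText(arrLines, Arr1, Arr2)
        For Each line As String In Recombine(Arr1, Arr2)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```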

 

Any thoughts on this appreciated.

In fact, I appreciate that you were able to get me to think outside the box a little.

 

vbMarkO

Posted

I'm glad that bit of code worked for you.

 

You seem to be on the right track. This is an untested, modified version of the code I posted previously:

 

Dim lineParts() As String
Dim firstPieceParts() As String
Dim newFile As String = ""

For Each arrLine As String In arrLines
    'Break the line up into 2 pieces (numbers and text).
    'The limit of 2 keeps the whole text, spaces included, in lineParts(1).
    lineParts = Split(arrLine, " ", 2)

    'We don't care about the second piece lineParts(1) because it is all text.
    'We need to focus on parsing the first piece that contains the numbers.
    'Split the first piece on ":" ("1:2:" gives "1", "2" and "")
    firstPieceParts = Split(lineParts(0), ":")
    If firstPieceParts(1) = "1" Then 'We found the start of the chapter
        arrLine = firstPieceParts(0) & ":" & firstPieceParts(1) & " " & lineParts(1)
    Else 'This line must be a verse in the chapter: keep only the verse number
        arrLine = firstPieceParts(1) & " " & lineParts(1)
    End If
    newFile &= arrLine & vbNewLine
Next

 

Try to stay away from creating extra work for yourself. Think it through in your head before coding, and keep efficiency in mind.

 

Let us know how you make out.

Posted

Thanks,

I got it working.... sorry, I didn't use your new code; I hadn't seen it yet...

Efficient? Uh, LOL, well it works, but it's big LOL

It's just like I described, except I had to add a routine to it.

What I added was to change out the ":" after the numbers, like

1:1:

1:2:

I replaced it with this

1:1>

1:2>

I then used the ">" as a delimiter. I chose this because the other code was having trouble with the actual text I was using, since the actual lines of text also contain ":", and it was messing up... I am sure you know a workaround, but I couldn't think of one, so I replaced it.

Then I split each line into 2 parts, the numbers and the line of text (which is Bible scripture), and placed the lines of text into an array for later use.
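A sketch of the `>` trick, assuming the numbers always end at the second colon on the line. The point is to replace only that one character (a plain `Replace(line, ":", ">")` would also change the colons inside the scripture text) and then split on `>` with a limit of 2, so a stray `>` in the text would not cause trouble either. `MarkPrefix` is an illustrative name, not code from the post.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module MarkDelimiter
    ' Illustrative helper: "1:2: text" -> "1:2> text".
    ' Only the colon that ends the numbers is replaced; colons in the text stay.
    Function MarkPrefix(line As String) As String
        Dim cut As Integer = line.IndexOf(":"c, line.IndexOf(":"c) + 1)
        If cut < 0 Then Return line ' fewer than two colons; leave it alone
        Return line.Substring(0, cut) & ">" & line.Substring(cut + 1)
    End Function

    Sub Main()
        Dim marked As String = MarkPrefix("1:2: He said: hear")
        Dim parts() As String = Split(marked, ">", 2) ' parts(0) = "1:2", parts(1) = " He said: hear"
        Console.WriteLine(marked)
        Console.WriteLine(parts(0) & "|" & parts(1))
    End Sub
End Module
```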

 

Then I put the numbers in a 2nd array, where I first removed the :>: from the end, then ran the remaining results, which were

1:1

1:2

1:3

2:1

2:2

2:3

Taking this through your code then produced this result:

1:1 Line of text 1

2 Line of Text 2

3 Line of Text 3

2:1 Line of Text 4

 

2 Line of Text 5

3 Line of Text 6

I then added one more aspect: since I was displaying this in an RTB, I formatted it.

Result:

1:1 Line of Text 1
 2 Line of Text 2
 3 Line of Text 3
2:1 Line of Text 4
 2 Line of Text 5
 3 Line of Text 6
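One way to get that kind of layout, sketched under the assumption that the lines are already in the `1:1 text` / `2 text` form: pad each verse number on the left with `PadLeft` to the width of the current chapter prefix, so `2` lines up under the last digit of `1:1` (the `  2` layout from the first post). `AlignVerses` is a hypothetical helper, and the alignment only holds when the RTB uses a fixed-width font.

```vbnet
Imports System

Module VerseLayout
    ' Hypothetical helper: pads each bare verse number to the width of the most recent
    ' "chapter:verse" prefix. Assumes lines look like "1:1 text" or "2 text".
    Function AlignVerses(lines() As String) As String()
        Dim result(lines.Length - 1) As String
        Dim width As Integer = 0 ' length of the most recent "chapter:verse" prefix
        For k As Integer = 0 To lines.Length - 1
            Dim space As Integer = lines(k).IndexOf(" "c)
            If space < 0 Then
                result(k) = lines(k) ' no number/text separator; copy as-is
                Continue For
            End If
            Dim number As String = lines(k).Substring(0, space)
            If number.Contains(":") Then
                width = number.Length ' chapter start such as "1:1" or "10:1"
                result(k) = lines(k)
            Else
                result(k) = number.PadLeft(width) & lines(k).Substring(space)
            End If
        Next
        Return result
    End Function

    Sub Main()
        For Each line As String In AlignVerses(New String() {"1:1 Line of text", "2 Line of text", "10:1 Line of text", "2 Line of text"})
            Console.WriteLine(line)
        Next
    End Sub
End Module
```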

Of course, I have tested it on the actual scripture text files and it works great.

Now if I could just figure out one more thing, I'd be good to go.

I am trying to figure out how to select all the numbers in the text and highlight them, so I can change the SelectionColor of the selected text to blue.

I can change any line I want, but I haven't figured out how to select only the numbers and change them all at once.

Any ideas?

Thank you so much for your help in pointing me in the right direction; it really helped a lot.

 

vbMarkO

Posted
Check out the Char.IsDigit method in .NET (Char.IsNumber is a similar, broader check):

 

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemcharclassisdigittopic.asp
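A sketch of how `Char.IsDigit` could find the numbers to color, assuming the reformatted text has the chapter/verse number at the start of each line (after any leading spaces) and that the RTB's `Text` separates lines with a single line feed. `FindLeadingNumbers` is a hypothetical helper that returns (start, length) pairs rather than touching the control itself, so it can be tried outside a form.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module NumberFinder
    ' Hypothetical helper: returns (start, length) for the "1:1" or "2" at the start of each line.
    Function FindLeadingNumbers(text As String) As List(Of KeyValuePair(Of Integer, Integer))
        Dim spans As New List(Of KeyValuePair(Of Integer, Integer))
        Dim pos As Integer = 0
        While pos < text.Length
            Dim lineEnd As Integer = text.IndexOf(ControlChars.Lf, pos)
            If lineEnd < 0 Then lineEnd = text.Length
            Dim i As Integer = pos
            While i < lineEnd AndAlso text(i) = " "c ' skip indentation
                i += 1
            End While
            Dim start As Integer = i
            ' digits, with ":" allowed after the first character (for "chapter:verse")
            While i < lineEnd AndAlso (Char.IsDigit(text(i)) OrElse (text(i) = ":"c AndAlso i > start))
                i += 1
            End While
            If i > start Then spans.Add(New KeyValuePair(Of Integer, Integer)(start, i - start))
            pos = lineEnd + 1
        End While
        Return spans
    End Function

    Sub Main()
        Dim sample As String = "1:1 Line of text" & ControlChars.Lf & " 2 Line of text" & ControlChars.Lf & "2:1 Line of text"
        For Each span As KeyValuePair(Of Integer, Integer) In FindLeadingNumbers(sample)
            Console.WriteLine(sample.Substring(span.Key, span.Value))
        Next
    End Sub
End Module
```

In the form, each pair would then be applied with `RichTextBox1.Select(span.Key, span.Value)` followed by `RichTextBox1.SelectionColor = Color.Blue`, where `RichTextBox1` stands in for whatever the RTB is actually called.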

Posted

UPDATE

 

Really coming along with this.... I trashed most of the code I was using, as I figured out a more efficient way.

I'll post the code tomorrow when I am finished.

Thanks for all the input; I am really learning a lot here.

 

vbMarkO
