Get numbers out of .txt file

Lanc1988

I have a program that saves the source code of a webpage to a .txt file when you click a button.

There are about 20 different numbers in that .txt file that I need it to read and put in labels. Here is part of what is saved in the .txt file:
HTML:
<tr><td></td><td align="left">
<a href="overall.ws?table=0&user=lanc1988">Overall</a>
</td> <td align="right">154,416</td> <td align="right">1107</td> 
<td align="right">5,465,821</td> </tr><tr><td align="right">
<img class="miniimg" src="http://www.runescape.com/img/hiscores/skill_icon_attack.gif"></td>
<td align="left"><a href="overall.ws?table=1&user=lanc1988">Attack</a></td> 
<td align="right">225,387</td> <td align="right">66</td> 
<td align="right">532,780</td> </tr>

The only things that change each time in the above code are "lanc1988" and the numbers. I need to figure out how to get just the numbers out of there and put each one in a label. I have been trying to figure this out for a long time and I can't get it.
 
How would I tell it to get the value between:

<a href="overall.ws?table=0&user=lanc1988">Overall</a></td> <td align="right">

and

</td> <td align="right">
 
I have tried that in the past, but I don't understand how to use them. I need to see an example of them being used to get something that is between two things. If you know where I can find one, or have time to show me a quick example, it would be very helpful.
 
Without testing it, I would say you could do a match with the following:

>([0-9]|,|\.)+<


That will at least get you a bunch of candidate strings. You could use a more detailed regex to filter out the false matches.

Check out The Regulator: http://regex.osherove.com/

It is a great tool for testing regular expressions
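
To make the suggestion concrete, here is a minimal sketch of that match using System.Text.RegularExpressions. It is only an illustration: the class name is made up, the input is a fragment copied from the question rather than the real file, and the pattern is a slightly tightened variant of the one above (a character class plus a capture group, so the surrounding > and < are not part of the captured text).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class RegexCandidates // hypothetical name, for illustration only
{
    static void Main()
    {
        // Fragment copied from the question; in the real program this would
        // be the contents of the saved .txt file.
        string html =
            "<a href=\"overall.ws?table=1&user=lanc1988\">Attack</a></td> " +
            "<td align=\"right\">225,387</td> <td align=\"right\">66</td> " +
            "<td align=\"right\">532,780</td> </tr>";

        // Same idea as >([0-9]|,|\.)+< but the group captures the whole number.
        Regex candidates = new Regex(@">([0-9,.]+)<");

        foreach (Match m in candidates.Matches(html))
        {
            string text = m.Groups[1].Value;
            int value;
            // Invariant culture so ',' is always treated as the thousands separator;
            // candidates that are not whole numbers are simply skipped.
            if (int.TryParse(text, NumberStyles.AllowThousands,
                             CultureInfo.InvariantCulture, out value))
            {
                Console.WriteLine("{0} -> {1}", text, value);
            }
        }
    }
}
```

On its own this only tells you which strings look like numbers, not which skill each one belongs to, so it is a starting point rather than a full answer.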
 
It 'might' be a valid candidate for some sort of xml parsing using either the XmlTextReader or the XML DOM. I suspect this would give you slightly more control than Regular Expressions (assuming you know exactly which nodes you need the numbers from).
 
Cags said:
It 'might' be a valid candidate for some sort of xml parsing using either the XmlTextReader or the XML DOM. I suspect this would give you slightly more control than Regular Expressions (assuming you know exactly which nodes you need the numbers from).
This is assuming that the HTML is in the form of well-formed XML. I think that RegEx might be the way to go. You should be able to identify tags or content between tags. What you need to take into consideration is what may vary between instances of such a document. Can the table change size, or is the layout always constant?

If it is always constant, you can find the beginning of the table, use regex to identify content between tags, and count in the appropriate number of tags to find the data you want. On the other hand, if the table can vary in layout your approach would need to be a little more dynamic.

And then, on the other hand, the table itself at least appears to be well-formed XML, and you may be able to extract that and use some XML classes to make life easier.
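
For the constant-layout case, the "find the row, then count cells" idea could be sketched roughly like this. It is a sketch under assumptions, not a tested solution: CellsAfter and RowLookup are made-up names, the input is the fragment from the question, and it assumes every number sits in a <td align="right"> cell in the same row as the skill's link.

```csharp
using System;
using System.Text.RegularExpressions;

class RowLookup // hypothetical name, for illustration only
{
    // Returns the text of the right-aligned cells that follow the link
    // labelled skillName, or an empty array if that row is not found.
    static string[] CellsAfter(string html, string skillName)
    {
        // Locate the row by its link text, e.g. ">Attack</a>".
        int start = html.IndexOf(">" + skillName + "</a>", StringComparison.Ordinal);
        if (start < 0)
            return new string[0];

        // Stop at the end of that row so cells from later rows are not counted.
        int end = html.IndexOf("</tr>", start, StringComparison.Ordinal);
        string row = end < 0 ? html.Substring(start) : html.Substring(start, end - start);

        MatchCollection cells = Regex.Matches(row, "<td align=\"right\">([^<]*)</td>");
        string[] result = new string[cells.Count];
        for (int i = 0; i < cells.Count; i++)
            result[i] = cells[i].Groups[1].Value.Trim();
        return result;
    }

    static void Main()
    {
        // The fragment posted in the question.
        string html =
            "<tr><td></td><td align=\"left\">" +
            "<a href=\"overall.ws?table=0&user=lanc1988\">Overall</a>" +
            "</td> <td align=\"right\">154,416</td> <td align=\"right\">1107</td> " +
            "<td align=\"right\">5,465,821</td> </tr><tr><td align=\"right\">" +
            "<img class=\"miniimg\" src=\"http://www.runescape.com/img/hiscores/skill_icon_attack.gif\"></td>" +
            "<td align=\"left\"><a href=\"overall.ws?table=1&user=lanc1988\">Attack</a></td> " +
            "<td align=\"right\">225,387</td> <td align=\"right\">66</td> " +
            "<td align=\"right\">532,780</td> </tr>";

        string[] attack = CellsAfter(html, "Attack");
        Console.WriteLine(string.Join(" | ", attack));
    }
}
```

In a form, each element could then be assigned to a label's Text property, for example attack[1] to whichever label shows the second number (the label names are up to you).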
 
Joe Mamma said:
which framework are you using?
I am going to assume that you have a way of getting the HTML for the table that contains the rows and that you know what row to start on.

For example, you can parse the text, get the following into a string variable, and pass that to a function:
<table>
. . . row definitions
</table>

give me a moment
 
Oh yeah, I will assume that one column is the title column and that every column after it contains a number. If a number cannot be parsed, I will store null for it (the values are nullable doubles).
Assumes .NET 2.0
 
Assumptions:
  1. column 0 is not needed.
  2. column 1 contains an anchor whose inner text is unique within the table
OK, given you have a method like this that returns the HTML table markup you want to parse:
[csharp]string getTableText()
{
    // Enter code here to get the text for the HTML table,
    // of the form <table><tr><td>...</td></tr></table>
    throw new System.NotImplementedException(); // placeholder so the stub compiles
}[/csharp]

Define this set of classes:
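[csharp]using System;
using System.Collections.Generic;

namespace HtmlTableParse
{
    // Maps a zero-based data-column index to its parsed value (null if not numeric).
    class ColumnValues : Dictionary<int, double?>
    {
        internal void Parse(int index, mshtml.HTMLTableCell cell)
        {
            double d;
            // Double.TryParse allows thousands separators, so "154,416" parses
            // when the current culture uses ',' as the group separator.
            if (Double.TryParse(cell.innerText, out d))
                this.Add(index, d);
            else
                this.Add(index, null);
        }
    }

    // Maps each row title (the anchor text in the title column) to that row's values.
    class RowColumns : Dictionary<string, ColumnValues>
    {
        public static RowColumns Parse(mshtml.HTMLTable table, int titleColumn)
        {
            RowColumns result = new RowColumns();
            foreach (mshtml.HTMLTableRow row in table.rows)
            {
                string title = null;
                foreach (mshtml.HTMLTableCell cell in row.cells)
                {
                    if (cell.cellIndex < titleColumn)
                        continue; // columns before the title column are not needed
                    if (cell.cellIndex == titleColumn)
                    {
                        mshtml.HTMLAnchorElement anchor = cell.firstChild as mshtml.HTMLAnchorElement;
                        if (anchor == null)
                            break; // no title anchor in this row: skip it
                        title = anchor.innerText;
                        result.Add(title, new ColumnValues());
                    }
                    else
                        result[title].Parse(cell.cellIndex - titleColumn - 1, cell);
                }
            }
            return result;
        }
    }
}[/csharp]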

Add references to the mshtml COM library and to System.Windows.Forms, then:
  1. Create an instance of a WebBrowser control.
  2. Navigate to "about:blank".
  3. Build a body and insert your table text.
  4. Extract the table element object.
  5. Run it through the HtmlTableParse.RowColumns.Parse method.
[csharp] // instance a WebBrowser control (it requires an STA thread).
using (System.Windows.Forms.WebBrowser webBrowser1 = new System.Windows.Forms.WebBrowser())
{
    // Navigate to "about:blank"
    webBrowser1.Navigate("about:blank");
    mshtml.IHTMLDocument2 currentDoc = (mshtml.IHTMLDocument2) webBrowser1.Document.DomDocument;
    // build a body and insert your table text
    mshtml.IHTMLElement bodydisp = currentDoc.createElement("body");
    mshtml.HTMLDocument doc = (mshtml.HTMLDocument) currentDoc;
    doc.appendChild((mshtml.IHTMLDOMNode) bodydisp);
    webBrowser1.Document.Body.InnerHtml = getTableText();
    mshtml.HTMLBody body = (mshtml.HTMLBody) bodydisp;
    mshtml.IHTMLElementCollection elems = (mshtml.IHTMLElementCollection) body.all;
    // extract the table element object
    mshtml.HTMLTable table = (mshtml.HTMLTable) elems.item(0, 0);
    // run through the HtmlTableParse.RowColumns.Parse method
    HtmlTableParse.RowColumns output = HtmlTableParse.RowColumns.Parse(table, 1);

    // Test
    foreach (string key in output.Keys)
    {
        Console.WriteLine("{0}:", key);
        foreach (int col in output[key].Keys)
        {
            double? cellValue = output[key][col];
            string val = cellValue.HasValue ? cellValue.Value.ToString() : "<NULL>";
            Console.WriteLine("\tColumn {0}: {1}", col, val);
        }
    }
}[/csharp]

This returns a dictionary keyed by the values in column 1 of the original table.
The object associated with each key is a column value dictionary.
The column value dictionary is keyed by column number (counting from 0 at the first column after the title) and holds nullable double values.
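
To show the shape of that result without needing mshtml, here is a small stand-in built from plain dictionaries. This is an illustration only: ResultShape is a made-up name, the nested Dictionary replaces the RowColumns and ColumnValues classes, and the values are typed in by hand from the sample HTML in the question.

```csharp
using System;
using System.Collections.Generic;

class ResultShape // hypothetical name, for illustration only
{
    static void Main()
    {
        // Stand-in for the parse result: row title -> (column index -> value).
        Dictionary<string, Dictionary<int, double?>> output =
            new Dictionary<string, Dictionary<int, double?>>();

        // Numbers taken from the "Attack" row of the sample HTML.
        Dictionary<int, double?> attack = new Dictionary<int, double?>();
        attack.Add(0, 225387);
        attack.Add(1, 66);
        attack.Add(2, 532780);
        output.Add("Attack", attack);

        // Look up one value by row title and column index.
        double? second = output["Attack"][1];

        // A null entry means the cell could not be parsed as a number.
        string labelText = second.HasValue ? second.Value.ToString("N0") : "n/a";
        Console.WriteLine(labelText);
    }
}
```

The same lookup on the real result, followed by assigning labelText to a label's Text property, would get each number onto the form.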
 