Lanc1988 Posted February 17, 2006
I have a program that saves the source code of a webpage to a .txt file when you click a button. That .txt file contains about 20 different numbers that I need to extract and put in labels. Here is part of what is saved in the .txt file:
<tr><td></td><td align="left"> <a href="overall.ws?table=0&user=lanc1988">Overall</a> </td> <td align="right">154,416</td> <td align="right">1107</td> <td align="right">5,465,821</td> </tr>
<tr><td align="right"> <img class="miniimg" src="http://www.runescape.com/img/hiscores/skill_icon_attack.gif"></td> <td align="left"><a href="overall.ws?table=1&user=lanc1988">Attack</a></td> <td align="right">225,387</td> <td align="right">66</td> <td align="right">532,780</td> </tr>
The only things that change each time in the above code are "lanc1988" and the numbers. I need to figure out how to get just the numbers out of there and put each one in a label. I have been trying to do this for a long time and can't get it to work.
Lanc1988 (Author) Posted February 18, 2006
How would I tell it to get the value between
<a href="overall.ws?table=0&user=lanc1988">Overall</a></td> <td align="right">
and
</td> <td align="right">
?
Lanc1988 (Author) Posted February 18, 2006
I have tried that in the past, but I don't understand how to use them. I need to see an example of them being used to get something that is between two things. If you know where I can find one, or have time to show me a quick example, it would be very helpful.
Mister E Posted February 18, 2006
Without testing it, I would say you could do a match with the following:
>([0-9]|,|\.)+<
That will at least get you a bunch of candidate strings. You could use a more detailed regex to filter out the non-matches. Check out The Regulator: http://regex.osherove.com/ It is a great tool for testing regular expressions.
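A minimal sketch of how the pattern above might be applied in C#, assuming the saved page source has already been read into a string. The class name `CandidateNumbers` and the method `Extract` are made up for illustration. One deliberate change: the repetition is moved inside the capture group (`([0-9,.]+)`), because with `([0-9]|,|\.)+` the group's value would be only the last character matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class CandidateNumbers
{
    // One or more digits, commas or periods sitting directly between '>' and '<'.
    static readonly Regex Pattern = new Regex(@">([0-9,.]+)<");

    // Returns every candidate string in document order, e.g. "154,416".
    public static List<string> Extract(string html)
    {
        List<string> found = new List<string>();
        foreach (Match m in Pattern.Matches(html))
            found.Add(m.Groups[1].Value); // group 1 = the text between '>' and '<'
        return found;
    }
}
```

For the Attack row in the first post, `CandidateNumbers.Extract` should yield 225,387, then 66, then 532,780. The strings could then be assigned to labels in that order, provided the page layout does not change.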
Cags Posted February 18, 2006
It 'might' be a valid candidate for some sort of XML parsing using either the XmlTextReader or the XML DOM. I suspect this would give you slightly more control than regular expressions (assuming you know exactly which nodes you need the numbers from).
Anybody looking for a graduate programmer (Midlands, England)?
snarfblam (Leaders) Posted February 18, 2006
Cags wrote: "It 'might' be a valid candidate for some sort of xml parsing using either the XmlTextReader or the XML DOM. I suspect this would give you slightly more control than Regular Expressions (assuming you know exactly which nodes you need the numbers from)."
This is assuming that the HTML is in the form of well-formed XML. I think that regex might be the way to go. You should be able to identify tags or content between tags. What you need to take into consideration is what may vary between instances of such a document. Can the table change size, or is the layout always constant? If it is always constant, you can find the beginning of the table, use regex to identify content between tags, and count in the appropriate number of tags to find the data you want. If the table can vary in layout, your approach would need to be a little more dynamic. Then again, the table itself at least appears to be well-formed XML, and you may be able to extract that and use some XML classes to make life easier.
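The count-the-cells idea above could look roughly like this, assuming the layout stays as in the sample rows from the first post: plain `<tr>` tags, the skill name as link text, and the numbers in `<td align="right">` cells. `SkillRows` and `Parse` are hypothetical names, and regex matching on HTML is only a sketch that will break if the markup changes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, written against the sample markup only.
static class SkillRows
{
    // One table row; non-greedy so each <tr>...</tr> is matched separately.
    static readonly Regex Row = new Regex(@"<tr>(.*?)</tr>", RegexOptions.Singleline);
    // The skill name is the link text inside the row.
    static readonly Regex Name = new Regex(@"<a [^>]*>([^<]+)</a>");
    // Each number sits alone in a right-aligned cell.
    static readonly Regex Cell = new Regex(@"<td align=""right"">\s*([0-9,]+)\s*</td>");

    // Maps each skill name to its numeric cells, in the order they appear in the row.
    public static Dictionary<string, List<string>> Parse(string html)
    {
        Dictionary<string, List<string>> result = new Dictionary<string, List<string>>();
        foreach (Match row in Row.Matches(html))
        {
            string inner = row.Groups[1].Value;
            Match name = Name.Match(inner);
            if (!name.Success)
                continue; // skip rows without a link, such as header rows

            List<string> values = new List<string>();
            foreach (Match cell in Cell.Matches(inner))
                values.Add(cell.Groups[1].Value);
            result[name.Groups[1].Value.Trim()] = values;
        }
        return result;
    }
}
```

With the sample rows, `SkillRows.Parse(html)["Attack"][1]` would be "66". Which position holds which statistic has to be read off the page itself.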
Joe Mamma Posted February 18, 2006
Which framework are you using?
Joe Mamma
Amendment 4: The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized. Amendment 9: The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.
Joe Mamma Posted February 18, 2006
I am going to assume that you have a way of getting the HTML for the table that contains the rows and that you know which row to start on. For example, you can parse the text, get the following into a string variable, and pass that to a function:
<table> . . . row definitions </table>
Give me a moment.
Joe Mamma Posted February 18, 2006
Oh yeah... I will assume that one column is the title column and every column after that contains a number. If the number cannot be parsed, I will store double.MinValue in it. Assumes .NET 2.0.
Joe Mamma Posted February 19, 2006
Assumptions:
- Column 0 is not needed.
- Column 1 contains an anchor whose inner text is a unique value in the table.
OK, given that you have a method like this that returns the HTML table markup you want to parse:
[csharp]
string getTableText()
{
    // Enter code here to get your text for the HTML table,
    // of the form <table><tr><td>...</td></tr></table>
    throw new NotImplementedException();
}
[/csharp]
define this set of classes:
[csharp]
using System;
using System.Collections.Generic;

namespace HtmlTableParse
{
    // Maps a zero-based value-column index to the parsed number (null if unparseable).
    class ColumnValues : Dictionary<int, double?>
    {
        internal void Parse(int index, mshtml.HTMLTableCell cell)
        {
            // Double.TryParse uses the current culture, so "154,416" parses
            // where ',' is the group separator.
            double d;
            if (Double.TryParse(cell.innerText, out d))
                this.Add(index, d);
            else
                this.Add(index, null);
        }
    }

    // Maps the anchor text in the title column to that row's values.
    class RowColumns : Dictionary<string, ColumnValues>
    {
        public static RowColumns Parse(mshtml.HTMLTable table, int titleColumn)
        {
            RowColumns result = new RowColumns();
            foreach (mshtml.HTMLTableRow row in table.rows)
            {
                mshtml.HTMLAnchorElement anchor = null;
                foreach (mshtml.HTMLTableCell cell in row.cells)
                {
                    if (cell.cellIndex < titleColumn)
                        continue; // columns before the title column are not needed
                    if (cell.cellIndex == titleColumn)
                    {
                        // Assumes the first child of the title cell is the anchor.
                        anchor = cell.firstChild as mshtml.HTMLAnchorElement;
                        result.Add(anchor.innerText, new ColumnValues());
                    }
                    else
                        result[anchor.innerText].Parse(cell.cellIndex - titleColumn - 1, cell);
                }
            }
            return result;
        }
    }
}
[/csharp]
Then:
- Reference the mshtml COM library.
- Reference System.Windows.Forms.
- Instance a WebBrowser control.
- Navigate to "about:blank".
- Build a body and insert your table text.
- Extract the table element object.
- Run it through the HtmlTableParse.RowColumns.Parse method.
[csharp]
// Instance a WebBrowser control.
using (System.Windows.Forms.WebBrowser webBrowser1 = new System.Windows.Forms.WebBrowser())
{
    // Navigate to "about:blank"
    webBrowser1.Navigate("about:blank");
    mshtml.IHTMLDocument2 currentDoc = (mshtml.IHTMLDocument2)webBrowser1.Document.DomDocument;

    // Build a body and insert your table text
    mshtml.IHTMLElement bodydisp = currentDoc.createElement("body");
    mshtml.HTMLDocument doc = (mshtml.HTMLDocument)currentDoc;
    doc.appendChild((mshtml.IHTMLDOMNode)bodydisp);
    webBrowser1.Document.Body.InnerHtml = getTableText();
    mshtml.HTMLBody body = (mshtml.HTMLBody)bodydisp;
    mshtml.IHTMLElementCollection elems = (mshtml.IHTMLElementCollection)body.all;

    // Extract the table element object
    mshtml.HTMLTable table = (mshtml.HTMLTable)elems.item(0, 0);

    // Run through the HtmlTableParse.RowColumns.Parse method
    HtmlTableParse.RowColumns output = HtmlTableParse.RowColumns.Parse(table, 1);

    // Test
    foreach (string key in output.Keys)
    {
        Console.WriteLine("{0}:", key);
        foreach (int col in output[key].Keys)
        {
            string val = output[key][col] == null ? "<NULL>" : output[key][col].ToString();
            Console.WriteLine("\tColumn {0}: {1}", col, val);
        }
    }
}
[/csharp]
This returns a dictionary keyed by the values in column 1 of the original table. The object associated with each key is a column-value dictionary, which is keyed by column number and holds a nullable double value.