neodammer Posted August 8, 2005
Let's say I have a web page's HTML code, with hundreds of .jpg links, saved to a variable. How would one write a program (VB.NET) to extract the links? Most links to JPGs look like ...../picture.jpg, right? So I guess one could loop a search function that pulls out everything from the "/" to the "g". Then you could just append that /picture.jpg to the full path and use a simple download control. Any suggestions on how to write this? I've tried searching this forum and MSDN but can't find exactly what I'm looking for.
Enzin Research and Development
jmcilhinney Posted August 8, 2005
Look at the Regex class, which encapsulates regular expression functionality. You'll need to do some reading on regular expression syntax to make much use of it, though.
neodammer (Author) Posted August 8, 2005

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Download the page and save the whole HTML to s.
        Dim htmlstr As String = "http://mysite.com/pictures/index.html"
        Dim wc As New System.Net.WebClient()
        Dim sr As New IO.StreamReader(wc.OpenRead(htmlstr))
        Dim s As String = sr.ReadToEnd()
        sr.Close()
        wc.Dispose()
        TextBox1.Text = s ' s now contains all the HTML; display it in a textbox

        ' Extract the first link to a .jpg (this does not loop over the rest yet).
        Dim quotee As String = Chr(34)
        Dim searchingstr As String = "<a href=" & quotee
        Dim slashstr As String = ".jpg" & quotee

        ' Index of where the first .jpg link ends
        Dim endinstance As Integer = s.IndexOf(slashstr)
        If endinstance < 0 Then Return

        ' Search backward from there for the <a href=" that opens that link
        Dim firstinstance As Integer = s.LastIndexOf(searchingstr, endinstance)
        If firstinstance < 0 Then Return
        firstinstance += searchingstr.Length ' skip the tag text so the result starts at the URL

        ' Number of characters between the start and end of the link, including ".jpg"
        Dim differ As Integer = endinstance + 4 - firstinstance
        Dim linkstr As String = s.Substring(firstinstance, differ)
        MsgBox(linkstr)
    End Sub

Hmm, having some trouble, but you can see what I'm trying to do so far.
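For reference, the IndexOf approach in the post above can be extended to walk the whole string instead of stopping at the first hit. This is only a minimal sketch: the module name, the `ExtractJpgLinks` helper, and the sample HTML are hypothetical, and it assumes every link is written exactly as `<a href="...">` with double quotes.

```vbnet
Imports System
Imports System.Collections.Generic

Module LinkScanner
    ' Hypothetical helper: returns every href value that ends in .jpg.
    Function ExtractJpgLinks(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)
        Dim startTag As String = "<a href=" & Chr(34)
        Dim pos As Integer = html.IndexOf(startTag, StringComparison.OrdinalIgnoreCase)

        While pos >= 0
            ' The link text starts right after the opening quote...
            Dim linkStart As Integer = pos + startTag.Length
            ' ...and ends at the next quote.
            Dim linkEnd As Integer = html.IndexOf(Chr(34), linkStart)
            If linkEnd < 0 Then Exit While

            Dim link As String = html.Substring(linkStart, linkEnd - linkStart)
            If link.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase) Then
                links.Add(link)
            End If

            ' Continue searching after the link just processed.
            pos = html.IndexOf(startTag, linkEnd, StringComparison.OrdinalIgnoreCase)
        End While

        Return links
    End Function

    Sub Main()
        ' Made-up sample input, not taken from a real page.
        Dim html As String = "<a href=""a/one.jpg"">x</a> <a href=""page.html"">y</a> <a href=""two.JPG"">z</a>"
        For Each link As String In ExtractJpgLinks(html)
            Console.WriteLine(link)
        Next
        ' Prints:
        ' a/one.jpg
        ' two.JPG
    End Sub
End Module
```

Hand-rolled scanning like this breaks as soon as the markup varies (single quotes, extra attributes before href), which is one reason the replies in this thread point to regular expressions.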
IngisKahn Posted August 8, 2005
RegEx will do this for you. This should catch the entire image path:
(?<=<img\ssrc=").+?(?=")
If you just want the jpg file names:
(?<=/)\S+?\.jpg
These would need to be tweaked, but it's a good starting point.
"Who is John Galt?"
jmcilhinney Posted August 8, 2005
The Regex class has a Matches method, which is what you want to use.
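As a rough illustration of the two replies above, the sketch below feeds the suggested image-path pattern to `Regex.Matches` and prints each hit. The sample HTML and the module name are made up for the example. The second pattern is one possible tweak of the file-name pattern, not the one posted in the thread: as posted, `\S+?` can also run across earlier slashes in the URL, so the sketch assumes a character class that excludes `/`, whitespace, and quotes.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexLinkDemo
    Sub Main()
        ' Made-up sample input; in the thread this would be the downloaded page.
        Dim html As String = "<img src=""http://mysite.com/pictures/one.jpg""> " & _
                             "<img src=""http://mysite.com/pictures/two.jpg"">"

        ' Pattern from the thread: everything between <img src=" and the next quote.
        ' Quotes are doubled because this is a VB string literal.
        Dim pathPattern As String = "(?<=<img\ssrc="").+?(?="")"

        For Each m As Match In Regex.Matches(html, pathPattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
        ' Prints:
        ' http://mysite.com/pictures/one.jpg
        ' http://mysite.com/pictures/two.jpg

        ' Assumed tweak for file names only: no slash, whitespace, or quote
        ' is allowed inside the name, so the match starts after the last slash.
        Dim namePattern As String = "(?<=/)[^/\s""]+?\.jpg"

        For Each m As Match In Regex.Matches(html, namePattern, RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value)
        Next
        ' Prints:
        ' one.jpg
        ' two.jpg
    End Sub
End Module
```

`Matches` returns a `MatchCollection`, so there is no need to track indexes by hand; each `Match` exposes the matched text through `Value`.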
neodammer (Author) Posted August 8, 2005
OK, I guess I need to research RegEx: what they are and how to use one, lol. Any examples of the syntax in code?
IngisKahn Posted August 8, 2005
Check out the Regular Expressions forum.
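The last step from the original question, adding each extracted name back onto the full path and downloading it, could look roughly like this. It is a sketch under assumptions: the page address is the one used earlier in the thread, the link list is made-up sample data, and `WebClient.DownloadFile` stands in for the "simple download control" the question mentions. Relative links are resolved against the page address with the `Uri(baseUri, relativeUri)` constructor.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module DownloadSketch
    Sub Main()
        ' Page address from the thread; the links are made-up sample data.
        Dim pageUri As New Uri("http://mysite.com/pictures/index.html")
        Dim links() As String = {"picture.jpg", "/thumbs/small.jpg"}

        Using wc As New WebClient()
            For Each link As String In links
                ' Resolve the (possibly relative) link against the page address.
                Dim fullUri As New Uri(pageUri, link)
                Console.WriteLine(fullUri.ToString())

                ' Save under the bare file name in the current directory.
                Dim fileName As String = Path.GetFileName(fullUri.LocalPath)
                Try
                    wc.DownloadFile(fullUri, fileName)
                Catch ex As WebException
                    ' The sample address is a placeholder, so a failure is expected here.
                    Console.WriteLine("Download failed: " & ex.Message)
                End Try
            Next
        End Using
        ' The resolved addresses printed are:
        ' http://mysite.com/pictures/picture.jpg
        ' http://mysite.com/thumbs/small.jpg
    End Sub
End Module
```

Letting `Uri` do the combining avoids the string-concatenation pitfalls with leading slashes and already-absolute links.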
Jay1b Posted August 8, 2005
Sounds like a lot of work just to download porn faster.... :P
neodammer (Author) Posted August 9, 2005
Is that a crime? Mwuahahah :p Not only that :rolleyes: but other stuff too.. educational, you know.. that sort of thing.. you know.. stuff like that.. pr0n? Nah, not I.
Joe Mamma Posted August 9, 2005
You know, I keep getting all this email telling me to "Free Porn!"... Who is Porn, and why do they think I am holding him???
Joe Mamma
Amendment 4: The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
Amendment 9: The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.