Xtreme .Net Talk


Posted
Let's say I have a web page's HTML source, containing hundreds of .jpg links, saved to a variable. How would one write a program (VB.NET) to extract the links? Most links to JPGs look like ...../picture.jpg, right? So I guess if one could loop a search function that pulls out everything from the "/" to the "g", it wouldn't be so hard: you could then append that /picture.jpg to the full path and use a simple download control. Any suggestions on how to write this? I've tried searching this forum and MSDN but can't find exactly what I'm looking for.
Posted

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    '--------------------------------------------------------------------------------
    ' Download the page and save its HTML as one large string in s
    Dim htmlstr As String = "http://mysite.com/pictures/index.html"
    Dim wc As New System.Net.WebClient()
    Dim sr As New IO.StreamReader(wc.OpenRead(htmlstr))
    Dim s As String = sr.ReadToEnd()
    sr.Close()
    wc.Dispose()
    TextBox1.Text = s
    ' s now contains all the HTML and displays it in a textbox
    '---------------------------------------------------------------------------------
    ' This next step extracts the first link to a .jpg
    Dim linkstr As String
    Dim quotee As String = Chr(34)

    Dim searchingstr As String = "<a href=" + quotee
    Dim slashstr As String = ".jpg" + quotee + ">"

    ' Find where the first .jpg link ends, then search backwards for the <a href=" that opens it
    Dim endinstance As Integer = s.IndexOf(slashstr, 0)
    Dim firstinstance As Integer = -1
    If endinstance >= 0 Then firstinstance = s.LastIndexOf(searchingstr, endinstance)

    If firstinstance >= 0 Then
        Dim linkstart As Integer = firstinstance + searchingstr.Length ' skip past <a href="
        Dim differ As Integer = (endinstance + ".jpg".Length) - linkstart ' length of the link, including .jpg
        linkstr = s.Substring(linkstart, differ)
        MsgBox(linkstr)
    Else
        MsgBox("No .jpg link found")
    End If
End Sub


 

 

 

Hmm... I'm having some trouble, but you can see what I'm trying to do so far.
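
The code above only pulls out one link. A minimal sketch of the looping version described in the first post might look like the following. It assumes every link is written exactly as `<a href="...">` with double quotes, and that `List(Of String)` is available (.NET 2.0 or later); `ExtractJpgLinks` and the sample HTML are hypothetical names made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module JpgLinkScan
    ' Returns every href value ending in .jpg, using only IndexOf/Substring.
    ' Assumption: links look exactly like <a href="..."> with double quotes.
    Function ExtractJpgLinks(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)
        Dim startTag As String = "<a href=" & Chr(34)
        Dim pos As Integer = html.IndexOf(startTag, StringComparison.OrdinalIgnoreCase)

        While pos >= 0
            Dim valueStart As Integer = pos + startTag.Length
            ' The next double quote closes this href value
            Dim valueEnd As Integer = html.IndexOf(Chr(34), valueStart)
            If valueEnd < 0 Then Exit While

            Dim link As String = html.Substring(valueStart, valueEnd - valueStart)
            If link.EndsWith(".jpg", StringComparison.OrdinalIgnoreCase) Then
                links.Add(link)
            End If

            ' Resume the search after the link just processed
            pos = html.IndexOf(startTag, valueEnd, StringComparison.OrdinalIgnoreCase)
        End While

        Return links
    End Function

    Sub Main()
        ' Hypothetical sample page source
        Dim sample As String = "<a href=""a.jpg"">A</a> <a href=""page.html"">P</a> <a href=""pics/b.jpg"">B</a>"
        For Each link As String In ExtractJpgLinks(sample)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Reading each href up to its closing quote and then checking the extension avoids pairing the start of one link with the end of a different one, which is the risk when the start and end markers are searched for independently.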

Posted

RegEx will do this for you. This should catch the entire image path: (?<=<img\ssrc=").+?(?=")

If you just want the jpg file names: (?<=/)\S+?\.jpg

These would need to be tweaked, but it's a good starting point.

"Who is John Galt?"
Posted

OK, I guess I need to research RegEx: what regular expressions are and how to use one, lol. Any examples of the syntax in code?
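
For what it's worth, here is a rough sketch of how a regular expression could be used from VB.NET through the `System.Text.RegularExpressions` namespace. The pattern is not the one posted above; it is a simpler capture-group variant chosen for illustration, and it assumes double-quoted `src` or `href` attributes. The sample HTML is made up, and the base address is just the example URL from the earlier code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module JpgLinkRegex
    Sub Main()
        ' Hypothetical sample standing in for the downloaded page source
        Dim html As String = "<img src=""pics/one.jpg""> <a href=""/two.jpg"">two</a>"
        ' Address the page was downloaded from (example URL from the earlier post)
        Dim baseUri As New Uri("http://mysite.com/pictures/index.html")

        ' Capture the value of any src="..." or href="..." that ends in .jpg
        Dim pattern As String = "(?:src|href)\s*=\s*""([^""]+?\.jpg)"""

        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            Dim relative As String = m.Groups(1).Value
            ' Resolve the relative path against the page address
            Dim full As New Uri(baseUri, relative)
            Console.WriteLine(full.AbsoluteUri)
        Next
    End Sub
End Module
```

With the sample input this prints `http://mysite.com/pictures/pics/one.jpg` and `http://mysite.com/two.jpg`. Each resulting address could then be handed to something like `WebClient.DownloadFile` to save the picture. Real pages with single-quoted or unquoted attributes would need a looser pattern.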

Posted
Is that a crime? Mwuahahah :p Not only that :rolleyes: but other stuff too... educational, you know... that sort of thing... you know... stuff like that... pr0n? Nah, not I.

You know, I keep getting all this email telling me to "Free Porn!". . .

 

Who is Porn, and why do they think I am holding him???

Joe Mamma

