Xtreme .Net Talk


Posted
I'm trying to browse the web, which is easy. Now I'm wondering whether it's possible, and if so how, to take all the .jpg files in a web page and save them. I was thinking of saving the page, which is really just text, into one huge string and then extracting the http://....jpg links from it. Maybe somebody knows an easier way? lol
Posted

OK, I read up more on it; now I just need a loop for this.

I need to do these steps:

1. Open the web page... done
2. Use WebClient to open and save the file... done
3. Create a loop that runs through the HTML and tells WebClient to download all the .jpg files... not solved
4. Discard the HTML document... done

So step 3 is what I need some help with.

Posted

Aren't they cached somewhere on the machine if you're displaying the pictures? Either that, or in the memory your program is using while displaying them.

Either way, it seems you should be able to get at the pictures without downloading them again.

Posted
Well, it's not quite that simple, because on most picture pages you're only seeing thumbnails, so you'd still have to click through every one.

And each thumbnail is displayed with a link to the full-sized picture in the HTML. You could have the program run through each link, check whether it points to a picture (.jpg, .jpeg, .swf, .gif, etc.), and if so, download it.
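A minimal VB.NET sketch of that "is it a picture?" check, assuming it is enough to look at the file extension at the end of the link (after dropping any query string or fragment). The extension list is the one from this post; IsPictureLink is a hypothetical name. A server can serve an image from a URL with no such extension, so treat this as a heuristic.

```vb
Imports System

Module PictureLinkCheck
    ' Extensions from the post. (.swf is a Flash movie, not a still image,
    ' but it is kept because the post lists it.)
    Private ReadOnly PictureExtensions As String() = {".jpg", ".jpeg", ".gif", ".swf"}

    ' Returns True when the link's path ends in one of the extensions above.
    ' Anything after "?" (query string) or "#" (fragment) is ignored.
    Public Function IsPictureLink(ByVal link As String) As Boolean
        Dim cleaned As String = link
        Dim cut As Integer = cleaned.IndexOfAny(New Char() {"?"c, "#"c})
        If cut >= 0 Then cleaned = cleaned.Substring(0, cut)

        ' Compare in lower case so ".JPG" and ".jpg" both match
        cleaned = cleaned.ToLowerInvariant()
        For Each ext As String In PictureExtensions
            If cleaned.EndsWith(ext, StringComparison.Ordinal) Then Return True
        Next
        Return False
    End Function
End Module
```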

Posted
Yeah, that's my problem. What kind of loop would do that correctly? :D I wouldn't mind just telling the program: OK, http://mysite.com/01.jpg is where you start; now just increment to http://mysite.com/02.jpg, etc., and save them to a specific folder. That kind of loop is probably ideal.
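A sketch of that numbered-file loop in VB.NET, assuming the files really are named 01.jpg, 02.jpg, ... with two-digit padding and no gaps. http://mysite.com/ is the placeholder from the post; BuildNumberedUrls, DownloadAll and the half-second pause are illustrative choices, not anything a particular site requires.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Threading

Module SequentialDownload
    ' Builds baseUrl & "01.jpg", baseUrl & "02.jpg", ... for first..last.
    ' i.ToString("00") pads to at least two digits: 1 -> "01", 12 -> "12".
    Public Function BuildNumberedUrls(ByVal baseUrl As String, ByVal first As Integer, ByVal last As Integer) As List(Of String)
        Dim urls As New List(Of String)
        For i As Integer = first To last
            urls.Add(baseUrl & i.ToString("00") & ".jpg")
        Next
        Return urls
    End Function

    ' Saves each URL into targetFolder under its own file name.
    ' Stops at the first failed request (e.g. a 404), assuming the numbering
    ' has no gaps, and pauses between requests to go easy on the site's bandwidth.
    Public Sub DownloadAll(ByVal urls As List(Of String), ByVal targetFolder As String)
        Directory.CreateDirectory(targetFolder)
        Using wc As New WebClient()
            For Each url As String In urls
                Dim fileName As String = url.Substring(url.LastIndexOf("/"c) + 1)
                Try
                    wc.DownloadFile(url, Path.Combine(targetFolder, fileName))
                Catch ex As WebException
                    Exit For
                End Try
                Thread.Sleep(500)
            Next
        End Using
    End Sub
End Module
```

Usage, with a hypothetical target folder: DownloadAll(BuildNumberedUrls("http://mysite.com/", 1, 50), "C:\Pictures\mysite")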
Posted

I can't do all your work for you, but here are some key notes:

The entire HTML file is a string.

A link is typically formatted like this: <a href="/go/homepage/int/sport/h3/-/news/sport1/hi/olympics_2004/3557922.stm">Athens poised for Games</a>

So you just need to search the string for "<a href=", starting at intIndexToSearch (which starts at 0), and capture the text between = and >. That's the link. Check that it's a picture before you download it, of course :)

Then find the index of the next </a>, which is the very end of your link. That becomes the new intIndexToSearch.

Repeat the process, searching for the next "<a href=" starting at the index you just got.

Once a search starting at that index returns nothing (an index of -1), the loop is over: you've processed the entire HTML.
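The index-search loop above could be sketched in VB.NET roughly like this. It follows the described steps (find "<a href=", read the value, continue after the next "</a>", stop when IndexOf returns -1), with one added assumption: when the value is quoted, it stops at the closing quote so extra attributes such as target="_blank" are not captured. ExtractHrefs is a hypothetical name, and links with attributes placed before href are not found.

```vb
Imports System
Imports System.Collections.Generic

Module HrefScanner
    ' Walks the HTML string with IndexOf: find "<a href=", read the link
    ' value, then continue after the next "</a>". Matching is case-insensitive.
    Public Function ExtractHrefs(ByVal html As String) As List(Of String)
        Const Marker As String = "<a href="
        Const CloseTag As String = "</a>"
        Dim links As New List(Of String)
        Dim intIndexToSearch As Integer = 0

        Do
            Dim start As Integer = html.IndexOf(Marker, intIndexToSearch, StringComparison.OrdinalIgnoreCase)
            If start = -1 Then Exit Do ' nothing left: the whole page has been processed

            Dim valueStart As Integer = start + Marker.Length
            Dim tagEnd As Integer = html.IndexOf(">"c, valueStart)
            If tagEnd = -1 Then Exit Do ' unterminated tag at the end of the page

            Dim quote As Char = html(valueStart)
            If quote = """"c OrElse quote = "'"c Then
                ' Quoted value: take the text up to the matching quote
                Dim quoteEnd As Integer = html.IndexOf(quote, valueStart + 1)
                If quoteEnd = -1 Then quoteEnd = tagEnd
                links.Add(html.Substring(valueStart + 1, quoteEnd - valueStart - 1))
            Else
                ' Unquoted value: take everything between "=" and ">"
                links.Add(html.Substring(valueStart, tagEnd - valueStart))
            End If

            ' The end of this link's </a> becomes the next starting index
            Dim closeIndex As Integer = html.IndexOf(CloseTag, tagEnd, StringComparison.OrdinalIgnoreCase)
            If closeIndex = -1 Then Exit Do
            intIndexToSearch = closeIndex + CloseTag.Length
        Loop

        Return links
    End Function
End Module
```

A real HTML parser would cope with more variations; this sketch only mirrors the string-search approach described in the post.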

Posted

Private htmlstr As String
Private rtrhtml As String
Public wc As New Net.WebClient
Public s As String

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    ' Download the page and read the whole response into one large string, s
    Using sr As New IO.StreamReader(wc.OpenRead("http://www.google.com/index.html"))
        s = sr.ReadToEnd()
    End Using ' closes the reader and the underlying response stream

    ' wc is not disposed here: it is still needed to download the pictures
    TextBox1.Text = s ' s now contains all the HTML; display it in a textbox

    ' Next step: extract all the links to any .jpg
End Sub

 

 

This code so far takes the page, saves it to the string variable s, and then displays the HTML code in the textbox. Of course, Google isn't coded by an amateur, so it doesn't have simple links like usual. But let's say it did have something like http://www.google.com/01.jpg: would I use that index-search method?
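For the empty "next step" in the Form1_Load code above, the extracted links usually need two more things before they can be downloaded: relative links such as "/go/homepage/..." have to be resolved against the page's address, and each picture has to be saved to disk. A minimal sketch using System.Uri and WebClient.DownloadFile; the module and function names are hypothetical, www.example.com stands in for a real site, and pictures with the same file name in different folders will overwrite each other.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net

Module PictureDownloader
    ' Resolves a possibly relative href against the address of the page it came from.
    Public Function ResolveLink(ByVal pageUrl As String, ByVal href As String) As String
        Return New Uri(New Uri(pageUrl), href).AbsoluteUri
    End Function

    ' Downloads every href whose path ends in .jpg or .jpeg into targetFolder
    ' and returns how many files were saved.
    Public Function DownloadJpgs(ByVal pageUrl As String, ByVal hrefs As List(Of String), ByVal targetFolder As String) As Integer
        Directory.CreateDirectory(targetFolder)
        Dim saved As Integer = 0
        Using wc As New WebClient()
            For Each href As String In hrefs
                Dim absolute As New Uri(ResolveLink(pageUrl, href))
                Dim lowerPath As String = absolute.AbsolutePath.ToLowerInvariant()
                If Not (lowerPath.EndsWith(".jpg") OrElse lowerPath.EndsWith(".jpeg")) Then Continue For

                ' Keep the picture's own file name, e.g. "01.jpg"
                Dim fileName As String = Path.GetFileName(absolute.AbsolutePath)
                wc.DownloadFile(absolute, Path.Combine(targetFolder, fileName))
                saved += 1
            Next
        End Using
        Return saved
    End Function
End Module
```

For example, ResolveLink("http://www.example.com/gallery/index.html", "thumbs/01.jpg") gives "http://www.example.com/gallery/thumbs/01.jpg".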

Posted


I don't know anything about hacking into people's servers and trying to find files; that's not my gig.

I only see how to do it when they list the link (either as a text or picture link), which tells you where that picture is.

If someone just started trying to access random file types off my server in a systematic way, hammering their way in, I'd construe it as an attack on my server.

Posted
No, no, no, nothing like this. This is only for sites with links to pictures that are viewable to everybody. I'm just having trouble finding one as a demo, lol. I'm making this for sites with lots of cartoon pictures etc. that I know a lot of folks like to collect.

If you want to get what is legitimately offered, you just need to follow the links in the HTML to grab the pictures.

You might even want to create a recursive "spidering" script to go "n" links deep looking for pictures.

But this is a way to get your IP banned, as it pisses off a lot of people who PAY for their bandwidth to have someone come in, download everything and use it all up.
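A recursive spider that goes "n" links deep might look like the sketch below. To keep it self-contained it takes a hypothetical getLinks function, supplied by the caller, that returns the absolute URLs found on a page (in practice: download the HTML, scan it for hrefs, resolve each one). The visited set keeps it from fetching the same page twice, which matters given the bandwidth concerns above. HashSet and Func need .NET 3.5 or later.

```vb
Imports System
Imports System.Collections.Generic

Module Spider
    ' Visits url, collects its .jpg links, and follows its other links
    ' until maxDepth runs out. getLinks is a hypothetical, caller-supplied
    ' function that returns the absolute URLs found on a page.
    Public Sub Crawl(ByVal url As String, ByVal maxDepth As Integer, _
                     ByVal getLinks As Func(Of String, List(Of String)), _
                     ByVal visited As HashSet(Of String), _
                     ByVal pictures As List(Of String))
        ' HashSet.Add returns False for a page already seen, so no page
        ' is fetched twice and link cycles cannot loop forever.
        If maxDepth < 0 OrElse Not visited.Add(url) Then Return

        For Each link As String In getLinks.Invoke(url)
            If link.ToLowerInvariant().EndsWith(".jpg") Then
                If Not pictures.Contains(link) Then pictures.Add(link)
            ElseIf maxDepth > 0 Then
                ' Follow non-picture links one level deeper
                Crawl(link, maxDepth - 1, getLinks, visited, pictures)
            End If
        Next
    End Sub
End Module
```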

Posted
True, true. Spidering is something I'll probably have to do to get a lot of the bugs out of just searching and extracting. I'm surprised IE hasn't already introduced something that shows all the full-sized pictures behind thumbnailed images.

I doubt that IE would do such a thing.

People already have problems with programs that "Make site available offline".

Say the average user just goes and checks out your site and uses about 100-500 KB of bandwidth. Your site is about 50 MB. That's just fine.

Now some ******* with a "save site" program comes along and downloads the whole site. Well, that's 50 MB he just downloaded. Will he use all of it? Unlikely. Not a horrible problem.

Now the average-joe idiot surfer thinks, "What if there's an update!? I need that update." So he sets it to go off every week or even every day. That's 50 MB to 350 MB a week for one person, who'll probably only look at a few pages here and there and could have just gone online, but he wants to download them in advance "just in case" so he doesn't have to wait for them to download.

This has caused a few of my favorite sites to go belly up. They'd have their monthly bandwidth used up within the first week.

It's caused many to stop carrying movies/pictures as well.

If IE made this technology more commonly available and billed it as "increasing your internet speed," as others have, there would be pretty big problems for smallish site owners with content, and probably a backlash against IE/Microsoft.

 

I can think of four reasons to do this; three are "legitimate".

1. Honestly, some servers have crappy tools, and this might be the best way to back up your own site.
2. You truly only have access to a modem once in a while, and this is how you view the internet when you don't have it.
3. You were going to click all of those links for the porn, err, pictures of cars ;), but it's quicker to do it automatically.
4. You don't care who you hurt so long as you help yourself. You just download whole sites "just to have them" or "just in case" to speed up your browsing experience.

An old roommate was #4. We had dial-up, and he'd download whole sites overnight so he could check them out in less time in the morning before work.

Posted

Indeed, a mirror program is nice, but it's not my goal in this case. I do understand what you are saying about bandwidth issues. I figure, though, that if I was going to click on every .jpg anyway and right-click-save it, why not just do it in less time? Same bandwidth. On dial-up I used to download the local newspaper all night and read it in the morning. Now I'm on broadband, so it's not vital anymore :D

PS: omg pron? I never...ever...well...I guess I should never say never.. :p

Posted


Three things I've learned in life that are absolutes:

 

1. Death

2. Taxes

3. No matter what you're looking at, it's porn to someone :D

Posted
I've made a project that does all of this, and it's really nice. Do you want the source, or do you want to work on it yourself?

"If someone say : "Die mortal !"... don't stay to see if he isn't." - Unknown

"Learning to program is like going out with a new girl friend. There's always something that wasn't mentioned in the documentation..." - Me

"A drunk girl is like an animal... it scream at everything like a cat and roll in the grass like a dog." - Me after seeing my girlfriend drunk and some of her drunk friend.

C# TO VB TRANSLATOR

Posted
Well... it took me 2 days to make it work. However, on some rare sites it doesn't work, and I don't know why. But if someone might use it or improve it, I will release all the source.


Posted
Here is a copy of a similar project that I made.

XXX Image Aspirator.zip

