#1
Hello,

I want to create a PHP scraper that gets some information from, e.g., 5 sites simultaneously. I tried the following script: http://www.phpied.com/simultaneuos-h...php-with-curl/

Everything works fine, but what I want is a truly simultaneous scraper (something multithreaded, so that these 5 websites are loaded not one after another but over different sockets).

In addition, I would like to display each result as soon as it is scraped. So when the first HTTP POST gets an answer, the script should show that result and wait for the rest of the pages (rather than displaying everything once all the scraping is done).

Any ideas how I can achieve this? Thanks!

Regards,
Mark
#2
mark wrote:
> Hello,
>
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously. I tried the following script:
> [..]
> Everything works fine, but what I want is simultaneuos (something to
> multithread, when these 5 websites will be loaded not one after
> another, but by using different sockets) scraper.
>
> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!
>
> regards, Mark
>

Sorry, PHP doesn't do multithreading very well. Probably the best you can do is start multiple background processes to do the work, then communicate via a database, shared memory, etc.

As for displaying the contents immediately - again, not guaranteed possible. You can flush() the buffers in PHP - but that doesn't guarantee the data will be sent by the webserver to the client immediately, nor does it guarantee the client will display the data before the whole response has been received.

Sounds like Java might be a better fit.
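
For reference, the flush() idea above can be sketched as follows. This is only an illustration: the helper name emit_now() is made up for this example, and, as the post says, a web server, proxy, or browser may still hold the data back after PHP has let go of it.

```php
<?php
// Hypothetical helper (not part of PHP): push one chunk of output as far
// towards the client as PHP itself can manage.
function emit_now(string $chunk): void
{
    echo $chunk;

    // If a user-level output buffer is active, hand its contents to the
    // next layer first; flush() alone does not touch these buffers.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    // Ask PHP to pass its output on to the web server (or the terminal).
    // Anything buffered beyond that point is outside PHP's control.
    flush();
}
```

Calling emit_now() once per scraped page sends each result as soon as it is available, provided nothing downstream of PHP buffers or compresses the response.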
#3
mark wrote:
> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!

Either:

- Run in console and use pcntl_fork().
- Use raw HTTP and some socket_select() magic.
- curl_multi_exec().
- Rely on JavaScript and AJAX techniques, and make the web browser launch 5 queries to your web server, each one scraping a single site.
- Use ignore_user_abort() and a mix of raw HTTP with sockets to blindly launch PHP threads. This one's quite tricky to pull off.

There may be more ways to do this, but unless you know what a critical section is, please stay away from concurrent (AKA multithreaded) programming. Besides, you would need IPC to get the results as they appear - to make your life easier, you should stick with either curl_multi queries or rely on JavaScript to fetch results individually as they become ready.
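
A minimal sketch of the curl_multi route from the list above, assuming PHP 7.1 or later with the curl extension loaded. The function name fetch_as_completed() and its callback signature are invented for this example; only the curl_* calls are PHP's own. Everything runs in a single process: the transfers overlap on separate sockets, and each result is handed to the callback as soon as that transfer finishes, so no threads or IPC are involved.

```php
<?php
// Hypothetical helper: fetch all $urls concurrently and call
// $onDone($url, $body, $error) as each transfer completes.
function fetch_as_completed(array $urls, callable $onDone, int $timeout = 30): void
{
    $mh = curl_multi_init();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body, do not echo it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_PRIVATE, $url);        // remember which URL this handle is for
        curl_multi_add_handle($mh, $ch);
    }

    $running = 0;
    do {
        // Drive all transfers forward without blocking.
        $status = curl_multi_exec($mh, $running);

        // Report every transfer that has finished since the last pass.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            if ($info['result'] === CURLE_OK) {
                $onDone($url, curl_multi_getcontent($ch), null);
            } else {
                $onDone($url, null, curl_strerror($info['result']));
            }
            curl_multi_remove_handle($mh, $ch);
        }

        // Wait for socket activity instead of spinning; some platforms
        // return -1 immediately, so back off briefly in that case.
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($mh);
}

// Usage sketch (placeholder URLs, commented out so nothing is fetched here):
// fetch_as_completed(['http://example.com/', 'http://example.org/'],
//     function (string $url, ?string $body, ?string $error): void {
//         echo $url, ': ', ($error ?? (strlen($body) . ' bytes')), "\n";
//         flush();
//     });
```

Because the callback fires inside the polling loop, printing and flushing there shows the first answer while the remaining pages are still loading.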
#4
Hello,

on 10/04/2008 05:09 PM mark said the following:
> Hello,
>
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously. I tried the following script:
> [..]
> Everything works fine, but what I want is simultaneuos (something to
> multithread, when these 5 websites will be loaded not one after
> another, but by using different sockets) scraper.
>
> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!

This class can do exactly what you describe:

http://www.phpclasses.org/thread

This other class also uses separate HTTP requests to run multiple parallel tasks but these are started from the browser side using AJAX requests:

http://www.phpclasses.org/phpthreader
#5
Manuel Lemos wrote:
> Hello,
>
> on 10/04/2008 05:09 PM mark said the following:
>
> This class can do exactly what you describe:
>
> [..]
>
> This other class also uses separate HTTP requests to run multiple
> parallel tasks but these are started from the browser side using AJAX
> requests:
>
> [..]
>

Why don't you tell him that's your own site you're spamming again, Manuel?

And those are your own classes (which, BTW, aren't worth a damn) you're spamming?
#6
On 4 Oct, 21:09, mark <mkazmier> wrote:
> Hello,
>
> I want to create a php scraper that will get some information from
> e.g. 5 sites simultaneously. I tried the following script:[..]
> Everything works fine, but what I want is simultaneuos (something to
> multithread, when these 5 websites will be loaded not one after
> another, but by using different sockets) scraper.
>

That's exactly what curl_multi_* does.

> In addition I would like to display the results as soon as it will be
> scraped. So when first http-post get answer, it will show the result
> and wait for the rest of the pages (not display everything when all
> scraping is done).
> Any ideas how can I achieve it? Thanks!
>

This is not a trivial bit of coding. It's not impossible but since you seem to be relying on cut-and-paste coding, do you think you're overstretching your abilities?

C.
#7
On Oct 5, 7:44 am, Jerry Stuckle <jstuck> wrote:
> Manuel Lemos wrote:
<snip>
>>
>> Why don't you tell him that's your own site you're spamming again, Manuel?
>
> And those are your own classes (which, BTW, aren't worth a damn) you're
> spamming?

What's your solution? Do you have a better approach?
#8
R. Rajesh Jeba Anbiah wrote:
> On Oct 5, 7:44 am, Jerry Stuckle <jstuck> wrote:
> <snip>
>
> What's your solution? Do you have better approach?
>
> --
> <?php echo 'Just another PHP saint'; ?>
> Email: rrjanbiah-at-Y!com Blog: [..]
>

Yes, curl_multi_exec(), as Iván indicated.

Manuel is just a spammer - virtually every answer he posts refers to something on his site. And he doesn't even indicate it's his own site when he spams it.

Now I wouldn't mind if he were giving good technical advice. But I've looked at some of his scripts. I've seen relatively new PHP programmers do better.
#9
..oO(Jerry Stuckle)

>R. Rajesh Jeba Anbiah wrote:
>> On Oct 5, 7:44 am, Jerry Stuckle <jstuck> wrote:
>>>
>>> And those are your own classes (which, BTW, aren't worth a damn) you're
>>> spamming?
>>
>> What's your solution? Do you have better approach?
>>
>
>Yes, curl_multi_exec(), as Iván indicated.
>
>Manuel is just a spammer

Wrong.

>virtually every answer he posts refers to
>something on his site.

Nothing wrong with that. I would also point to my own classes to solve a given problem if they would be freely available.

>And he doesn't even indicate it's his own site
>when he spams it.

Not necessary. It would be spam if it would be totally OT, but he posts ready-to-use solutions to PHP problems. It doesn't matter if these solutions are his own or not. Even if they would be commercial, it wouldn't be spam in the given context.

>Now I wouldn't mind if he were giving good technical advice. But I've
>looked at some of his scripts.

Some. But surely not all. They might not fit your coding standards, but this doesn't give you the right to discredit them on every chance you get. If you have a problem with them, come to the point and post exactly what you don't like. And _prove_ it by posting code samples.

>I've seen relatively new PHP programmers
>do better.

If you don't like his solutions, post better ones or simply ignore him.

It's always good to have a choice between various ways to solve a problem. He's contributing to the community by posting alternatives.

You OTOH are just trolling by attacking him personally on each and every post. This sucks.

Enough is enough! >:-(

Micha
#10
Jerry Stuckle wrote:
> Manuel Lemos wrote:

Jerry Stuckle has a personality problem.
He seems to live on comp.lang.php like a rat addicted to the cocaine lever in a laboratory cage. He seems to do nothing else. Does his employer know how much time he spends insulting people, complaining, posturing? He seems to be a competent hacker. But also a lonely, friendless, nasty-dispositioned jerk.

Manuel Lemos is a mature, considerate and helpful guy by comparison.
#11
salmobytes wrote:
> Jerry Stuckle wrote:
>> Manuel Lemos wrote:
>
> Jerry Stuckle has a personality problem.
> He seems to live on comp.lang.php like rat addicted to the cocaine
> lever in a laboratory cage. He seems to do nothing else. Does his
> employer know how much time he spends insulting people, complaining,
> posturing? He seems to be a competent hacker. But also a lonely,
> friendless, nasty dispositioned jerk.
>
> Manuel Lemos is a mature, cosiderate and helpful guy by comparison.
>

ROFLMAO!

FYI, I am my own employer - an independent consultant. And I suspect I make a lot more than most of the people in this newsgroup.

No, I don't "live" here. But I check in a few times during the day, usually when I need to take a break from coding.

As for Manuel - "mature" people don't need to spam their websites at every opportunity. When was the last time you saw him give advice which wasn't on his website? Not very often.

OTOH, I never refer to my website for solutions. Many here don't even know what it is (which is fine with me).
#12
Michael Fesser wrote:
[..]
> It's always good to have a choice between various ways to solve a
> problem. He's contributing to the community by posting alternatives.
>
> You OTOH are just trolling by attacking him personally on each and every
> post. This sucks.
>
> Enough is enough! >:-(
>
> Micha
>

Sorry, Micha, as much as I respect you, I have to disagree.

How many posts has Manuel made which had solutions - other than saying "see this website" - and not telling people it is his?

I don't spam my website - because its content is not germane to this newsgroup. I do sometimes refer people to other websites. But at NO time have I ever referred anyone to a site where I have a pecuniary interest. And if I did, I'd at least tell them it was my site.

And no, I haven't looked at every one of his scripts. But I know bad coding when I see it. And there is no reason to inflict such garbage on new PHP programmers who are trying to learn how to do things the right way. It's at least worth warning them that the coding is lousy.
#13
If you Google "Jerry Stuckle" you get quite an impressive list of link titles. Here are just a few samples:

Hard Evidence that Jerry Stuckle is Lying
PUNCHING JERRY STUCKLE IN THE FACE
HTML - SCAM Alert - Jerry Stuckle
WIPING MY SHITTY ASS WITH JERRY STUCKLE'S FACE
Jerry Stuckle - Fat, Old, Talentless, Unproducing and Stupid

...this list goes on for page after page. It's almost endless.
What is it about you Jerry?
#14
..oO(salmobytes)

>If you Google "Jerry Stuckle" you get quite an impressive list of link
>titles. Here are just a few samples:
>[...]
>
>...this list goes on for page after page. It's almost endless.
>What is it about you Jerry?

What does this have to do with PHP?

Micha
#15
salmobytes wrote:
>
> If you Google "Jerry Stuckle" you get quite an impressive list of link
> titles. Here are just a few samples:
>
> Hard Evidence that Jerry Stuckle is Lying
> PUNCHING JERRY STUCKLE IN THE FACE
> HTML - SCAM Alert - Jerry Stuckle
> WIPING MY SHITTY ASS WITH JERRY STUCKLE'S FACE
> Jerry Stuckle - Fat, Old, Talentless, Unproducing and Stupid
>
> ...this list goes on for page after page. It's almost endless.
> What is it about you Jerry?

Yeah, and we all know what a credible place the Web is... (being sarcastic, BTW)

I am sure most posts like those are retaliatory for being called out, would you not agree?

Sure, he does not get it right all the time; he called me out wrongly once, but that's life. I don't take it personally. Most of the time, as far as I can tell, he is pretty much on point.

Open groups like these need that skeptical eye to help keep the trash out; I would not expect you to disagree with that either.

Anyway, my 3 cents' worth.

Scotty