keyongtech


 #1  
09-26-07, 04:37 PM
VUNETdotUS
My research I did a while ago showed there was no possibility to get
web page content from a third-party website with AJAX only, without
using a server side technology. Now I have to re-investigate this case
and look for a workaround, perhaps, to allow client side to get the
content of the external page, living on another server. In case you
are wondering, there is no content stealing: both parties agree to
exchange data.
Please, advise if you know of any examples, links or suggestion as to
how a client can request external page content.
Thanks.
 #2  
09-26-07, 05:03 PM
David Dorward
On Sep 26, 4:37 pm, VUNETdotUS <vunet> wrote:
> My research I did a while ago showed there was no possibility to get
> web page content from a third-party website with AJAX only, without
> using a server side technology. Now I have to re-investigate this case
> and look for a workaround, perhaps, to allow client side to get the
> content of the external page, living on another server. In case you
> are wondering, there is no content stealing: both parties agree to
> exchange data.


If the third party can be trusted you can have them provide the data
as a piece of JavaScript, and pass data to them in the query string:

<script type="text/javascript">
function myFunction(third_party_code) {
  // etc
}
</script>
<script src="http://example.com/foo/?hello=world"></script>
<!-- The response from example.com would contain something like:
myFunction({ foo: 'bar', baz: [1,2,3,4] });
-->

The script element that sourced the third party data could be
dynamically generated.
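
A minimal sketch of generating that script element dynamically. The function names and the "callback" query parameter are assumptions made for illustration; the third party decides which parameter names it honours, and the example above simply hard-codes myFunction.

```javascript
// Build the request URL: the query-string parameters plus the name of the
// callback the third party's response is expected to call.
// The "callback" parameter name is an assumption, not a standard.
function buildScriptUrl(baseUrl, params, callbackName) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
  }
  pairs.push("callback=" + encodeURIComponent(callbackName));
  var separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return baseUrl + separator + pairs.join("&");
}

// Append a script element to the page. The browser fetches and executes it
// regardless of the same origin policy, so the third party must be trusted.
function requestThirdPartyData(baseUrl, params, callbackName) {
  var script = document.createElement("script");
  script.type = "text/javascript";
  script.src = buildScriptUrl(baseUrl, params, callbackName);
  document.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

In a page that defines myFunction as above, a call might look like requestThirdPartyData("http://example.com/foo/", { hello: "world" }, "myFunction").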
 #3  
09-26-07, 11:12 PM
Peter Michaux
On Sep 26, 8:37 am, VUNETdotUS <vunet> wrote:
> My research I did a while ago showed there was no possibility to get
> web page content from a third-party website with AJAX only, without
> using a server side technology. Now I have to re-investigate this case
> and look for a workaround, perhaps, to allow client side to get the
> content of the external page, living on another server. In case you
> are wondering, there is no content stealing: both parties agree to
> exchange data.
> Please, advise if you know of any examples, links or suggestion as to
> how a client can request external page content.
> Thanks.


If your site is foo.com and the other is bar.net then you can play a
trick...

Set up the domain name servers so that bar.foo.com points to bar.net

Then in your JavaScript write

document.domain = 'foo.com';

Now you can make Ajax requests to both foo.com and bar.foo.com. It's
just like you can make requests to foo.com and bar.net.

This works around the XMLHttpRequest "same origin policy".

I believe that I read this trick on Ajaxian some time this year.

Peter
 #4  
09-26-07, 11:13 PM
David Mark
On Sep 26, 11:37 am, VUNETdotUS <vunet> wrote:
> My research I did a while ago showed there was no possibility to get
> web page content from a third-party website with AJAX only, without
> using a server side technology. Now I have to re-investigate
> this case

That's not true.

> and look for a workaround, perhaps, to allow client side to get the
> content of the external page, living on another server. In case you
> are wondering, there is no content stealing: both parties agree to
> exchange data.
> Please, advise if you know of any examples, links or suggestion as to
> how a client can request external page content.


The same way it requests any other content. It only fails if a user's
browser settings restrict cross-domain requests. Since this may rule
it out for your particular application, you can use dynamically
created script elements as described in a previous post.
 #5  
09-26-07, 11:22 PM
Thomas 'PointedEars' Lahn
Peter Michaux wrote:
> If your site is foo.com and the other is bar.net then you can play a
> trick...
>
> Set up the domain name servers so that bar.foo.com points to bar.net
>
> Then in your JavaScript write
>
> document.domain = 'foo.com';
>
> Now you can make Ajax requests to both foo.com and bar.foo.com. It's
> just like you can make requests to foo.com and bar.net.
>
> This works around the XMLHttpRequest "same origin policy".


It doesn't. This works for DOM Level 0 objects only.


PointedEars
 #6  
09-26-07, 11:25 PM
Peter Michaux
On Sep 26, 3:22 pm, Thomas 'PointedEars' Lahn <PointedE>
wrote:
> Peter Michaux wrote:
>> [...]
> It doesn't. This works for DOM Level 0 objects only.


What do you mean?

Peter
 #7  
09-27-07, 12:46 AM
Thomas 'PointedEars' Lahn
Peter Michaux wrote:
> [...] Thomas 'PointedEars' Lahn [...] wrote:
>> Peter Michaux wrote:
>>> If your site is foo.com and the other is bar.net then you can play a
>>> trick...
>>> Set up the domain name servers so that bar.foo.com points to bar.net
>>> Then in your JavaScript write
>>> document.domain = 'foo.com';
>>> Now you can make Ajax requests to both foo.com and bar.foo.com. It's
>>> just like you can make requests to foo.com and bar.net.
>>> This works around the XMLHttpRequest "same origin policy".

>> It doesn't. This works for DOM Level 0 objects only.

>
> What do you mean?


The Same Origin Policy was introduced with DOM Level 0 objects where
properties could be tainted; some properties were tainted and others were
not. The tainting was dropped later but the policy and affected properties
remained. Setting `document.domain' therefore was and is a way to work
around the SOP for those objects if there is the same second-level domain
(as you described).
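
As a rough illustration of that constraint: a page may only set document.domain to its own host name or to a parent domain of it. The helper below is a hypothetical sketch of that rule, not browser code, and its check for a bare top-level domain is deliberately crude.

```javascript
// Returns true if a page served from `hostname` could, in principle, set
// document.domain to `target`: the target must be the host name itself or
// a dot-separated suffix of it. Requiring a dot in `target` is a crude
// stand-in for the browser's refusal to accept a bare top-level domain.
function canRelaxDomainTo(hostname, target) {
  if (target.indexOf(".") === -1) {
    return false;
  }
  if (hostname === target) {
    return true;
  }
  var suffix = "." + target;
  return hostname.length > suffix.length &&
    hostname.lastIndexOf(suffix) === hostname.length - suffix.length;
}
```

Under these assumptions bar.foo.com may relax to foo.com, while bar.net may not, which is why the trick needs both hosts under the same second-level domain.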

[url down]

However, that does not work for XHR (as that is not part of DOM Level 0),
and that is, at least partly, a good thing.

http://web.archive.org/web/200504041...our#security
[url down]

This can be tested easily. Execute the following in the context of
<http://www.google.com/>:

try
{
  document.domain = "google.com";

  var x = new XMLHttpRequest();
  x.open("GET", "http://groups.google.com/", false);
  x.send(null);
  window.alert(x.responseText);
}
catch (e)
{
  // "Permission denied to call method XMLHttpRequest.open"
  // even though document.domain was set
  window.alert(e);
}

Tested with Firebug 1.05 on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7

It might be that some UAs work differently (although IE 6+7 and Opera 9.23
behaved much the same in my tests), however that would be a security issue
that would be fixed soon.


PointedEars
 #8  
09-27-07, 01:52 AM
Peter Michaux
On Sep 26, 4:46 pm, Thomas 'PointedEars' Lahn <PointedE>
wrote:
[..]
> // even though document.domain was set
> window.alert(e);
> }
>
> Tested with Firebug 1.05 on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
> rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7
>
> It might be that some UAs work differently (although IE 6+7 and Opera 9.23
> behaved much the same in my tests), however that would be a security issue
> that would be fixed soon.


Well, it looks like my memory has betrayed me on this one. I only
played with the document.domain property once over a year ago. I
looked around on Ajaxian and couldn't find the article I remember
reading about playing a trick with the domain name servers. There is
some trick of some kind out there somewhere.

Thanks,
Peter
 #9  
09-27-07, 02:02 AM
Peter Michaux
On Sep 26, 5:52 pm, Peter Michaux <petermich> wrote:
[..]
> Well, it looks like my memory has betrayed me on this one. I only
> played with the document.domain property once over a year ago. I
> looked around on Ajaxian and couldn't find the article I remember
> reading about playing a trick with the domain name servers. There is
> some trick of some kind out there somewhere.


Perhaps something like this...

<URL: http://www.xml.com/pub/a/2005/11/09/fixing-ajax-xmlhttprequest-considered-harmful.html?page=2>

In relation to my previous example, this shows using apache to proxy

[url down] to [url down]

Peter
 #10  
09-27-07, 03:27 PM
VUNETdotUS
Is IFRAME trick possible if I load IFRAME with src='anotherdomain.com'
and access with document.getElementById("myiframe").contentWindow?
It does get a reference to IFRAME object but I cannot find a way to
get its innerText or innerHTML property.
Thanks.
 #11  
09-27-07, 03:57 PM
Erwin Moller
VUNETdotUS wrote:
> My research I did a while ago showed there was no possibility to get
> web page content from a third-party website with AJAX only, without
> using a server side technology. Now I have to re-investigate this case
> and look for a workaround, perhaps, to allow client side to get the
> content of the external page, living on another server. In case you
> are wondering, there is no content stealing: both parties agree to
> exchange data.
> Please, advise if you know of any examples, links or suggestion as to
> how a client can request external page content.
> Thanks.
>


Hi,

As you have seen in the other responses, it will be tricky: changing
policy in browser, or using an iframe.
Which brings me to: WHY do you avoid serverside scripting?
Getting a random page via AJAX is simple WITH serverside scripting.

In PHP it would be as simple as:

<?php
// Fetches whatever URL the client posts; restrict it to the agreed hosts in real use.
$requestedPage = $_POST["requestedpage"];
echo file_get_contents($requestedPage);
?>

What is wrong with a little serverside help?
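
For completeness, a sketch of the client side that could talk to such a script. It assumes the PHP above is served from the page's own domain at a hypothetical path such as "/fetch.php", and that the browser provides a native XMLHttpRequest (older IE versions would need the ActiveX equivalent).

```javascript
// Encode the form body that the PHP snippet reads from $_POST["requestedpage"].
function buildProxyBody(pageUrl) {
  return "requestedpage=" + encodeURIComponent(pageUrl);
}

// POST to the same-origin script and hand the fetched page to onDone.
// proxyPath is a hypothetical location for the PHP snippet.
function fetchViaProxy(proxyPath, pageUrl, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", proxyPath, true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.onreadystatechange = function () {
    // readyState 4 means the response is complete.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onDone(xhr.responseText);
    }
  };
  xhr.send(buildProxyBody(pageUrl));
  return xhr;
}
```

Because the request goes to the page's own server, the same origin policy does not get in the way; the cost is that every fetch passes through that server.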

Regards,
Erwin Moller
 #12  
09-27-07, 04:40 PM
VUNETdotUS
On Sep 27, 10:57 am, Erwin Moller
<Since_humans_read_this_I_am_spammed_too_m> wrote:
[..]
>
> <?php
> $requestedPage = $_POST["requestedpage"];
> echo file_get_contents($requestedPage);
> ?>
>
> What is wrong with a little serverside help?
>
> Regards,
> Erwin Moller


It is a very high traffic problem.
 #13  
09-27-07, 04:56 PM
Peter Michaux
On Sep 27, 7:27 am, VUNETdotUS <vunet> wrote:
> Is IFRAME trick possible if I load IFRAME with src='anotherdomain.com'
> and access with document.getElementById("myiframe").contentWindow?
> It does get a reference to IFRAME object but I cannot find a way to
> get its innerText or innerHTML property.


If the second-level domains do not match then you cannot get around
the same origin policy. A page from foo.com cannot inspect the
contents of an iframe from bar.net.
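
A small sketch of what that looks like from script. The helper name is made up for illustration, and the exact failure mode (an exception or an inaccessible document) varies between browsers, so treat this as an assumption-laden probe and not a definitive test.

```javascript
// Try to read the markup of a frame's document. For a frame from another
// origin the property access is expected to throw (or find no document),
// in which case null is returned instead of the markup.
function readFrameHtml(frame) {
  try {
    return frame.contentWindow.document.body.innerHTML;
  } catch (e) {
    return null;
  }
}
```

Called as readFrameHtml(document.getElementById("myiframe")), it would return the markup for a frame from the page's own origin and null for one loaded from another domain.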

Peter
 #14  
09-28-07, 05:09 AM
Aaron Saray
On Sep 27, 10:56 am, Peter Michaux <petermich> wrote:
> On Sep 27, 7:27 am, VUNETdotUS <vunet> wrote:
>
> > Is IFRAME trick possible if I load IFRAME with src='anotherdomain.com'
> > and access with document.getElementById("myiframe").contentWindow?
> > It does get a reference to IFRAME object but I cannot find a way to
> > get its innerText or innerHTML property.

>
> If the second-level domains do not match then you cannot get around
> the same origin policy. A page from foo.com cannot inspect the
> contents of an iframe from bar.net.
>
> Peter


You can do some fun DNS/subdomain tricks - but it requires
communication with the mashup service providers.

*shameless self promotion* check this link:
[url down]
I pointed to two of the technical articles, and then filled in the
blanks on the things I thought they were missing.