Archive for the ‘Standards Participants’ Category

Proposing HTTP Request Forwarding

Thursday, March 8th, 2007

I’ve been monitoring the ietf-http-wg mailing list and have noticed there is renewed activity around revising RFC 2616, the HTTP/1.1 specification. This renewed activity got me thinking it was time to discuss the need for HTTP Request Forwarding.

I’ll start by saying it is possible that Request Forwarding is already part of the HTTP spec and that I just overlooked it. Lord knows I’ve read enough W3C and IETF specs lately to raze a small forest if printed, so I could easily have missed that part as I read bleary-eyed through the specs. But I’ve asked the question privately of a few people that should know and they all said that HTTP does not support request forwarding; one of them pointed out that that is why VOIP needed SIP.

If the term “HTTP Request Forwarding” isn’t obvious from context, let me give a use-case to illustrate:

USE-CASE: Retaining control of URL Interfaces when outsourcing image hosting

  1. http://example.com
  2. http://image-servers-r-us.net
  3. http://example.com/index.html
  4. http://example.com/images/splash.png
  5. http://image-servers-r-us.net/example.com/splash.png

Assume client “A”, server “B” mapped to [1] and server “C” mapped to [2] (servers “B” and “C” are different computers with different IP addresses probably at different locations.) For this use case client “A” requests the HTML file at [3] which includes an <img> tag pointing to a graphic at [4].

The request from client “A” for the .html file at [3] goes to server “B” and the response is returned to client “A.” After parsing [3], client “A” realizes it also needs to download the image at [4] and requests the .png file from server “B” at [1].

[Figure: HTTP Request Forwarding. Note the use of HTTP 1.2 to make the URL in the example explicit.]

However, server “B” knows that the .png file [4] is actually located on server “C” at [5], so it forwards the request to server “C” at [2]. Server “C” responds by returning the .png file directly back to client “A” and client “A” is none the wiser; i.e. client “A” still thinks that the image was returned by [4].

In addition, if client “A” is a browser and the user inspects the properties of the image, the user would find the URL to be [4], not [5]. Nowhere in the response given to the client would there be information that the image came from [5] instead of [4], except that the IP address of the responding server differs from that of the server the request was sent to. It is possible that a header could contain the forwarding information, but the client would not need to act on it except potentially for debugging.
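To make the flow concrete, here is a toy Python model of the exchange described above. It is not real HTTP and does not touch the network; it only tracks which URL the client asked for and which host supplied the bytes. The function names, the `forwarded_to` field (standing in for the optional debugging header), and the placeholder file contents are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    requested_url: str                  # URL client "A" asked for and still sees
    served_by: str                      # host that actually supplied the bytes
    body: bytes
    forwarded_to: Optional[str] = None  # hypothetical debugging header


# Server "C" [2]: holds the outsourced image at [5]. Contents are placeholders.
SERVER_C_FILES = {
    "http://image-servers-r-us.net/example.com/splash.png": b"PNG-DATA",
}

# Server "B" [1]: serves the page itself and knows that [4] really lives at [5].
SERVER_B_FILES = {
    "http://example.com/index.html": b"<img src='/images/splash.png'>",
}
SERVER_B_FORWARDS = {
    "http://example.com/images/splash.png":
        "http://image-servers-r-us.net/example.com/splash.png",
}


def server_c_handle(requested_url: str, actual_url: str) -> Response:
    """Server "C" answers under the URL the client originally requested."""
    return Response(
        requested_url=requested_url,
        served_by="image-servers-r-us.net",
        body=SERVER_C_FILES[actual_url],
        forwarded_to=actual_url,
    )


def server_b_handle(requested_url: str) -> Response:
    """Server "B" either serves the file itself or forwards the request."""
    target = SERVER_B_FORWARDS.get(requested_url)
    if target is not None:
        # In this toy model the return value stands in for "C" replying
        # directly to the client; "B" never sees or relays the image bytes.
        return server_c_handle(requested_url, target)
    return Response(requested_url, "example.com", SERVER_B_FILES[requested_url])


page = server_b_handle("http://example.com/index.html")
image = server_b_handle("http://example.com/images/splash.png")
print(image.requested_url, "served by", image.served_by)
```

The point of the model is the last line: the client still sees [4] as the image URL even though the bytes came from server “C”.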

There are several other use-cases as well, such as making it possible to fully virtualize a domain authority’s URL interface. However, the use-case of outsourcing images or any large static content illustrates a benefit that I believe should be obvious to most people. For those who are interested, especially if you are involved in updating the spec, I could detail other use-cases.

Now it is possible there are security implications with this that I have not considered. If so, I would hope that we could at least explore potential ways to mitigate those issues as opposed to immediately dismissing HTTP Request Forwarding out of hand.

In summary, adding a Request Forwarding functionality to HTTP would allow servers to maintain control of their URL interfaces while still being able to distribute loads and services to appropriate servers. Having such a feature could improve consistency in URL design and make it easier for website owners to restructure their websites without the level of broken links seen on the web today.

Interestingly, on Tuesday Scott Hanselman blogged about just this problem from a slightly different perspective; I present Scott’s post for your consideration.

URL Quote #2: Think about your website’s “public face.”

Saturday, March 3rd, 2007
“…one should take an hour or so and really think about their website’s ‘public face.’”

-Scott Hanselman on “A Website’s Public Face”

URLQuiz #1: To .WWW or not to .WWW?

Monday, February 19th, 2007

As promised, this is the first of what will be many URLQuizzes here at the blog for The Well Designed URLs Initiative. This URLQuiz discusses the convention of using a subdomain with the name ‘www’ to identify a website.

As most everyone knows, many of the first sites on the web started using this convention. Examples include www.amazon.com, www.yahoo.com, www.google.com, and www.ebay.com. However, there is nothing about the web that requires a subdomain be named ‘www’ when selecting the address for a website. To the contrary, many websites use other subdomains for prefixes such as:

There is even a passionate contingent of web developers who believe the ‘www’ convention is an anachronism and should be deprecated (or ‘eventually abolished’, in layman’s terms).

So how should the base domain and subdomain(s) be handled, and what are the pros and cons of each? Here are the options I’ve identified, but feel free to suggest others that come to mind as well:

  1. Establish the ‘www’ form as the implicit canonical form and issue a 404 - Not Found whenever an inbound request attempts to dereference a URL using the root domain (i.e. without ‘www’ or any other subdomain).
  2. Establish the non-‘www’ form as the implicit canonical form and issue a 404 - Not Found whenever an inbound request attempts to dereference a URL using the ‘www’ subdomain.
  3. Establish the ‘www’ form as the implicit canonical form and use a 301 - Moved Permanently (redirect) whenever an inbound request attempts to dereference a URL using the root domain (i.e. without ‘www’ or any other subdomain).
  4. Establish the non-‘www’ form as the implicit canonical form and use a 301 - Moved Permanently (redirect) whenever an inbound request attempts to dereference a URL using the ‘www’ subdomain.
  5. Do not establish a canonical form and return 200 - OK for both the ‘www’ form and the non-‘www’ form.
  6. Abandon both the ‘www’ form and the non-‘www’ form and always use explicit subdomains based on your site organization, like in the examples shown above.
  7. Some combination of 1 through 6 I haven’t already described.
  8. Or, something completely different?
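
For anyone weighing the options, the following Python sketch shows how options 1 through 5 differ in what a server hands back for the non-canonical host. It is only an illustration, not a recommendation of any option; the `respond` function, its `policy` names, and the `www.example.com` default are hypothetical.

```python
from typing import Optional, Tuple


def respond(host: str, path: str,
            canonical_host: str = "www.example.com",
            policy: str = "redirect") -> Tuple[int, Optional[str]]:
    """Return (status, Location header or None) for a request to host + path.

    policy "not_found"  models options 1 and 2 (404 on the other form)
    policy "redirect"   models options 3 and 4 (301 to the canonical form)
    policy "serve_both" models option 5 (200 for both forms)
    Which of 1/2 or 3/4 applies depends on what canonical_host is set to.
    """
    if host == canonical_host:
        return 200, None
    if policy == "not_found":
        return 404, None
    if policy == "redirect":
        return 301, f"http://{canonical_host}{path}"
    if policy == "serve_both":
        return 200, None
    raise ValueError(f"unknown policy: {policy}")


# Option 3: 'www' is canonical, the root domain redirects to it.
print(respond("example.com", "/about"))
# Option 4: the root domain is canonical, 'www' redirects to it.
print(respond("www.example.com", "/about", canonical_host="example.com"))
```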

So there you go; give your answer(s) in the comments. Though I definitely have my opinions on the subject I will stay out of it unless I don’t see anyone mentioning several of the points I think are relevant. After enough comments come in, I’ll summarize and write a follow up post, just like Dan Cederholm did with SimpleQuiz.

Hint: You might want to consider not only online usage but offline usage as well.

UPDATE: Just days after writing this post Tim Bromhead wrote: Which is better for your site: www or no www?  Is that weird or what? Tim must have had some kind of a Vulcan Mind Meld or similar going on… Anywho, great article Tim and thanks for being a URLian!

UPDATE#2: Looks like I picked the right time to discuss this issue! A few days ago Scott Hanselman talked about the downside of ignoring the distinction between ‘www’ and the root domain, Jeff Atwood discussed how to solve it, to which Phil Haack then responded with a bit of a rant about the www or lack thereof. Since they both have such strong yet opposite opinions on the subject, maybe we can get both Jeff and Phil to weigh in on the subject over here…?

Technorati Tags: URL Design | Subdomains | Canonical Form | www | no-www

Use rel="spam" to Fight Comment Spam?

Thursday, February 8th, 2007

As I was going through my Akismet spam filter today reviewing the 87 comment spam I got during the prior ~24 hours to ensure I didn’t delete any legitimate comments, it occurred to me that maybe there is a simple solution to comment spam.

What if blog apps could simply mark a hyperlink with the following?

rel="spam"

The simple idea is that rather than delete spam, blogs could start maintaining a special page of links to comment spammers’ websites using rel="spam" on the “A” element. Basically this would be PageRank in reverse. The search engines would then apply negative weighting to anything marked spam and give the spammers the exact opposite of what they were pursuing when they unethically tried to game the system!

For example, Google could give negative PageRank for a spam link compared to positive PageRank for a non-spam link. Google could also weight the relevancy of the link text negatively vs. the positive value it would give a non-spam link. This would have the effect of distributing the watch-dogging of spammers out onto the web without requiring any new infrastructure, and it would create a clear disincentive for comment spammers instead of the lack of disincentive from “nofollow.”

Are there problems with this I’m not foreseeing?  Probably.  I already know that people would try to game the system for negative purposes, and that’s to be expected. Still, I think that for the most part anyone simply using it to act on a grudge or in an attempt to harm a competitor would be doing it by definition on such a small scale that it would have no effect. Given that many comment spammers automate, they can end up with huge numbers of comment spam links. If the search engines merely gave a spam link a negative weight 1/10th the size of a positive link’s weight, it would certainly still be effective.

Of course the hard-core Linux faithful would immediately spam-link to Microsoft.com just to spite them! But I really don’t (currently) see how that couldn’t be detected and managed via policies and algorithms. For example, if a company has a large number of positive links it could be exempt from the effects of spam links. And I’m sure automated methods or methods using collective intelligence could emerge to resolve these problems the vast majority of the time. The rest could be handled via policy; get caught spam-linking someone inappropriately and get your domain pulled from the index!
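
As a back-of-the-envelope illustration of the weighting idea, here is a small Python sketch. It is emphatically not how PageRank or any real search engine works; it just counts links, treats each spam link as minus 1/10th of a positive link, and optionally exempts sites with many positive links. The `link_score` function, its parameter names, and the sample numbers are all made up for the example.

```python
from typing import Optional


def link_score(positive_links: int, spam_links: int,
               spam_divisor: int = 10,
               exempt_threshold: Optional[int] = None) -> float:
    """Toy score: each spam link subtracts 1/spam_divisor of a positive link.

    If exempt_threshold is set, sites with at least that many positive
    links ignore spam links entirely (the exemption policy floated above).
    """
    if exempt_threshold is not None and positive_links >= exempt_threshold:
        return float(positive_links)
    return positive_links - spam_links / spam_divisor


# An automated comment spammer: few real links, thousands of flagged ones.
print(link_score(positive_links=5, spam_links=2000))   # -195.0
# A small site hit by a grudge: a handful of spam links barely registers.
print(link_score(positive_links=300, spam_links=10))   # 299.0
# A heavily linked site targeted out of spite, with the exemption applied.
print(link_score(50000, 100000, exempt_threshold=10000))  # 50000.0
```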

What’s more, it would give bloggers a sense of purpose when they review their spam filters instead of them feeling like the time spent was just a waste. I know that if my efforts to detect comment spammers could get them lower PageRank, I’d feel good about monitoring my comments for spam as I would be doing a service for the public good. And I’m sure most other bloggers would feel the same.

Now I know that Microformats.org has the similar proposal VoteLinks, but that is about registering opinion as opposed to calling out gamers of the system. VoteLinks is also much broader than what I’m suggesting.  If we keep the focus really narrow (shine a spotlight on spam so that the search engines can eradicate it) then I’m pretty sure it would be a success.

What do you think?  Good idea?  Filled with holes I’ve not considered?  I look forward to your feedback.

Proposing URI Templates for WebForms 2.0

Thursday, January 11th, 2007

I recently had an off-list email conversation with Ian Hickson, the editor of the Web Hypertext Application Technology Working Group specifications (i.e. HTML5 and WebForms 2.0). I was proposing to him that the current WebForms 2.0 draft specification be amended to allow a URI Template in the “action” attribute of the FORM element. Because I believe so strongly in the benefit of this proposal and because such things are in line with what the Well Designed URLs Initiative was envisioned to advocate for, I decided to publish it to our blog and reference it in the WHATWG blog. The following is what I sent to Ian in email:

I really want to see WHATWG incorporate URI Templates for Web Form actions[1]. i.e.:

<form
action="http://foo.com/{make}/{model}/"
method="get">
<input type="text" name="make" />
<input type="text" name="model" />
<input type="submit" />
</form>

If I type “Honda” and “Civic”, it will do a GET to:

http://foo.com/Honda/Civic/

Instead of the only current possibility being something like:

<form
method="get"
action="http://foo.com/cars.php">
<input type="text" name="make" />
<input type="text" name="model" />
<input type="submit" />
</form>

Which would produce the following for “Honda” and “Civic”:

http://foo.com/cars.php?make=Honda&model=Civic

To which Ian replied in two parts. Here is the first part:

“Why not just write a server-side redirector? That’s a trivial one to write. Four lines of code, maybe 10 if you make the recommended security checks first. You could also do it with a little bit of JavaScript.”

Unfortunately, a server-side redirector is a poor fit for one of the use-cases this proposal would address and simply doesn’t work for two others:

  1. Server-side redirection requires two round trips instead of one. I don’t believe any reasonably competent web architect would allow a redirect on every request for a high-traffic site. Consider any search engine, such as the flagship offering of Ian’s employer; is Google likely to implement a redirect on every search request? Not likely. Server-side redirection increases response time (doubles it?) and increases (doubles?) the number of concurrent requests that servers must handle. Using server-side redirection would probably also increase bandwidth requirements by a measurable amount.
  2. It is not possible via HTTP to redirect the body of a POST. Consequently, the following use-case cannot be duplicated with a server-side redirect:

    <form
    action="http://www.myblog.com/{topic}/"
    method="post">
    <select name="topic">
    <option value="first">My 1st Post</option>
    <option value="second">My 2nd Post</option>
    <option value="third">My 3rd Post</option>
    </select>
    <input type="text" name="comment">
    <input type="submit">
    </form>

  3. For those wanting to create a form that directs to a website they do not control, a server-side redirect is absolutely not an option. For example, assume that I wanted to run a page on my website that lets people navigate to the topics on the WHATWG blog using a FORM with a SELECT. As WebForms 2.0 is currently specified, that is simply not possible without JavaScript (addressed in a moment), but it would be easy using a template (note I left off the closing “</option>” tags for formatting reasons):

    <form
    action="http://blog.whatwg.org/{topic}"
    method="post">
    <select name="topic">
    <option value="feed-autodiscovery">
    Feed Autodiscovery
    </option>
    <option
    value="text-content-checking">
    textContent Checking
    </option>
    <option value="checker-bug-fixes">
    Bug Fixes
    </option>
    <option
    value="significant-inline-checking">
    Significant Inline Checking
    </option>
    <option value="charmod-norm-checking">
    Charmod Norm Checking
    </option>
    <option
    value="proposing-features">
    Proposing features
    </option>
    </select>
    <input type="submit">
    </form>

  4. My belief is having this capability would encourage a lot more linking of this type between pages on the web.

So yes, server-side redirection is possible in some cases but by no means all, and for those cases where it’s possible it is not optimal.
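
For reference, the kind of redirector Ian describes might look roughly like the Python sketch below, for the cases where it does apply. This is my own guess at what he had in mind, not his code: the `redirect_to_clean_url` name, the choice of a 302 status, and the specific checks are assumptions. Note that even in the best case the client must make a second request to the returned location, which is the two-round-trip cost described in point 1.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, quote, urlsplit


def redirect_to_clean_url(request_url: str) -> Tuple[int, Optional[str]]:
    """Map .../cars.php?make=X&model=Y to (302, ".../X/Y/").

    Returns (400, None) when a field is missing, blank, or repeated.
    """
    parts = urlsplit(request_url)
    params = parse_qs(parts.query)  # blank values are dropped by default
    try:
        (make,) = params["make"]    # exactly one value required
        (model,) = params["model"]
    except (KeyError, ValueError):
        return 400, None
    # safe="" also encodes "/", so a value cannot add extra path segments.
    path = f"/{quote(make, safe='')}/{quote(model, safe='')}/"
    return 302, f"{parts.scheme}://{parts.netloc}{path}"


print(redirect_to_clean_url("http://foo.com/cars.php?make=Honda&model=Civic"))
```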

Moving on to Ian’s suggestion to use “a little bit of JavaScript” to meet this use-case, I will admit it is possible to use JavaScript, but these are the drawbacks in viewing JavaScript as the solution for this use-case:

  1. JavaScript is simply not allowed in a wide variety of web applications such as wikis, forums, and other sites that solicit community content although many of these do allow HTML elements such as FORM.
  2. Some users disable scripting on certain sites, often by decree of their employers.
  3. JavaScript’s cross-browser compatibility issues make it less reliable and people are far less likely to depend on it when money is at stake, and forms are frequently used in those contexts.
  4. “User agents with no scripting support” from Section 1.3 Conformance Requirements of the Web Applications 1.0 Working Draft (HTML5) that incorporates WebForms 2.0. Need I say more?
  5. Declarative code is far easier to debug than procedural code.
  6. Far more people know HTML than JavaScript given the much greater skill required to master the latter, and that is unlikely to change.

It’s interesting to note that in the preface to the introduction for Section 3 of the WebForms 2.0 Working Draft of 12 October 2006, the following note is made about how everything that repeating form controls offers can already be done in JavaScript and the DOM. The mere fact that they went to the trouble to include something as complex as repeating controls into HTML5 when it can be done with JavaScript and the DOM implies that well-known patterns in web architecture are better implemented declaratively instead of via JavaScript and the DOM:

Occasionally forms contain repeating sections. For example, an order form could have one row per item, with product, quantity, and subtotal controls. The repeating form controls model defines how such a form can be described without resorting to scripting.

Note: The entire model can be emulated purely using JavaScript and the DOM. With such a library, this model could be used and down-level clients could be supported before user agents implemented it ubiquitously. Creating such a library is left as an exercise for the reader.

So yes, it is possible to use JavaScript in many cases, but it is nowhere near optimal. JavaScript should not be considered the solution for well-defined and obvious patterns such as submitting to a clean URL.

To further drive home the value of this proposal, anyone monitoring the REST-discuss list for any length of time will see that most REST experts tend toward using (what I call :) well-designed URLs, i.e. URLs where the resource is identified by path instead of query string. With WebForms 2.0’s pending support of PUT and DELETE, it would be just short of a crime not to include support for posting to clean URLs in WebForms 2.0.

Since having this discussion with Ian via email, it was pointed out to me on rest-discuss by Mark Baker that my proposal as written would break the existing web and so was a non-starter. For some reason I wasn’t thinking about that, probably because I was more concerned about getting Ian (whom I like to call Mr. “No” :) to agree that URI Templates were needed. Still, the solution would be simple.

What follows are my examples from above recast using an optional template attribute that would override the action attribute for WebForms 2.0 compliant browsers. This would of course require the server to accept both query string parameters and clean URLs (and hopefully do a server redirect from the former to the latter), or the submit could be implemented using JavaScript for older browsers when applicable. Note that I didn’t show an example using JavaScript but, as the WebForms 2.0 spec says, “(that) is left as an exercise for the reader”:

  1. <form
    action="http://foo.com/model"
    template="http://foo.com/{make}/{model}/"
    method="get">
    <input type="text" name="make" />
    <input type="text" name="model" />
    <input type="submit" />
    </form>

  2. <form
    action="http://www.myblog.com/topic"
    template="http://www.myblog.com/{topic}/"
    method="post">
    <select name="topic">
    <option value="first">My 1st Post</option>
    <option value="second">My 2nd Post</option>
    <option value="third">My 3rd Post</option>
    </select>
    <input type="text" name="comment">
    <input type="submit">
    </form>

  3. <form
    action="http://blog.whatwg.org/topic"
    template="http://blog.whatwg.org/{topic}"
    method="post">
    <select name="topic">
    <option value="feed-autodiscovery">
    Feed Autodiscovery
    </option>
    <option
    value="text-content-checking">
    textContent Checking
    </option>
    <option value="checker-bug-fixes">
    Bug Fixes
    </option>
    <option
    value="significant-inline-checking">
    Significant Inline Checking
    </option>
    <option value="charmod-norm-checking">
    Charmod Norm Checking
    </option>
    <option
    value="proposing-features">
    Proposing features
    </option>
    </select>
    <input type="submit">
    </form>
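
To show how a browser might choose between the two attributes, here is a minimal Python sketch covering the GET case only. Nothing here comes from the WebForms 2.0 draft: the `submission_url` function, the `supports_templates` flag, and in particular the decision to append fields not named in the template as a query string are assumptions on my part, since the proposal above does not say what should happen to them.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote, urlencode

_VARIABLE = re.compile(r"\{(\w+)\}")  # matches {make}, {model}, {topic}, ...


def submission_url(action: str, fields: Dict[str, str],
                   template: Optional[str] = None,
                   supports_templates: bool = True) -> str:
    """Compute the URL a GET form submission would request.

    Older browsers (or forms without a template) fall back to the classic
    action?name=value behavior. A field named in the template but missing
    from `fields` raises KeyError.
    """
    if template is None or not supports_templates:
        return f"{action}?{urlencode(fields)}"

    used = set()

    def fill(match):
        name = match.group(1)
        used.add(name)
        # safe="" encodes "/" too, so a value stays within one path segment.
        return quote(fields[name], safe="")

    url = _VARIABLE.sub(fill, template)
    # Assumption: fields not consumed by the template go in the query string.
    leftover = {k: v for k, v in fields.items() if k not in used}
    return f"{url}?{urlencode(leftover)}" if leftover else url


fields = {"make": "Honda", "model": "Civic"}
template = "http://foo.com/{make}/{model}/"
print(submission_url("http://foo.com/model", fields, template))
print(submission_url("http://foo.com/model", fields, template,
                     supports_templates=False))
```

The first call yields the clean URL and the second yields the query string form, which is why the server would need to accept both.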

So in summary I really hope that Ian, who definitely seems to be the gatekeeper for what goes into HTML5 and what doesn’t go into HTML5, can see his way clear to add this feature to WebForms 2.0. If his main issue with it is needing to have it written up for inclusion in the spec, I’m more than happy to help.

Intro, Part 11: Each Post will Identify Audience

Wednesday, December 27th, 2006

Here at the Well Designed URLs Initiative we plan to address a wide audience and cover a plethora of URL-related topics. If it wasn’t obvious from yesterday’s post, we plan to publish content for a variety of roles, so we will categorize all our posts by the audience we are targeting.

Using our audience categories you can subscribe to our RSS feed and then configure your feed reader to filter out all but those topics which are likely to appeal to you, so as not to be overwhelmed by the rest. Some of our audience categories, such as Everyone and Internet Professionals, encompass other categories, so we plan to tag only the highest level category that applies to avoid duplication. For example, if you are a web developer you might want to filter out all but the Everyone, Internet Professionals, and Web Developers categories. Of course, if that’s too much trouble, just subscribe to our entire feed and ignore those posts that don’t interest you.

The following is the list of categories we’ve set up by audience role:

NOTE: If you read this post shortly after it is published, most of those links above will just redisplay this post. Let me explain. Because of the way our WordPress blog software works, those links would have displayed a 404 Not Found error if no posts existed for the given category. To avoid that I’ve tagged this post with all audience categories, contradicting what I said above: that we would only put a post in its highest level category. Moving forward we shouldn’t need to do this again.