URL Quote #2: Think about your website’s “public face.”
Saturday, March 3rd, 2007“…one should take an hour or so and really think about their website’s ‘public face.’”
“…one should take an hour or so and really think about their website’s ‘public face.’”
I was planning to blog something else today, but Mark Nottingham of Yahoo made a statement about URL Design [1] in his post entitled REST Issues, Real and Imagined and I simply could not let his statement without comment.
But first let me say I always appreciate Mark’s perspective on issues, and enjoy reading his writings whether on his blog or in the mailing lists. His perspective is typically insightful and prescient, and he hovers above the hovers above the muck-racking and unsupported claims that can occur on mailing lists populated by egos. All in all his involvement is very professional, and I highly respect him for that,
That said, here is the comment me made that really bothered me (2nd and subsequent emphasis mine):
Red Herring: URI Design
When somebody first “gets” REST, they often spend an inordinate amount of time agonising over the exact design of the URIs in their application; take a look over on rest-discuss, for example. In the end, though, URI design is a mostly a cosmetic issue; sure, it’s evidence that you’ve thought about good resource modelling, and it makes things more human-intelligable, but it’s seldom worth spending so much time on it.I’d worry a lot more about cacheability, extensibility and well-defined formats before blowing out my schedule on well-designed URIs. For me, the high points are broadly exploiting the hierarchy, allowing relative references, and making sure that tools (e.g., HTML forms) can work well with my URIs; everything else is gravy.
I’ll start by saying I don’t really disagree significantly with his overall points that I believe he was making, such as the fact that there are other aspects that deserve attention in addition to URL design and also the fact some people appear to thrash when designing URLs. But to someone who does not appreciate URL design his statement could be easily misconstrued to mean that URI Design is not at all important, especially when he says it is mostly a cosmetic issue. That is false.
URI design is NOT merely a cosmetic issues! It has many tangible ramifications and those who ignore it do so at their own peril. Just scratching the surface, proper URL planning and design provides a framework for good information architecture, can facilitate spontaneous inbound linking via blogs, voting, tagging, and other social media sites, and can guard against broken URLs and subsequent traffic loss, to name but a few. They are issues that concern, or should concern everyone who publishes a site on the world wide web.
But what’s worries me most are the people who will certain latch onto Mark’s words as not only justification for ignoring patterns and best practices but also for antagonistically preaching against URL Design. This group includes web and content management system developers who would prefer not to be bothered with usability issues, system administrators who believe in security by obscurity (which itself is a fool’s precaution), dogmatists who misinterpret the principle of URI opacity and preach that web publishers should publish completely opaque URLs, opaque even to the web publishers themselves, and a tiny but vocal contingent that for reasons I cannot fathom argue against URI design even within totally unrelated conversations. It is for this reason I think Mark’s statement is potentially very damaging.
And as for his comments about rest-discuss, it is quite possible he was referring to conversations in which I participated. If so, I believe he mistook the crux of the conversation on several levels. The first was that in many cases I was advocating URL design, not agonizing over it. Certainly, URI design is really not that hard, it just takes understanding core principles and best practices, and then applying them. And secondly, some people escalate conversations to raging debates when simple questions are asked about proper URL usage in context of REST and Web Architecture. Mark could easily yet wrongly have misinterpreted those as too much hand-ringing over URL design.
In summary I don’t really fault Mark’s comments as I appreciate them to be. And I think Mark does a great service for the web in his quest to educate people about the value of REST. But as Mark is justifiably well respected I’m worried Mark’s comments may be used to rationalize bad URL Design. As such, I definitely hope he updates his post to guard against his word’s misuse.
P.S. Oh, and I also couldn’t help but wonder if Mark was trying to get in a cheap dig when he wrote “…before blowing out my schedule on well-designed URIs“…? But naaaaah, Mark’s a real professional and wouldn’t go for such an underhanded shot. ;-)
If you’ve read many of our other posts here at The Well Designed URLs Initiative, you know that we are strong advocates for User-Centered URL Design as well as for URL Literacy. It’s our contention that the URL is woefully under-appreciated as the most fundamentally important technology of the web, more important than HTTP, and even more important than HTML. The purpose of this post is to provide background for future posts explaining URL design importance from a perspective most website owners can appreciate; search engine rankings!
For those unfamiliar with Google’s core algorithm for determining its search engine results, you can read this article to learn about PageRank in more painstaking detail. Here, I’ll just try to explaining the aspects of PageRank as it relates to URLs. Note also that my explanations are simply meant to be a conceptual guide and not exacting details. The founders of Google did publish their initial algorithms but have since made tweaks that are as closely held a corporate secret as the formula for Classic Coke!
PageRank ias essentially a popularity rating, and a page’s PageRank is determined by the inbound links from other pages on the web. A PageRank can be as low of almost zero (0) to as high as ten (10). Google’s algorithms determine a page’s PageRank by dividing the PageRank for each of the inbound linking pages by the number of outbound links on each of those pages, factoring in each page’s PageRank, and then summing the results for all inbound links. Clear as mud, right? It’s easier to explain with an example, but let’s cover a bit more ground first.
PageRank considers each link a ‘vote’ for the page linked to. But unlike in a democratic “one citizen, one vote” society, Google’s algorithm more closely models the shareholders of a corporation voting their shares; the votes of those with “more” (PageRank or shares) have a greater influence on the outcome. So a link from a page with a PageRank of 7 is more valuable than a link from a page of PageRank 3; probably many orders of magnitude more valuable, as you’ll see next.
Because of the nature of the web, a small number of pages have a huge number of inbound links, and vice versa. So those with more links get more PageRank, but the value of PageRank is on a logarithmic scale thus it increases exponentially. Assuming[1] that the base were five (5), the value a page would get to vote based on it’s PageRank would look like this:
| PageRank | Value |
|---|---|
| 0 | 0 |
| 1 | 5 |
| 2 | 25 |
| 3 | 125 |
| 4 | 625 |
| 5 | 3,125 |
| 6 | 15,625 |
| 7 | 78,125 |
| 8 | 390,625 |
| 9 | 1,953,125 |
| 10 | 9,765,625 |
Assume a site somehow manages to get a persistent link from MySpace’s home page (www.myspace.com). At the moment contains MySpace’s home page contains about 70 outbound links and has a PageRank of seven (7). Let’s also assume that there are a total of 50 other inbound links, and let’s say the average PageRank for those pages linking in is three (3) and those pages have an average outbound link count of 10. From this, let’s calculate PageRank:
Looking it up in the table, the resultant PageRank for the home page is four (4).
As with the three ‘L’s of real estate, the three ‘P’s of inbound links are: PageRank, PageRank, PageRank! [2] Note how in the prior example the 50 inbound links of PR3 offered less PageRank than the one (1) inbound link from MySpace with PR7! Of course we don’t know the logarithmic base [1] but Phil Craven says 5 or 6 are what many people believe it to be.
Here is what it would look like with base two (2) through ten (10) (download the full calculations here as a zipped Excel 2003 file [4kb]):
| Logarithmic Base |
Value from PR3 * 50 / 10 |
Value From PR7 * 1 / 70 |
Total Value |
Resultant PageRank |
|---|---|---|---|---|
| 2 | 40 | 2 | 42 | 5 |
| 3 | 135 | 31 | 166 | 4 |
| 4 | 320 | 234 | 554 | 4 |
| 5 | 625 | 1,116 | 1741 | 4 |
| 6 | 1,080 | 3,999 | 5,079 | 4 |
| 7 | 1,715 | 11,765 | 13,480 | 4 |
| 8 | 2,560 | 29,959 | 32,519 | 4 |
| 9 | 3,645 | 68,328 | 71,973 | 5 |
| 10 | 5,000 | 142,857 | 147,867 | 5 |
So depending on the logarithmic base, PageRank fluctuates between four (4) and five (5) for this hypothetical example. However, starting with a logarithmic base of five (5) the one MySpace link overpowers the 50 others! And because pages with a PageRank closer to 10 are listed higher in Google’s search engine results page among competing pages, people focused on SEO are always trying to increase their page’s PageRank, often via unscrupulous means.
Of course nobody outside Google knows the exact formula or base exponents used, but hopefully this post illustrates the value of links from high PageRank pages.
However, I would be remiss if I didn’t point out that a single-minded focus on inbound links is fraught with peril, not the least because it might cause your pages to removed from Google’s index! Just as there are people selling weight loss products they claimn don’t require dieting or exercise, there are people offering ways to inbound links that don’t require having real people link to you. However, Google considers these shortcuts to be gaming the system is ever vigilant to discover those cheaters. If caught cheating, Google will ban your pages from their index without notice.
The best way to gain inbound links for your key pages on your website is to do the hard work of creating a site with great content that people want to link to.
So as an epilogue, getting inbound links is clearly necessary for high PageRank and thus good search engine results, but all those inbound links can be squandered without a good architecture and site management plan. To ensure that a site’s great content and popularity get reflected in appropriately high search engine ranking it’s critical to optimize the architecture of the site, the pages, and the URL structure as well as make plans for how the URL structure might change over time.
Personally, I think the most under-realized aspect of white hat SEO [3] is the lack of attention paid to URL planning and design, especially for larger websites. There are very few tools [4] besides the low-level and effectively simplistic URL rewriters like mod_rewrite for creating and maintaining a URL plan, very few articles [4] that discuss URL design, and no articles [4] I am aware of that discuss URL planning.
However, I believe website owners will see huge improvements compared to their prior rankings if they focus on URL design and create a URL management plan. The good news is that URL design is mostly a one time endeavor assuming site maintainers adhere to the management plan, at least until there is a full site rearchitecture.
But all of the whys and wherefores regarding URL planning and design are beyond the scope of this post, and instead will be the subject of many posts in the future. Stay subscribed!
And, as I stated at the start of this post you can learn more the PageRank formula here, and you can also google for PageRank to get a large list of other resources.
Technorati Tags: PageRank | Whitehat | SEO | URL Design
As promised, this is the first of what will be many URLQuizes here are the blog for The Well Designed URLs Initiative. This URLQuiz discusses the convention of using a subdomain with the name ‘www‘ to identify a website.
As most everyone knows, many of the first sites on the web started using this convention. Examples include www.amazon.com, www.yahoo.com, www.google.com, and www.ebay.com. However, there is nothing about the web that requires a subdomain be named ‘www‘ when selecting the address for a website. To the contrary, many websites use other subdomains for prefixes such as:
There is even a passionate contingent of web developers that believe the ‘www‘ convention is an anachronism and should be deprecated (or ‘eventually abolished‘, in layman’s terms.)
So how should the base domain and subdomain(s) be handled, and what are the pros and cons of each? Here are the options I’ve identified, but feel free to suggest others that come to mind as well:
So there you go; give your answer(s) in the comments. Though I definitely have my opinions on the subject I will stay out of it unless I don’t see anyone mentioning several of the points I think are relevant. After enough comments come in, I’ll summarize and write a follow up post, just like Dan Cederholm did with SimpleQuiz.
Hint: You might want to consider not only online usage but offline usage as well.
UPDATE: Just days after writing this post Tim Bromhead wrote: Which is better for your site: www or no www? Is that weird or what? Tim must have had some kind of a Vulcan Mind Meld or similar going on… Anywho, great article Tim and thanks for being a URLian!
UPDATE#2: Looks like I picked the right time to discuss this issue! A few days ago Scott Hanselman talked about the downside of ignoring the distinction between ‘www’ and the root domain, Jeff Atwood discussed how to solve it, to which Phil Haack then responds with a bit of a rant about the www or lack thereof. Since they both have such strong yet opposite opinions on the subject, maybe we can get both Jeff and Phil to weight in on the subject over here…?
Technorati Tags: URL Design | Subdomains | Canconical Form | www | no-www
I got an email from the PayPal Developer Network today announcing PayPal’s new “NVP” (or “Name-Value Pair“) API. Clearly they’ve learned that the complexity of SOAP is counter productive to adoption. Here’s what the email had to say about their new API:
NVP Is Your Integration MVP
We’re proud to announce that PayPal’s Name-Value Pair API has launched. Complex SOAP structure is now gone. All API methods are supported, except for AddressVerify. Get exclusive sample code – download two SDKs (Java and .NET). Get Details
Taking a look at their examples (in .ASP, .PHP, or ColdFusion) and their SDKs (for Java and ASP.NET [v1.1]) it’s nice to see they are using POST instead of GET. The following is one of their functions from their PHP examples (CallerService.php) that illustrates how their code is calling their NVP API (I edited for line-length only):
function hash_call($methodName,$nvpStr)
{
//declaring of global variables
global $API_Endpoint,$version,$API_UserName,
$API_Password,$API_Signature,$nvp_Header;//setting the curl parameters.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$API_Endpoint);
curl_setopt($ch, CURLOPT_VERBOSE, 1);//turning off the server and peer verification
//(TrustManager Concept).
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);//if USE_PROXY constant set to TRUE in Constants.php,
//then only proxy will be enabled. if USE_PROXY constant
//set to TRUE in Constants.php, then only proxy will be
//enabled.
//Set proxy name to PROXY_HOST and port number to
// PROXY_PORT in constants.php
if(USE_PROXY)
curl_setopt ($ch,
CURLOPT_PROXY,
PROXY_HOST.”:”.PROXY_PORT);//NVPRequest for submitting to server
$nvpreq= “METHOD=”.urlencode($methodName).
“&VERSION=”.urlencode($version).
“&PWD=”.urlencode($API_Password).
“&USER=”.urlencode($API_UserName).
“&SIGNATURE=”.urlencode($API_Signature).$nvpStr;//setting the nvpreq as POST FIELD to curl
curl_setopt($ch,CURLOPT_POSTFIELDS,$nvpreq);//getting response from server
$response = curl_exec($ch);//convrting NVPResponse to an Associative Array
$nvpResArray=deformatNVP($response);
$nvpReqArray=deformatNVP($nvpreq);
$_SESSION[’nvpReqArray’]=$nvpReqArray;if (curl_errno($ch)) {
// moving to display page to display curl errors
$_SESSION[’curl_error_no’]=curl_errno($ch) ;
$_SESSION[’curl_error_msg’]=curl_error($ch);
$location = “APIError.php”;
header(”Location: $location”);
} else {
//closing the curl
curl_close($ch);
}return $nvpResArray;
}
Much nicer and simplier than having to go to all the effort to set up a SOAP call.
Unfortunately, PayPal missed a huge opportunity to make their new API fully RESTful. Instead of designing a URL-centric REST interface (with a hypermedia component to keep the purist or the pure RESTafarians happy) they instead tunneled method calls over HTTP! They used methods like “DoDirectPayment” and “RefundTransaction” Sheesh! (Note: these links to their methods load slower than any website I can remember visiting in ages and while loading my browser does lots of clicking. What the heck is going on in there I have no idea! You can go to the main docs via a much faster downloading PDF here. Wow, if that isn’t usually an oxymoron!)
Though much easier than calling SOAP, clearly not very RESTful. Three steps forward, and two steps back. Sigh…
As I was going through my Akismet spam filter today reviewing the 87 comment spam I got during the prior ~24 hours to ensure I didn’t delete any legitimate comments, it occurred to me that maybe there is a simple solution to comment spam.
What if blog apps could simply mark a hyperlink with?:
rel=”spam”
The simple idea is that rather than delete spams, blogs could start maintaining a special page of links to comment spammer’s websites using rel=”spam” on the “A” element. Basically this would be PageRank in reverse. The search engines would then apply negative weighting to anything marked spam and give the spammers the exact opposite of what they were pursuing when they unethically tried to game the system!\.
For example, Google could give negative PageRank for a spam link compared to positive PageRank for a non-spam link. Google could also weight the relevency of the link text negatively vs. the positive value it would give a non-spam link. This would have the affect of distributing the watch-dogging of spammers out onto the web without requiring any new infrastructure, and it would create a clear disincentive for comment spammers instead of the lack of disincentive from “nofollow.”
Are there problems with this I’m not foreseeing? Probably. I already know that people would try to game the system for negative purposes, and that’s to be expected. Still, I think that for the most part anyone simply using it to field a grudge or in as attempt to harm a competitor would be doing it by definition on such a small scale that it would have no effective. Given that the many comment spammers automate, they can end up with huge numbers of comment spam links. If the search engines merely weighted a spam link as 1/10th the negative value of a positive link, it would certainly still be effective.
Of course the hard-core Linux faithful would immediately spam-link to Microsoft.com just to spite them! But I really don’t (currently) see how that couldn’t be detected and managed via policies and algorithms. For example, if a company has a large number positive links it could be exempt from the effects of spam links. And I’m sure automated methods or methods using collective intelligence could emerge to resolve these problems the vast majority of time. The rest could be handled via policy; get caught spam-linking someone inappropriately and get your domain pulled from the index!
What’s more, it would give bloggers a sense of purpose when they review their spam filters instead of them feeling like the time spent was just a waste. I know that if my efforts to detect comment spammers could get them lower PageRank, I’d feel good about monitoring my comments for spam as I would be doing a service for the public good. And I’m sure most other bloggers would feel the same.
Now I know that Microformats.org has the similiar proposal VoteLinks, but that is about registering opinion as opposed to calling out gamers of the system. VoteLinks is also much broader than what I’m suggesting. If we keep the focus really narrow — shine a spolight on spam so that the search engines can erradicate it — then I’m pretty sure it would be a success.
What do you think? Good idea? Filled with holes I’ve not considered? I look forward to your feedback.
Another URLian has appeared: Brad Fults. Brad just added himself to our wiki and became a signatory; thanks Brad! Better yet, on his user page on our wiki he linked to his post Designing URLs for Multilingual Web Sites; execellent job Brad!
That was a subject I’ve been planning to write for a while, and I’ll probably cover the issue in the future to here on the WDUI Blog to future the conversation but I doubt I could have done as good a job as Brad for my first post on the subject.
One option he did not cover was using using language in filenames, such as #1 - an extention:
example.com/bar/baz.en-US
example.com/bar/baz.en-GB
example.com/bar/baz.de
Or a #2 - suffix to an extension (note I had to add an .html extension for this option):
example.com/bar/baz.html.en-US
example.com/bar/baz.html.en-GB
example.com/bar/baz.html.de
Or as an #3 - extension prefix (also needed an .html extension):
example.com/bar/baz.en-US.html
example.com/bar/baz.en-GB.html
example.com/bar/baz.de.html
Or as an #4 - filename prefix:
example.com/bar/en-US.baz
example.com/bar/en-GB.baz
example.com/bar/de.baz
Or #5 - one level up in the path:
example.com/bar/en-US/baz
example.com/bar/en-GB/baz
example.com/bar/de/baz
Unlike Brad, I didn’t provide an evaluation of these simply because I haven’t researched the subject enough at this time. Maybe he can do a follow up post providing an evaluation of each of these.
However, I can say I don’t really like any of these options, nor are any of the options Brad provides sit well with me except possibly his “Modified Directory Structure (#2)” combined in creative ways with his “Use of Accept-Language HTTP Header (#6)” the latter a.k.a. Content Negotiation. Whatever the case, there will be more on this subject in the future, I’m sure, and it’s good to have this discussion taking place.
I am working on a project that had me was writing about browser plug-ins and I needed to link to the main page for Microsoft’s Internet Explorer Add-ons, for Firefox’s Extensions, and lastly for Greasemonkey for Firefox.
I actually looked up those three in opposite order than I have them listed above. Greasemonkey’s URL was pretty good although it’s a shame it’s not greasemonkey.com/.net/.org; the .com resolves to a 403 forbidden page, the .org resolves to a list of advertising links, and the .net resolves to Grease Monkey International, a franchiser of automotive preventive maintenance centers! Whatever the case, I feel pretty good that this URL is going to have really good persistence. It should be around at least as long a Greasemonkey is relevant if for no other reason than to return a 301:
The second URL for Firefox extensions was not so good, but I still think there a pretty good chance it will still resolve a year from now:
Then there is Microsoft’s horrific URL for Internet Explorer Add-ons. What were they thinking? I’ll bet this URL doesn’t resolve three months from now let alone in a year of five:
http://www.windowsmarketplace.com/category.aspx?bcatid=834&tabid=1&WT.mc_id=0107_20
URLs like this one from Microsoft are a crying shame. Sadly, Microsoft is one of the few companies that can get away with this without be negatively affected. On the other hand, most companies haven’t a clue how bad URLs like this can affect them.
That said, I’d love to get your input:
Here’s a simple best practice. Always ID your heading tags! For example, if you’ve got an <h2> element, be sure to make it <h2 id=”some-heading”>.
IDing heading tags is especially important on long documents.
Why? Because if you don’t, someone else can’t reference the part of the document that they want to reference in a blog post or somewhere else. And if they can’t, they just might reference someone else’s web page instead. Or if they do reference it, readers who click over to your URL might give up on reading before finding the appropriate document, and never come back to your site when they might otherwise have become an avide reader. How often have you see a link to a web page where the person linking included the text “Scroll down to the section entitled…“ Bleach!
Given the heading tag mentioned in the first paragraph above, and assuming it was contained in a document entitled “whitepaper” in the root of www.foo.com, you can point straight to that heading using a URL fragment like so:
Ben Coffey talks about this same problem over at URLs for Specific Portions of Documents. He also talks about CiteBite which helps bloggers and others link directly into a part of a document as if there had been an ID there. But publishers, if others start using CiteBite on your content simply because you don’t include the ID attributes they need to link to your directly, guess who will get the Google PageRank? Not you… ;-)
One more thing. If you are creating content that will be displayed above or below other content, i.e. blog posts that get listed with other blog posts on the same HTML page, you’ll need to make sure your IDs are unique. I personally have started using a convention that appends the date in “YYYYMMDD” format to the end of a meaningful fragment, seperated by a dash, as in:
This tends to work for me because I almost never post more than once per day. Also, though I personally dislike the inclusion of dates in URLs because of how difficult it makes things for users to remember or discover the URLs, having the date as a fragment suffix is not quite at bad. People using the browser URL auto-complete can still easily find the URL they visited recently enough that its URL is still in the browser’s cache. YMMV.
Lastly, if you are going to ID your heading tags, you probably should also create a table of contents. ‘-)
I learned an interesting lesson about URLs and social software; change your article’s URL or publish it under multiple cases, even with 301 redirects, and you’ll end up with fragmented references.
Back in August 2005 I published the article “Well Designed URLs are Beautiful“ on my personal blog and have since had well over 400 people tag it on del.icio.us. Unfortunately during that same time, I’d changed my blog configuration twice to improve its URLs with one more change planned. Of course for each change I made sure the old URLs were redirected with a 301. Unfortunately, these chances resulted in my article being tagged from three different URLs, and hence splitting up it’s ranking on del.icio.us three ways!
So, when changing URLs, be careful. You could harm yourself in the process.
P.S. As an aside, this speaks sadly to ones ability to reorganize a poorly organized website, notwithstanding the fact that Cool URIs (aren’t supposed to) change. I can understand why del.icio.us and others don’t automatically test their links for 301s on an ongoing basis because of the resources required, but delicious and others could offer a “check me” test for links. This would even give them a legitimate reason to periodically email their users and say “Hey, here are all the links that moved!” That email could even ask “Should we keep the changes?” to guard against spammers, though I don’t think this would be a big issue. It is hard enough to get delicious links, how would spammers get them to begin with?
Anyway, I hope to see things evolve in the future where changing links won’t be such a big issue. And yes, sometime in the reasonably near future I plan to be an advocate for some specific plans to enable this.
I recently had an off-list email conversation with Ian Hickson, the editor of the Web Application Hypertext Technology Working Group specifications (i.e. HTML5 and WebForms 2.0). I was proposing to him that the current WebForms 2.0 be draft specification be amended to include a URI Template in the “action” attribute of the FORM element. Because I believe so strongly in the benefit of this proposal and because such things are inline with the Well Designed URLs Initiatitve was envisioned to advocate for, I decided to publish it to our blog and reference it in the WHATWG blog. The following is what I sent to Ian in email:
I really want to see WHATWG incorporate URI Templates for Web Form actions[1]. i.e.:
<form
action="http://foo.com/{make}/{model}/”
method="get">
<input type="text" name="make" />
<input type="text" name="mode" />
<input type="submit" />
</form>If I type “Honda” and “Civic”, it will do a get to:
http://foo.com/Honda/Civic/Instead of the only current possibility being something like:
<form
method="get"
action="http://foo.com/cars.php">
<input type="text" name="make" />
<input type="text" name="mode" />
<input type="submit" />
</form>Which would produce the following for “Honda” and “Civic”:
http://foo.com/cars.php?make=Honda&model=Civic
To which Ian replied in two parts. Here is the first part:
“Why not just write a server-side redirector? That’s a trivial one to write. Four lines of code, maybe 10 if you make the recommended security checks first. You could also do it with a little bit of JavaScript.”
Unfortunately, a server-side redirector is not an appropriate solution in one case for the use-cases this proposal would address and doesn’t work for two others:
<form
action="http://www.myblog.com/{topic}/”
method=”post”>
<select name=”topic”>
<option value=”first”>My 1st Post</option>
<option value=”second”>My 2nd Post</option>
<option value=”third”>My 3d Post</option>
</select>
<input type=”text” name=”comment”>
<input type=”submit”>
</form>
<form
action="http://blog.whatwg.org/{topic}"
method="post">
<select name="topic">
<option value="feed-autodiscovery">
Feed Autodiscovery
</option>
<optionvalue="text-content-checking">
textContent Checking
</option>
<option value="checker-bug-fixes">
Bug Fixes
</option>
<option
value="significant-inline-checking">
Significant Inline Checking
</option>
<option value="charmod-norm-checking">
Charmod Norm Checking
</option>
<optionvalue="proposing-features">
Proposing features
</option>
</select>
<input type="submit">
</form>
So yes, server-side redirection is possible in some cases but by no means all, and for those cases where it’s possible it is not optimal.
Moving on the Ian’s suggestion to use “a little bit of JavaScript” to meet this use-case, I will admit it is possible to use JavaScript but these are the drawbacks in viewing JavaScript as the solution for this use-case:
It’s interesting to note that in the preface to the introduction for Section 3 of the WebForms 2.0 Working Draft of 12 October 2006, the following note is made about how everything that repeating form controls offers can already be done in JavaScript and the DOM. The mere fact that they went to the trouble to include something as complex as repeating controls into HTML5 when it can be done with JavaScript and the DOM implies that well-known patterns in web architecture are better implemented declaratively instead of via JavaScript and the DOM:
Occasionally forms contain repeating sections. For example, an order form could have one row per item, with product, quantity, and subtotal controls. The repeating form controls model defines how such a form can be described without resorting to scripting.
Note: The entire model can be emulated purely using JavaScript and the DOM. With such a library, this model could be used and down-level clients could be supported before user agents implemented it ubiquitously. Creating such a library is left as an exercise for the reader.
So yes it is possible to use JavaScript in many cases, but it no where near optimal. Javascript should not be considered the solution for as well-defined and obvious patterns such as submitting to a clean URL.
To further drive home the value of this proposal, anyone monitoring the REST-discuss list for any length of time will see that most REST experts tend toward using (what I call :) well-designed URLs, i.e. URLs where the resource is identified by path instead of query string. With WebForm 2.0’s pending support of PUT and DELETE, it would be just short of a crime not to include support for posting to clean URLs in WebForms 2.0.
Since having this discussion with Ian via email it was since pointed out to me on rest-discuss by Mark Baker that my proposal as written would break the existing web so was a non-starter. For some reason I wasn’t thinking about that, probably because I was more concerned about getting Ian (who I like to call: Mr. “No“ :) to agree that URI Templates were needed. Still, the solution would be simple.
What follows are my examples from above recast using an optional template attribute that would override the action attribute for WebForms 2.0 compliant browsers. This would of course require the server to accept both query string parameters and clean URLs (and hopefully do a server redirect from the former to the latter), or the submit could be implemented using Javascript for older browsers when applicable. Note that I didn’t show an example using JavaScript but, as the WebForm 2.0 spec says “(that) is left as an exercise for the reader”:
<form
action="http://foo.com/model"
template=”http://foo.com/{make}/{model}/”
method=”get”>
<input type=”text” name=”make” />
<input type=”text” name=”mode” />
<input type=”submit” />
</form>
<form
action="http://www.myblog.com/topic"
template=”http://www.myblog.com/{topic}/”
method=”post”>
<select name=”topic”>
<option value=”first”>My 1st Post</option>
<option value=”second”>My 2nd Post</option>
<option value=”third”>My 3d Post</option>
</select>
<input type=”text” name=”comment”>
<input type=”submit”>
</form>
<form
action="http://blog.whatwg.org/topic"
template=”http://blog.whatwg.org/{topic}”
method="post">
<select name="topic">
<option value="feed-autodiscovery">
Feed Autodiscovery
</option>
<optionvalue="text-content-checking">
textContent Checking
</option>
<option value="checker-bug-fixes">
Bug Fixes
</option>
<option
value="significant-inline-checking">
Significant Inline Checking
</option>
<option value="charmod-norm-checking">
Charmod Norm Checking
</option>
<optionvalue="proposing-features">
Proposing features
</option>
</select>
<input type="submit">
</form>
So in summary I really hope that Ian, who definitely seems to be the gatekeeper for what goes into HTML5 and what doesn’t go into HTML5, can see his way clear to add this feature to WebForms 2.0. If his main issue with it is needing to have it written up for inclusion in the spec, I’m more than happy to help.
As a way to gather a broad spectrum of opinion and engage the community on many different URL-related topics, we’ll be offering a URLQuiz from time to time. Inspired by and patterned after Dan Cederholm’s very well received SimpleQuiz series, we’ll take a simple URL Design question, present a few different alternate approaches, and ask readers to weigh in on which they believe to be the best and why.
Dan proved that his SimpleQuiz was very effective at quickly emerging a consensus for a best practice when a consensus was possible. Building on Dan’s pioneering efforts, we hope to leverage his technique to address the constantly debated issues in URL Design and hopefully arrive at some consensuses of opinion on URL-relates issues ourselves.
So look for the first URLQuiz, part of an ongoing series, here at the Well Designed URLs Initiative blog in the near future.
Probably one of the most interesting projects related to URL Design on which anyone is currently working is the URI Templates project spearheaded by Joe Gregorio of IBM. While not sexy and not something most end users will ever see, infrastructure developers writing blogs, wikis, forums, content management systems, and ideally even web servers, proxies, and routers will be able to utilize URI Templates for configuration and more. Well, once URI Templates are finalized and made an IETF standard that is.
Examples uses could include URL Rewriting Rules, URL Virtualization Configuration, REST’s Hypermedia Requirement, within future HTTP Headers and HTML LINK Elements, and even for Sitemaps (in a hopeful revision to Sitemap.org’s Sitemap protocol.) URI Templates will provide Internet professionals far more flexibility in dealing with URLs such as making it possible to employ URL Construction in a controlled manner without violating that nastly little URI Opacity Axiom.
Unlike most URL Rewriting Tools, URI Templates are very easy to use and, with the possible exception of certain esoteric edge cases that are still being finalized, accessible to mere mortals. URI Templates are certainly nowhere near as difficult as the requirement to master regular expressions and their complex interations when using Apache’s mod_rewrite and IIS’ (3rd Party) ISAPI Rewrite. Almost certainly anyone who has managed to language their own website will have no trouble at all mastering the most common use-cases for URI Templates.
URI Templates use a simple syntax where braces denote variables to be replaced when the templates are converted to actual URLs. For example, if this blog were using URI Templates to configure this page, the URI Template would most certainly look like this:
http://blog.welldesignedurls.org/{year}/{month}/{day}/{post-slug}/
When the template variables have the following values (as they do for this post):
| year: | 2007 |
| month: | 01 |
| day: | 03 |
| post-slug: | introducing-uri-templates |
The resultant URL upon conversion will be:
http://blog.welldesignedurls.org/2007/01/03/introducing-uri-templates/
Pretty simple, huh?
However, be aware that before URI Templates can become an IETF standard there needs to be at least two interoperable implementations, if not a lot more. Mark Nottingham of Yahoo has implemented a URI Template parser in Javascript, but I don’t know for certain if that one will qualify for the purposes of standardization. Whatever the case, if you are working on any projects or products that could benefit from URI Templates I encourage you to participate in the URI working group via their mailing list and then implement the specification in your software projects or products.
Although it may appear to be a boring subject from the outside looking in, the standardization of URI Templates will be probably the most positive thing to happen for URL Design in over a decade.
Here at the Well Designed URLs Initiative we plan to address a wide audience and cover a plethora of URL-related topics. If it wasn’t obvious from yesterday’s post we plan to publish content for a variety of roles so we will categorize all our posts by the audience we are targeting.
Using our audience categories you can subscribe to our RSS feed then configure your feed reader to filter out all but those topics which are likely to appeal to you so as not to be overwhelmed by the rest. Some of our audience categories encompass other categories such as Everyone and Internet Professionals so we’ll plan to tag only the highest level category that applies to avoid duplication. For example, if you are a web developer you might want to filter out all but the Everyone and Internet Professionals, and Web Developers categories. Of course, if that’s too much trouble just subscribe to our entire feed and just ignore those posts that don’t interest you.
The following is the list of categories we’ve set up by audience role:
NOTE: If you read this post shortly after it is published, most of those links above will just redisplay this post. Let me explain. Because of the way our WordPress blog software works, those links would have displayed a 404 Not Found error if no posts existed for the given category. To avoid that I’ve tagged this post with all audience categories contradicting what I said above; that we would only put a post in its highest level category. Moving forward we shouldn’t need to do this again.