URLQuiz #2: URL Equivalence and Cachability

This is quiz #2 of our ongoing URLQuiz series.

In this quiz, there are 26 pairs of URLs (A..Z, listed as 1 through 26 below) and for each pair the questions are: “Are these two URLs equivalent?” i.e. do they return the same resource when dereferenced, and “Can they be cached as the same URL?”

You should answer ‘Yes’, ‘No’, or ‘Maybe’ where ‘Maybe’ means ‘It might return the same resource but should not be cached according to the specs.’

To answer, leave a comment and ideally explain your reasoning for each. Feel free to group answers based on your reasoning and/or the answer given (Yes, No, and Maybe). Print it out and take the quiz with pencil and paper if you are serious about getting it right, and feel free to use a computer or browser or whatever to test your results before answering. Good luck!

About the TLD .foo[1]

Clarification (2007-March-02): Some people have stated that the server could possibly return the same resource for any two given URLs, so they felt the answer could never be ‘No.’ I definitely see their point; for example, http://mysite.foo/bar and http://mysite.foo/bazz could possibly return the same thing, but nobody would ever reasonably expect them to do so on their own. So let me clarify: I meant a quiz taker to select ‘No’ in the case where the resource returned would definitely not be the same thing unless the developer or server admin explicitly programmed or configured it to be. On the other hand, ‘Maybe’ would be used in the case where someone might reasonably expect the two URLs to return the same resource even though RFC 3986 would define the two URLs as being different, such as in [footnote 2], or when it depends on the O/S of the server as in [footnote 3]. Regarding fragments, the question is “In a transaction between a client and a server, is the cache allowed to view them as the same?” Regarding which can be cached, I was looking for what is appropriate per the spec, not necessarily whether any particular software in the cloud (e.g. routers, proxies, browsers, etc.) actually does cache, but instead “Would it be allowed to cache?”
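As a small illustration of the fragment point: the fragment is handled by the client and is not part of what gets sent to the server in an HTTP request, so a cache sitting between client and server only ever sees the URL without it. The sketch below is just that, a sketch; the function name cache_key is made up for this example and is not from any spec or library.

```python
def cache_key(url: str) -> str:
    """Hypothetical helper: the URL as a client/server cache would see it.

    Per RFC 3986 the first "#" starts the fragment, so everything from
    that character onward is dropped before the request is made.
    """
    return url.partition("#")[0]


# The three fragment-related forms used in the quiz collapse to one key.
for candidate in ("http://mysite.foo", "http://mysite.foo#", "http://mysite.foo#frag"):
    print(candidate, "->", cache_key(candidate))
```

Whether the URLs identify the same thing to the user agent after retrieval is a separate question; this only shows what goes over the wire.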

Questions

  1. The ‘www’ domain
    1. http://mysite.foo/
    2. http://www.mysite.foo/
  2. Letter casing in path
    1. http://mysite.foo/Index.htm
    2. http://mysite.foo/index.htm
  3. Letter casing in domain
    1. http://MySite.foo/index.htm
    2. http://mysite.foo/index.htm
  4. Index.htm vs. Default.aspx
    1. http://mysite.foo/Index.htm
    2. http://mysite.foo/Default.aspx
  5. Trailing slash on domain
    1. http://mysite.foo
    2. http://mysite.foo/
  6. Trailing slash on path
    1. http://mysite.foo/path
    2. http://mysite.foo/path/
  7. Empty question mark
    1. http://mysite.foo/
    2. http://mysite.foo/?
  8. Empty parameter
    1. http://mysite.foo/?
    2. http://mysite.foo/?param=
  9. Port 80
    1. http://mysite.foo/
    2. http://mysite.foo:80/
  10. Port 443
    1. http://mysite.foo/
    2. http://mysite.foo:443/
  11. Https vs. Port 443
    1. https://mysite.foo/
    2. http://mysite.foo:443/
  12. Ftp vs. Http
    1. ftp://mysite.foo/
    2. http://mysite.foo/
  13. Letter casing in parameter name
    1. http://mysite.foo/?param=bar
    2. http://mysite.foo/?Param=bar
  14. Letter casing in parameter value
    1. http://mysite.foo/?param=bar
    2. http://mysite.foo/?param=Bar
  15. Hash vs. no hash
    1. http://mysite.foo
    2. http://mysite.foo#
  16. Hash vs. Fragment
    1. http://mysite.foo#frag
    2. http://mysite.foo#
  17. Fragment vs. no Fragment
    1. http://mysite.foo#frag
    2. http://mysite.foo
  18. Plus vs. Space in path
    1. http://mysite.foo/url+design
    2. http://mysite.foo/url design
  19. Space vs. Encoded Space in path
    1. http://mysite.foo/url design
    2. http://mysite.foo/url%20design
  20. Plus vs. Encoded Plus in path
    1. http://mysite.foo/url+design
    2. http://mysite.foo/url%2Bdesign
  21. Slash vs. Encoded Slash in path
    1. http://mysite.foo/top/second
    2. http://mysite.foo/top%2Fsecond
  22. Ampersand vs. Encoded Ampersand in path
    1. http://mysite.foo/abc&xyz
    2. http://mysite.foo/abc%26xyz
  23. Ampersand vs. Encoded Ampersand in parameter value
    1. http://mysite.foo/?q=abc&xyz
    2. http://mysite.foo/?q=abc%26xyz
  24. Equals vs. Encoded Equals in path
    1. http://mysite.foo/abc=xyz/
    2. http://mysite.foo/abc%3Dxyz/
  25. Equals vs. Encoded Equals in parameter value
    1. http://mysite.foo/?q=abc=xyz
    2. http://mysite.foo/?q=abc%3Dxyz
  26. Parameter order
    1. http://mysite.foo/?abc=123&xyz=987
    2. http://mysite.foo/?xyz=987&abc=123
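
If you want to test pairs like these mechanically, one option is to compare them after the syntax-based and scheme-based normalization described in RFC 3986 (sections 6.2.2 and 6.2.3). The following is a minimal sketch of that idea, not a complete implementation. Assumptions: the function names are invented for this example, the authority is a plain host[:port] (no userinfo, no IPv6 literal), dot-segment removal is skipped, and the default-port table only covers the schemes used in the quiz. It only answers "are these the same by the URI spec?", which says nothing about what a given server actually returns.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Assumption: only the schemes that appear in the quiz are listed.
DEFAULT_PORTS = {"http": "80", "https": "443", "ftp": "21"}

UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(text: str) -> str:
    """Decode %XX only for unreserved characters; uppercase other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return PCT_RE.sub(fix, text)


def normalize(url: str) -> str:
    """Hypothetical helper: a partial RFC 3986 normalization of a URL."""
    m = URI_RE.match(url)
    scheme = (m.group(2) or "").lower()          # scheme is case-insensitive
    authority, path = m.group(4), normalize_pct(m.group(5))
    query, fragment = m.group(7), m.group(9)     # None means "not present"

    result = scheme + ":" if scheme else ""
    if authority is not None:
        host, _, port = authority.partition(":")
        host = host.lower()                      # host is case-insensitive
        if port and port != DEFAULT_PORTS.get(scheme):
            host += ":" + port                   # keep only non-default ports
        result += "//" + host
        path = path or "/"                       # empty path means the root
    result += path
    if query is not None:                        # an empty "?" is preserved
        result += "?" + normalize_pct(query)
    if fragment is not None:
        result += "#" + normalize_pct(fragment)
    return result


def equivalent(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


print(equivalent("http://MySite.foo/index.htm", "http://mysite.foo/index.htm"))
print(equivalent("http://mysite.foo/", "http://mysite.foo:80/"))
print(equivalent("http://mysite.foo/Index.htm", "http://mysite.foo/index.htm"))
```

Note that reserved characters such as "+", "/", "&" and "=" are deliberately left alone whether encoded or not, because RFC 3986 treats the encoded and literal forms of a reserved character as potentially different.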

P.S. Don’t stress if you can’t answer them all. It took me months to uncover all these nuances, and if I were taking this quiz I doubt I could get them right all in one sitting.

Footnotes

  1. I’m using the non-existent top-level domain “.foo” to avoid giving any link-love to arbitrary example sites that don’t deserve it! For the purpose of the quiz, just assume that “.foo” is a functioning top level domain.
  2. Question A.
  3. Question B.

12 Responses to URLQuiz #2: URL Equivalence and Cachability

  1. Devon Young says:

    From what I understand about URL’s…..

    1. The ‘www’ domain = No
    2. Letter casing in path = Maybe (depends if server’s OS is Windows)
    3. Letter casing in domain = No
    4. Index.htm vs. Default.aspx = No
    5. Trailing slash on domain = Yes
    6. Trailing slash on path = No
    7. Empty question mark = No
    8. Empty parameter = No
    9. Port 80 = Yes
    10. Port 443 = No
    11. Https vs. Port 443 = Yes
    12. Ftp vs. Http = No
    13. Letter casing in parameter name = No
    14. Letter casing in parameter value = No
    15. Hash vs. no hash = Maybe
    16. Hash vs. Fragment = No
    17. Fragment vs. no Fragment = No
    18. Plus vs. Space in path = Yes
    19. Space vs. Encoded Space in path = No
    20. Plus vs. Encoded Plus in path = No
    21. Slash vs. Encoded Slash in path = No
    22. Ampersand vs. Encoded Ampersand in path = No
    23. Ampersand vs. Encoded Ampersand in parameter value = No
    24. Equals vs. Encoded Equals in path = No
    25. Equals vs. Encoded Equals in parameter value = No
    26. Parameter order = Yes

    …I’m curious to know how wrong or how right I am. :) this was fun!

  2. Devon: Thanks. To be honest, I’m going to have to verify a few of them myself before I post the answers. But I’m hoping to get 10 or more entries before writing up the answers. Any help soliciting those would be appreciated…

  3. Stefan says:

    1. No
    2. No
    3. Yes
    4. No
    5. Yes
    6. No
    7. No
    8. No
    9. Yes
    10. No
    11. Yes
    12. No
    13. No
    14. No
    15. Yes
    16. Yes
    17. Yes
    18. No
    19. Yes
    20. Yes
    21. No
    22. Yes
    23. No
    24. Yes
    25. No
    26. No

  4. polaar says:

    I’m having some trouble with the distinction between No and Maybe. Strictly speaking: for all cases that would normally be No, the server could in fact be resolving them as if they were the same. That’s why I have only “maybe” and no “no” answers (although I’d normally say that they should really all be “no”)

    For the hash/fragment examples: depends whether you count the client side fragment resolving as “dereferencing” (say, you’re storing fragments only on some client software). But I’ve assumed that is not what was meant.

    1. The ‘www’ domain: Maybe
    2. Letter casing in path: Maybe
    3. Letter casing in domain: Yes
    4. Index.htm vs. Default.aspx: Maybe
    5. Trailing slash on domain: Yes
    6. Trailing slash on path: Maybe
    7. Empty question mark: Maybe
    8. Empty parameter: Maybe
    9. Port 80: Yes
    10. Port 443: Maybe
    11. Https vs. Port 443: Maybe
    12. Ftp vs. Http: Maybe
    13. Letter casing in parameter name: Maybe
    14. Letter casing in parameter value: Maybe
    15. Hash vs. no hash: Yes
    16. Hash vs. Fragment: Yes
    17. Fragment vs. no Fragment: Yes
    18. Plus vs. Space in path: Maybe
    19. Space vs. Encoded Space in path: Yes
    20. Plus vs. Encoded Plus in path: Yes
    21. Slash vs. Encoded Slash in path: Maybe
    22. Ampersand vs. Encoded Ampersand in path: Yes
    23. Ampersand vs. Encoded Ampersand in parameter value: Maybe
    24. Equals vs. Encoded Equals in path: Yes
    25. Equals vs. Encoded Equals in parameter value: Maybe
    26. Parameter order: Maybe

    Some more:
    - Plus vs space in querystring
    - semicolon vs ampersand in querystring
    - semicolon vs encoded semicolon in path

  5. Jacek says:

    Hi, here’s my take, and I haven’t compared it with the one reply I see now. Answers taken according to specs (IIRC), not to practice which may differ to a good effect. Only when the two answers differ (same resource, cacheable as same) do I include both.

    a. no
    b. no
    c. yes
    d. no
    e. yes (because of URI syntax, empty path meaning root?)
    f. no (but I wish)
    g. yes?
    h. no
    i. yes
    j. no
    k. no (but https://mysite.foo:443/ yes)
    l. no
    m. no
    n. no
    o. no, yes (because fragID shows resource, but doesn’t affect retrieval)
    p. no, yes
    q. no, yes
    r. no (is the URI with space valid?)
    s. is the URI with space valid?
    t. yes?
    u. no
    v. yes?
    w. no
    x. yes?
    y. no
    z. no (I wish)

    I’ll compare with Devon now. 8-)

  6. Dave Risney says:

    I have issues with your definitions of what the answers mean. I’d argue that ‘maybe’ will always be a better answer than ‘no’ because any server may choose to return the same resource for two different URIs. Additionally ‘yes’ meaning “Which can be cached as the same URL?” depends on what entity is doing the caching. So I’m answering with ‘yes’ meaning the URIs are equivalent by RFC 3986 and ‘no’ meaning that the URIs aren’t.

    1. The ‘www’ domain = No
    2. Letter casing in path = No
    3. Letter casing in domain = Yes
    4. Index.htm vs. Default.aspx = No
    5. Trailing slash on domain = Yes
    6. Trailing slash on path = No
    7. Empty question mark = No
    8. Empty parameter = No
    9. Port 80 = Yes
    10. Port 443 = No
    11. Https vs. Port 443 = No
    12. Ftp vs. Http = No
    13. Letter casing in parameter name = No
    14. Letter casing in parameter value = No
    15. Hash vs. no hash = No (but HTTP is guaranteed to return the same resource)
    16. Hash vs. Fragment = No (but HTTP is guaranteed to return the same resource)
    17. Fragment vs. no Fragment = No (but HTTP is guaranteed to return the same resource)
    18. Plus vs. Space in path = No (spaces aren’t allowed in URIs)
    19. Space vs. Encoded Space in path = No (spaces aren’t allowed in URIs)
    20. Plus vs. Encoded Plus in path = No
    21. Slash vs. Encoded Slash in path = No
    22. Ampersand vs. Encoded Ampersand in path = No
    23. Ampersand vs. Encoded Ampersand in parameter value = No
    24. Equals vs. Encoded Equals in path = No
    25. Equals vs. Encoded Equals in parameter value = No
    26. Parameter order = No

  7. @polaar >> I’m having some trouble with the distinction between No and Maybe. Strictly speaking: for all cases that would normally be No, the server could in fact be resolving them as if they were the same. That’s why I have only “maybe” and no “no” answers (although I’d normally say that they should really all be “no”) For the hash/fragment examples: depends whether you count the client side fragment resolving as “dereferencing” (say, you’re storing fragments only on some client software). But I’ve assumed that is not what was meant.

    Good points. I added clarification above to explain where I was looking for ‘No’ vs. ‘Maybe.’

    @jacek >> Answers taken according to specs (IIRC), not to practice which may differ to a good effect.

    Definitely to specs, not practice as who knows what evil lurks in the hearts of equipment on the web? ;-)

    @Dave Risney >> “I have issues with your definitions of what the answers mean. I’d argue that ‘maybe’ will always be a better answer than ‘no’ because any server may choose to return the same resource for two different URIs. Additionally ‘yes’ meaning “Which can be cached as the same URL?” depends on what entity is doing the caching. So I’m answering with ‘yes’ meaning the URIs are equivalent by RFC 3986 and ‘no’ meaning that the URIs aren’t.”

    As I said to polaar, good points and I (at least attempted) to clarify above.

    ALSO, thanks all for participating! I did this to provide a thought exercise and a way to collect enough research to write an article, not to pass or fail anybody! :) I’ll be doing the writeup on this one in the (hopefully near) future.

  8. Oh, and the quiz will stay open indefinitely. Please test your knowledge…

  9. Devon Young says:

    Since we’re only talking about URL’s, I answered as if I’m a UA, and all I have to go on is the URL itself and no other information about the resource. I felt that was the point of the question(s).

  10. @Devon Young: Yes, the perspective of the User Agent was what I was expecting.

  11. Tom Barta says:

    See RFC 2606. There are actually real TLDs you are encouraged to use for domains, because they are reserved for testing:

    .test .example .invalid .localhost

    There’s also example.{com,org,net}, which is a valid DNS entry to be used for testing.

    The domain .foo can’t be far behind the inappropriate (but real) .mobi (^:

  12. Dan Gayle says:

    Did you ever publish the results from this? I’m curious to see what the answer is.
