Saturday, 12 March 2011

1b. The Anatomy of a URL

Most of us know what a URL is and what it is for. But let’s look at it in a bit more detail.

What is a URL?

The acronym URL stands for Uniform Resource Locator. It is more commonly referred to simply as a web address or a link. However, there’s a bit more to it than that. A URL is made up of several parts, and its syntax tells the browser how to fetch a resource and where to find it. It also allows you to pass along optional parameters to the server.

Different parts of a URL


[Figure: The different parts of a URL]

Let’s examine the URL of this page as an example. http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html
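If you would like to see these parts for yourself, here is a minimal sketch in Python that uses the standard urllib.parse module to pull our example URL apart. The function name url_parts and the labels in the dictionary are my own choices for illustration, not standard terms.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the pieces discussed in this post."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL does not name a port
        "path": parts.path,
        "query": parts.query,        # text after "?", empty if absent
        "fragment": parts.fragment,  # text after "#", empty if absent
    }


example = "http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html"

# Print each part on its own line.
for name, value in url_parts(example).items():
    print(f"{name:9} {value!r}")
```

Each of the following sections looks at one of these pieces in turn.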

The Protocol

The first part, http://, names the protocol. The protocol tells the browser how to talk to the server when it requests the resource. HTTP stands for HyperText Transfer Protocol, and it is primarily used for requesting HTML (HyperText Markup Language) pages and the images, scripts and other files that they use. There are many other protocols, like FTP and SMTP, but since browsers mostly deal with web pages, they mostly use HTTP. The colon and forward slashes, ://, are not part of the protocol name. They are just there for correct syntax, and a URL won’t work if they are omitted. Tim Berners-Lee, the inventor of the Web and of HTTP, has said that he now regrets making the two slashes part of the syntax, adding that it seemed a good idea at the time :). In most browsers nowadays, you can leave out the http:// part in the address bar, but if you want to add a link to another site on your webpage, you have to provide the protocol you will be using. Otherwise, the browser will treat the link as a relative one.

The Resource name

The next part, web-knowhow.blogspot.com (you will often see it written with www. in front), is the resource name, also called the host name. It is made up of a series of names, separated by periods (dots).

The Sub-domain

In www.web-knowhow.blogspot.com, the www is a sub-domain. The www stands for “World Wide Web”. It has become a conventional default sub-domain, and in many cases it is no longer required. Many sites are set up to serve the same content with or without it, or to redirect from one form to the other. That is why if you just write web-knowhow.blogspot.com into your browser, you will still arrive at the same place.

So what is a sub-domain? In the simplest sense, it is a name placed in front of a domain name to mark out a separate site or area of content. Most hosts serve multiple types of content, so sub-domains allow them to identify unique sites or unique areas of content. On many hosting setups, each sub-domain is mapped to its own folder on the server, so when you upload your site, chances are that you are looking for the www directory (the exact folder name depends on your host). There are many other common sub-domains, like ftp or video.

The Domain name

The “blogspot” part of our URL, together with the .com that follows it, is the domain name. This is the name that is registered, and the browser looks it up through the Domain Name System (DNS) to find the address of the server that hosts the site. Strictly speaking, web-knowhow is a sub-domain of blogspot.com that the blogging service hands out to this blog. But because it is the part that identifies this particular site, you can think of web-knowhow.blogspot as playing the same role for this blog that google plays in www.google.com.

Top-level Domains

The .com is referred to as the top-level domain (TLD). It is the last part of the host name, and it is where the lookup of a site’s address begins: the name servers for .com point to the name servers of every domain registered under it. The most common TLDs are .com, .org and .net. You might have noticed that some sites end in a two-letter code instead, e.g. .uk or .us, sometimes after another part, as in .co.uk. This is called a country-code top-level domain (ccTLD). Each one is assigned to a specific country or territory.
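To make the sub-domain, domain name and TLD a little more concrete, here is a rough Python sketch that splits a host name on its dots. The helper split_host is a made-up name for illustration, and it rests on a simplifying assumption: the TLD is a single label such as .com. It would give the wrong answer for endings like .co.uk, which real software handles with a published list of suffixes.

```python
def split_host(host: str) -> dict:
    """Naively split a host name into sub-domains, domain name and TLD.

    Assumes a one-label TLD such as "com"; endings like "co.uk" are NOT handled.
    """
    labels = host.lower().split(".")
    return {
        "sub_domains": labels[:-2],                        # everything before the domain name
        "domain": labels[-2] if len(labels) >= 2 else "",  # the label just before the TLD
        "tld": labels[-1],                                 # the last label
    }


print(split_host("www.google.com"))
# {'sub_domains': ['www'], 'domain': 'google', 'tld': 'com'}
```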

The file path

Since a URL is a request for a specific file, whether a static web page or a video, it should also specify the path on the server to the file needed. In our example, the file path is /2011/03/1b-anatomy-of-url.html. (The #more that you may see at the end of the address is not part of the path. We will come to it shortly.)

With this path, the browser says to the host server: okay, now that I have found this site (web-knowhow.blogspot.com), go to the folder inside it named 2011. Inside, you’ll find another folder named 03. Give me the file named “1b-anatomy-of-url.html” found within the 03 folder. And the server obliges. You might ask why there isn’t any file name in www.google.com. Well, as with sub-domains, if you do not enter a file name or path, the web server falls back to a default document. Although a web server may be set up to use any default name, it is usually index.html or default.html. That is why most websites have a home page with one of these names. On a server set up this way, it makes no difference whether you write the bare address or add /index.html to it, so you can leave off the path for the home page.
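The folder-by-folder walk described above can be sketched in a few lines of Python. This only takes the path apart as text; it does not contact any server, and path_steps is a name I made up for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def path_steps(url: str) -> tuple:
    """Return (folders, file name) for the path part of a URL."""
    path = PurePosixPath(urlsplit(url).path)
    # Drop the leading "/" so that only real folder names remain.
    folders = [part for part in path.parent.parts if part != "/"]
    return folders, path.name


folders, file_name = path_steps(
    "http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html"
)
print(folders)    # ['2011', '03']
print(file_name)  # 1b-anatomy-of-url.html
```

For an address with no path at all, such as http://www.google.com, both the folder list and the file name come back empty, which is exactly the case where the server has to pick a default.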

If the folder doesn’t contain an index.html or default.html file, the server may instead return a directory listing, which is a list of all the files and folders inside it. (Depending on how the server is configured, it may return an error page instead.) Sometimes a listing is provided intentionally. But most of the time it can jeopardize your privacy, since it displays everything in that folder, open for anyone to access and download.

Like I said earlier, you can use a URL to pass along parameters to the server. Those are written after a ? mark at the end of the path. A URL can also carry an instruction for the browser itself. Let’s say I want to go straight to a specific part of the page and not have to find it manually. For that, I’ll have to edit the HTML of my page first. I will give the specific section an ID (we’ll talk about how to set up IDs in the future). Then I’ll insert a # mark at the end of my URL and write the ID that I specified after it. Here is an example. I gave the above heading “The Protocol” the ID “more”, since that is the next thing you read after clicking on the Read More button. If I write #more at the end of my URL (http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html#more), the browser is going to jump right down to that heading. This part of the URL is never sent to the server: the browser fetches the whole page and then scrolls to the ID. This technique is especially useful in pages which have a lot of alphabetically arranged information, like a glossary page or similar.
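As a small illustration, Python’s standard urldefrag function separates the #more part from the rest of the address, which is roughly the split a browser makes before it scrolls to the ID. The variable names here are just my own.

```python
from urllib.parse import urldefrag

url = "http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html#more"

# urldefrag returns the URL without the "#..." part, plus the part after "#".
page, section_id = urldefrag(url)

print(page)        # http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html
print(section_id)  # more
```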

Server port

A URL can also specify which port on the server the browser should connect to. The port number is appended right after the host name (that is, after the top-level domain), following a colon. Connecting to a server on different ports might be familiar to some of you online gamers, but I can bet that many of you have never heard of, or seen, a port in a URL before. That’s because, just like a file path or name, if a port isn’t specified, a default value is used. For HTTP, the default is port 80 (for HTTPS it is 443). So you can safely enter www.google.com:80 into your browser and still arrive at the same page. It’s best to leave it out, however, since it’s not really necessary unless you want to connect to another port.
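Here is a short Python sketch of that fallback. The DEFAULT_PORTS table and the effective_port function are my own illustration and only cover a few well-known protocols; they are not part of any library.

```python
from urllib.parse import urlsplit

# A small, hand-written table of well-known default ports (illustrative only).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str):
    """Return the port named in the URL, or the protocol's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for protocols not in the table


print(effective_port("http://www.google.com"))        # 80
print(effective_port("http://www.google.com:80"))     # 80
print(effective_port("http://www.google.com:8080/"))  # 8080
```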

Absolute and Relative URLs

An absolute URL is a complete and unique URL, from the protocol to the file path and parameters. Absolute links work anywhere, regardless of where you put them. So if you put this URL, http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html, anywhere on your site (or on any other site), it will still lead to this page.

A relative URL points to the resource from a current reference point, and is usually used within the same site. It does not contain a protocol or resource name, only the file name and path. In our example, the relative URL is 2011/03/1b-anatomy-of-url.html. It only works relative to our home page. If I copy it to another website, or even to another page within my own site that is not in the same folder as the home page, it will not work, since the browser will look for a folder named 2011 in the same folder as the page the URL is on. So it’s best to avoid relative URLs if you change the location of your web pages often.
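You can watch this happen with Python’s standard urljoin function, which resolves a relative URL against the address of the page it appears on, much as a browser does. The second page address below, some-other-post.html, is a hypothetical one that I invented for the example.

```python
from urllib.parse import urljoin

relative = "2011/03/1b-anatomy-of-url.html"

home = "http://web-knowhow.blogspot.com/"
# Hypothetical page that lives in a different folder from the home page.
other_page = "http://web-knowhow.blogspot.com/2011/02/some-other-post.html"

from_home = urljoin(home, relative)
from_other = urljoin(other_page, relative)

print(from_home)
# http://web-knowhow.blogspot.com/2011/03/1b-anatomy-of-url.html
print(from_other)
# http://web-knowhow.blogspot.com/2011/02/2011/03/1b-anatomy-of-url.html
```

From the home page the link resolves correctly, but from the other page it points into a 2011 folder inside 2011/02, which is not where the post lives.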
