HTTP - Hypertext Transfer Protocol

Introduction to HTTP and the Internet
Deciphering a browser's location bar
What is the difference between HTTP and HTML?
Why do some addresses have so many slashes in them?
Hosts, domains, sub-domains, servers and aliases
What are clients and servers?

How HTTP Works

The GET Method and HTTP Requests
HTTP Response Codes


Deciphering a browser's location bar

Ever wonder what the "http://www.yahoo.com" in your browser's location bar means? We'll break it down and explain what each part means and does.

http is the protocol being used. HTTP stands for Hypertext Transfer Protocol and is the standard protocol of the World Wide Web. Other Internet protocols include FTP, TELNET and GOPHER. A protocol is the method or procedure for transferring information.

www indicates that the site is on the World Wide Web, the open global network that is part of the Internet. It is a naming convention rather than a requirement: some sites use another name such as "home" in its place, and many can be reached with it left out.

.yahoo is the name of the host or domain, sometimes also called an alias. (There is a difference between a host, a domain and an alias, explained later; a web site can be a host, a domain, an alias or a combination of all three.) This is the name of the web site or the name of the machine that hosts the web site.

.com is the type of domain or site. com stands for "commercial enterprise." Other types are .gov (U.S. government), .mil (U.S. military), .net (network), .edu (educational institution, university) and .org (an organization). These are known as top level domains. There are also several other domain types, and each country has its own domain code (e.g., .ca is Canada, .de is Germany).
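As a rough illustration of the breakdown above, this Python sketch splits the example address using the standard library's urllib.parse module. The helper name describe_address is made up for this example, and treating the last dot-separated label as the top level domain is a simplification that fits this address, not a general rule.

```python
from urllib.parse import urlsplit


def describe_address(address):
    """Split an address into its protocol and dot-separated host labels.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(address)
    labels = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,        # the part before "://"
        "labels": labels,                # host name split on dots
        "top_level_domain": labels[-1],  # assumption: last label is the TLD
    }


info = describe_address("http://www.yahoo.com")
print(info["protocol"])          # http
print(info["labels"])            # ['www', 'yahoo', 'com']
print(info["top_level_domain"])  # com
```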


What is the difference between HTTP and HTML?

HTML stands for Hypertext Markup Language. The document you are reading is written in HTML. HTTP is the protocol that is transferring this HTML document to your browser window. HTML contains text to be displayed and tags that tell the browser how to display it. In the previous sentence the word "tags" was italicized. Within the HTML document the word "tags" is enclosed in tags that look like this: < I >tags< /I >. The < I > tells the browser to begin italicizing text and the < /I > tells the browser to stop italicizing text. For more information on HTML tags, see the HTML tutorial.

Most web sites, whether or not the location bar shows it, have a default HTML page, commonly called "index.html". The default page of http://www.yahoo.com is "http://www.yahoo.com/index.html". Try typing in the two different addresses and you should see the same result.

The .html extension is sometimes shortened to .htm. This is because some systems do not recognize more than three letters in a file extension. The page index.html is not the same as index.htm. Be specific.

.HTML is a file type, just as .DOC is an MS Word document, .WPD is a WordPerfect document and .TXT is a text document.
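To make the idea of tags a little more concrete, here is a small Python sketch that uses the standard library's html.parser module to report the tags and text it finds in a fragment of HTML. The class name TagLogger and the sample sentence are made up for this example. It only lists what a parser sees; it does not display anything the way a browser would.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each opening tag, closing tag and run of text (illustrative)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = TagLogger()
logger.feed("HTML contains <I>tags</I>.")
logger.close()

# The parser reports tag names in lower case, so <I> shows up as "i".
for kind, value in logger.events:
    print(kind, repr(value))
```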


Why do some addresses have so many slashes in them?

The address of the page you are viewing is "http://home.att.net/~gobruen/progs/networking/http_tutorial.html".
http://home.att.net is the name of the host domain.
/~gobruen is the name of the sub-domain, in this case a user's directory on the host.
progs/ is the name of the folder that the "networking/" folder is located in.
networking/ is the name of the folder the document is located in.
http_tutorial.html is the name of the document you are viewing.
Each folder is separated by a slash.
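The same breakdown can be sketched in Python with the standard library's urllib.parse module. The helper name split_address is made up for this example, and it assumes the address ends in a document name rather than a folder.

```python
from urllib.parse import urlsplit


def split_address(address):
    """Return (host, folders, document) for an address.

    Hypothetical helper; assumes the last path segment is a document.
    """
    parts = urlsplit(address)
    # Drop the empty strings produced by the leading slash.
    segments = [s for s in parts.path.split("/") if s]
    document = segments[-1] if segments else ""
    return parts.hostname, segments[:-1], document


host, folders, document = split_address(
    "http://home.att.net/~gobruen/progs/networking/http_tutorial.html"
)
print(host)      # home.att.net
print(folders)   # ['~gobruen', 'progs', 'networking']
print(document)  # http_tutorial.html
```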


Hosts, domains, sub-domains, servers and aliases

Networks on the Internet can be configured in many different ways. People may refer to Internet addresses as "hosts" or "domains" or "aliases" or a variety of other names. Computers all have unique numbers used to identify themselves: serial numbers, Ethernet address numbers, TCP/IP addresses and dozens of others. These numbers are long, difficult to remember and all look the same after a while. In order to make things simpler for ourselves, we have named computers like people. These definitions may clear up some of the confusion:

Host
In the traditional sense, a host computer is a master or main computer on a network. Early computer systems consisted only of "dumb" terminals connected to a host which held all the files and applications needed. The terminals only had the minimum utilities needed to access the host. Most computer networks now consist of several hosts or file servers which handle email, printer management, network security and file management. Users may also dial in to hosts remotely from terminals or personal computers over public telephone lines. A host in the Internet sense is a machine which holds the files or applications that can be accessed through the Internet or through Internet protocols.

Domain
A domain name is a common name for a network of computers or for services on one computer. The domain Yahoo has a number of different machines that can all be accessed through the name www.yahoo.com.

Alias
A web server may have several aliases. A company's files may be accessed through the Internet at the address www.company.com, while the name of the machine within the company's walls is ATHENA. That company may loan out a part of its Internet service to another party, who names their site www.othercompany.com. company.com, othercompany.com and ATHENA are all aliases for the same computer.

Server
A server is a specially configured computer that provides a service to users. An email server provides email service. A Web server can provide Internet access from within a network or Web site access from outside a network. These machines are usually protected or separated by a firewall.


Clients and Servers

There are at least two parties in every Internet transaction: the client, usually a browser (Netscape, Explorer), and the server, which is the machine or host the browser is attempting to get data from. On a LAN or WAN, clients are also called "nodes."
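The two roles can be tried out on a single machine. The Python sketch below starts a tiny web server on the local address 127.0.0.1 and then acts as a client that asks it for a page, using only the standard library's http.server and http.client modules. The names HelloHandler and fetch_from_local_server and the page contents are made up for this example. It is a toy for experimenting, not how a production server is set up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every request with the same small HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_from_local_server(path="/index.html"):
    """Client side: request a path and return (status code, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_from_local_server()
print(status)
print(body.decode())
```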




How HTTP works


The GET Method

Whenever an address is entered into a location bar, for example http://www.yahoo.com/index.html, the user typing this address is making a request for an HTML document named "index" from the server "yahoo.com" using the HTTP protocol. Got all that? This request in the HTTP protocol is called a GET method. The user (client browser) makes a request to GET the document index.html. It looks like this:
GET /index.html HTTP/1.0


Following the GET line is browser information the host machine needs to prepare its response. This information consists of the type of browser making the request, the type of computer/operating system the browser is loaded on, and a list of the types of documents the browser will accept. It looks like this:
User-Agent: Mozilla/4.0 (Win95)
Accept: image/gif, image/jpeg, image/x-xbitmap

There is always a blank line at the end of the request. Without a blank line the request will fail. This is important to remember when writing CGI scripts or other scripts that generate HTTP requests.

The GET method is the most common method, but HTTP comes with a variety of other methods. Methods are always typed in CAPITAL letters; see the chart below.
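For a script that generates such a request, the pieces above can be assembled as in the following Python sketch. The function name build_get_request is made up for this example. Each line of an HTTP request ends with a carriage return and line feed ("\r\n"), so the closing blank line appears as two of these pairs in a row. The sketch only builds the text and does not send it anywhere.

```python
def build_get_request(path, headers):
    """Build the text of an HTTP/1.0 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.0"]
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # Join the lines with CRLF, end the last line, then add the blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request(
    "/index.html",
    {
        "User-Agent": "Mozilla/4.0 (Win95)",
        "Accept": "image/gif, image/jpeg, image/x-xbitmap",
    },
)
print(request)
```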


Method     Description
GET        Client request to view a document
POST       Client data submission to a host through a web page
HEAD       Request for only a document header
PUT        Request to store a document at a URI
DELETE     Request to remove a document from a URI
LINK       Request to associate a header with a document
UNLINK     Removes the connection between a header and a document
OPTIONS    Request for information about the resources at a host server
TRACE      A debugging method. Asks the server to send back the request as it was received, so the client can see how it was changed in transit



HTTP Response Codes

Code Range    Type of Response
100 to 199    Information for the client about the server
200 to 299    Reports a successfully completed client request
300 to 399    Page or server redirection
400 to 499    Page, server, host not found/Client request failed
500 to 599    Internal server errors


Codes in the 100 to 299 range are rarely seen by the user and are read only by the browser software.
Codes in the 300 to 399 range are sometimes seen briefly while the server redirects the client to another location.
Codes in the 400 to 599 range are more commonly seen. 400 to 499 usually means that the page no longer exists, an "< A HREF= >" tag is mislinked or the user misspelled the address in the location bar: the "404" error. Codes in this range can also mean that a request was denied for security reasons.
500 to 599 represents various internal server errors: the server may be down or the CGI scripts may have software bugs.
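A script that reads responses can sort a code into one of the ranges above with a few comparisons. In this Python sketch the function name describe_status and the short range labels are made up for this example. The standard phrase for each code, where one exists, comes from the standard library's http.HTTPStatus.

```python
from http import HTTPStatus


def describe_status(code):
    """Return (range label, standard phrase) for a response code."""
    if 100 <= code <= 199:
        kind = "information"
    elif 200 <= code <= 299:
        kind = "success"
    elif 300 <= code <= 399:
        kind = "redirection"
    elif 400 <= code <= 499:
        kind = "client request failed"
    elif 500 <= code <= 599:
        kind = "server error"
    else:
        raise ValueError(f"not an HTTP response code: {code}")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "unknown"  # in range, but not a code the library knows
    return kind, phrase


print(describe_status(200))  # ('success', 'OK')
print(describe_status(404))  # ('client request failed', 'Not Found')
print(describe_status(500))  # ('server error', 'Internal Server Error')
```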