FEUP, João Neves
2013
World Wide Web
[email protected]
Before WWW
- Major search tools: Gopher and Archie
- Archie
  • Searches indexes of FTP archives
  • Filename-based queries
- Gopher
  • Friendly interface
  • Menu-driven queries
Birth of the Web
- Tim Berners-Lee et al., CERN, 1991
- HyperText Transfer Protocol (HTTP)
- Hypertext: text with embedded links to other text documents
- Hyperlinks
- RFC 1945 (May 1996): HTTP/1.0
- RFC 2068, obsoleted by RFC 2616 (June 1999): HTTP/1.1
Internet Evolution
Year   Hosts (*)
1983   562
1984   1024
1985   1961
1986   2308
1987   5089
1988   28174
1989   80000
1990   290000
1991   500000
1992   727000
1993   1200000
1994   2217000
1995   4852000
Total Sites Across All Domains
August 1995 - March 2008
Source http://news.netcraft.com/archives/web_server_survey.html
Layering
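(Diagram: the Internet protocol stack)
- Application: HyperText Transfer Protocol and Telnet (over TCP); Simple Network Management Protocol and Dynamic Host Configuration (over UDP)
- Transport: Transmission Control Protocol (TCP), User Datagram Protocol (UDP)
- Network: Internet Protocol (IP)
- Link: Ethernet, Wi-Fi, SONET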
HTTP
- Standard protocol for web transfer
- Request-response interaction between client and server
- The server holds resources such as HTML files and images
- Request methods: GET, HEAD, PUT, POST, DELETE, …
- Response: status line + additional information (e.g., a web page)
Introduction to HTTP
- It has been in use by the World-Wide Web global information initiative since 1990
- Its first version (referred to as HTTP/0.9) was a simple protocol for raw data transfer across the Internet
- HTTP/1.0 improved the protocol by allowing messages to be in the format of MIME-like messages, containing:
  • metainformation about the data transferred, and
  • modifiers on the request/response semantics
HTTP Transaction
(Diagram: an HTTP client requests file.html from an HTTP server, which serves it from WebRoot/dir/file.html)
- HTTP client: web browser
- HTTP server: web server
  • Standard port: 80
  • Suggested alternate ports: 81, 8080, 8081
- HTTP is used to transmit resources:
  • Files/documents
  • Image files
  • Query results
  • Outputs from CGI scripts
  • Anything that can be identified by a URL
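The WebRoot/dir/file.html picture implies that a server maps the path in a request onto a file below its document root. The sketch below shows that mapping in Python. It assumes a hypothetical root of /srv/WebRoot. It also assumes that a path ending in "/" means "serve index.html", which is a common server convention and not something HTTP itself mandates.

```python
import posixpath
from urllib.parse import unquote


def resolve_path(web_root: str, request_target: str, index: str = "index.html") -> str:
    """Map a request target such as '/dir/file.html' to a file path under web_root."""
    # Drop any ?query part and decode escapes such as %20 before normalizing.
    path = unquote(request_target.split("?", 1)[0]) or "/"
    # Normalizing an absolute path collapses "." and ".." segments, so a request
    # such as "/../../etc/passwd" cannot climb above the web root.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    if path.endswith("/"):
        # Directory request: fall back to the index file (hypothetical default name).
        clean = posixpath.join(clean, index)
    return posixpath.join(web_root, clean.lstrip("/"))


print(resolve_path("/srv/WebRoot", "/dir/file.html"))     # /srv/WebRoot/dir/file.html
print(resolve_path("/srv/WebRoot", "/"))                  # /srv/WebRoot/index.html
print(resolve_path("/srv/WebRoot", "/../../etc/passwd"))  # /srv/WebRoot/etc/passwd
```

Percent-escapes are decoded before normalizing, so an encoded "%2e%2e" is neutralized in the same way as a literal "..".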
Web Clients
- Lynx 2.0 (1993, character-based interface)
- NCSA Mosaic (1993, first popular browser with a graphical interface)
- Marc Andreessen (author of Mosaic) moved to Netscape
- Microsoft Internet Explorer (“new name for Mosaic…”)
- Mozilla Firefox
- Opera
- Safari
- Chrome …
The Browser
- The browser:
  1. fetches the requested page,
  2. interprets the text and the formatting commands it contains,
  3. displays the page, properly formatted, on the screen
- Strings of text on the page that link to other pages are called hyperlinks
  • On the screen, hyperlinks are highlighted by underlining, by a special color, or both
Web Servers
- NCSA HTTPd: non-commercial, free
- Apache HTTP Server: freeware
- Apache Tomcat: freeware
- lighttpd: freeware
- Microsoft Internet Information Services (IIS): payware
- Zeus Web Server: payware
- Zope: freeware
- ...
Server Share
Server Share amongst the Million Busiest Sites, March 2009
source http://news.netcraft.com/archives/web_server_survey.html
Markup Languages
- HTML
- SHTML
- SGML
- XML
- …
Markup
- Markup consists of codes inserted into text documents to control formatting, printing, or other processing.
- Descriptive markup indicates the nature, function, or content of the data in a file.
- Procedural markup defines what processing is to be carried out at particular points in the document.
HyperText Markup Language
- Language in which web pages are written
- Contains formatting commands
- Tells the browser what to display and how to display it
- Examples:
  • <TITLE> Welcome to My Great Site </TITLE>
    The title of this page is “Welcome to My Great Site”
  • <B>Great News!</B>
    Sets “Great News!” in boldface
  • <A HREF="http://www.xptoo.org/">I’m the One</A>
    A link pointing to the web page http://www.xptoo.org/ (usually served as index.html), displayed with the text “I’m the One”
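To act on markup like these examples, a browser first has to pick the title and the hyperlinks out of the tag stream. The sketch below does this with Python's standard html.parser module. That module is a lenient parser, not a full browser engine. The page string simply reuses the examples above.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <TITLE> text and every <A HREF=...> target with its anchor text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []          # list of (href, anchor text) pairs
        self._in_title = False
        self._href = None        # href of the <A> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, so <A HREF> arrives as 'a'/'href'.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._text.append(data)


page = """<HTML><HEAD><TITLE> Welcome to My Great Site </TITLE></HEAD>
<BODY><B>Great News!</B>
<A HREF="http://www.xptoo.org/">I'm the One</A></BODY></HTML>"""

parser = TitleAndLinks()
parser.feed(page)
print(parser.title.strip())  # Welcome to My Great Site
print(parser.links)          # [('http://www.xptoo.org/', "I'm the One")]
```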
Sample HTML Tags
Tag                 Meaning
<A> </A>            Anchor (link or name)
<BODY> </BODY>      Document contents
<BR>                Line break
<FORM> </FORM>      Input form
<H1> </H1>          Heading, level 1
<HEAD> </HEAD>      Header of a document
<HR>                Horizontal rule
<HTML> </HTML>      The document type is HTML
<LI>                List item
<OL> </OL>          Ordered list
<P> </P>            Paragraph
<PRE> </PRE>        Preformatted text
<TITLE> </TITLE>    Document title
<UL> </UL>          Unordered (bulleted) list
Uniform Resource Identifiers
RFC 2396, August 1998
- A URI is an identifier for some resource; a Uniform Resource Locator (URL) gives specific information on how to obtain that resource
- HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the following protocols:
  • SMTP, NNTP, FTP
- In this way, HTTP allows basic hypermedia access to resources available from diverse applications
Uniform Resource Identifiers
The following examples illustrate URLs that are in common use:
Scheme  Use                                            Example
ftp     File Transfer Protocol services                ftp://ftp.is.co.za/rfc/rfc1808.txt
http    Hypertext Transfer Protocol services           http://www.math.uio.no/faq/compressionfaq/part1.html
file    Local files                                    file:/usr/local/etc/ntp.conf
news    USENET newsgroups and articles                 news:comp.infosystems.www.servers.unix
telnet  Interactive services via the TELNET protocol   telnet://melvyl.ucop.edu/
mailto  Electronic mail addresses                      mailto:[email protected]
gopher  Gopher and Gopher+ protocol services           gopher://stap.umn.edu/00/Weather/Ca/Los%20Angeles
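The scheme is always the part before the first colon, and only some schemes continue with "//host". Python's urllib.parse.urlsplit makes this visible on the examples in the table. The sketch omits the mailto example because its address is not recoverable here.

```python
from urllib.parse import urlsplit

examples = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "http://www.math.uio.no/faq/compressionfaq/part1.html",
    "file:/usr/local/etc/ntp.conf",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
    "gopher://stap.umn.edu/00/Weather/Ca/Los%20Angeles",
]

for url in examples:
    parts = urlsplit(url)
    # netloc (the "//host" part) is empty for the file: and news: examples,
    # which name a resource without naming a server.
    print(f"{parts.scheme:<7} host={parts.netloc or '-':<22} path={parts.path}")
```

urlsplit leaves escapes such as %20 undecoded in the path. Decoding, for example with urllib.parse.unquote, is a separate step.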
Uniform Resource Locator
<scheme>://[userinfo@]hostname[:port]/path[;parameters][?query]
- Some URL schemes use the format "user:password" in the userinfo field.
- This practice is NOT RECOMMENDED, because the passing of authentication information in clear text (such as URI) has proven to be a security risk in almost every case where it has been used. [RFC2396]
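Each bracketed part of this syntax maps to a field of Python's urllib.parse.urlparse result. The URL below is hypothetical and deliberately carries the NOT RECOMMENDED "user:password" userinfo. The strip_userinfo helper is an illustrative name, not a library function. It shows one way to drop credentials before a URL is logged or displayed.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical URL using every part of the syntax, including userinfo.
url = "http://alice:secret@www.xptoo.org:8080/dir/file.html;type=a?lang=en"
parts = urlparse(url)
print(parts.scheme, parts.username, parts.password, parts.hostname, parts.port)
# http alice secret www.xptoo.org 8080
print(parts.path, parts.params, parts.query)
# /dir/file.html type=a lang=en


def strip_userinfo(u: str) -> str:
    """Return u without its userinfo (hypothetical helper)."""
    p = urlparse(u)
    host = p.hostname or ""
    if ":" in host:                 # IPv6 literal: restore the brackets that .hostname drops
        host = f"[{host}]"
    if p.port is not None:
        host = f"{host}:{p.port}"
    return urlunparse(p._replace(netloc=host))


print(strip_userinfo(url))  # http://www.xptoo.org:8080/dir/file.html;type=a?lang=en
```

Note that .hostname is lower-cased by urlparse, which is harmless because host names are case-insensitive.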
HTTP
HyperText Transfer Protocol
- A very simple, stateless protocol for sessionless exchanges
  • The browser creates a new connection each time it wants to make a new request (for a page, image, etc.)
- Exceptions:
  • HTTP/1.1 added support for persistent connections and pipelining
  • Clients and servers might keep state information
  • Cookies provide a way of recording state
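HTTP itself is stateless, so cookies record state in headers. The server sends a Set-Cookie header, and the browser returns the value in a Cookie header on later requests. The sketch below uses Python's http.cookies module. The cookie name "session" and value "abc123" are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: attach a piece of state to a response.
cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["path"] = "/"        # send it back for every path on this site
print(cookie.output())                 # Set-Cookie: session=abc123; Path=/

# Server side, on a later request: parse the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("session=abc123")
print(incoming["session"].value)       # abc123
```

The server still keeps no per-connection memory. Any state travels with each request, and the server looks it up by the cookie value.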
The http protocol: more
http: TCP transport service
- The client initiates a TCP connection (creates a socket) to the server, port 80
- The server accepts the TCP connection from the client
- http messages (application-layer protocol messages) are exchanged between the browser (http client) and the Web server (http server)
- The TCP connection is closed

http is “stateless”
- The server maintains no information about past client requests
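These four steps can be reproduced on one machine. The Python sketch below runs a throwaway server on 127.0.0.1 and lets the OS choose the port instead of using 80, since binding to port 80 normally needs administrator rights. The response body is hard-coded, and the server handles exactly one connection.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    conn, _addr = listener.accept()            # step 2: server accepts the TCP connection
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:      # a blank line ends the request headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"<html>hello</html>"
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-Type: text/html\r\n"
                     b"Content-Length: " + str(len(body)).encode() + b"\r\n"
                     b"\r\n" + body)
    # step 4: leaving the with-block closes the connection; nothing about it is remembered


def fetch(host: str, port: int, path: str) -> bytes:
    with socket.create_connection((host, port)) as sock:          # step 1: client connects
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))  # step 3: request
        response = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:                      # server closed the connection: reply complete
                break
            response += chunk
    return response


listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = fetch("127.0.0.1", port, "/index.html")
server.join()
listener.close()
print(reply.split(b"\r\n", 1)[0].decode())  # HTTP/1.0 200 OK
```

In this HTTP/1.0 style, the client learns that the response is complete when the server closes the connection.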
HTTP
GET /path/to/file/index.html HTTP/1.0
- HTTP method
- Path: the part of the URL after the hostname, i.e., the request URI
- HTTP version
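The three fields of a request line can be separated as shown in the minimal Python sketch below. It assumes single spaces between fields and rejects anything else. As a consequence, it does not accept the HTTP/0.9 "simple request", which has no version field.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str    # e.g. GET, HEAD, POST
    target: str    # request URI: the part of the URL after the hostname
    version: str   # e.g. HTTP/1.0


def parse_request_line(line: str) -> RequestLine:
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) != 3 or not fields[2].startswith("HTTP/"):
        raise ValueError(f"not an HTTP/1.x request line: {line!r}")
    return RequestLine(*fields)


print(parse_request_line("GET /path/to/file/index.html HTTP/1.0\r\n"))
# RequestLine(method='GET', target='/path/to/file/index.html', version='HTTP/1.0')
```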
HTTP Session
jneves@bart(1)$ telnet www.inescporto.pt 80
[...]
GET /~jneves/index.html HTTP/1.0
From: [email protected]
User-Agent: Camachina/5.0

HTTP/1.1 200 OK
Date: Tue, 26 May 2009 18:06:13 GMT
Server: Apache/2.30 (Unix) PHP/5.5 DAV/2 mod_perl/2.9 Perl/v5.20
Last-Modified: Fri, 04 May 2007 18:41:20 GMT
Accept-Ranges: bytes
Content-Length: 91
Connection: close
Content-Type: text/html

<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; url=./index.shtml">
</head>
</html>
Connection closed by foreign host.
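A response like the one in this session consists of a status line, header lines, a blank line, and then the body. The Python sketch below splits a response into those parts. It assumes CRLF line endings, no repeated header names, and no folded (continued) header lines. The sample reuses part of the session above.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")         # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")           # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Date: Tue, 26 May 2009 18:06:13 GMT\r\n"
       b"Last-Modified: Fri, 04 May 2007 18:41:20 GMT\r\n"
       b"Content-Type: text/html\r\n"
       b"Connection: close\r\n"
       b"\r\n"
       b"<html>\n<head>\n</head>\n</html>\n")

version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)   # HTTP/1.1 200 OK
print(headers["content-type"])   # text/html
```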
HTTP Request Headers
Header             Description
From               RFC 822 e-mail address of the user
User-Agent         Client software
Accept             Media types that the client will accept, e.g., text/plain, text/html
Accept-Encoding    Compression methods, e.g., x-compress, x-gzip
Accept-Language    Language(s) preferred by the user
Referer            (optional) URL of the document (or element within the document) from which the URL in the request was obtained; the header name keeps this historical misspelling of "referrer"
If-Modified-Since  Return the document only if it was modified since the specified date
Content-Length     Length in octets of the data to follow
Content-Type       Type of the item
Pragma: no-cache   Directive understood by a proxy server; when present, the proxy should not return a document from its cache
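If-Modified-Since makes a fetch conditional. The client or proxy sends the date of its cached copy. The server then returns the full document (200) only if the document has changed since that date, and otherwise returns 304 Not Modified with no body. The minimal Python sketch below covers both sides. The function names are hypothetical, and the Last-Modified date comes from the session shown earlier.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def if_modified_since_header(cached_copy_date: datetime) -> str:
    """Client/proxy side: ask for the document only if it changed after our copy."""
    return "If-Modified-Since: " + format_datetime(cached_copy_date, usegmt=True)


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Server side: 304 (Not Modified, empty body) if unchanged, else 200 with the document."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                                  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always in GMT
    return 304 if last_modified <= since else 200


last_modified = datetime(2007, 5, 4, 18, 41, 20, tzinfo=timezone.utc)
print(if_modified_since_header(last_modified))
# If-Modified-Since: Fri, 04 May 2007 18:41:20 GMT
print(conditional_status(last_modified, "Fri, 04 May 2007 18:41:20 GMT"))  # 304
```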
HTTP Response Headers
Header          Description
Server          Server software
Date            Date and time at which the response was generated
Last-Modified   Modification date of the document
Expires         Document expiration date
Location        Location of the document in redirection responses
Pragma          A hint, e.g., no-cache
MIME-Version    Version of the MIME protocol used to build the message
Link            URL of the document's parent
Content-Length  Length in octets
Allow           Requests that the user can issue, e.g., GET
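A cache can use Date and Expires to decide how long it may reuse a response without contacting the server again. The Python sketch below follows the HTTP/1.1 idea of a "freshness lifetime" computed as Expires minus Date. The Expires value used in the test is hypothetical, and the function names are illustrative.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(date_header: str, expires_header: str) -> float:
    """Seconds for which a cache may reuse a response without asking the server again."""
    # Subtracting the server's own Date (rather than the cache's clock) keeps the
    # result independent of clock differences between cache and server.
    lifetime = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
    return max(lifetime.total_seconds(), 0.0)


def is_fresh(age_seconds: float, date_header: str, expires_header: str) -> bool:
    """A cached response is fresh while its age is below its freshness lifetime."""
    return age_seconds < freshness_lifetime(date_header, expires_header)
```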
HTTP Status Codes
Code  Text
2xx   Success
3xx   Redirection
301   Moved Permanently
302   Found
4xx   Client Errors
400   Bad Request
401   Unauthorized
404   Not Found
5xx   Server Errors
500   Internal Server Error
502   Bad Gateway
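The first digit of a status code gives its class. This lets a client react sensibly even to a code it does not know, by treating it as the x00 code of the same class. The Python sketch below uses the standard library's http.HTTPStatus for the reason phrases, which are the registered phrases used in current specifications. The describe function is a hypothetical helper.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def describe(code: int) -> str:
    """Reason phrase plus class, e.g. '404 Not Found [Client Error]'."""
    cls = CLASSES.get(code // 100, "Unknown")   # first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                          # not a registered code
        phrase = "(unregistered code)"
    return f"{code} {phrase} [{cls}]"


for code in (200, 301, 302, 400, 401, 404, 500, 502):
    print(describe(code))
```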
HTTP over TLS
bash-4.0# openssl s_client -connect secure.xptoo.org:443 -showcerts
CONNECTED(00000004)
[…]
---
GET / HTTP/1.0

HTTP/1.1 200 OK
[…]
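The openssl session shows that HTTPS is the same HTTP request, just sent inside a TLS connection on port 443. The Python sketch below does the same with the standard socket and ssl modules. secure.xptoo.org is the slide's example host and presumably hypothetical. The sketch adds a Host header, which HTTP/1.0 does not require but which servers hosting several sites on one address rely on.

```python
import socket
import ssl


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    # The default context verifies the server's certificate chain and host name,
    # which openssl s_client does not enforce by default.
    context = ssl.create_default_context()
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:            # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)

# Usage (needs network access): https_get("secure.xptoo.org")
```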
HTTP 1.1 Features
- Persistent TCP connections: remain open for multiple requests
- Partial document transfers: clients can specify start and stop positions
- Conditional fetch: several additional conditions
- Better content negotiation
- More flexible authentication
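In HTTP/1.1, a partial document transfer is requested with a header such as "Range: bytes=0-99". Byte positions are inclusive, so this range is the first 100 bytes. The server answers with 206 Partial Content and a Content-Range header. The Python sketch below models that server-side decision for a single range only. Status 416 (Range Not Satisfiable) and the decision to ignore malformed ranges follow later HTTP/1.1 revisions, and apply_range is a hypothetical name.

```python
import re
from typing import Optional, Tuple


def apply_range(body: bytes, range_header: str) -> Tuple[int, bytes, Optional[str]]:
    """Serve one 'bytes=first-last' range; returns (status, payload, Content-Range value)."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if m is None or m.group(1) == m.group(2) == "":
        return 200, body, None                      # unsupported form: send the whole document
    first, last = m.group(1), m.group(2)
    size = len(body)
    if first == "":                                 # "bytes=-N": the last N bytes
        n = int(last)
        if n == 0 or size == 0:
            return 416, b"", f"bytes */{size}"      # Range Not Satisfiable
        start, end = max(size - n, 0), size - 1
    else:
        start = int(first)
        end = int(last) if last else size - 1       # "bytes=N-": from N to the end
        if end < start:
            return 200, body, None                  # invalid range: ignore the header
        if start >= size:
            return 416, b"", f"bytes */{size}"
        end = min(end, size - 1)                    # positions are inclusive; clamp to the body
    return 206, body[start:end + 1], f"bytes {start}-{end}/{size}"
```

Usage: apply_range(b"0123456789", "bytes=7-") returns (206, b"789", "bytes 7-9/10").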
Static vs. Dynamic Pages
- Static HTML pages vs. pages generated from a database
- Personalized
- Context-aware services
- Device-dependent browsing
HTTP Proxy
- An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients;
- Requests are serviced internally or by passing them on, with possible translation, to other servers;
- A proxy must implement both the client and server requirements of this specification;
- The client makes a request to the proxy server using the complete URL;
- The proxy server connects to the remote server and requests the resource relative to that server (no protocol and hostname in the URL).
HTTP Proxy
(Diagram: Client → HTTP proxy server → Server; the server's WebRoot contains dir/file.html)
- Client to proxy: GET http://hostname/path/to/file.html HTTP/1.0
- Proxy to server: GET /path/to/file.html HTTP/1.0
- Server to proxy, then proxy to client: HTTP/1.0 200 Document ....
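The proxy's main rewriting step turns the complete URL it received into a connection target (host and port) plus a request line that carries only the path. The Python sketch below shows this step and assumes plain http URLs only. to_origin_request is a hypothetical name.

```python
from urllib.parse import urlsplit


def to_origin_request(request_line: str):
    """Turn 'GET http://host/path HTTP/1.0' (client to proxy) into
    (host, port, 'GET /path HTTP/1.0') for the proxy-to-server hop."""
    method, url, version = request_line.split(" ")
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"expected an absolute http URL, got {url!r}")
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The host and port leave the request line: the proxy uses them to open the TCP connection.
    return parts.hostname, parts.port or 80, f"{method} {target} {version}"


print(to_origin_request("GET http://hostname/path/to/file.html HTTP/1.0"))
# ('hostname', 80, 'GET /path/to/file.html HTTP/1.0')
```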
HTTP Proxy + Cache
(Diagram: Client → HTTP proxy server with a cache → Server; the server's WebRoot contains dir/file.html)
- Client to proxy: GET http://hostname/path/to/file.html HTTP/1.0
- Proxy to server: GET /path/to/file.html HTTP/1.0
- Server to proxy, then proxy to client: HTTP/1.0 200 Document ....
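Adding a cache means the proxy can answer repeated requests for the same complete URL without contacting the origin server. The Python sketch below keeps responses in a dictionary keyed by URL and honours "Pragma: no-cache". It deliberately ignores expiry (Expires, Last-Modified), which a real cache must handle. fetch_from_origin is a hypothetical callable that stands in for the proxy-to-server hop.

```python
from typing import Callable, Dict, Optional


class CachingProxy:
    """Proxy that remembers responses by absolute URL (no expiry handling: a sketch only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin          # hypothetical hook that contacts the origin
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, absolute_url: str, request_headers: Optional[Dict[str, str]] = None) -> bytes:
        headers = {k.lower(): v.lower() for k, v in (request_headers or {}).items()}
        # "Pragma: no-cache" tells the proxy not to answer from its cache.
        if headers.get("pragma") != "no-cache" and absolute_url in self._cache:
            self.hits += 1
            return self._cache[absolute_url]
        self.misses += 1
        response = self._fetch(absolute_url)
        self._cache[absolute_url] = response     # keep a copy for the next client
        return response


origin_calls = []


def fake_origin(url: str) -> bytes:
    origin_calls.append(url)
    return b"HTTP/1.0 200 Document\r\n\r\n...."


proxy = CachingProxy(fake_origin)
url = "http://hostname/path/to/file.html"
proxy.get(url)                              # miss: fetched from the origin server
proxy.get(url)                              # hit: served from the cache
proxy.get(url, {"Pragma": "no-cache"})      # forced miss
print(proxy.hits, proxy.misses)             # 1 2
```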
HTTP Proxy
- Transparent
- Configured (http://proxy.xptoo.org:3128/)
- Automatic (Web Proxy Auto-Discovery, WPAD)
- Advantages vs. disadvantages
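In the configured case, the client is simply told the proxy's address. The sketch below shows this with Python's urllib.request, using the slide's example address proxy.xptoo.org:3128, which is presumably hypothetical. Calling ProxyHandler() with no argument would instead read the http_proxy environment variable.

```python
import urllib.request

# Hypothetical configured proxy (address taken from the slide's example).
proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.xptoo.org:3128/"})
opener = urllib.request.build_opener(proxy_handler)

# opener.open("http://www.xptoo.org/") would connect to proxy.xptoo.org:3128 and
# send the complete URL in the request line, e.g. "GET http://www.xptoo.org/ HTTP/1.1",
# instead of connecting to www.xptoo.org directly.
```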
Why Web Caching (Proxies)?
- Assume: the cache is “close” to the client (e.g., in the same network)
  • smaller response time: the cache is “closer” to the client
  • less traffic to distant servers
  • the link out of the institutional/local ISP network is often the bottleneck
(Diagram: origin servers on the Internet, reached through a 1.5 Mb/s access link (the bottleneck) from an institutional network with a 10 Mb/s LAN and an institutional cache)
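The effect of the cache on the 1.5 Mb/s access link can be estimated as request rate × object size × (1 − hit rate) ÷ link capacity. The Python sketch below works through the arithmetic. The load figures (15 requests/s, 100 kbit objects) and the 40% hit rate are hypothetical, not from the slide. They were chosen so that, without a cache, demand exactly saturates the access link.

```python
def link_utilization(requests_per_s: float, object_bits: float, link_bps: float,
                     hit_rate: float = 0.0) -> float:
    """Share of a link's capacity used when a fraction hit_rate of requests is served by the cache."""
    return requests_per_s * object_bits * (1.0 - hit_rate) / link_bps


# Hypothetical load (not from the slide): 15 requests/s for 100 kbit objects.
REQ, OBJ = 15, 100_000
print(link_utilization(REQ, OBJ, 1.5e6))                # access link, no cache: 1.0 (saturated)
print(link_utilization(REQ, OBJ, 1.5e6, hit_rate=0.4))  # access link, 40% hits: about 0.6
print(link_utilization(REQ, OBJ, 10e6))                 # 10 Mb/s LAN: 0.15
```

The LAN figure has no hit-rate term because every response crosses the LAN, whether it comes from the cache or from the origin server. Only the access link is relieved by cache hits.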
Web Load Handling
- Thousands of clients
- Load sharing
- DNS round robin
- Web switching (L4/L7): load-balancing devices
  • Nortel Alteon
  • A10 Networks
  • Cisco Content Switching
  • ...
- Akamai
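DNS round robin shares load by publishing several addresses for one name and rotating their order from one answer to the next. Because most clients try the first address, successive clients land on different servers. The Python sketch below is a toy model of that rotation, not a DNS server. The name and the 192.0.2.x documentation addresses are hypothetical.

```python
from typing import List


class RoundRobinResolver:
    """Toy model of DNS round robin: one name, several addresses, rotated per query."""

    def __init__(self, name: str, addresses: List[str]):
        self.name = name
        self._addresses = list(addresses)
        self._next = 0

    def resolve(self, name: str) -> List[str]:
        if name != self.name:
            raise KeyError(name)
        i = self._next
        self._next = (self._next + 1) % len(self._addresses)
        # Answer with the whole list, rotated one position per query.
        return self._addresses[i:] + self._addresses[:i]


resolver = RoundRobinResolver("www.xptoo.org", ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    print(resolver.resolve("www.xptoo.org")[0])   # .10, .11, .12, then .10 again
```

Unlike the L4/L7 switching devices, plain round robin knows nothing about server load or health. It keeps handing out the address of a server that is down until the record is changed.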