HTTP Protocol
Let us start with this quote from the HTTP
specification document [2]:
The HTTP protocol is based on a request/response
paradigm. A client establishes a connection with a server
and sends a request to the server in the form of a request
method, URI, and protocol version, followed by a MIME-like
message containing request modifiers, client information,
and possible body content. The server responds with a
status line, including the message's protocol version and a
success or error code, followed by a MIME-like message
containing server information, entity meta-information, and
possible body content.
HTTP Protocol (2)
What this means to libwww-perl is that
communication always takes place through
these steps:
First a request object is created and configured.
This object is then passed to a server and we get
a response object in return that we can examine.
A request is always independent of any previous
requests, i.e. the service is stateless. The same
simple model is used for any kind of service we
want to access.
Examples
1) If we want to fetch a document from a remote file
server, then we send it a request that contains a
name for that document and the response will
contain the document itself.
2) If we access a search engine, then the content of the
request will contain the query parameters and the
response will contain the query result.
3) If we want to send a mail message to somebody
then we send a request object which contains our
message to the mail server and the response object
will contain an acknowledgment that tells us that the
message has been accepted and will be forwarded to
the recipient(s).
The Request Object
The libwww-perl request object has the class name HTTP::Request. The fact that
the class name uses HTTP:: as a prefix only implies that we use the HTTP model
of communication. It does not limit the kind of services we can try to pass this
request to. For instance, we will send HTTP::Requests both to ftp and gopher
servers, as well as to the local file system.
The main attributes of the request objects are:
The method is a short string that tells what kind of request this is. The most
common methods are GET, PUT, POST and HEAD.
The uri is a string denoting the protocol, server and the name of the
"document" we want to access. The uri might also encode various other
parameters.
The headers contain additional information about the request and can also be
used to describe the content. The headers are a set of keyword/value pairs.
The content is an arbitrary amount of data.
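The four attributes above map onto accessor methods of HTTP::Request. The following is a minimal sketch, assuming the HTTP::Request module from libwww-perl is installed. The URL, header values and body are placeholders chosen for illustration, and nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a request object; nothing is sent over the network here.
# The URL and form data below are illustrative placeholders.
my $req = HTTP::Request->new(POST => 'http://www.example.com/cgi-bin/query');

# Headers are keyword/value pairs.
$req->header('User-Agent' => 'MyApp/0.1');
$req->content_type('application/x-www-form-urlencoded');

# The content is an arbitrary amount of data.
$req->content('querytype=subject');

print "method:  ", $req->method, "\n";
print "uri:     ", $req->uri->as_string, "\n";
print "type:    ", $req->header('Content-Type'), "\n";
print "content: ", $req->content, "\n";
```

Each accessor returns the current value when called without arguments and sets a new value when called with one.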
The Response Object
The libwww-perl response object has the class name HTTP::Response. The main
attributes of objects of this class are:
The code is a numerical value that indicates the overall outcome of the request.
The message is a short, human readable string that corresponds to the code.
The headers contain additional information about the response and describe
the content.
The content is an arbitrary amount of data.
Since we don't want to handle all possible code values directly in our programs, a
libwww-perl response object has methods that can be used to query what kind
of response this is. The most commonly used response classification methods
are:
is_success()
The request was successfully received, understood or accepted.
is_error()
The request failed. The server or the resource might not be available, access to the
resource might be denied or other things might have failed for some reason.
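The classification methods can be tried without any network access by constructing response objects by hand. This is a small sketch, assuming the HTTP::Response module from libwww-perl is installed; the helper name describe is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarize a response using the classification methods.
sub describe {
    my ($res) = @_;
    return 'success' if $res->is_success;   # 2xx codes
    return 'error'   if $res->is_error;     # 4xx and 5xx codes
    return 'other';                         # e.g. 1xx or 3xx codes
}

# Responses built locally from a code and a message.
my $ok      = HTTP::Response->new(200, 'OK');
my $missing = HTTP::Response->new(404, 'Not Found');
my $moved   = HTTP::Response->new(301, 'Moved Permanently');

printf "%s => %s\n", $_->status_line, describe($_) for $ok, $missing, $moved;
```

A redirect such as 301 is neither a success nor an error, which is why the helper needs a third outcome.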
The User Agent (UA)
Let us assume that we have created a request object. What do we
actually do with it in order to receive a response?
The answer is that you pass it to a user agent object and this
object takes care of all the things that need to be done (like
low-level communication and error handling) and returns a
response object. The user agent represents your application on
the network and provides you with an interface that can accept
requests and return responses.
The user agent is an interface layer between your application code
and the network. Through this interface you are able to access
the various servers on the network.
User Agent
The class name for the user agent is LWP::UserAgent.
Every libwww-perl application that wants to
communicate should create at least one object of this
class. The main method provided by this object is
request(). This method takes an HTTP::Request
object as argument and (eventually) returns an
HTTP::Response object.
The user agent has many other attributes that let you
configure how it will interact with the network and
with your application.
The timeout specifies how much time we give remote servers to respond before the library
disconnects and creates an internal timeout response.
The agent specifies the name that your application should use when it presents itself on the
network.
The from attribute can be set to the e-mail address of the person responsible for running the
application. If this is set, then the address will be sent to the servers with every request.
The parse_head specifies whether we should initialize response headers from the <head>
section of HTML documents.
The proxy and no_proxy attributes specify if and when to go through a proxy server.
URL:http://www.w3.org/pub/WWW/Proxies/
The credentials provide a way to set up user names and passwords needed to access certain
services.
Many applications want even more control over how they interact with the network and they get
this by sub-classing LWP::UserAgent. The library includes a sub-class, LWP::RobotUA, for
robot applications.
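The attributes listed above are set through methods of the same names. The sketch below assumes LWP::UserAgent is installed. Every host name, address, realm, user name and password in it is a placeholder, and no request is actually sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

$ua->timeout(10);                      # seconds to wait for a server
$ua->agent('MyApp/0.1');               # name presented on the network
$ua->from('webmaster@example.com');    # placeholder contact address
$ua->parse_head(0);                    # do not read headers from <head>

# Placeholder proxy host; bypass the proxy for local hosts.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');
$ua->no_proxy('localhost');

# Placeholder host:port, realm, user name and password.
$ua->credentials('www.example.com:80', 'Members', 'alice', 'secret');

print "timeout: ", $ua->timeout, "\n";
print "agent:   ", $ua->agent, "\n";
```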
This example shows how the user agent, a request and a response are
represented in actual perl code:
# Create a user agent object
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");

# Create a request
my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
$req->content_type('application/x-www-form-urlencoded');
$req->content('query=libwww-perl&mode=dist');

# Pass request to the user agent and get a response back
my $res = $ua->request($req);

# Check the outcome of the response
if ($res->is_success) {
    print $res->content;
} else {
    print $res->status_line, "\n";
}
The $ua is created once when the application starts up. New request
objects should normally be created for each request sent.
Chapter 1 - Introduction
Web Client
Web client: an application that
communicates with a web server
using the HTTP protocol
Web Client (2)
The most common interface to the WWW is the
browser
A web browser lets you download web
documents and view them formatted
on the screen
URL (Uniform Resource
Locator)
It is a subset of URI (Uniform
Resource Identifier)
HTTP (Hypertext Transfer
Protocol)
Common Gateway Interface
(CGI)
Chapter 2 – Demystifying the
Browser
HTTP Transaction
web program
web client
web server
The HTTP protocol is text-based,
that is, we can see the commands being
exchanged
web transaction
The Request Through the
Browser
http://hypothetical.ora.com/
http://: protocol used
hypothetical.ora.com: server
/: directory on the server
The Client Request
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/3.0Gold (WinNT; I)
Host: hypothetical.ora.com
Accept: image/gif, image/x-xbitmap, */*
The Server Response
Response header:
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:31:51 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 327
Last-modified: Fri, 04 Oct 1996 14:06:11 GMT
Body or entity-body:
<title>
...
</title>
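Because the protocol is text-based, a response like the one above can be taken apart with ordinary string operations. The following is a simplified sketch using only core Perl; parse_response is a hypothetical helper, it ignores multi-line (folded) headers and repeated header names, and the sample body is shortened, so the Content-length value does not match it.

```perl
use strict;
use warnings;

# Split a raw HTTP response into status line parts, headers and body.
# parse_response is a hypothetical helper, not part of any library.
sub parse_response {
    my ($raw) = @_;

    # A blank line separates the header section from the entity-body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my ($status_line, @header_lines) = split /\r?\n/, $head;

    # Status line: HTTP-version, status code, reason phrase.
    my ($version, $code, $reason) = split / /, $status_line, 3;

    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;   # header names are case-insensitive
    }

    return {
        version => $version,
        code    => $code,
        reason  => $reason,
        headers => \%headers,
        body    => defined $body ? $body : '',
    };
}

# Sample response text; the body is a shortened placeholder.
my $raw = join "\r\n",
    'HTTP/1.0 200 OK',
    'Date: Fri, 04 Oct 1996 14:31:51 GMT',
    'Server: Apache/1.1.1',
    'Content-type: text/html',
    'Content-length: 327',
    '',
    '<title>Example</title>';

my $res = parse_response($raw);
print "$res->{code} $res->{reason}\n";
print "type: $res->{headers}{'content-type'}\n";
```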
HTML Transaction
Client
Server
HTML (Hypertext Markup
Language)
Transactions
POST Method
POST /cgi-bin/query HTTP/1.0
Referer:
Connection:
User-Agent:
Host:
Accept:
Content-type: application/x-www-form-urlencoded
Content-length: 47
querytype=subject&queryconst=numerical+analysis
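The body and the Content-length of the POST above can be reproduced with a few lines of core Perl. This is a sketch only: form_escape and form_body are hypothetical helpers, and the escaping assumes byte-oriented (ASCII or already encoded) input.

```perl
use strict;
use warnings;

# Encode one value for application/x-www-form-urlencoded:
# spaces become '+', other unsafe bytes become %XX.
sub form_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~ -])/sprintf('%%%02X', ord $1)/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Join name/value pairs (passed as a flat list to preserve their order).
sub form_body {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @parts, form_escape($name) . '=' . form_escape($value);
    }
    return join '&', @parts;
}

my $body = form_body(querytype => 'subject', queryconst => 'numerical analysis');
print "Content-length: ", length($body), "\n\n", $body, "\n";
```

The resulting body is 47 bytes long, which matches the Content-length header shown in the transaction.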
Types of Request
Methods
GET
POST
PUT Method
PUT /example.html HTTP/1.0
Connection:
User-Agent:
Pragma:
Host:
Accept:
Content-Length:
<!
</HTML>
Structure of an HTTP
Transaction
Client Request
Method URI HTTP-version
General-header
Request-header
Entity-header
Entity-body
Server Response
HTTP-version Status-code Reason-phrase
General-header
Response-header
Entity-header
Entity-body
Structure of a Client
Request
Structure of a Server
Response
Chapter 3 – Learning HTTP
HTTP is a stateless protocol in which
the client makes a request to the
server, the server sends a response,
and the transaction is then finished
Client Request
Methods
The client request method is a
"command" or request that the
web client issues to the server
Methods: GET, POST, HEAD, DELETE,
TRACE, PUT
GET: Retrieving a Document
HEAD: Retrieving Header
Information
POST: Sending Data to the
Server
PUT: Storing the Entity-Body at the URL
DELETE: Removing the URL
TRACE: View the Client’s Message
Through the Request Chain
HTTP Versions
HTTP 1.0
HTTP 1.1
better implementation of persistent connections
Multihoming (allows a single host to
answer for several different domains)
entity tags
byte ranges – allow only parts of a
document to be retrieved
digest authentication
Server Response Codes
Range of values    Meaning of the response
100-199            Informational
200-299            Client request was successful
300-399            Client request was redirected; further action is necessary
400-499            Client request is incomplete
500-599            Server errors
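A program usually only needs the class of a status code, not its exact value. The sketch below maps a code to the classes in the table above; status_class is a hypothetical helper written in core Perl (libwww-perl offers its own classification methods such as is_success and is_error).

```perl
use strict;
use warnings;

# Map a numeric status code to the class given in the table above.
# status_class is a hypothetical helper for illustration.
sub status_class {
    my ($code) = @_;
    return 'informational' if $code >= 100 && $code <= 199;
    return 'success'       if $code >= 200 && $code <= 299;
    return 'redirection'   if $code >= 300 && $code <= 399;
    return 'client error'  if $code >= 400 && $code <= 499;
    return 'server error'  if $code >= 500 && $code <= 599;
    return 'unknown';
}

print "$_ => ", status_class($_), "\n" for 200, 301, 404, 500;
```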
HTTP Headers
Different types of headers
General headers
Request headers
Response headers
Entity Headers
Persistent Connections
Connection: Keep-Alive
Media Types
Accept header
Content-Type
Examples:
Accept: */*
Accept: type/*
Accept: type/subtype
Client Caching
Retrieving the Content
Length
Content-length header
Byte Ranges
Referring Documents
Referer header
Client and Server
Identification
Authorization
An Authorization header is used to
request restricted documents
Authorization: SCHEME REALM
Example:
Authorization: BASIC
username:password,
where username:password is encoded in
base64
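The base64 step can be illustrated with the core MIME::Base64 module. This is a minimal sketch; basic_auth_header is a hypothetical helper, and the user name and password are the well-known sample pair from the HTTP authentication specification, not real credentials.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build a BASIC Authorization header line from a user name and password.
# basic_auth_header is a hypothetical helper for illustration.
sub basic_auth_header {
    my ($user, $password) = @_;
    # The empty second argument suppresses the newline that
    # encode_base64 would otherwise append.
    return 'Authorization: Basic ' . encode_base64("$user:$password", '');
}

print basic_auth_header('Aladdin', 'open sesame'), "\n";
```

Base64 is an encoding, not encryption: anyone who sees the header can decode the user name and password.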
Authentication
The realm of the BASIC authentication
scheme indicates the type of
authentication requested
See also Digest authentication
(available in HTTP 1.1)
Cookies
Set-Cookie and Cookie headers
Chapter 4 – The Socket
Library
The socket library is a low-level
programmer’s interface that allows
clients to set up a TCP/IP connection and
communicate directly with servers.
Servers use sockets to listen for
incoming connections, and clients use
sockets to initiate transactions on the
port that the server is listening to.
A Typical Conversation
Using Sockets
Socket Calls
Client Routines    Server Routines
socket()           socket()
connect()          bind()
                   listen()
                   accept()
syswrite()         sysread()
sysread()          syswrite()
close()            close()
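The sequence of calls above can be walked through in a single process over the loopback interface. The sketch below uses only the core Socket module; loopback_exchange is a hypothetical helper, the server side simply echoes what it receives, and the single-process layout (no fork, one small message per direction) is an assumption made to keep the example short.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP INADDR_LOOPBACK
              sockaddr_in unpack_sockaddr_in);

# Run one request/response exchange between a client socket and a
# server socket in the same process, and return the client's reply.
sub loopback_exchange {
    my ($request) = @_;

    # Server: socket(), bind() to any free port on 127.0.0.1, listen()
    socket(my $server, PF_INET, SOCK_STREAM, IPPROTO_TCP) or die "socket: $!";
    bind($server, sockaddr_in(0, INADDR_LOOPBACK))        or die "bind: $!";
    listen($server, 1)                                    or die "listen: $!";
    my ($port) = unpack_sockaddr_in(getsockname($server));

    # Client: socket(), connect() to the port the server listens on
    socket(my $client, PF_INET, SOCK_STREAM, IPPROTO_TCP)  or die "socket: $!";
    connect($client, sockaddr_in($port, INADDR_LOOPBACK))  or die "connect: $!";

    # Server: accept() the pending connection
    accept(my $conn, $server) or die "accept: $!";

    # Client syswrite() is matched by a server sysread()
    defined syswrite($client, $request)        or die "syswrite: $!";
    defined sysread($conn, my $received, 1024) or die "sysread: $!";

    # Server syswrite() is matched by a client sysread()
    defined syswrite($conn, "echo: $received") or die "syswrite: $!";
    defined sysread($client, my $reply, 1024)  or die "sysread: $!";

    # Both sides close()
    close($_) for $client, $conn, $server;
    return $reply;
}

print loopback_exchange("GET / HTTP/1.0\r\n\r\n");
```

A real server would normally call accept() in a loop and handle each connection separately; here one exchange is enough to show the order of the calls.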
Using Socket Calls
Function    Usage    Purpose
socket()
connect()
sysread()
syswrite()
close()
bind()
listen()
accept()
Chapter 5 – The LWP Library
The Web works over the TCP/IP
protocol, where the client and the server
establish a connection and exchange the
necessary information over that
connection
Appendix A – HTTP
Headers
There are four categories of headers:
• General
• Request
• Response
• Entity
Summary of Support Across
HTTP Versions
HTTP 0.9
HTTP 1.0
HTTP 1.1
Appendix B – Reference
Tables
Media Types
Character Encoding
Languages
Character Sets
Media Types
Content-type header
Accept header
Internet Media Types
Text Type/Subtype
text/plain
text/richtext
text/enriched
text/tab-separated-values
text/html
text/sgml
Multipart Type/Subtype
Message Type/Subtype
Application Type/Subtype
Character Encoding
Content-type of application/x-www-form-urlencoded
special characters are encoded
to eliminate ambiguity
See RFC 1738
(http://www.faqs.org/rfcs/rfc1738.html)
Languages
A language tag has the form:
<primary-tag> <-subtag>
where zero or more subtags are allowed
See RFC 1766 for more information
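The tag format can be checked and split with a short regular expression. This is a sketch in core Perl: parse_language_tag is a hypothetical helper, and it assumes the RFC 1766 rule that the primary tag and each subtag consist of one to eight letters.

```perl
use strict;
use warnings;

# Split a language tag into its primary tag and zero or more subtags.
# Returns undef when the tag does not have the expected form.
# parse_language_tag is a hypothetical helper for illustration.
sub parse_language_tag {
    my ($tag) = @_;
    return undef unless $tag =~ /^[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*$/;

    # Tags are case-insensitive, so normalize to lower case.
    my ($primary, @subtags) = split /-/, lc $tag;
    return { primary => $primary, subtags => \@subtags };
}

for my $tag ('en', 'en-US', 'en_US') {
    my $parsed = parse_language_tag($tag);
    if ($parsed) {
        print "$tag: primary=$parsed->{primary} subtags=@{ $parsed->{subtags} }\n";
    } else {
        print "$tag: not a valid language tag\n";
    }
}
```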
Character Sets
Accept-language
Content-language
See RFC 1700
(http://www.faqs.org/rfcs/rfc1700.html)
Bibliography
[1] WONG, C. Web Client Programming
with Perl. 1st Edition, March 1997.
O’Reilly
[2] URL:http://www.w3.org/pub/WWW/Protocols/
Glossary
IANA – Internet Assigned Numbers
Authority
CGI – Common Gateway Interface
Backup Slides
HTTP is stateless
HTTP is a stateless protocol: there is no
permanent connection between the server
and the client (browser), so the server does
not know whether a subsequent connection
is related to the previous one
HTTP Protocol
HTTP Request
HTTP Response
Body of an HTTP request
Cookies
Cookies are pieces of information stored on
the user's computer that are
optionally sent by the browser with each
request, processed by the server,
and received back in the
response
Web Container