HTTP Protocol

Let us start with this quote from the HTTP specification document [2]:

    The HTTP protocol is based on a request/response paradigm. A client establishes a connection with a server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible body content.

HTTP Protocol (2)

What this means to libwww-perl is that communication always takes place through these steps:

First, a request object is created and configured.
This object is then passed to a server, and we get a response object in return that we can examine.

A request is always independent of any previous requests, i.e. the service is stateless. The same simple model is used for any kind of service we want to access.

Examples

1) If we want to fetch a document from a remote file server, then we send it a request that contains a name for that document, and the response will contain the document itself.
2) If we access a search engine, then the content of the request will contain the query parameters, and the response will contain the query result.
3) If we want to send a mail message to somebody, then we send a request object which contains our message to the mail server, and the response object will contain an acknowledgment that tells us that the message has been accepted and will be forwarded to the recipient(s).

The Request Object

The libwww-perl request object has the class name HTTP::Request. The fact that the class name uses HTTP:: as a prefix only implies that we use the HTTP model of communication. It does not limit the kind of services we can try to pass this request to.
For instance, we will send HTTP::Requests both to ftp and gopher servers, as well as to the local file system.

The main attributes of the request objects are:

The method is a short string that tells what kind of request this is. The most common methods are GET, PUT, POST and HEAD.
The uri is a string denoting the protocol, server and the name of the "document" we want to access. The uri might also encode various other parameters.
The headers contain additional information about the request and can also be used to describe the content. The headers are a set of keyword/value pairs.
The content is an arbitrary amount of data.

The Response Object

The libwww-perl response object has the class name HTTP::Response. The main attributes of objects of this class are:

The code is a numerical value that indicates the overall outcome of the request.
The message is a short, human-readable string that corresponds to the code.
The headers contain additional information about the response and describe the content.
The content is an arbitrary amount of data.

Since we don't want to handle all possible code values directly in our programs, a libwww-perl response object has methods that can be used to query what kind of response this is. The most commonly used response classification methods are:

is_success()
    The request was successfully received, understood or accepted.
is_error()
    The request failed. The server or the resource might not be available, access to the resource might be denied, or other things might have failed for some reason.

The User Agent (UA)

Let us assume that we have created a request object. What do we actually do with it in order to receive a response? The answer is that you pass it to a user agent object, and this object takes care of all the things that need to be done (like low-level communication and error handling) and returns a response object.
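The request and response attributes described above can be inspected without any network traffic by building both objects by hand. The following is a minimal sketch, assuming the HTTP::Request and HTTP::Response modules from libwww-perl are installed; the URL, header value, and status code are illustrative choices and do not come from the slides.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Build a request by hand: method, uri, headers, content.
# The URL and form data are illustrative assumptions.
my $req = HTTP::Request->new(POST => 'http://www.example.com/search');
$req->header('Content-Type' => 'application/x-www-form-urlencoded');
$req->content('query=libwww-perl');

print "method:  ", $req->method, "\n";
print "uri:     ", $req->uri, "\n";
print "header:  ", $req->header('Content-Type'), "\n";
print "content: ", $req->content, "\n";

# Build a response by hand (normally the user agent returns one)
# to show the code, the message, and the classification methods.
my $res = HTTP::Response->new(404, 'Not Found');

print "code:    ", $res->code, "\n";
print "message: ", $res->message, "\n";
print "success: ", ($res->is_success ? "yes" : "no"), "\n";
print "error:   ", ($res->is_error   ? "yes" : "no"), "\n";
```

In a real program the response object would come back from the user agent rather than being constructed directly; the hand-built one here only serves to show how the classification methods relate to the code.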
The user agent represents your application on the network and provides you with an interface that can accept requests and return responses. The user agent is an interface layer between your application code and the network. Through this interface you are able to access the various servers on the network.

User Agent

The class name for the user agent is LWP::UserAgent. Every libwww-perl application that wants to communicate should create at least one object of this class. The main method provided by this object is request(). This method takes an HTTP::Request object as argument and (eventually) returns an HTTP::Response object.

The user agent has many other attributes that let you configure how it will interact with the network and with your application.

The timeout specifies how much time we give remote servers to respond before the library disconnects and creates an internal timeout response.
The agent specifies the name that your application should use when it presents itself on the network.
The from attribute can be set to the e-mail address of the person responsible for running the application. If this is set, then the address will be sent to the servers with every request.
The parse_head specifies whether we should initialize response headers from the <head> section of HTML documents.
The proxy and no_proxy attributes specify if and when to go through a proxy server. URL:http://www.w3.org/pub/WWW/Proxies/
The credentials provide a way to set up user names and passwords needed to access certain services.

Many applications want even more control over how they interact with the network, and they get this by subclassing LWP::UserAgent.
The library includes a subclass, LWP::RobotUA, for robot applications.

This example shows how the user agent, a request and a response are represented in actual Perl code:

    # Create a user agent object
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");

    # Create a request
    my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        print $res->content;
    }
    else {
        print $res->status_line, "\n";
    }

The $ua is created once when the application starts up. New request objects should normally be created for each request sent.

Chapter 1 - Introduction

Web Client

A Web client is an application that communicates with a Web server using the HTTP protocol.

Web Client (2)

The most common interface to the WWW is the browser.
A web browser lets you download web documents and view them formatted on the screen.
URL (Uniform Resource Locator): a subset of the URI (Uniform Resource Identifier)
HTTP (Hypertext Transfer Protocol)
Common Gateway Interface (CGI)

Chapter 2 - Demystifying the Browser

HTTP Transaction

web program
web client
web server
The HTTP protocol is text-based, that is, we can see the commands being exchanged.
web transaction

The request as entered in the browser

    http://hypothetical.ora.com/

    http://                 protocol used
    hypothetical.ora.com    server
    /                       directory on the server

The client request

    GET / HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/3.0Gold (WinNT; I)
    Host: hypothetical.ora.com
    Accept: image/gif, image/x-xbitmap, */*

The server response

response header

    HTTP/1.0 200 OK
    Date: Fri, 04 Oct 1996 14:31:51 GMT
    Server: Apache/1.1.1
    Content-type: text/html
    Content-length: 327
    Last-modified: Fri, 04 Oct 1996 14:06:11 GMT

body or entity-body

    <title> ... </title>

HTML Transaction

Client
Server
HTML (Hypertext Markup Language)

Transactions

POST method

    POST /cgi-bin/query HTTP/1.0
    Referer:
    Connection:
    User-Agent:
    Host:
    Accept:
    Content-type: application/x-www-form-urlencoded
    Content-length: 47

    querytype=subject&queryconst=numerical+analysis

Types of request methods

GET
POST

PUT method

    PUT /example.html HTTP/1.0
    Connection:
    User-Agent:
    Pragma:
    Host:
    Accept:
    Content-Length:

    <! </HTML>

Structure of an HTTP transaction

Client request

    Method URI HTTP-version
    General-header
    Request-header
    Entity-header

    Entity-body

Server response

    HTTP-version Status-code Reason-phrase
    General-header
    Response-header
    Entity-header

    Entity-body

Structure of a client request

Structure of a server response

Chapter 3 - Learning HTTP

HTTP is a stateless protocol in which the client makes a request to the server, the server sends a response, and the transaction is then finished.

Client Request Methods

The client request method is a "command" or request that the web client issues to the server.

Methods: GET, POST, HEAD, DELETE, TRACE, PUT

GET: Getting a document
HEAD: Getting the header information
POST: Sending data to the server
PUT: Storing the entity-body at the URL
DELETE: Removing the URL
TRACE: Viewing the client's message through the request chain

HTTP Versions

HTTP 1.0
HTTP 1.1
    better implementation of persistent connections
    multihoming (allows a single host to answer for several different domains)
    entity tags
    byte ranges (allow only parts of a document to be retrieved)
    digest authentication

Server Response Codes

    Range of values    Meaning of the response
    100-199            Informational
    200-299            Client request was successful
    300-399            Client request was redirected; further action is necessary
    400-499            Client request is incomplete
    500-599            Server errors

HTTP Headers

Different types of headers

General headers
Request headers
Response headers
Entity headers

Persistent Connections

Connection: Keep-Alive

Media Types

Accept header
Content-Type

Examples:
    Accept: */*
    Accept: type/*
    Accept: type/subtype

Client Caching

Getting the Content Length

Content-length header

Byte Ranges

Referring Documents

Referer header

Client and Server Identification

Authorization

An Authorization header is used to request restricted documents.

    Authorization: SCHEME REALM

Example: Authorization: BASIC username:password, where username:password is encoded in base64.

Authentication

The realm of the BASIC authentication scheme indicates the type of authentication requested.
See also digest authentication (available in HTTP 1.1).

Cookies

Set-Cookie and Cookie headers

Chapter 4 - The Socket Library

The socket library is a low-level programmer's interface that allows clients to set up a TCP/IP connection and communicate directly with servers. Servers use sockets to listen for incoming connections, and clients use sockets to initiate transactions on the port that the server is listening to.
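The listen-and-connect pattern described above can be sketched with Perl's built-in socket calls. This is a minimal illustration, not taken from the slides: to stay self-contained it plays both roles in one process over the loopback address, asks the kernel for a free port, and exchanges a hard-coded request line and status line instead of talking to a real web server.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Server side: socket(), bind(), listen().
socket(my $listener, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
# Port 0 asks the kernel to pick any free port on 127.0.0.1.
bind($listener, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($listener, 5) or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($listener));

# Client side: socket(), connect() to the port the server listens on.
socket(my $client, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
connect($client, pack_sockaddr_in($port, INADDR_LOOPBACK))
    or die "connect: $!";

# Server side: accept() the pending connection.
accept(my $conn, $listener) or die "accept: $!";

# Client writes a request; server reads it.
my $request  = "GET / HTTP/1.0\r\n\r\n";
my $received = '';
defined syswrite($client, $request)     or die "syswrite: $!";
defined sysread($conn, $received, 1024) or die "sysread: $!";

# Server writes a reply; client reads it.
my $reply  = "HTTP/1.0 200 OK\r\n\r\n";
my $answer = '';
defined syswrite($conn, $reply)         or die "syswrite: $!";
defined sysread($client, $answer, 1024) or die "sysread: $!";

print "server received: $received";
print "client received: $answer";

# Both sides close() their sockets.
close($_) for $conn, $client, $listener;
```

A real client would only run the client half and connect to a remote host and port, and it would loop on sysread() until the server closes the connection, since a single read is not guaranteed to return a whole response.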
A Typical Conversation Using Sockets

Socket Calls

    Client routines    Server routines
    socket()           socket()
    connect()          bind()
                       listen()
                       accept()
    syswrite()         sysread()
    sysread()          syswrite()
    close()            close()

Using Socket Calls

    Function       Usage    Purpose
    socket()
    connect()
    sysread()
    syswrite()
    close()
    bind()
    listen()
    accept()

Chapter 5 - The LWP Library

The Web runs over the TCP/IP protocol: the client and the server establish a connection and exchange the necessary information through that connection.

Appendix A - HTTP Headers

There are four categories of headers:

General
Request
Response
Entity

Summary of Support Across HTTP Versions

HTTP 0.9
HTTP 1.0
HTTP 1.1

Appendix B - Reference Tables

Media Types
Character Encoding
Languages
Character Sets

Media Types

Content-type header
Accept header

Internet Media Types

Text Type/Subtype
    text/plain
    text/richtext
    text/enriched
    text/tab-separated-values
    text/html
    text/sgml
Multipart Type/Subtype
Message Type/Subtype
Application Type/Subtype

Character Encoding

Content-type of application/x-www-form-urlencoded
Special characters are encoded to eliminate ambiguity.
See RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html)

Languages

A language tag has the form <primary-tag><-subtag>, where zero or more subtags are allowed.
See RFC 1766 for more information.

Character Sets

Accept-Language
Content-Language
See RFC 1700 (http://www.faqs.org/rfcs/rfc1700.html)

Bibliography

[1] WONG, C. Web Client Programming with Perl. 1st Edition, March 1997. O'Reilly.
[2] URL:http://www.w3.org/pub/WWW/Protocols/

Glossary

IANA - Internet Assigned Numbers Authority
CGI - Common Gateway Interface

Backup Slides

HTTP is stateless

HTTP is a stateless protocol.
There is no permanent connection between the server and the client (browser).
Therefore the server does not know whether a later connection is related to an earlier one.

HTTP Protocol

HTTP request
HTTP response
Body of an HTTP request

Cookies

Cookies are pieces of information stored on the user's computer that the browser optionally sends with each request; they are processed by the server and received back in the response.

Web Container