     ======================================================
     urllib -- simple interface for network resource access
     ======================================================
     
     .. module:: urllib
         :synopsis: Accessing remote resources that don't need authentication, cookies, etc.
     
     :Purpose: Accessing remote resources that don't need authentication, cookies, etc.
     :Python Version: 1.4 and later
     
     The :mod:`urllib` module provides a simple interface for network
     resource access. Although :mod:`urllib` also supports gopher and
     FTP URLs, these examples all use HTTP.
     
     HTTP GET
     ========
     
     .. note::
     
         The test server for these examples is in BaseHTTPServer_GET.py, from the
         PyMOTW examples for :mod:`BaseHTTPServer`. Start the server in one
         terminal window, then run these examples in another.
     
     An HTTP GET operation is the simplest use of :mod:`urllib`. Pass the
     URL to :func:`urlopen()` to get a "file-like" handle to the remote
     data.
     
     .. include:: urllib_urlopen.py
         :literal:
         :start-after: #end_pymotw_header
     
     
     The example server takes the incoming values and formats a plain-text
     response to send back. The object returned by :func:`urlopen()` gives
     access to the headers from the HTTP server through its :meth:`info()`
     method, and to the data for the remote resource through methods such
     as :meth:`read()` and :meth:`readlines()`.
     
     ::
     
         $ python urllib_urlopen.py
         RESPONSE: <addinfourl at 10180248 whose fp = <socket._fileobject object at 0x935c30>>
         URL     : http://localhost:8080/
         DATE    : Sun, 30 Mar 2008 16:27:10 GMT
         HEADERS :
         ---------
         Server: BaseHTTP/0.3 Python/2.5.1
         Date: Sun, 30 Mar 2008 16:27:10 GMT
     
         LENGTH  : 221
         DATA    :
         ---------
         CLIENT VALUES:
         client_address=('127.0.0.1', 54354) (localhost)
         command=GET
         path=/
         real path=/
         query=
         request_version=HTTP/1.0
     
         SERVER VALUES:
         server_version=BaseHTTP/0.3
         sys_version=Python/2.5.1
         protocol_version=HTTP/1.0
     
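To try the same file-like interface without the test server running, the sketch below opens a local file through a ``file:`` URL instead. It is an illustration under stated assumptions, not the article's example: it writes a throwaway ``.txt`` file, builds the URL with :func:`pathname2url()`, and imports from :mod:`urllib` as this article does, falling back to ``urllib.request`` under Python 3, where these functions moved. The headers it reads are the ones :mod:`urllib` synthesizes for local files, not ones sent by an HTTP server.

```python
import os
import tempfile

try:  # Python 2, as in this article
    from urllib import urlopen, pathname2url
except ImportError:  # Python 3 moved these to urllib.request
    from urllib.request import urlopen, pathname2url

# A small local file stands in for the remote resource.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(b'line one\nline two\n')
    local_path = tmp.name

url = 'file:' + pathname2url(local_path)
response = urlopen(url)
try:
    headers = response.info()                  # header mapping, like HEADERS above
    content_type = headers['Content-Type']     # header lookups ignore case
    content_length = int(headers['Content-Length'])
    data = response.read()                     # the whole body as a byte string
finally:
    response.close()
    os.remove(local_path)

print('URL    : %s' % url)
print('TYPE   : %s' % content_type)
print('LENGTH : %d' % content_length)
print('DATA   : %r' % data)
```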
     
     The file-like object is also iterable:
     
     .. include:: urllib_urlopen_iterator.py
         :literal:
         :start-after: #end_pymotw_header
     
     
     Since the lines are returned with newlines and carriage returns intact, this
     example strips them before printing the output.
     
     ::
     
         $ python urllib_urlopen_iterator.py
         CLIENT VALUES:
         client_address=('127.0.0.1', 54380) (localhost)
         command=GET
         path=/
         real path=/
         query=
         request_version=HTTP/1.0
     
         SERVER VALUES:
         server_version=BaseHTTP/0.3
         sys_version=Python/2.5.1
         protocol_version=HTTP/1.0
     
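Iteration can also be tried without the server. This minimal sketch assumes a local ``file:`` URL as a stand-in for the remote resource, and imports from :mod:`urllib` with a fallback to ``urllib.request`` for Python 3. It writes two lines ending in ``\r\n`` and strips both terminator characters from each line as the loop reads it, mirroring the cleanup described above.

```python
import os
import tempfile

try:  # Python 2, as in this article
    from urllib import urlopen, pathname2url
except ImportError:  # Python 3
    from urllib.request import urlopen, pathname2url

# CRLF line endings mimic a server that sends them.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(b'CLIENT VALUES:\r\ncommand=GET\r\n')
    local_path = tmp.name

response = urlopen('file:' + pathname2url(local_path))
try:
    # Each iteration yields one raw line (bytes) with its terminator intact.
    lines = [line.rstrip(b'\r\n').decode('ascii') for line in response]
finally:
    response.close()
    os.remove(local_path)

for line in lines:
    print(line)
```
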
     .. _urllib-urlencode:
     
     Encoding Arguments
     ==================
     
     Arguments can be passed to the server by encoding them with
     :func:`urlencode()` and appending them to the URL.
     
     .. include:: urllib_urlencode.py
         :literal:
         :start-after: #end_pymotw_header
     
     Notice that the query, in the list of client values, contains the encoded
     query arguments.
     
     ::
     
         $ python urllib_urlencode.py
         Encoded: q=query+string&foo=bar
         CLIENT VALUES:
         client_address=('127.0.0.1', 54415) (localhost)
         command=GET
         path=/?q=query+string&foo=bar
         real path=/
         query=q=query+string&foo=bar
         request_version=HTTP/1.0
     
         SERVER VALUES:
         server_version=BaseHTTP/0.3
         sys_version=Python/2.5.1
         protocol_version=HTTP/1.0
     
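The encoding step on its own can be sketched without contacting the server. This sketch assumes the GET test server listens at ``http://localhost:8080/`` (the URL in the output above) and only builds the request URL; passing ``url`` to :func:`urlopen()` would send it. It imports from :mod:`urllib`, falling back to ``urllib.parse`` under Python 3.

```python
try:  # Python 2, as in this article
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

base_url = 'http://localhost:8080/'   # assumed address of the GET test server
# A list of pairs, unlike a Python 2 dict, keeps the arguments in a fixed order.
query_args = [('q', 'query string'), ('foo', 'bar')]
encoded_args = urlencode(query_args)  # spaces become '+', pairs joined with '&'
url = base_url + '?' + encoded_args   # pass this to urlopen() to send the GET

print('Encoded: %s' % encoded_args)
print('URL    : %s' % url)
```
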
     To pass a sequence of values using separate occurrences of the
     variable in the query string, set *doseq* to True when calling
     :func:`urlencode()`.
     
     .. include:: urllib_urlencode_doseq.py
         :literal:
         :start-after: #end_pymotw_header
     
     ::
     
         $ python urllib_urlencode_doseq.py
         Single  : foo=%5B%27foo1%27%2C+%27foo2%27%5D
         Sequence: foo=foo1&foo=foo2
     
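A short sketch of the difference, using hypothetical arguments that mix a plain string with a list. Under *doseq* the list expands into repeated ``foo=`` pairs, while the plain string is still encoded as a single value rather than being split into characters. It imports from :mod:`urllib`, falling back to ``urllib.parse`` under Python 3.

```python
try:  # Python 2, as in this article
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

# Hypothetical arguments: one plain string and one list of values.
query_args = [('q', 'query string'), ('foo', ['foo1', 'foo2'])]

single = urlencode(query_args)                # the list is passed through str()
sequence = urlencode(query_args, doseq=True)  # one foo=... pair per list item

print('Single  : %s' % single)
print('Sequence: %s' % sequence)
```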
     
     To decode the query string, see the :class:`FieldStorage` class from
     the :mod:`cgi` module.
     
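:class:`FieldStorage` reads its input from a CGI request. To check a query string outside a CGI script, one swap-in is :func:`parse_qs()`, which is not part of :mod:`urllib` and is not covered in this article. It lives in :mod:`urlparse` from Python 2.6 on and in ``urllib.parse`` in Python 3. The sketch below decodes the repeated-argument string produced with *doseq*.

```python
try:  # Python 2.6 and later
    from urlparse import parse_qs
except ImportError:  # Python 3
    from urllib.parse import parse_qs

query = 'q=query+string&foo=foo1&foo=foo2'
decoded = parse_qs(query)   # '+' becomes a space; every value comes back as a list
print(decoded)
```

Returning lists for every key keeps repeated arguments such as ``foo`` intact, at the cost of indexing ``[0]`` for single values.
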
     Special characters within the query arguments that might cause parse
     problems with the URL on the server side are "quoted" when passed to
     :func:`urlencode()`. To make safe versions of strings locally, call
     the :func:`quote()` or :func:`quote_plus()` functions directly.
     
     .. include:: urllib_quote.py
         :literal:
         :start-after: #end_pymotw_header
     
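     Notice that :func:`quote_plus()` is more aggressive about the
     characters it replaces: :func:`quote()` leaves ``/`` unescaped by
     default, while :func:`quote_plus()` escapes it too and converts
     spaces to ``+``.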
     
     .. {{{cog
     .. cog.out(run_script(cog.inFile, 'urllib_quote.py'))
     .. }}}
     
     ::
     
     	$ python urllib_quote.py
     
     	urlencode() : url=http%3A%2F%2Flocalhost%3A8080%2F%7Edhellmann%2F
     	quote()     : http%3A//localhost%3A8080/%7Edhellmann/
     	quote_plus(): http%3A%2F%2Flocalhost%3A8080%2F%7Edhellmann%2F
     
     .. {{{end}}}
     
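The *safe* parameter controls which characters :func:`quote()` leaves alone; its default is ``'/'``. This sketch uses a hypothetical value containing a space, a slash, and an ampersand to compare the default, ``safe=''``, and :func:`quote_plus()`. It imports from :mod:`urllib`, falling back to ``urllib.parse`` under Python 3.

```python
try:  # Python 2, as in this article
    from urllib import quote, quote_plus
except ImportError:  # Python 3
    from urllib.parse import quote, quote_plus

value = 'a b/c&d'   # hypothetical input: a space, a slash and an ampersand

default_quote = quote(value)          # keeps '/', space becomes %20
strict_quote = quote(value, safe='')  # escapes '/' as well
plus_quote = quote_plus(value)        # escapes '/' and turns spaces into '+'

print('quote()          : %s' % default_quote)
print("quote(safe='')   : %s" % strict_quote)
print('quote_plus()     : %s' % plus_quote)
```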
     
     To reverse the quote operations, use :func:`unquote()` or
     :func:`unquote_plus()`, as appropriate.
     
     .. include:: urllib_unquote.py
         :literal:
         :start-after: #end_pymotw_header
     
     .. {{{cog
     .. cog.out(run_script(cog.inFile, 'urllib_unquote.py'))
     .. }}}
     
     ::
     
     	$ python urllib_unquote.py
     
     	http://localhost:8080/~dhellmann/
     	http://localhost:8080/~dhellmann/
     
     .. {{{end}}}
     
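The pairing matters when ``+`` appears in the encoded text. In this sketch a hypothetical value survives a round trip through :func:`quote_plus()` and :func:`unquote_plus()`, but plain :func:`unquote()` decodes the ``%2F`` and leaves each ``+`` in place. It imports from :mod:`urllib`, falling back to ``urllib.parse`` under Python 3.

```python
try:  # Python 2, as in this article
    from urllib import quote_plus, unquote, unquote_plus
except ImportError:  # Python 3
    from urllib.parse import quote_plus, unquote, unquote_plus

original = 'query string/with spaces'   # hypothetical value
encoded = quote_plus(original)          # spaces -> '+', '/' -> '%2F'
restored = unquote_plus(encoded)        # matching inverse: exact round trip
half_restored = unquote(encoded)        # decodes %2F but leaves '+' alone

print('encoded      : %s' % encoded)
print('unquote_plus : %s' % restored)
print('unquote      : %s' % half_restored)
```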
     
     HTTP POST
     =========
     
     .. note::
     
         The test server for these examples is in BaseHTTPServer_POST.py, from the
         PyMOTW examples for :mod:`BaseHTTPServer`. Start the server in one
         terminal window, then run these examples in another.
     
     To POST data to the remote server instead of using GET, pass the
     encoded query arguments to :func:`urlopen()` as the *data* argument
     rather than appending them to the URL.
     
     .. include:: urllib_urlopen_post.py
         :literal:
         :start-after: #end_pymotw_header
     
     ::
     
         $ python urllib_urlopen_post.py
         Client: ('127.0.0.1', 54545)
         Path: /
         Form data:
             q=query string
             foo=bar
     
     
     You can send any byte string as *data*, in case the server expects
     something other than URL-encoded form arguments in the request body.
     
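A self-contained sketch of posting a non-form body, under these assumptions: instead of BaseHTTPServer_POST.py it starts a hypothetical echo server (``EchoHandler``) on a free localhost port in a background thread, and ``post_bytes()`` is a hypothetical helper that hides the Python 2 / Python 3 import differences and skips any configured proxy. The JSON payload is sent as raw bytes and read back unchanged. Neither branch changes the ``Content-Type`` header, which both versions label ``application/x-www-form-urlencoded`` when *data* is given; the echo server ignores it.

```python
import threading

try:  # Python 2, as in this article
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
    from urllib import FancyURLopener

    def post_bytes(url, data):
        # proxies={} keeps the request on localhost even if a proxy is set
        return FancyURLopener(proxies={}).open(url, data)
except ImportError:  # Python 3
    from http.server import HTTPServer, BaseHTTPRequestHandler
    from urllib.request import ProxyHandler, build_opener

    def post_bytes(url, data):
        # an empty ProxyHandler keeps the request on localhost
        return build_opener(ProxyHandler({})).open(url, data)


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that sends the raw request body straight back."""
    seen_methods = []

    def do_POST(self):
        self.seen_methods.append(self.command)
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


server = HTTPServer(('127.0.0.1', 0), EchoHandler)  # port 0: any free port
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

payload = b'{"q": "query string", "foo": "bar"}'  # not form-encoded
url = 'http://127.0.0.1:%d/' % server.server_address[1]
response = post_bytes(url, payload)   # non-None data makes this a POST
try:
    echoed = response.read()
finally:
    response.close()
    server.shutdown()
    server.server_close()

print(echoed)
```
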
     Paths vs. URLs
     ==============
     
     Some operating systems use different characters than URLs do to
     separate the components of local file paths. To make your code
     portable, use the functions :func:`pathname2url()` and
     :func:`url2pathname()` to convert back and forth. Since I am working
     on a Mac, I have to explicitly import the Windows versions of the
     functions to show their output. The versions exported by
     :mod:`urllib` use the correct defaults for your platform, so your own
     code does not need to do this.
     
     .. include:: urllib_pathnames.py
         :literal:
         :start-after: #end_pymotw_header
     
     
     There are two Windows examples, with and without a drive letter as
     the prefix of the path.
     
     .. {{{cog
     .. cog.out(run_script(cog.inFile, 'urllib_pathnames.py'))
     .. }}}
     
     ::
     
     	$ python urllib_pathnames.py
     
     	== Default ==
     	Original: /a/b/c
     	URL     : /a/b/c
     	Path    : /d/e/f
     	
     	== Windows, without drive letter ==
     	Original: \a\b\c
     	URL     : /a/b/c
     	Path    : \d\e\f
     	
     	== Windows, with drive letter ==
     	Original: C:\\a\b\c
     	URL     : ///C:/a/b/c
     	Path    : \d\e\f
     
     .. {{{end}}}
     
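Conversion also quotes characters that are not allowed in URLs. This minimal sketch assumes a POSIX system and uses a hypothetical path containing a space. It converts the path to its URL form, prefixes ``file:`` so the result could be passed to :func:`urlopen()`, and converts the URL form back. It imports from :mod:`urllib`, falling back to ``urllib.request`` under Python 3.

```python
try:  # Python 2, as in this article
    from urllib import pathname2url, url2pathname
except ImportError:  # Python 3
    from urllib.request import pathname2url, url2pathname

local_path = '/tmp/my file.txt'      # hypothetical POSIX path with a space
url_path = pathname2url(local_path)  # space quoted; Windows would also swap separators
round_trip = url2pathname(url_path)  # back to the original path
file_url = 'file:' + url_path        # a URL that urlopen() can open

print('Path     : %s' % local_path)
print('URL path : %s' % url_path)
print('File URL : %s' % file_url)
print('Restored : %s' % round_trip)
```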
     
     Simple Retrieval with Cache
     ===========================
     
     Retrieving data is a common operation, and :mod:`urllib` includes the
     :func:`urlretrieve()` function so you don't have to write your
     own. :func:`urlretrieve()` takes arguments for the URL, the name of a
     local file to hold the data, a function to report on download
     progress, and data to pass if the URL refers to a form where data
     should be POSTed. If no filename is given, :func:`urlretrieve()`
     creates a temporary file. You can delete the file yourself, or treat
     it as a cache and use :func:`urlcleanup()` to remove it.
     
     This example uses GET to retrieve some data from a web server:
     
     .. include:: urllib_urlretrieve.py
         :literal:
         :start-after: #end_pymotw_header
     
     
     Since the server does not return a ``Content-length`` header,
     :func:`urlretrieve()` does not know how big the data should be, and
     passes -1 as the *total_size* argument to :func:`reporthook()`.
     
     ::
     
     
         $ python urllib_urlretrieve.py
         Connection opened
         Read 1 blocks
         Read 2 blocks
         Read 3 blocks
         Read 4 blocks
         Read 5 blocks
         Read 6 blocks
         Read 7 blocks
         Read 8 blocks
         Read 9 blocks
         Read 10 blocks
         Read 11 blocks
         Read 12 blocks
         Read 13 blocks
         Read 14 blocks
         Read 15 blocks
         Read 16 blocks
         Read 17 blocks
         Read 18 blocks
         Read 19 blocks
     
         File: /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/tmp3HRpZP
         Headers:
         Content-Type: text/html; charset=UTF-8
         Last-Modified: Tue, 25 Mar 2008 23:09:10 GMT
         Cache-Control: max-age=0 private
         ETag: "904b02e0-c7ff-47f6-9f35-cc6de5d2a2e5"
         Server: GFE/1.3
         Date: Sun, 30 Mar 2008 17:36:48 GMT
         Connection: Close
     
         File exists before cleanup: True
         File still exists: False
     
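The progress hook can be exercised offline too. This sketch assumes a local ``file:`` URL as the source and copies it to an explicit destination with a hypothetical hook, ``report_progress()``, that records each call. Because :mod:`urllib` reports a ``Content-Length`` for local files, *total_size* here is the real size rather than -1 as in the output above. Since an explicit filename is passed, no temporary file is created, so the sketch removes its own files instead of calling :func:`urlcleanup()`. It imports from :mod:`urllib`, falling back to ``urllib.request`` under Python 3.

```python
import os
import tempfile

try:  # Python 2, as in this article
    from urllib import urlretrieve, pathname2url
except ImportError:  # Python 3
    from urllib.request import urlretrieve, pathname2url

progress = []

def report_progress(block_count, block_size, total_size):
    # Called once before the first read, then once after each block is copied.
    progress.append((block_count, block_size, total_size))

payload = b'x' * 20000   # big enough to be copied in more than one block
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(payload)
    source = tmp.name
destination = source + '.copy'

# With an explicit filename, urlretrieve() copies the data and calls the
# hook; without one it just returns the local path for a file: URL.
filename, headers = urlretrieve('file:' + pathname2url(source),
                                destination, report_progress)

with open(filename, 'rb') as copied_file:
    copied = copied_file.read()
os.remove(source)
os.remove(destination)

for entry in progress:
    print('block %d, block size %d, total size %d' % entry)
```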
     
     URLopener
     =========
     
     :mod:`urllib` provides a :class:`URLopener` base class and a
     :class:`FancyURLopener` subclass with default handling for the
     supported protocols. If you find yourself needing to change their
     behavior, you are probably better off looking at the :mod:`urllib2`
     module, added in Python 2.1.
     
     .. seealso::
     
         `urllib <http://docs.python.org/lib/module-urllib.html>`_
             Standard library documentation for this module.
     
         :mod:`urllib2`
             Updated API for working with URL-based services.
         
         :mod:`urlparse`
             Parse URL values to access their components.
         
         
     
