<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>urlparse – Split URL into component pieces. — Python Module of the Week</title> <link rel="stylesheet" href="../_static/sphinxdoc.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../', VERSION: '1.132', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <link rel="author" title="About these documents" href="../about.html" /> <link rel="top" title="Python Module of the Week" href="../index.html" /> <link rel="up" title="Internet Protocols and Support" href="../internet_protocols.html" /> <link rel="next" title="uuid – Universally unique identifiers" href="../uuid/index.html" /> <link rel="prev" title="urllib2 – Library for opening URLs." href="../urllib2/index.html" /> </head> <body> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../genindex.html" title="General Index" accesskey="I">index</a></li> <li class="right" > <a href="../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="../uuid/index.html" title="uuid – Universally unique identifiers" accesskey="N">next</a> |</li> <li class="right" > <a href="../urllib2/index.html" title="urllib2 – Library for opening URLs." accesskey="P">previous</a> |</li> <li><a href="../contents.html">PyMOTW</a> »</li> <li><a href="../internet_protocols.html" accesskey="U">Internet Protocols and Support</a> »</li> </ul> </div> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <h3><a href="../contents.html">Table Of Contents</a></h3> <ul> <li><a class="reference internal" href="#">urlparse – Split URL into component pieces.</a><ul> <li><a class="reference internal" href="#parsing">Parsing</a></li> <li><a class="reference internal" href="#unparsing">Unparsing</a></li> <li><a class="reference internal" href="#joining">Joining</a></li> </ul> </li> </ul> <h4>Previous topic</h4> <p class="topless"><a href="../urllib2/index.html" title="previous chapter">urllib2 – Library for opening URLs.</a></p> <h4>Next topic</h4> <p class="topless"><a href="../uuid/index.html" title="next chapter">uuid – Universally unique identifiers</a></p> <h3>This Page</h3> <ul class="this-page-menu"> <li><a href="../_sources/urlparse/index.txt" rel="nofollow">Show Source</a></li> </ul> <div id="searchbox" style="display: none"> <h3>Quick search</h3> <form class="search" action="../search.html" method="get"> <input type="text" name="q" size="18" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> <p class="searchtip" style="font-size: 90%"> Enter search terms or a module, class or function name. </p> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="module-urlparse"> <span id="urlparse-split-url-into-component-pieces"></span><h1>urlparse – Split URL into component pieces.<a class="headerlink" href="#module-urlparse" title="Permalink to this headline">¶</a></h1> <table class="docutils field-list" frame="void" rules="none"> <col class="field-name" /> <col class="field-body" /> <tbody valign="top"> <tr class="field"><th class="field-name">Purpose:</th><td class="field-body">Split URL into component pieces.</td> </tr> <tr class="field"><th class="field-name">Python Version:</th><td class="field-body">since 1.4</td> </tr> </tbody> </table> <p>The <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-mod docutils literal"><span class="pre">urlparse</span></tt></a> module provides functions for breaking URLs down into their component parts, as defined by the relevant RFCs.</p> <div class="section" id="parsing"> <h2>Parsing<a class="headerlink" href="#parsing" title="Permalink to this headline">¶</a></h2> <p>The return value from the <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-func docutils literal"><span class="pre">urlparse()</span></tt></a> function is an object which acts like a tuple with 6 elements.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="s">'http://netloc/path;parameters?query=argument#fragment'</span><span class="p">)</span> <span class="k">print</span> <span class="n">parsed</span> </pre></div> </div> <p>The parts of the URL available through the tuple interface are the scheme, network location, path, parameters, query, and fragment.</p> <div class="highlight-python"><pre>$ python urlparse_urlparse.py ParseResult(scheme='http', netloc='netloc', path='/path', params='parameters', query='query=argument', fragment='fragment')</pre> </div> <p>Although the return value acts like a tuple, it is really based on a <a class="reference internal" href="../collections/namedtuple.html#collections-namedtuple"><em>namedtuple</em></a>, a subclass of tuple that supports accessing the parts of the URL via named attributes instead of indexes. That’s especially useful if, like me, you can’t remember the index order. In addition to being easier to use for the programmer, the attribute API also offers access to several values not available in the tuple API.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="s">'http://user:pass@NetLoc:80/path;parameters?query=argument#fragment'</span><span class="p">)</span> <span class="k">print</span> <span class="s">'scheme :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">scheme</span> <span class="k">print</span> <span class="s">'netloc :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">netloc</span> <span class="k">print</span> <span class="s">'path :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">path</span> <span class="k">print</span> <span class="s">'params :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">params</span> <span class="k">print</span> <span class="s">'query :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">query</span> <span class="k">print</span> <span class="s">'fragment:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">fragment</span> <span class="k">print</span> <span class="s">'username:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">username</span> <span class="k">print</span> <span class="s">'password:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">password</span> <span class="k">print</span> <span class="s">'hostname:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">hostname</span><span class="p">,</span> <span class="s">'(netloc in lower case)'</span> <span class="k">print</span> <span class="s">'port :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">port</span> </pre></div> </div> <p>The <em>username</em> and <em>password</em> are available when present in the input URL and <tt class="xref docutils literal"><span class="pre">None</span></tt> when not. The <em>hostname</em> is the same value as <em>netloc</em>, in all lower case. And the <em>port</em> is converted to an integer when present and <tt class="xref docutils literal"><span class="pre">None</span></tt> when not.</p> <div class="highlight-python"><pre>$ python urlparse_urlparseattrs.py scheme : http netloc : user:pass@NetLoc:80 path : /path params : parameters query : query=argument fragment: fragment username: user password: pass hostname: netloc (netloc in lower case) port : 80</pre> </div> <p>The <tt class="xref py py-func docutils literal"><span class="pre">urlsplit()</span></tt> function is an alternative to <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-func docutils literal"><span class="pre">urlparse()</span></tt></a>. It behaves a little different, because it does not split the parameters from the URL. This is useful for URLs following <span class="target" id="index-0"></span><a class="rfc reference external" href="http://tools.ietf.org/html/rfc2396.html"><strong>RFC 2396</strong></a>, which supports parameters for each segment of the path.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlsplit</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlsplit</span><span class="p">(</span><span class="s">'http://user:pass@NetLoc:80/path;parameters/path2;parameters2?query=argument#fragment'</span><span class="p">)</span> <span class="k">print</span> <span class="n">parsed</span> <span class="k">print</span> <span class="s">'scheme :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">scheme</span> <span class="k">print</span> <span class="s">'netloc :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">netloc</span> <span class="k">print</span> <span class="s">'path :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">path</span> <span class="k">print</span> <span class="s">'query :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">query</span> <span class="k">print</span> <span class="s">'fragment:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">fragment</span> <span class="k">print</span> <span class="s">'username:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">username</span> <span class="k">print</span> <span class="s">'password:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">password</span> <span class="k">print</span> <span class="s">'hostname:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">hostname</span><span class="p">,</span> <span class="s">'(netloc in lower case)'</span> <span class="k">print</span> <span class="s">'port :'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">port</span> </pre></div> </div> <p>Since the parameters are not split out, the tuple API will show 5 elements instead of 6, and there is no <em>params</em> attribute.</p> <div class="highlight-python"><pre>$ python urlparse_urlsplit.py SplitResult(scheme='http', netloc='user:pass@NetLoc:80', path='/path;parameters/path2;parameters2', query='query=argument', fragment='fragment') scheme : http netloc : user:pass@NetLoc:80 path : /path;parameters/path2;parameters2 query : query=argument fragment: fragment username: user password: pass hostname: netloc (netloc in lower case) port : 80</pre> </div> <p>To simply strip the fragment identifier from a URL, as you might need to do to find a base page name from a URL, use <tt class="xref py py-func docutils literal"><span class="pre">urldefrag()</span></tt>.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urldefrag</span> <span class="n">original</span> <span class="o">=</span> <span class="s">'http://netloc/path;parameters?query=argument#fragment'</span> <span class="k">print</span> <span class="n">original</span> <span class="n">url</span><span class="p">,</span> <span class="n">fragment</span> <span class="o">=</span> <span class="n">urldefrag</span><span class="p">(</span><span class="n">original</span><span class="p">)</span> <span class="k">print</span> <span class="n">url</span> <span class="k">print</span> <span class="n">fragment</span> </pre></div> </div> <p>The return value is a tuple containing the base URL and the fragment.</p> <div class="highlight-python"><pre>$ python urlparse_urldefrag.py http://netloc/path;parameters?query=argument#fragment http://netloc/path;parameters?query=argument fragment</pre> </div> </div> <div class="section" id="unparsing"> <h2>Unparsing<a class="headerlink" href="#unparsing" title="Permalink to this headline">¶</a></h2> <p>There are several ways to assemble a split URL back together into a single string. The parsed URL object has a <tt class="xref py py-func docutils literal"><span class="pre">geturl()</span></tt> method.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span> <span class="n">original</span> <span class="o">=</span> <span class="s">'http://netloc/path;parameters?query=argument#fragment'</span> <span class="k">print</span> <span class="s">'ORIG :'</span><span class="p">,</span> <span class="n">original</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">original</span><span class="p">)</span> <span class="k">print</span> <span class="s">'PARSED:'</span><span class="p">,</span> <span class="n">parsed</span><span class="o">.</span><span class="n">geturl</span><span class="p">()</span> </pre></div> </div> <p><tt class="xref py py-func docutils literal"><span class="pre">geturl()</span></tt> only works on the object returned by <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-func docutils literal"><span class="pre">urlparse()</span></tt></a> or <tt class="xref py py-func docutils literal"><span class="pre">urlsplit()</span></tt>.</p> <div class="highlight-python"><pre>$ python urlparse_geturl.py ORIG : http://netloc/path;parameters?query=argument#fragment PARSED: http://netloc/path;parameters?query=argument#fragment</pre> </div> <p>If you have a regular tuple of values, you can use <tt class="xref py py-func docutils literal"><span class="pre">urlunparse()</span></tt> to combine them into a URL.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span><span class="p">,</span> <span class="n">urlunparse</span> <span class="n">original</span> <span class="o">=</span> <span class="s">'http://netloc/path;parameters?query=argument#fragment'</span> <span class="k">print</span> <span class="s">'ORIG :'</span><span class="p">,</span> <span class="n">original</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">original</span><span class="p">)</span> <span class="k">print</span> <span class="s">'PARSED:'</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">parsed</span><span class="p">),</span> <span class="n">parsed</span> <span class="n">t</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[:]</span> <span class="k">print</span> <span class="s">'TUPLE :'</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">t</span><span class="p">),</span> <span class="n">t</span> <span class="k">print</span> <span class="s">'NEW :'</span><span class="p">,</span> <span class="n">urlunparse</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> </pre></div> </div> <p>While the <tt class="xref py py-class docutils literal"><span class="pre">ParseResult</span></tt> returned by <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-func docutils literal"><span class="pre">urlparse()</span></tt></a> can be used as a tuple, in this example I explicitly create a new tuple to show that <tt class="xref py py-func docutils literal"><span class="pre">urlunparse()</span></tt> works with normal tuples, too.</p> <div class="highlight-python"><pre>$ python urlparse_urlunparse.py ORIG : http://netloc/path;parameters?query=argument#fragment PARSED: <class 'urlparse.ParseResult'> ParseResult(scheme='http', netloc='netloc', path='/path', params='parameters', query='query=argument', fragment='fragment') TUPLE : <type 'tuple'> ('http', 'netloc', '/path', 'parameters', 'query=argument', 'fragment') NEW : http://netloc/path;parameters?query=argument#fragment</pre> </div> <p>If the input URL included superfluous parts, those may be dropped from the unparsed version of the URL.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urlparse</span><span class="p">,</span> <span class="n">urlunparse</span> <span class="n">original</span> <span class="o">=</span> <span class="s">'http://netloc/path;?#'</span> <span class="k">print</span> <span class="s">'ORIG :'</span><span class="p">,</span> <span class="n">original</span> <span class="n">parsed</span> <span class="o">=</span> <span class="n">urlparse</span><span class="p">(</span><span class="n">original</span><span class="p">)</span> <span class="k">print</span> <span class="s">'PARSED:'</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">parsed</span><span class="p">),</span> <span class="n">parsed</span> <span class="n">t</span> <span class="o">=</span> <span class="n">parsed</span><span class="p">[:]</span> <span class="k">print</span> <span class="s">'TUPLE :'</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">t</span><span class="p">),</span> <span class="n">t</span> <span class="k">print</span> <span class="s">'NEW :'</span><span class="p">,</span> <span class="n">urlunparse</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> </pre></div> </div> <p>In this case, the <em>parameters</em>, <em>query</em>, and <em>fragment</em> are all missing in the original URL. The new URL does not look the same as the original, but is equivalent according to the standard.</p> <div class="highlight-python"><pre>$ python urlparse_urlunparseextra.py ORIG : http://netloc/path;?# PARSED: <class 'urlparse.ParseResult'> ParseResult(scheme='http', netloc='netloc', path='/path', params='', query='', fragment='') TUPLE : <type 'tuple'> ('http', 'netloc', '/path', '', '', '') NEW : http://netloc/path</pre> </div> </div> <div class="section" id="joining"> <h2>Joining<a class="headerlink" href="#joining" title="Permalink to this headline">¶</a></h2> <p>In addition to parsing URLs, <a class="reference internal" href="#module-urlparse" title="urlparse: Split URL into component pieces."><tt class="xref py py-mod docutils literal"><span class="pre">urlparse</span></tt></a> includes <tt class="xref py py-func docutils literal"><span class="pre">urljoin()</span></tt> for constructing absolute URLs from relative fragments.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">urlparse</span> <span class="kn">import</span> <span class="n">urljoin</span> <span class="k">print</span> <span class="n">urljoin</span><span class="p">(</span><span class="s">'http://www.example.com/path/file.html'</span><span class="p">,</span> <span class="s">'anotherfile.html'</span><span class="p">)</span> <span class="k">print</span> <span class="n">urljoin</span><span class="p">(</span><span class="s">'http://www.example.com/path/file.html'</span><span class="p">,</span> <span class="s">'../anotherfile.html'</span><span class="p">)</span> </pre></div> </div> <p>In the example, the relative portion of the path (<tt class="docutils literal"><span class="pre">"../"</span></tt>) is taken into account when the second URL is computed.</p> <div class="highlight-python"><pre>$ python urlparse_urljoin.py http://www.example.com/path/anotherfile.html http://www.example.com/anotherfile.html</pre> </div> <div class="admonition-see-also admonition seealso"> <p class="first admonition-title">See also</p> <dl class="last docutils"> <dt><a class="reference external" href="http://docs.python.org/lib/module-urlparse.html">urlparse</a></dt> <dd>Standard library documentation for this module.</dd> <dt><a class="reference internal" href="../urllib/index.html#module-urllib" title="urllib: Accessing remote resources that don't need authentication, cookies, etc."><tt class="xref py py-mod docutils literal"><span class="pre">urllib</span></tt></a></dt> <dd>Retrieve the contents of a resource identified by a URL.</dd> <dt><a class="reference internal" href="../urllib2/index.html#module-urllib2" title="urllib2: Library for opening URLs."><tt class="xref py py-mod docutils literal"><span class="pre">urllib2</span></tt></a></dt> <dd>Alternative API for accessing remote URLs.</dd> </dl> </div> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../genindex.html" title="General Index" >index</a></li> <li class="right" > <a href="../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="../uuid/index.html" title="uuid – Universally unique identifiers" >next</a> |</li> <li class="right" > <a href="../urllib2/index.html" title="urllib2 – Library for opening URLs." >previous</a> |</li> <li><a href="../contents.html">PyMOTW</a> »</li> <li><a href="../internet_protocols.html" >Internet Protocols and Support</a> »</li> </ul> </div> <div class="footer"> © Copyright Doug Hellmann. Last updated on Oct 24, 2010. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a>. <br/><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/us/" rel="license"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png"/></a> </div> </body> </html>