<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>zlib – Low-level access to GNU zlib compression library — Python Module of the Week</title> <link rel="stylesheet" href="../_static/sphinxdoc.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../', VERSION: '1.132', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <link rel="author" title="About these documents" href="../about.html" /> <link rel="top" title="Python Module of the Week" href="../index.html" /> <link rel="up" title="Data Compression and Archiving" href="../compression.html" /> <link rel="next" title="Data Persistence" href="../persistence.html" /> <link rel="prev" title="zipfile – Read and write ZIP archive files" href="../zipfile/index.html" /> </head> <body>
<div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="module-zlib"> <span id="zlib-low-level-access-to-gnu-zlib-compression-library"></span><h1>zlib – Low-level access to GNU zlib compression library<a class="headerlink" href="#module-zlib" title="Permalink to this headline">¶</a></h1> <table class="docutils field-list" frame="void" rules="none"> <col class="field-name" /> <col class="field-body" /> <tbody valign="top"> <tr class="field"><th class="field-name">Purpose:</th><td class="field-body">Low-level access to GNU zlib compression library</td> </tr> <tr class="field"><th class="field-name">Python Version:</th><td class="field-body">2.5 and later</td> </tr> </tbody> </table> <p>The <a class="reference internal" href="#module-zlib" title="zlib: Low-level access to GNU zlib compression library"><tt class="xref py py-mod docutils literal"><span class="pre">zlib</span></tt></a> module provides a lower-level interface to many of the functions in the <a class="reference internal" href="#module-zlib" title="zlib: Low-level access to GNU zlib compression library"><tt class="xref py py-mod docutils literal"><span class="pre">zlib</span></tt></a> compression library from GNU.</p> <div class="section" id="working-with-data-in-memory"> <h2>Working with Data in Memory<a class="headerlink" href="#working-with-data-in-memory" title="Permalink to this headline">¶</a></h2> <p>The simplest way to work with <a class="reference internal" href="#module-zlib" title="zlib: Low-level access to GNU zlib compression library"><tt class="xref py py-mod docutils literal"><span class="pre">zlib</span></tt></a> requires holding all of the data to be compressed or decompressed in memory, and then using <tt class="xref py py-func docutils literal"><span class="pre">compress()</span></tt> and <tt class="xref py py-func docutils 
literal"><span class="pre">decompress()</span></tt>.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">zlib</span>
<span class="kn">import</span> <span class="nn">binascii</span>

<span class="n">original_data</span> <span class="o">=</span> <span class="s">'This is the original text.'</span>
<span class="k">print</span> <span class="s">'Original :'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">original_data</span><span class="p">),</span> <span class="n">original_data</span>

<span class="n">compressed</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">compress</span><span class="p">(</span><span class="n">original_data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">'Compressed :'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">compressed</span><span class="p">),</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="n">compressed</span><span class="p">)</span>

<span class="n">decompressed</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">decompress</span><span class="p">(</span><span class="n">compressed</span><span class="p">)</span>
<span class="k">print</span> <span class="s">'Decompressed :'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">decompressed</span><span class="p">),</span> <span class="n">decompressed</span>
</pre></div> </div> <div class="highlight-python"><pre>$ python zlib_memory.py
Original : 26 This is the original text.
Compressed : 32 789c0bc9c82c5600a2928c5485fca2ccf4ccbcc41c8592d48a123d007f2f097e
Decompressed : 26 This is the original text.</pre> </div> <p>Notice that for short text, the compressed version can be longer.
The exact results depend on the input data, but for very short inputs the fixed overhead of the compressed format outweighs any savings, as the lengths printed by this script show.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">zlib</span>

<span class="n">original_data</span> <span class="o">=</span> <span class="s">'This is the original text.'</span>

<span class="n">fmt</span> <span class="o">=</span> <span class="s">'</span><span class="si">%15s</span><span class="s"> </span><span class="si">%15s</span><span class="s">'</span>
<span class="k">print</span> <span class="n">fmt</span> <span class="o">%</span> <span class="p">(</span><span class="s">'len(data)'</span><span class="p">,</span> <span class="s">'len(compressed)'</span><span class="p">)</span>
<span class="k">print</span> <span class="n">fmt</span> <span class="o">%</span> <span class="p">(</span><span class="s">'-'</span> <span class="o">*</span> <span class="mi">15</span><span class="p">,</span> <span class="s">'-'</span> <span class="o">*</span> <span class="mi">15</span><span class="p">)</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">original_data</span> <span class="o">*</span> <span class="n">i</span>
    <span class="n">compressed</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">compress</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="k">print</span> <span class="n">fmt</span> <span class="o">%</span> <span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">compressed</span><span class="p">)),</span> <span class="s">'*'</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">compressed</span><span class="p">)</span> <span class="k">else</span> <span class="s">''</span>
</pre></div> </div> <div class="highlight-python"><pre>$ python zlib_lengths.py
      len(data) len(compressed)
--------------- ---------------
              0               8 *
             26              32 *
             52              35
             78              35
            104              36
            130              36
            156              36
            182              36
            208              36
            234              36
            260              36
            286              36
            312              37
            338              37
            364              38
            390              38
            416              38
            442              38
            468              38
            494              38</pre> </div> </div> <div class="section" id="working-with-streams"> <h2>Working with Streams<a class="headerlink" href="#working-with-streams" title="Permalink to this headline">¶</a></h2> <p>The in-memory approach requires the complete input and output to be held in memory at once, which makes it impractical for large data sets. The alternative is to use <tt class="xref py py-class docutils literal"><span class="pre">Compress</span></tt> and <tt class="xref py py-class docutils literal"><span class="pre">Decompress</span></tt> objects to manipulate streams of data, so that the entire data set does not have to fit into memory.</p> <p>The simple server below responds to requests consisting of filenames by writing a compressed version of the file to the socket used to communicate with the client. It has some artificial chunking in place to illustrate the buffering behavior that happens when the data passed to <tt class="xref py py-func docutils literal"><span class="pre">compress()</span></tt> or <tt class="xref py py-func docutils literal"><span class="pre">decompress()</span></tt> does not result in a complete block of compressed or uncompressed output.</p> <div class="admonition warning"> <p class="first admonition-title">Warning</p> <p class="last">This server has obvious security implications. 
Do not run it on a system on the open internet or in any environment where security might be an issue.</p> </div> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">zlib</span> <span class="kn">import</span> <span class="nn">logging</span> <span class="kn">import</span> <span class="nn">SocketServer</span> <span class="kn">import</span> <span class="nn">binascii</span> <span class="n">BLOCK_SIZE</span> <span class="o">=</span> <span class="mi">64</span> <span class="k">class</span> <span class="nc">ZlibRequestHandler</span><span class="p">(</span><span class="n">SocketServer</span><span class="o">.</span><span class="n">BaseRequestHandler</span><span class="p">):</span> <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s">'Server'</span><span class="p">)</span> <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">compressor</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">compressobj</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c"># Find out what file the client wants</span> <span class="n">filename</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'client asked for: "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">filename</span><span class="p">)</span> <span class="c"># Send chunks 
of the file as they are compressed</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="nb">input</span><span class="p">:</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="n">block</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">BLOCK_SIZE</span><span class="p">)</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">block</span><span class="p">:</span> <span class="k">break</span> <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'RAW "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">block</span><span class="p">)</span> <span class="n">compressed</span> <span class="o">=</span> <span class="n">compressor</span><span class="o">.</span><span class="n">compress</span><span class="p">(</span><span class="n">block</span><span class="p">)</span> <span class="k">if</span> <span class="n">compressed</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'SENDING "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="n">compressed</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">compressed</span><span 
class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'BUFFERING'</span><span class="p">)</span> <span class="c"># Send any data being buffered by the compressor</span> <span class="n">remaining</span> <span class="o">=</span> <span class="n">compressor</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span> <span class="k">while</span> <span class="n">remaining</span><span class="p">:</span> <span class="n">to_send</span> <span class="o">=</span> <span class="n">remaining</span><span class="p">[:</span><span class="n">BLOCK_SIZE</span><span class="p">]</span> <span class="n">remaining</span> <span class="o">=</span> <span class="n">remaining</span><span class="p">[</span><span class="n">BLOCK_SIZE</span><span class="p">:]</span> <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'FLUSHING "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="n">to_send</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">to_send</span><span class="p">)</span> <span class="k">return</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">'__main__'</span><span class="p">:</span> <span class="kn">import</span> <span class="nn">socket</span> <span class="kn">import</span> <span class="nn">threading</span> <span class="kn">from</span> <span class="nn">cStringIO</span> <span class="kn">import</span> 
<span class="n">StringIO</span> <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="s">'</span><span class="si">%(name)s</span><span class="s">: </span><span class="si">%(message)s</span><span class="s">'</span><span class="p">,</span> <span class="p">)</span> <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="s">'Client'</span><span class="p">)</span> <span class="c"># Set up a server, running in a separate thread</span> <span class="n">address</span> <span class="o">=</span> <span class="p">(</span><span class="s">'localhost'</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="c"># let the kernel give us a port</span> <span class="n">server</span> <span class="o">=</span> <span class="n">SocketServer</span><span class="o">.</span><span class="n">TCPServer</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="n">ZlibRequestHandler</span><span class="p">)</span> <span class="n">ip</span><span class="p">,</span> <span class="n">port</span> <span class="o">=</span> <span class="n">server</span><span class="o">.</span><span class="n">server_address</span> <span class="c"># find out what port we were given</span> <span class="n">t</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">server</span><span class="o">.</span><span class="n">serve_forever</span><span class="p">)</span> <span class="n">t</span><span 
class="o">.</span><span class="n">setDaemon</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <span class="n">t</span><span class="o">.</span><span class="n">start</span><span class="p">()</span> <span class="c"># Connect to the server</span> <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s">'Contacting server on </span><span class="si">%s</span><span class="s">:</span><span class="si">%s</span><span class="s">'</span><span class="p">,</span> <span class="n">ip</span><span class="p">,</span> <span class="n">port</span><span class="p">)</span> <span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span> <span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="n">ip</span><span class="p">,</span> <span class="n">port</span><span class="p">))</span> <span class="c"># Ask for a file</span> <span class="n">requested_file</span> <span class="o">=</span> <span class="s">'lorem.txt'</span> <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'sending filename: "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">requested_file</span><span class="p">)</span> <span class="n">len_sent</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">requested_file</span><span class="p">)</span> <span class="c"># Receive a response</span> <span class="nb">buffer</span> <span class="o">=</span> <span 
class="n">StringIO</span><span class="p">()</span> <span class="n">decompressor</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">decompressobj</span><span class="p">()</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="n">response</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">BLOCK_SIZE</span><span class="p">)</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">response</span><span class="p">:</span> <span class="k">break</span> <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'READ "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="n">response</span><span class="p">))</span> <span class="c"># Include any unconsumed data when feeding the decompressor.</span> <span class="n">to_decompress</span> <span class="o">=</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">unconsumed_tail</span> <span class="o">+</span> <span class="n">response</span> <span class="k">while</span> <span class="n">to_decompress</span><span class="p">:</span> <span class="n">decompressed</span> <span class="o">=</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">decompress</span><span class="p">(</span><span class="n">to_decompress</span><span class="p">)</span> <span class="k">if</span> <span class="n">decompressed</span><span class="p">:</span> <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'DECOMPRESSED "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span 
class="n">decompressed</span><span class="p">)</span> <span class="nb">buffer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">decompressed</span><span class="p">)</span> <span class="c"># Look for unconsumed data due to buffer overflow</span> <span class="n">to_decompress</span> <span class="o">=</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">unconsumed_tail</span> <span class="k">else</span><span class="p">:</span> <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'BUFFERING'</span><span class="p">)</span> <span class="n">to_decompress</span> <span class="o">=</span> <span class="bp">None</span> <span class="c"># deal with data remaining inside the decompressor buffer</span> <span class="n">remainder</span> <span class="o">=</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span> <span class="k">if</span> <span class="n">remainder</span><span class="p">:</span> <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'FLUSHED "</span><span class="si">%s</span><span class="s">"'</span><span class="p">,</span> <span class="n">remainder</span><span class="p">)</span> <span class="nb">buffer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">remainder</span><span class="p">)</span> <span class="n">full_response</span> <span class="o">=</span> <span class="nb">buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span> <span class="n">lorem</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'lorem.txt'</span><span class="p">,</span> <span class="s">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> 
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s">'response matches file contents: </span><span class="si">%s</span><span class="s">'</span><span class="p">,</span> <span class="n">full_response</span> <span class="o">==</span> <span class="n">lorem</span><span class="p">)</span> <span class="c"># Clean up</span> <span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="n">server</span><span class="o">.</span><span class="n">socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> </pre></div> </div> <div class="highlight-python"><pre>$ python zlib_server.py Client: Contacting server on 127.0.0.1:54429 Client: sending filename: "lorem.txt" Server: client asked for: "lorem.txt" Server: RAW "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec " Server: SENDING "7801" Server: RAW "egestas, enim et consectetuer ullamcorper, lectus ligula rutrum " Server: BUFFERING Server: RAW "leo, a elementum elit tortor eu quam. Duis tincidunt nisi ut ant" Server: BUFFERING Server: RAW "e. Nulla facilisi. Sed tristique eros eu libero. Pellentesque ve" Server: BUFFERING Server: RAW "l arcu. Vivamus purus orci, iaculis ac, suscipit sit amet, pulvi" Server: BUFFERING Server: RAW "nar eu, lacus. Praesent placerat tortor sed nisl. Nunc blandit d" Server: BUFFERING Server: RAW "iam egestas dui. Pellentesque habitant morbi tristique senectus " Server: BUFFERING Server: RAW "et netus et malesuada fames ac turpis egestas. Aliquam viverra f" Server: BUFFERING Server: RAW "ringilla leo. Nulla feugiat augue eleifend nulla. Vivamus mauris" Server: BUFFERING Server: RAW ". Vivamus sed mauris in nibh placerat egestas. Suspendisse poten" Server: BUFFERING Server: RAW "ti. Mauris massa. Ut eget velit auctor tortor blandit sollicitud" Server: BUFFERING Server: RAW "in. Suspendisse imperdiet justo. 
" Server: BUFFERING Server: FLUSHING "5592418edb300c45f73e050f60f80e05ba6c8b0245bb676426c382923c22e9f3f70bc94c1ac00b9b963eff7fe4b73ea4921e9e95f66e7d906b105789954a6f2e" Server: FLUSHING "25245206f1ae877ad17623318d8dbef62665919b78b0af244d2b49bc5e4a33aea58f43c64a06ad7432bda5318d8c819e267d255ec4a44a0b14a638451f784892" Server: FLUSHING "de932b7aa53a85b6a27bb6a0a6ae94b0d94236fa31bb2c572e6aa86ff44b768aa11efa9e4232ba4f21d30b5e37fa2966e8243e7f9e62c4a3e4467ff4e49abe1c" Server: FLUSHING "39e0b18fa22b299784247159c913d90f587be239d24e6d3c6dae8be1ac437db038e4e94041067f467198826d9b765ba18b71dba1b62b23f29de1b227dcbff87b" Server: FLUSHING "e38b065252ede3a2ffa5428f3b4d106f181022c652d9c49377a62b06387d53e4c0d43e3a6cf4c500052d4f3d650c1c1c18a84e7e18c403255d256f0aeb9cb709" Server: FLUSHING "d044afd2607f72fe24459513909fdf480807b346da90f5f2f684f04888d9a41fd05277a1a3074821f2f7fbadcaeed0ff1d73a962ce666e6296b9098f85f8c0e6" Server: FLUSHING "dd4c8b46eeda5e45b562d776058dbfe9d1b7e51f6f370ea5" Client: READ "78015592418edb300c45f73e050f60f80e05ba6c8b0245bb676426c382923c22e9f3f70bc94c1ac00b9b963eff7fe4b73ea4921e9e95f66e7d906b105789954a" Client: DECOMPRESSED "Lorem ipsum dolor sit amet, c" Client: READ "6f2e25245206f1ae877ad17623318d8dbef62665919b78b0af244d2b49bc5e4a33aea58f43c64a06ad7432bda5318d8c819e267d255ec4a44a0b14a638451f78" Client: DECOMPRESSED "onsectetuer adipiscing elit. Donec egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo, a elementum elit tor" Client: READ "4892de932b7aa53a85b6a27bb6a0a6ae94b0d94236fa31bb2c572e6aa86ff44b768aa11efa9e4232ba4f21d30b5e37fa2966e8243e7f9e62c4a3e4467ff4e49a" Client: DECOMPRESSED "tor eu quam. Duis tincidunt nisi ut ante. Nulla facilisi. Sed tristique eros eu libero. Pellentesque vel arcu. Vivamu" Client: READ "be1c39e0b18fa22b299784247159c913d90f587be239d24e6d3c6dae8be1ac437db038e4e94041067f467198826d9b765ba18b71dba1b62b23f29de1b227dcbf" Client: DECOMPRESSED "s purus orci, iaculis ac, suscipit sit amet, pulvinar eu, lacus. 
Praesent placerat tortor sed nisl. Nunc blandit diam egestas dui. " Client: READ "f87be38b065252ede3a2ffa5428f3b4d106f181022c652d9c49377a62b06387d53e4c0d43e3a6cf4c500052d4f3d650c1c1c18a84e7e18c403255d256f0aeb9c" Client: DECOMPRESSED "Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Aliquam viverra fringilla leo. Nulla feugiat au" Client: READ "b709d044afd2607f72fe24459513909fdf480807b346da90f5f2f684f04888d9a41fd05277a1a3074821f2f7fbadcaeed0ff1d73a962ce666e6296b9098f85f8" Client: DECOMPRESSED "gue eleifend nulla. Vivamus mauris. Vivamus sed mauris in nibh placerat egestas. Suspendisse potenti. Mauris massa. Ut eget velit auctor tortor " Client: READ "c0e6dd4c8b46eeda5e45b562d776058dbfe9d1b7e51f6f370ea5" Client: DECOMPRESSED "blandit sollicitudin. Suspendisse imperdiet justo. " Client: response matches file contents: True</pre> </div> </div> <div class="section" id="mixed-content-streams"> <h2>Mixed Content Streams<a class="headerlink" href="#mixed-content-streams" title="Permalink to this headline">¶</a></h2> <p>The <tt class="xref py py-class docutils literal"><span class="pre">Decompress</span></tt> class returned by <tt class="xref py py-func docutils literal"><span class="pre">decompressobj()</span></tt> can also be used in situations where compressed and uncompressed data is mixed together. 
Once the end of the compressed data has been reached, the <em>unused_data</em> attribute contains whatever input was left over.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">zlib</span>

<span class="n">lorem</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'lorem.txt'</span><span class="p">,</span> <span class="s">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">compressed</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">compress</span><span class="p">(</span><span class="n">lorem</span><span class="p">)</span>
<span class="n">combined</span> <span class="o">=</span> <span class="n">compressed</span> <span class="o">+</span> <span class="n">lorem</span>

<span class="n">decompressor</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">decompressobj</span><span class="p">()</span>
<span class="n">decompressed</span> <span class="o">=</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">decompress</span><span class="p">(</span><span class="n">combined</span><span class="p">)</span>

<span class="k">print</span> <span class="s">'Decompressed matches lorem:'</span><span class="p">,</span> <span class="n">decompressed</span> <span class="o">==</span> <span class="n">lorem</span>
<span class="k">print</span> <span class="s">'Unused data matches lorem :'</span><span class="p">,</span> <span class="n">decompressor</span><span class="o">.</span><span class="n">unused_data</span> <span class="o">==</span> <span class="n">lorem</span>
</pre></div> </div> <div class="highlight-python"><pre>$ python zlib_mixed.py
Decompressed matches lorem: True
Unused data matches lorem : True</pre> </div> </div> <div class="section" id="checksums"> <h2>Checksums<a class="headerlink" href="#checksums" 
title="Permalink to this headline">¶</a></h2> <p>In addition to compression and decompression functions, <a class="reference internal" href="#module-zlib" title="zlib: Low-level access to GNU zlib compression library"><tt class="xref py py-mod docutils literal"><span class="pre">zlib</span></tt></a> includes two functions for computing checksums of data, <tt class="xref py py-func docutils literal"><span class="pre">adler32()</span></tt> and <tt class="xref py py-func docutils literal"><span class="pre">crc32()</span></tt>. Neither checksum is cryptographically secure; both are intended only for verifying data integrity.</p> <p>Both functions take the same arguments: a string of data and an optional value to use as the starting point for the checksum. Each returns a 32-bit signed integer, which can be passed back on a subsequent call as the starting-point argument to produce a <em>running</em> checksum.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">zlib</span>

<span class="n">data</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'lorem.txt'</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>

<span class="n">cksum</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">adler32</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">'Adler32: </span><span class="si">%12d</span><span class="s">'</span> <span class="o">%</span> <span class="n">cksum</span>
<span class="k">print</span> <span class="s">'       : </span><span class="si">%12d</span><span class="s">'</span> <span class="o">%</span> <span class="n">zlib</span><span class="o">.</span><span class="n">adler32</span><span class="p">(</span><span
class="n">data</span><span class="p">,</span> <span class="n">cksum</span><span class="p">)</span>

<span class="n">cksum</span> <span class="o">=</span> <span class="n">zlib</span><span class="o">.</span><span class="n">crc32</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="k">print</span> <span class="s">'CRC-32 : </span><span class="si">%12d</span><span class="s">'</span> <span class="o">%</span> <span class="n">cksum</span>
<span class="k">print</span> <span class="s">'       : </span><span class="si">%12d</span><span class="s">'</span> <span class="o">%</span> <span class="n">zlib</span><span class="o">.</span><span class="n">crc32</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">cksum</span><span class="p">)</span>
</pre></div> </div> <div class="highlight-python"><pre>$ python zlib_checksums.py
Adler32:   1865879205
       :    118955337
CRC-32 :   1878123957
       :  -1940264325</pre> </div> <p>The Adler32 algorithm is said to be faster than a standard CRC, but my own tests gave mixed results: it was slower when each checksum was computed separately and faster only when updating a running checksum.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">timeit</span>

<span class="n">iterations</span> <span class="o">=</span> <span class="mi">1000</span>

<span class="k">def</span> <span class="nf">show_results</span><span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">iterations</span><span class="p">):</span>
    <span class="s">"Print results in terms of microseconds per pass."</span>
    <span class="n">per_pass</span> <span class="o">=</span> <span class="mi">1000000</span> <span class="o">*</span> <span class="p">(</span><span class="n">result</span> <span class="o">/</span> <span class="n">iterations</span><span class="p">)</span>
    <span class="k">print</span> <span class="s">'</span><span class="si">%s</span><span
class="s">:</span><span class="se">\t</span><span class="si">%.2f</span><span class="s"> usec/pass'</span> <span class="o">%</span> <span class="p">(</span><span class="n">title</span><span class="p">,</span> <span class="n">per_pass</span><span class="p">)</span>

<span class="n">adler32</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span>
    <span class="n">stmt</span><span class="o">=</span><span class="s">"zlib.adler32(data)"</span><span class="p">,</span>
    <span class="n">setup</span><span class="o">=</span><span class="s">"import zlib; data=open('lorem.txt','r').read() * 10"</span><span class="p">,</span>
    <span class="p">)</span>
<span class="n">show_results</span><span class="p">(</span><span class="s">'Adler32, separate'</span><span class="p">,</span> <span class="n">adler32</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">iterations</span><span class="p">),</span> <span class="n">iterations</span><span class="p">)</span>

<span class="n">adler32_running</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span>
    <span class="n">stmt</span><span class="o">=</span><span class="s">"cksum = zlib.adler32(data, cksum)"</span><span class="p">,</span>
    <span class="n">setup</span><span class="o">=</span><span class="s">"import zlib; data=open('lorem.txt','r').read() * 10; cksum = zlib.adler32(data)"</span><span class="p">,</span>
    <span class="p">)</span>
<span class="n">show_results</span><span class="p">(</span><span class="s">'Adler32, running'</span><span class="p">,</span> <span class="n">adler32_running</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">iterations</span><span class="p">),</span> <span class="n">iterations</span><span class="p">)</span>

<span class="n">crc32</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span>
    <span class="n">stmt</span><span class="o">=</span><span class="s">"zlib.crc32(data)"</span><span class="p">,</span>
    <span class="n">setup</span><span class="o">=</span><span class="s">"import zlib; data=open('lorem.txt','r').read() * 10"</span><span class="p">,</span>
    <span class="p">)</span>
<span class="n">show_results</span><span class="p">(</span><span class="s">'CRC-32, separate'</span><span class="p">,</span> <span class="n">crc32</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">iterations</span><span class="p">),</span> <span class="n">iterations</span><span class="p">)</span>

<span class="n">crc32_running</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span>
    <span class="n">stmt</span><span class="o">=</span><span class="s">"cksum = zlib.crc32(data, cksum)"</span><span class="p">,</span>
    <span class="n">setup</span><span class="o">=</span><span class="s">"import zlib; data=open('lorem.txt','r').read() * 10; cksum = zlib.crc32(data)"</span><span class="p">,</span>
    <span class="p">)</span>
<span class="n">show_results</span><span class="p">(</span><span class="s">'CRC-32, running'</span><span class="p">,</span> <span class="n">crc32_running</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">iterations</span><span class="p">),</span> <span class="n">iterations</span><span class="p">)</span>
</pre></div> </div> <div class="highlight-python"><pre>$ python zlib_checksum_tests.py
Adler32, separate: 37.38 usec/pass
Adler32, running: 7.05 usec/pass
CRC-32, separate: 10.19 usec/pass
CRC-32, running: 10.33 usec/pass</pre> </div> <div class="admonition-see-also admonition seealso"> <p class="first admonition-title">See also</p> <dl class="last docutils"> <dt><a class="reference
external" href="http://docs.python.org/library/zlib.html">zlib</a></dt> <dd>The standard library documentation for this module.</dd> <dt><a class="reference internal" href="../gzip/index.html#module-gzip" title="gzip: Read and write gzip files"><tt class="xref py py-mod docutils literal"><span class="pre">gzip</span></tt></a></dt> <dd>The gzip module includes a higher level (file-based) interface to the zlib library.</dd> <dt><a class="reference external" href="http://www.zlib.net/">http://www.zlib.net/</a></dt> <dd>Home page for zlib library.</dd> <dt><a class="reference external" href="http://www.zlib.net/manual.html">http://www.zlib.net/manual.html</a></dt> <dd>Complete zlib documentation.</dd> <dt><a class="reference internal" href="../bz2/index.html#module-bz2" title="bz2: bzip2 compression"><tt class="xref py py-mod docutils literal"><span class="pre">bz2</span></tt></a></dt> <dd>The bz2 module provides a similar interface to the bzip2 compression library.</dd> </dl> </div> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="footer"> © Copyright Doug Hellmann. Last updated on Oct 24, 2010. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a>.
<br/><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/us/" rel="license"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png"/></a> </div> </body> </html>