[code.view]

[top] / python / PyMOTW / docs / csv / index.html


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>csv – Comma-separated value files &mdash; Python Module of the Week</title>
    <link rel="stylesheet" href="../_static/sphinxdoc.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../',
        VERSION:     '1.132',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="top" title="Python Module of the Week" href="../index.html" />
    <link rel="up" title="File Formats" href="../file_formats.html" />
    <link rel="next" title="ConfigParser – Work with configuration files" href="../ConfigParser/index.html" />
    <link rel="prev" title="File Formats" href="../file_formats.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="../ConfigParser/index.html" title="ConfigParser – Work with configuration files"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="../file_formats.html" title="File Formats"
             accesskey="P">previous</a> |</li>
        <li><a href="../contents.html">PyMOTW</a> &raquo;</li>
          <li><a href="../file_formats.html" accesskey="U">File Formats</a> &raquo;</li> 
      </ul>
    </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../contents.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">csv &#8211; Comma-separated value files</a><ul>
<li><a class="reference internal" href="#reading">Reading</a></li>
<li><a class="reference internal" href="#writing">Writing</a><ul>
<li><a class="reference internal" href="#quoting">Quoting</a></li>
</ul>
</li>
<li><a class="reference internal" href="#dialects">Dialects</a><ul>
<li><a class="reference internal" href="#creating-a-dialect">Creating a Dialect</a></li>
<li><a class="reference internal" href="#dialect-parameters">Dialect Parameters</a></li>
<li><a class="reference internal" href="#automatically-detecting-dialects">Automatically Detecting Dialects</a></li>
</ul>
</li>
<li><a class="reference internal" href="#using-field-names">Using Field Names</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="../file_formats.html"
                        title="previous chapter">File Formats</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="../ConfigParser/index.html"
                        title="next chapter">ConfigParser &#8211; Work with configuration files</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="../_sources/csv/index.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
<div id="searchbox" style="display: none">
  <h3>Quick search</h3>
    <form class="search" action="../search.html" method="get">
      <input type="text" name="q" size="18" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="module-csv">
<span id="csv-comma-separated-value-files"></span><h1>csv &#8211; Comma-separated value files<a class="headerlink" href="#module-csv" title="Permalink to this headline">¶</a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field"><th class="field-name">Purpose:</th><td class="field-body">Read and write comma separated value files.</td>
</tr>
<tr class="field"><th class="field-name">Python Version:</th><td class="field-body">2.3 and later</td>
</tr>
</tbody>
</table>
<p>The <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> module is useful for working with data exported from
spreadsheets and databases into text files formatted with fields and
records, commonly referred to as <em>comma-separated value</em> (CSV) format
because commas are often used to separate the fields in a record.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The Python 2.5 version of <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> does not support Unicode
data. There are also &#8220;issues with ASCII NUL characters&#8221;. Using UTF-8
or printable ASCII is recommended.</p>
</div>
<div class="section" id="reading">
<h2>Reading<a class="headerlink" href="#reading" title="Permalink to this headline">¶</a></h2>
<p>Use <tt class="xref py py-func docutils literal"><span class="pre">reader()</span></tt> to create a an object for reading data from a CSV
file.  The reader can be used as an iterator to process the rows of
the file in order. For example:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;rt&#39;</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="k">print</span> <span class="n">row</span>
<span class="k">finally</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
</div>
<p>The first argument to <tt class="xref py py-func docutils literal"><span class="pre">reader()</span></tt> is the source of text lines. In
this case, it is a file, but any iterable is accepted (<a class="reference internal" href="../StringIO/index.html#module-StringIO" title="StringIO: Work with text buffers using file-like API"><tt class="xref py py-mod docutils literal"><span class="pre">StringIO</span></tt></a>
instances, lists, etc.).  Other optional arguments can be given to
control how the input data is parsed.</p>
<p>This example file was exported from <a class="reference external" href="http://www.neooffice.org/">NeoOffice</a>.</p>
<div class="highlight-python"><pre>"Title 1","Title 2","Title 3"
1,"a",08/18/07
2,"b",08/19/07
3,"c",08/20/07
4,"d",08/21/07
5,"e",08/22/07
6,"f",08/23/07
7,"g",08/24/07
8,"h",08/25/07
9,"i",08/26/07
</pre>
</div>
<p>As it is read, each row of the input data is parsed and converted to a
list of strings.</p>
<div class="highlight-python"><pre>$ python csv_reader.py testdata.csv

['Title 1', 'Title 2', 'Title 3']
['1', 'a', '08/18/07']
['2', 'b', '08/19/07']
['3', 'c', '08/20/07']
['4', 'd', '08/21/07']
['5', 'e', '08/22/07']
['6', 'f', '08/23/07']
['7', 'g', '08/24/07']
['8', 'h', '08/25/07']
['9', 'i', '08/26/07']</pre>
</div>
<p>The parser handles line breaks embedded within strings in a row, which
is why a &#8220;row&#8221; is not always the same as a &#8220;line&#8221; of input from the
file.</p>
<div class="highlight-python"><pre>"Title 1","Title 2","Title 3"
1,"first line
second line",08/18/07
</pre>
</div>
<p>Values with line breaks in the input retain the internal line breaks
when returned by the parser.</p>
<div class="highlight-python"><pre>$ python csv_reader.py testlinebreak.csv

['Title 1', 'Title 2', 'Title 3']
['1', 'first line\nsecond line', '08/18/07']</pre>
</div>
</div>
<div class="section" id="writing">
<h2>Writing<a class="headerlink" href="#writing" title="Permalink to this headline">¶</a></h2>
<p>Writing CSV files is just as easy as reading them. Use <tt class="xref py py-func docutils literal"><span class="pre">writer()</span></tt>
to create an object for writing, then iterate over the rows, using
<tt class="xref py py-func docutils literal"><span class="pre">writerow()</span></tt> to print them.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;wt&#39;</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="s">&#39;Title 1&#39;</span><span class="p">,</span> <span class="s">&#39;Title 2&#39;</span><span class="p">,</span> <span class="s">&#39;Title 3&#39;</span><span class="p">)</span> <span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
        <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">&#39;a&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">),</span> <span class="s">&#39;08/</span><span class="si">%02d</span><span class="s">/07&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span> <span class="p">)</span>
<span class="k">finally</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>

<span class="k">print</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;rt&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</pre></div>
</div>
<p>The output does not look exactly like the exported data used in the reader
example:</p>
<div class="highlight-python"><pre>$ python csv_writer.py testout.csv

Title 1,Title 2,Title 3
1,a,08/01/07
2,b,08/02/07
3,c,08/03/07
4,d,08/04/07
5,e,08/05/07
6,f,08/06/07
7,g,08/07/07
8,h,08/08/07
9,i,08/09/07
10,j,08/10/07</pre>
</div>
<p>The default quoting behavior is different for the writer, so the string column
is not quoted. That is easy to change by adding a quoting argument to quote
non-numeric values:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONNUMERIC</span><span class="p">)</span>
</pre></div>
</div>
<p>And now the strings are quoted:</p>
<div class="highlight-python"><pre>$ python csv_writer_quoted.py testout_quoted.csv

"Title 1","Title 2","Title 3"
1,"a","08/01/07"
2,"b","08/02/07"
3,"c","08/03/07"
4,"d","08/04/07"
5,"e","08/05/07"
6,"f","08/06/07"
7,"g","08/07/07"
8,"h","08/08/07"
9,"i","08/09/07"
10,"j","08/10/07"</pre>
</div>
<div class="section" id="quoting">
<span id="csv-quoting"></span><h3>Quoting<a class="headerlink" href="#quoting" title="Permalink to this headline">¶</a></h3>
<p>There are four different quoting options, defined as constants in the csv
module.</p>
<dl class="docutils">
<dt>QUOTE_ALL</dt>
<dd>Quote everything, regardless of type.</dd>
<dt>QUOTE_MINIMAL</dt>
<dd>Quote fields with special characters (anything that would confuse a parser
configured with the same dialect and options). This is the default</dd>
<dt>QUOTE_NONNUMERIC</dt>
<dd>Quote all fields that are not integers or floats. When used with the reader,
input fields that are not quoted are converted to floats.</dd>
<dt>QUOTE_NONE</dt>
<dd>Do not quote anything on output. When used with the reader, quote characters
are included in the field values (normally, they are treated as delimiters and
stripped).</dd>
</dl>
</div>
</div>
<div class="section" id="dialects">
<h2>Dialects<a class="headerlink" href="#dialects" title="Permalink to this headline">¶</a></h2>
<p>There is no well-defined standard for comma-separated value files, so
the parser needs to be flexible.  This flexibility means there are
many parameters to control how <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> parses or writes data.
Rather than passing each of these parameters to the reader and writer
separately, they are grouped together conveniently into a <em>dialect</em>
object.</p>
<p>Dialect classes can be registered by name, so that callers of the csv
module do not need to know the parameter settings in advance.  The
complete list of registered dialects can be retrieved with
<tt class="xref py py-func docutils literal"><span class="pre">list_dialects()</span></tt>.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>

<span class="k">print</span> <span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()</span>
</pre></div>
</div>
<p>The standard library includes two dialects: <tt class="docutils literal"><span class="pre">excel</span></tt>, and
<tt class="docutils literal"><span class="pre">excel-tabs</span></tt>. The <tt class="docutils literal"><span class="pre">excel</span></tt> dialect is for working with data in the
default export format for Microsoft Excel, and also works with
OpenOffice or NeoOffice.</p>
<div class="highlight-python"><pre>$ python csv_list_dialects.py

['excel-tab', 'excel']</pre>
</div>
<div class="section" id="creating-a-dialect">
<h3>Creating a Dialect<a class="headerlink" href="#creating-a-dialect" title="Permalink to this headline">¶</a></h3>
<p>Suppose instead of using commas to delimit fields, the input file uses
<tt class="docutils literal"><span class="pre">|</span></tt>, like this:</p>
<div class="highlight-python"><pre>"Title 1"|"Title 2"|"Title 3"
1|"first line
second line"|08/18/07
</pre>
</div>
<p>A new dialect can be registered using the appropriate delimiter:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>

<span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">&#39;pipes&#39;</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">&#39;|&#39;</span><span class="p">)</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">&#39;testdata.pipes&#39;</span><span class="p">,</span> <span class="s">&#39;r&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="s">&#39;pipes&#39;</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="k">print</span> <span class="n">row</span>
</pre></div>
</div>
<p>and the file can be read just as with the comma-delimited file:</p>
<div class="highlight-python"><pre>$ python csv_dialect.py

['Title 1', 'Title 2', 'Title 3']
['1', 'first line\nsecond line', '08/18/07']</pre>
</div>
</div>
<div class="section" id="dialect-parameters">
<h3>Dialect Parameters<a class="headerlink" href="#dialect-parameters" title="Permalink to this headline">¶</a></h3>
<p>A dialect specifies all of the tokens used when parsing or writing a
data file.  Every aspect of the file format can be specified, from the
way columns are delimited to the character used to escape a token.</p>
<table border="1" class="docutils">
<colgroup>
<col width="15%" />
<col width="21%" />
<col width="64%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Attribute</th>
<th class="head">Default</th>
<th class="head">Meaning</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>delimiter</td>
<td><tt class="docutils literal"><span class="pre">,</span></tt></td>
<td>Field separator (one character)</td>
</tr>
<tr><td>doublequote</td>
<td>True</td>
<td>Flag controlling whether quotechar instances are doubled</td>
</tr>
<tr><td>escapechar</td>
<td>None</td>
<td>Character used to indicate an escape sequence</td>
</tr>
<tr><td>lineterminator</td>
<td><tt class="docutils literal"><span class="pre">\r\n</span></tt></td>
<td>String used by writer to terminate a line</td>
</tr>
<tr><td>quotechar</td>
<td><tt class="docutils literal"><span class="pre">&quot;</span></tt></td>
<td>String to surround fields containing special values (one character)</td>
</tr>
<tr><td>quoting</td>
<td><tt class="xref py py-const docutils literal"><span class="pre">QUOTE_MINIMAL</span></tt></td>
<td>Controls quoting behavior described above</td>
</tr>
<tr><td>skipinitialspace</td>
<td>False</td>
<td>Ignore whitespace after the field delimiter</td>
</tr>
</tbody>
</table>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">&#39;escaped&#39;</span><span class="p">,</span> <span class="n">escapechar</span><span class="o">=</span><span class="s">&#39;</span><span class="se">\\</span><span class="s">&#39;</span><span class="p">,</span> <span class="n">doublequote</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONE</span><span class="p">)</span>
<span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">&#39;singlequote&#39;</span><span class="p">,</span> <span class="n">quotechar</span><span class="o">=</span><span class="s">&quot;&#39;&quot;</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_ALL</span><span class="p">)</span>

<span class="n">quoting_modes</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span> <span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="n">csv</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">csv</span><span class="p">)</span> <span class="k">if</span> <span class="n">n</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">&#39;QUOTE_&#39;</span><span class="p">)</span> <span class="p">)</span>

<span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()):</span>
    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">Dialect: &quot;</span><span class="si">%s</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&#39;</span> <span class="o">%</span> <span class="n">name</span>
    <span class="n">dialect</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">get_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>

    <span class="k">print</span> <span class="s">&#39;  delimiter   = </span><span class="si">%-6r</span><span class="s">    skipinitialspace = </span><span class="si">%r</span><span class="s">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">,</span>
                                                             <span class="n">dialect</span><span class="o">.</span><span class="n">skipinitialspace</span><span class="p">)</span>
    <span class="k">print</span> <span class="s">&#39;  doublequote = </span><span class="si">%-6r</span><span class="s">    quoting          = </span><span class="si">%s</span><span class="s">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">doublequote</span><span class="p">,</span>
                                                             <span class="n">quoting_modes</span><span class="p">[</span><span class="n">dialect</span><span class="o">.</span><span class="n">quoting</span><span class="p">])</span>
    <span class="k">print</span> <span class="s">&#39;  quotechar   = </span><span class="si">%-6r</span><span class="s">    lineterminator   = </span><span class="si">%r</span><span class="s">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">quotechar</span><span class="p">,</span>
                                                             <span class="n">dialect</span><span class="o">.</span><span class="n">lineterminator</span><span class="p">)</span>
    <span class="k">print</span> <span class="s">&#39;  escapechar  = </span><span class="si">%-6r</span><span class="s">&#39;</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">escapechar</span>
    <span class="k">print</span>
    
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
        <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span>
            <span class="p">(</span><span class="s">&#39;col1&#39;</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">&#39;10/</span><span class="si">%02d</span><span class="s">/2010&#39;</span> <span class="o">%</span> <span class="n">i</span><span class="p">,</span>
             <span class="s">&#39;Contains special chars: &quot; </span><span class="se">\&#39;</span><span class="s"> </span><span class="si">%s</span><span class="s"> to be parsed&#39;</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">)</span>
            <span class="p">)</span>
    <span class="k">print</span>
</pre></div>
</div>
<p>This program shows how the same data appears in several different dialects.</p>
<div class="highlight-python"><pre>$ python csv_dialect_variations.py


Dialect: "escaped"

  delimiter   = ','       skipinitialspace = 0
  doublequote = 0         quoting          = QUOTE_NONE
  quotechar   = '"'       lineterminator   = '\r\n'
  escapechar  = '\\'

col1,0,10/00/2010,Contains special chars: \" ' \, to be parsed
col1,1,10/01/2010,Contains special chars: \" ' \, to be parsed
col1,2,10/02/2010,Contains special chars: \" ' \, to be parsed


Dialect: "excel"

  delimiter   = ','       skipinitialspace = 0
  doublequote = 1         quoting          = QUOTE_MINIMAL
  quotechar   = '"'       lineterminator   = '\r\n'
  escapechar  = None

col1,0,10/00/2010,"Contains special chars: "" ' , to be parsed"
col1,1,10/01/2010,"Contains special chars: "" ' , to be parsed"
col1,2,10/02/2010,"Contains special chars: "" ' , to be parsed"


Dialect: "excel-tab"

  delimiter   = '\t'      skipinitialspace = 0
  doublequote = 1         quoting          = QUOTE_MINIMAL
  quotechar   = '"'       lineterminator   = '\r\n'
  escapechar  = None

col1    0       10/00/2010      "Contains special chars: "" '    to be parsed"
col1    1       10/01/2010      "Contains special chars: "" '    to be parsed"
col1    2       10/02/2010      "Contains special chars: "" '    to be parsed"


Dialect: "singlequote"

  delimiter   = ','       skipinitialspace = 0
  doublequote = 1         quoting          = QUOTE_ALL
  quotechar   = "'"       lineterminator   = '\r\n'
  escapechar  = None

'col1','0','10/00/2010','Contains special chars: " '' , to be parsed'
'col1','1','10/01/2010','Contains special chars: " '' , to be parsed'
'col1','2','10/02/2010','Contains special chars: " '' , to be parsed'</pre>
</div>
</div>
<div class="section" id="automatically-detecting-dialects">
<h3>Automatically Detecting Dialects<a class="headerlink" href="#automatically-detecting-dialects" title="Permalink to this headline">¶</a></h3>
<p>The best way to configure a dialect for parsing an input file is to
know the right settings in advance.  For data where the dialect
parameters are unknown, the <tt class="xref py py-class docutils literal"><span class="pre">Sniffer</span></tt> class can be used to make
an educated guess.  The <tt class="xref py py-func docutils literal"><span class="pre">sniff()</span></tt> method takes a sample of the
input data and an optional argument giving the possible delimiter
characters.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">from</span> <span class="nn">StringIO</span> <span class="kn">import</span> <span class="n">StringIO</span>
<span class="kn">import</span> <span class="nn">textwrap</span>

<span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">&#39;escaped&#39;</span><span class="p">,</span> <span class="n">escapechar</span><span class="o">=</span><span class="s">&#39;</span><span class="se">\\</span><span class="s">&#39;</span><span class="p">,</span> <span class="n">doublequote</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONE</span><span class="p">)</span>
<span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">&#39;singlequote&#39;</span><span class="p">,</span> <span class="n">quotechar</span><span class="o">=</span><span class="s">&quot;&#39;&quot;</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_ALL</span><span class="p">)</span>

<span class="c"># Generate sample data for all known dialects</span>

<span class="n">samples</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()):</span>
    <span class="nb">buffer</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">()</span>
    <span class="n">dialect</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">get_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="nb">buffer</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
        <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span>
            <span class="p">(</span><span class="s">&#39;col1&#39;</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">&#39;10/</span><span class="si">%02d</span><span class="s">/2010&#39;</span> <span class="o">%</span> <span class="n">i</span><span class="p">,</span>
             <span class="s">&#39;Contains special chars: &quot; </span><span class="se">\&#39;</span><span class="s"> </span><span class="si">%s</span><span class="s"> to be parsed&#39;</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">)</span>
            <span class="p">)</span>
    <span class="n">samples</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">dialect</span><span class="p">,</span> <span class="nb">buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">())</span> <span class="p">)</span>

<span class="c"># Guess the dialect for a given sample, then use the results to parse</span>
<span class="c"># the data.</span>
 
<span class="n">sniffer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">Sniffer</span><span class="p">()</span>

<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">expected</span><span class="p">,</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">samples</span><span class="p">:</span>
    <span class="k">print</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">Dialect: &quot;</span><span class="si">%s</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&#39;</span> <span class="o">%</span> <span class="n">name</span>

    <span class="n">dialect</span> <span class="o">=</span> <span class="n">sniffer</span><span class="o">.</span><span class="n">sniff</span><span class="p">(</span><span class="n">sample</span><span class="p">,</span> <span class="n">delimiters</span><span class="o">=</span><span class="s">&#39;,</span><span class="se">\t</span><span class="s">&#39;</span><span class="p">)</span>

    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="k">print</span> <span class="n">row</span>
</pre></div>
</div>
<p><tt class="xref py py-func docutils literal"><span class="pre">sniff()</span></tt> returns a <tt class="xref py py-class docutils literal"><span class="pre">Dialect</span></tt> instance with the settings to
be used for parsing the data.  The results are not always perfect, as
demonstrated by the &#8220;escaped&#8221; dialect in the example.</p>
<div class="highlight-python"><pre>$ python csv_dialect_sniffer.py


Dialect: "escaped"

['col1', '0', '10/00/2010', 'Contains special chars: \\" \' \\', ' to be parsed']
['col1', '1', '10/01/2010', 'Contains special chars: \\" \' \\', ' to be parsed']
['col1', '2', '10/02/2010', 'Contains special chars: \\" \' \\', ' to be parsed']

Dialect: "excel"

['col1', '0', '10/00/2010', 'Contains special chars: " \' , to be parsed']
['col1', '1', '10/01/2010', 'Contains special chars: " \' , to be parsed']
['col1', '2', '10/02/2010', 'Contains special chars: " \' , to be parsed']

Dialect: "excel-tab"

['col1', '0', '10/00/2010', 'Contains special chars: " \' \t to be parsed']
['col1', '1', '10/01/2010', 'Contains special chars: " \' \t to be parsed']
['col1', '2', '10/02/2010', 'Contains special chars: " \' \t to be parsed']

Dialect: "singlequote"

['col1', '0', '10/00/2010', 'Contains special chars: " \' , to be parsed']
['col1', '1', '10/01/2010', 'Contains special chars: " \' , to be parsed']
['col1', '2', '10/02/2010', 'Contains special chars: " \' , to be parsed']</pre>
</div>
</div>
</div>
<div class="section" id="using-field-names">
<h2>Using Field Names<a class="headerlink" href="#using-field-names" title="Permalink to this headline">¶</a></h2>
<p>In addition to working with sequences of data, the <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> module
includes classes for working with rows as dictionaries so that the
fields can be named. The <tt class="xref py py-class docutils literal"><span class="pre">DictReader</span></tt> and <tt class="xref py py-class docutils literal"><span class="pre">DictWriter</span></tt>
classes translate rows to dictionaries instead of lists. Keys for the
dictionary can be passed in, or inferred from the first row in the
input (when the row contains headers).</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;rt&#39;</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">DictReader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="k">print</span> <span class="n">row</span>
<span class="k">finally</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
</div>
<p>The dictionary-based reader and writer are implemented as wrappers
around the sequence-based classes, and use the same methods and
arguments. The only difference in the reader API is that rows are
returned as dictionaries instead of lists or tuples.</p>
<div class="highlight-python"><pre>$ python csv_dictreader.py testdata.csv

{'Title 1': '1', 'Title 3': '08/18/07', 'Title 2': 'a'}
{'Title 1': '2', 'Title 3': '08/19/07', 'Title 2': 'b'}
{'Title 1': '3', 'Title 3': '08/20/07', 'Title 2': 'c'}
{'Title 1': '4', 'Title 3': '08/21/07', 'Title 2': 'd'}
{'Title 1': '5', 'Title 3': '08/22/07', 'Title 2': 'e'}
{'Title 1': '6', 'Title 3': '08/23/07', 'Title 2': 'f'}
{'Title 1': '7', 'Title 3': '08/24/07', 'Title 2': 'g'}
{'Title 1': '8', 'Title 3': '08/25/07', 'Title 2': 'h'}
{'Title 1': '9', 'Title 3': '08/26/07', 'Title 2': 'i'}</pre>
</div>
<p>The <tt class="xref py py-class docutils literal"><span class="pre">DictWriter</span></tt> must be given a list of field names so it
knows how to order the columns in the output.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;wt&#39;</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
    <span class="n">fieldnames</span> <span class="o">=</span> <span class="p">(</span><span class="s">&#39;Title 1&#39;</span><span class="p">,</span> <span class="s">&#39;Title 2&#39;</span><span class="p">,</span> <span class="s">&#39;Title 3&#39;</span><span class="p">)</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">DictWriter</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">fieldnames</span><span class="o">=</span><span class="n">fieldnames</span><span class="p">)</span>
    <span class="n">headers</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span> <span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">fieldnames</span> <span class="p">)</span>
    <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="n">headers</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
        <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">({</span> <span class="s">&#39;Title 1&#39;</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span>
                          <span class="s">&#39;Title 2&#39;</span><span class="p">:</span><span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">&#39;a&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">),</span>
                          <span class="s">&#39;Title 3&#39;</span><span class="p">:</span><span class="s">&#39;08/</span><span class="si">%02d</span><span class="s">/07&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span>
                          <span class="p">})</span>
<span class="k">finally</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>

<span class="k">print</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;rt&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</pre></div>
</div>
<div class="highlight-python"><pre>$ python csv_dictwriter.py testout.csv

Title 1,Title 2,Title 3
1,a,08/01/07
2,b,08/02/07
3,c,08/03/07
4,d,08/04/07
5,e,08/05/07
6,f,08/06/07
7,g,08/07/07
8,h,08/08/07
9,i,08/09/07
10,j,08/10/07</pre>
</div>
<div class="admonition-see-also admonition seealso">
<p class="first admonition-title">See also</p>
<dl class="last docutils">
<dt><a class="reference external" href="http://docs.python.org/library/csv.html">csv</a></dt>
<dd>The standard library documentation for this module.</dd>
<dt><span class="target" id="index-0"></span><a class="pep reference external" href="http://www.python.org/dev/peps/pep-0305"><strong>PEP 305</strong></a></dt>
<dd>CSV File API</dd>
</dl>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="../ConfigParser/index.html" title="ConfigParser – Work with configuration files"
             >next</a> |</li>
        <li class="right" >
          <a href="../file_formats.html" title="File Formats"
             >previous</a> |</li>
        <li><a href="../contents.html">PyMOTW</a> &raquo;</li>
          <li><a href="../file_formats.html" >File Formats</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer">
      &copy; Copyright Doug Hellmann.
      Last updated on Oct 24, 2010.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a>.

    <br/><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/us/" rel="license"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png"/></a>
    
    </div>
  </body>
</html>

[top] / python / PyMOTW / docs / csv / index.html

contact | logmethods.com