<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>csv – Comma-separated value files — Python Module of the Week</title> <link rel="stylesheet" href="../_static/sphinxdoc.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../', VERSION: '1.132', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <link rel="author" title="About these documents" href="../about.html" /> <link rel="top" title="Python Module of the Week" href="../index.html" /> <link rel="up" title="File Formats" href="../file_formats.html" /> <link rel="next" title="ConfigParser – Work with configuration files" href="../ConfigParser/index.html" /> <link rel="prev" title="File Formats" href="../file_formats.html" /> </head> <body> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../genindex.html" title="General Index" accesskey="I">index</a></li> <li class="right" > <a href="../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="../ConfigParser/index.html" title="ConfigParser – Work with configuration files" accesskey="N">next</a> |</li> <li class="right" > <a href="../file_formats.html" title="File Formats" accesskey="P">previous</a> |</li> <li><a href="../contents.html">PyMOTW</a> »</li> <li><a href="../file_formats.html" accesskey="U">File Formats</a> »</li> </ul> </div> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <h3><a href="../contents.html">Table Of Contents</a></h3> <ul> <li><a class="reference internal" href="#">csv – Comma-separated value files</a><ul> <li><a class="reference internal" href="#reading">Reading</a></li> <li><a class="reference internal" href="#writing">Writing</a><ul> <li><a class="reference internal" href="#quoting">Quoting</a></li> </ul> </li> <li><a class="reference internal" href="#dialects">Dialects</a><ul> <li><a class="reference internal" href="#creating-a-dialect">Creating a Dialect</a></li> <li><a class="reference internal" href="#dialect-parameters">Dialect Parameters</a></li> <li><a class="reference internal" href="#automatically-detecting-dialects">Automatically Detecting Dialects</a></li> </ul> </li> <li><a class="reference internal" href="#using-field-names">Using Field Names</a></li> </ul> </li> </ul> <h4>Previous topic</h4> <p class="topless"><a href="../file_formats.html" title="previous chapter">File Formats</a></p> <h4>Next topic</h4> <p class="topless"><a href="../ConfigParser/index.html" title="next chapter">ConfigParser – Work with configuration files</a></p> <h3>This Page</h3> <ul class="this-page-menu"> <li><a href="../_sources/csv/index.txt" rel="nofollow">Show Source</a></li> </ul> <div id="searchbox" style="display: none"> <h3>Quick search</h3> <form class="search" action="../search.html" method="get"> <input type="text" name="q" size="18" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> <p class="searchtip" style="font-size: 90%"> Enter search terms or a module, class or function name. </p> </div> <script type="text/javascript">$('#searchbox').show(0);</script> </div> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="module-csv"> <span id="csv-comma-separated-value-files"></span><h1>csv – Comma-separated value files<a class="headerlink" href="#module-csv" title="Permalink to this headline">¶</a></h1> <table class="docutils field-list" frame="void" rules="none"> <col class="field-name" /> <col class="field-body" /> <tbody valign="top"> <tr class="field"><th class="field-name">Purpose:</th><td class="field-body">Read and write comma separated value files.</td> </tr> <tr class="field"><th class="field-name">Python Version:</th><td class="field-body">2.3 and later</td> </tr> </tbody> </table> <p>The <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> module is useful for working with data exported from spreadsheets and databases into text files formatted with fields and records, commonly referred to as <em>comma-separated value</em> (CSV) format because commas are often used to separate the fields in a record.</p> <div class="admonition note"> <p class="first admonition-title">Note</p> <p class="last">The Python 2.5 version of <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> does not support Unicode data. There are also “issues with ASCII NUL characters”. Using UTF-8 or printable ASCII is recommended.</p> </div> <div class="section" id="reading"> <h2>Reading<a class="headerlink" href="#reading" title="Permalink to this headline">¶</a></h2> <p>Use <tt class="xref py py-func docutils literal"><span class="pre">reader()</span></tt> to create a an object for reading data from a CSV file. The reader can be used as an iterator to process the rows of the file in order. For example:</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'rt'</span><span class="p">)</span> <span class="k">try</span><span class="p">:</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span> <span class="k">print</span> <span class="n">row</span> <span class="k">finally</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> </pre></div> </div> <p>The first argument to <tt class="xref py py-func docutils literal"><span class="pre">reader()</span></tt> is the source of text lines. In this case, it is a file, but any iterable is accepted (<a class="reference internal" href="../StringIO/index.html#module-StringIO" title="StringIO: Work with text buffers using file-like API"><tt class="xref py py-mod docutils literal"><span class="pre">StringIO</span></tt></a> instances, lists, etc.). Other optional arguments can be given to control how the input data is parsed.</p> <p>This example file was exported from <a class="reference external" href="http://www.neooffice.org/">NeoOffice</a>.</p> <div class="highlight-python"><pre>"Title 1","Title 2","Title 3" 1,"a",08/18/07 2,"b",08/19/07 3,"c",08/20/07 4,"d",08/21/07 5,"e",08/22/07 6,"f",08/23/07 7,"g",08/24/07 8,"h",08/25/07 9,"i",08/26/07 </pre> </div> <p>As it is read, each row of the input data is parsed and converted to a list of strings.</p> <div class="highlight-python"><pre>$ python csv_reader.py testdata.csv ['Title 1', 'Title 2', 'Title 3'] ['1', 'a', '08/18/07'] ['2', 'b', '08/19/07'] ['3', 'c', '08/20/07'] ['4', 'd', '08/21/07'] ['5', 'e', '08/22/07'] ['6', 'f', '08/23/07'] ['7', 'g', '08/24/07'] ['8', 'h', '08/25/07'] ['9', 'i', '08/26/07']</pre> </div> <p>The parser handles line breaks embedded within strings in a row, which is why a “row” is not always the same as a “line” of input from the file.</p> <div class="highlight-python"><pre>"Title 1","Title 2","Title 3" 1,"first line second line",08/18/07 </pre> </div> <p>Values with line breaks in the input retain the internal line breaks when returned by the parser.</p> <div class="highlight-python"><pre>$ python csv_reader.py testlinebreak.csv ['Title 1', 'Title 2', 'Title 3'] ['1', 'first line\nsecond line', '08/18/07']</pre> </div> </div> <div class="section" id="writing"> <h2>Writing<a class="headerlink" href="#writing" title="Permalink to this headline">¶</a></h2> <p>Writing CSV files is just as easy as reading them. Use <tt class="xref py py-func docutils literal"><span class="pre">writer()</span></tt> to create an object for writing, then iterate over the rows, using <tt class="xref py py-func docutils literal"><span class="pre">writerow()</span></tt> to print them.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'wt'</span><span class="p">)</span> <span class="k">try</span><span class="p">:</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="s">'Title 1'</span><span class="p">,</span> <span class="s">'Title 2'</span><span class="p">,</span> <span class="s">'Title 3'</span><span class="p">)</span> <span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">),</span> <span class="s">'08/</span><span class="si">%02d</span><span class="s">/07'</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span> <span class="p">)</span> <span class="k">finally</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="k">print</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> </pre></div> </div> <p>The output does not look exactly like the exported data used in the reader example:</p> <div class="highlight-python"><pre>$ python csv_writer.py testout.csv Title 1,Title 2,Title 3 1,a,08/01/07 2,b,08/02/07 3,c,08/03/07 4,d,08/04/07 5,e,08/05/07 6,f,08/06/07 7,g,08/07/07 8,h,08/08/07 9,i,08/09/07 10,j,08/10/07</pre> </div> <p>The default quoting behavior is different for the writer, so the string column is not quoted. That is easy to change by adding a quoting argument to quote non-numeric values:</p> <div class="highlight-python"><div class="highlight"><pre><span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONNUMERIC</span><span class="p">)</span> </pre></div> </div> <p>And now the strings are quoted:</p> <div class="highlight-python"><pre>$ python csv_writer_quoted.py testout_quoted.csv "Title 1","Title 2","Title 3" 1,"a","08/01/07" 2,"b","08/02/07" 3,"c","08/03/07" 4,"d","08/04/07" 5,"e","08/05/07" 6,"f","08/06/07" 7,"g","08/07/07" 8,"h","08/08/07" 9,"i","08/09/07" 10,"j","08/10/07"</pre> </div> <div class="section" id="quoting"> <span id="csv-quoting"></span><h3>Quoting<a class="headerlink" href="#quoting" title="Permalink to this headline">¶</a></h3> <p>There are four different quoting options, defined as constants in the csv module.</p> <dl class="docutils"> <dt>QUOTE_ALL</dt> <dd>Quote everything, regardless of type.</dd> <dt>QUOTE_MINIMAL</dt> <dd>Quote fields with special characters (anything that would confuse a parser configured with the same dialect and options). This is the default</dd> <dt>QUOTE_NONNUMERIC</dt> <dd>Quote all fields that are not integers or floats. When used with the reader, input fields that are not quoted are converted to floats.</dd> <dt>QUOTE_NONE</dt> <dd>Do not quote anything on output. When used with the reader, quote characters are included in the field values (normally, they are treated as delimiters and stripped).</dd> </dl> </div> </div> <div class="section" id="dialects"> <h2>Dialects<a class="headerlink" href="#dialects" title="Permalink to this headline">¶</a></h2> <p>There is no well-defined standard for comma-separated value files, so the parser needs to be flexible. This flexibility means there are many parameters to control how <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> parses or writes data. Rather than passing each of these parameters to the reader and writer separately, they are grouped together conveniently into a <em>dialect</em> object.</p> <p>Dialect classes can be registered by name, so that callers of the csv module do not need to know the parameter settings in advance. The complete list of registered dialects can be retrieved with <tt class="xref py py-func docutils literal"><span class="pre">list_dialects()</span></tt>.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="k">print</span> <span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()</span> </pre></div> </div> <p>The standard library includes two dialects: <tt class="docutils literal"><span class="pre">excel</span></tt>, and <tt class="docutils literal"><span class="pre">excel-tabs</span></tt>. The <tt class="docutils literal"><span class="pre">excel</span></tt> dialect is for working with data in the default export format for Microsoft Excel, and also works with OpenOffice or NeoOffice.</p> <div class="highlight-python"><pre>$ python csv_list_dialects.py ['excel-tab', 'excel']</pre> </div> <div class="section" id="creating-a-dialect"> <h3>Creating a Dialect<a class="headerlink" href="#creating-a-dialect" title="Permalink to this headline">¶</a></h3> <p>Suppose instead of using commas to delimit fields, the input file uses <tt class="docutils literal"><span class="pre">|</span></tt>, like this:</p> <div class="highlight-python"><pre>"Title 1"|"Title 2"|"Title 3" 1|"first line second line"|08/18/07 </pre> </div> <p>A new dialect can be registered using the appropriate delimiter:</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">'pipes'</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">'|'</span><span class="p">)</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'testdata.pipes'</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="s">'pipes'</span><span class="p">)</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span> <span class="k">print</span> <span class="n">row</span> </pre></div> </div> <p>and the file can be read just as with the comma-delimited file:</p> <div class="highlight-python"><pre>$ python csv_dialect.py ['Title 1', 'Title 2', 'Title 3'] ['1', 'first line\nsecond line', '08/18/07']</pre> </div> </div> <div class="section" id="dialect-parameters"> <h3>Dialect Parameters<a class="headerlink" href="#dialect-parameters" title="Permalink to this headline">¶</a></h3> <p>A dialect specifies all of the tokens used when parsing or writing a data file. Every aspect of the file format can be specified, from the way columns are delimited to the character used to escape a token.</p> <table border="1" class="docutils"> <colgroup> <col width="15%" /> <col width="21%" /> <col width="64%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Attribute</th> <th class="head">Default</th> <th class="head">Meaning</th> </tr> </thead> <tbody valign="top"> <tr><td>delimiter</td> <td><tt class="docutils literal"><span class="pre">,</span></tt></td> <td>Field separator (one character)</td> </tr> <tr><td>doublequote</td> <td>True</td> <td>Flag controlling whether quotechar instances are doubled</td> </tr> <tr><td>escapechar</td> <td>None</td> <td>Character used to indicate an escape sequence</td> </tr> <tr><td>lineterminator</td> <td><tt class="docutils literal"><span class="pre">\r\n</span></tt></td> <td>String used by writer to terminate a line</td> </tr> <tr><td>quotechar</td> <td><tt class="docutils literal"><span class="pre">"</span></tt></td> <td>String to surround fields containing special values (one character)</td> </tr> <tr><td>quoting</td> <td><tt class="xref py py-const docutils literal"><span class="pre">QUOTE_MINIMAL</span></tt></td> <td>Controls quoting behavior described above</td> </tr> <tr><td>skipinitialspace</td> <td>False</td> <td>Ignore whitespace after the field delimiter</td> </tr> </tbody> </table> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">'escaped'</span><span class="p">,</span> <span class="n">escapechar</span><span class="o">=</span><span class="s">'</span><span class="se">\\</span><span class="s">'</span><span class="p">,</span> <span class="n">doublequote</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONE</span><span class="p">)</span> <span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">'singlequote'</span><span class="p">,</span> <span class="n">quotechar</span><span class="o">=</span><span class="s">"'"</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_ALL</span><span class="p">)</span> <span class="n">quoting_modes</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span> <span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="n">csv</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">dir</span><span class="p">(</span><span class="n">csv</span><span class="p">)</span> <span class="k">if</span> <span class="n">n</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">'QUOTE_'</span><span class="p">)</span> <span class="p">)</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()):</span> <span class="k">print</span> <span class="s">'</span><span class="se">\n</span><span class="s">Dialect: "</span><span class="si">%s</span><span class="s">"</span><span class="se">\n</span><span class="s">'</span> <span class="o">%</span> <span class="n">name</span> <span class="n">dialect</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">get_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">print</span> <span class="s">' delimiter = </span><span class="si">%-6r</span><span class="s"> skipinitialspace = </span><span class="si">%r</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">,</span> <span class="n">dialect</span><span class="o">.</span><span class="n">skipinitialspace</span><span class="p">)</span> <span class="k">print</span> <span class="s">' doublequote = </span><span class="si">%-6r</span><span class="s"> quoting = </span><span class="si">%s</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">doublequote</span><span class="p">,</span> <span class="n">quoting_modes</span><span class="p">[</span><span class="n">dialect</span><span class="o">.</span><span class="n">quoting</span><span class="p">])</span> <span class="k">print</span> <span class="s">' quotechar = </span><span class="si">%-6r</span><span class="s"> lineterminator = </span><span class="si">%r</span><span class="s">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">dialect</span><span class="o">.</span><span class="n">quotechar</span><span class="p">,</span> <span class="n">dialect</span><span class="o">.</span><span class="n">lineterminator</span><span class="p">)</span> <span class="k">print</span> <span class="s">' escapechar = </span><span class="si">%-6r</span><span class="s">'</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">escapechar</span> <span class="k">print</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="s">'col1'</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">'10/</span><span class="si">%02d</span><span class="s">/2010'</span> <span class="o">%</span> <span class="n">i</span><span class="p">,</span> <span class="s">'Contains special chars: " </span><span class="se">\'</span><span class="s"> </span><span class="si">%s</span><span class="s"> to be parsed'</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">)</span> <span class="p">)</span> <span class="k">print</span> </pre></div> </div> <p>This program shows how the same data appears in several different dialects.</p> <div class="highlight-python"><pre>$ python csv_dialect_variations.py Dialect: "escaped" delimiter = ',' skipinitialspace = 0 doublequote = 0 quoting = QUOTE_NONE quotechar = '"' lineterminator = '\r\n' escapechar = '\\' col1,0,10/00/2010,Contains special chars: \" ' \, to be parsed col1,1,10/01/2010,Contains special chars: \" ' \, to be parsed col1,2,10/02/2010,Contains special chars: \" ' \, to be parsed Dialect: "excel" delimiter = ',' skipinitialspace = 0 doublequote = 1 quoting = QUOTE_MINIMAL quotechar = '"' lineterminator = '\r\n' escapechar = None col1,0,10/00/2010,"Contains special chars: "" ' , to be parsed" col1,1,10/01/2010,"Contains special chars: "" ' , to be parsed" col1,2,10/02/2010,"Contains special chars: "" ' , to be parsed" Dialect: "excel-tab" delimiter = '\t' skipinitialspace = 0 doublequote = 1 quoting = QUOTE_MINIMAL quotechar = '"' lineterminator = '\r\n' escapechar = None col1 0 10/00/2010 "Contains special chars: "" ' to be parsed" col1 1 10/01/2010 "Contains special chars: "" ' to be parsed" col1 2 10/02/2010 "Contains special chars: "" ' to be parsed" Dialect: "singlequote" delimiter = ',' skipinitialspace = 0 doublequote = 1 quoting = QUOTE_ALL quotechar = "'" lineterminator = '\r\n' escapechar = None 'col1','0','10/00/2010','Contains special chars: " '' , to be parsed' 'col1','1','10/01/2010','Contains special chars: " '' , to be parsed' 'col1','2','10/02/2010','Contains special chars: " '' , to be parsed'</pre> </div> </div> <div class="section" id="automatically-detecting-dialects"> <h3>Automatically Detecting Dialects<a class="headerlink" href="#automatically-detecting-dialects" title="Permalink to this headline">¶</a></h3> <p>The best way to configure a dialect for parsing an input file is to know the right settings in advance. For data where the dialect parameters are unknown, the <tt class="xref py py-class docutils literal"><span class="pre">Sniffer</span></tt> class can be used to make an educated guess. The <tt class="xref py py-func docutils literal"><span class="pre">sniff()</span></tt> method takes a sample of the input data and an optional argument giving the possible delimiter characters.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">from</span> <span class="nn">StringIO</span> <span class="kn">import</span> <span class="n">StringIO</span> <span class="kn">import</span> <span class="nn">textwrap</span> <span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">'escaped'</span><span class="p">,</span> <span class="n">escapechar</span><span class="o">=</span><span class="s">'</span><span class="se">\\</span><span class="s">'</span><span class="p">,</span> <span class="n">doublequote</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_NONE</span><span class="p">)</span> <span class="n">csv</span><span class="o">.</span><span class="n">register_dialect</span><span class="p">(</span><span class="s">'singlequote'</span><span class="p">,</span> <span class="n">quotechar</span><span class="o">=</span><span class="s">"'"</span><span class="p">,</span> <span class="n">quoting</span><span class="o">=</span><span class="n">csv</span><span class="o">.</span><span class="n">QUOTE_ALL</span><span class="p">)</span> <span class="c"># Generate sample data for all known dialects</span> <span class="n">samples</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">csv</span><span class="o">.</span><span class="n">list_dialects</span><span class="p">()):</span> <span class="nb">buffer</span> <span class="o">=</span> <span class="n">StringIO</span><span class="p">()</span> <span class="n">dialect</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">get_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="nb">buffer</span><span class="p">,</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span> <span class="p">(</span><span class="s">'col1'</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="s">'10/</span><span class="si">%02d</span><span class="s">/2010'</span> <span class="o">%</span> <span class="n">i</span><span class="p">,</span> <span class="s">'Contains special chars: " </span><span class="se">\'</span><span class="s"> </span><span class="si">%s</span><span class="s"> to be parsed'</span> <span class="o">%</span> <span class="n">dialect</span><span class="o">.</span><span class="n">delimiter</span><span class="p">)</span> <span class="p">)</span> <span class="n">samples</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">dialect</span><span class="p">,</span> <span class="nb">buffer</span><span class="o">.</span><span class="n">getvalue</span><span class="p">())</span> <span class="p">)</span> <span class="c"># Guess the dialect for a given sample, then use the results to parse</span> <span class="c"># the data.</span> <span class="n">sniffer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">Sniffer</span><span class="p">()</span> <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">expected</span><span class="p">,</span> <span class="n">sample</span> <span class="ow">in</span> <span class="n">samples</span><span class="p">:</span> <span class="k">print</span> <span class="s">'</span><span class="se">\n</span><span class="s">Dialect: "</span><span class="si">%s</span><span class="s">"</span><span class="se">\n</span><span class="s">'</span> <span class="o">%</span> <span class="n">name</span> <span class="n">dialect</span> <span class="o">=</span> <span class="n">sniffer</span><span class="o">.</span><span class="n">sniff</span><span class="p">(</span><span class="n">sample</span><span class="p">,</span> <span class="n">delimiters</span><span class="o">=</span><span class="s">',</span><span class="se">\t</span><span class="s">'</span><span class="p">)</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">StringIO</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">dialect</span><span class="o">=</span><span class="n">dialect</span><span class="p">)</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span> <span class="k">print</span> <span class="n">row</span> </pre></div> </div> <p><tt class="xref py py-func docutils literal"><span class="pre">sniff()</span></tt> returns a <tt class="xref py py-class docutils literal"><span class="pre">Dialect</span></tt> instance with the settings to be used for parsing the data. The results are not always perfect, as demonstrated by the “escaped” dialect in the example.</p> <div class="highlight-python"><pre>$ python csv_dialect_sniffer.py Dialect: "escaped" ['col1', '0', '10/00/2010', 'Contains special chars: \\" \' \\', ' to be parsed'] ['col1', '1', '10/01/2010', 'Contains special chars: \\" \' \\', ' to be parsed'] ['col1', '2', '10/02/2010', 'Contains special chars: \\" \' \\', ' to be parsed'] Dialect: "excel" ['col1', '0', '10/00/2010', 'Contains special chars: " \' , to be parsed'] ['col1', '1', '10/01/2010', 'Contains special chars: " \' , to be parsed'] ['col1', '2', '10/02/2010', 'Contains special chars: " \' , to be parsed'] Dialect: "excel-tab" ['col1', '0', '10/00/2010', 'Contains special chars: " \' \t to be parsed'] ['col1', '1', '10/01/2010', 'Contains special chars: " \' \t to be parsed'] ['col1', '2', '10/02/2010', 'Contains special chars: " \' \t to be parsed'] Dialect: "singlequote" ['col1', '0', '10/00/2010', 'Contains special chars: " \' , to be parsed'] ['col1', '1', '10/01/2010', 'Contains special chars: " \' , to be parsed'] ['col1', '2', '10/02/2010', 'Contains special chars: " \' , to be parsed']</pre> </div> </div> </div> <div class="section" id="using-field-names"> <h2>Using Field Names<a class="headerlink" href="#using-field-names" title="Permalink to this headline">¶</a></h2> <p>In addition to working with sequences of data, the <a class="reference internal" href="#module-csv" title="csv: Read and write comma separated value files."><tt class="xref py py-mod docutils literal"><span class="pre">csv</span></tt></a> module includes classes for working with rows as dictionaries so that the fields can be named. The <tt class="xref py py-class docutils literal"><span class="pre">DictReader</span></tt> and <tt class="xref py py-class docutils literal"><span class="pre">DictWriter</span></tt> classes translate rows to dictionaries instead of lists. Keys for the dictionary can be passed in, or inferred from the first row in the input (when the row contains headers).</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'rt'</span><span class="p">)</span> <span class="k">try</span><span class="p">:</span> <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">DictReader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span> <span class="k">print</span> <span class="n">row</span> <span class="k">finally</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> </pre></div> </div> <p>The dictionary-based reader and writer are implemented as wrappers around the sequence-based classes, and use the same methods and arguments. The only difference in the reader API is that rows are returned as dictionaries instead of lists or tuples.</p> <div class="highlight-python"><pre>$ python csv_dictreader.py testdata.csv {'Title 1': '1', 'Title 3': '08/18/07', 'Title 2': 'a'} {'Title 1': '2', 'Title 3': '08/19/07', 'Title 2': 'b'} {'Title 1': '3', 'Title 3': '08/20/07', 'Title 2': 'c'} {'Title 1': '4', 'Title 3': '08/21/07', 'Title 2': 'd'} {'Title 1': '5', 'Title 3': '08/22/07', 'Title 2': 'e'} {'Title 1': '6', 'Title 3': '08/23/07', 'Title 2': 'f'} {'Title 1': '7', 'Title 3': '08/24/07', 'Title 2': 'g'} {'Title 1': '8', 'Title 3': '08/25/07', 'Title 2': 'h'} {'Title 1': '9', 'Title 3': '08/26/07', 'Title 2': 'i'}</pre> </div> <p>The <tt class="xref py py-class docutils literal"><span class="pre">DictWriter</span></tt> must be given a list of field names so it knows how to order the columns in the output.</p> <div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">csv</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'wt'</span><span class="p">)</span> <span class="k">try</span><span class="p">:</span> <span class="n">fieldnames</span> <span class="o">=</span> <span class="p">(</span><span class="s">'Title 1'</span><span class="p">,</span> <span class="s">'Title 2'</span><span class="p">,</span> <span class="s">'Title 3'</span><span class="p">)</span> <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">DictWriter</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">fieldnames</span><span class="o">=</span><span class="n">fieldnames</span><span class="p">)</span> <span class="n">headers</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span> <span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="n">fieldnames</span> <span class="p">)</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="n">headers</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span> <span class="n">writer</span><span class="o">.</span><span class="n">writerow</span><span class="p">({</span> <span class="s">'Title 1'</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="s">'Title 2'</span><span class="p">:</span><span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="s">'a'</span><span class="p">)</span> <span class="o">+</span> <span class="n">i</span><span class="p">),</span> <span class="s">'Title 3'</span><span class="p">:</span><span class="s">'08/</span><span class="si">%02d</span><span class="s">/07'</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="p">})</span> <span class="k">finally</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="k">print</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">'rt'</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> </pre></div> </div> <div class="highlight-python"><pre>$ python csv_dictwriter.py testout.csv Title 1,Title 2,Title 3 1,a,08/01/07 2,b,08/02/07 3,c,08/03/07 4,d,08/04/07 5,e,08/05/07 6,f,08/06/07 7,g,08/07/07 8,h,08/08/07 9,i,08/09/07 10,j,08/10/07</pre> </div> <div class="admonition-see-also admonition seealso"> <p class="first admonition-title">See also</p> <dl class="last docutils"> <dt><a class="reference external" href="http://docs.python.org/library/csv.html">csv</a></dt> <dd>The standard library documentation for this module.</dd> <dt><span class="target" id="index-0"></span><a class="pep reference external" href="http://www.python.org/dev/peps/pep-0305"><strong>PEP 305</strong></a></dt> <dd>CSV File API</dd> </dl> </div> </div> </div> </div> </div> </div> <div class="clearer"></div> </div> <div class="related"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../genindex.html" title="General Index" >index</a></li> <li class="right" > <a href="../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="../ConfigParser/index.html" title="ConfigParser – Work with configuration files" >next</a> |</li> <li class="right" > <a href="../file_formats.html" title="File Formats" >previous</a> |</li> <li><a href="../contents.html">PyMOTW</a> »</li> <li><a href="../file_formats.html" >File Formats</a> »</li> </ul> </div> <div class="footer"> © Copyright Doug Hellmann. Last updated on Oct 24, 2010. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a>. <br/><a href="http://creativecommons.org/licenses/by-nc-sa/3.0/us/" rel="license"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png"/></a> </div> </body> </html>