Python: How to read and write CSV files


What is CSV File? #


CSV (Comma-separated values) is a common data exchange format used by the applications to produce and consume data. Some other well-known data exchange formats are XML, HTML, JSON etc.

A CSV file is a simple text file where each line contains a list of values (or fields) delimited by commas.

Although the term "Comma" appears in the format name itself, but you will encounter CSV files where data is delimited using tab (\t) or pipe (|) or any other character that can be used as a delimiter.

The first line of the CSV file represents the header containing a list of column names in the file. The header is optional but highly recommended.

The CSV file is commonly used to represent tabular data. For example, consider the following table:

ID Name CountryCode District Population
1 Kabul AFG Kabol 1780000
2 Qandahar AFG Qandahar 237500
3 Herat AFG Herat 186800
4 Mazar-e-Sharif AFG Balkh 127800
5 Amsterdam NLD Noord-Holland 731200

The above table can be represented using CSV format as follows:

1
2
3
4
5
6
"ID","Name","CountryCode","District","Population"
"1","Kabul","AFG","Kabol","1780000"
"2","Qandahar","AFG","Qandahar","237500"
"3","Herat","AFG","Herat","186800"
"4","Mazar-e-Sharif","AFG","Balkh","127800"
"5","Amsterdam","NLD","Noord-Holland","731200"

If the values in your CSV file contains commas, then it must be enclosed inside double quotes. For example:

Name Age Address
Jerry 10 2776 McDowell Street, Nashville, Tennessee
Tom 20 3171 Jessie Street, Westerville, Ohio
Mike 30 1818 Sherman Street, Hope, Kansas

To retain the commas inside the Address field enclose it in double quotation marks, as follows:

1
2
3
4
Name,Age,Address
Jerry,10,"2776 McDowell Street, Nashville, Tennessee"
Tom,20,"3171 Jessie Street, Westerville, Ohio"
Mike,30,"1818 Sherman Street, Hope, Kansas"

Similarly, if you have double quotes embedded inside a field, it must be escaped with another double quote character. Otherwise, they will not be interpreted correctly. For example:

Id User Comment
1 Bob John said "Hello World"
2 Tom "The Magician"

To retain the retain the double quotes inside the Comment field use two double quotes.

1
2
3
Id,User,Comment
1,Bob,"John said ""Hello World"""
2,Tom,""The Magician""

It is important to note that the CSV format is not fully standardized. So the rules we have just mentioned are not universal. Once in a while, you will encounter CSV files that have a different way of representing fields.

Fortunately, to make things easier for us Python provides the csv module.

Before we start reading and writing CSV files, you should have a good understanding of how to work with files in general. If you need a refresher, consider reading how to read and write file in Python.

The csv module is used for reading and writing files. It mainly provides following classes and functions:

  1. reader()
  2. writer()
  3. DictReader()
  4. DictWriter()

Let's start with the reader() function.

Reading a CSV File with reader() #


The reader() function takes a file object and returns a _csv.reader object that can be used to iterate over the contents of a CSV file. The syntax of reader() function is as follows:

Syntax: reader(fileobj [, dialect='excel' [, **fmtparam] ]) ->  _csv.reader

Argument Description
fileobj (required) It refers to the file object
dialect (optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft Excel. We will discuss dialect in detail later in this post.
fmtparam (optional) It refers to the set of keyword arguments to customize the dialect (see the next section).

Let's say we have following CSV file:

employees.csv

1
2
3
4
id,name,email,age,designation
1,John,john@mail.com,24,programmer
2,Bob,bob@mail.com,34,designer
3,Mary,mary@mail.com,43,sales

Here is how to read this CSV file:

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f)

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
['id', 'name', 'email', 'age', 'designation']
['1', 'John', 'john@mail.com', '24', 'programmer']
['2', 'Bob', 'bob@mail.com', '34', 'designer']
['3', 'Mary', 'mary@mail.com', '43', 'sales']

Notice that each line in the CSV file is returned as a list of strings.

To get the data from certain fields, you can use indexing. For example:

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f)

    for line in csv_reader:
        print(line[0], line[1], line[2])

Expected Output:

1
2
3
4
id name email
1 John john@mail.com
2 Bob bob@mail.com
3 Mary mary@mail.com

If you want to skip heading call the next() built-in function on the _csv.reader object and then loop over the remaining lines as usual.

1
2
3
4
5
6
7
8
9
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f)

    next(csv_reader) # skip the heading

    for line in csv_reader:
        print(line[0], line[1], line[2])

Expected Output:

1
2
3
1 John john@mail.com
2 Bob bob@mail.com
3 Mary mary@mail.com

Customizing the reader() #


By default, the csv module works according to the format used by Microsoft excel, but you can also define your own format using something called Dialect.

The following are some additional arguments that you can pass to the reader() function to customize its working.

  • delimiter - It refers to the character used to separate values (or fields) in the CSV file. It defaults to comma (,).
  • skipinitialspace  - It controls how the space following the delimiter will be interpreted. If True , the initial whitespaces will be removed. It defaults to False.
  • lineterminator - It refers to the character sequence used to terminate the line. It defaults to \r\n.
  • quotechar - It refers to the single character string that will be used to quote values if special characters (like delimiter) appears inside the field. It defaults to ".
  • quoting  - controls when quotes should be generated by the writer or recognized by the reader. It can take one of the following constants:
    • csv.QUOTE_MINIMAL means add quote only when required, for example, when a field contains either the quotechar or the delimiter. This is the default.
    • csv.QUOTE_ALL means quotes everything regardless of the field type.
    • csv.QUOTE_NONNUMERIC means quotes everything except integers and floats.
    • csv.QUOTE_NONE means that do not quote anything on output. However, while reading quotes are included around the field values.
  • escapechar - It refers to the one-character string used to escape the delimiter when quoting is set to QUOTE_NONE. It defaults to None.
  • doublequote  - controls the handling of quotes inside fields. When True, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes. Let's workthrough some examples to better understand how these arguments work:

delimiter argument #


employees_pipe.csv

1
2
3
4
id|name|email|age|designation
1|John|john@mail.com|24|programmer
2|Bob|bob@mail.com|34|designer
3|Mary|mary@mail.com|43|sales

This file uses pipe (|) character as a delimiter. Here is how to read this CSV file:

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f, delimiter='|')

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
['id', 'name', 'email', 'age', 'designation']
['1', 'John', 'john@mail.com', '24', 'programmer']
['2', 'Bob', 'bob@mail.com', '34', 'designer']
['3', 'Mary', 'mary@mail.com', '43', 'sales']

skipinitialspace argument #


baseball_players.csv

1
2
3
4
5
6
7
8
9
"Name", "Team", "Position", "Height(inches)", "Weight(lbs)", "Age"
"Adam Donachie", "BAL", "Catcher", 74, 180, 22.99
"Paul Bako", "BAL", "Catcher", 74, 215, 34.69
"Ramon Hernandez", "BAL", "Catcher", 72, 210, 30.78
"Kevin Millar", "BAL", "First Baseman", 72, 210, 35.43
"Chris Gomez", "BAL", "First Baseman", 73, 188, 35.71
"Brian Roberts", "BAL", "Second Baseman", 69, 176, 29.39
"Miguel Tejada", "BAL", "Shortstop", 69, 209, 30.77
"Melvin Mora", "BAL", "Third Baseman", 71, 200, 35.07

This CSV file contains spaces following the comma (,). To read this CSV file correctly, set skipinitialspace to True, as follows:

1
2
3
4
5
6
7
import csv

with open('baseball_players.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True)

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
5
6
7
8
9
['Name', 'Team', 'Position', 'Height(inches)', 'Weight(lbs)', 'Age']
['Adam Donachie', 'BAL', 'Catcher', '74', '180', '22.99']
['Paul Bako', 'BAL', 'Catcher', '74', '215', '34.69']
['Ramon Hernandez', 'BAL', 'Catcher', '72', '210', '30.78']
['Kevin Millar', 'BAL', 'First Baseman', '72', '210', '35.43']
['Chris Gomez', 'BAL', 'First Baseman', '73', '188', '35.71']
['Brian Roberts', 'BAL', 'Second Baseman', '69', '176', '29.39']
['Miguel Tejada', 'BAL', 'Shortstop', '69', '209', '30.77']
['Melvin Mora', 'BAL', 'Third Baseman', '71', '200', '35.07']

quotechar argument #


addresses.csv

1
2
3
4
Name, Age, Address
Jerry, 44, '2776 McDowell Street, Nashville, Tennessee'
Tom, 21, '3171 Jessie Street, Westerville, Ohio'
Mike, 32, '1818 Sherman Street, Hope, Kansas'

There are two things to notice in this file. First, the address field is wrapped using single quote (') instead of " double quote (which is the default). Second, there are spaces following the comma (,).

If you try to read this file without changing the quote character, you will get the output as follows:

1
2
3
4
5
6
7
import csv

with open('addresses.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True)

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
['Name', 'Age', 'Address']
['Jerry', '44', "'2776 McDowell Street", 'Nashville', "Tennessee'"]
['Tom', '21', "'3171 Jessie Street", 'Westerville', "Ohio'"]
['Mike', '32', "'1818 Sherman Street", 'Hope', "Kansas'"]

Notice that the address is split into three fields, which is certainly not correct. To fix the issue simply change quote character to a single quote (') using the quotechar argument:

1
2
3
4
5
6
7
import csv

with open('housing.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True, quotechar="'")

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
['Name', 'Age', 'Address']
['Jerry', '44', '2776 McDowell Street, Nashville, Tennessee']
['Tom', '21', '3171 Jessie Street, Westerville, Ohio']
['Mike', '32', '1818 Sherman Street, Hope, Kansas']

escapechar argument #


comments.csv

1
2
3
4
5
Id, User, Comment
1, Bob, "John said \"Hello World\""
2, Tom, "\"The Magician\""
3, Harry, "\"walk around the corner\" she explained to the child"
4, Louis, "He said, \"stop pulling the dog's tail\""

This file uses backslash (\) character to escape the embedded double quotes. However, by default the default csv module uses a double quote character to escape double quote character.

If try to read this file using default options you will get the output like this:

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True)

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
5
['Id', 'User', 'Comment']
['1', 'Bob', 'John said \\Hello World\\""']
['2', 'Tom', '\\The Magician\\""']
['3', 'Harry', '\\walk around the corner\\" she explained to the child"']
['4', 'Louis', 'He said, \\stop pulling the dog\'s tail\\""']

This output is certainly not desirable. To get the correct output change the escape character using escapechar argument, as follows:

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True, escapechar='\\')

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
5
['Id', 'User', 'Comment']
['1', 'Bob', 'John said "Hello World"']
['2', 'Tom', '"The Magician"']
['3', 'Harry', '"walk around the corner" she explained to the child']
['4', 'Louis', 'He said, "stop pulling the dog\'s tail"']

doublequote argument #


dialogues.csv

1
2
3
4
5
Id, Actor, Dialogue
1, Harley Betts, "The suspect told the arresting officer, ""I was nowhere near the crime."""
2, Clyde Esparza, "John said, ""I have just finished reading Browning's 'My Last Duchess.'"""
3, Zack Campbell, "Bill asked Sandra, ""Will you marry me?"""
4, Keziah Chaney, "The librarian whispered to us, ""The sign on the wall says 'Quiet'"""

This file uses double quote to escape the embedded double quote characters in the field. By default, doublequote is set to True. As a result, while reading two consecutive double quotes are interpreted as one.

1
2
3
4
5
6
7
8
import csv

with open('employees.csv', 'rt') as f:
    # same as csv_reader = csv.reader(f, skipinitialspace=True)
    csv_reader = csv.reader(f, skipinitialspace=True, doublequote=True)

for line in csv_reader:
    print(line)

Expected Output:

1
2
3
4
5
['Id', 'Actor', 'Dialogue']
['1', 'Harley Betts', 'The suspect told the arresting officer, "I was nowhere near the crime."']
['2', 'Clyde Esparza', 'John said, "I have just finished reading Browning\'s \'My Last Duchess.\'"']
['3', 'Zack Campbell', 'Bill asked Sandra, "Will you marry me?"']
['4', 'Keziah Chaney', 'The librarian whispered to us, "The sign on the wall says \'Quiet\'"']

If, however you set doublequote to False, the consecutive double quotes will appear in the output.

1
2
3
4
5
6
7
import csv

with open('employees.csv', 'rt') as f:
    csv_reader = csv.reader(f, skipinitialspace=True, doublequote=False)

    for line in csv_reader:
        print(line)

Expected Output:

1
2
3
4
5
['Id', 'Actor', 'Dialogue']
['1', 'Harley Betts', 'The suspect told the arresting officer, "I was nowhere near the crime."""']
['2', 'Clyde Esparza', 'John said, "I have just finished reading Browning\'s \'My Last Duchess.\'"""']
['3', 'Zack Campbell', 'Bill asked Sandra, "Will you marry me?"""']
['4', 'Keziah Chaney', 'The librarian whispered to us, "The sign on the wall says \'Quiet\'"""']

Writing CSV files with writer() #


To write data to a CSV file we use the writer() function. It accepts the same argument as the reader() function but returns a writer object (i.e _csv.writer):

Syntax: writer(fileobj [, dialect='excel' [, **fmtparam] ]) -> csv_writer

Argument Description
fileobj (required) It refers to the file object
dialect (optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft Excel. We will discuss dialect in detail later in this post.
fmtparam (optional) Formatting parameters, work same as the reader()'s function.

The writer instance provides the following two methods to write data:

Method Description
writerow(row) Writes a single row of data and returns the number of characters written. The row must be a sequence of strings and number.
writerows(rows) Writes multiple rows of data and returns None. The rows must be a sequence.

Here are examples:

Example 1: Using writerow()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f)

    csv_writer.writerow(header) # write header

    for row in rows:
        csv_writer.writerow(row)

Example 2: Using writerows()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f)

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

The output generated by both listing will be the same and it looks like this:

customers.csv

1
2
3
4
5
6
id,name,address,zip
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401
3,Sam,"3952 Little Street, Akron, Ohio",93704
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257

Notice that only the address field is wrapped around double quotes. This is because by default the quoting argument is set to QUOTE_MINIMAL. In other words, fields will be quoted only when quotechar or delimiter appears in the data.

Let's say you want double quotes around all textual data. To achieve this, set quoting argument to QUOTE_NONNUMERIC.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

Expected Output:

customers.csv

1
2
3
4
5
6
"id","name","address","zip"
1,"Hannah","4891 Blackwell Street, Anchorage, Alaska",99503
2,"Walton","4223 Half and Half Drive, Lemoore, California",97401
3,"Sam","3952 Little Street, Akron, Ohio",93704
4,"Chris","3192 Flinderation Road, Arlington Heights, Illinois",62677
5,"Doug","3236 Walkers Ridge Way, Burr Ridge",61257

Now all the names and addresses have double quotes around them.

If you want double quotes around all fields regardless of whether quotechar or delimiter appears in the data or not, set quoting to csv.QUOTE_ALL.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f, quoting=csv.QUOTE_ALL)

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

Expected Output:

1
2
3
4
5
6
"id","name","address","zip"
"1","Hannah","4891 Blackwell Street, Anchorage, Alaska","99503"
"2","Walton","4223 Half and Half Drive, Lemoore, California","97401"
"3","Sam","3952 Little Street, Akron, Ohio","93704"
"4","Chris","3192 Flinderation Road, Arlington Heights, Illinois","62677"
"5","Doug","3236 Walkers Ridge Way, Burr Ridge","61257"

Everything is double-quoted now.

It is important to note that when quoting is on (i.e quoting parameter has a value other than csv.QUOTE_NONE), the csv module uses the quotechar (which defaults to ") to quote field.

The following listing changes the quote character from double quote (") to a single quote (').

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f, quotechar="'")

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

Expected Output:

1
2
3
4
5
6
id,name,address,zip
1,Hannah,'4891 Blackwell Street, Anchorage, Alaska',99503
2,Walton,'4223 Half and Half Drive, Lemoore, California',97401
3,Sam,'3952 Little Street, Akron, Ohio',93704
4,Chris,'3192 Flinderation Road, Arlington Heights, Illinois',62677
5,Doug,'3236 Walkers Ridge Way, Burr Ridge',61257

In this case, the csv module uses the single quote (') instead of (") to quote fields containing quotechar or delimiter.

We can also turn off quoting all-together by setting quoting to csv.QUOTE_NONE. However, if you do that and delimiter character appears in the data then you will get an error like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f, quoting=csv.QUOTE_NONE)

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

Expected Output:

1
2
3
4
Traceback (most recent call last):
...
csv_writer.writerows(rows)
_csv.Error: need to escape, but no escapechar set

The problem is that the address field contains embedded commas (,) and since we have turned off the ability to quote fields, the csv module doesn't know how to escape them properly.

This where the escapechar argument comes into play. It takes a one-character string that will be used to escape the delimiter when the the quoting is turned off (i.e quoting=csv.QUOTE_NONE).

The following listing set the escapechar to backslash (\).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
import csv

header = ['id', 'name', 'address', 'zip']
rows = [
    [1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
    [2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
    [3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
    [4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
    [5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]

with open('customers.csv', 'wt') as f:
    csv_writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')

    csv_writer.writerow(header) # write header

    csv_writer.writerows(rows)

Expected Output:

1
2
3
4
5
6
id,name,address,zip
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257

Notice that the commas (,) in the address field is escaped using the backslash (\) character.

You should now have a good understanding of various formatting arguments and the context in which they are used with the reader() and writer() function. In the next section will see some other ways to read and write data.

Reading a CSV file with DictReader #


DictReader works almost exactly like reader() but instead of retuning a line as a list, it returns a dictionary. Its syntax is as follows:

Syntax:: DictReader(fileobj, fieldnames=None, restkey=None, restval=None, dialect='excel', **fmtparam)

Argument Description
fileobj (required) It refers to the file object.
fieldnames (optional) It refers to the list of keys that will be used in the returned dictionary in order. If omitted, the field names are inferred from the first row of the CSV file.
restkey (optional) If the row has more fields than specified in the fieldnames parameter, then the remaining fields is stored as a sequence keyed by the value of restkey argument.
restval (optional) It provides value to fields which are missing from the input.
dialect (optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft excel. We will discuss dialect in detail later in this post.
fmtparam It refers to formatting arguments and works exactly like reader() and writer().

Let's take some examples:

Example 1:

customers.csv

1
2
3
4
5
6
id,name,address,zip
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
1
2
3
4
5
6
7
import csv

with open('customers.csv', 'rt') as f:
    csv_reader = csv.DictReader(f, escapechar='\\')

    for row in csv_reader:
        print(row)

Expected Output:

1
2
3
4
5
6
{'id': '1', 'name': 'Hannah', 'zip': '99503', 'address': '4891 Blackwell Street, Anchorage, Alaska'}
{'id': '2', 'name': 'Walton', 'zip': '97401', 'address': '4223 Half and Half Drive, Lemoore, California'}
{'id': '3', 'name': 'Sam', 'zip': '93704', 'address': '3952 Little Street, Akron, Ohio'}
{'id': '4', 'name': 'Chris', 'zip': '62677', 'address': '3192 Flinderation Road, Arlington Heights, Illinois'}
{'id': '5', 'name': 'Doug', 'zip': '61257', 'address': '3236 Walkers Ridge Way, Burr Ridge'}
**Note:** Order of keys in the result may vary. Since dictionary doesn't preserve the order of elements.

In this case, the field names are inferred from the first line (or header) of the CSV file.

Example 2: Using fieldnames parameter

1
2
3
4
5
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257

This CSV file has no header. So we have to provide field names via the fieldnames parameter.

1
2
3
4
5
6
7
8
9
import csv

with open('customers.csv', 'rt') as f:
    fields = ['id', 'name', 'address', 'zip']

    csv_reader = csv.DictReader(f, fieldnames=fields, escapechar='\\')

    for row in csv_reader:
        print(row)

Expected Output:

1
2
3
4
5
{'name': 'Hannah', 'zip': '99503', 'id': '1', 'address': '4891 Blackwell Street, Anchorage, Alaska'}
{'name': 'Walton', 'zip': '97401', 'id': '2', 'address': '4223 Half and Half Drive, Lemoore, California'}
{'name': 'Sam', 'zip': '93704', 'id': '3', 'address': '3952 Little Street, Akron, Ohio'}
{'name': 'Chris', 'zip': '62677', 'id': '4', 'address': '3192 Flinderation Road, Arlington Heights, Illinois'}
{'name': 'Doug', 'zip': '61257', 'id': '5', 'address': '3236 Walkers Ridge Way, Burr Ridge'}

Example 3: Using restkey parameter

1
2
3
4
5
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import csv

with open('customers.csv', 'rt') as f:

    fields = ['id','name',]

    csv_reader = csv.DictReader(f, fieldnames=fields, restkey='extra', escapechar='\\')

    for row in csv_reader:
        print(row)

Expected Output:

1
2
3
4
5
{'id': '1', 'name': 'Hannah', 'extra': ['4891 Blackwell Street, Anchorage, Alaska', '99503']}
{'id': '2', 'name': 'Walton', 'extra': ['4223 Half and Half Drive, Lemoore, California', '97401']}
{'id': '3', 'name': 'Sam', 'extra': ['3952 Little Street, Akron, Ohio', '93704']}
{'id': '4', 'name': 'Chris', 'extra': ['3192 Flinderation Road, Arlington Heights, Illinois', '62677']}
{'id': '5', 'name': 'Doug', 'extra': ['3236 Walkers Ridge Way, Burr Ridge', '61257']}

Notice that the address and zip code are now stored as a sequence keyed by value extra.

Example 4: Using restval

1
2
3
4
5
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
import csv

with open('customers.csv', 'rt') as f:

    fields = ['id','name', 'address', 'zip', 'phone', 'email'] # two extra fields

    csv_reader = csv.DictReader(f, fieldnames=fields, restkey='extra', restval='NA', escapechar='\\')

    for row in csv_reader:
        print(row)

Expected Output:

1
2
3
4
5
{'id': '1', 'name': 'Hannah', 'email': 'NA', 'phone': 'NA', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': '99503'}
{'id': '2', 'name': 'Walton', 'email': 'NA', 'phone': 'NA', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': '97401'}
{'id': '3', 'name': 'Sam', 'email': 'NA', 'phone': 'NA', 'address': '3952 Little Street, Akron, Ohio', 'zip': '93704'}
{'id': '4', 'name': 'Chris', 'email': 'NA', 'phone': 'NA', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': '62677'}
{'id': '5', 'name': 'Doug', 'email': 'NA', 'phone': 'NA', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': '61257'}

In this case, we have specified field two extra fields: phone and email. The values for extra fields is provided by the restval argument.

Writing CSV files with DictWriter() #


The DictWriter object writes a dictionary to a CSV file. Its syntax is as follows:

Syntax: DictWriter(fileobj, fieldnames, restval='', extrasaction='raise', dialect='excel', **fmtparam)

Argument Description
fileobj It refers to the file object
fieldnames It refers to the field names and the order in which they will be written the file.
restval It provides the missing value for the keys which doesn't exist in the dictionary.
extrasaction It controls what action to take if the dictionary contains a key, that is not found in the fieldnames argument. By default, extrasaction is set to raise, which means an exception will be raised in such an event. If you want to ignore the extra values set extrasaction to ignore.

The DictWriter provides the following three methods to write data.

Method Description
writeheader() Writes the header (i.e fieldnames) to the CSV file and returns None.
writerow(row) Writes a single row of data and returns the number of characters written. The row must be a sequence of strings and number.
writerows(rows) Writes multiple rows of data and returns None. The rows must be a sequence.

Let's take some examples:

Example 1:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
import csv

header = ['id', 'name', 'address', 'zip']

rows = [
    {'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
    {'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
    {'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
    {'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
    {'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]

with open('dictcustomers.csv', 'wt') as f:

    csv_writer = csv.DictWriter(f, fieldnames=header)

    csv_writer.writeheader() # write header

    csv_writer.writerows(rows)

Expected Output:

dictcustomers.csv

1
2
3
4
5
6
id,name,address,zip
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401
3,Sam,"3952 Little Street, Akron, Ohio",93704
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257

Example 2: Using restval

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
import csv

header = ['id', 'name', 'address', 'zip', 'email'] # an extra field email

rows = [
    {'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
    {'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
    {'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
    {'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
    {'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]

with open('dictcustomers.csv', 'wt') as f:

    csv_writer = csv.DictWriter(f, fieldnames=header, restval="NA")

    csv_writer.writeheader() # write header

    csv_writer.writerows(rows)

Expected Output:

dictcustomers.csv

1
2
3
4
5
6
id,name,address,zip,email
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503,NA
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401,NA
3,Sam,"3952 Little Street, Akron, Ohio",93704,NA
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677,NA
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257,NA

In this case, the value of email field is missing from the dictionaries. As s result, the value of restval will be used for the email field.

Example 3: Using extrasaction

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import csv

header = ['id', 'name', 'address'] # notice zip is missing

rows = [
    {'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
    {'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
    {'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
    {'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
    {'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]

with open('dictcustomers.csv', 'wt') as f:

    csv_writer = csv.DictWriter(
                 f,
                 fieldnames=header,
                 restval="NA",
                 extrasaction='ignore' # ignore extra values in the dictionary
    )

    csv_writer.writeheader() # write header

    csv_writer.writerows(rows)

Here, the dictionary contains an extra key named zip which is not present in the header list. To prevent the exception from being raised we have set extrasaction to ignore.

Expected Output:

dictcustomers.csv

1
2
3
4
5
6
id,name,address
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska"
2,Walton,"4223 Half and Half Drive, Lemoore, California"
3,Sam,"3952 Little Street, Akron, Ohio"
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois"
5,Doug,"3236 Walkers Ridge Way, Burr Ridge"

Creating Dialect #


Earlier in this post, we have learned various formatting parameters that allow us to customize the reader and writer object to accommodate for differences in the CSV conventions.

If you find yourself passing the same set of formatting parameters over and over again. Consider creating your own Dialect.

A dialect object or (simply dialect) is a way to group various formatting parameters. Once you have created the dialect object, simply pass it to the reader or writer, rather than passing each formatting argument separately.

To create a new dialect, we use register_dialect() function. It accepts dialect name as a string and one or more formatting parameters as keyword arguments.

The following table lists all the formatting arguments along with their default values:

Argument Default Description
delimiter , It refers to the character used to separate values (or fields) in the CSV file.
skipinitialspace False It controls how the space following the delimiter will be interpreted. If True, the initial whitespaces will be removed.
lineterminator \r\n It refers to the character sequence used to terminate the line.
quotechar " It refers to the single character string that will be used to quote values if special characters (like delimiter) appears inside the field.
quoting csv.QUOTE_NONE controls when quotes should be generated by the writer or recognized by the reader (see above for other options).
escapechar None It refers to the one-character string used to escape the delimiter when quoting is set to.
doublequote True controls the handling of quotes inside fields. When True, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes.

Let's create a simple dialect.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import csv

# create and register new dialect
csv.register_dialect('psv', delimiter='|', quoting=csv.QUOTE_NONNUMERIC)

header = ['id', 'year', 'age', 'name', 'movie']

rows = [
    {'id': 1, 'year': 2013, 'age': 55, 'name': "Daniel Day-Lewis", 'movie': "Lincoln" },
    {'id': 2, 'year': 2014, 'age': 44, 'name': "Matthew McConaughey", 'movie': "Dallas Buyers Club" },
    {'id': 3, 'year': 2015, 'age': 33, 'name': "Eddie Redmayne", 'movie': "The Theory of Everything" },
    {'id': 4, 'year': 2016, 'age': 41, 'name': "Leonardo DiCaprio", 'movie': "The Revenant" }
]

with open('oscars.csv', 'wt') as f:

    csv_writer = csv.DictWriter(
                 f,
                 fieldnames=header,
                 dialect='psv', # pass the new dialect
                 extrasaction='ignore'
    )

    csv_writer.writeheader() # write header

    csv_writer.writerows(rows)

Expected Output:

oscars.csv

1
2
3
4
5
"id"|"year"|"age"|"name"|"movie"
1|2013|55|"Daniel Day-Lewis"|"Lincoln"
2|2014|44|"Matthew McConaughey"|"Dallas Buyers Club"
3|2015|33|"Eddie Redmayne"|"The Theory of Everything"
4|2016|41|"Leonardo DiCaprio"|"The Revenant"

Other Tutorials (Sponsors)