(Sponsors) Get started learning Python with DataCamp's free Intro to Python tutorial. Learn Data Science by completing interactive coding challenges and watching videos by expert instructors. Start Now!
Python: How to read and write CSV files
Updated on Jan 07, 2020
What is CSV File? #
CSV (Comma-separated values) is a common data exchange format used by the applications to produce and consume data. Some other well-known data exchange formats are XML, HTML, JSON etc.
A CSV file is a simple text file where each line contains a list of values (or fields) delimited by commas.
Although the term "Comma" appears in the format name itself, but you will encounter CSV files where data is delimited using tab (\t
) or pipe (|
) or any other character that can be used as a delimiter.
The first line of the CSV file represents the header containing a list of column names in the file. The header is optional but highly recommended.
The CSV file is commonly used to represent tabular data. For example, consider the following table:
ID | Name | CountryCode | District | Population |
---|---|---|---|---|
1 | Kabul | AFG | Kabol | 1780000 |
2 | Qandahar | AFG | Qandahar | 237500 |
3 | Herat | AFG | Herat | 186800 |
4 | Mazar-e-Sharif | AFG | Balkh | 127800 |
5 | Amsterdam | NLD | Noord-Holland | 731200 |
The above table can be represented using CSV format as follows:
1 2 3 4 5 6 | "ID","Name","CountryCode","District","Population"
"1","Kabul","AFG","Kabol","1780000"
"2","Qandahar","AFG","Qandahar","237500"
"3","Herat","AFG","Herat","186800"
"4","Mazar-e-Sharif","AFG","Balkh","127800"
"5","Amsterdam","NLD","Noord-Holland","731200"
|
If the values in your CSV file contains commas, then it must be enclosed inside double quotes. For example:
Name | Age | Address |
---|---|---|
Jerry | 10 | 2776 McDowell Street, Nashville, Tennessee |
Tom | 20 | 3171 Jessie Street, Westerville, Ohio |
Mike | 30 | 1818 Sherman Street, Hope, Kansas |
To retain the commas inside the Address field enclose it in double quotation marks, as follows:
1 2 3 4 | Name,Age,Address
Jerry,10,"2776 McDowell Street, Nashville, Tennessee"
Tom,20,"3171 Jessie Street, Westerville, Ohio"
Mike,30,"1818 Sherman Street, Hope, Kansas"
|
Similarly, if you have double quotes embedded inside a field, it must be escaped with another double quote character. Otherwise, they will not be interpreted correctly. For example:
Id | User | Comment |
---|---|---|
1 | Bob | John said "Hello World" |
2 | Tom | "The Magician" |
To retain the retain the double quotes inside the Comment field use two double quotes.
1 2 3 | Id,User,Comment
1,Bob,"John said ""Hello World"""
2,Tom,""The Magician""
|
It is important to note that the CSV format is not fully standardized. So the rules we have just mentioned are not universal. Once in a while, you will encounter CSV files that have a different way of representing fields.
Fortunately, to make things easier for us Python provides the csv
module.
Before we start reading and writing CSV files, you should have a good understanding of how to work with files in general. If you need a refresher, consider reading how to read and write file in Python.
The csv
module is used for reading and writing files. It mainly provides following classes and functions:
reader()
writer()
DictReader()
DictWriter()
Let's start with the reader()
function.
Reading a CSV File with reader() #
The reader()
function takes a file object and returns a _csv.reader
object that can be used to iterate over the contents of a CSV file. The syntax of reader()
function is as follows:
Syntax: reader(fileobj [, dialect='excel' [, **fmtparam] ]) -> _csv.reader
Argument | Description |
---|---|
fileobj |
(required) It refers to the file object |
dialect |
(optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft Excel. We will discuss dialect in detail later in this post. |
fmtparam |
(optional) It refers to the set of keyword arguments to customize the dialect (see the next section). |
Let's say we have following CSV file:
employees.csv
1 2 3 4 | id,name,email,age,designation
1,John,john@mail.com,24,programmer
2,Bob,bob@mail.com,34,designer
3,Mary,mary@mail.com,43,sales
|
Here is how to read this CSV file:
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 | ['id', 'name', 'email', 'age', 'designation']
['1', 'John', 'john@mail.com', '24', 'programmer']
['2', 'Bob', 'bob@mail.com', '34', 'designer']
['3', 'Mary', 'mary@mail.com', '43', 'sales']
|
Notice that each line in the CSV file is returned as a list of strings.
To get the data from certain fields, you can use indexing. For example:
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f)
for line in csv_reader:
print(line[0], line[1], line[2])
|
Expected Output:
1 2 3 4 | id name email
1 John john@mail.com
2 Bob bob@mail.com
3 Mary mary@mail.com
|
If you want to skip heading call the next()
built-in function on the _csv.reader
object and then loop over the remaining lines as usual.
1 2 3 4 5 6 7 8 9 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f)
next(csv_reader) # skip the heading
for line in csv_reader:
print(line[0], line[1], line[2])
|
Expected Output:
1 2 3 | 1 John john@mail.com
2 Bob bob@mail.com
3 Mary mary@mail.com
|
Customizing the reader() #
By default, the csv
module works according to the format used by Microsoft excel, but you can also define your own format using something called Dialect.
The following are some additional arguments that you can pass to the reader()
function to customize its working.
delimiter
- It refers to the character used to separate values (or fields) in the CSV file. It defaults to comma (,
).skipinitialspace
- It controls how the space following the delimiter will be interpreted. If True , the initial whitespaces will be removed. It defaults toFalse
.lineterminator
- It refers to the character sequence used to terminate the line. It defaults to\r\n
.quotechar
- It refers to the single character string that will be used to quote values if special characters (like delimiter) appears inside the field. It defaults to"
.quoting
- controls when quotes should be generated by the writer or recognized by the reader. It can take one of the following constants:csv.QUOTE_MINIMAL
means add quote only when required, for example, when a field contains either the quotechar or the delimiter. This is the default.csv.QUOTE_ALL
means quotes everything regardless of the field type.csv.QUOTE_NONNUMERIC
means quotes everything except integers and floats.csv.QUOTE_NONE
means that do not quote anything on output. However, while reading quotes are included around the field values.
escapechar
- It refers to the one-character string used to escape the delimiter when quoting is set toQUOTE_NONE
. It defaults toNone
.doublequote
- controls the handling of quotes inside fields. WhenTrue
, two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes. Let's workthrough some examples to better understand how these arguments work:
delimiter argument #
employees_pipe.csv
1 2 3 4 | id|name|email|age|designation
1|John|john@mail.com|24|programmer
2|Bob|bob@mail.com|34|designer
3|Mary|mary@mail.com|43|sales
|
This file uses pipe (|
) character as a delimiter. Here is how to read this CSV file:
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f, delimiter='|')
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 | ['id', 'name', 'email', 'age', 'designation']
['1', 'John', 'john@mail.com', '24', 'programmer']
['2', 'Bob', 'bob@mail.com', '34', 'designer']
['3', 'Mary', 'mary@mail.com', '43', 'sales']
|
skipinitialspace argument #
baseball_players.csv
1 2 3 4 5 6 7 8 9 | "Name", "Team", "Position", "Height(inches)", "Weight(lbs)", "Age"
"Adam Donachie", "BAL", "Catcher", 74, 180, 22.99
"Paul Bako", "BAL", "Catcher", 74, 215, 34.69
"Ramon Hernandez", "BAL", "Catcher", 72, 210, 30.78
"Kevin Millar", "BAL", "First Baseman", 72, 210, 35.43
"Chris Gomez", "BAL", "First Baseman", 73, 188, 35.71
"Brian Roberts", "BAL", "Second Baseman", 69, 176, 29.39
"Miguel Tejada", "BAL", "Shortstop", 69, 209, 30.77
"Melvin Mora", "BAL", "Third Baseman", 71, 200, 35.07
|
This CSV file contains spaces following the comma (,
). To read this CSV file correctly, set skipinitialspace
to True
, as follows:
1 2 3 4 5 6 7 | import csv
with open('baseball_players.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 5 6 7 8 9 | ['Name', 'Team', 'Position', 'Height(inches)', 'Weight(lbs)', 'Age']
['Adam Donachie', 'BAL', 'Catcher', '74', '180', '22.99']
['Paul Bako', 'BAL', 'Catcher', '74', '215', '34.69']
['Ramon Hernandez', 'BAL', 'Catcher', '72', '210', '30.78']
['Kevin Millar', 'BAL', 'First Baseman', '72', '210', '35.43']
['Chris Gomez', 'BAL', 'First Baseman', '73', '188', '35.71']
['Brian Roberts', 'BAL', 'Second Baseman', '69', '176', '29.39']
['Miguel Tejada', 'BAL', 'Shortstop', '69', '209', '30.77']
['Melvin Mora', 'BAL', 'Third Baseman', '71', '200', '35.07']
|
quotechar argument #
addresses.csv
1 2 3 4 | Name, Age, Address
Jerry, 44, '2776 McDowell Street, Nashville, Tennessee'
Tom, 21, '3171 Jessie Street, Westerville, Ohio'
Mike, 32, '1818 Sherman Street, Hope, Kansas'
|
There are two things to notice in this file. First, the address field is wrapped using single quote ('
) instead of "
double quote (which is the default). Second, there are spaces following the comma (,
).
If you try to read this file without changing the quote character, you will get the output as follows:
1 2 3 4 5 6 7 | import csv
with open('addresses.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 | ['Name', 'Age', 'Address']
['Jerry', '44', "'2776 McDowell Street", 'Nashville', "Tennessee'"]
['Tom', '21', "'3171 Jessie Street", 'Westerville', "Ohio'"]
['Mike', '32', "'1818 Sherman Street", 'Hope', "Kansas'"]
|
Notice that the address is split into three fields, which is certainly not correct. To fix the issue simply change quote character to a single quote ('
) using the quotechar
argument:
1 2 3 4 5 6 7 | import csv
with open('housing.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True, quotechar="'")
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 | ['Name', 'Age', 'Address']
['Jerry', '44', '2776 McDowell Street, Nashville, Tennessee']
['Tom', '21', '3171 Jessie Street, Westerville, Ohio']
['Mike', '32', '1818 Sherman Street, Hope, Kansas']
|
escapechar argument #
comments.csv
1 2 3 4 5 | Id, User, Comment
1, Bob, "John said \"Hello World\""
2, Tom, "\"The Magician\""
3, Harry, "\"walk around the corner\" she explained to the child"
4, Louis, "He said, \"stop pulling the dog's tail\""
|
This file uses backslash (\
) character to escape the embedded double quotes. However, by default the default csv
module uses a double quote character to escape double quote character.
If try to read this file using default options you will get the output like this:
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 5 | ['Id', 'User', 'Comment']
['1', 'Bob', 'John said \\Hello World\\""']
['2', 'Tom', '\\The Magician\\""']
['3', 'Harry', '\\walk around the corner\\" she explained to the child"']
['4', 'Louis', 'He said, \\stop pulling the dog\'s tail\\""']
|
This output is certainly not desirable. To get the correct output change the escape character using escapechar
argument, as follows:
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True, escapechar='\\')
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 5 | ['Id', 'User', 'Comment']
['1', 'Bob', 'John said "Hello World"']
['2', 'Tom', '"The Magician"']
['3', 'Harry', '"walk around the corner" she explained to the child']
['4', 'Louis', 'He said, "stop pulling the dog\'s tail"']
|
doublequote argument #
dialogues.csv
1 2 3 4 5 | Id, Actor, Dialogue
1, Harley Betts, "The suspect told the arresting officer, ""I was nowhere near the crime."""
2, Clyde Esparza, "John said, ""I have just finished reading Browning's 'My Last Duchess.'"""
3, Zack Campbell, "Bill asked Sandra, ""Will you marry me?"""
4, Keziah Chaney, "The librarian whispered to us, ""The sign on the wall says 'Quiet'"""
|
This file uses double quote to escape the embedded double quote characters in the field. By default, doublequote
is set to True
. As a result, while reading two consecutive double quotes are interpreted as one.
1 2 3 4 5 6 7 8 | import csv
with open('employees.csv', 'rt') as f:
# same as csv_reader = csv.reader(f, skipinitialspace=True)
csv_reader = csv.reader(f, skipinitialspace=True, doublequote=True)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 5 | ['Id', 'Actor', 'Dialogue']
['1', 'Harley Betts', 'The suspect told the arresting officer, "I was nowhere near the crime."']
['2', 'Clyde Esparza', 'John said, "I have just finished reading Browning\'s \'My Last Duchess.\'"']
['3', 'Zack Campbell', 'Bill asked Sandra, "Will you marry me?"']
['4', 'Keziah Chaney', 'The librarian whispered to us, "The sign on the wall says \'Quiet\'"']
|
If, however you set doublequote
to False
, the consecutive double quotes will appear in the output.
1 2 3 4 5 6 7 | import csv
with open('employees.csv', 'rt') as f:
csv_reader = csv.reader(f, skipinitialspace=True, doublequote=False)
for line in csv_reader:
print(line)
|
Expected Output:
1 2 3 4 5 | ['Id', 'Actor', 'Dialogue']
['1', 'Harley Betts', 'The suspect told the arresting officer, "I was nowhere near the crime."""']
['2', 'Clyde Esparza', 'John said, "I have just finished reading Browning\'s \'My Last Duchess.\'"""']
['3', 'Zack Campbell', 'Bill asked Sandra, "Will you marry me?"""']
['4', 'Keziah Chaney', 'The librarian whispered to us, "The sign on the wall says \'Quiet\'"""']
|
Writing CSV files with writer() #
To write data to a CSV file we use the writer()
function. It accepts the same argument as the reader()
function but returns a writer object (i.e _csv.writer
):
Syntax: writer(fileobj [, dialect='excel' [, **fmtparam] ]) -> csv_writer
Argument | Description |
---|---|
fileobj |
(required) It refers to the file object |
dialect |
(optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft Excel. We will discuss dialect in detail later in this post. |
fmtparam |
(optional) Formatting parameters, work same as the reader() 's function. |
The writer instance provides the following two methods to write data:
Method | Description |
---|---|
writerow(row) |
Writes a single row of data and returns the number of characters written. The row must be a sequence of strings and number. |
writerows(rows) |
Writes multiple rows of data and returns None . The rows must be a sequence. |
Here are examples:
Example 1: Using writerow()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(header) # write header
for row in rows:
csv_writer.writerow(row)
|
Example 2: Using writerows()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f)
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
The output generated by both listing will be the same and it looks like this:
customers.csv
1 2 3 4 5 6 | id,name,address,zip
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401
3,Sam,"3952 Little Street, Akron, Ohio",93704
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257
|
Notice that only the address field is wrapped around double quotes. This is because by default the quoting
argument is set to QUOTE_MINIMAL
. In other words, fields will be quoted only when quotechar or delimiter appears in the data.
Let's say you want double quotes around all textual data. To achieve this, set quoting
argument to QUOTE_NONNUMERIC
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
Expected Output:
customers.csv
1 2 3 4 5 6 | "id","name","address","zip"
1,"Hannah","4891 Blackwell Street, Anchorage, Alaska",99503
2,"Walton","4223 Half and Half Drive, Lemoore, California",97401
3,"Sam","3952 Little Street, Akron, Ohio",93704
4,"Chris","3192 Flinderation Road, Arlington Heights, Illinois",62677
5,"Doug","3236 Walkers Ridge Way, Burr Ridge",61257
|
Now all the names and addresses have double quotes around them.
If you want double quotes around all fields regardless of whether quotechar or delimiter appears in the data or not, set quoting
to csv.QUOTE_ALL
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f, quoting=csv.QUOTE_ALL)
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
Expected Output:
1 2 3 4 5 6 | "id","name","address","zip"
"1","Hannah","4891 Blackwell Street, Anchorage, Alaska","99503"
"2","Walton","4223 Half and Half Drive, Lemoore, California","97401"
"3","Sam","3952 Little Street, Akron, Ohio","93704"
"4","Chris","3192 Flinderation Road, Arlington Heights, Illinois","62677"
"5","Doug","3236 Walkers Ridge Way, Burr Ridge","61257"
|
Everything is double-quoted now.
It is important to note that when quoting is on (i.e quoting
parameter has a value other than csv.QUOTE_NONE
), the csv
module uses the quotechar
(which defaults to "
) to quote field.
The following listing changes the quote character from double quote ("
) to a single quote ('
).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f, quotechar="'")
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
Expected Output:
1 2 3 4 5 6 | id,name,address,zip
1,Hannah,'4891 Blackwell Street, Anchorage, Alaska',99503
2,Walton,'4223 Half and Half Drive, Lemoore, California',97401
3,Sam,'3952 Little Street, Akron, Ohio',93704
4,Chris,'3192 Flinderation Road, Arlington Heights, Illinois',62677
5,Doug,'3236 Walkers Ridge Way, Burr Ridge',61257
|
In this case, the csv
module uses the single quote ('
) instead of ("
) to quote fields containing quotechar or delimiter.
We can also turn off quoting all-together by setting quoting
to csv.QUOTE_NONE
. However, if you do that and delimiter character appears in the data then you will get an error like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f, quoting=csv.QUOTE_NONE)
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
Expected Output:
1 2 3 4 | Traceback (most recent call last):
...
csv_writer.writerows(rows)
_csv.Error: need to escape, but no escapechar set
|
The problem is that the address field contains embedded commas (,
) and since we have turned off the ability to quote fields, the csv
module doesn't know how to escape them properly.
This where the escapechar
argument comes into play. It takes a one-character string that will be used to escape the delimiter when the the quoting is turned off (i.e quoting=csv.QUOTE_NONE
).
The following listing set the escapechar
to backslash (\
).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
[1, 'Hannah', '4891 Blackwell Street, Anchorage, Alaska', 99503 ],
[2, 'Walton', '4223 Half and Half Drive, Lemoore, California', 97401 ],
[3, 'Sam', '3952 Little Street, Akron, Ohio', 93704],
[4, 'Chris', '3192 Flinderation Road, Arlington Heights, Illinois', 62677],
[5, 'Doug', '3236 Walkers Ridge Way, Burr Ridge', 61257],
]
with open('customers.csv', 'wt') as f:
csv_writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
csv_writer.writerow(header) # write header
csv_writer.writerows(rows)
|
Expected Output:
1 2 3 4 5 6 | id,name,address,zip
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
|
Notice that the commas (,
) in the address field is escaped using the backslash (\
) character.
You should now have a good understanding of various formatting arguments and the context in which they are used with the reader()
and writer()
function. In the next section will see some other ways to read and write data.
Reading a CSV file with DictReader #
DictReader
works almost exactly like reader()
but instead of retuning a line as a list, it returns a dictionary. Its syntax is as follows:
Syntax:: DictReader(fileobj, fieldnames=None, restkey=None, restval=None, dialect='excel', **fmtparam)
Argument | Description |
---|---|
fileobj |
(required) It refers to the file object. |
fieldnames |
(optional) It refers to the list of keys that will be used in the returned dictionary in order. If omitted, the field names are inferred from the first row of the CSV file. |
restkey |
(optional) If the row has more fields than specified in the fieldnames parameter, then the remaining fields is stored as a sequence keyed by the value of restkey argument. |
restval |
(optional) It provides value to fields which are missing from the input. |
dialect |
(optional) Dialect refers to the different ways of formatting the CSV document. By default, the csv module uses the same format as Microsoft excel. We will discuss dialect in detail later in this post. |
fmtparam |
It refers to formatting arguments and works exactly like reader() and writer() . |
Let's take some examples:
Example 1:
customers.csv
1 2 3 4 5 6 | id,name,address,zip
1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
|
1 2 3 4 5 6 7 | import csv
with open('customers.csv', 'rt') as f:
csv_reader = csv.DictReader(f, escapechar='\\')
for row in csv_reader:
print(row)
|
Expected Output:
1 2 3 4 5 6 | {'id': '1', 'name': 'Hannah', 'zip': '99503', 'address': '4891 Blackwell Street, Anchorage, Alaska'}
{'id': '2', 'name': 'Walton', 'zip': '97401', 'address': '4223 Half and Half Drive, Lemoore, California'}
{'id': '3', 'name': 'Sam', 'zip': '93704', 'address': '3952 Little Street, Akron, Ohio'}
{'id': '4', 'name': 'Chris', 'zip': '62677', 'address': '3192 Flinderation Road, Arlington Heights, Illinois'}
{'id': '5', 'name': 'Doug', 'zip': '61257', 'address': '3236 Walkers Ridge Way, Burr Ridge'}
**Note:** Order of keys in the result may vary. Since dictionary doesn't preserve the order of elements.
|
In this case, the field names are inferred from the first line (or header) of the CSV file.
Example 2: Using fieldnames
parameter
1 2 3 4 5 | 1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
|
This CSV file has no header. So we have to provide field names via the fieldnames
parameter.
1 2 3 4 5 6 7 8 9 | import csv
with open('customers.csv', 'rt') as f:
fields = ['id', 'name', 'address', 'zip']
csv_reader = csv.DictReader(f, fieldnames=fields, escapechar='\\')
for row in csv_reader:
print(row)
|
Expected Output:
1 2 3 4 5 | {'name': 'Hannah', 'zip': '99503', 'id': '1', 'address': '4891 Blackwell Street, Anchorage, Alaska'}
{'name': 'Walton', 'zip': '97401', 'id': '2', 'address': '4223 Half and Half Drive, Lemoore, California'}
{'name': 'Sam', 'zip': '93704', 'id': '3', 'address': '3952 Little Street, Akron, Ohio'}
{'name': 'Chris', 'zip': '62677', 'id': '4', 'address': '3192 Flinderation Road, Arlington Heights, Illinois'}
{'name': 'Doug', 'zip': '61257', 'id': '5', 'address': '3236 Walkers Ridge Way, Burr Ridge'}
|
Example 3: Using restkey
parameter
1 2 3 4 5 | 1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
|
1 2 3 4 5 6 7 8 9 10 | import csv
with open('customers.csv', 'rt') as f:
fields = ['id','name',]
csv_reader = csv.DictReader(f, fieldnames=fields, restkey='extra', escapechar='\\')
for row in csv_reader:
print(row)
|
Expected Output:
1 2 3 4 5 | {'id': '1', 'name': 'Hannah', 'extra': ['4891 Blackwell Street, Anchorage, Alaska', '99503']}
{'id': '2', 'name': 'Walton', 'extra': ['4223 Half and Half Drive, Lemoore, California', '97401']}
{'id': '3', 'name': 'Sam', 'extra': ['3952 Little Street, Akron, Ohio', '93704']}
{'id': '4', 'name': 'Chris', 'extra': ['3192 Flinderation Road, Arlington Heights, Illinois', '62677']}
{'id': '5', 'name': 'Doug', 'extra': ['3236 Walkers Ridge Way, Burr Ridge', '61257']}
|
Notice that the address and zip code are now stored as a sequence keyed by value extra
.
Example 4: Using restval
1 2 3 4 5 | 1,Hannah,4891 Blackwell Street\, Anchorage\, Alaska,99503
2,Walton,4223 Half and Half Drive\, Lemoore\, California,97401
3,Sam,3952 Little Street\, Akron\, Ohio,93704
4,Chris,3192 Flinderation Road\, Arlington Heights\, Illinois,62677
5,Doug,3236 Walkers Ridge Way\, Burr Ridge,61257
|
1 2 3 4 5 6 7 8 9 10 | import csv
with open('customers.csv', 'rt') as f:
fields = ['id','name', 'address', 'zip', 'phone', 'email'] # two extra fields
csv_reader = csv.DictReader(f, fieldnames=fields, restkey='extra', restval='NA', escapechar='\\')
for row in csv_reader:
print(row)
|
Expected Output:
1 2 3 4 5 | {'id': '1', 'name': 'Hannah', 'email': 'NA', 'phone': 'NA', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': '99503'}
{'id': '2', 'name': 'Walton', 'email': 'NA', 'phone': 'NA', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': '97401'}
{'id': '3', 'name': 'Sam', 'email': 'NA', 'phone': 'NA', 'address': '3952 Little Street, Akron, Ohio', 'zip': '93704'}
{'id': '4', 'name': 'Chris', 'email': 'NA', 'phone': 'NA', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': '62677'}
{'id': '5', 'name': 'Doug', 'email': 'NA', 'phone': 'NA', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': '61257'}
|
In this case, we have specified field two extra fields: phone
and email
. The values for extra fields is provided by the restval
argument.
Writing CSV files with DictWriter() #
The DictWriter
object writes a dictionary to a CSV file. Its syntax is as follows:
Syntax: DictWriter(fileobj, fieldnames, restval='', extrasaction='raise', dialect='excel', **fmtparam)
Argument | Description |
---|---|
fileobj |
It refers to the file object |
fieldnames |
It refers to the field names and the order in which they will be written the file. |
restval |
It provides the missing value for the keys which doesn't exist in the dictionary. |
extrasaction |
It controls what action to take if the dictionary contains a key, that is not found in the fieldnames argument. By default, extrasaction is set to raise , which means an exception will be raised in such an event. If you want to ignore the extra values set extrasaction to ignore . |
The DictWriter
provides the following three methods to write data.
Method | Description |
---|---|
writeheader() |
Writes the header (i.e fieldnames ) to the CSV file and returns None . |
writerow(row) |
Writes a single row of data and returns the number of characters written. The row must be a sequence of strings and number. |
writerows(rows) |
Writes multiple rows of data and returns None . The rows must be a sequence. |
Let's take some examples:
Example 1:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | import csv
header = ['id', 'name', 'address', 'zip']
rows = [
{'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
{'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
{'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
{'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
{'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]
with open('dictcustomers.csv', 'wt') as f:
csv_writer = csv.DictWriter(f, fieldnames=header)
csv_writer.writeheader() # write header
csv_writer.writerows(rows)
|
Expected Output:
dictcustomers.csv
1 2 3 4 5 6 | id,name,address,zip
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401
3,Sam,"3952 Little Street, Akron, Ohio",93704
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257
|
Example 2: Using restval
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | import csv
header = ['id', 'name', 'address', 'zip', 'email'] # an extra field email
rows = [
{'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
{'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
{'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
{'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
{'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]
with open('dictcustomers.csv', 'wt') as f:
csv_writer = csv.DictWriter(f, fieldnames=header, restval="NA")
csv_writer.writeheader() # write header
csv_writer.writerows(rows)
|
Expected Output:
dictcustomers.csv
1 2 3 4 5 6 | id,name,address,zip,email
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska",99503,NA
2,Walton,"4223 Half and Half Drive, Lemoore, California",97401,NA
3,Sam,"3952 Little Street, Akron, Ohio",93704,NA
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois",62677,NA
5,Doug,"3236 Walkers Ridge Way, Burr Ridge",61257,NA
|
In this case, the value of email
field is missing from the dictionaries. As s result, the value of restval
will be used for the email
field.
Example 3: Using extrasaction
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | import csv
header = ['id', 'name', 'address'] # notice zip is missing
rows = [
{'id': 1, 'name': 'Hannah', 'address': '4891 Blackwell Street, Anchorage, Alaska', 'zip': 99503 },
{'id': 2, 'name': 'Walton', 'address': '4223 Half and Half Drive, Lemoore, California', 'zip': 97401 },
{'id': 3, 'name': 'Sam', 'address': '3952 Little Street, Akron, Ohio', 'zip': 93704 },
{'id': 4, 'name': 'Chris', 'address': '3192 Flinderation Road, Arlington Heights, Illinois', 'zip': 62677},
{'id': 5, 'name': 'Doug', 'address': '3236 Walkers Ridge Way, Burr Ridge', 'zip': 61257},
]
with open('dictcustomers.csv', 'wt') as f:
csv_writer = csv.DictWriter(
f,
fieldnames=header,
restval="NA",
extrasaction='ignore' # ignore extra values in the dictionary
)
csv_writer.writeheader() # write header
csv_writer.writerows(rows)
|
Here, the dictionary contains an extra key named zip
which is not present in the header
list. To prevent the exception from being raised we have set extrasaction
to ignore
.
Expected Output:
dictcustomers.csv
1 2 3 4 5 6 | id,name,address
1,Hannah,"4891 Blackwell Street, Anchorage, Alaska"
2,Walton,"4223 Half and Half Drive, Lemoore, California"
3,Sam,"3952 Little Street, Akron, Ohio"
4,Chris,"3192 Flinderation Road, Arlington Heights, Illinois"
5,Doug,"3236 Walkers Ridge Way, Burr Ridge"
|
Creating Dialect #
Earlier in this post, we have learned various formatting parameters that allow us to customize the reader and writer object to accommodate for differences in the CSV conventions.
If you find yourself passing the same set of formatting parameters over and over again. Consider creating your own Dialect.
A dialect object or (simply dialect) is a way to group various formatting parameters. Once you have created the dialect object, simply pass it to the reader or writer, rather than passing each formatting argument separately.
To create a new dialect, we use register_dialect()
function. It accepts dialect name as a string and one or more formatting parameters as keyword arguments.
The following table lists all the formatting arguments along with their default values:
Argument | Default | Description |
---|---|---|
delimiter |
, |
It refers to the character used to separate values (or fields) in the CSV file. |
skipinitialspace |
False |
It controls how the space following the delimiter will be interpreted. If True , the initial whitespaces will be removed. |
lineterminator |
\r\n |
It refers to the character sequence used to terminate the line. |
quotechar |
" |
It refers to the single character string that will be used to quote values if special characters (like delimiter) appears inside the field. |
quoting |
csv.QUOTE_NONE |
controls when quotes should be generated by the writer or recognized by the reader (see above for other options). |
escapechar |
None |
It refers to the one-character string used to escape the delimiter when quoting is set to. |
doublequote |
True |
controls the handling of quotes inside fields. When True , two consecutive quotes are interpreted as one during read, and when writing, each quote character embedded in the data is written as two quotes. |
Let's create a simple dialect.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | import csv
# create and register new dialect
csv.register_dialect('psv', delimiter='|', quoting=csv.QUOTE_NONNUMERIC)
header = ['id', 'year', 'age', 'name', 'movie']
rows = [
{'id': 1, 'year': 2013, 'age': 55, 'name': "Daniel Day-Lewis", 'movie': "Lincoln" },
{'id': 2, 'year': 2014, 'age': 44, 'name': "Matthew McConaughey", 'movie': "Dallas Buyers Club" },
{'id': 3, 'year': 2015, 'age': 33, 'name': "Eddie Redmayne", 'movie': "The Theory of Everything" },
{'id': 4, 'year': 2016, 'age': 41, 'name': "Leonardo DiCaprio", 'movie': "The Revenant" }
]
with open('oscars.csv', 'wt') as f:
csv_writer = csv.DictWriter(
f,
fieldnames=header,
dialect='psv', # pass the new dialect
extrasaction='ignore'
)
csv_writer.writeheader() # write header
csv_writer.writerows(rows)
|
Expected Output:
oscars.csv
1 2 3 4 5 | "id"|"year"|"age"|"name"|"movie"
1|2013|55|"Daniel Day-Lewis"|"Lincoln"
2|2014|44|"Matthew McConaughey"|"Dallas Buyers Club"
3|2015|33|"Eddie Redmayne"|"The Theory of Everything"
4|2016|41|"Leonardo DiCaprio"|"The Revenant"
|
Other Tutorials (Sponsors)
This site generously supported by DataCamp. DataCamp offers online interactive Python Tutorials for Data Science. Join over a million other learners and get started learning Python for data science today!
View Comments