Python: How to read and write files

Updated on Jan 07, 2020

In this post, we will learn how to read and write files in Python.

Working with files consists of the following three steps:

Open a file
Perform read or write operation
Close the file

Let's look at each step in detail.

Types of files #

There are two types of files:

Text files
Binary files

A text file is simply a file that stores a sequence of characters using an encoding such as utf-8 or latin1, whereas a binary file stores data as raw bytes, in the same format as it is held in computer memory.

Here are some examples of text and binary files:

Text files: Python source code, HTML file, text file, markdown file etc.

Binary files: executable files, images, audio etc.

It is important to note that on disk both types of files are stored as a sequence of 1s and 0s. The only difference is that when a text file is opened, the data is decoded back into characters using the encoding scheme it was encoded with. In the case of binary files, no such decoding happens.
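The difference can be seen by reading the same file in both ways. The following is a minimal sketch; the helper name read_both_ways() and the temporary file are assumptions made for illustration only.

```python
import os
import tempfile


def read_both_ways(text, encoding="utf-8"):
    """Write text to a temporary file, then read it back in text and binary mode.

    Hypothetical helper, used only to illustrate the difference between modes.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            as_text = f.read()   # bytes on disk are decoded into a str
        with open(path, "rb") as f:
            as_bytes = f.read()  # raw bytes, no decoding takes place
        return as_text, as_bytes
    finally:
        os.remove(path)


# "caf\u00e9" is the word "cafe" with an accented e, which needs 2 bytes in utf-8.
as_text, as_bytes = read_both_ways("caf\u00e9")
print(len(as_text), len(as_bytes))  # 4 characters, 5 bytes
```

The text read gives back four characters, while the binary read gives back the five bytes that are actually stored on disk.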

Opening the file - open() function #

The open() built-in function is used to open the file. Its syntax is as follows:

open(filename, mode) -> file object

On success, open() returns a file object. On failure, it raises OSError (IOError is an alias for it in Python 3) or one of its subclasses.

Argument	Description
`filename`	Absolute or relative path of the file to be opened.
`mode`	(optional) A string that specifies the processing mode (i.e. read, write, append etc.) and the file type.

The following are the possible values of mode.

Mode	Description
`r`	Open the file for reading (default).
`w`	Open the file for writing.
`a`	Open the file in append mode i.e add new data to the end of the file.
`r+`	Open an existing file for both reading and writing.
`x`	Open the file for writing, only if it doesn't already exist.

We can also append t or b to the mode string to indicate the type of the file we will be working with. The t is used for text files and b for binary files. If neither is specified, t is assumed by default.

The mode is optional. If it is not specified, the file will be opened as a text file for reading only.

This means that the following three calls to open() are equivalent:

# open file todo.md for reading in text mode

open('todo.md') 

open('todo.md', 'r')

open('todo.md', 'rt')

Note that before you can read a file, it must already exist; otherwise open() raises a FileNotFoundError exception. The same applies to r+ mode, which does not create the file. However, if you open a file using the w, a or x mode, Python will automatically create the file for you. If the file already exists, w mode deletes its content, whereas a mode keeps the content and adds new data at the end. If you want to guard against overwriting an existing file, open it in x mode, which raises FileExistsError instead.
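The behaviour of the common modes can be checked with a short experiment. This is only a sketch: the function name demo_modes() and the file name notes.txt are made up for the example, and everything happens inside a temporary directory.

```python
import os
import tempfile


def demo_modes():
    """Record what r, w, a and x do to a missing or existing file (illustrative helper)."""
    results = {}
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "notes.txt")

        # r: the file must already exist
        try:
            open(path, "r")
        except FileNotFoundError:
            results["r_missing"] = "FileNotFoundError"

        # w: creates the file
        with open(path, "w") as f:
            f.write("first\n")

        # a: keeps existing content and adds to the end
        with open(path, "a") as f:
            f.write("second\n")
        with open(path) as f:
            results["after_append"] = f.read()

        # w again: existing content is discarded
        with open(path, "w") as f:
            f.write("third\n")
        with open(path) as f:
            results["after_rewrite"] = f.read()

        # x: refuses to open a file that already exists
        try:
            open(path, "x")
        except FileExistsError:
            results["x_existing"] = "FileExistsError"
    return results


for key, value in demo_modes().items():
    print(key, repr(value))
```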

Closing the file - close() method #

When you are done working with the file, you should close it. Although the file is automatically closed when the program ends, it is still good practice to do so explicitly. Every open file uses up a file descriptor, and the operating system limits how many a program may hold, so a large program that never closes its files can eventually fail to open new ones.

To close the file, call the close() method of the file object. Closing the file frees up resources associated with it and flushes the data in the buffer to the disk.
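One common way to make sure close() is always called, even when an operation in between fails, is a try/finally block. A minimal sketch, assuming a hypothetical helper named write_and_close():

```python
import os
import tempfile


def write_and_close(path, text):
    """Write text to path and close the file even if write() raises (illustrative helper)."""
    f = open(path, "w")
    try:
        f.write(text)
    finally:
        f.close()  # runs whether or not write() succeeded
    return f.closed  # the closed attribute tells us the state of the file object


with tempfile.TemporaryDirectory() as folder:
    print(write_and_close(os.path.join(folder, "todo.md"), "- buy milk\n"))  # True
```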

File Pointer #

When you open a file with the open() function, a file pointer is associated with it, which marks a position in the file. The file pointer determines where the next read or write operation will take place. Initially, the file pointer points at the start of the file and advances as we read and write data (in append mode, writes always go to the end of the file). Later in this post, we will see how to determine the current position of the file pointer and use it to randomly access parts of the file.

Reading files using read(), readline() and readlines() #

To read data, the file object provides the following methods:

Method	Description
`read([n])`	Reads and returns at most `n` characters (or `n` bytes in binary mode) from the file; fewer are returned if the end of the file is reached first. If `n` is not specified, it reads the entire file and returns it.
`readline()`	Reads and returns the characters up to and including the end of the line as a string.
`readlines()`	Reads and returns all the lines as a list of strings.

When the end of the file (EOF) is reached, the read() and readline() methods return an empty string, while readlines() returns an empty list ([]).

Here are some examples:

poem.txt

The caged bird sings
with a fearful trill
of things unknown
but longed for still

Example 1: Using read()

>>>
>>> f = open("poem.txt", "r")
>>>
>>> f.read(3) # read the first 3 characters
'The'
>>>
>>> f.read() # read the remaining characters in the file.
' caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>>
>>> f.read() # End of the file (EOF) is reached
''
>>>
>>> f.close()
>>>

Example 2: Using readline()

>>>
>>> f = open("poem.txt", "r")
>>>
>>> f.read(4) # read first 4 characters
'The '
>>>
>>> f.readline() # read until the end of the line is reached
'caged bird sings\n'
>>>
>>> f.readline() # read the second line
'with a fearful trill\n'
>>>
>>> f.readline() # read the third line
'of things unknown\n'
>>>
>>> f.readline() # read the fourth line
'but longed for still'
>>>
>>> f.readline() # EOF reached
''
>>>
>>> f.close()
>>>

Example 3: Using readlines()

>>>
>>> f = open("poem.txt", "r")
>>>
>>> f.readlines()
['The caged bird sings\n', 'with a fearful trill\n', 'of things unknown\n', 'but longed for still']
>>>
>>> f.readlines() # EOF reached
[]
>>>
>>> f.close()
>>>

Reading File in Chunks #

The read() (without argument) and readlines() methods read all the data into memory at once, so don't use them to read large files.

A better approach is to read the file in chunks using read(n), or to read it line by line using readline(), as follows:

Example: Reading file in chunks

>>>
>>> f = open("poem.txt", "r")
>>>
>>> chunk = 200
>>>
>>> while True:
...     data = f.read(chunk)
...     if not data:
...         break
...     print(data)
...
The caged bird sings
with a fearful trill
of things unknown
but longed for still
>>>

Example: Reading file line by line

>>>
>>> f = open("poem.txt", "r")
>>>
>>> while True:
...     line = f.readline()
...     if not line:
...         break
...     print(line, end="")
...
The caged bird sings
with a fearful trill
of things unknown
but longed for still
>>>

Instead of using the read() (with argument) or readline() methods, you can also use the file object itself to iterate over the content of the file one line at a time.

>>>
>>> f = open("poem.txt", "r")
>>>
>>> for line in f:
...     print(line, end="")
...
The caged bird sings
with a fearful trill
of things unknown
but longed for still
>>>

This code is equivalent to the preceding example but it is more concise, readable and easier to type.

warning:

Be careful with the readline() method: if you have the misfortune of opening a huge file without any newlines, readline() is no better than read() (without arguments). The same is true when you use the file object as an iterator.
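One way to guard against this is the optional size argument of readline(), which caps how many characters a single call may return. The sketch below assumes a hypothetical generator named iter_bounded_lines() and uses io.StringIO in place of a real file.

```python
import io


def iter_bounded_lines(f, limit=1024):
    """Yield pieces of at most `limit` characters, stopping at each newline.

    Illustrative helper: a line longer than `limit` comes back in several pieces.
    """
    while True:
        piece = f.readline(limit)
        if not piece:  # empty string means EOF
            break
        yield piece


# A 10-character line followed by a short line, read with a limit of 4.
sample = io.StringIO("a" * 10 + "\nbb\n")
print(list(iter_bounded_lines(sample, limit=4)))
# ['aaaa', 'aaaa', 'aa\n', 'bb\n']
```

Memory use is now bounded by the limit rather than by the length of the longest line.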

Writing Data using write() and writelines() #

For writing data the file object provides the following two methods:

Method	Description
`write(s)`	Writes the string `s` to the file and returns the number of characters written.
`writelines(s)`	Writes all strings in the sequence `s` to the file.

Here are examples:

>>>
>>> f = open("poem_2.txt", "w")
>>>
>>> f.write("When I think about myself, ")
27
>>> f.write("I almost laugh myself to death.")
31
>>> f.close() # close the file and flush the data in the buffer to the disk
>>>
>>>
>>> f = open("poem_2.txt", "r") # open the file for reading
>>>
>>> data = f.read() # read entire file
>>>
>>> data
'When I think about myself, I almost laugh myself to death.'
>>>
>>> print(data)
When I think about myself, I almost laugh myself to death.
>>>
>>> f.close()
>>>

Notice that unlike the print() function the write() method doesn't add a newline character (\n) at the end of the line. If you want the newline character you have to add it manually, as follows:

>>>
>>>
>>> f = open("poem_2.txt", "w")
>>>
>>> f.write("When I think about myself, \n") # notice newline
28
>>> f.write("I almost laugh myself to death.\n") # notice newline
32
>>>
>>> f.close()
>>>
>>>
>>> f = open("poem_2.txt", "r") # open the file again
>>>
>>> data = f.read() # read the entire file
>>>
>>> data
'When I think about myself, \nI almost laugh myself to death.\n'
>>>
>>> print(data)
When I think about myself,
I almost laugh myself to death.

>>>
>>>

You can also append the newline to the line using the print() function, as follows:

>>>
>>> f = open("poem_2.txt", "w")
>>>
>>> print("When I think about myself, ", file=f)
>>>
>>> print("I almost laugh myself to death.", file=f)
>>>
>>> f.close()
>>>
>>>
>>> f = open("poem_2.txt", "r") # open the file again
>>>
>>> data = f.read()
>>>
>>> data
'When I think about myself, \nI almost laugh myself to death.\n'
>>>
>>> print(data)
When I think about myself,
I almost laugh myself to death.

>>>
>>>

Here is an example of the writelines() method. Note that, despite its name, writelines() does not add a newline after each string.

>>>
>>> lines = [
... "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod",
... "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,",
... "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
... "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse",
... "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non",
... "proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
... ]
>>>
>>>
>>> f = open("lorem.txt", "w")
>>>
>>> f.writelines(lines)
>>>
>>> f.close()
>>>
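Because writelines() writes the strings exactly as given, the line endings have to be supplied by the caller if each item should end up on its own line. A minimal sketch, where write_lines() is a hypothetical helper name:

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in lines on its own line (illustrative helper)."""
    with open(path, "w") as f:
        # writelines() accepts any iterable of strings, including a generator
        f.writelines(line + "\n" for line in lines)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lorem.txt")
    write_lines(path, ["Lorem ipsum", "dolor sit amet"])
    with open(path) as f:
        print(repr(f.read()))  # 'Lorem ipsum\ndolor sit amet\n'
```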

Conceptually, the writelines() method simply calls the write() method for each item, roughly as follows:

def writelines(self, lines):
    self._checkClosed()
    for line in lines:
        self.write(line)

Here is another example which opens the file in append mode.

>>>
>>> f = open("poem_2.txt", "a")
>>>
>>> f.write("\nAlone, all alone. Nobody, but nobody. Can make it out here alone.")
66
>>> f.close()
>>>
>>> data = open("poem_2.txt").read()
>>> data
'When I think about myself, \nI almost laugh myself to death.\n\nAlone, all alone. Nobody, but nobody. Can make it out here alone.'
>>>
>>> print(data)
When I think about myself,
I almost laugh myself to death.

Alone, all alone. Nobody, but nobody. Can make it out here alone.
>>>

Let's assume the file poem_2.txt is important to us and we don't want it to be overwritten. To prevent that, open the file in x mode:

>>>
>>> f = open("poem_2.txt", "x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'poem_2.txt'
>>>

The x mode opens the file for writing only if it doesn't already exist.
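In a script, the FileExistsError can be caught to decide what to do when the file is already there. The following sketch assumes a hypothetical helper named create_if_missing():

```python
import os
import tempfile


def create_if_missing(path, text):
    """Create path with the given text; return False if it already exists (illustrative helper)."""
    try:
        with open(path, "x") as f:
            f.write(text)
    except FileExistsError:
        return False  # the existing file is left untouched
    return True


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "poem_2.txt")
    print(create_if_missing(path, "first version\n"))   # True
    print(create_if_missing(path, "second version\n"))  # False
```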

Buffering and Flushing #

Buffering is the process of storing data temporarily before it is moved to a new location.

In the case of files, the data is not written to the disk immediately; instead, it is first stored in a buffer in memory.

The rationale behind this is that writing data to disk takes far more time than writing data to memory. Imagine a program that writes to the disk every time the write() method is called. Such a program would be very slow.

When we use a buffer, the data is written to the disk only when the buffer becomes full or when the close() method is called. This process is called flushing the output. You can also flush the output manually using the flush() method of the file object. Note that flush() only pushes the buffered data out to the operating system; it doesn't close the file.

The open() function provides an optional third argument, buffering, to control the buffer. To learn more about it visit the official documentation.
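The effect of flush() can be observed by checking the size of the file before and after the call. This is a rough sketch with a hypothetical helper named sizes_around_flush(); the size seen before the flush depends on the buffering in use, so only the size after it should be relied on.

```python
import os
import tempfile


def sizes_around_flush(text):
    """Return the on-disk size of a file before and after flush() (illustrative helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(path, "w", encoding="utf-8")
        try:
            f.write(text)                    # data goes into the buffer first
            before = os.path.getsize(path)   # typically still 0 for a small write
            f.flush()                        # push the buffered data out
            after = os.path.getsize(path)
        finally:
            f.close()
        return before, after
    finally:
        os.remove(path)


print(sizes_around_flush("hello"))
```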

Reading and Writing Binary data #

To read or write a binary file, append b to the mode string.

In Python 3, binary data is represented using a special type called bytes.

The bytes type represents an immutable sequence of numbers between 0 and 255.

Let's create a binary version of the poem by reading the poem.txt file.

>>>
>>> binary_poem = bytes(open("poem.txt").read(), encoding="utf-8")
>>>
>>> binary_poem
b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>>
>>>
>>> binary_poem[0] # ASCII value of character T
84
>>> binary_poem[1] # ASCII value of character h
104
>>>

Note that indexing a bytes object returns an int.
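The relationship between str and bytes can be summarised in a few lines. This is a small illustration; the helper name first_byte_values() is made up for the example.

```python
def first_byte_values(text, count, encoding="utf-8"):
    """Encode text and return its first `count` byte values as ints (illustrative helper)."""
    data = text.encode(encoding)  # str -> bytes
    return list(data[:count])     # iterating over bytes yields ints


line = "The caged bird sings"
data = line.encode("utf-8")

print(first_byte_values(line, 3))    # [84, 104, 101]
print(data[0])                       # 84, indexing returns an int
print(data[0:1])                     # b'T', slicing returns bytes
print(data.decode("utf-8") == line)  # True, bytes -> str
```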

Let's write our binary poem in a new file.

>>>
>>> f = open("binary_poem", "wb")
>>>
>>> f.write(binary_poem)
80
>>>
>>> f.close()
>>>

Our binary poem is now written to the file. To read it open the file in rb mode.

>>>
>>> f = open("binary_poem", "rb")
>>> 
>>> data = f.read()
>>> 
>>> data
b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>> 
>>> print(data)
b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>> 
>>> f.close()
>>>

It is important to note that, in our case, the binary data happens to contain printable characters such as letters and newlines. However, this will not be the case most of the time. It means that with binary data we can't reliably use readline() or the file object (as an iterator) to read the contents of a file, because there might be no newline character in the file. The best way to read binary data is to read it in chunks using the read() method.

>>>
>>> # Just as with text files, you can read (or write) binary files in chunks.
>>>
>>> f = open("binary_poem", "rb")
>>>
>>> chunk = 200
>>>
>>> while True:
...     data = f.read(chunk)
...     if not data:
...         break
...     print(data)
...
b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>> 
>>>
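The same chunked loop can be used to copy a binary file of any size without loading it into memory. A minimal sketch, assuming a hypothetical function copy_in_chunks() and an arbitrary chunk size of 64 KiB:

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path chunk by chunk; return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # b'' means EOF
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "binary_poem")
    target = os.path.join(folder, "binary_poem_copy")
    with open(source, "wb") as f:
        f.write(b"The caged bird sings\nwith a fearful trill\n")
    print(copy_in_chunks(source, target, chunk_size=8))  # 42
```

For real programs, shutil.copyfile() from the standard library does the same job; the loop is shown here only to illustrate chunked reads and writes.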

Random Access using seek() and tell() #

Earlier in this post, we learned that when the file is opened, the system associates a pointer with it, which determines the position from where reading or writing will take place.

So far we have read and written files linearly. But it is also possible to read and write at specific locations. To achieve this, the file object provides the following two methods:

Method	Description
`tell()`	Returns the current position of the file pointer.
`seek(offset, [whence=0])`	Moves the file pointer to the given `offset`. The `offset` refers to the byte count and `whence` determines the position relative to which the `offset` will move the file pointer. The default value of `whence` is 0, which means that offset will move the file pointer from the beginning of the file. If whence is set to `1` or `2`, the offset will move the file's pointer from the current position or from the end of the file, respectively.

Let's take some examples now.

>>>
>>> ###### binary poem at a glance #######
>>> 
>>> for i in open("binary_poem", "rb"):
...     print(i)
... 
b'The caged bird sings\n'
b'with a fearful trill\n'
b'of things unknown\n'
b'but longed for still'
>>> 
>>> f.close()
>>> 
>>> #####################################
>>>
>>> f = open('binary_poem', 'rb') # open binary_poem file for reading
>>>
>>> f.tell() # initial position of the file pointer
0
>>>
>>> f.read(5) # read 5 bytes
b'The c'
>>>
>>> f.tell()
5
>>>

After reading 5 bytes, the file pointer is now at character a (in the word caged). So the next read (or write) operation will start from this point.

>>>
>>>
>>> f.read()
b'aged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still'
>>>
>>> f.tell()
80
>>>
>>> f.read() # EOF reached
b''
>>>
>>> f.tell()
80
>>>

We have now reached the end of the file. At this point, we can use the seek() method to rewind the file pointer to the beginning of the file, as follows:

>>>
>>> f.seek(0) # rewind the file pointer to the beginning, same as seek(0, 0)
0
>>>
>>> f.tell()
0
>>>

The file pointer is now at the beginning of the file. All the read and write operations from now on will take place from the beginning of the file again.

>>>
>>> f.read(14) # read the first 14 characters
b'The caged bird'
>>>
>>>
>>> f.tell()
14
>>>

To move the file pointer 12 bytes forward from the current position, call seek() as follows:

>>>
>>> f.seek(12, 1)
26
>>>
>>> f.tell()
26
>>> 
>>>

The file pointer is now at character a (after the word with), so the read and write operation will take place from there.

>>>
>>> 
>>> f.read(15)
b'a fearful trill'
>>>
>>>

We can also move the file pointer backward. For example, the following call to seek() moves the file pointer 13 bytes backward from the current position.

>>>
>>> f.seek(-13, 1)
28
>>>
>>> f.tell()
28
>>> 
>>> f.read(7)
b'fearful'
>>>

Let's say we want to read the last 16 bytes of the file. To do so, move the file pointer 16 bytes backward relative to the end of the file.

>>>
>>> f.seek(-16, 2)
64
>>>
>>> f.read()
b'longed for still'
>>>

The values of the whence argument of seek() are also defined as constants in the os module.

Value	Constant
`0`	`SEEK_SET`
`1`	`SEEK_CUR`
`2`	`SEEK_END`
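Using the named constants makes calls to seek() easier to read. The sketch below assumes a hypothetical helper named read_tail() that returns the last few bytes of a file opened in binary mode.

```python
import os
import tempfile


def read_tail(path, count):
    """Return the last `count` bytes of the file, or the whole file if it is shorter."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)          # jump to the end of the file
        size = f.tell()                 # the position at the end equals the file size
        f.seek(max(size - count, 0), os.SEEK_SET)
        return f.read()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "binary_poem")
    with open(path, "wb") as f:
        f.write(b"of things unknown\nbut longed for still")
    print(read_tail(path, 16))  # b'longed for still'
```

The start position is computed explicitly so that asking for more bytes than the file contains does not attempt to seek before the beginning of the file.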

with statement #

The with statement allows us to automatically close the file once we are done working with it. Its syntax is as follows:

with expression as variable:
    # do operations on file here.

The statements inside the with block must be indented consistently, just as in a for loop; otherwise an IndentationError (a subclass of SyntaxError) is raised.

Here is an example:

>>> 
>>> with open('poem.txt') as f:
...     print(f.read()) # read the entire file
... 
The caged bird sings
with a fearful trill
of things unknown
but longed for still
>>>
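The file is closed when the with block ends, even if an exception is raised inside it. A small sketch to confirm this, using two hypothetical helper functions:

```python
import os
import tempfile


def closed_after_with(path):
    """Return whether the file is closed once the with block has finished."""
    with open(path, "w") as f:
        f.write("The caged bird sings\n")
    return f.closed


def closed_after_error(path):
    """Return whether the file is closed when the with block is left by an exception."""
    try:
        with open(path, "w") as f:
            raise ValueError("something went wrong")
    except ValueError:
        pass
    return f.closed


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "poem.txt")
    print(closed_after_with(path))   # True
    print(closed_after_error(path))  # True
```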
