Python: How to read and write files

In this post, we will learn how to read and write files in Python.

Working with files consists of the following three steps:

  1. Open a file
  2. Perform read or write operation
  3. Close the file

Let’s look at each step in detail.

Types of files

There are two types of files:

  1. Text files.
  2. Binary files.

A text file stores a sequence of characters using an encoding such as utf-8 or latin1, whereas a binary file stores data as raw bytes, in the same format as it is held in computer memory.

Here are some examples of text and binary files:

Text files: Python source code, HTML file, text file, markdown file etc.
Binary files: executable files, images, audio etc.

It is important to note that on disk both types of files are stored as a sequence of 1s and 0s. The only difference is that when a text file is opened, its data is decoded using the same encoding scheme it was encoded with. In the case of binary files, no such decoding happens.

Opening the file – open() function

The open() built-in function is used to open a file. Its basic syntax is open(filename, mode).

On success, open() returns a file object. On failure, it raises OSError or one of its subclasses (since Python 3.3, IOError is an alias of OSError).

filename: Absolute or relative path of the file to be opened.
mode: (optional) A string which specifies the processing mode (i.e. read, write, append etc.) and the file type.

The following are the possible values of mode.

r: Open the file for reading (default).
w: Open the file for writing.
a: Open the file in append mode, i.e. add new data to the end of the file.
r+: Open the file for both reading and writing.
x: Open the file for writing, only if it doesn’t already exist.

We can also append t or b to the mode string to indicate the type of file we will be working with. The t is used for text files and b for binary files. If neither is specified, t is assumed by default.

The mode is optional; if it is not specified, the file is opened as a text file for reading only.

This means that the following three calls to open() are equivalent:
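As a rough sketch of three such equivalent calls: the file name poem.txt and its one-line content are assumptions made for illustration, and the file is created in a temporary directory so the snippet runs anywhere.

```python
import os
import tempfile

# The temporary directory is removed automatically when this block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "poem.txt")  # hypothetical file name

    # Create a small file so that there is something to read.
    f = open(path, "w")
    f.write("hello\n")
    f.close()

    # All three calls open the file as a text file, for reading only.
    f1 = open(path)
    f2 = open(path, "r")
    f3 = open(path, "rt")

    contents = [f1.read(), f2.read(), f3.read()]

    f1.close()
    f2.close()
    f3.close()

print(contents)
```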

Note that before you can read a file, it must already exist; otherwise open() raises a FileNotFoundError exception. The same applies to r+ mode, which does not create the file. However, if you open a file for writing using w or a mode, Python automatically creates the file for you. If the file already exists, w mode deletes its content, while a mode keeps it and adds new data at the end. If you want to avoid overwriting an existing file, open it in x mode.

Closing the file – close() method

When you are done working with a file, you should close it. Although the file is automatically closed when the program ends, it is still good practice to close it explicitly. Failing to close files in a large program can exhaust the operating system's limit on open files and may even cause the program to crash.

To close the file call the close() method of the file object. Closing the file frees up resources associated with it and flushes the data in the buffer to the disk.
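One common way to make sure close() is always called is a try/finally block. The following is a minimal sketch; the file name notes.txt is a made-up example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    f = open(path, "w")
    try:
        f.write("some data\n")
    finally:
        # Runs even if write() raises, so the file is always closed.
        f.close()

    closed_after = f.closed

print(closed_after)
```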

File Pointer

When you open a file via the open() function, the operating system associates a pointer with it that points to a position in the file. The file pointer determines where the next read or write operation will take place. Initially, the file pointer points at the start of the file (or at the end, in append mode) and advances as we read and write data. Later in this post, we will see how to determine the current position of the file pointer and use it to randomly access parts of the file.

Reading files using read(), readline() and readlines()

To read data, the file object provides the following methods:

read([n]): Reads and returns up to n characters (or up to n bytes in binary mode) from the file; fewer are returned if there isn’t enough data left. If n is not specified, it reads the entire file and returns it as a string.
readline(): Reads and returns, as a string, the characters up to and including the end of the current line.
readlines(): Reads and returns all the lines as a list of strings.

When the end of the file (EOF) is reached, the read() and readline() methods return an empty string, while readlines() returns an empty list ([]).

Here are some examples:


Example 1: Using read()
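A minimal sketch of read() with and without an argument. The sample text and the file name sample.txt are assumptions made for illustration.

```python
import os
import tempfile

SAMPLE = "first line\nsecond line\n"  # hypothetical content

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    f = open(path, "w")
    f.write(SAMPLE)
    f.close()

    f = open(path)
    head = f.read(5)    # the first 5 characters
    rest = f.read()     # everything that remains
    at_eof = f.read()   # nothing left, so an empty string
    f.close()

print(repr(head), repr(rest), repr(at_eof))
```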

Example 2: Using readline()
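A short sketch of readline(), again using assumed sample content. Each call returns one line including its trailing newline, and an empty string once the end of the file is reached.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("first line\nsecond line\n")
    f.close()

    f = open(path)
    line1 = f.readline()   # "first line\n"
    line2 = f.readline()   # "second line\n"
    eof = f.readline()     # "" because there is nothing left
    f.close()

print(repr(line1), repr(line2), repr(eof))
```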

Example 3: Using readlines()
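A short sketch of readlines() on the same kind of assumed sample file. A second call at EOF returns an empty list.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("first line\nsecond line\n")
    f.close()

    f = open(path)
    lines = f.readlines()   # every line, newline characters included
    again = f.readlines()   # already at EOF, so an empty list
    f.close()

print(lines, again)
```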

Reading File in Chunks

The read() (without an argument) and readlines() methods read all the data into memory at once, so avoid them when reading large files.

A better approach is to read the file in chunks using read() with a size argument, or to read the file line by line using readline(), as follows:

Example: Reading file in chunks
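A minimal sketch of chunked reading. The chunk size of 4 characters is deliberately tiny so the result is easy to follow; real code would typically use a few kilobytes or more, and would process each chunk rather than collect them in a list.

```python
import os
import tempfile

CHUNK_SIZE = 4  # assumed value, tiny on purpose

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("abcdefghij")
    f.close()

    chunks = []
    f = open(path)
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:          # an empty string means EOF
            break
        chunks.append(chunk)   # stand-in for real processing
    f.close()

print(chunks)
```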

Example: Reading file line by line
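A sketch of the same idea using readline() in a loop, with assumed sample content. Note that a blank line in the file is returned as "\n", which is not empty, so only a true EOF ends the loop.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("one\ntwo\nthree\n")
    f.close()

    collected = []
    f = open(path)
    while True:
        line = f.readline()
        if not line:           # empty string: end of file
            break
        collected.append(line.rstrip("\n"))
    f.close()

print(collected)
```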

Instead of using the read() (with an argument) or readline() methods, you can also iterate over the file object itself to get the content of the file one line at a time.
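A minimal sketch of iterating over the file object directly, using the same assumed sample content as a readline() loop would.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    f = open(path, "w")
    f.write("one\ntwo\nthree\n")
    f.close()

    collected = []
    f = open(path)
    for line in f:             # yields one line per iteration
        collected.append(line.rstrip("\n"))
    f.close()

print(collected)
```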

This code is equivalent to the preceding example but it is more concise, readable and easier to type.

Note: Be careful with the readline() method. If you happen to open a huge file that contains no newline characters, readline() is no better than read() (without arguments), because the whole file is a single line. The same is true when you use the file object as an iterator.

Writing Data using write() and writelines()

For writing data the file object provides the following two methods:

write(s): Writes the string s to the file and returns the number of characters written.
writelines(s): Writes all the strings in the sequence s to the file, without adding any line separators.

Here are examples:

Notice that unlike the print() function, the write() method doesn’t add a newline character (\n) at the end of the line. If you want the newline character you have to add it manually, as follows:
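A short sketch of write() with and without a manually added newline. The file name and strings are assumptions chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name

    f = open(path, "w")
    count = f.write("no newline here")   # returns the number of characters
    f.write(", same line\n")             # newline added by hand
    f.write("a new line\n")
    f.close()

    f = open(path)
    content = f.read()
    f.close()

print(count)
print(repr(content))
```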

You can also append the newline to the line using the print() function, as follows:
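A minimal sketch of this approach: passing the file object as the file argument of print() makes it write to the file instead of the screen, newline included. The file name is an assumption.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name

    f = open(path, "w")
    print("first line", file=f)    # print() appends "\n" by default
    print("second line", file=f)
    f.close()

    f = open(path)
    content = f.read()
    f.close()

print(repr(content))
```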

Here is an example of writelines() method.
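A sketch with assumed sample data. Because writelines() does not add line separators, each string in the list carries its own newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name

    lines = ["alpha\n", "beta\n", "gamma\n"]
    f = open(path, "w")
    f.writelines(lines)
    f.close()

    f = open(path)
    content = f.read()
    f.close()

print(repr(content))
```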

The writelines() method internally calls the write() method.

Here is another example which opens the file in append mode.
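A minimal sketch of append mode, using a made-up file name. The first open() creates the file in w mode; the second adds to it in a mode without touching what is already there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")  # hypothetical file name

    f = open(path, "w")
    f.write("line 1\n")
    f.close()

    f = open(path, "a")     # existing content is kept
    f.write("line 2\n")     # written at the end of the file
    f.close()

    f = open(path)
    content = f.read()
    f.close()

print(repr(content))
```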

Let’s assume the file poem_2.txt is important to us and we don’t want it to be overwritten. To prevent that, open the file in x mode.

The x mode opens the file for writing only if it doesn’t already exist; if it does exist, open() raises a FileExistsError exception.
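A short sketch of this behaviour. The file name is taken from the text above, while its content is an assumption; the file lives in a temporary directory so the snippet can be run repeatedly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "poem_2.txt")

    f = open(path, "x")        # succeeds: the file does not exist yet
    f.write("original\n")
    f.close()

    try:
        open(path, "x")        # fails: the file now exists
        refused = False
    except FileExistsError:
        refused = True

    f = open(path)
    content = f.read()         # the original content is untouched
    f.close()

print(refused, repr(content))
```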

Buffering and Flushing

Buffering is the process of storing data temporarily before it is moved to a new location.

In the case of files, the data is not immediately written to the disk; instead, it is first stored in a buffer in memory.

The rationale behind this is that writing data to disk is much slower than writing data to memory. Imagine a program that writes to the disk every time the write() method is called. Such a program would be very slow.

When we use a buffer, the data is written to the disk only when the buffer becomes full or when the close() method is called. This process is called flushing the output. You can also flush the output manually using the flush() method of the file object. Note that flush() only writes out the buffered data. It doesn’t close the file.

The open() function provides an optional third argument, buffering, to control the buffer. To learn more about it visit the official documentation.
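A minimal sketch of a manual flush, with an assumed file name. After flush() the data is visible in the file's size even though the file is still open; the size before the flush depends on the buffer and is therefore only printed, not relied upon.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name

    f = open(path, "w")
    f.write("buffered")
    size_before = os.path.getsize(path)   # usually 0: data is still buffered

    f.flush()                             # push the buffered data out
    size_after = os.path.getsize(path)
    still_open = not f.closed             # flush() does not close the file
    f.close()

print(size_before, size_after, still_open)
```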

Reading and Writing Binary data

Reading and writing binary files is done by appending b to the mode string.

In Python 3, the binary data is represented using a special type called bytes.

The bytes type represents an immutable sequence of numbers between 0 and 255.

Let’s create a binary version of the poem by reading poem.txt file.

Note that indexing a bytes object returns an int.
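A small sketch of reading a text file in rb mode and indexing the result. The content "Hi!" and the file name are assumptions; any short text would do.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "poem.txt")  # hypothetical file name
    f = open(path, "w", encoding="utf-8")
    f.write("Hi!")
    f.close()

    f = open(path, "rb")       # binary mode: no decoding takes place
    data = f.read()
    f.close()

first = data[0]       # indexing gives an int
piece = data[0:1]     # slicing gives a bytes object
numbers = list(data)  # the underlying sequence of integers

print(data, first, piece, numbers)
```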

Let’s write our binary poem in a new file.

Our binary poem is now written to the file. To read it open the file in rb mode.
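A sketch of the write-then-read round trip in binary mode. The one-line text and the file name are assumptions standing in for the poem used in this post.

```python
import os
import tempfile

binary_data = "Roses are red\n".encode("utf-8")  # hypothetical content

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "binary_poem")  # hypothetical file name

    f = open(path, "wb")
    written = f.write(binary_data)   # returns the number of bytes written
    f.close()

    f = open(path, "rb")
    first_five = f.read(5)           # read() counts bytes in binary mode
    remainder = f.read()
    f.close()

print(written, first_five, remainder)
```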

It is important to note that, in our case, the binary data happens to contain printable characters such as letters and newlines. However, this will not be the case most of the time. With binary data we can’t reliably use readline() or the file object (as an iterator) to read the contents of a file, because there might be no newline character in the file. The best way to read binary data is to read it in chunks using the read() method.

Random Access using seek() and tell()

Earlier in this post, we learned that when a file is opened, the system associates a pointer with it, which determines the position from where reading or writing will take place.

So far we have read and written files linearly. But it is also possible to read and write at specific locations. To achieve this the file object provides the following two methods:

tell(): Returns the current position of the file pointer.
seek(offset, [whence=0]): Moves the file pointer to the given offset. The offset is a byte count and whence determines the position relative to which the offset is applied. The default value of whence is 0, which means that the offset is counted from the beginning of the file. If whence is set to 1 or 2, the offset is counted from the current position or from the end of the file, respectively. Note that for files opened in text mode, only seeks relative to the beginning of the file are allowed (apart from seek(0, 1) and seek(0, 2)); seeking relative to the current position or the end with a nonzero offset requires binary mode.

Let’s take some examples now.

After reading 5 characters, the file pointer is now at character a (in word caged). So the next read (or write) operation will start from this point.

We have now reached the end of the file. At this point, we can use the seek() method to rewind the file pointer to the beginning of the file, as follows:

The file pointer is now at the beginning of the file. All the read and write operations from now on will take place from the beginning of the file again.

To move the file pointer 12 bytes forward from the current position, call seek() as follows:

The file pointer is now at character a (after the word with), so the read and write operation will take place from there.

We can also move the file pointer backward. For example, the following call to seek() moves the file pointer 13 bytes backward from the current position.

Let’s say we want to read the last 16 bytes of the file. To do so, move the file pointer 16 bytes backward relative to the end of the file.
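The moves described above can be sketched on a small binary file. The 16-byte content and the offsets below are assumptions chosen so that each position is easy to check by eye; they are not the poem or the offsets used in this post. The file is opened in rb mode because relative seeks with a nonzero offset are not permitted in text mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.bin")  # hypothetical file name
    f = open(path, "wb")
    f.write(b"0123456789abcdef")
    f.close()

    f = open(path, "rb")
    start = f.read(5)       # b"01234"
    after_read = f.tell()   # 5: the pointer advanced by 5 bytes

    f.seek(0)               # rewind to the beginning
    rewound = f.tell()      # 0

    f.seek(4, 1)            # 4 bytes forward from the current position
    forward = f.read(2)     # b"45", pointer is now at 6

    f.seek(-3, 1)           # 3 bytes backward from the current position
    backward = f.read(1)    # b"3"

    f.seek(-4, 2)           # 4 bytes before the end of the file
    tail = f.read()         # b"cdef"
    f.close()

print(start, after_read, rewound, forward, backward, tail)
```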

The values of the whence argument of seek() are also defined as constants in the os module: os.SEEK_SET (0), os.SEEK_CUR (1) and os.SEEK_END (2).
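A brief sketch using the named constants instead of bare numbers, on an assumed 10-byte binary file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.bin")  # hypothetical file name
    f = open(path, "wb")
    f.write(b"0123456789")
    f.close()

    f = open(path, "rb")
    f.seek(2, os.SEEK_SET)    # same as f.seek(2, 0)
    f.seek(3, os.SEEK_CUR)    # same as f.seek(3, 1)
    middle = f.tell()
    f.seek(-2, os.SEEK_END)   # same as f.seek(-2, 2)
    tail = f.read()
    f.close()

print(middle, tail)
```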


with statement

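The with statement allows us to automatically close the file once we are done working with it, even if an exception is raised inside the block. Its general form is with open(filename, mode) as file_object: followed by an indented block of statements that use file_object.

The statements inside the with block must be indented equally, just like the body of a for loop; otherwise an IndentationError (a subclass of SyntaxError) is raised.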

Here is an example:
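A minimal sketch, with an assumed file name and content. The closed attribute shows that the file has been closed as soon as the with block ends, without an explicit call to close().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("written inside with\n")
    closed_after = f.closed    # True: closed on leaving the block

    with open(path) as f:
        content = f.read()

print(closed_after, repr(content))
```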
