Open, Read, Compare the Sentences in Python

Photo by Kollinger on Pixabay

The Best Practice of Reading Text Files In Python

Combine multiple files into a single stream with richer metadata

Christopher Tao

Reading text files in Python is relatively easy compared with most other programming languages. Usually, we just call the open() function in read or write mode and then loop through the text file line by line.

This is already the best practice, and it cannot be any easier. Nonetheless, when we want to read content from multiple files, there is definitely a better way: the fileinput module that is built into Python. It combines the content of multiple files so that we can process everything in a single for-loop, and it brings plenty of other benefits.

In this article, I'll demonstrate this module with examples.

0. Without the FileInput Module

Photo by DGlodowska on Pixabay

Let's have a look at the "ordinary" way of reading multiple text files using the open() function. But before that, we need to create two sample files for demonstration purposes.

    with open('my_file1.txt', mode='w') as f:
        f.write('This is line 1-1\n')
        f.write('This is line 1-2\n')
    with open('my_file2.txt', mode='w') as f:
        f.write('This is line 2-1\n')
        f.write('This is line 2-2\n')

In the above code, we open a file with the mode w, which means "write". Then, we write two lines to the file. Please note that we need to add the newline character \n. Otherwise, the two sentences will be written on a single line.

After that, we should have two text files in the current working directory.

Now, let's say we want to read from both text files and print the content line by line. Of course, we can still do that using the open() function.

    # Iterate through all files
    for file in ['my_file1.txt', 'my_file2.txt']:
        with open(file, 'r') as f:
            for line in f:
                print(line)

Here we have to use two nested for-loops. The outer loop is for the files, while the inner one is for the lines inside each file.
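
One detail worth noting about the nested loops: each line read from a file still ends with \n, and print() adds another newline, so the output shows a blank line after every line. The following is a minimal sketch of the same nested-loop idea that strips the trailing newline and collects the lines in a list. The helper name read_all_lines and the temporary directory are my own choices for illustration, not part of the original example.

```python
import tempfile
from pathlib import Path


def read_all_lines(paths):
    """Collect every line from every file, in order, without the trailing newline."""
    lines = []
    for path in paths:                 # outer loop: one iteration per file
        with open(path, 'r') as f:
            for line in f:             # inner loop: one iteration per line
                lines.append(line.rstrip('\n'))
    return lines


# Demo in a throwaway directory so the working directory stays untouched.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, 'my_file1.txt')
    second = Path(tmp, 'my_file2.txt')
    first.write_text('This is line 1-1\nThis is line 1-2\n')
    second.write_text('This is line 2-1\nThis is line 2-2\n')
    nested_lines = read_all_lines([first, second])

for text in nested_lines:
    print(text)
```

If you only want to print, print(line, end='') inside the inner loop gives the same result without building a list.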

1. Using the FileInput Module

Photo by Free-Photos on Pixabay

Well, nothing prevents us from using the open() function. However, the fileinput module provides us with a neater way of reading multiple text files into a single stream.

First of all, we need to import the module. It is a Python built-in module, so we don't need to download anything.

    import fileinput as fi

Then, we can use it to read from the two files.

    with fi.input(files=['my_file1.txt', 'my_file2.txt']) as f:
        for line in f:
            print(line)

Because the fileinput module is designed for reading from multiple files, we don't need to loop over the file names anymore. Instead, the input() function takes an iterable collection, such as a list, as a parameter. Also, the great thing is that all the lines from both files are accessible in a single for-loop.
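
To make the "single stream" idea concrete, here is a small self-contained sketch that reads two temporary files through fileinput.input() with one loop. It passes a tuple instead of a list to show that any sequence of file names is accepted. The helper name read_stream and the temporary files are assumptions made for this illustration.

```python
import fileinput
import tempfile
from pathlib import Path


def read_stream(paths):
    """Read all files as one stream and return the lines without newlines."""
    lines = []
    # A tuple works just as well as a list here.
    with fileinput.input(files=tuple(str(p) for p in paths)) as f:
        for line in f:                 # a single loop covers every file
            lines.append(line.rstrip('\n'))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, 'my_file1.txt')
    second = Path(tmp, 'my_file2.txt')
    first.write_text('This is line 1-1\nThis is line 1-2\n')
    second.write_text('This is line 2-1\nThis is line 2-2\n')
    stream_lines = read_stream([first, second])

print(stream_lines)
```

The files are read in the order in which their names are given, so the result matches what the nested loops would produce.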

2. Use the FileInput Module with Glob

Photo by jarmoluk on Pixabay

Sometimes, it may not be practical to have such a file name list with all the names typed manually. It is quite common to read all the files from a directory. Also, we might be interested only in certain types of files.

In this case, we can use the glob module, which is another Python built-in module, together with the fileinput module.

We can do a simple experiment before that. The os module can help us list all the files in the current working directory.

It can be seen that there are many files other than the two text files. Therefore, we want to filter the file names because we want to read the text files only. We can use the glob module as follows.

    from glob import glob
    glob('*.txt')

Now, we can pass the result of the glob() function to the fileinput.input() function as the parameter. Then, only these two text files will be read.

    with fi.input(files=glob('*.txt')) as f:
        for line in f:
            print(line)
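
Two practical points are easy to miss when combining glob with fileinput. First, glob() makes no promise about the order of the names it returns, so sorting them keeps the stream predictable. Second, if the pattern matches nothing, the empty list makes fileinput fall back to reading standard input, which can look like a hang. The sketch below guards against both. The helper name read_matching and the file names a.txt, b.txt and notes.md are hypothetical.

```python
import fileinput
import tempfile
from glob import glob
from pathlib import Path


def read_matching(pattern):
    """Read every file matching the pattern, in sorted order, as one stream."""
    paths = sorted(glob(pattern))      # glob() gives no ordering guarantee
    if not paths:                      # an empty list would make fileinput read stdin
        return []
    with fileinput.input(files=paths) as f:
        return [line.rstrip('\n') for line in f]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'b.txt').write_text('from b\n')
    Path(tmp, 'a.txt').write_text('from a\n')
    Path(tmp, 'notes.md').write_text('not a text file we want\n')
    matched_lines = read_matching(str(Path(tmp, '*.txt')))
    no_match_lines = read_matching(str(Path(tmp, '*.csv')))

print(matched_lines)
print(no_match_lines)
```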

3. Get the Metadata of Files

Photo by StockSnap on Pixabay

You may ask: how can we know which file a "line" comes from when we are reading from a stream that is actually combined from multiple files?

Indeed, using the open() function with nested loops seems to make it very easy to get such information because we can access the current file name from the outer loop. However, this is in fact much easier with the fileinput module.

    with fi.input(files=glob('*.txt')) as f:
        for line in f:
            print(f'File Name: {f.filename()} | Line No: {f.lineno()} | {line}')

See, in the above code, we use filename() to access the current file that the line comes from and lineno() to access the number of the line we are getting. Note that lineno() counts across the whole combined stream, not per file.
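
Besides lineno(), the module also offers filelineno(), which restarts at 1 for every file. The following sketch records both counters side by side so the difference is visible. The helper name describe_lines and the temporary files are assumptions for this illustration.

```python
import fileinput
import tempfile
from pathlib import Path


def describe_lines(paths):
    """Return (file name, cumulative line no, per-file line no, text) for each line."""
    rows = []
    with fileinput.input(files=[str(p) for p in paths]) as f:
        for line in f:
            rows.append((
                Path(f.filename()).name,   # file the current line came from
                f.lineno(),                # counts across the whole stream
                f.filelineno(),            # restarts at 1 for each file
                line.rstrip('\n'),
            ))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, 'my_file1.txt')
    second = Path(tmp, 'my_file2.txt')
    first.write_text('This is line 1-1\nThis is line 1-2\n')
    second.write_text('This is line 2-1\nThis is line 2-2\n')
    line_rows = describe_lines([first, second])

for row in line_rows:
    print(row)
```

For the second file, lineno() reports 3 and 4 while filelineno() reports 1 and 2.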

4. When the Cursor Reaches a New File

Photo by DariuszSankowski on Pixabay

Apart from that, there are more functions in the fileinput module that we can make use of. For example, what if we want to do something when we reach a new file?

The function isfirstline() helps us decide whether we're reading the first line of a new file.

    with fi.input(files=glob('*.txt')) as f:
        for line in f:
            if f.isfirstline():
                print(f'> Start to read {f.filename()}...')
            print(line)

This could be very useful for logging purposes, so that we are kept informed of the current progress.
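
Logging is not the only use of isfirstline(). As one possible application, the sketch below skips a header row at the top of every file while merging several small CSV-like files. The file names, the header text and the helper name rows_without_headers are all made up for this example.

```python
import fileinput
import tempfile
from pathlib import Path


def rows_without_headers(paths):
    """Merge several files into one list of rows, dropping each file's first line."""
    rows = []
    with fileinput.input(files=[str(p) for p in paths]) as f:
        for line in f:
            if f.isfirstline():        # true for the first line of every file
                continue               # skip the header row
            rows.append(line.rstrip('\n'))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    part1 = Path(tmp, 'part1.csv')
    part2 = Path(tmp, 'part2.csv')
    part1.write_text('name,score\nalice,1\nbob,2\n')
    part2.write_text('name,score\ncarol,3\n')
    merged_rows = rows_without_headers([part1, part2])

print(merged_rows)
```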

5. Jump to the Next File

Photo by Free-Photos on Pixabay

We can also easily stop reading the current file and jump to the next one. The function nextfile() allows us to do so.

Before we can demo this feature, please allow me to rewrite the two sample files.

    with open('my_file1.txt', mode='w') as f:
        f.write('This is line 1-1\n')
        f.write('stop reading\n')
        f.write('This is line 1-2\n')
    with open('my_file2.txt', mode='w') as f:
        f.write('This is line 2-1\n')
        f.write('This is line 2-2\n')

The only difference from the original files is that I added a line of text, stop reading, to the first text file. Let's say that we want the fileinput module to stop reading the first file and jump to the second when it sees such content.

    with fi.input(files=glob('*.txt')) as f:
        for line in f:
            if f.isfirstline():
                print(f'> Start to read {f.filename()}...')
            if line == 'stop reading\n':
                f.nextfile()
            else:
                print(line)

In the above code, another if-condition is added. When the line text is stop reading, it will jump to the next file. Therefore, we can see that the line "1-2" was not read and output.
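
The same behaviour can be checked without touching the working directory. This is a minimal sketch that collects the lines actually kept, so it is easy to confirm that everything after the marker in the first file is skipped. The helper name read_until_marker and the temporary files are assumptions for illustration.

```python
import fileinput
import tempfile
from pathlib import Path


def read_until_marker(paths, marker='stop reading'):
    """Read each file up to the marker line, then move on to the next file."""
    kept = []
    with fileinput.input(files=[str(p) for p in paths]) as f:
        for line in f:
            if line.rstrip('\n') == marker:
                f.nextfile()           # drop the rest of the current file
                continue
            kept.append(line.rstrip('\n'))
    return kept


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, 'my_file1.txt')
    second = Path(tmp, 'my_file2.txt')
    first.write_text('This is line 1-1\nstop reading\nThis is line 1-2\n')
    second.write_text('This is line 2-1\nThis is line 2-2\n')
    kept_lines = read_until_marker([first, second])

print(kept_lines)
```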

6. Read Compressed Files Without Extracting

Photo by kaboompics on Pixabay

Sometimes we may have compressed files to read. Usually, we would have to uncompress them before we can read the content. However, with the fileinput module, we may not have to extract the content from the compressed files before we can read it.

Let's make up a compressed text file using Gzip. This file will be used for demonstration purposes later.

    import gzip
    import shutil
    with open('my_file1.txt', 'rb') as f_in:
        with gzip.open('my_file.gz', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

In the above code, we added the file my_file1.txt into a compressed file using gzip. Now, let's see how fileinput can read it without extra steps for uncompressing.

    with fi.input(files='my_file.gz', openhook=fi.hook_compressed) as f:
        for line in f:
            print(line)

By using the parameter openhook and the flag fi.hook_compressed, the gzip file will be uncompressed on the fly.
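
One thing to watch out for: as far as I know, on Python versions before 3.10 hook_compressed returns each line as bytes, while newer versions return str. The sketch below is a self-contained round trip that works in both cases by decoding only when needed. The helper name read_gzip_lines, the UTF-8 assumption and the temporary file are my own choices.

```python
import fileinput
import gzip
import tempfile
from pathlib import Path


def read_gzip_lines(path):
    """Read a .gz file through fileinput and return its lines as str."""
    lines = []
    with fileinput.input(files=path, openhook=fileinput.hook_compressed) as f:
        for line in f:
            if isinstance(line, bytes):    # older Python versions yield bytes here
                line = line.decode('utf-8')
            lines.append(line.rstrip('\n'))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    gz_path = str(Path(tmp, 'my_file.gz'))
    # Write the sample text straight into a gzip file.
    with gzip.open(gz_path, 'wt', encoding='utf-8') as f_out:
        f_out.write('This is line 1-1\nThis is line 1-2\n')
    gzip_lines = read_gzip_lines(gz_path)

print(gzip_lines)
```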

The fileinput module currently supports gzip and bzip2, which hook_compressed recognises by the .gz and .bz2 file extensions. Unfortunately, it does not support other formats.
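
That said, the openhook parameter accepts any function that takes a file name and a mode and returns an opened file object, so other formats can be handled with a hook of our own. The following is only a sketch under that assumption: hook_xz is a hypothetical helper that is not part of fileinput, and it uses the standard lzma module to read .xz files while opening everything else normally.

```python
import fileinput
import lzma
import tempfile
from pathlib import Path


def hook_xz(filename, mode):
    """Hypothetical open hook: decompress .xz files, open other files as usual."""
    if str(filename).endswith('.xz'):
        return lzma.open(filename, 'rt', encoding='utf-8')
    return open(filename, mode)


with tempfile.TemporaryDirectory() as tmp:
    xz_path = str(Path(tmp, 'my_file.xz'))
    txt_path = str(Path(tmp, 'my_file2.txt'))
    with lzma.open(xz_path, 'wt', encoding='utf-8') as f_out:
        f_out.write('This is line 1-1\nThis is line 1-2\n')
    Path(txt_path).write_text('This is line 2-1\n')

    # Compressed and plain files can be mixed in the same stream.
    with fileinput.input(files=[xz_path, txt_path], openhook=hook_xz) as f:
        xz_lines = [line.rstrip('\n') for line in f]

print(xz_lines)
```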

Summary

Photo by Free-Photos on Pixabay

In this article, I have introduced the Python built-in module fileinput and how to use it to read multiple text files. Of course, it will never replace the open() function, but in terms of reading multiple files into a single stream, I believe it is the best practice.

Source: https://towardsdatascience.com/the-best-practice-of-reading-text-files-in-python-509b1d4f5a4