How to Read a File Line by Line and Sort Each Line in Python
Chapter four. Information in Files and Arrays: Sort it out
As your programs develop, so practice your data handling needs.
And when you lot take lots of data to work with, using an individual variable for each piece of data gets really old, actually quickly. So programmers employ some rather awesome containers (known every bit data structures ) to help them work with lots of data. More times than not, all that data comes from a file stored on a hard disk. So, how tin can you piece of work with information in your files? Turns out information technology's a cakewalk. Flip the page and permit'southward larn how!
Surf's up in Codeville
The annual Codeville Surf-A-Thon is more pop than always this year.
Because in that location are and so many contestants, the organizers asked you to write a Python program to procedure the scores. Eager to please, you agreed.
The problem is, even though the contest is over and the embankment is now clear, you can't hit the waves until the program is written. Your plan has to work out the highest surfing scores. Despite your urge to surf, a promise is a hope, and so writing the program has to come up first.
Find the highest score in the results file
Later on the judges rate the competitors, the scores are stored in a file called results.txt
. There is i line in the file for each competitor's score. You need to write a programme that reads through each of these lines, picks out the score, and and so works out the highest score in the Surf-A-Thon.
It sounds simple enough, except for one minor detail. You've written programs to read data from the Spider web, and read information that's been typed in at the keyboard, but you lot haven't yet written any lawmaking that reads data stored in a file.
Iterate through the file with the open, for, shut pattern
If y'all need to read from a file using Python, i mode is to use the built-in open()
command. Open a file called results.txt
like this:
The call to open up()
creates a file handle , which is a shorthand that you'll utilise to refer to the file you are working with inside your code.
Because y'all'll need to read the file 1 line at a time , Python gives yous the for
loop for only this purpose. Like while
loops, the for
loop runs repeatedly, running the loop code once for each of the items in something. Remember of a for
loop as your very own custom-fabricated information shredder:
Each fourth dimension the body of the for
loop runs, a variable is set to a string containing the current line of text in the file. This is referred to as iterating through the data in the file:
Load this!
To successfully run this program, yous need to grab a copy of the results.txt
data file from the Head Offset Programming website. Exist sure to put the data file in the aforementioned directory (or folder) that contains your code.
The file contains more than numbers...
To encounter what happened, allow's take another look at the judge's score canvass to see if you missed anything:
The judges as well recorded the name of each surf contestant next to his or her score. This is a problem for the program merely if the name was added to the results.txt
file. Let'due south take a await:
Certain enough, the results.txt
file also contains the contestant names. And that's a problem for our code because, equally it iterates through the file, the string y'all read is no longer merely a number .
Carve up each line as you read it
Each line in the for
loop represents a single string containing two pieces of data:
You need to somehow extract the score from the string. In each line, in that location is a name, followed by a space, followed past the score. You already know how to extract one string from some other; you did information technology for Starbuzz back in Chapter 2. And yous could practice something similar here using the discover()
method and index manipulation, searching for the position of a space (' ') character in each line and then extracting the substring that follows it.
Programmers ofttimes have to deal with information in strings that comprise several pieces of data separated past spaces. It'southward and so mutual, in fact, that Python provides a special string method to perform the cutting you demand: split()
.
Python strings accept a congenital-in split()
method.
Annotation
And you lot'll find that other programming languages accept very similar mechanisms for breaking upwards strings.
The dissever() method cuts the cord
Imagine you have a string containing several words assigned to a variable. Call up of a variable as if it'south a labeled jar :
The rock_band
string, similar all Python strings, has a split()
method that returns a collection of substrings: one for each word in the original string.
Using a programming characteristic called multiple assignment , y'all tin take the result from the cut performed past split()
and assign it to a collection of variables:
Each of the return values from the split()
on rock_band
is assigned to its own separately named variable, which allows you and then to work with each discussion in whatsoever fashion yous want. Note that the rock_band
variable still exists and that it still contains the original string of four names.
Looks similar yous can use multiple assignment and split()
to extract the scores from the results.txt
file.
But y'all need more than than one top score
As shortly every bit the pinnacle score appears, people start to wonder what the second and third highest scores are:
It seems that the organizers didn't tell yous everything you needed to know. The contest doesn't only accolade a prize for the winner, only also honors those surfers in second and tertiary place.
Our plan currently iterates through each of the lines in the results.txt
file and works out the highest score. Simply what it actually needs to exercise is keep track of the top three scores , perhaps in 3 separate variables:
Keeping rails of iii scores makes the code more complex
Then how volition you keep track of the actress scores? You could do something like this:
Yous tin see that at that place'southward a lot more than logic here, because the programme needs to "recall" a bit more. Unfortunately, turning this logic into lawmaking will make the program longer and harder to modify in the futurity. And, let's be honest, it's somewhat more difficult to understand what's really going on with the logic as shown here.
How could you brand this simpler?
An ordered list makes lawmaking much simpler
If you had some way of reading the data from the file and so producing an ordered copy of the data, the program would be a lot simpler to write. Ordering information within a program is known as "sorting:"
But how practise you order, or sort , your data? What happens to the original information in the file? Does it remain unsorted or is information technology sorted, too? Can the data even be sorted on disk and, if then, does this make things easier, faster, or slower?
Sorting sounds tricky... is there a "best" way?
Sorting is easier in memory
If you are writing a program that is going to deal with a lot of data, yous need to decide where you need to go along that data while the plan works with it. Near of the time, you will have two choices:
-
Proceed the data in files on the disk.
If you accept a very large amount of data, the obvious identify to put it is on disk. Computers can store far more information on disk than they tin can in memory. Disk storage is persistent : if you yank the power string, the figurer doesn't forget the data written on the disk. Only there is one real problem with manipulating data on disk: it can be very slow .
-
Keep the data in memory.
Data is much quicker to access and change if information technology's stored in the computer's memory. Only, it'due south non persistent: data in retention disappears when your program exits, or when the computer is switched off (unless you remember to salvage it to a file, in which example it becomes persistent).
Keep the information in memory
If you lot want to sort a lot of information, you lot will need to shuffle data around quite a lot. This is much faster in memory than on disk.
Of class, before you sort the data, y'all need to read it into memory, perhaps into a large number of individual variables:
Brain Power
Y'all are going to have a problem if yous attempt to move all those lines of information into the computer'due south retention. What type of problem exercise you think you'll accept?
You tin can't use a split variable for each line of data
Programming languages use variables to requite you lot access to information in memory. So if you are going to shop the data from the results.txt
file in memory, it makes sense that you'll demand to use lots of variables to access all the data, correct?
But how many variables practice you lot need?
Imagine the file only had three scores in information technology. Y'all could write a program that read each of the lines from the file and stored them in variables called first_score
, second_score
, and third_score
:
But what if at that place were four scores in the file? Or five? Even worse, what if at that place were 10,000 scores? Y'all'd shortly run out of variable names and (perchance) retentiveness in your computer, not to mention the wear and tear on your fingers.
Sometimes, yous need to bargain with a whole bundle of information, all at once. To exercise that, most languages give you lot the array .
An array lets you manage a whole train of information
And then far, you've used variables to store only a unmarried piece of data . But sometimes, you lot desire to refer to a whole bunch of data all at once. For that, y'all need a new type of variable: the assortment .
An array is a "collection variable" or data structure . It'south designed to group a whole agglomeration of data items together in ane place and give them a name.
Recall of an array equally a data train. Each motorcar in the train is chosen an array chemical element and can store a single slice of data. If you desire to shop a number in ane element and a string in another, you can.
You might think that as y'all are storing all of that data in an array, you even so might need variables for each of the items information technology stores. But this is not the case. An array is itself just another variable , and you can requite it its own variable name:
Even though an array contains a whole bunch of information items, the array itself is a unmarried variable , which only and then happens to incorporate a collection of data. Once your data is in an assortment, y'all can treat the array just like any other variable.
And then how do yous use arrays?
Python gives y'all arrays with lists
Sometimes, different programming languages have different names for roughly the aforementioned matter. For example, in Python almost programmers think array when they are actually using a Python listing . For our purposes, think of Python lists and arrays as the substantially same matter.
Annotation
Python coders typically apply the word "array" to more than correctly refer to a list that contains simply data of one type, similar a bunch of strings or a bunch of numbers. And Python comes with a built-in applied science chosen "assortment" for simply that purpose. However, as lists are very similar and much more flexible, we prefer to use them, and then yous don't need to worry about this stardom for now.
You create an array in Python like this:
You can read individual pieces of data from inside the array using an alphabetize , merely similar you lot read private characters from inside a string .
As with strings, the index for the start piece of information is 0. The second piece has index 1, and then on.
Arrays can exist extended
But what if you need to add some extra information to an array? Similar strings, arrays come with a bunch of built-in methods. Use the append()
method to add an extra element onto the stop of the array:
Sort the assortment before displaying the results
The assortment is storing the scores in the order they were read from the file. Still, yous still need to sort them and then that the highest scores announced first .
Y'all could sort the array by comparing each of the elements with each of the other elements, and and so swap any that are in the wrong order.
Arrays in Python have a whole host of methods that make many tasks easier.
Permit'southward see which ones might aid.
Brain Barbell
Can you piece of work out which 2 methods you need to employ to allow you to sort the information in the order that you demand?
Brain Barbell Solution
You were to piece of work out which two methods you needed to employ to allow you to sort the data in the lodge that y'all needed.
The sort()
and opposite()
methods look the nearly useful. Y'all need to use reverse()
afterwards you lot sort()
the data, because the default ordering used past sort()
is everyman-to-highest , the opposite of what you need.
Sort the scores from highest to lowest
You now need to add the two method calls into your lawmaking that will sort the array. The lines need to go between the code that reads the data into the listing and before the code that displays the offset three elements:
Geek Bits
Information technology was very uncomplicated to sort an assortment of information using merely ii lines of code. Merely it turns out you can do even better than that if you use an option with the sort()
method. Instead of using these 2 lines:
scores.sort() scores.contrary()
you could accept used just 1, which gives the same outcome: scores.sort(opposite = Truthful)
And the winner is...?
It'due south time for the award ceremony.
The prizes are lined upward and the scores are on the scoreboard. There's just one problem.
Nobody knows which surfer got which score.
You somehow forgot the surfer names
With your rush to catch some waves before the calorie-free is gone, you forgot nearly the other piece of data stored in the results.txt
file: the name of each surfer.
Without the names, you lot tin't possibly know which score goes with which name, then the scoreboard is only half-complete.
The trouble is, your array stores one data detail in each element, non two. Looks similar you however have your work cut out for you lot. In that location'll be no catching waves until this upshot is resolved.
How practise you retrieve y'all can remember the names and the scores for each surfer in the competition?
Once y'all've thought nigh this problem, turn over to Chapter 5 and meet if you can resolve this outcome.
You've got Chapter 4 under your belt. Permit's expect back at what you've learned in this chapter:
Programming Tools
* files - reading data stored on deejay
* arrays - a collection variable that holds multiple data items that can be accessed by index
* sorting - arranging a collection in a specific order
Python Tools
* open() - open a file for processing
* close() - close a file
* for - iterate over something
* string.separate() - cut a string into multiple parts
* [] - the array index operator
* array.suspend() - add together an item to the cease of an array
* array.sort() - sort an assortment, lowest-to-highest
* array.opposite() - change the gild of an array by reversing it
Source: https://www.oreilly.com/library/view/head-first-programming/9780596806682/ch04.html
0 Response to "How to Read a File Line by Line and Sort Each Line in Python"
Enregistrer un commentaire