No BS: Given a .txt file, count how many times each word appears (ignoring case) and then do one of the following:
- Print a full list sorted by alphabetical order.
- Print a list of the top 20 most common words.
Taken from Google’s Python class: https://developers.google.com/edu/python/dict-files (bottom of the page).
Once I’ve figured out how to display code snippets correctly, I will upload what I have done for this exercise. There is likely a quicker way to do it, but that’s not the point: this is to showcase what I’ve learned in a week or so of online tutorials.
With BS: As with any programming problem, there are a number of ways you could approach this. Google is even nice enough to provide a few hints on how to tackle it. They suggest the following:
- Make use of str.split() to split the text on all whitespace.
- Note that this does not strip punctuation attached to a word, e.g. ‘Alice sat down.’ would give ‘down.’ (full stop included) as a single string.
- Define a helper function to avoid code duplication (very handy)
- This lets us try out user-defined functions: we can define a small function that builds the word/count dictionary, e.g. def utility(filename)
- With this done, we can create two other functions that call this utility function, one for each kind of output.
- Don’t build the whole program at once
- Again, very handy: once one piece gives the result you expect, you can test and build from there.
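To see the str.split() caveat for yourself, here is a minimal sketch. The sample sentence is made up, and stripping with string.punctuation is my own assumption about one way to tidy things up; it is not one of Google's hints.

```python
import string

# Made-up sample line, just for illustration.
line = "Alice sat down. Down the hole!"

# split() with no arguments splits on any run of whitespace.
words = line.split()

# Lowercasing handles the "ignoring case" part,
# but punctuation is still attached to the words.
lowered = [w.lower() for w in words]

# Optional extra (an assumption, not part of the hints):
# strip leading/trailing punctuation from each word.
cleaned = [w.strip(string.punctuation) for w in lowered]

print(words)
print(lowered)
print(cleaned)
```

Without the last step, ‘down.’ and ‘down’ would be counted as two different words.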
There are a few logical steps to getting the output that we require, namely:
- Get a list of each unique word (ignoring case)
- Count how many times that word appears.
- Essentially, build a collection of (word, num_of_appearances) pairs; in this case, a dictionary mapping each word to its count.
This forms the core of our utility function, since both of our objectives depend on it. In other words, we need to carry out all three steps above before we can manipulate the data any further.
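As a rough idea of how this could fit together, here is a minimal sketch in Python 3. The names print_words and print_top, the n parameter, and the utf-8 encoding are my own assumptions for illustration; only the idea of a shared utility function comes from the hints. Punctuation is left attached to words here, as noted earlier.

```python
def utility(filename):
    """Return a dict mapping each lowercased word in the file to its count."""
    counts = {}
    # The encoding is an assumption; adjust it to match your file.
    with open(filename, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                word = word.lower()
                # get() returns 0 the first time we see a word.
                counts[word] = counts.get(word, 0) + 1
    return counts


def print_words(filename):
    """Print every word and its count, in alphabetical order."""
    counts = utility(filename)
    for word in sorted(counts):
        print(word, counts[word])


def print_top(filename, n=20):
    """Print the n most common words, most frequent first."""
    counts = utility(filename)
    # Sort the (word, count) pairs by count, largest first.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for word, count in ranked[:n]:
        print(word, count)
```

Both printing functions start by calling utility(), so the counting logic only lives in one place.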