Word/Count Dict (Part 1)

No BS: Given a .txt file, count the occurrences of each word (ignoring case) and do one of the following:

  1. Print a full list sorted by alphabetical order.
  2. Print a list of the top 20 most common words.

Taken from Google’s Python class: https://developers.google.com/edu/python/dict-files (bottom of the page).

Once I’ve figured out how to display code snippets correctly, I will upload my solution to this exercise. There is likely a quicker way to do it, but that’s not the point: this is to showcase what I’ve learned in a week or so of online tutorials.

 

With BS: As with any programming problem, there are a number of ways you could approach this. Google is even nice enough to provide a few hints on how to tackle it; they suggest the following:

  • Make use of str.split() to split the text on all whitespace.
    • Note that this does not strip punctuation attached to a word, e.g. ‘Alice sat down.’ would give ‘down.’ (full stop included) as a single token.
  • Define a helper function to avoid code duplication (very handy)
    • This is a chance to try out user-defined functions: we can define a small function that produces the word/count dictionary, e.g. def utility(filename)
    • With this done, we can create two other functions that call this utility function.
  • Don’t build the whole program at once
    • Again, very handy: once one piece gives the result you expect, you can build and test the next step from there.
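To see what the str.split() hint and its punctuation caveat look like in practice, here is a small sketch. The sample text is made up, and the final clean-up using string.punctuation is an optional extra of my own, not one of Google’s hints.

```python
import string

# str.split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines) and never returns empty strings.
text = "Alice sat down.\nThen  she\tsaid: 'Hello!'"
words = text.split()
print(words)
# ['Alice', 'sat', 'down.', 'Then', 'she', 'said:', "'Hello!'"]

# Punctuation stays attached, so 'down.' and 'down' would be counted as
# different words. Optional extra (an assumption, not Google's hint):
# strip punctuation from both ends of each token.
cleaned = [w.strip(string.punctuation) for w in words]
print(cleaned)
# ['Alice', 'sat', 'down', 'Then', 'she', 'said', 'Hello']
```

Because strip() only touches the ends of a token, a word like "don't" keeps its apostrophe; a token made only of punctuation (e.g. "--") would become an empty string and would need filtering out.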

There are a few logical steps to getting the output that we require, namely:

  1. Get each unique word (ignoring case).
  2. Count how many times that word appears.
  3. Essentially build a collection of (word, num_of_appearances) pairs; in this case, a dictionary that maps each word to its count.
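Steps 1 to 3 could be sketched as a single helper along these lines. This is a minimal sketch rather than my finished solution: the name word_count_dict is hypothetical (it plays the role of the utility function mentioned above), the sample file contents are made up, and it follows Google’s split-only hint, so punctuation stays attached (note ‘dog.’ in the result).

```python
import os
import tempfile


def word_count_dict(filename):
    """Hypothetical helper: map each lower-cased word in the file to its count."""
    counts = {}
    with open(filename) as f:
        for line in f:
            for word in line.split():    # split on all whitespace
                word = word.lower()      # step 1: ignore case
                if word in counts:       # step 2: count appearances
                    counts[word] += 1
                else:
                    counts[word] = 1
    return counts                        # step 3: the word -> count dict


# Quick check with a throwaway file (contents made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The cat saw the dog.\nthe END\n")
    path = tmp.name
counts = word_count_dict(path)
os.remove(path)
print(counts)
# {'the': 3, 'cat': 1, 'saw': 1, 'dog.': 1, 'end': 1}
```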

This forms the core of our utility function since it is a common component of both of our objectives. In other words, we need to carry out steps 1, 2, and 3 before we can manipulate the data any further.
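Once the helper exists, both objectives are mostly a matter of sorting. In the sketch below, print_words(filename) and print_top(filename) are the names used in Google’s starter file (wordcount.py); word_count_dict and top_words are hypothetical names of my own, and breaking ties alphabetically in the top-20 list is also my own choice rather than part of the exercise.

```python
import os
import tempfile


def word_count_dict(filename):
    """Hypothetical helper: map each lower-cased word in the file to its count."""
    counts = {}
    with open(filename) as f:
        for word in f.read().split():
            word = word.lower()
            counts[word] = counts.get(word, 0) + 1
    return counts


def print_words(filename):
    """Objective 1: print every word and its count in alphabetical order."""
    counts = word_count_dict(filename)
    for word in sorted(counts):  # sorted() on a dict sorts its keys
        print(word, counts[word])


def top_words(counts, n=20):
    """Return the n most common (word, count) pairs, ties broken alphabetically."""
    # Negating the count sorts highest first; the word is the tie-breaker.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def print_top(filename):
    """Objective 2: print the top 20 most common words, most common first."""
    for word, count in top_words(word_count_dict(filename)):
        print(word, count)


# Demo with a throwaway file (contents made up for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("b a c b\nB c\n")
    path = tmp.name
print_words(path)  # a 1 / b 3 / c 2 (one pair per line)
print_top(path)    # b 3 / c 2 / a 1 (one pair per line)
os.remove(path)
```

Pulling the sorting into top_words keeps the printing separate from the logic, which also makes the "don’t build it all at once" hint easier to follow: each piece can be checked on a small dictionary before touching a real file.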

 
