Word/Count Dict (Part 2)

No BS: The previous code didn’t strip out punctuation, which left us with weird duplicates of words. For example, “the”, “the–”, and “(the” would all be counted as separate words. To fix this, I added a line of code that strips out the punctuation:

out = s.translate(str.maketrans('', '', string.punctuation))

I then sorted the words based on the output, out, which gave us a neat dictionary with no weird punctuation.

You need to import the string module at the start of your code.

import string
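To see the fix in action, here is a minimal sketch. The helper name strip_punctuation and the sample words are my own assumptions for illustration, not part of the original program.

```python
import string

def strip_punctuation(s):
    """Return s with every ASCII punctuation character removed.

    strip_punctuation is a hypothetical helper name used for this sketch.
    """
    return s.translate(str.maketrans('', '', string.punctuation))

# Three variants of the same word, using ASCII punctuation only.
variants = ["the", "the--", "(the"]
print([strip_punctuation(w) for w in variants])  # ['the', 'the', 'the']
```

One thing to watch: string.punctuation only covers ASCII characters, so typographic characters such as an en dash or curly quotes are left in place by this approach.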

 

With BS: Truth be told, I actually just looked this up online, and it wasn’t incredibly hard to find. What it does is reduce the duplicates that we get in our final dictionary. It has to run on every single word in the text, and translate looks at every character of each word, so it’s going to be O(n) at the very least, where n is the number of characters in our text. Don’t even get me started on the for/while loops we have later on.

The output of this piece of code is simply out, which lets us check:

  1. Is out already in the dictionary? If not, add it to the dictionary and set its counter to one.
  2. If it is already in the dictionary, just add one to its counter.
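The two checks above can be sketched roughly like this. The function name count_words, the whitespace split, and the rule that skips empty results are assumptions on my part; the original loop may look different.

```python
import string

def count_words(text):
    """Count words in text after stripping ASCII punctuation.

    count_words is a hypothetical name; splitting on whitespace is assumed.
    """
    # Build the translation table once instead of once per word.
    table = str.maketrans('', '', string.punctuation)
    counts = {}
    for s in text.split():
        out = s.translate(table)
        if not out:
            # Assumption: a token made only of punctuation is not a word.
            continue
        if out not in counts:
            counts[out] = 1   # step 1: first time we see this word
        else:
            counts[out] += 1  # step 2: seen before, add one
    return counts

print(count_words("the cat (the dog) the--"))  # {'the': 3, 'cat': 1, 'dog': 1}
```

Building the table outside the loop is a small design choice: the table never changes, so there is no need to rebuild it for every word.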

What do the rest of the pieces of the code do?

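str.maketrans(x, y)

Returns a translation table that is suitable for use in str.translate(). It has two inputs here: x, the characters to map from, and y, the characters to map to. The two strings must be the same length.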

For our example, we’re not actually interested in mapping any characters to any other characters. Because of this, our two inputs are just empty strings, so our maketrans statement is

str.maketrans('', '')
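As a quick illustration of what the two-input form does, the sketch below compares the empty table with a table that actually maps characters. The variable names and sample strings are made up for the example.

```python
# With two empty strings, the table is empty and translate changes nothing.
empty = str.maketrans('', '')
print(empty)                     # {}
print("the--".translate(empty))  # the--

# With non-empty inputs, each character in the first string is mapped to
# the character at the same position in the second string.
swap = str.maketrans('abc', 'xyz')
print("cab".translate(swap))     # zxy
```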

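But if you look closely, our call also includes this weird string.punctuation part, too. This is because str.maketrans has an optional third input.

str.maketrans(x[, y[, z]])

Here, x and y are both '', so if we include another ',' inside maketrans(), we can also pass a string z of characters that we would like deleted. Every character in z is mapped to None in the table, and str.translate drops those characters as it works through the string. (The translate(table[, deletechars]) form, with a separate deletechars input, is the Python 2 version; in Python 3, str.translate takes only the table.)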

In our case, we will pull this from the string module, and it is simply a single string containing all the ASCII punctuation characters. Hence our final chunk of code that strips out punctuation is:

out = s.translate(str.maketrans('', '', string.punctuation))
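If you want to peek inside the table that this line builds, a short sketch like the following may help. The variable name table and the sample sentence are my own choices.

```python
import string

table = str.maketrans('', '', string.punctuation)

# The table is a plain dict keyed by character code (the result of ord()).
print(type(table))                            # <class 'dict'>

# Each punctuation character is mapped to None, which means "delete it".
print(table[ord('!')])                        # None

# There is exactly one entry per punctuation character.
print(len(table) == len(string.punctuation))  # True

print("Hello, world!".translate(table))       # Hello world
```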
