Published on Sep 27, 2009

Counting words in Os Lusíadas

I’ve decided to start experimenting with text analysis. As a starting point, the analysis itself is very raw: I limited myself to analyzing word frequency in the text.

The text chosen was a very well-known Portuguese epic poem, Os Lusíadas. I chose this poem mainly as a provocation aimed at what seems to be a banalized intellectual status among the inhabitants of the Kingdom. The result is a collection of 10 static pieces, one for each of the 10 most frequent words in the poem. I wanted to keep the graphical output as simple and elegant as my knowledge allowed.

Os Lusíadas is a Portuguese epic poem by Luís Vaz de Camões first printed in 1572.
The poem consists of ten cantos and 1102 stanzas.
On the left are the ten most frequent words in the poem, in descending order of occurrence.
This piece showcases one of those ten words.
Above is an area that directly represents the frequency of that word in each canto.
Each canto has a corresponding list of the ten most frequent words in that canto, sorted in descending order of occurrence.
The length of the vertical lines for each canto represents that canto’s length in number of verses.
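
To make the per-canto figures concrete, here is a minimal sketch in Python of how they could be computed. It is not the code behind the piece. The function name `canto_profile` is hypothetical, and it assumes the poem has already been split into one string per canto, with one verse per line and headings and stanza numbers removed.

```python
import re

def canto_profile(cantos, word):
    """For each canto, return (occurrences of `word`, number of verses).

    Assumption: `cantos` is a list of strings, one per canto, where every
    non-empty line is a verse.
    """
    target = word.lower()
    profile = []
    for canto in cantos:
        # Words are runs of Unicode letters, compared in lowercase.
        tokens = re.findall(r"[^\W\d_]+", canto.lower())
        # Blank lines (stanza breaks) are not counted as verses.
        verses = [line for line in canto.splitlines() if line.strip()]
        profile.append((tokens.count(target), len(verses)))
    return profile
```

The first number of each pair would drive the area for the showcased word, and the second the length of that canto’s vertical lines.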

The counted words were filtered before presentation by two criteria. First, no word with fewer than 10 occurrences in the whole poem was taken into account. Second, as you can imagine, the most common words in the Portuguese language, such as adverbs and pronouns, weren’t considered. Some verbs and other words with no specific relevance for extrapolating any concept weren’t taken into account either.
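
The two filters can be sketched in Python as follows. This is only an illustration under stated assumptions: `STOP_WORDS` is a tiny, hypothetical stand-in for the hand-picked list of ignored words (the real list isn’t reproduced here), and `tokenize` and `top_words` are names invented for the example.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stop list; the actual list of ignored words was hand-picked.
STOP_WORDS = {"a", "o", "e", "de", "que", "as", "os", "se", "não", "em"}

def tokenize(text):
    """Return the lowercase words of `text` (runs of Unicode letters)."""
    # NFC keeps accented letters such as "í" as single characters.
    normalized = unicodedata.normalize("NFC", text)
    return re.findall(r"[^\W\d_]+", normalized.lower())

def top_words(text, stop_words=STOP_WORDS, min_count=10, limit=10):
    """Return up to `limit` (word, count) pairs, most frequent first.

    Words in `stop_words` are skipped, and so are words that occur
    fewer than `min_count` times in the whole text.
    """
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_count]
    return frequent[:limit]
```

Calling `top_words(poem_text)` on the full text would give the list for the whole poem; calling it on a single canto, possibly with a lower `min_count`, would give that canto’s list.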

As you can see, ten was the magic number chosen for this composition. Each stanza is composed of eight verses. The result was exported as PDF too. I’ll get this piece printed for sure!

As a matter of curiosity, here is a previous study that led to the final concept. This study displays each of the 50 most frequent words in Os Lusíadas along with the position of each occurrence in the text. Here are the first 21.
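
Recording where each occurrence falls can be sketched like this in Python. It is a rough illustration rather than the study’s actual code: `word_positions` is a hypothetical name, and a position is assumed to mean the word’s index in the running text, counting from zero.

```python
import re
from collections import defaultdict

def word_positions(text):
    """Map each lowercase word to the list of its word indices in `text`."""
    positions = defaultdict(list)
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    for index, word in enumerate(tokens):
        positions[word].append(index)
    return dict(positions)
```

Dividing each index by the total number of words would place every occurrence on a 0 to 1 scale, which is convenient for drawing marks along a line that stands for the whole poem.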