Boston, households, and entropy.

How can we see and understand human diversity in new and more profound ways?

What can analysis of the household units that make up a city such as Boston tell us about the hidden dynamics of diversity?

To address these questions, we examined the variability of several population characteristics in Boston, not only for the city as a whole but also for groups of people who share a dwelling.

The data visualizations below show different types of Boston households by composition, spotlighting three characteristics: race, language spoken, and birthplace. To provide a new lens on the diversity of a household, we calculate its “information entropy.” Here, entropy measures how rare it is to find a specific combination of people living together in our data, scaled by the number of people in the household. The higher the entropy, the higher the “informational” content and the greater the degree of disorder in that household. By highlighting the households with more entropy, we offer a more intimate, fine-grained perspective on Boston’s diversity.

For example, in the race examples below, a household of eight white people has the highest entropy: such households are very rare, and entropy also grows with household size. By contrast, the household type with the lowest entropy for racial composition is a single white person, a relatively common arrangement, followed by a single Asian person.

These are some of the initial experiments for the “Diversity Explorer” project for the Boston Research Center at Northeastern University, led by faculty members Pedro Cruz (Art+Design) and John Wihbey (Journalism). Graduate research assistants: Alexa Gagosz (Journalism), Avni Ghael (Computer Science), and Aashish Singh (Computer Science). Data: anonymized microdata from the 2016 American Community Survey (ACS), 1% sample.

Here is a bit more on the data analysis and calculations behind the visualizations. If S is a household of three people in which two speak English and one speaks Spanish, then S = {e, e, s}, and its cardinality is |S| = 3. In general, we define the entropy of S, $H(S)$, in terms of the natural logarithm of $P_S$:

$$H(S) = -|S| \cdot \ln(P_S)$$

Here, $P_S$ is the probability that a household in the dataset contains the configuration S, that is, that the combination of people in S appears within it, possibly alongside other people. For example, if household W = {e, s, e, s}, then S ⊂ W: the configuration of S (two English speakers and one Spanish speaker) occurs in W.
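A minimal Python sketch of this calculation follows. It assumes one reading of $P_S$: the share of households whose members include S as a sub-multiset (S ⊆ W), with every household weighted equally. Real ACS microdata carries survey weights, which this sketch ignores. The helper names (`contains`, `probability`, `entropy`) and the toy households are hypothetical, not the project's code or data.

```python
from collections import Counter
from math import log


def contains(household: Counter, group: Counter) -> bool:
    """Multiset inclusion: True if `household` has at least as many people
    in every category as `group` does (S ⊆ W)."""
    return all(household[category] >= n for category, n in group.items())


def probability(group, households):
    """P_S (assumed reading): the share of households, each weighted equally,
    whose composition contains `group`."""
    g = Counter(group)
    hits = sum(1 for h in households if contains(Counter(h), g))
    return hits / len(households)


def entropy(group, households):
    """H(S) = -|S| * ln(P_S), using the natural logarithm."""
    p = probability(group, households)
    if p == 0:
        raise ValueError("group does not occur in any household")
    return -len(group) * log(p)


# Hypothetical toy data: language spoken by each person
# ("e" = English, "s" = Spanish), one list per household.
households = [
    ["e"],
    ["e", "e"],
    ["e", "s", "e", "s"],
    ["e", "e", "s"],
    ["s"],
]

S = ["e", "e", "s"]
print(probability(S, households))        # 0.4 (2 of 5 households contain S)
print(round(entropy(S, households), 3))  # 2.749

# Rank every distinct household type by entropy, lowest first.
types = {tuple(sorted(h)) for h in households}
for t in sorted(types, key=lambda t: entropy(list(t), households)):
    print(t, round(entropy(list(t), households), 3))
```

On this toy data the single English speaker {e} has the lowest entropy (about 0.223) and the four-person type {e, e, s, s} the highest (about 6.438): it is both the rarest configuration and the largest, and the |S| factor amplifies that rarity.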

The visualizations below show every household type in our dataset, highlighting each type's information entropy with respect to race, language spoken, or birthplace. White bars on a scale from 0 to 70+ show the normalized age distribution of the people living in each household type.
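The age bars could be computed along these lines. This Python sketch assumes 10-year bins ending in an open-ended 70+ bin and treats “normalized” as each bin's share of persons (shares sum to 1); both choices, the function names, and the sample ages are illustrative assumptions.

```python
from collections import Counter

# Assumed binning: 10-year bins up to 69, then an open-ended "70+" bin.
BIN_LABELS = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]


def age_bin(age: int) -> str:
    """Map a non-negative age in years to its bin label."""
    return BIN_LABELS[min(age // 10, len(BIN_LABELS) - 1)]


def normalized_age_distribution(ages):
    """Share of persons falling in each age bin; the shares sum to 1."""
    counts = Counter(age_bin(a) for a in ages)
    return {label: counts[label] / len(ages) for label in BIN_LABELS}


# Hypothetical ages of everyone living in one household type.
ages = [4, 7, 35, 38, 41, 72]
dist = normalized_age_distribution(ages)
for label, share in dist.items():
    # Draw a crude text bar, 30 characters wide at a share of 1.
    print(f"{label:>6} {'#' * round(share * 30)}")
```

Normalizing by the number of persons, rather than using raw counts, keeps the bars comparable across household types that occur with very different frequencies.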

The visualization also uses stylized design elements to make the notion of diversity vivid, helping viewers see diversity at a glance across a wide range of household arrangements. The clusters of colored, entangled lines above each household are inspired by the image of folded strands of DNA; each household type maps to its own distinct set of entangled strands.

The number of strands equals the number of people in the household (its cardinality), and each strand's color encodes that person's value for the characteristic shown (race, language, or birthplace). The dispersion of the entangled lines increases with the household's entropy.
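The visual encoding just described can be summarized as a small mapping from a household type to drawing parameters. In this hypothetical Python sketch, the palette, the `StrandCluster` type, and the linear rescaling of entropy into a dispersion value between 0 and 1 are all assumptions; any monotonically increasing mapping from entropy to dispersion would match the description equally well.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical palette for the language characteristic.
PALETTE: Dict[str, str] = {"e": "#4e79a7", "s": "#f28e2b"}


@dataclass
class StrandCluster:
    colors: List[str]   # one strand per person, colored by that person's value
    dispersion: float   # 0.0 = tightly packed, 1.0 = most spread out


def encode_household(household: List[str], h: float, h_min: float, h_max: float) -> StrandCluster:
    """Map a household type and its entropy h to drawing parameters.

    The number of strands equals |S|; dispersion rescales h linearly from the
    range [h_min, h_max] observed across the displayed household types to [0, 1].
    """
    span = h_max - h_min
    dispersion = (h - h_min) / span if span > 0 else 0.0
    return StrandCluster(colors=[PALETTE[c] for c in household], dispersion=dispersion)


cluster = encode_household(["e", "e", "s"], h=2.75, h_min=0.0, h_max=5.5)
print(len(cluster.colors), cluster.dispersion)  # 3 0.5
```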