Building a Word Cloud Using D3.js for Data Analysis with CivicData

I recently came across an incredible resource, Learn JS Data, a guide from the team at Bocoup that teaches the basics of manipulating data using JavaScript in the browser. The guide is a quick read and covers topics ranging from reading in data to more advanced analysis techniques, all using the core JavaScript API and the d3.js library. D3.js is a JavaScript library for manipulating documents based on data, and it is best known for using the full capabilities of modern browsers to produce powerful visualization components.

What I learned through this guide is that d3.js also contains some incredible data analysis tools that simplify many common data operations. Armed with this newfound knowledge, I decided to tackle a data visualization I have been wanting to pursue for a while: the word cloud. I’ve always thought it would be interesting to visualize permit descriptions from a given period of time using a word cloud.

As defined by Wikipedia, a word cloud is:

… a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color.

In the context of building permit descriptions here is what our word cloud will look like once complete:

Word Cloud

This post provides a brief overview of using core JavaScript along with the jQuery and d3.js libraries to develop this visualization. The source code for this example is available on GitHub:

Getting the data

To get started, we need to retrieve the data to analyze for our word cloud. We will be using a Building Permit dataset from Grand Rapids and the CivicData API to retrieve our data. Mark Headd wrote a great post, Getting Started with the CivicData API, which introduces querying data available on CivicData, so I will skip over some of the details of making an ajax request using jQuery.

For our word cloud we are interested in the permit descriptions from the ‘Description’ column in this dataset. Because CivicData offers a simple yet powerful API that lets us query data using SQL, we will use a query that looks like this:

"SELECT \"Description\" FROM \"e741edf8-04ad-450d-bc62-6684a7a427dd\" WHERE \"Issued\" >= '" + dateCalc + "'"

The query returns all of the values in the ‘Description’ column of the dataset where the ‘Issued’ date is on or after the value of the variable dateCalc, which holds a calculated date 12 months in the past.
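As a minimal sketch of how dateCalc and the query string might be put together: the helper names twelveMonthsAgo and buildQuery are hypothetical and do not come from the original source, and the YYYY-MM-DD date format is an assumption about what the ‘Issued’ column accepts.

```javascript
// Hypothetical helper: return the date 12 months before `now` as YYYY-MM-DD.
// The YYYY-MM-DD format is an assumption about the "Issued" column.
function twelveMonthsAgo(now) {
  var d = new Date(now.getTime());          // copy so the caller's date is untouched
  d.setUTCFullYear(d.getUTCFullYear() - 1); // step back one year (12 months)
  return d.toISOString().slice(0, 10);      // keep only the date portion
}

// Hypothetical helper: build the SQL string for a given cutoff date.
function buildQuery(dateCalc) {
  return 'SELECT "Description" FROM "e741edf8-04ad-450d-bc62-6684a7a427dd" ' +
         'WHERE "Issued" >= \'' + dateCalc + '\'';
}

var dateCalc = twelveMonthsAgo(new Date());
var query = buildQuery(dateCalc);

// The query has to be URL-encoded before it is placed in a request URL.
var encodedQuery = encodeURIComponent(query);
console.log(query);
```

Keeping the date calculation separate from the string building makes it easy to change the time window later without touching the SQL.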

Group and summarize the data

When executed, the above query returns a long list of permit descriptions in JSON format, each with a ‘Description’ attribute.


To populate the word cloud we need to calculate a total count of each word used in the descriptions and then sort the words in descending order. Here are the steps we will take:

Combine all descriptions into one long string

We do this to bring all of the descriptions together so they are easier to work with.

var descString = "";
descriptions.forEach(function(d) {
    descString += d.Description + " ";
});
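The same step can also be sketched with map and join. The sample rows below are hypothetical stand-ins for the real API response, and combineDescriptions is an illustrative name rather than something from the original source.

```javascript
// Hypothetical sample rows; real values come from the CivicData response.
var descriptions = [
  { Description: "NEW DECK" },
  { Description: "REPLACE FURNACE" },
  { Description: "NEW GARAGE" }
];

// Pull out each Description and join them with a single space.
function combineDescriptions(rows) {
  return rows.map(function(d) { return d.Description; }).join(" ");
}

var descString = combineDescriptions(descriptions);
console.log(descString); // NEW DECK REPLACE FURNACE NEW GARAGE
```

Unlike the loop version, join does not leave a trailing space, so splitting the result later produces one fewer empty string to filter out.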

Explode the newly formed string into an array so each individual word occupies a separate index.

Once we split the string into an array using the space delimiter, we will also exclude numbers and common words that do not make sense to include in the word cloud. Lastly, to leverage some of the d3.js tools, we will prepare the data as an array of objects, each with an attribute named ‘description’ whose value is the individual word.

var descArray = descString.split(" ");

var descObjects = [];

// isNumeric and matches are helper functions that are not shown here
descArray.forEach(function(d) {
    if (!isNumeric(d) && !matches(d,"AND","OF","TO","","&","ON","-","THE","IN","BE","FOR","A")) {
      var descObject = {};
      descObject.description = d;
      descObjects.push(descObject);
    }
});
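The filtering step relies on two helpers, isNumeric and matches, whose definitions are not shown in this post. The versions below are assumed implementations written for illustration only, and toWordObjects is a hypothetical wrapper name.

```javascript
// Assumed implementation: true when the token parses as a finite number.
function isNumeric(n) {
  return !isNaN(parseFloat(n)) && isFinite(n);
}

// Assumed implementation: true when `word` equals any of the remaining arguments.
function matches(word) {
  var candidates = Array.prototype.slice.call(arguments, 1);
  return candidates.indexOf(word) !== -1;
}

// Hypothetical wrapper: split a string and keep only the meaningful words.
function toWordObjects(descString) {
  var descObjects = [];
  descString.split(" ").forEach(function(d) {
    if (!isNumeric(d) && !matches(d, "AND", "OF", "TO", "", "&", "ON", "-", "THE", "IN", "BE", "FOR", "A")) {
      descObjects.push({ description: d });
    }
  });
  return descObjects;
}

var sample = toWordObjects("NEW DECK ON THE 2 STORY HOUSE");
console.log(sample.map(function(o) { return o.description; }).join(", "));
// NEW, DECK, STORY, HOUSE
```

Note that the comparison is case-sensitive, so this stop-word list only works when the descriptions are stored in upper case.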

Group and summarize the data to get a count of each word and sort it in descending order.

Here is where the power of d3.js comes into play. Using the d3.js function nest, we can group the data by a key in our array of objects, in this case the ‘description’ attribute. Once the data is grouped, we calculate the number of occurrences of each word found in our array. Lastly we use a simple sort function to arrange the words in decreasing order of occurrence.

var wordCount = d3.nest()
  .key(function(d) { return d.description; })
  .rollup(function(v) { return v.length; })
  .entries(descObjects);

wordCount.sort(function(a,b) {
  return b.values - a.values;
});
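To make the grouping step easier to follow, here is a rough plain-JavaScript equivalent that uses an object as a counter instead of d3.nest. It is a sketch, not the code used in this post: countWords is a hypothetical name, and the { key, values } output shape is chosen to mirror what the sort function above reads.

```javascript
// Hypothetical helper: count how often each word appears, most frequent first.
function countWords(descObjects) {
  var counts = Object.create(null); // no inherited keys, so any word is a safe key
  descObjects.forEach(function(d) {
    counts[d.description] = (counts[d.description] || 0) + 1;
  });

  // Convert the counter into an array of { key, values } entries.
  var wordCount = Object.keys(counts).map(function(word) {
    return { key: word, values: counts[word] };
  });

  // Sort in decreasing order of occurrence.
  wordCount.sort(function(a, b) { return b.values - a.values; });
  return wordCount;
}

var wordCount = countWords([
  { description: "NEW" }, { description: "DECK" }, { description: "NEW" }
]);
console.log(wordCount[0].key, wordCount[0].values); // NEW 2
```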

Lastly we will prepare the data in the format the word cloud library expects: an array of arrays, where each inner array holds a word and its corresponding count. We keep only the 250 most frequent words.

var tags = [];

wordCount.forEach(function(d) {
  tags.push([d.key, d.values]);
});

tags = tags.slice(0,250);

Creating the word cloud

To visually represent the word cloud I chose a JavaScript library called wordcloud2.js for its simplicity and flexibility. As you will see below, it expects two parameters: the DOM element of the canvas that will contain the word cloud, and an options object specifying how the word cloud should display. One of the options properties is list, which holds our words and the count for each word, built from the tags array in the code sample above, e.g. [['New',600],['Existing',545]].

WordCloud(document.getElementById('cloud'), {
  gridSize: 12,
  weightFactor: 2,
  rotateRatio: 0.5,
  // scale each count down so the largest words fit on the canvas
  list: tags.map(function(word) { return [word[0], Math.round(word[1]/5)]; }),
  wait: 10
});
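The scaling applied to the list can be tried on its own before drawing anything. In this sketch toCloudList is a hypothetical helper name and the sample counts are made up; the WordCloud call is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: divide each count by `divisor` and round to a whole number.
function toCloudList(tags, divisor) {
  return tags.map(function(word) {
    return [word[0], Math.round(word[1] / divisor)];
  });
}

// Made-up sample counts for illustration.
var tags = [["New", 600], ["Existing", 545], ["Deck", 2]];
var list = toCloudList(tags, 5);
console.log(JSON.stringify(list)); // [["New",120],["Existing",109],["Deck",0]]

// Only draw when running in a page that has loaded wordcloud2.js.
if (typeof WordCloud !== "undefined" && typeof document !== "undefined") {
  WordCloud(document.getElementById("cloud"), {
    gridSize: 12,
    weightFactor: 2,
    rotateRatio: 0.5,
    list: list,
    wait: 10
  });
}
```

One thing the sample shows is that words with very small counts round down to a weight of 0, so with a divisor of 5 they will likely be too small to appear. A smaller divisor or a minimum weight would keep them visible.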

And that’s it, we now have a word cloud generated in real time from data hosted on CivicData. Click here to check out the live word cloud in action!
