How To Guide:
Intro to Visualizations

Scatterplot

Code snippets from http://alignedleft.com/tutorials/d3/making-a-scatterplot. Explanations are mine.

Step 1: Create the SVG element

This selects the HTML element 'body', attaches an 'svg' element to it, and assigns the height and width to that 'svg' element.

//Create SVG element
    var svg = d3.select("body")
      .append("svg")
      .attr("width", w)
      .attr("height", h);

Step 2: Create the circles

selectAll("circle") selects all circles in the svg element. If there are none, it is an empty set, which is the case here. From there, the data is joined to the set with data(dataset).
enter() signifies the set of data points that are new to the join. append("circle") then creates a new circle for each of the data points, assigning them the default attributes.

svg.selectAll("circle")
   .data(dataset)
   .enter()
   .append("circle")

Step 3: Set attributes of the circles

Next, set the attributes of each circle as desired to represent your data points. r is the radius and controls the size of the circle, while cx and cy are the coordinates of the center of the circle and determine its position on the graph.
You don't have to vary r, but sizing the circles is a helpful way to represent an additional data value.

.attr("cx", function(d) {
    return d[0];
})
.attr("cy", function(d) {
    return d[1];
})
.attr("r", 5);
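
As a rough illustration of what those accessor functions do, the sketch below uses a plain map() call in place of attr(). Each function receives one data point d (an [x, y] pair) and returns the value for that attribute. The circles array is only a stand-in for the real SVG elements.

```javascript
var dataset = [[5, 20], [480, 90], [250, 50]];

// Stand-in for the attr() calls: build one plain object per data point.
var circles = dataset.map(function(d) {
  return {
    cx: d[0], // first value in the pair becomes the x position
    cy: d[1], // second value becomes the y position
    r: 5      // constant radius, same as attr("r", 5)
  };
});

console.log(circles[1]); // { cx: 480, cy: 90, r: 5 }
```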

Demo 1

Here is the working code so far and the chart it generates.

//Width and height
var w = 500;
var h = 100;

var dataset = [[5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
               [410, 12], [475, 44], [25, 67], [85, 21], [220, 88]];

//Create SVG element
var svg = d3.select("body")
    .append("svg")
    .attr("width", w)
    .attr("height", h);

svg.selectAll("circle")
   .data(dataset)
   .enter()
   .append("circle")
   .attr("cx", function(d) {
   		return d[0];
   })
   .attr("cy", function(d) {
   		return d[1];
   })
   .attr("r", 5);

However, this doesn't look that great. It's just a bunch of floating dots right now.
It has no background, no labels, and the dots are all the same size. It would be more meaningful if we had differing circle sizes (and the data to represent with that).

Step 4: Change the circle size

The first thing we can do is alter the size of the circles. Normally, you'd only do this when you have some data to represent with the size of the circles. It would typically be bound to them at the time they are created, but could also be updated later if needed.
For this example, there is no extra data value, so the radius is simply calculated from each point's y value as the square root of h - d[1].

.attr("r", function(d) {
    return Math.sqrt(h - d[1]);
});
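
To get a feel for the sizes this formula produces, here is a small sketch that runs it on three of the points from the dataset, using h = 100 as in the demos. It is plain JavaScript with no D3 involved.

```javascript
var h = 100;
var dataset = [[5, 20], [480, 90], [330, 95]];

// Same calculation as the attr("r", ...) function above.
var radii = dataset.map(function(d) {
  return Math.sqrt(h - d[1]);
});

// Rounded to two decimals for display.
var rounded = radii.map(function(r) {
  return r.toFixed(2);
});
console.log(rounded); // [ '8.94', '3.16', '2.24' ]
```

Points with a smaller y value get a larger circle, and the same input always gives the same radius.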

Step 5: Add labels to circles

Since we don't have axes to label the graph, the first option is to add text labels. This code actually labels the points with their coordinates. attr() is used a number of times here to assign both data values and styles.

svg.selectAll("text")
  .data(dataset)
  .enter()
  .append("text")
  .text(function(d) {
    return d[0] + "," + d[1];
  })
  .attr("x", function(d) {
    return d[0];
  })
  .attr("y", function(d) {
    return d[1];
  })
  .attr("font-family", "sans-serif")
  .attr("font-size", "11px")
  .attr("fill", "red");

Demo 2

This next demo now shows the labels that were just added.

//Width and height
var w = 500;
var h = 100;

var dataset = [[5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
               [410, 12], [475, 44], [25, 67], [85, 21], [220, 88]];

//Create SVG element
var svg = d3.select("body")
  .append("svg")
  .attr("width", w)
  .attr("height", h);

svg.selectAll("circle")
  .data(dataset)
  .enter()
  .append("circle")
  .attr("cx", function(d) {
   	return d[0];
  })
  .attr("cy", function(d) {
   	return d[1];
  })
  .attr("r", function(d) {
   	return Math.sqrt(h - d[1]);
  });

svg.selectAll("text")
  .data(dataset)
  .enter()
  .append("text")
  .text(function(d) {
   	return d[0] + "," + d[1];
  })
  .attr("x", function(d) {
   	return d[0];
  })
  .attr("y", function(d) {
   	return d[1];
  })
  .attr("font-family", "sans-serif")
  .attr("font-size", "11px")
  .attr("fill", "red");

It's getting better, but now we have a new problem where the circles are extending beyond the graph.

Step 5: Scales

There are two parts to a scale: the input domain (the span of data values going in) and the output range (the span of values coming out, typically pixels). Here is an example of setting up a static scale:

var scale = d3.scale.linear()
  .domain([100, 500])
  .range([10, 350]);
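
To show the arithmetic behind a scale like this, here is a hand-rolled sketch in plain JavaScript. The linearScale helper is hypothetical and is not D3's implementation. It just does the same kind of linear mapping from the domain to the range.

```javascript
// Hypothetical helper: maps a value from the domain onto the range.
function linearScale(domain, range) {
  return function(value) {
    // How far along the domain the value sits (0 at the start, 1 at the end).
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    // The same fraction of the way along the range.
    return range[0] + t * (range[1] - range[0]);
  };
}

var scale = linearScale([100, 500], [10, 350]);

console.log(scale(100)); // 10
console.log(scale(300)); // 180
console.log(scale(500)); // 350
```

An input of 300 is halfway through the domain, so it comes out halfway through the range.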

Step 6: Set dynamic scale

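A dynamic scale is set based on the data input. This is ideal, especially since one of the main reasons for using D3 is to make dynamic visualizations! The code here uses d3.max() with an accessor function to find the largest x value (d[0]) and the largest y value (d[1]) in the dataset, and uses each as the upper end of the matching domain. The ranges are set to the width and height of the SVG.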

var xScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[0]; })])
  .range([0, w]);
  
var yScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([0, h]);
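
Here is a rough plain-JavaScript sketch of what those two scales work out to for the dataset used in the demos. The maxOf and linearScale helpers are hypothetical stand-ins for d3.max() and d3.scale.linear(). They are simplified and skip the extra handling that D3 does (for example, for missing values).

```javascript
var w = 500;
var h = 100;
var dataset = [[5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
               [410, 12], [475, 44], [25, 67], [85, 21], [220, 88]];

// Simplified stand-in for d3.max(data, accessor).
function maxOf(data, accessor) {
  return Math.max.apply(null, data.map(accessor));
}

// Simplified stand-in for d3.scale.linear().domain(...).range(...).
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var xMax = maxOf(dataset, function(d) { return d[0]; });
var yMax = maxOf(dataset, function(d) { return d[1]; });
console.log(xMax, yMax); // 480 95

var xScale = linearScale([0, xMax], [0, w]);
var yScale = linearScale([0, yMax], [0, h]);

console.log(xScale(480)); // 500 (largest x lands on the right edge)
console.log(xScale(240)); // 250
console.log(yScale(95));  // 100 (largest y lands on the bottom edge)
```

If the dataset changes, the maximums change with it, and the scales adjust without any edits to the code.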

Alter the code where the circles were created.

Now that the scale is dynamic, the code for creating the circles needs to be altered. It used the raw data values as pixel positions and should now pass those values through the scale functions instead.
The scales here are purely linear, but it's also possible to set up a scale that compresses extreme values (a non-linear scale, for example) so that outliers don't dominate the view.

Replace this code:

.attr("cx", function(d) {
  return d[0];
})

.attr("cy", function(d) {
  return d[1];
})

With this code:

.attr("cx", function(d) {
  return xScale(d[0]);
})

.attr("cy", function(d) {
  return yScale(d[1]);
})

Alter the code where the text labels were generated.

These also need to be updated to use the same scale functions as the circles. If not, the labels would be positioned with the raw data values while the circles use the scaled values, and the text would end up in a different place than the circle it belongs to.

Replace this code:

.attr("x", function(d) {
  return d[0];
})
.attr("y", function(d) {
  return d[1];
})

With this code:

.attr("x", function(d) {
  return xScale(d[0]);
})
.attr("y", function(d) {
  return yScale(d[1]);
})

Demo 3

This is the graph, now updated for dynamic scaling.

//Width and height
var w = 500;
var h = 100;

var dataset = [[5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
               [410, 12], [475, 44], [25, 67], [85, 21], [220, 88]];

//Create scale functions
var xScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[0]; })])
  .range([0, w]);

var yScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([0, h]);

//Create SVG element
var svg = d3.select("body")
  .append("svg")
  .attr("width", w)
  .attr("height", h);

svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
  return xScale(d[0]);
})
.attr("cy", function(d) {
  return yScale(d[1]);
})
.attr("r", function(d) {
  return Math.sqrt(h - d[1]);
});

svg.selectAll("text")
  .data(dataset)
  .enter()
  .append("text")
  .text(function(d) {
   	return d[0] + "," + d[1];
  })
  .attr("x", function(d) {
   	return xScale(d[0]);
  })
  .attr("y", function(d) {
   	return yScale(d[1]);
  })
  .attr("font-family", "sans-serif")
  .attr("font-size", "11px")
  .attr("fill", "red");

Interestingly, the problem where the circles are extending beyond the graph has actually now gotten worse. This is because the scales map the largest values to the very edge of the SVG, and that position is used for the center point of the circle. Any circle centered on an edge is cut in half, and its label runs outside the svg element.

Step 7: "Correct" the origin

This step flips the chart so that the smaller values are at the bottom instead of the top. This is much simpler than the correction for a bar graph.

This can be fixed by changing yScale from .range([0, h]); to .range([h, 0]);.

You can see that now the lower values are at the bottom instead of the top.
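
To see why swapping the range flips the chart, here is a small sketch using a hand-rolled linear map. The linearScale helper is hypothetical, not D3's own code, and the domain maximum of 95 comes from the dataset used in the demos.

```javascript
var h = 100;

// Hypothetical helper standing in for d3.scale.linear().
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Original orientation: small values near the top (y = 0 is the top in SVG).
var yScale = linearScale([0, 95], [0, h]);
// Flipped orientation: small values near the bottom.
var yScaleFlipped = linearScale([0, 95], [h, 0]);

console.log(yScale(0), yScaleFlipped(0));   // 0 100
console.log(yScale(95), yScaleFlipped(95)); // 100 0
```

Nothing about the data changes. Only the output end of the scale is reversed.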

However, we still need to fix the edges of the graph.

Step 8: Build in margins

To keep the circles and labels from running off the edges, we use a margin, or padding. This pushes the graph area inward so there's extra space on the edges. It can take a bit of testing to figure out the appropriate amount of space to use, and varying data can affect how much is needed as well.

Add in a padding variable: var padding = 20;

Change the ranges for xScale and yScale to now account for the padding.

range([0, w]) becomes .range([padding, w - padding]);

range([h, 0]) becomes .range([h - padding, padding]);

Now, for anything still running off the SVG, you will need to adjust the margin. Rendering this would show that the right side is all that's still being cut off with this dataset. This can be corrected by multiplying the padding by 2. xScale now becomes .range([padding, w - padding * 2]);.
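
As a quick check of what the padded ranges do, the sketch below works out where the extreme values land. It uses a hypothetical hand-rolled linearScale helper rather than D3, with the domain maximums (480 and 95) taken from the dataset in the demos.

```javascript
var w = 500;
var h = 100;
var padding = 20;

// Hypothetical helper standing in for d3.scale.linear().
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var xScale = linearScale([0, 480], [padding, w - padding * 2]);
var yScale = linearScale([0, 95], [h - padding, padding]);

console.log(xScale(0));   // 20  (left edge of the plot area)
console.log(xScale(480)); // 460 (right edge of the plot area)
console.log(yScale(0));   // 80  (bottom of the plot area)
console.log(yScale(95));  // 20  (top of the plot area)

// Space left to the right of the rightmost point for its label.
var rightGap = w - xScale(480);
console.log(rightGap); // 40
```

The doubled padding on the right leaves 40 pixels past the last point, which is the room its text label needs.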

Demo 5

Now we can see the graph with padding.

//Width, height, and padding
var w = 500;
var h = 100;
var padding = 20;

var dataset = [[5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
               [410, 12], [475, 44], [25, 67], [85, 21], [220, 88]];

//Create scale functions
var xScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[0]; })])
  .range([padding, w - padding * 2]);

var yScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([h - padding, padding]);

//Create SVG element
var svg = d3.select("body")
  .append("svg")
  .attr("width", w)
  .attr("height", h);

svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
  return xScale(d[0]);
})
.attr("cy", function(d) {
  return yScale(d[1]);
})
.attr("r", function(d) {
  return Math.sqrt(h - d[1]);
});

svg.selectAll("text")
  .data(dataset)
  .enter()
  .append("text")
  .text(function(d) {
   	return d[0] + "," + d[1];
  })
  .attr("x", function(d) {
   	return xScale(d[0]);
  })
  .attr("y", function(d) {
   	return yScale(d[1]);
  })
  .attr("font-family", "sans-serif")
  .attr("font-size", "11px")
  .attr("fill", "red");

Finally, everything fits within the SVG.

Step 9: Other scales

You can also set up scales for other attributes, such as the radius. This is especially helpful for making sure the circles don't extend beyond the size of the SVG, and it works for any dataset because the output is always limited to the range you choose. It not only keeps the largest circles from getting too big, but also ensures that the smallest data points can still be seen.

Here is a scale for r, the radius of the circles:

var rScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([2, 5]);

The radius would then change from the arbitrary return Math.sqrt(h - d[1]); to return rScale(d[1]);
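
Here is a rough sketch of the values rScale would produce, again using a hypothetical hand-rolled linearScale helper in place of D3 and the y maximum of 95 from the static dataset.

```javascript
// Hypothetical helper standing in for d3.scale.linear().
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

var rScale = linearScale([0, 95], [2, 5]);

console.log(rScale(0));    // 2   (smallest circle is still visible)
console.log(rScale(47.5)); // 3.5
console.log(rScale(95));   // 5   (largest circle stays small)
```

No matter how large the y values get, the radius stays between 2 and 5 pixels as long as the domain is built from the data.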

Step 10: Axes

Setting up axes requires creation of the axis, provision of a scale, and orientation.

x Axis

This code creates an axis function, assigns it a scale, and then assigns where the labels should appear. This can be a bit misleading. orient("bottom") does not put the axis at the bottom of the chart. It only puts the labels at the bottom of the axis. The axis defaults to being created at the origin (which is the top left).

var xAxis = d3.svg.axis()
  .scale(xScale)
  .orient("bottom");

We can use this function to actually create an axis now.

svg.append("g")
  .call(xAxis);

y Axis

The same can be done for the y axis. You might notice that there's an additional method call here, ticks(). It sets roughly how many tick marks appear along the axis. D3 picks a number automatically if you leave it out, but you can override it as shown here.

//Define Y axis
var yAxis = d3.svg.axis()
  .scale(yScale)
  .orient("left")
  .ticks(5);

This example also shows some additions compared to the sample for the x axis. It assigns the class "axis" to the axis group so that it can be styled with CSS. The ability to style with CSS is one of the many great features of SVG/D3. You can also see the transform that moves the axis to its desired location.

//Create Y axis
svg.append("g")
  .attr("class", "axis")
  .attr("transform", "translate(" + padding + ",0)")
  .call(yAxis);
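
The transform value is just a string built from the padding, so it can help to see what it evaluates to. The sketch below uses h = 300 and padding = 30, the values from the final demo, and also builds the matching string for moving the x axis to the bottom of the chart.

```javascript
var h = 300;
var padding = 30;

// Moves the y axis right by the padding so its labels have room on the left.
var yAxisTransform = "translate(" + padding + ",0)";

// Moves the x axis down to the bottom of the plot area.
var xAxisTransform = "translate(0," + (h - padding) + ")";

console.log(yAxisTransform); // translate(30,0)
console.log(xAxisTransform); // translate(0,270)
```

The parentheses around h - padding matter. Without them, JavaScript would try to subtract from the string instead of doing the arithmetic first.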

Step 11: Adjust visuals

Clean up tick marks if desired, add CSS for styling, and update padding as needed to account for the addition of the axes.
For more details on how to handle tick marks, see the link at the top of the page.

.axis path,
.axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
}

.axis text {
    font-family: sans-serif;
    font-size: 11px;
}

This CSS styling cleans up the graph a bit and helps ensure that it's crisp and easy to read.

Demo 6 and 7

This demo includes all of the new changes - including adjustments to ticks and transforms for the x axis. It is also now generating a dynamic data set. This shows how this code could be used for many different data sets and chart sizes and still function how it should.

//Width and height
var w = 500;
var h = 300;
var padding = 30;

/*
//Static dataset
var dataset = [
        [5, 20], [480, 90], [250, 50], [100, 33], [330, 95],
        [410, 12], [475, 44], [25, 67], [85, 21], [220, 88],
        [600, 150]
        ];
*/

//Dynamic, random dataset
var dataset = [];     //Initialize empty array
var numDataPoints = 50;     //Number of dummy data points to create
var xRange = Math.random() * 1000;  //Max range of new x values
var yRange = Math.random() * 1000;  //Max range of new y values
for (var i = 0; i < numDataPoints; i++) {     //Loop numDataPoints times
	var newNumber1 = Math.round(Math.random() * xRange);  //New random integer
	var newNumber2 = Math.round(Math.random() * yRange);  //New random integer
	dataset.push([newNumber1, newNumber2]);     //Add new number to array
}

//Create scale functions
var xScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[0]; })])
  .range([padding, w - padding * 2]);

var yScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([h - padding, padding]);

var rScale = d3.scale.linear()
  .domain([0, d3.max(dataset, function(d) { return d[1]; })])
  .range([2, 5]);

//Define X axis
var xAxis = d3.svg.axis()
  .scale(xScale)
  .orient("bottom")
  .ticks(5);

//Define Y axis
var yAxis = d3.svg.axis()
  .scale(yScale)
  .orient("left")
  .ticks(5);

//Create SVG element
var svg = d3.select("body")
  .append("svg")
  .attr("width", w)
  .attr("height", h);

//Create circles
svg.selectAll("circle")
.data(dataset)
.enter()
.append("circle")
.attr("cx", function(d) {
 	return xScale(d[0]);
})
.attr("cy", function(d) {
 	return yScale(d[1]);
})
.attr("r", function(d) {
 	return rScale(d[1]);
});

//Create labels
svg.selectAll("text")
.data(dataset)
.enter()
.append("text")
.text(function(d) {
 	return d[0] + "," + d[1];
})
.attr("x", function(d) {
 	return xScale(d[0]);
})
.attr("y", function(d) {
 	return yScale(d[1]);
})
.attr("font-family", "sans-serif")
.attr("font-size", "11px")
.attr("fill", "red");

//Create X axis
svg.append("g")
  .attr("class", "axis")
  .attr("transform", "translate(0," + (h - padding) + ")")
  .call(xAxis);

//Create Y axis
svg.append("g")
  .attr("class", "axis")
  .attr("transform", "translate(" + padding + ",0)")
  .call(yAxis);
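
As a sanity check that the scales keep any random dataset inside the padded area, here is a sketch that skips the drawing and only tests the positions. The linearScale helper is a hypothetical stand-in for d3.scale.linear(). One assumption that differs from the demo: the random ranges start at 100 so the largest value can never be 0.

```javascript
var w = 500;
var h = 300;
var padding = 30;

// Hypothetical helper standing in for d3.scale.linear().
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Random dataset, generated much like the demo (ranges start at 100 here).
var dataset = [];
var xRange = 100 + Math.random() * 900;
var yRange = 100 + Math.random() * 900;
for (var i = 0; i < 50; i++) {
  dataset.push([Math.round(Math.random() * xRange),
                Math.round(Math.random() * yRange)]);
}

var xMax = Math.max.apply(null, dataset.map(function(d) { return d[0]; }));
var yMax = Math.max.apply(null, dataset.map(function(d) { return d[1]; }));

var xScale = linearScale([0, xMax], [padding, w - padding * 2]);
var yScale = linearScale([0, yMax], [h - padding, padding]);

// Every circle center should fall inside the padded plot area.
var allInside = dataset.every(function(d) {
  var x = xScale(d[0]);
  var y = yScale(d[1]);
  return x >= padding && x <= w - padding * 2 &&
         y >= padding && y <= h - padding;
});

console.log(allInside); // true
```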

Because the dataset is now randomly generated, the same code (here with the labels deactivated) generates a completely different chart each time it runs.

It could still use some styling with CSS for the background, but that is not part of this demo. You would just assign a background color to the svg element. The best way to handle that would be to assign a class or id to it using attr().

Why do I care about D3?

This guide was written as an assignment for Oregon State's CS 290 course. I selected this topic because I am a business intelligence developer working with SAP's platforms including BOBJ (BusinessObjects). My team is fairly new within the company and we are quickly learning our limitations for graphical representations. I plan to leverage what I learned from researching and writing this in development of extensions for use with Design Studio. For those who are unfamiliar, Design Studio makes heavy use of CSS and JavaScript.
