Brief


Overview

Bad Data Viz is in fact the theme of our website. So, please let us indulge by listing just a few more examples here:

[Images: three examples of bad visualizations]

In this assignment, you will use D3, an interactive visualization library, to develop your own data dashboard. D3 is extremely powerful and can be used to make very specific, informative diagrams. We highly recommend that you complete the D3 + JavaScript lab before you attempt this assignment. We have a ton of explanation and introduction to D3 in the lab.

To illustrate the capabilities of D3, here are some stunning examples of visualizations created with it:

We know that D3 code is easy to find online. Even if we provide you with reference implementations, do not copy any code.


Before you begin:

Many students find this assignment to be one of the most challenging in the class. This is because JavaScript and D3 are new to many students, and we don't formally teach them. We HIGHLY recommend you walk through the lab listed here before you begin. The solution code for the D3 lab is a great source of starter code for creating your own graphs in this assignment.

Since this project is more free-form than previous assignments, there is no solution code for it. We will do our best to help debug your dashboard during TA hours, but there is a limit to how much TAs can help with your particular project. A few tips:

We hope this will be a fun assignment for you, and that it closely resembles future data science work!


Assignment

Your task is to create your own informative D3 dashboard! This assignment, in particular, is very flexible. There are no strictly correct or incorrect answers since visualization is inherently subjective. That said, we will evaluate your work on a number of requirements listed below. Further, we expect that you incorporate concepts that Ellie has discussed in class like color palettes, font types/sizing, orientation, clarity, organization, and informativeness.

Dashboard

68 points

Datasets

(5 points! Just for using the right data...)

You must use data from one of the following datasets. Each dataset has a series of leading questions you may use as inspiration. You should make sure that your dashboard answers these questions. Imagine your boss gives you 2 weeks to build a dashboard on these issues.

Note: All 3 datasets are provided in the data directory under the data_viz folder. Each topic includes a link to a Kaggle site with more information about the dataset.

  1. Video Game Sales | Kaggle
    Questions:
    1. Your boss wants to know the top 10 video games of all time or top 10 for a specific year.
    2. Your boss wants to understand which genre is most popular. We'd like to see genre sales broken out per region. (This question can be answered by showing the top genre in each region if you want to implement a map, otherwise you should show genre sales broken down by region in bar/scatter/line/pie etc.)
    3. Lastly, your boss wants to know which publisher to pick based on a game's genre. Your chart should show a clear top publisher for each genre (this can be interactive or static).

  2. International Football Results | Kaggle
    Questions:
    1. Your boss wants to know the number of football games by year. You should show at minimum 5 years, but you can choose which years to show.
    2. Your boss wants to understand the top winning nations. We would like to see a winning percentage for the top 10 nations. You can show this in a map form if you would like to.
    3. Lastly, we are trying to bet on which team will win the 2022 World Cup. Over the last 2 World Cups, which teams were top performing? You can decide how to interpret "top performing". A few approaches we would recommend: winning percentage in the World Cup, victory strength, strength of opponent. You may show any combination of those. We don't have a specific answer we expect, and you should explain your choice in the written questions.

  3. Netflix Collection | Kaggle
    Questions:
    1. Your boss wants to know the number of titles per genre on Netflix.
    2. Your boss wants to understand the average runtime of movies by release year.
    3. Lastly, we want to learn about the cast and directors. You have two choices here: 1) the top director + actor pairs by number of movies made 2) a flow chart where each actor is a node, and a link refers to a movie they both acted in (just the connection, no need to specify number of movies made together or which movies those are)
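
Several of the questions above reduce to two steps before any drawing happens: filtering and ranking rows ("top 10 for a specific year") and summing a value per category ("sales per genre"). The following is a minimal sketch of those two steps in plain JavaScript, not a reference implementation. The rows and field names (name, year, genre, globalSales) are hypothetical and do not come from the provided CSV files, so adapt them to the columns in your chosen dataset. Note that d3.csv loads every value as a string unless you convert it, so numeric fields need to be converted to numbers first.

```javascript
// Hypothetical rows for illustration only; the field names are assumptions,
// not the actual column names of the provided datasets.
const sampleRows = [
  { name: "Game A", year: 2006, genre: "Sports", globalSales: 82.7 },
  { name: "Game B", year: 2006, genre: "Platform", globalSales: 30.0 },
  { name: "Game C", year: 2008, genre: "Racing", globalSales: 35.8 },
  { name: "Game D", year: 2006, genre: "Sports", globalSales: 12.1 },
];

// Return the n rows with the largest value of `field`,
// optionally restricted to a single year (null means all years).
function topN(rows, field, n, year = null) {
  return rows
    .filter((row) => year === null || row.year === year) // filter returns a new array
    .sort((a, b) => b[field] - a[field])                 // so sorting does not mutate `rows`
    .slice(0, n);
}

// Sum `field` for each distinct value of `groupKey`; returns a Map.
function sumBy(rows, groupKey, field) {
  const totals = new Map();
  for (const row of rows) {
    const key = row[groupKey];
    totals.set(key, (totals.get(key) || 0) + row[field]);
  }
  return totals;
}

// Example usage: the top 2 games overall, and total sales per genre.
const topTwo = topN(sampleRows, "globalSales", 2);
const salesByGenre = sumBy(sampleRows, "genre", "globalSales");
```

The arrays and Map returned here can be passed to a D3 data join; keeping the aggregation separate from the drawing code makes it easier to recompute when the user selects a different year.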

Dashboard Requirements

Please put your dashboard in dashboard.html with your JavaScript D3 code in main.js and any custom styling in main.css. Feel free to add new JavaScript files if you want.
Graph Structure
(35 points)
Interactivity
(10 points)
Style
(8 points)
Communication
(15 points)

Extra Credit

(Up to +30 points!)
We think D3 is really cool, and some of you probably agree. We don't want to limit any of your creative impulses and therefore are offering bonus points for this assignment. Simply put, your final grade can be over 100% for this assignment. Here are some ways to earn bonus points:
  1. You implement some form of dynamic stats calculation. By dynamic, we mean that it updates depending on which data is being shown. Our example dashboard calculates a regression line, but you could also show box plot whiskers, calculate percentiles, or run a t-test.

    If you choose to add this, please add a note below your dashboard and written answers that explains, in 1-3 sentences, what you did and why it's statistical. (10 points)

  2. You may attempt one or both of these and get up to an additional 20 points!

    For some of the provided data sets and questions, we ask about geographic impact. One way to show this is with a bar/scatter/line graph. Another possibility is to show this on a geographic map! If one of your graphs is a "Map type" (https://www.d3-graph-gallery.com/) you earn an additional 10 points.

    For other data sets, we discuss the relationships between particular data points in a graph sense, e.g. how many hops there are between 2 actors on Netflix. You can use D3 to make and visualize this graph! If one of your graphs is a "Flow type" (https://www.d3-graph-gallery.com/) you earn an additional 10 points.

    If you have any questions about whether a particular chart qualifies for either of these, please ask on Piazza.
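
As an illustration of what "dynamic" means for the stats calculation in item 1, here is a minimal sketch of a least-squares regression line that is recomputed from whichever points are currently displayed. This is an assumption-laden example, not the example dashboard's code: the function names, the point shape ({x, y, group}), and the demo data are all hypothetical.

```javascript
// Ordinary least-squares fit of y = slope * x + intercept.
// Returns null when a line cannot be fit (fewer than 2 points, or all x equal).
function linearRegression(points) {
  const n = points.length;
  if (n < 2) return null;
  const meanX = points.reduce((sum, p) => sum + p.x, 0) / n;
  const meanY = points.reduce((sum, p) => sum + p.y, 0) / n;
  let sxy = 0; // sum of (x - meanX) * (y - meanY)
  let sxx = 0; // sum of (x - meanX)^2
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) ** 2;
  }
  if (sxx === 0) return null;
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Hypothetical helper: refit using only the points that pass `isVisible`,
// e.g. the points matching the user's current dropdown or slider selection.
function fitForSelection(allPoints, isVisible) {
  return linearRegression(allPoints.filter(isVisible));
}

// Made-up demo data with two groups that follow different trends.
const demoPoints = [
  { x: 1, y: 3, group: "a" },
  { x: 2, y: 5, group: "a" },
  { x: 3, y: 7, group: "a" },
  { x: 1, y: 10, group: "b" },
  { x: 2, y: 8, group: "b" },
];

const fitA = fitForSelection(demoPoints, (p) => p.group === "a");
const fitB = fitForSelection(demoPoints, (p) => p.group === "b");
```

In a dashboard, a function like fitForSelection would be called from the same event handler that redraws the chart, so the line's endpoints are updated together with the visible points.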

Example Dashboard

We have an example D3 dashboard here, created by CS 1951A's Arvind Yalavarti using the TA dataset from the D3 lab. Our dashboard goes above and beyond our expectations for you, but we thought it would be helpful for reference, especially for those of you who want to go above and beyond.

Use of External Libraries

We've already included D3 and Bootstrap in the stencil code provided. To perform statistical calculations take a look at jStat and d3-regression. If you would like to use either of these libraries, add the following lines to your dashboard.html file:

If you would like to use other external libraries, please ask on Piazza first! There will be a deduction for using disallowed JavaScript libraries that could trivialize the assignment.

Run Specifications

To view your visualization in the browser, we're going to load the webpage via a local web server. Navigate to the directory containing your dashboard.html file, and run
python -m http.server 8000
You can then open a browser to the URL http://localhost:8000/dashboard.html to view your dashboard.

Working Locally

We highly recommend setting up FUSE/SSHFS for Linux and Mac OSX users (instructions found at the bottom of the ML handout).

Port Forwarding

Another alternative is to set up port forwarding. This will allow you to edit your code over SSH and view your resulting dashboard locally.
When port forwarding, have two terminal windows open.

Now, on your local machine, you can navigate to http://localhost:8000/dashboard.html to view your dashboard, and edits you make on the department machine will be reflected on the dashboard you are viewing on your local machine.

Actually Locally

This assignment is written entirely in JavaScript and HTML, so the dashboard itself does not require any packages or Python to run. We recommend the Python HTTP server for hosting the page because it is responsive. As such, if you have base Python 3 installed, a text editor, and a web browser, you can run this assignment natively on your machine (with Python 2, the equivalent command is python -m SimpleHTTPServer 8000). You should confirm that things work the same way on the department machines before handing in (we have no reason to believe they wouldn't), but we do encourage you to develop locally if that is easy for you.


Written Questions

32 points

Write answers to the following questions below your dashboard in dashboard.html:
  1. Describe how your dashboard answers the questions presented. You don't have to address every question directly, but you should address the main questions at a high level. (10 points)
  2. List 3 reasons why D3 was helpful and improved your visualization. (6 points)
  3. List 3 reasons why D3 would not be the best tool for creating a visualization. (6 points)

Hopefully through this exercise, you've all seen how data visualizations are powerful tools that we can leverage to efficiently and easily communicate ideas. However, not all graphs/charts are made equal. Read this article on misleading graphs and answer the following questions underneath your dashboard in dashboard.html as well.

If you enjoyed the reading or want to see some more real life examples of flawed visualizations (and how they could be improved), feel free to read the following articles as well. However, these are not required.


Handing In

Your ~/course/cs1951a/data_viz/ directory must contain at least the following:

Then run: cs1951a_handin data_viz.

Important!


Credits

This assignment was created in Spring 2020 by Arvind Yalavarti (ayalava2) and Joshua Levin (jlevin1).