Aaron Doubek-Kraft
adoubekk@ucsc.edu
CMPS 161
2/3/2017

Objectives:

The primary objective of this project is to create an interactive web-based application to aid in analyzing graduate student admissions data, motivated by the unusually large pool of applicants to the UCSC computer science graduate program. This tool will facilitate the identification of strong candidates for particular research areas and trends among students who share similar research interests. The secondary objective is to generalize this application to provide a tool for more generic multivariate data analysis.

Technical Details:

This application will use the D3.js (Data-Driven Documents) JavaScript library for the interactive presentation, and Python scripts for data normalization tasks such as standardizing test scores.
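As an illustration of the normalization step, the following is a minimal sketch that assumes "standardizing" means converting each score column to z-scores (mean 0, standard deviation 1). The function name `standardize` and the handling of missing values are assumptions made for this example, not a description of the final scripts.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Return z-scores for a list of numeric scores.

    Missing values (None) are assumed possible and are passed through
    unchanged; this is an assumption about the data, not a known property.
    """
    present = [s for s in scores if s is not None]
    if not present:
        return [None] * len(scores)
    mu = mean(present)
    sigma = pstdev(present)  # population standard deviation
    if sigma == 0:
        # All present scores are identical, so every z-score is 0.
        return [None if s is None else 0.0 for s in scores]
    return [None if s is None else (s - mu) / sigma for s in scores]
```

Applying this to each test-score column separately would put scores reported on different scales onto a common scale, which is what allows them to share comparable axes.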

Methods:

A common approach to multivariate visualization is the Parallel Coordinate Plot, which displays each record as a polyline connecting parallel axes, one axis per data component. This technique makes it possible to spot potential trends quickly from the way the lines intersect, but the plots become crowded when the dataset is large.

Parallel Coordinate Plots will form the basis of this application. To improve readability, the application will allow users to select subsets of the data to plot, and will provide options to map subsets to colors or to emphasize certain subsets by greying out or hiding the others. For the graduate admissions data, the subsets could be MS vs. PhD applications, research interests, country of origin, etc. In general, the tool should construct subsets automatically from the qualitative fields of each record, and display the quantitative fields on the parallel axes based on user input.
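The automatic construction of subsets could be sketched as follows. This is a rough illustration under stated assumptions: records are dictionaries, a field counts as quantitative when every value is numeric, and the field names and sample values (`degree`, `country`, `gre_quant`, `gpa`) are hypothetical placeholders rather than the actual admissions data.

```python
from collections import defaultdict
from numbers import Number


def split_fields(records):
    """Classify field names as quantitative (all values numeric) or qualitative."""
    quantitative, qualitative = [], []
    for field in records[0]:
        values = [r[field] for r in records]
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            quantitative.append(field)
        else:
            qualitative.append(field)
    return quantitative, qualitative


def build_subsets(records, field):
    """Group records by the value of one qualitative field."""
    subsets = defaultdict(list)
    for r in records:
        subsets[r[field]].append(r)
    return dict(subsets)


# Hypothetical sample records, for illustration only.
records = [
    {"degree": "MS", "country": "US", "gre_quant": 165, "gpa": 3.7},
    {"degree": "PhD", "country": "IN", "gre_quant": 168, "gpa": 3.9},
    {"degree": "MS", "country": "CN", "gre_quant": 170, "gpa": 3.5},
]

axes, categories = split_fields(records)
by_degree = build_subsets(records, "degree")
```

Here `axes` would supply the candidate parallel axes, while each key of `by_degree` would correspond to one subset that can be colored, greyed out, or hidden.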


Figure: Parallel Coordinate Plot


As time permits, the Parallel Coordinate Plot may be supplemented with additional visualizations. Options to explore include derivatives of the Parallel Coordinate Plot such as Radial Visualization Plots, which are often more readable but may require clustering operations to be run on the data before visualization.

Schedule:

Week      Objective
Feb 6     Complete preliminary data tasks: normalization of scores, construction of JavaScript data structures, division into subsets.
Feb 13    Construct a basic Parallel Coordinate Plot based on the full dataset.
Feb 20    Add user input for coloring and plotting of individual subsets.
Feb 27    Improve the GUI and try different strategies for improving readability. Project Update Presentation.
Mar 6     Experiment with supplementary visualizations; begin analysis and technical paper.
Mar 13    Continue technical paper.
Mar 20    Final Project Submission.

References: