Data Competition

We are currently experiencing server issues and, as a result, cannot offer competition registration through the website as we have in prior years. For the time being, please e-mail gro.tats-pu@ofni with:

  • Your team name
  • Your team members (no more than 4)
  • School name

As soon as the server issues are resolved, we will once again offer user accounts and registration through the website.

  • A team of no more than 4 students may enter the competition by e-mailing gro.tats-pu@ofni by Tuesday, March 15, 2016.
  • The e-mail should give the team name, the school, and the names of the team members.
  • The data set will be accessible after registration.
  • Teams will be required to submit their results and the code that they used to obtain their results by Wednesday, April 13, 2016.
  • Prizes will be awarded to the top 3 teams, all of whom will be asked to present their results at the conference.

Data Competition Overview

The data set in this challenge was obtained during a pilot study conducted at the University of Michigan. The data were emitted by appropriately instrumented vehicles to a number of transceivers at curves, parking lots, and intersections. Essentially, as a vehicle comes within a certain distance of a transceiver, a handshake commences, a temporary ID is established for the vehicle, and data are transmitted nominally at 10 Hz until the vehicle is out of range. These data are meant to illustrate what is called "vehicle to infrastructure" or V2I technology. The promise of this technology is that the data can be used for optimizing traffic signal timing, warning drivers who approach a curve or intersection too fast, etc. Below is a table of the data fields. Many more are allowed in the protocol, but we limit the data set to a few that describe vehicle dynamics.

Field      | Description                                                                 | Unit
-----------|-----------------------------------------------------------------------------|----------
GenTimeSec | Number of seconds since Jan 1, 2004                                         | Seconds
Latitude   | GPS latitude position                                                       | deg
Longitude  | GPS longitude position                                                      | deg
Elevation  | GPS elevation                                                               | m
Speed      | GPS estimated speed                                                         | m/s
Heading    | GPS heading                                                                 | deg
Ax         | Estimated longitudinal acceleration: acceleration in a straight line        | m/s^2
Ay         | Estimated lateral acceleration                                              | m/s^2
Az         | Estimated vertical acceleration                                             | m/s^2
Yawrate    | Estimated yaw rate: vehicle's angular velocity around its vertical axis     | deg/s
ID         | Mysterious string that contains the IDs of vehicles and transponders        | Character
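
As a small illustration of the GenTimeSec field, the sketch below converts a value into a calendar timestamp. It assumes the count starts at midnight UTC on Jan 1, 2004; the description above does not state the time zone, so treat that choice, and the helper name gentime_to_datetime, as assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: GenTimeSec counts from midnight UTC on Jan 1, 2004.
# The field description gives the date but not the time zone.
EPOCH_2004 = datetime(2004, 1, 1, tzinfo=timezone.utc)


def gentime_to_datetime(gen_time_sec):
    """Convert a GenTimeSec value (seconds since the 2004 epoch) to a datetime."""
    return EPOCH_2004 + timedelta(seconds=gen_time_sec)


if __name__ == "__main__":
    # One day and half a second after the assumed epoch.
    print(gentime_to_datetime(86400.5).isoformat())
```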

As one would expect in the real world, the data are fraught with quality issues. The data selected here are multivariate time series lasting at least 5 seconds; some are quite a bit longer. There are dropouts (not every transmitted sample is received), some values are "stuck at zero", and some are contaminated with bad "reads".
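
A first look at these quality issues might be sketched as follows. This is only an illustration: the 0.1 second nominal spacing follows from the 10 Hz rate mentioned above, but the gap tolerance, the minimum run length, and both function names are hypothetical choices, not part of the competition rules.

```python
def find_dropouts(times, nominal_dt=0.1, tolerance=1.5):
    """Return (t_before, t_after) pairs where consecutive samples are
    further apart than tolerance * nominal_dt (assumed threshold)."""
    gaps = []
    for t0, t1 in zip(times, times[1:]):
        if t1 - t0 > tolerance * nominal_dt:
            gaps.append((t0, t1))
    return gaps


def find_stuck_at_zero(values, min_run=5):
    """Return inclusive (start, end) index pairs for runs of exact zeros
    that are at least min_run samples long (assumed threshold)."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Handle a run of zeros that extends to the end of the series.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - 1))
    return runs


if __name__ == "__main__":
    print(find_dropouts([0.0, 0.1, 0.2, 0.7, 0.8]))
    print(find_stuck_at_zero([1, 0, 0, 0, 2, 0, 0], min_run=3))
```

A long run of zeros is not necessarily an error (a parked vehicle has zero speed), so flagged runs would still need to be checked against the other fields.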

Objectives

The challenge is to develop a smoothing algorithm that estimates the true values of all ten variables so that the dynamics of the vehicle can be inferred precisely from historical data. The laws of physics dictate that a vehicle's trajectory should have continuous first and second derivatives. Vehicles do brake to a stop and accelerate in some of the time series, but they don't crash (at least in this data set). All of the measurements here theoretically vary smoothly, but are obviously measured unevenly and noisily.

All the vehicles traveled on roads (some in parking lots), so mapping the data will give a reasonable ground truth about where each vehicle traveled. Road data are not part of the data set, however, and cannot be used by your algorithm, which will be tested on data you have not seen. Mapping might still help you figure out what happened during gaps in the data and determine whether you reconstructed the paths correctly.

Your task is to produce an algorithm that reconstructs curves for each of the ten variables at a 10 Hz sampling rate. For example, if the input time series is 20 seconds long, your output should be a ten-component time series of length 200 with equally spaced samples (one every tenth of a second). You must supply the source code so that we can execute and test it on a hold-out data set. You must also produce a write-up that gives the details of the algorithm and the scientific justification for how it works.
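
The required output shape can be sketched as below. The grid excludes the end point so that a 20 second series yields exactly 200 samples, which is an assumption based on the example above. The linear interpolation is only a placeholder baseline to show the resampling step; it is not a smoothing algorithm and is not suggested as a competitive entry. Both function names are hypothetical.

```python
from bisect import bisect_right


def uniform_grid(t_start, t_end, rate_hz=10.0):
    """Equally spaced sample times from t_start up to, but excluding, t_end."""
    n = int(round((t_end - t_start) * rate_hz))
    return [t_start + k / rate_hz for k in range(n)]


def interpolate_linear(times, values, grid):
    """Linearly interpolate (times, values) onto grid.

    Assumes times is strictly increasing. Grid points outside the observed
    range are filled with the nearest observed value.
    """
    out = []
    for t in grid:
        j = bisect_right(times, t)
        if j == 0:
            out.append(values[0])
        elif j == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


if __name__ == "__main__":
    grid = uniform_grid(0.0, 20.0)
    print(len(grid))  # 200 samples for a 20 second series
    # Unevenly spaced toy observations resampled onto the 10 Hz grid.
    speed = interpolate_linear([0.0, 7.3, 20.0], [0.0, 12.0, 3.0], grid)
    print(len(speed))
```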

There are 50 time series in the training set and 25 time series in our test set. Smoothing will be evaluated by computing a root mean squared error (RMSE) for each component, where the errors have been normalized to be unitless. The RMSEs will be summed to give a total error score. We will also measure the smoothness of each curve. Each curve will also be inspected to assess its behavior on missing data, and any unrealistic behavior will be flagged and penalized.
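
For self-assessment on the training set, the error part of the score might be approximated as in the sketch below. The normalization used by the organizers is not specified here, so dividing each error by a per-component scale is an assumption, as are the function names and the dictionary layout. The smoothness measure and the manual inspection are not modeled.

```python
import math


def normalized_rmse(estimate, truth, scale):
    """RMSE of (estimate - truth) / scale for one component.

    scale is an assumed per-component constant that makes the error unitless.
    """
    if len(estimate) != len(truth):
        raise ValueError("estimate and truth must have the same length")
    mse = sum(((e - t) / scale) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    return math.sqrt(mse)


def total_error(estimates, truths, scales):
    """Sum the normalized RMSE over all components (dicts keyed by field name)."""
    return sum(
        normalized_rmse(estimates[name], truths[name], scales[name])
        for name in truths
    )


if __name__ == "__main__":
    est = {"Speed": [2.0, 2.0], "Heading": [90.0, 90.0]}
    ref = {"Speed": [0.0, 0.0], "Heading": [80.0, 80.0]}
    scl = {"Speed": 2.0, "Heading": 10.0}
    print(total_error(est, ref, scl))
```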

The final score will be based on 1) accuracy in tracking the data; 2) structural plausibility (smoothness and behavior on missing data); and 3) scientific justification in the write-up, e.g., mathematical rigor and generality.

Important Dates

Abstract Submission Deadline
Friday, March 11, 2016
Notification of Acceptance of Submission
Friday, March 25, 2016
Data Competition Entry Deadline
Tuesday, March 15, 2016
Data Competition Submission Deadline
Wednesday, April 13, 2016
Online Ticket Sales Close
Thursday, April 21, 2016
Registration Deadline
Thursday, April 21, 2016
Organized Session Submission Deadline
Friday, March 4, 2016

Conference Topics

This year's conference will focus on (but is not limited to) Data Science, Statistical Practice, and Education. Submissions on the following topics are encouraged:
  • Novel contributions to statistical methods or computing
  • Applications of statistical methods to interesting data sets from biology/medicine, social sciences, business/finance, and other fields
  • Issues in statistics / data science education
  • Statistics education in secondary schools (and beyond)
  • Other aspects of statistical methodology and applications

When and Where


Canisius College
2001 Main St
Buffalo, NY 14208


Cocktail Hour & Poster Session
When: Friday, April 22, 2016, 6 P.M. - 7 P.M.
Where: Grupp Fireside Lounge, 2nd Floor of Student Center

Banquet
When: Friday, April 22, 2016, 7 P.M. - 9 P.M.
Where: Regis Room, 2nd Floor of Student Center

Conference
Friday, April 22, 2016
  • 9 A.M. - 3 P.M.: Tutorials
  • 3:15 P.M. - 4:15 P.M.: Fr. Haus Memorial Mathematics Lecture
  • 4:30 P.M. - 5:55 P.M.: Panel Discussion ("The Multiple Facets of Data Science")
Saturday, April 23, 2016
  • Conference: 8:00 A.M. - 5:30 P.M.

Contact

Abstract Submissions
gro.tats-pu@stcartsba
General Conference Questions
gro.tats-pu@ofni
Website Issues
gro.tats-pu@retsambew