Important Dates

Abstract Submission Deadline
Friday, February 27th, 2015
Notification of Acceptance of Submission
Friday, March 13th, 2015
Data Competition Entry Deadline
Friday, March 13th, 2015
Data Competition Submission Deadline
Friday, March 20th, 2015
Online Ticket Sales Close
Saturday, April 11th, 2015
Deadline for Registration
Saturday, April 11th, 2015

Conference Topics

This year's conference will focus on, but is not limited to, statistical modeling in the era of data science. Submissions on the following topics are encouraged:

  • Novel approaches to data analysis
  • Applications of statistical and computational tools to interesting data sets
  • Statistics education in secondary schools (and beyond)
  • Other aspects of statistical methodology and applications
  • Fun statistics

When and Where



State University of New York Geneseo
1 College Circle
Geneseo, NY 14454


Banquet & Poster Session
Friday, April 10th, 6 P.M. - 9 P.M.
Conference
Saturday, April 11th, 8:30 A.M. - 5:30 P.M.

Contact

Abstract Submissions
abstracts@up-stat.org
General Conference Questions
info@up-stat.org
Website Issues
webmaster@up-stat.org

UP-STAT 2015 Data Competition

Sponsored by Xerox

Rules

The Data Competition will be open to all students. If you are interested in participating, please proceed to the registration page!

  • All students participating in the competition must be registered for the conference. If you're registering as a team, all team members must be registered for your entry to be valid.
  • Students may participate individually or in teams.
  • The maximum team size is 4.
  • No student may be on more than one team.
  • Submissions are due Friday, March 20th, 2015.
  • Each team must submit its results along with the code (SAS, R, etc.) used to obtain them.
  • Prizes will be awarded to the top 3 teams, all of whom will be asked to present their results at the conference.

Urban Analytics Challenge Data Description

The data was obtained from a "loop detector" beneath the pavement of the southbound lane, 200 feet north of the intersection of East Main Street and Culver Road in Rochester, NY. It was provided by the Monroe County Department of Transportation under the Freedom of Information Law. This data set covers eight months of observations, from October 13, 2013 to June 7, 2014.

Loop detectors are used to identify areas and episodes of traffic congestion. Here, we offer their data as a challenge in urban analytics.
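As a quick sanity check, the date range above can be turned into an expected row count and compared with the 68,437 observations reported in the Additional Notes. This Python sketch assumes one row per five-minute interval (suggested by the five-minute windows in the variable definitions, but not stated outright), complete first and last days, and no effect from daylight-saving clock changes. It only does the arithmetic; it does not explain any gap.

```python
from datetime import date

# Assumption: one row per five-minute interval, i.e. 24 * 12 = 288 rows per day.
INTERVALS_PER_DAY = 24 * 12

start = date(2013, 10, 13)
end = date(2014, 6, 7)
days = (end - start).days + 1   # +1 so both the first and last day are counted
expected_rows = days * INTERVALS_PER_DAY
reported_rows = 68_437          # row count stated in the Additional Notes

print(days, expected_rows, expected_rows - reported_rows)  # 238 68544 107
```

Under these assumptions, a shortfall of this size would point to some missing or partial intervals worth locating before modeling.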

The variables are:

  • Volume is the number of vehicles per hour passing the location. It is computed as twelve times the vehicle count measured over five minutes, which converts it to an hourly flow rate.
  • Speed (expressed in miles per hour) is calculated from the difference in time between the beginning and ending of vehicle detection. It is the average over all estimated speeds of detected vehicles within a five-minute window.
  • Delay (given in seconds) estimates how long vehicles wait between arrival at and departure from the intersection.
  • Stops (given as number of vehicles per hour) counts the vehicles approaching a red light. It approximates the length of the queue while the light is red.
  • DateTime includes day, month, year, and time.
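The Volume and Speed definitions above can be illustrated with a short Python sketch. The per-vehicle speed uses a common single-loop approximation (speed ≈ effective length ÷ detection time). The 22-foot effective length (vehicle plus loop) and all function names are hypothetical, and the county's actual computation may differ.

```python
from statistics import mean

# Five-minute windows per hour: 60 / 5 = 12.
WINDOWS_PER_HOUR = 60 // 5
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def hourly_volume(count_in_window: int) -> int:
    """Scale a five-minute vehicle count to an hourly flow rate (Volume)."""
    return count_in_window * WINDOWS_PER_HOUR


def speed_from_detection(on_time_s: float, effective_length_ft: float = 22.0) -> float:
    """Estimate one vehicle's speed (mph) from how long the loop detected it.

    Hypothetical: assumes a fixed effective length (vehicle plus loop) in feet.
    """
    feet_per_second = effective_length_ft / on_time_s
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE


def window_speed(on_times_s: list[float]) -> float:
    """Average the per-vehicle speed estimates within one five-minute window."""
    return mean(speed_from_detection(t) for t in on_times_s)


print(hourly_volume(35))          # 420
print(speed_from_detection(0.5))  # 30.0
```

For example, 35 vehicles counted in one five-minute window correspond to a Volume of 420 vehicles per hour.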

Objectives of the Challenge

The goal of this challenge is to discover the most compelling, appealing, and practical patterns in episodes of traffic congestion. Entries will be judged on:

  • Novelty of the patterns discovered.
  • Usability of the recommendations made.
  • Originality and correctness of the scientific methods used.

Additional Notes

  • This data set contains a few extreme outliers in Delay and Stops. You are strongly encouraged to give your best guess as to the origin of these spikes. Central imputation (for example, replacing an outlier with the mean or median) or other imputation methods may be used to replace these outliers.
  • The recommendations made by the winning teams will be given to the Monroe County Department of Transportation. In the interest of usability, recommendations should use as little statistical jargon as possible.
  • The data set contains 68,437 observations (rows) on 5 variables (columns).
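One way to carry out the central imputation mentioned above is sketched below in Python. The outlier rule (Tukey's fences at k × IQR beyond the quartiles, with k = 3) and the use of the median as the central value are assumptions for illustration. The competition prescribes neither, and for skewed variables such as Delay and Stops a different cutoff may be more appropriate.

```python
from statistics import median, quantiles


def tukey_fences(values: list[float], k: float = 3.0) -> tuple[float, float]:
    """Return (low, high) cutoffs k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def impute_outliers_with_median(values: list[float], k: float = 3.0) -> list[float]:
    """Replace values outside the fences with the median of the values inside them."""
    low, high = tukey_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    center = median(inliers)
    return [v if low <= v <= high else center for v in values]


# Hypothetical Delay values in seconds, not taken from the data set.
delays = [12, 15, 14, 13, 16, 15, 14, 900]
print(impute_outliers_with_median(delays))  # [12, 15, 14, 13, 16, 15, 14, 14]
```

Imputation hides the spike itself. Keeping a separate record of which rows were replaced makes it easier to discuss their possible origin.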