
HORSE2016

On “Horses” and “Potemkin Villages” in Applied Machine Learning

Research workshop, QMUL, London
Monday 19 September 2016
Location: Arts Lecture Theatre (Arts 1, School of English and Drama, QMUL)

HORSE2016 YouTube Channel

What are "horses" and "Potemkin Villages"?

  • “As an intentional nod to Clever Hans, a 'horse' is just a system that is not actually addressing the problem it appears to be solving.” (B. L. Sturm, “A simple method to determine if a music information retrieval system is a 'horse',” IEEE Trans. Multimedia 16(6):1636–1644, 2014.)
  • “[Our] results suggest that classifiers based on modern machine learning techniques ... are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village.” (I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. ICLR, 2015.)
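
The second quote refers to adversarial examples: inputs altered by a tiny, structured perturbation that flips a classifier's answer. As a hedged illustration only (not workshop material), here is a minimal PyTorch sketch of the fast gradient sign method from that paper; the model, labels, and epsilon are placeholder assumptions:

    # Minimal sketch of the fast gradient sign method (FGSM) of
    # Goodfellow et al. (2015); model, x, y, and epsilon are placeholders.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, epsilon=0.1):
        """Perturb x by epsilon in the direction that increases the loss."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)  # model(x) are class logits
        loss.backward()                      # gradient of the loss w.r.t. x
        # A "Potemkin village" classifier often changes its answer under
        # this small signed-gradient step, though a human would not.
        return (x + epsilon * x.grad.sign()).detach()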

About HORSE2016

HORSE2016 is a free one-day workshop (with free coffee & nibbles and a free lunch) exploring issues surrounding “horses” and “Potemkin villages” in applied machine learning. One of the most famous “horses” is the “tank detector” of early neural networks research (https://neil.fraser.name/writing/tank): after great puzzlement over its success, the system was found to be detecting nothing more than sky conditions, which happened to be confounded with the ground truth. Humans can be “horses” as well, e.g., magicians and psychics; machine learning, in contrast, does not deceive on purpose, but only makes do with what little information it is fed about a problem domain. The onus is thus on the researcher to demonstrate the sanity of the resulting model. Too often, however, evaluation of applied machine learning ends with a report of the number of correct answers a system produces, and not with uncovering how the system produces right or wrong answers in the first place.
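
To make the tank-detector failure mode concrete, here is a toy sketch (an illustration for this page, not workshop material) of a classifier trained on data in which a spurious “sky” feature is confounded with the labels; the feature names and numbers are assumptions chosen only for the demonstration:

    # Toy "horse" via a confound: during training the label leaks into a
    # spurious "sky" feature, so the model can score well without ever
    # learning the genuine (weak) signal. All values here are invented.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    y = rng.integers(0, 2, n)
    signal = y + rng.normal(0, 2.0, n)  # weak but genuine feature
    sky = y + rng.normal(0, 0.1, n)     # strong confound ("sunny vs. cloudy")
    X_train = np.column_stack([signal, sky])
    clf = LogisticRegression().fit(X_train, y)

    # Deployment: same task, but the confound is now uninformative noise.
    y_new = rng.integers(0, 2, n)
    X_new = np.column_stack([y_new + rng.normal(0, 2.0, n),
                             rng.normal(0, 0.1, n)])
    print("training accuracy:", clf.score(X_train, y))      # near perfect
    print("deployment accuracy:", clf.score(X_new, y_new))  # near chance

A bare count of correct answers on the confounded data would never reveal that the model learned the “sky” rather than the task.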

Schedule

Time   Speaker                          Title
9h30                                    Coffee and nibbles
10h00  Bob L. Sturm                     Horse taxonomy and taxidermy (video; slides: pdf, pptx)
10h30  Roisin Loughran                  When The Means Justifies the End: Why We Must Evaluate on More than Mere Output (video, slides)
11h00  Mathieu Lagrange                 Computational experiments in Science: Horse wrangling in the digital age (video, slides)
11h30  Tim Hospedales                   Gated Neural Networks for Option Pricing: Enforcing Sanity in a Black Box Model (video, slides)
12h00  Geraint Wiggins                  Keynote: Trying to be accurate; or On the prevention of horses (video)
13h00                                   Lunch
14h00  Sacha Krstulovic                 Avoiding deadly horses in Automatic Environmental Sound Recognition
14h30  Francisco Rodríguez-Algarra      You don't hear a thing... but my Horse knows it's Rock! (video, slides)
15h00  Jeff Clune (via skype)           How much do deep neural networks understand about the images they recognize? (video, slides)
15h30  Ricardo Silva                    The role of causal inference in machine learning (video, slides)
16h00  Ian Goodfellow (via skype)       Adversarial Examples and Adversarial Training (video, slides)

Funding

This event is funded with support from the EPSRC through the Platform Grant on Digital Music (EP/K009559/1), and is co-organised with the QMUL Applied Machine Learning Lab and Machine Listening Lab.


CALL FOR CONTRIBUTIONS

Have you uncovered a “horse” in your domain?* Or perhaps discovered a “Potemkin village”?†

We invite presentations for this free one-day workshop (with free coffee & nibbles and a free lunch), which will explore issues surrounding “horses” and “Potemkin villages” in applied machine learning; see About HORSE2016 above for the famous “tank detector” example and why the onus is on researchers to demonstrate the sanity of their models.

The day will feature a keynote lecture, but we are also looking for contributions in the form of 20-minute talks and discussions about all things “horse” and “Potemkin village”. We seek presentations from both academia and industry. Some of the deeper questions we hope to explore during the day are:

- How can one know what one's machine learning systems have actually learned?
- How does one know if and when the internal model of a system is “sane”, or “sane enough”?
- Is a “general” model a “sane” model?
- When is a “horse” just overfitting? When is it not?
- When is it important to avoid “horses”? When is it not important?
- How can one detect a “horse” before sending it out into the real world? (A sketch of one approach follows this list.)
- How can one make machine learning robust to “horses”?
- Are “horses” more harmful to academia or to industry?
- Is the pressure to publish fundamentally at odds with detecting “horses”?
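
One concrete way into the detection question above, loosely in the spirit of Sturm's 2014 paper, is to probe a trained system with transformations that should be irrelevant to the task and count how often its answers change. The following sketch is a hedged illustration; `predict`, `inputs`, and the transformation list are hypothetical placeholders, not any fixed API:

    # Probe loosely in the spirit of Sturm (2014): transformations that are
    # irrelevant to the task should leave a sane system's answers unchanged.
    # `predict`, `inputs`, and `irrelevant_transforms` are hypothetical.
    def horse_probe(predict, inputs, irrelevant_transforms):
        """Fraction of predictions flipped by task-irrelevant changes."""
        baseline = [predict(x) for x in inputs]
        flips, total = 0, 0
        for transform in irrelevant_transforms:
            for x, y0 in zip(inputs, baseline):
                flips += int(predict(transform(x)) != y0)
                total += 1
        return flips / total if total else 0.0

A high flip rate under changes that cannot matter to the task (e.g., small gain changes for a music genre classifier) is evidence that the system is keying on something irrelevant, i.e., a candidate “horse”.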

Please submit your proposals (one page max, or two pages if you have nice figures) by July 1, 2016 to b.sturm@qmul.ac.uk, subject line: “On ‘horses’, in memoriam Alan Young (1919-2016)”. Notification will be made July 7, 2016. Registration (free) will open soon after.

* “As an intentional nod to Clever Hans, a 'horse' is just a system that is not actually addressing the problem it appears to be solving.” (B. L. Sturm, “A simple method to determine if a music information retrieval system is a 'horse',” IEEE Trans. Multimedia 16(6):1636–1644, 2014.)

† “[Our] results suggest that classifiers based on modern machine learning techniques ... are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village.” (I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. ICLR, 2015.)