

On “Horses” in Applied Machine Learning

Research workshop, QMUL, London
Wednesday 20 September 2017, 9h00–17h00
Location: Arts One Lecture Theatre, QMUL, London E1 4NS

What are “horses”?

“As an intentional nod to Clever Hans, a 'horse' is just a system that is not actually addressing the problem it appears to be solving.” (B. L. Sturm, “A simple method to determine if a music information retrieval system is a 'horse',” IEEE Trans. Multimedia 16(6):1636–1644, 2014. Winner of the 2017 IEEE Transactions on Multimedia Prize Paper Award)

About HORSE2017

HORSE 2017 is a one-day workshop exploring issues surrounding “horses” in applied machine learning. Last year's edition was a great success, and we aim to top it this year.

The keynote of HORSE 2017 will be delivered by Prof. David J. Hand.

Registration is free for speakers, and a minimal amount (£5) for others.

The day includes lunch and coffee.

10 free tickets are available for students: email to nominate yourself for a free ticket.


Time Speaker Title
9h00 Coffee
9h45 Welcome (slides)
10h00 Dan Stowell – Reducing confounding factors in automatic acoustic recognition of individual birds
10h30 Yann Bayle – Preventing “Horses” in Music Information Retrieval tasks
11h00 Erica Thompson – On Hawkmoths and Horses: Epistemic issues in modelling complex systems
11h30 Saumitra Mishra – Opening the black box: On the interpretability of machine learning models for machine listening
12h00 Lunch
13h00 David J. Hand – Keynote: But that’s not how I see it: “horse” constructs and what we want to know

Abstract: “Horses” have attracted attention in machine learning because of the dramatic way their classifications differ from the intentions of the system designers. But they do not arise solely in machine learning, nor solely at the level of feature construction and the mapping to the set of object classes. I examine other situations in which “horse-type” errors have arisen, including examples of fundamental ambiguity in scientific research, errors occurring in complex systems, and death spirals in insurance.

(slides; QMUL A/V failed to capture the lecture)

14h00 Adrian Bevan – Machine Learning in High Energy Physics (slides; QMUL A/V failed to capture the lecture)
14h30 Artur Garcez – Avoiding Deep Horses: Finding Structure in Deep Networks (slides; QMUL A/V failed to capture the lecture)
15h00 Coffee
15h30 Kiri Wagstaff (via Skype) – Understanding machine learning model expertise (slides; QMUL A/V failed to capture the lecture)
16h00 Performance of music generated by the horse folk-rnn (slides, video)
16h30 Trip to the Half Moon Pub


This event is funded with support from the AHRC project DaCaRyH (AH/N504531/1), RAEng Research Fellowship RF/128, EPSRC Research Fellowship EP/L020505/1, and is co-organised with the QMUL Machine Listening Lab.

Call for contributions

Have you uncovered a “horse” in your domain?

We invite presentations exploring issues surrounding “horses” in applied machine learning. One of the most famous “horses” is the “tank detector” of early neural-network research: after great puzzlement over its success, the system was found to be merely detecting sky conditions, which happened to be confounded with the ground truth. Humans can be “horses” as well, e.g., magicians and psychics. Machine learning, in contrast, does not deceive on purpose; it only makes do with what little information it is fed about a problem domain. The onus is thus on the researcher to demonstrate the sanity of the resulting model. Too often, however, evaluation of applied machine learning ends with a report of the number of correct answers a system produces, and not with uncovering how the system arrives at its right or wrong answers in the first place.
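The tank-detector story can be reproduced in miniature. The following is a minimal sketch (in Python with NumPy; the toy data and all names are invented for illustration, not taken from any real detector): a one-feature threshold classifier is trained on data in which a “sky brightness” feature happens to track the labels, scores near-perfectly, and then collapses to roughly chance accuracy once the confound is broken.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, confounded):
    """Toy 'tank detector' data: feature 0 is a weak genuine cue;
    feature 1 ('sky brightness') tracks the label only when the
    data are confounded."""
    y = rng.integers(0, 2, n)
    signal = y + rng.normal(0.0, 2.0, n)        # weak, noisy true cue
    if confounded:
        sky = y + rng.normal(0.0, 0.1, n)       # near-perfect proxy for y
    else:
        sky = rng.normal(0.5, 0.5, n)           # confound broken: unrelated to y
    return np.column_stack([signal, sky]), y

def train_stump(X, y):
    """Pick the single feature and midpoint threshold with the best
    training accuracy (a crude stand-in for what a model latches onto)."""
    best = (0, 0.0, -1.0)
    for j in range(X.shape[1]):
        t = 0.5 * (X[y == 0, j].mean() + X[y == 1, j].mean())
        acc = ((X[:, j] > t).astype(int) == y).mean()
        if acc > best[2]:
            best = (j, t, acc)
    return best[0], best[1]

X_tr, y_tr = make_data(500, confounded=True)
feat, thresh = train_stump(X_tr, y_tr)          # latches onto 'sky' (feature 1)

X_te, y_te = make_data(500, confounded=False)
acc_tr = ((X_tr[:, feat] > thresh).astype(int) == y_tr).mean()
acc_te = ((X_te[:, feat] > thresh).astype(int) == y_te).mean()
print(f"chosen feature: {feat}  train acc: {acc_tr:.2f}  de-confounded acc: {acc_te:.2f}")
```

Nothing about the model changes between the two evaluations; only the confound is removed. A report of the first accuracy number alone would hide the fact that the system was never detecting the signal at all, which is exactly the kind of failure this workshop is about.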

We are looking for contributions to the day in the form of 20-minute talk/discussions about all things “horse”. We seek presentations from both academia and industry. Some of the deeper questions we hope to explore during the day are:

  • How can one know what one's machine-learning systems have actually learned?
  • How does one know if and when the internal model of a system is “sane”, or “sane enough”?
  • Is a “general” model a “sane” model?
  • When is a “horse” just overfitting? When is it not?
  • When is it important to avoid “horses”? When is it not important?
  • How can one detect a “horse” before sending it out into the real world?
  • How can one make machine learning robust to “horses”?
  • Are “horses” more harmful to academia or to industry?
  • Is the pressure to publish fundamentally at odds with detecting “horses”?
Please submit your proposals (one page max, or two pages if you have nice figures) by July 15, 2017. Notification will be made by July 25, 2017. Registration will open soon after.