Monday, 25 July 2016

Judging in the Mundial Final (Fun with Data)

You would think it would be easy to download the scores for a fairly simple dance competition. There are forty-odd pairs of competitors, there are seven judges, the judges observe the competitors doing their thing, and each judge utters a score for each pair. The scores are recorded and tabulated, an average is calculated for each pair, and they are ranked accordingly. It's that simple. They don't even do a 'sporting average' - which would mean they knocked off the highest and lowest scores before calculation. Repeat yearly.

As it turned out, it's rather a pain, but the data for 2015 was published by someone who apparently knew what they were doing and could create a relatively sensible PDF table of results, so I started there. But below, you can explore results for each year from 2012, which is where we start to get half-way useful data.

The data is not perfect; in particular there are errors in the names of couples where I had to look these up from different documents that were very poorly formatted and I didn't have time to fix all the problems. There are lots of messed-up accented characters, and some town or country names mixed in with the couple names. But the relationship between couple ID number and score should always be right, and the name recognisable, where it's available at all.

It's possible that there is cleaner data somewhere else, but I decided to go entirely from the official website and do the data cleaning myself. If two people do this independently, that's no bad thing.

Before starting, I had some questions.

  1. How much agreement is there between the judges about which couples are better than others?
  2. If the highest and lowest scores were rejected before calculating the average, as is done in most competitions with subjective scoring, how much difference would it make to the results? 
  3. Supposing there is agreement between the judges, is there anything we can observe about the couples that explains high or low scores?

Below, I've embedded a Power BI dashboard addressing these questions.

It's interactive. You can navigate between the pages using the arrows at the bottom, and select the year using buttons.  It has a page of notes, but I'm going to repeat the gist of them below. The big tables take several seconds to load. If you can't see it well, it may behave better if you make it full screen using the arrow thing at bottom right.

The data all comes from the official website, but you can download my cleaned-up compilation instead (from a few minutes after posting time).

For some years, the names of the couples are not given in the final rankings, only their competition numbers. Where possible, I have looked up the names from the published scores of preliminary rounds. I assume that the couple's ID number stays the same throughout the competition. Not all couple numbers appear in the scores of preliminary rounds, perhaps because they reached the final via other rounds in other countries or other competitions. In these cases, the couple name reads "Not Provided" with the year and ID number.

In this report, as well as the official average, I also calculate what I call the "sporting average" as used in most subjectively scored competitions; that is, the average if you ignore the couple's highest and lowest score. Finally I calculate the standard deviation of the scores.
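
If you'd rather check these three figures outside Power BI, here is a minimal Python sketch. It is not how the report itself is built; the function names and the seven scores are made up for illustration, and I'm assuming the population form of the standard deviation (`pstdev`), which may not match what the dashboard uses.

```python
from statistics import mean, pstdev


def official_average(scores):
    """Plain mean of all the judges' scores for one couple."""
    return mean(scores)


def sporting_average(scores):
    """Mean after dropping one highest and one lowest score."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the high and low")
    trimmed = sorted(scores)[1:-1]  # remove exactly one low and one high mark
    return mean(trimmed)


# Invented scores from seven judges for one hypothetical couple.
scores = [7.0, 7.5, 8.0, 8.0, 8.5, 9.0, 10.0]

print("official average:", official_average(scores))
print("sporting average:", sporting_average(scores))
print("standard deviation (population):", pstdev(scores))
```

Swap `pstdev` for `statistics.stdev` if you prefer the sample standard deviation.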

The pages are as follows:
  1. Scores chart - shows the scores given by each judge in the selected year.
  2. Hi/Lo chart - shows the high and low scores and the averages for each couple.
  3. Ranks chart - shows how far the judges agreed on how to rank the couples.
  4. Scores table - shows how many places each couple moves if you ignore high and low scores in calculating the average.
  5. Ranks table - shows detail of how each judge ranked the couples. If they gave two couples equal scores, those couples get the same rank.
  6. Competition ID - we'll come back to this below.
  7. Notes, basically this information.
  8. A table of all the data, not as it looks in the underlying spreadsheet, but as it looks after Power Query mashes all the years into one data set for calculations. This also shows the average score and the standard deviation calculated over the population as a whole; you can select individual years and judges.
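
The per-judge ranking on the Ranks pages can be sketched in a few lines of Python. This is only an illustration, not the Power BI calculation: the couple numbers and marks are invented, and I'm assuming the usual competition convention where two couples tied for 2nd are followed by 4th, not 3rd.

```python
def rank_with_ties(scores_by_couple):
    """Return {couple: rank} for one judge; rank 1 is the highest score.

    Couples with equal scores share a rank. A couple's rank is one more
    than the number of couples this judge scored strictly higher.
    """
    ranks = {}
    for couple, score in scores_by_couple.items():
        higher = sum(1 for s in scores_by_couple.values() if s > score)
        ranks[couple] = higher + 1
    return ranks


# Invented marks from one hypothetical judge, keyed by couple number.
judge_a = {101: 8.5, 102: 9.0, 103: 8.5, 104: 7.0}

print(rank_with_ties(judge_a))  # {101: 2, 102: 1, 103: 2, 104: 4}
```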

Question 1: agreement between the judges

There is not very much consensus between the judges on either the score or the ranking of any particular couple. They make it difficult for themselves to make fine distinctions by not awarding the full range of marks. Marks are out of ten, but the lowest that appears in any of the clasificatorias (not shown in this data) is 3.75.

I see a floor in the marks for the final; in 2015 the flat lines at 7 stand out in the scatter of scores, as though the judges felt collectively that anything lower would be impolite.

The second-placed couple in 2015 has a high score of 10 and a low score equal to that of the lowest couple. The first-placed couple were not ranked first by any judge. The only couple ranked first by more than one judge was placed 9th. To find the lowest-ranked who were placed top by at least one judge, we have to go down to the couple ranked 25th overall. The lowest-ranked couple with a top-three ranking from at least one judge were placed 39th of the 41 couples. Looking at the other years, 2015 does not look atypical. In 2012 and 2013, exactly one of the top five was placed first by more than one judge, and in 2014 two of them were, including the winners.

There seems, looking at the Hi-Lo charts, to be slightly more consensus at the bottom than at the top, but this could be just because of the unofficial floors (which it looks as though not every judge agrees on). When I look at the chart of rankings, rather than scores, I don't see any more agreement at the lower end than the higher end.

In the ranking table, you can de-select a particular judge or combination of judges to see how your favourite couple might have done without them.

On only two occasions from 2012 has any one of the top five couples been placed first by more than one single judge.

On the final page of the report you can look at the standard deviation in the scores awarded by individual judges. Some judges appear in more than one year, sometimes with their names formatted differently, as full names were given in only one year. If a judge has a higher standard deviation, it means they awarded a wider range of marks; presumably, they were more convinced that some couples were better than others. A lower standard deviation means they awarded similar marks to everyone. Unfortunately the judges don't seem to agree on which couples they are, or are not, so convinced about.
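
As a rough illustration of what that comparison means, here is a small Python sketch with two invented judges. The names and marks are made up, and the population standard deviation is my assumption about the form used.

```python
from statistics import pstdev


def spread_by_judge(marks_by_judge):
    """{judge: [mark per couple]} -> {judge: population standard deviation}."""
    return {judge: pstdev(marks) for judge, marks in marks_by_judge.items()}


# Invented marks: one judge gives everyone the same, the other spreads out.
example = {
    "Judge A": [7.0, 7.0, 7.0, 7.0],
    "Judge B": [6.0, 7.0, 8.0, 9.0],
}

spreads = spread_by_judge(example)
for judge, spread in spreads.items():
    print(judge, round(spread, 3))
```

Judge A comes out at zero because every couple got the same mark; Judge B's wider range of marks gives a larger figure.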

Question 2: Sporting Average

Because the marks are, in my view, all over the place anyway, eliminating high and low scores before calculating the average doesn't make a lot of difference to the competition overall. It does make a difference to individual couples: it would have reversed the top 2 in 2015, and the couple placed 30th would have risen 8 places. This is the largest gain in any year, and also occurred in 2014. The largest loss is 12 places in 2012, and there seem to be bigger losses than gains for individual couples generally; someone goes down by a lot and everyone they drop below gains one. This seems consistent with the observed 'marking floor'; when a judge disagrees with their peers, they apparently tend to do so by awarding a very high mark rather than by going below the general 'floor' for that year.
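
To see how one very high mark can swap two couples, here is a small Python sketch of the "places moved" calculation. The three couples and their marks are invented, and the helper names are mine; the real table is calculated in Power BI, so treat this as an approximation of the idea only.

```python
from statistics import mean


def sporting_average(scores):
    """Mean after dropping one highest and one lowest score."""
    return mean(sorted(scores)[1:-1])


def placings(averages):
    """{couple: place}; place 1 is the highest average, ties share a place."""
    return {
        couple: 1 + sum(1 for other in averages.values() if other > avg)
        for couple, avg in averages.items()
    }


def places_moved(scores_by_couple):
    """Positive means the couple gains places under the sporting average."""
    official = placings({c: mean(s) for c, s in scores_by_couple.items()})
    sporting = placings({c: sporting_average(s) for c, s in scores_by_couple.items()})
    return {c: official[c] - sporting[c] for c in scores_by_couple}


# Invented marks from seven judges. Couple A is lifted by a single 10;
# couple B is pulled down by a single 7.
example = {
    "A": [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 10.0],
    "B": [7.0, 8.2, 8.2, 8.2, 8.2, 8.2, 8.2],
    "C": [7.5, 7.5, 7.5, 7.5, 7.5, 7.5, 7.5],
}

print(places_moved(example))  # {'A': -1, 'B': 1, 'C': 0}
```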

Question 3: Is there anything we can observe about the couples that goes with high or low scores?

There isn't, in my view, enough agreement between the judges - or enough good video - to say much about this question.

What I did notice is that there was a pattern to the numbers pinned on the couples' suits; there are a lot more lower ones. Closer inspection of the source data shows that this probably has something to do with the geographical origin of the couple and their route to the final. The system of awarding numbers is not covered in the published rules, but it seems the lower numbers are given in Buenos Aires and the higher numbers further afield.

So, taking this as a proxy for where couples came from, I checked to see if it was also related to their scores, and this is shown in the final chart, "Competition ID". Answer: not really.

The line in the same chart shows the average score for each block of 10. There are more couples with lower numbers, so perhaps we'd expect their average score to end up closer to the overall average of all couples than it is; it's rather higher. But those couples are also likely to have had more serious competition in previous rounds, which should also drive their average up compared to everyone else arriving via other routes. There isn't an obvious relationship between couple number and score as such. The foreigners are fine too, there just aren't that many of them.
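
The block-of-10 line can be approximated like this in Python. The couple numbers and scores are invented, and I'm assuming blocks run 0-9, 10-19 and so on, which may not be exactly how the chart bins them.

```python
from collections import defaultdict
from statistics import mean


def average_by_id_block(avg_score_by_id, block_size=10):
    """Group couples by competition ID block and average their scores.

    Returns {block_start: mean score}, ordered by block start.
    """
    blocks = defaultdict(list)
    for couple_id, score in avg_score_by_id.items():
        start = (couple_id // block_size) * block_size  # e.g. 18 -> 10
        blocks[start].append(score)
    return {start: mean(values) for start, values in sorted(blocks.items())}


# Invented average scores keyed by competition ID.
example = {3: 8.4, 7: 8.0, 12: 7.9, 18: 8.1, 305: 7.8}

block_averages = average_by_id_block(example)
for start, avg in block_averages.items():
    print(f"IDs {start}-{start + 9}: {avg:.2f}")
```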

More precise geographical origin of the couples is at least partially given in the source data, but as it's mostly in the form of tiny flags in graphics it would be a lot more work to get it, which I haven't done.

So, basically, no, there isn't anything I can say about how to do well, based on this data. There's no couple who did so clearly well or so clearly badly that you could watch and learn.

General remarks

In my own opinion, it's rather unrealistic of me to look at the Mundial as though it were a sporting competition. If you were really going for an exciting sporting competition, or some sort of mechanism for identifying the best dancers, then you would probably design a rather different event. It might, for example, include challenging tests of the ability to dance well to a variety of music, including milonga and vals, on a floor more than one-third full. There might be more rounds, with the judges taking longer looks at fewer couples in each. Judging criteria would be a matter of public record, rather than rumour. And there would be a system for creating agreement between the judges over time, beyond simply agreeing that scores below 7 were impolite. What it is, rather, is a marketing exercise for the 'Tango Salon' industry, designed to honour the heritage and disseminate awareness of the music and dance, while bringing lots of young couples who dance in a certain popular, standard-ish way, to public attention and prosperity.

If you are choosing a teacher, having reached the final in the Mundial indicates that a couple dance well in a particular style and have good tango technique, at least when dancing with their competition partner - as opposed to the very different sort of technique that is used for "Argentine Tango" on Strictly Come Dancing. It is not evidence that even one judge in the final thought they were the best. They may have been, but the chances are the judges didn't know - or if they thought they knew, they certainly didn't agree - in which case, I definitely don't know, and you don't know, either. Their ranking within the final says very little, if anything at all.

This is, in my opinion, pretty much how it should be. I don't think a true sporting competition in these circumstances would necessarily be a good idea. It didn't do ballroom any good, as a social dance.

In particular, I think it's probably a good thing that the judges don't agree. Standardisation would be toxic.

I do have a couple more questions.
  • Can we separate the level of disagreement between the judges from the question of whether there is any real difference between the couples that they could possibly measure? I can compare the real data with simulated data based on having and not having a real difference, and the results are amusing, but I think I end up assuming what I set out to prove. It might be more interesting to compare the Campeonato de la Ciudad.
  • Does the order in which the couples are called - in four rondas - have any relation to their scores? I do have at least partial data for this, but putting it together requires some more work.
  • It would be nice to have tidy data about geographical origin, but again, it's a lot of work to peer at all the little flags in the published data and write down what they are, and it probably doesn't tell us much more than the competition ID numbers do; most of the people who are both interested in entering this competition, and competent enough to do well, are Argentinians.

Anyway, enjoy interacting with the report, and go ahead, share and comment. I'll upload the data so you can download it and do your own analysis.

Mundial Dialogue (Fun with Data)

This dialogue is an imaginary summary of at least a full day of faffing about trying to get the data together.

Hello. I notice there is rather a lack of anything fact-based about how the Mundial actually works, the judging, and so on. I'd like to download all the published results from previous years and analyse them to see if I can say anything interesting about it.

Hello. We have a website that covers each year since 2009, and most of the results are probably there somewhere, except for 2009 when they aren't.

Great, I can get all the files and mash up the data with Power Query.

We don't publish any individual judge scores, even for the semifinal or finals, before 2012.

Not to worry. Four years is a good start.

Before 2012, the results are sensible tables. From 2012 onwards, the semi-finals and finals are just PDFs of pictures of the scores. We don't name the couples. For 2014 and 2015 we do name the couples, but separately from the scores, and in different formats.

I have access to some pretty good OCR technology, I have a great text editor, I know Regular Expressions, I have Power Query, and if it comes to that, I can put the PDF on my tablet on a book-chair and retype it all at 50wpm. I bet you named most of the couples in the classification rounds.

The links for the 2012 Final and Semifinal are broken. Those documents are just Error 404s.

But I know exactly how these things work, and I can guess, by analogy with documents that are there,  that for the final, instead of
you meant And for the Semifinal, instead of, you made a different mistake and you actually meant

Do you think this is a good idea?

Do you think you're going to win this?

Tuesday, 21 June 2016

Beautiful art on a big decision

Matthew has made a brilliant little piece of art; the way the Remain campaign should have been done.

If you are on a mobile you will have to "request desktop site" (it's always there somewhere) - it's because of the music, which is hilarious.

Saturday, 4 June 2016

Music of the Mundial Final

This post studies the music used in the final of the Mundial de Tango in the years 2012-2015. I have no information about how or why the music is chosen, or whether any guidelines exist for the person or committee choosing the music. In this post I simply observe what they actually chose.


I compiled this data by watching the videos in this playlist. They are kindly provided by Aires de Milonga, a website I recommend; they provide these videos for nothing, but they offer additional services to those who subscribe a very small annual sum via Paypal.

Each final consists of approximately forty couples, and is divided into four Rondas. For each ronda, three tracks are played. Only tangos are used; no milonga or vals.

Over the four years, this gives a total of 3 x 4 x 4 = 48 tracks, but there are actually 49, because in the first ronda of 2013 something happens off-camera during track 3 that bumps the floor and disturbs the competitors' concentration. A fourth track is played, in the same style.

To begin with, I noted the orchestra, singer, and title of each track.  I then searched for the tracks on and on YouTube until I was reasonably satisfied that I had identified them correctly.

The full data set can be downloaded here: if you notice an error, please describe it in the comments. The tracks are announced at about the 3-minute mark of each ronda, immediately after the couples do their preliminary walk around the floor so the judges can see their numbers.


Style rotation

I perceived the tracks for each ronda as covering a predominantly dramatic style, a predominantly rhythmic style, and an in-between, lyrical, or other style, in no particular order. I have added these wholly subjective categorisations in the full data set. You will probably disagree with at least some of them, perhaps many. The word "Lyrical" is fairly meaningless and just refers to the in-between or mixed or melody-led style of track that isn't either of the others; often it is the track that would allow competitors to show off the technical achievement of a slow, smooth, graceful walk. I may update my classifications to make them a bit more meaningful and regular.

The use of these three broad styles in each ronda makes sense on the basis that each couple gets the chance to show off a broad range of technical and musical powers. Each ronda in each final obviously needs to be stylistically similar to the other three. I note, though, that 40 couples seems a lot for a 'final'; the naive observer might have expected to see, say, only ten different couples, and see them dance for a little longer or to a wider range of music.


The orchestras used looked like this.

Orchestras of recordings used in the final of the Mundial de Tango, 2012-2015

It seems notable to me that there is absolutely no Biagi, and absolutely no Canaro.

Given the volume and excellence of their output, if they were going to be used at all, you'd think they'd be in there somewhere, over the four years. If you were practicing for the final, and you didn't have this data, you might spend time with those guys; but it seems you'd be wrong.

I can imagine a pretty good argument for not using any Biagi. There's no reasonable substitute, so if you used, say, one of the great Biagi instrumentals in one ronda, it might seem very unfair not to use another in each of the four Rondas. Everybody needs a roughly equal chance to either shine or make fools of themselves; and that would make Biagi too prominent and would mean you had to sacrifice something else. I hypothesise that if there were a vals competition, there'd be plenty of Biagi in there.

There is already a widely-held belief that Argentinians consider Canaro a bit 'common'. Nothing in this data really supports or dispels such an idea; but they don't use any in the final. Nor do they use any of the orchestras that come to mind as stylistically similar to Canaro's most currently-popular output; Lomuto, OTV, Carabelli, Típica Porteña, etc. So it does support the idea that this style of music is not considered appropriate for competition. And again, if there were a milonga competition, we'd see Canaro.

Years of Recording

This is what the years of recording look like.

Year of recordings used in the final of the Mundial de Tango, 2012-2015
It's notable that there's a long tail to the right, stretching all the way to 1959, but nothing at all on the left earlier than 1934.

Decade of recordings used in the final of the Mundial de Tango, 2012-2015
It's no surprise that most of it is from the 40s. But the notable thing for me is that more than a fifth of the recordings are after 1949; they slightly outnumber the ones from the 30s.


If you were doing well in the Mundial and you were practicing for the final, it would make a lot of sense to spend about a fifth of your time on each of D'Arienzo, Di Sarli, Pugliese, and Troilo, and the other fifth on exploring how what you have learned applies to whatever else you like among tracks that can be used as stylistic substitutes for those four; provided that it is not Biagi, not Canaro, and not anything recorded before 1934.

You would also think about three (or more - this is very subjective) broad classifications of style, and you would focus on forming a range of improvisational habits that worked well for each style, regardless of the orchestra.

If you dance socially in Europe, it might also make sense to spend some extra time improving your dance to the 50's output. There's some support in this data for the widespread idea that the Argentinians think the Golden Age of tango music began and ended five to ten years later than the Europeans think it did. You may be less familiar with the nearly 30% of these tracks that were recorded after 1945, and you will probably have no chance to show what you can do with anything before 1935, so that experience is somewhat wasted. Being able to hit 80% of Biagi's off-beats will also be 100% useless, while being able to dance to 50's tangos generally without getting the giggles could be something you need.

Further research, or exercises for the interested reader

Yesterday I attended the first round of the related competition organised in London (there were 14 couples, one from the UK). You might be wondering if the pattern I've seen here was followed, or if it is followed in your own local competition, or the European competition, or anywhere else. I haven't gone through my notes yet, but the data so far says no. Despite dividing fourteen couples into a rather excessive three rondas, I don't think they followed the rotation of styles, and Pablo played both Canaro and OTV. I probably won't attend rounds 2 or 3, as it costs £25 to get in, and that adds up to a bit much, but if you feel like having something to focus on while you're there, go ahead and collect the data. It would be good to note the couple numbers in each round, too, along with your personal top six, and the results.

My guess is that no guidelines are published anywhere about the music, so the practice in local competitions is probably completely unrelated to what's done in the final. I have not tried to collect data for the semi-finals, either, and there's no reason to assume it's the same.

An interesting exercise for the reader - or for further research - would be to consider what three tracks you would use if you wanted, by observation, to identify the best dancers - by your own definition - in a room.

[Edit: I think the announcement at yesterday's competition was that there were 13 couples, but my notes show 14 different numbers; so I've changed it to 14. I could be wrong].

Wednesday, 18 May 2016


Sometimes people take video clips of tango events I travel to. And occasionally, like everyone else, I can see glimpses of myself dancing, in between other couples.

If I see myself leading, which is rare enough for me not to be used to it, the one thing that really hits me is how TINY I am.

Now, this could be partly in relation to my partners. I usually (not always) change into flat shoes if I want to lead, and most (not all) of the women I dance with usually (not always) wear about a 7cm heel for following. I am about 166cm tall, which when I look it up turns out to be two to four centimetres taller than average for an English woman. The women I dance with vary a lot in height, but most of the time their heels will prevent them looking much smaller than me. And I am rather lightly built, so I don't look like much of anything from a distance.

But I don't think that's what makes me look tiny. What does that is the contrast in size and bulk of us as a couple, with the other couples led by men. And that's something I simply never think about, and am never aware of, until I see it.

I am like a little dog that doesn't know what "small" is.

Wednesday, 11 May 2016

Red silk tango boots, 1910-1920

How did I miss this?

"These red, silk satin French-made “barrette/Tango boots” are in the collection of the Bata Shoe Museum in Toronto and date circa 1910s-1920s. Even if you do not fancy dancing, they just beckon you to have some fun, don’t they? In order to dance the Tango, the shoe needed to be well-fitted and secure. The lacing, or barrette-style straps, run up the ankle (and often the calf, as in this example) adding a provocative, sensual twist – appropriate for the dance itself."
Look at them! I don't have permission to use the image; if I can find a way of getting in touch, I will ask, and if it's ok I'll add it here. But comments are restricted, and I can't find an email.

Everything SilkDamask writes about the requirements for a tango shoe still applies; it must hold firmly to the foot, flatter the leg, look beautiful, be sensually pleasing, and fit well. They must also have a very flexible sole, on which the dancer can easily pivot. The fashion for heels is thinner, and the shoes themselves are generally less substantial.

These days we have numerous manufacturers to choose from, and they compete for the custom of serious dancers on comfort, fit, function, beauty, and to some extent price, although generally not on prompt delivery or reliable service.

I invite you to compare the Yeite glossy red by Balanceo, the Silver Ramona by Madame Pivot, and the Recoleta in purple polka dots by Regina. From the Argentine manufacturers, Fabioshoes make this rather gorgeous practice shoe. Comme Il Faut have continued to make their more extreme, colourful, elaborate and detailed designs, but are possibly collected as art objects about as often as they are used to dance in, at least nowadays in the European market.

Tuesday, 26 April 2016

Vague, Subjective Guide to Obvious Tango Orchestras

In no particular order, not even alphabetical, this is a dump of my 100% subjective and fairly eccentric mental catalogue. In no way should this be taken as reliable or comprehensive.

Troilo
Sounds the most orchestral of all orchestras - the most like a symphony orchestra. If there seems to be a lot going on, with numerous different sounds having equal prominence, and at the same time it's all rather grand and maybe wants to be part of a larger work, it's probably Troilo. Also sounds like everybody, so if you know you know it, but can't think who it is, it's probably Troilo.

Di Sarli
I can't improve on Mike Lavocah's insight: Di Sarli does everything with the violins. Recorded over a long time with huge variations of style and sound and feeling, but the violins thing is consistent. Also sounds like Fresedo with a difficult girlfriend. Some of the late stuff sounds like Di Sarli trying to be Pugliese; with results that are kind of wonderful, if you can keep a straight face.

Sounds loud, even when playing quietly. Dense texture. Had Biagi at the piano for a lot of it, and sounds like Biagi without the holes. Some of the late stuff sounds not just intense, but completely bonkers, until you compare it with Biagi. The flesh and organs of tango.

Sounds like D'Arienzo with holes in. A lighter texture, and not just intense, but genuinely eccentric. The offbeats are much more unexpected. Appears never to have recorded a bad or boring track. A truckload of amazing valses, not enough milongas. In the late recordings he doesn't fall into the trap of imitating Pugliese; he's influenced by changes of style, and grows, but remains his entirely genuine, electric, eccentric self.

Kind of the backbone of tango, including the dodgy discs and the odd hernia, gunshot splinter or missing fortune. Always sounds totally professional, brisk, and organised. Best when there's a really strong tune and Maida singing. But recorded a lot, for a long time, and is very various.

Sounds glossy and dark. Salon in a dark suit. Tbh I can't describe it very well because there are only a few tracks I like all that much, and most of those remind me of the JAWS theme. The ones that are good are magnificent.

Sounds like a 30's Hollywood movie involving palm trees and pineapple hats. Tuneful, romantic, often a bit sugary. Or like Di Sarli with a nicer girlfriend and fewer violins.

Sort of buzzy. Like dancing bees. The most famous tracks have Alberto Podesta doing the operatic tenor thing. When he flies, he still comes down at the right moment. Bermudez has a darker, lower voice. But the bees are still there.

Sounds kind of like teeny-tiny Troilo for a much smaller room. Still a lot going on, but more portably.

Pugliese
The sound of the orchestra itself, with or without the singer, is full of love. This is music of a bigger picture, but expressed for and by two humans in one embrace that embraces a whole world. Pugliese is never caught up in narrow emotionality; but is deeply and satisfyingly real, like a plate of egg and chips. Pugliese was so influential that all the modern orchestras seem to be trying to sound like him, which is a trap better musicians than they are have also fallen into.

Sounds like walking along very carefully, one step at a time, not stepping on the lines on the pavement, because of the bears. But either quite amiably tipsy, or, in some tracks, weeping drunk.

Sauntering, episodic, and sounds a bit pissed off, or rambling drunk. Nice to play with when you're not going anywhere in particular. On reflection, I prefer the ones where Vargas comes in early and sings his thing, and the orchestra provides a fairly brisk, minimal frame otherwise.

De Angelis
Another kind of sauntering and episodic that wanders off in no particular direction and has a giant drama while staring at some random wall. Generally a bit art-movie. History's most annoying Cumparsita, which wurbles on for what seems like twenty minutes, is instantly forgotten, and then keeps coming back in your head for days.

Probably the one that sounds most like late 19th century opera that you can dance to. Like the Pearl Fishers or Tosca. Properly entertaining music, with loads of rhythm, super tunes, and the occasional quite good joke.

Lush (Mike Lavocah again). A gooey chocolate hazelnutty sort of sound with a trace of jazz. I love it.

Daaah-dum. Dark woodwind with a distinctive ending.

OTV (Orquesta Tipica Victor)
Victor studio house band: sounds incredibly professional and picked good songs. Wants to be driving scene music for a low-dialogue film.

I usually don't hear the opening of the first track and name the orchestra in my head, unless the track is a particular favourite or has a very distinctive opening: I usually just focus on how much I like it, and who I am most interested in dancing with, based on the general feeling it gives me. When I name the orchestra it's usually later, whether I'm dancing or not. This is why I get absolutely ropeable if the DJ plays their weakest track first in the tanda, or plays inconsistent tandas that don't carry through what the first track promised. Don't do it!