Transcript
(Editor's note: transcripts don't do talks justice.
This transcript is useful for searching and reference, but we recommend watching the video rather than reading the transcript alone!
For a reader of typical speed, reading this will take 15% less time than watching the video, but you'll miss out on body language and the speaker's slides!)
[APPLAUSE] Hi. I'm Ryan Herr. And today we're going to teach a computer how to banjoify a song. That's the technical term for when you transform a plain melody into Bluegrass banjo style. Let's start with an example of this transformation.
The song we'll use is "Man of Constant Sorrow," which was in the movie O Brother, Where Art Thou? Here's what the plain melody sounds like before we banjoify it.
(SINGING) I am a man of constant sorrow. I've seen trouble all my days.
[APPLAUSE]
Thank you. And here's what it sounds like after we banjoify it.
[COMPUTERIZED BANJO PLAYING]
So this example and all the music you see and hear during this talk were generated with code. Here's how the process works. Each individual note from the plain melody is expanded into multiple notes on the banjo. We'll call each group of notes a banjo phrase. And we combine all the phrases into the complete banjo arrangement for the song. For example, the plain melody starts with this note.
[BANJO NOTE]
This text you see, A12, is called ABC music notation. It means the note's pitch is A and its duration is 12 beats. But it doesn't sound like a long note when you play it on the banjo. The banjo can't sustain long notes like a singer can. So instead, we expand the one note 12 beats long into 12 notes, each one beat long.
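A minimal sketch of that expansion step, before any banjo-specific rules are applied. The helper names `expand_note` and `expand_melody` are hypothetical (they are not from the talk's `banjo` module), and the second note `("B", 2)` is made up for illustration. Each note becomes a phrase of one-beat notes at the same pitch, and the phrases are combined into an arrangement.

```python
from typing import List, Tuple


def expand_note(pitch: str, duration: int) -> List[str]:
    """Split one melody note lasting `duration` beats into one-beat notes.

    Hypothetical helper: each one-beat note simply repeats the melody pitch,
    since the banjo can't sustain a long note the way a singer can.
    """
    return [pitch] * duration


def expand_melody(notes: List[Tuple[str, int]]) -> str:
    """Turn each (pitch, duration) note into a phrase, then join the phrases."""
    phrases = ["".join(expand_note(pitch, duration)) for pitch, duration in notes]
    return " ".join(phrases)


# "A12" means pitch A for 12 beats; ("B", 2) is an invented second note.
print(expand_melody([("A", 12), ("B", 2)]))  # AAAAAAAAAAAA BB
```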
[COMPUTERIZED BANJO PLAYING]
The next melody note--
[BANJO NOTE]
--is shorter, two beats long, so it becomes two notes on the banjo.
[TWO NOTES PLAYING]
Again, two beats of melody becomes two notes for the banjo.
[TWO NOTES PLAYING]
And 12 beats of melody--
[BANJO NOTE]
--is expanded to 12 notes of banjo.
[COMPUTERIZED BANJO PLAYING]
You get the idea. I wrote Python code to do this.
[LAUGHTER]
I import my module named banjo. It has our song, "Man of Constant Sorrow." This is the plain melody here, just as a string of text in ABC music notation. The banjo module has a function named Banjoify. You input a song and some rules, and it outputs the transformed banjo arrangement of that song. It's still just a string of text, but if we pass that string into the Play function, we can also see the sheet music and listen to the same audio you just heard.
So here's our goal-- recreate this Banjoify function and the rules from scratch, mostly. We will use some open-source software to help with our goal. The Play function is just a wrapper I wrote around some command-line utilities to convert between different music formats: from ABC to SVG sheet music, from ABC to MIDI music, and from MIDI to wave audio.
We're using Jupyter Notebook here. Jupyter is an interactive shell in your browser, and it's not limited to text output, so we can display the SVG images and wave audio inline with our Python code.
And finally, NLTK-- NLTK is an acronym for Natural Language Toolkit. This might sound funny, using natural language processing to make music, but by now nothing here should surprise you. And music is just a language. We've seen that the Banjoify function takes text input and returns text output. We haven't seen the rules yet, but the rules use NLTK's context-free grammar implementation.
A grammar is just a set of rules for the structure of a language. So let's program a banjo grammar. We import from NLTK. And we will start small, with just one rule hardcoded to transform one note of one song. But we will end with a small set of rules that can transform any note in countless songs.
The rules are represented as a multi-line string. Each rule will have a left side, then an arrow, then a right side. The arrow means the left side can be expanded to or replaced with the right side. We want to transform the first note in "Man of Constant Sorrow," which is 12 beats of A.
We'll expand it into 12 notes, but which notes? Well, how about we just repeat the same pitch over and over? It's a start. Hello, world. 12 beats of A can be expanded to 12 As. We can make a grammar object from this string of rules. We'll name this object Grammar.
Then if we generate text from our grammar, we get a Python generator, which uses lazy evaluation: it doesn't yield output until you ask for it. So let's ask for the next item from the generator, or in this case, the first, last, and only item that we can generate with this grammar. And we get 12 As.
We'll call this our phrase. And notice the square brackets on the output. It's a list with one item. We want to make this list into a string. So we use Python's quirky syntax here to join the elements of the list into a string. And we get the string of 12 As.
This code here, just these three lines, is the foundation of our Banjoify function. We'll define the function here. For now, it'll just take one input, rules. We don't need the song input yet because we've hardcoded for one song, in fact, one note of one song. Then we return the phrase as a string.
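In the talk, NLTK's `CFG.fromstring` and `generate` (from `nltk.parse.generate`) do this work. The sketch below is a dependency-free stand-in, not NLTK: the helpers `parse_rules` and `expand` are hypothetical, and they assume terminals are single-quoted literals without spaces and that the grammar is not recursive. It mirrors the same steps: build a grammar from a multi-line string of `left -> right` rules, lazily generate derivations, take the first one with `next()`, and join it into a string.

```python
from typing import Dict, Iterator, List

# {left side: [right side, ...]}; each right side is a list of symbols.
Grammar = Dict[str, List[List[str]]]


def parse_rules(text: str) -> Grammar:
    """Parse lines like "12 -> 2 2 | 3 3 3 3" into a grammar dict."""
    grammar: Grammar = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        lhs, rhs = line.split("->")
        alternatives = [alt.split() for alt in rhs.split("|")]
        grammar.setdefault(lhs.strip(), []).extend(alternatives)
    return grammar


def expand(grammar: Grammar, symbols: List[str]) -> Iterator[List[str]]:
    """Lazily yield each sequence of terminals derivable from `symbols`."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first.startswith("'"):  # quoted literal: a terminal, so stop rewriting
        heads: Iterator[List[str]] = iter([[first.strip("'")]])
    else:  # non-terminal: try each of its right-hand sides in order
        heads = (h for rhs in grammar[first] for h in expand(grammar, rhs))
    for head in heads:
        for tail in expand(grammar, rest):
            yield head + tail


def banjoify(rules: str) -> str:
    """Still hardcoded to one note: expand the first rule's left-hand side."""
    grammar = parse_rules(rules)
    start = next(iter(grammar))  # like NLTK, default to the first rule's left side
    phrase = next(expand(grammar, [start]))  # lazy: only the first result is built
    return "".join(phrase)


RULES = """
12 -> 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A'
"""
print(banjoify(RULES))  # AAAAAAAAAAAA
```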
If we run the function, it works. Now pass it into the Play function. We can see and hear the music.
[COMPUTERIZED BANJO PLAYING]
But in Bluegrass banjo style you don't just repeat the same pitch over and over. Let's add some variation. For now, still hardcoded, we'll change every other pitch to a D, which I'll explain soon. Now let's play it.
[COMPUTERIZED BANJO PLAYING]
We've got variation back and forth between two pitches. Now I want to refactor the rules from this to this. These two versions are equivalent, and I'll show you why. First, as a test, let's just listen to the output.
[COMPUTERIZED BANJO PLAYING]
We get the same output. Now I'll explain this step by step. This will help us understand these rules, and it will also help us understand formal grammars in general.
So we start from the top. We follow the rules and make replacements. We start with the symbol 12. Then we look in the grammar for a rule for 12. We have a rule. 12 can be expanded to six twos. So we replace the 12 with six twos. OK.
Now we go with this first two here. We look in the grammar for a rule for two. The rule is two can be expanded to melody followed by string one. So we replace two with melody and string one. Look in the grammar for a rule for melody. Melody can be replaced with the quoted literal A.
And when you get to a quoted literal, that's called a terminal symbol. It means you're done rewriting that symbol; you stop there with that branch. The other symbols, without quotes, are called non-terminals. So you keep expanding the non-terminal symbols until you're left with only terminals, these quoted literal strings.
So we go back up to the string one symbol. It's non-terminal. We look in the grammar for a rule. String one can be expanded to the quoted literal D. We go back up to this two symbol. You get the idea. We had two before. It can be expanded to melody and string one. So we do that. Expand melody to A, expand string one to D, and so on. OK.
So we've seen this derivation step by step, and we've learned some formal grammar terminology: terminal and non-terminal symbols. Now I can explain briefly what context-free grammar means. In a context-free rule, the left side is always a single non-terminal symbol, and the right side is some finite sequence of terminals and/or non-terminals. And because the left side of the rule is always a single non-terminal, applying the rule does not depend on context from other symbols.
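The step-by-step derivation can be traced mechanically. This stdlib-only sketch uses a hypothetical `derive` helper, and the symbol spellings `melody` and `string1` are assumptions about how the slide writes them. It always rewrites the leftmost non-terminal, treats single-quoted symbols as terminals, and records every intermediate form until only terminals remain.

```python
from typing import Dict, List

# One right-hand side per non-terminal, as in the refactored rules.
RULES: Dict[str, List[str]] = {
    "12": ["2", "2", "2", "2", "2", "2"],
    "2": ["melody", "string1"],
    "melody": ["'A'"],
    "string1": ["'D'"],
}


def is_terminal(symbol: str) -> bool:
    """Quoted literals are terminals; unquoted symbols are non-terminals."""
    return symbol.startswith("'")


def derive(start: str) -> List[List[str]]:
    """Record each step of a leftmost derivation until only terminals remain."""
    form = [start]
    steps = [form]
    while not all(is_terminal(s) for s in form):
        # Find the leftmost non-terminal and replace it with its right side.
        i = next(i for i, s in enumerate(form) if not is_terminal(s))
        form = form[:i] + RULES[form[i]] + form[i + 1:]
        steps.append(form)
    return steps


steps = derive("12")
for step in steps[:5]:
    print(" ".join(step))
print("...")
print(" ".join(steps[-1]))
```

Each rewrite looks only at the single symbol at index `i`, never at its neighbors. That is the "context-free" property in code.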
OK. So that was a crash course on the theory. Now I'll explain why we have this first rule in our banjo grammar. Going back and forth between two pitches wasn't really a 12-beat pattern. It was a two-beat pattern repeated six times. That was the structure. So I want that structure in my grammar, because a grammar is a set of rules for the structure of a language.
And this rule-- what's string one about? Well, string one is this string on the banjo, which is usually tuned to the pitch of D. And we're using this first string of the banjo as a drone. To understand drones, let's look at another beloved popular instrument, the bagpipes.
[LAUGHTER]
Here are some MIDI-generated bagpipes.
[COMPUTERIZED BAGPIPES PLAYING]
So with bagpipes, a drone is a persistent lower note. With banjo, a drone is an intermittent higher note. So we're learning here. We're having fun. But we're still just banjoifying a single note. So let's do at least the first four notes.
We want to loop over each note to repeat this process. So I wrote a helper function, parse ABC. It parses a string of ABC music notation into a list of notes so that we can iterate over each note. Each item in the list is a tuple with the note's pitch and the note's duration.
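The talk doesn't show the parse ABC helper's code, so this is a hypothetical, deliberately simplified version. It handles only note letters with optional accidentals (`^`, `=`, `_`), octave marks (`'`, `,`), and an integer length, defaulting to 1 when the length is omitted. It expects melody text only, without header lines. Real ABC also has rests, broken rhythms, fractional lengths like `A/2`, and chords, which this regex does not handle. Bar lines are simply skipped.

```python
import re
from typing import List, Tuple

# Optional accidental, a note letter, optional octave marks, then an optional length.
NOTE_RE = re.compile(r"([_=^]*[A-Ga-g][,']*)(\d*)")


def parse_abc(abc: str) -> List[Tuple[str, int]]:
    """Parse simple ABC melody text into (pitch, duration) tuples.

    Hypothetical sketch: integer lengths only; anything that isn't a note is skipped.
    """
    return [
        (pitch, int(length) if length else 1)
        for pitch, length in NOTE_RE.findall(abc)
    ]


# A made-up fragment, not the actual song.
print(parse_abc("A12 B2 c2 | d12"))  # [('A', 12), ('B', 2), ('c', 2), ('d', 12)]
```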
Now we're going to change the Banjoify function to loop over each note. We're going to add an input, the song. And for each note in the song, we want to do mostly the same stuff as before, and return an output at the end.
What we want to do here is build up that banjo arrangement that we talked about in the beginning by combining the phrases. So we start with our empty list. And then we append each phrase to the arrangement and return the complete arrangement with all the phrases joined together as a string.
Now let's run the Banjoify function on the first four notes of "Man of Constant Sorrow," and then pass that into the Play function.
[COMPUTERIZED BANJO PLAYING]
I won't make you listen to all that. We wanted four groups of notes, four phrases, and we got it, so progress. But it's still wrong. What's happened here-- we're not tracking the changing pitch of the melody, right? We've hardcoded the melody to always be A, but we want the banjo melody to change when the song melody changes.
So how do we apply these pitches? How would we do that? One way that we could do that-- we could add rules like this, repeated for each possible pitch. It would be very tedious to do by hand. So we could do that programmatically, write code to generate the rules, to generate banjo music, pretty meta. And that's what I tried at first. But I decided I like being able to see my whole grammar concise and self-contained in this nice syntax with the arrows.
And also, if you generate rules programmatically, you have to do annoying stuff like this, too. But really, 12 beats of A is no different from 12 beats of B when you're breaking it up into these rhythm patterns. It just felt like the wrong abstraction. So I did something different: I replaced the hardcoded pitch with a placeholder, a template.
In Python, like many other languages, curly braces are used for string interpolation. So we can format the string and replace the pitch with whatever value we want at the time. Melody can be A, melody can be B, and so on. We can change the rules as we go to fit whichever melody note we're currently on in the song. So we'll update the Banjoify function to use this.
First, remember the parse ABC function returns a list of tuples with the note's pitch and duration. So let's unpack that tuple here. OK, so now we've got pitch. Now we need to change the rules as we go. So we format the rules with the current pitch. And now we can run the updated Banjoify function. We see the melody changing in this text here. Let's play it.
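Here is the template idea in isolation. The placeholder name `{pitch}` and the helper `rules_for` are assumptions: the talk only says the hardcoded pitch becomes a curly-brace placeholder. `str.format` fills the placeholder for each melody note, so the rest of the grammar never changes. One caveat of this approach is that any literal braces elsewhere in the rules would have to be doubled as `{{` and `}}`.

```python
RULES_TEMPLATE = """
12 -> 2 2 2 2 2 2
2 -> melody string1
melody -> '{pitch}'
string1 -> 'D'
"""


def rules_for(pitch: str) -> str:
    """Specialize the rule template to the melody note we're currently on."""
    return RULES_TEMPLATE.format(pitch=pitch)


for pitch in ["A", "B"]:
    # The third rule is the only one that changes from note to note.
    print(rules_for(pitch).strip().splitlines()[2])
# melody -> 'A'
# melody -> 'B'
```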
[COMPUTERIZED BANJO PLAYING]
So it's still wrong, but in a more interesting way. Everything is being expanded to 12 beats, the long notes as well as the short notes. So what we need to do here is to apply the durations. Now, why is this happening that we're always getting long notes? Well, it's because when we went through that derivation before, we said, hey, we start at the top and apply the rules, right? So we're always starting with this 12 symbol.
But sometimes, when we have those shorter notes in "Man of Constant Sorrow," we want to start with this two symbol. So we're going to update our Banjoify function to make that change. In the loop, we have the duration variable, but notice we haven't made use of it yet. So let's change that.
Where we want to change it is at the step where we generate from the grammar. NLTK has an option to specify a different start symbol, so we want to start with a different non-terminal: whichever duration we're on right now in the melody. We import the Nonterminal class from NLTK. And now we can run the Banjoify function with these rules on the first four notes and play the result.
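Putting the loop, the pitch template, and the duration start symbol together gives roughly the shape of the function at this stage. With NLTK this corresponds to passing a `Nonterminal` as the `start` argument of `generate`. This dependency-free sketch uses a tiny hypothetical expander instead. It takes the melody as already-parsed `(pitch, duration)` tuples, and the sample notes are made up. It also assumes phrases are separated by spaces.

```python
from typing import Dict, Iterator, List, Tuple

Grammar = Dict[str, List[List[str]]]

RULES = """
12 -> 2 2 2 2 2 2
2 -> melody string1
melody -> '{pitch}'
string1 -> 'D'
"""


def parse_rules(text: str) -> Grammar:
    """Parse "lhs -> rhs | rhs" lines into {lhs: [rhs, ...]}."""
    grammar: Grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        grammar.setdefault(lhs.strip(), []).extend(alt.split() for alt in rhs.split("|"))
    return grammar


def expand(grammar: Grammar, symbols: List[str]) -> Iterator[List[str]]:
    """Lazily yield each terminal sequence derivable from `symbols`."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first.startswith("'"):  # terminal: strip the quotes and keep it
        heads: Iterator[List[str]] = iter([[first.strip("'")]])
    else:  # non-terminal: expand each alternative in order
        heads = (h for rhs in grammar[first] for h in expand(grammar, rhs))
    for head in heads:
        for tail in expand(grammar, rest):
            yield head + tail


def banjoify(song: List[Tuple[str, int]], rules: str) -> str:
    """Build the arrangement one phrase per melody note."""
    arrangement = []
    for pitch, duration in song:
        grammar = parse_rules(rules.format(pitch=pitch))  # melody tracks the pitch
        phrase = next(expand(grammar, [str(duration)]))  # start symbol = duration
        arrangement.append("".join(phrase))
    return " ".join(arrangement)


print(banjoify([("A", 12), ("B", 2)], RULES))  # ADADADADADAD BD
```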
[COMPUTERIZED BANJO PLAYING]
OK. So we've got pitch. We've got duration. But those fill-in notes in between the melody, it's just the same thing over and over. So if that's the aesthetic you're going for, that could be OK. But really, I would like to have some more rhythms in my banjo grammar.
So what we're going to do here, for these longer notes, these 12 beats, instead of dividing it up into always using these two-beat patterns, we can divide up 12 beats in some different ways like, say, why don't we divide it up into four groups of three beats? And then we have to define what to do with three beats. And we say that will be melody followed by string one, followed by string five, another drone, which we'll have to define.
That pitch is G, and it's this note on the banjo. The fifth string is a weird string, because it starts partway up the neck, and most of the time it's used as a drone. So that's how we represent it here in our banjo grammar. Now if we play the updated rules on the first four notes of "Man of Constant Sorrow"--
[COMPUTERIZED BANJO PLAYING]
OK. So we've recreated the original example for those first four notes. Now we just want to do the rest of the song. Our grammar can handle the durations for most of these notes, but the very last note is 16 beats. So we'll create a rule for that.
So for 16 beats, similarly to what we did with 12, we'll use groups of three, because that's an idiomatic banjo sound. These groups of three are actually called the forward roll. Then we have an extra beat left over, because 16 doesn't divide by 3 evenly. And that rhythm, again, gives some of the unique banjo sound.
So we have to define what to do with that group of four. We'll say that it will be melody, string one, string five, string one. And now if we play this--
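Every rhythm rule has to add up: a rule for 16 beats must expand to exactly 16 one-beat notes, here 3 + 3 + 3 + 3 + 4. A small checker makes that arithmetic explicit. It is a hypothetical helper, and the grammar below is an assumed transcription of the rules described so far, with a fixed melody pitch. It assumes every terminal is one one-beat note and computes every length each symbol can expand to.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "16": [["3", "3", "3", "3", "4"]],
    "12": [["3", "3", "3", "3"]],
    "4": [["melody", "string1", "string5", "string1"]],
    "3": [["melody", "string1", "string5"]],
    "2": [["melody", "string1"]],
    "melody": [["'A'"]],
    "string1": [["'D'"]],
    "string5": [["'G'"]],
}


def beats(grammar: Grammar, symbol: str) -> Set[int]:
    """All possible lengths, in one-beat notes, of what `symbol` can expand to."""
    if symbol.startswith("'"):  # assumption: every terminal is a one-beat note
        return {1}
    totals: Set[int] = set()
    for rhs in grammar[symbol]:
        lengths = {0}
        for child in rhs:
            lengths = {a + b for a in lengths for b in beats(grammar, child)}
        totals |= lengths
    return totals


def check_durations(grammar: Grammar) -> None:
    """Raise if a numeric rule can expand to the wrong number of beats."""
    for lhs in grammar:
        if lhs.isdigit() and beats(grammar, lhs) != {int(lhs)}:
            raise ValueError(f"{lhs} expands to {sorted(beats(grammar, lhs))} beats")


check_durations(GRAMMAR)
print(sorted(beats(GRAMMAR, "16")))  # [16]
```

This simple count stops working once a quoted literal stands for a held length (an ABC duration) rather than a new note, so treat it as a check for the note-per-beat rules only.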
[COMPUTERIZED BANJO PLAYING]
All right. We have fully recreated the original example. But wait, there's more. This grammar isn't limited to banjoifying a single song; we can apply it to more songs. So we're going to do "Happy Birthday," just because it's a very familiar melody. The grammar in the Banjoify function as-is is almost ready for "Happy Birthday," but there's one duration, highlighted here, that we haven't accounted for yet: eight beats.
And again, these rules are pretty trivial. It's just a question of which integers add up to another integer. So if you wanted to, you could generate a whole bunch of these, manually or programmatically, and then you'd be ready for countless songs, just about any song. So we add that single rule saying what to do with eight beats, and we run the Banjoify function with it on "Happy Birthday." And this song is actually in a different key.
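Generating those rules programmatically amounts to enumerating compositions: every ordered way to write a duration as a sum of smaller groups. In this hypothetical sketch, the group sizes 2, 3, and 4 are an assumption based on the patterns used so far. It lists each split and formats it as a rule line in the same `left -> right` syntax.

```python
from typing import List, Sequence, Tuple


def splits(total: int, parts: Sequence[int] = (2, 3, 4)) -> List[Tuple[int, ...]]:
    """All ordered ways to write `total` as a sum of the given group sizes."""
    if total == 0:
        return [()]
    result: List[Tuple[int, ...]] = []
    for part in parts:
        if part <= total:
            result.extend((part,) + rest for rest in splits(total - part, parts))
    return result


def rhythm_rules(total: int) -> List[str]:
    """Format each split of `total` beats as a grammar rule line."""
    return [f"{total} -> " + " ".join(map(str, s)) for s in splits(total)]


for rule in rhythm_rules(8):
    print(rule)  # 8 -> 2 2 2 2, 8 -> 2 2 4, ... (eight rules in all)
```

Adding every split as an alternative would make the grammar ready for more durations, but it would also multiply the number of possible arrangements, so a hand-picked subset of idiomatic patterns may be preferable.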
And it's in a different time signature: "Happy Birthday" is in waltz time. That doesn't matter in this grammar at all; it still works. So here's what the output looks like, and here's what it sounds like, "Happy Birthday," banjoified.
[COMPUTERIZED BANJO PLAYING]
All right.
[APPLAUSE]
There's still something weird about this grammar, though. We've shown that we can play multiple different songs. But each time we run it on a given song, we're only going to get one output, the same thing every time. That's kind of weird, right? We want more possibilities, more variations. We're going to add just three rules to our grammar here. I'm sticking with "Happy Birthday" for a moment.
Revisiting our initial rule, what to do with two beats: instead of melody followed by string one, that drone, what if we just keep two beats as two beats? This is like an identity function. This quoted literal 2, again in ABC music notation, says: hold whatever came before it for two beats. So instead of an unbroken, relentless, steady stream of notes, we're adding some pauses to let it breathe a little and get more variation.
We're going to apply a similar concept to some of these groups of four. Banjo players call this first rhythm pattern a bump-a-ditty, because when you play it, it sounds like bump-a-ditty, bump-a-ditty, bump-a-ditty, bump-a-ditty. But if you take out one of those drone notes and make it a pause, holding the first note in the pattern for two beats, then bump-a-ditty becomes bum-ditty, because it sounds like bum-ditty, bum-ditty, bum-ditty.
And then we have one more variation on what to do with four beats, called boom-chick, boom-chick. Without getting into too much detail, square brackets in ABC music notation mean play those notes at the same time. So we're saying we'll have two beats of melody and then two beats where we're actually pinching string one and string five at the same time. And it sounds like boom-chick, boom-chick.
So now if we were to create a grammar object from this string of rules and to generate a phrase-- right now we're saying, OK, give me the next item. But we don't want to do that anymore. Instead, we want to be able to choose from a variety of options. Because now our grammar can generate multiple possibilities. So instead of saying, give me the next item from this generator, what if we said, OK, give me a list of all the items that this generator can generate. And we'll call that our options.
And now if we take a look at all the options here, we see we have three different possibilities for what you could do with 16 beats. And so now how we're going to choose our phrase is we're simply going to say make a random choice among the valid options that we have right now. So if I run that and grab a phrase, then now I'm getting one of the new options for 16 beats that does have that pause in there, that pinch at the end for a boom-chick pattern.
If I run it again, I may get another option. So let's just take that code there and copy and paste it into our Banjoify function. And that's all we have to do to make use of this variety of rules. Now if we again play "Happy Birthday" but with our updated set of rules, and if we run it, we get-- here's one variation. We run it again, we get another variation. Run it again, get another variation. Let's listen to this one.
[COMPUTERIZED BANJO PLAYING]
So it's slightly different, with some little pauses in there. It's interesting to me that now there are more possibilities, and I was curious how many more. So I decided to keep track of that. At the beginning, by definition, there's one possibility. Each time through the loop, we multiply that number by the length of the options list, meaning the number of options for that note.
Then at the end, I print out the number of possibilities as part of my output, formatted with comma separators so the number is easier to read. We run it, and we see that there are over 9 billion possibilities. Just by adding three rules to the grammar, we went from one possibility to over 9 billion. How is that possible?
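A dependency-free sketch of these last changes: list every option for each note, pick one with `random.choice`, and keep a running product of the option counts. The tiny expander is a stand-in for NLTK, and the exact spelling of the new rules is an assumption. In particular, the pause rules are written here as `melody '2' ...`, reading the quoted literal `2` as an ABC length that holds the melody note for two beats, and the pinch as the chord `'[DG]2'`. The sample notes are made up, and the function returns the count rather than printing it.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

Grammar = Dict[str, List[List[str]]]

RULES = """
16 -> 3 3 3 3 4
12 -> 3 3 3 3
3 -> melody string1 string5
4 -> melody string1 string5 string1 | melody '2' string5 string1 | melody '2' '[DG]2'
2 -> melody string1 | melody '2'
melody -> '{pitch}'
string1 -> 'D'
string5 -> 'G'
"""


def parse_rules(text: str) -> Grammar:
    """Parse "lhs -> rhs | rhs" lines into {lhs: [rhs, ...]}."""
    grammar: Grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        grammar.setdefault(lhs.strip(), []).extend(alt.split() for alt in rhs.split("|"))
    return grammar


def expand(grammar: Grammar, symbols: List[str]) -> Iterator[List[str]]:
    """Lazily yield each terminal sequence derivable from `symbols`."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first.startswith("'"):  # terminal: strip the quotes and keep it
        heads: Iterator[List[str]] = iter([[first.strip("'")]])
    else:  # non-terminal: expand each alternative in order
        heads = (h for rhs in grammar[first] for h in expand(grammar, rhs))
    for head in heads:
        for tail in expand(grammar, rest):
            yield head + tail


def banjoify(
    song: List[Tuple[str, int]], rules: str, rng: Optional[random.Random] = None
) -> Tuple[str, int]:
    """Pick a random valid phrase per note; also count all possible arrangements."""
    if rng is None:
        rng = random.Random()
    arrangement = []
    possibilities = 1  # by definition, one possibility before any notes
    for pitch, duration in song:
        grammar = parse_rules(rules.format(pitch=pitch))
        options = ["".join(p) for p in expand(grammar, [str(duration)])]
        possibilities *= len(options)  # choices multiply across notes
        arrangement.append(rng.choice(options))
    return " ".join(arrangement), possibilities


arrangement, possibilities = banjoify([("A", 16), ("B", 2)], RULES, random.Random(0))
print(arrangement)
print(f"{possibilities:,} possibilities")  # 6 possibilities
```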
I was thinking about it and looking at the rules, and each note really only has two or three options. But if each note has two or three options, approximated by 2.5, and we raise that to the power of the 25 melody notes in "Happy Birthday," then we're in the ballpark of 9 billion.
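The back-of-the-envelope arithmetic checks out. The exact count is the product of every note's option count; replacing each count with a single typical value of 2.5 is what turns it into an estimate.

```python
# Rough estimate from the talk: about 2.5 options per note (between 2 and 3),
# over the 25 melody notes of "Happy Birthday".
options_per_note = 2.5
melody_notes = 25
estimate = options_per_note ** melody_notes
print(f"{estimate:,.0f}")  # about 8.9 billion
```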
We can update the rules as well to get a different version of "Man of Constant Sorrow." Before, we had variation in the low-level rules. Now we're going to add some variation in the high-level rules: 16 could also be four groups of four, and 12 could be three groups of four. So we add just those two rules and banjoify "Man of Constant Sorrow," and there are over 13 billion possibilities. Here's one of them.
[COMPUTERIZED BANJO PLAYING]
So admittedly, these are not big variations. But that's not a bug. That's a feature. These are all following the song and following the style, every one of these 13 billion possibilities. This is not like giving a million monkeys a million typewriters. There's no gibberish. The rules are human-readable. They fit on the screen.
So that's how computers learn how to banjoify a song. But how do beginning banjo players learn this process? They don't. Not usually. We don't learn how to transform plain melodies into Bluegrass banjo style. Instead, we memorize banjo arrangements that other people have come up with. This was me for years. And not just me-- if you go to banjohangout.org/forum, which is a real site, there are over 200,000 posts, including ones like this.
Someone writes, I'm stuck on learning tunes by memorizing tabs, which is another notation. I really envy those of you who can play by ear. I'm imploring you at the Banjo Hangout to tell me your secrets. How do I develop this skill so I can stop coveting your talent? Signed, sincerely, Grizzly the Banjo-Picking Bear. This is real.
[LAUGHTER]
So that's how Grizzly felt, and that's how I felt, too. And so I took action. I actually went on a journey 15 years ago, from my home in Normal, Illinois, to study with a banjo guru in New Jersey, as you'd expect.
[LAUGHTER]
My guru was this guy. Not Steve Martin, the other guy, Tony Trischka. He was my banjo guru. I wanted to learn two things from him. I wanted to learn this process. How do you transform a plain melody into a banjo version so that I feel like I can speak this language fluently in my own words instead of just copying what other people have come up with?
And also I said, well, my comfort zone is really memorizing things. That's what I have gotten good at. And so I'd like to learn all the advanced techniques with that. And so in my time-- I was out there for one semester in college. And I really tried to steer as much as possible. I forgot my initial motivation. I said, I just want to learn all the fancy stuff. I don't want to spend time on the basic fundamentals. I mean, I'm here for a short time. I want to soak up all the advanced stuff from this banjo guru. But I did not have the fundamentals.
Then the very last lesson, Tony brought me back. He said, we've worked on all this advanced stuff, all these different techniques. How about we just jam on a simple song? Do you know "John Hardy"? It's an old folk song. And I said, well, I know how the tune goes, but I've never played it on banjo before.
And Tony said, well, that's OK. Just make it up as you go. But I did not know how to do that, and I didn't want to admit it. So we started playing. But all of a sudden my fingers were not cooperating with what I was telling them to do, and I didn't really know what to tell them to do. And I'm just thinking about all these billions of possibilities, and they're coming by so fast. How do I choose what the next note is going to be? And I couldn't keep up. And I felt dizzy.
And all of a sudden, my eyes started getting watery. And I said, Tony, I'm sorry. I got to stop. My seasonal allergies are really acting up here, the high pollen count. It was December. There was no pollen. But I excused myself. I was at his house. That's where I did the lessons.
And I went into his bathroom, shut the door, looked in his mirror, and said, why do you not get this? Are you just not smart enough, Ryan, to figure this out? Maybe understanding how this really works is just not for you. Maybe you should just copy what other people have done. Maybe you can't do it.
We've all been there, right? Crying in your banjo guru's bathroom. Pretty universal human experience, rite of passage. I'm probably the only one in this hall with that particular experience. But we may have all had the experience of some skill that you just thought I want to do this so bad, like Ryan and Grizzly the Banjo-Picking Bear wanted to learn banjo. Maybe for you, it's speaking a certain natural language or programming language with greater fluency. But you say, is this just not for me?
Well, when I looked into the mirror, I said no, I can learn this. I've just got to put in the effort and go back to basics. What I really need to understand is how the pieces of this language fit together. And so I worked on a banjo grammar, not with code at first, but just writing a Banjo Hangout forum post as I started to gain some understanding of this and wanted to help others.
So here, over 10 years ago, I have this lesson on making your own arrangement of a melody. And honestly, I swear to you, I forgot how much this resembled a formal grammar. I mean, look at this. It's a different notation, but it has before-and-after examples: you take this plain melody, and then you transform it in certain ways. In these screenshots from my forum posts, there are some of the same rules we went through before, like if you have two beats of melody, you can put the first string in between as a drone.
I have the four-beat pattern in there, the bump-a-ditty that we learned about, an eight-beat pattern for this forward roll. So I was so excited. I mean, this took me a couple of years. I went out to the banjo teacher 15 years ago. This was 10 years ago. So it took some time. And I started with simple songs. This was "Hot Cross Buns." I mean, that's kind of humiliating to work on. But it's like, I want to understand these fundamentals so bad that I'm willing to humble myself and play "Hot Cross Buns" until I figure this out and then work up from there.
And if you remember Grizzly before, he said, how do I learn this skill? Well, actually, somebody replied to him and said, Ryan Herr put together an excellent tutorial on how to develop your own arrangements. If you follow it, you will understand. And this was so exciting to me because it's like I was there, Grizzly. And it took a couple of years, but I went from asking that question and wondering if it was hopeless to being able to answer that question.
And I did it with a grammar, a set of rules for the structure of the language, so that I could speak this language fluently in my own words. A grammar can be text, like in my forum post, or it can be Python code. Either way, you can use it to play banjo songs. You can learn it and make these transformations, including on the song that I could not jam on with Tony Trischka, "John Hardy." And here's what it sounds like.
[COMPUTERIZED BANJO PLAYING]
Thank you.
[APPLAUSE]