Seth Rait

Student, Coder, Sailor, Musician

This is part 2 of a series of music theory posts. You can find part 1 here.

The Science

Have you ever been to a really loud concert or stood very near to a speaker and felt the ground shake a bit? That’s because sound is made up of vibrations. These vibrations travel through the air, displacing particles in their path and eventually finding their way to the little hairs inside your ears. Then they start vibrating those too. If this vibration happens in a regular pattern, then it sounds like a single pitch or tone. That is, there are no parts of the sound which are higher or lower than any other. When this happens, the vibrations take the form of a regular, uniform wave.

In this wave, the distance between a crest and the midpoint of the wave is called the amplitude, and this determines how loud the sound is (measured in decibels). A good way to remember this is that the word “amplitude” shares its root with “amplify”, which means “to make louder”. Musicians don’t like decibels, so we describe loudness with dynamics. A sound can be piano (soft) or forte (loud). It can also be somewhere in between, like mezzo-piano or mezzo-forte, or a very bombastic fortissimo.
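To make the link between amplitude and decibels concrete, here is a minimal sketch. It assumes the usual convention that a ratio of two amplitudes is converted to decibels as 20 times the base-10 logarithm of the ratio; that formula and the function name `decibel_change` are not from this post, just an illustration.

```python
import math


def decibel_change(amplitude, reference_amplitude):
    """Return the level difference in decibels between two wave amplitudes.

    Assumes the common 20 * log10(ratio) convention for amplitude ratios.
    """
    return 20 * math.log10(amplitude / reference_amplitude)


# Doubling the amplitude raises the level by roughly 6 dB.
print(round(decibel_change(2.0, 1.0), 2))  # 6.02
```

Note that decibels compare two amplitudes, so the result is always relative to some reference.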

Back to the wave. The distance between two similar points (like two crests or two troughs) is called the wavelength. Since all sound waves in the same medium travel at the same speed (the speed of sound), waves with shorter wavelengths pass by your ear more often than waves with longer wavelengths. The rate at which waves pass by a point is called the wave’s frequency, and it is measured in hertz. Waves with a higher frequency (and thus a shorter wavelength) sound higher than waves with a lower frequency. Musicians call this frequency pitch. Here’s an example pitch, at 440Hz:

(Coincidentally, this is the same pitch heard right before most orchestras begin their first piece; it is the pitch to which the instruments tune themselves.) Just as musicians don’t like using decibels for volume, we don’t like to use hertz for pitch. Too many numbers. Instead, we give letter names to certain important pitches.
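The relationship between frequency and wavelength described earlier can be sketched in a few lines. The speed of 343 m/s is an assumption (roughly the speed of sound in dry air at room temperature), and `wavelength_m` is just an illustrative name.

```python
# Assumed value: approximate speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_PER_S = 343.0


def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND_M_PER_S):
    """Wavelength in meters: the distance the wave travels during one cycle."""
    return speed / frequency_hz


print(round(wavelength_m(440.0), 2))  # 0.78
print(round(wavelength_m(880.0), 2))  # 0.39 (higher frequency, shorter wave)
```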

The Notation

Now that we understand how sound is produced, we can start to figure out how to turn sounds into music. That requires a little bit of knowledge about how musicians organize and classify series of pitches.

The musical alphabet

The pitch you heard above (440Hz) is called A and if you move from 440Hz to 493.88Hz, that new pitch is called B. I know there are a lot of numbers between 440 and 493.88, and zero letters between A and B, but we’ll get to that later. In fact, these two pitches are not even the only A and B. The musical alphabet, for historical reasons I’m sure I will mention sometime (I do love minutiae), runs from A through G, then circles back around to A. The distance between pitches of the same name is called an octave. Pitches an octave apart sound very similar. For example, here is the first A we played, followed by the A one octave deeper (A 220Hz), then both played together:

When we are talking about these notes in the abstract (that is, devoid of accompanying notation), we sometimes append numbers to the pitch names to disambiguate octaves. The first A (A 440Hz) is called A4. The note above it is called B4 and the note above that is called C5, because octave numbers increase at C rather than at A. One octave below this is C4 and one octave above is C6.
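The two A pitches mentioned so far (440Hz and 220Hz) suggest the pattern: each octave up doubles the frequency and each octave down halves it. A small sketch, where `a_frequency` is a made-up helper name:

```python
A4_HZ = 440.0  # the tuning A from earlier in the post


def a_frequency(octave):
    """Frequency of the note A in the given octave (A4 = 440 Hz)."""
    return A4_HZ * 2 ** (octave - 4)


for octave in (3, 4, 5):
    print(f"A{octave}: {a_frequency(octave)} Hz")
# A3: 220.0 Hz
# A4: 440.0 Hz
# A5: 880.0 Hz
```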

The distance between consecutive notes is a little trickier. The distance between the notes A and B is called a whole step, which is twice as big as the distance between B and C, which was creatively dubbed a half step. A half step is the smallest distance between two notes in Western music. Inside every whole step, like A-B, there is a note one half step away from each end. Unfortunately, there is no letter halfway between A and B, so we have to make one up. To do this, we employ the concepts of sharps and flats. To sharpen a note is to raise it by one half step and to flatten a note is to lower it by one half step. So the note halfway between A and B can be called either A sharp (A#), a half step above A, or B flat (B♭), a half step below B. When one pitch can be spelled two different ways (like A# and B♭), we call the two spellings enharmonic.
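One way to check whether two spellings are enharmonic is to count half steps. The sketch below assumes the usual layout of the natural notes (whole steps everywhere except E-F and B-C), counted upward from C; the names `pitch_class` and `are_enharmonic` are illustrative only.

```python
# Half steps above C for each natural note (assumed standard layout).
NATURAL_HALF_STEPS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(name):
    """Number of half steps above C (0-11) for a name like 'A#' or 'B♭'."""
    value = NATURAL_HALF_STEPS[name[0]]
    for accidental in name[1:]:
        if accidental == "#":
            value += 1  # a sharp raises the note by one half step
        elif accidental in ("b", "♭"):
            value -= 1  # a flat lowers the note by one half step
        else:
            raise ValueError(f"unknown accidental: {accidental!r}")
    return value % 12


def are_enharmonic(first, second):
    """True when two spellings land on the same pitch."""
    return pitch_class(first) == pitch_class(second)


print(are_enharmonic("A#", "B♭"))  # True
print(are_enharmonic("A", "B"))    # False
```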

The musical alphabet can be spelled many ways, but one common way is as follows: A, B♭, B, C, C#, D, E♭, E, F, F#, G, G#, then back to A (one octave higher than where we started). It sounds like this:

When we line up notes in a row as above, it’s called a scale. There are many ways to create fun and interesting scales. This one in particular is called the chromatic scale or half-step scale. It includes all 12 notes (within one octave) available to us. We’ll talk more about scales later.
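For the curious, the frequencies of the chromatic scale can be sketched numerically. This assumes twelve-tone equal temperament, where every half step multiplies the frequency by the twelfth root of 2. The post does not name a tuning system, but this assumption reproduces the 493.88Hz B mentioned earlier.

```python
A4_HZ = 440.0
CHROMATIC_FROM_A = ["A", "B♭", "B", "C", "C#", "D",
                    "E♭", "E", "F", "F#", "G", "G#"]


def chromatic_frequencies(start_hz=A4_HZ):
    """Return (name, frequency) pairs for one octave, starting on A.

    Assumes equal temperament: each half step scales frequency by 2**(1/12).
    """
    half_step_ratio = 2 ** (1 / 12)
    return [(name, start_hz * half_step_ratio ** steps)
            for steps, name in enumerate(CHROMATIC_FROM_A)]


for name, hz in chromatic_frequencies():
    print(f"{name:<3}{hz:8.2f} Hz")
```

Twelve half steps multiply the frequency by exactly 2, which is why the scale lands back on A one octave higher.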

Measuring Time

The last major pieces of knowledge we need before actually learning how to visualize music are the concepts of duration, rhythm, and tempo. These ideas are linked very closely, but differ in important ways. Duration is the amount of time one pitch is played. Rhythm is the relationship of each pitch’s duration to the durations of the pitches surrounding it. While durations can be measured in absolute quantities (a pitch can be played for a duration of 5 seconds, for example), rhythm can only be expressed in terms of the duration of other pitches. This means that while individual pitches can have duration, they cannot have rhythm; only groupings of pitches can have rhythm, such as this simple, yet very common rhythm:

This rhythm is one long pitch, followed by two pitches which are exactly half the duration of the first. In Western music, most rhythms are made up of relationships of ½ and ⅓. Note that if the previous rhythm were played faster or slower, the relationship of the duration of the pitches would stay the same. The speed at which a given section of music is played is called tempo. You can use the slider below to adjust the tempo of the audio clip from before.

Tempo and rhythm work together to drive the overall speed of a piece of music, but they are distinct from each other and are notated differently.
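The difference between rhythm and tempo can be sketched with a little arithmetic. Here the rhythm is stored as relative lengths, and the tempo is assumed to be given in beats per minute (a common unit, though not one this post has introduced). The function name is hypothetical.

```python
def durations_in_seconds(relative_beats, tempo_bpm):
    """Turn relative note lengths (in beats) into seconds at a given tempo."""
    seconds_per_beat = 60.0 / tempo_bpm
    return [beats * seconds_per_beat for beats in relative_beats]


# One long pitch followed by two pitches of exactly half its duration.
RHYTHM = [2, 1, 1]

print(durations_in_seconds(RHYTHM, 60))   # [2.0, 1.0, 1.0]
print(durations_in_seconds(RHYTHM, 120))  # [1.0, 0.5, 0.5]
```

Doubling the tempo halves every duration, but the 2:1:1 relationship, the rhythm, is untouched.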

Music is usually made up of many different pitches, which, as above, we can describe by frequency, dynamic and duration. But no one wants to read music as a bunch of archaic formulae so we use a different language altogether (which itself is rather archaic), called musical notation. As I promised in the last post, you won’t need to know too much about this notation in order to understand most concepts of music theory, but it will sure make them easier, so here’s a short introduction.

Putting it on the Staff

As in most Western languages, we read music from left to right. The two foundational notations in this language are the staff and the note. A staff is drawn as 5 equidistant horizontal lines running from the left side of the page to the right side. Each line, and each space in between the lines (as well as the spaces above and below), represents a different pitch. A note is the durational representation of a sound. That is, without a staff, a note cannot tell us pitch, but only duration relative to other notes. When we place a note on a staff, we have both pitch and duration – kind of. As it turns out, just these two symbols aren’t quite enough to express a very large set of music.

There are two main problems we encounter with our notational system thus far. The first is: with 5 lines and 6 spaces (4 between the lines, one below the bottom, and one above the top), we can only represent 11 pitches! Surely, out of the infinite number of pitches producible, we should be able to write down more than 11 of them. Musicians solve this problem in a few important ways, one of which is the concept of the clef. A musical clef is an insignia placed at the start of a piece of music to denote a mapping between specific pitches and locations on the staff. That is, the clef tells readers of the music which lines and spaces correspond to which specific pitches. For example, in bass clef (named for the low range of pitches, not the fish), the fourth line from the bottom represents the pitch named F, whereas in treble clef, that line represents D. Putting this all together, we can actually start reading real music! Here’s an example from a very famous piece of music. The excerpt is the “Ode to Joy” theme from the final movement of Beethoven’s 9th Symphony (this theme is also tattooed on my arm).
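The idea of a clef as a mapping can be sketched in code. The sketch below lists the pitch at each of the 11 staff positions for two clefs. The bottom-line pitches (G2 for bass clef, E4 for treble clef) are the standard conventions, filled in here as assumptions since the post has not listed them, and `staff_pitches` is an illustrative name.

```python
LETTERS = "CDEFGAB"  # octave numbers increase at C

# Assumed conventional pitch of the bottom staff line for each clef.
BOTTOM_LINE = {"bass": ("G", 2), "treble": ("E", 4)}


def staff_pitches(clef):
    """Return the 11 pitch names on a staff, lowest first.

    Index 0 is the space below the bottom line, so odd indices are lines
    and even indices are spaces.
    """
    letter, octave = BOTTOM_LINE[clef]
    bottom = octave * 7 + LETTERS.index(letter)  # letter steps counted from C0
    return [f"{LETTERS[step % 7]}{step // 7}"
            for step in range(bottom - 1, bottom + 10)]


for clef in ("bass", "treble"):
    lines = staff_pitches(clef)[1::2]  # the five lines, bottom to top
    print(clef, lines)
```

Comparing the two printed lists shows the same line standing for F in one clef and D in the other.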

Reading Music

As when reading English, music is read left to right. The first thing we can see on the staff is the bass clef. That clef tells us a few things about the music on the staff. It tells us that the fourth line from the bottom (the line where the ball on the clef is) is the note F. Secondly, it tells us the octave. Traditionally, that specific F is F3. Moving slightly to the right, we see two sharp signs (#) centered on the line for F and the space for C. This means that, unless otherwise shown in the music, every time we encounter an F or a C, those are to be played as F# and C#. This is called a “key signature” and we will discuss the importance and use of keys more in the next post.

We’ll skip over the little “c” for now and start with the notes. The first note you see is called a half note. This signifies that the duration of the note (which is an F#) is to be exactly twice the duration of the note which follows it, called a quarter note (which is a G). That is, half notes are exactly twice as long as quarter notes. Visually, the only difference is that the heads of quarter notes are filled in, while those of half notes are not. After the first three notes (F#, G, A) there is a vertical line dividing the staff into measures. A measure is a specific segment of time corresponding to a set number of beats. If you count the number of quarter note equivalents in each measure, you will see that there are always four: either four quarter notes, or two quarter notes and a half note (which counts for two quarter notes, since it is twice as long).

Some pieces have two, three, or six beats per measure, but each measure will always have the same number of beats (for now). To determine how many beats are in a measure, and which note counts as the beat (in this case it’s the quarter note, but it could just as easily be the half note), we look at the time signature. A time signature tells us two things. First, it tells us how many beats are in a measure. Second, it tells us which note “gets the beat”. The most common time signature is 4/4 (pronounced “four four”). The first “four” tells us that there will be four beats in a measure. The second “four” tells us that quarter notes get the beat, so there will be four quarter notes in the measure (or their durational equivalent). So much music is written in 4/4 that it is often referred to as “common time”, which is written as a “c” right after the key signature on a staff.
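Counting beats in a measure is simple enough to sketch in code. Each note value is stored as a fraction of a whole note, and a measure is checked against the time signature. The names `NOTE_VALUES` and `fills_measure` are made up for this example.

```python
from fractions import Fraction

# Each note value as a fraction of a whole note.
NOTE_VALUES = {
    "whole": Fraction(1),
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
    "eighth": Fraction(1, 8),
    "sixteenth": Fraction(1, 16),
}


def fills_measure(notes, beats_per_measure, beat_note):
    """Check that the notes add up to exactly one measure.

    beat_note is the second number of the time signature (4 = quarter note).
    """
    measure_length = Fraction(beats_per_measure, beat_note)
    return sum(NOTE_VALUES[note] for note in notes) == measure_length


print(fills_measure(["half", "quarter", "quarter"], 4, 4))  # True
print(fills_measure(["quarter"] * 3, 4, 4))                 # False
```

Using exact fractions rather than decimals avoids rounding trouble when short note values are added together.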

If the time signature were instead 3/2, then that would imply three beats per measure, with half notes defining each beat (so six quarter notes). Below is a diagram showing the relative duration of different notes. At each level of the diagram, the same amount of time is given. So sixteen sixteenth notes take the same amount of time as one whole note, or two half notes, or four eighth notes and two quarter notes, etc.
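The same counting works for any time signature. As a rough sketch (with `notes_per_measure` as a hypothetical helper), this computes how many of each note value fit into one measure:

```python
from fractions import Fraction

# How many of each note value fit into one whole note.
DIVISIONS = [("whole", 1), ("half", 2), ("quarter", 4),
             ("eighth", 8), ("sixteenth", 16)]


def notes_per_measure(beats_per_measure, beat_note):
    """Map each note value to how many of them fill one measure."""
    measure_length = Fraction(beats_per_measure, beat_note)  # in whole notes
    return {name: measure_length * count for name, count in DIVISIONS}


for name, count in notes_per_measure(3, 2).items():
    print(f"{name}: {count}")
# whole: 3/2
# half: 3
# quarter: 6
# eighth: 12
# sixteenth: 24
```

Calling `notes_per_measure(4, 4)` instead gives 1, 2, 4, 8, and 16, the same doubling pattern as the levels of the diagram.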

By now you should have all the tools necessary to start reading music. In the next post I’ll explain how to understand what’s written a little better. We’ll also talk about why certain music sounds good, and how to make music that sounds good.

Music Theory II: Sound - Seth Rait