
Adding Structure, Clarity & Timing Through Music | Music In Podcasting #2



So you’ve decided that it’s appropriate to add music to your podcast, and you’ve thought about the genre you want, and the mood and tone you want to set. It’s time then to look at the specifics of how to add music to your mix, and the ways music can be used to add structure to your show.


Ducking is the technique of reducing the volume of a music track so it doesn’t overpower the voice, and it’s the first skill you need to learn if you want to add music to your show. Most podcasters simply play it by ear, lowering the music until the voice is clear but the music is still audible. Depending on the music, this could mean a reduction of anywhere from 5 to 30 dB.
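Decibels are a logarithmic scale, so the 5 to 30 dB range covers a much bigger swing in loudness than the numbers suggest. Here is a minimal Python sketch that converts a dB reduction into the amplitude multiplier an editor applies. It assumes audio is held as a plain list of float samples between -1.0 and 1.0, and the function names are hypothetical, not any editor’s API:

```python
# Hypothetical helper names; a sketch assuming audio as a list of float samples.


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def duck(samples: list[float], reduction_db: float) -> list[float]:
    """Lower every sample by `reduction_db` decibels (the sign is ignored)."""
    gain = db_to_gain(-abs(reduction_db))
    return [s * gain for s in samples]


# How much of the original amplitude survives at different reductions.
for db in (5, 10, 20, 30):
    print(f"-{db} dB keeps {db_to_gain(-db):.3f} of the amplitude")
# -5 dB keeps 0.562 of the amplitude
# -10 dB keeps 0.316 of the amplitude
# -20 dB keeps 0.100 of the amplitude
# -30 dB keeps 0.032 of the amplitude
```

A gentle 5 dB duck still leaves more than half the amplitude. A 30 dB duck leaves about 3 percent. That is why the right amount depends so heavily on the track.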

In some programs you might use a volume envelope to pull the level up and down. In Audacity there is also a dedicated Auto Duck effect that does the same job automatically. The general technique is to pull the volume down only where the music overlaps with the voice, letting the music play at full volume before, and possibly after, for a short time. (Fades are also often used, and below I’ll talk about reducing the volume in steps.)
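An envelope and an automatic ducker both come down to the same idea. You watch the voice track and ride the music gain down while someone is talking, with short fades either side. The sketch below is a simplified Python illustration of that idea, not Audacity’s actual Auto Duck algorithm. It assumes mono float samples at a shared sample rate. Every name and default (a 12 dB duck, a -30 dB voice threshold, 250 ms fades) is a hypothetical choice:

```python
import math

# Hypothetical names and defaults; a simplified illustration, not Audacity's Auto Duck.


def db_to_gain(db: float) -> float:
    return 10 ** (db / 20)


def auto_duck(music: list[float], voice: list[float], sample_rate: int,
              duck_db: float = -12.0, threshold_db: float = -30.0,
              block_ms: int = 10, fade_ms: int = 250) -> list[float]:
    """Return a copy of `music` whose gain dips wherever `voice` is active."""
    block = max(1, sample_rate * block_ms // 1000)
    ducked_gain = db_to_gain(duck_db)
    threshold = db_to_gain(threshold_db)
    # Largest gain change per sample, so a full duck (or recovery) takes fade_ms.
    step = (1.0 - ducked_gain) / max(1, sample_rate * fade_ms // 1000)

    out = []
    gain = 1.0
    for start in range(0, len(music), block):
        # Measure how loud the voice is in this short block (RMS level).
        chunk = voice[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0
        target = ducked_gain if rms > threshold else 1.0
        for s in music[start:start + block]:
            # Move the gain toward the target by at most `step`: a linear fade.
            if gain > target:
                gain = max(target, gain - step)
            elif gain < target:
                gain = min(target, gain + step)
            out.append(s * gain)
    return out


sr = 1000                                      # tiny sample rate keeps the demo fast
music = [1.0] * (3 * sr)
voice = [0.0] * sr + [0.5] * sr + [0.0] * sr   # one second of "speech" in the middle
ducked = auto_duck(music, voice, sr)
```

This version only reacts once it detects the voice, so the dip finishes a quarter of a second into the speech. Starting the fade before the first word would need a look-ahead on the voice track.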

The potential problem with this technique is that with a lower volume you risk losing all the energy and emotions that the music generates (which is probably what made you want to use it in the first place).

Frequency Ducking

Another way to duck music is to lower the gain only on the frequencies that overlap with the voice. This is not at all complicated.

In a multitrack Audacity session (one with both voice and music tracks), select the music, then click on Equalizer and choose the Graphic option. Now pull down the frequencies between, say, 250 Hz and 2500 Hz, reducing them by 3 to 6 dB.

Now, go back and listen to your session with both the music and voice playing. You’ll probably be amazed how much clearer both the voice and the music sound. This is because most of the information and clarity in a human voice is expressed in the 200-4000Hz range. So if the music is not competing with this in your mix, the voice will ‘pop out’, so to speak.
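Here is a minimal Python sketch showing what pulling down 250 Hz to 2500 Hz by a few dB actually does. It swaps the Graphic EQ’s row of sliders for a single peaking (bell) filter built from the RBJ Audio EQ Cookbook formulas, centred on the geometric middle of that band. The -4.5 dB cut (the middle of the 3 to 6 dB range) and the function names are assumptions for illustration:

```python
import cmath
import math

# Hypothetical helper names; a single RBJ-cookbook peaking filter stands in
# for the multi-slider Graphic EQ described in the text.


def peaking_eq(sample_rate: float, center_hz: float, gain_db: float,
               bandwidth_octaves: float) -> tuple[list[float], list[float]]:
    """Biquad coefficients (b, a) for a peaking EQ, normalised so that a[0] == 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) * math.sinh(
        math.log(2) / 2 * bandwidth_octaves * w0 / math.sin(w0))
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def response_db(b, a, freq_hz, sample_rate):
    """Filter gain in dB at one frequency, evaluated on the unit circle."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / sample_rate)   # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def apply_biquad(samples, b, a):
    """Run samples through the filter (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


SR = 48_000
low, high = 250.0, 2500.0
center = math.sqrt(low * high)        # geometric centre of the band, about 791 Hz
octaves = math.log2(high / low)       # band width, about 3.3 octaves
b, a = peaking_eq(SR, center, -4.5, octaves)
print(round(response_db(b, a, center, SR), 2))   # -4.5
print(round(response_db(b, a, 20.0, SR), 2))     # close to 0: deep bass barely touched
```

The cut is deepest in the middle of the speech band and fades out towards the bass and treble. So the music keeps its low end and sparkle while clearing space for the voice.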

As with all ducking, play this by ear. You’ll probably need to lower the overall volume as well, though usually less than if you didn’t duck specific frequencies.

Tip: Always get feedback from others after ducking. We usually tend to play music tracks too loud (duck too little) as we are trying to ensure they have the emotional effects we want.

Tip: You can duck some frequencies more than others, for example to make a heavy drum sound less obtrusive. That said, some music starts to sound a little ‘off’ if you play around with the frequencies too much.

Cuing music

So when do you actually start playing music within your podcast? Before someone speaks, at the same moment they begin to speak, or at some point after?

Cuing before someone speaks

When you cue music to play before someone begins speaking, you need to consider how long (how many seconds) the music should play before the voice comes in. There are two things to consider: what emotional or tonal effects are you trying to achieve, and how long is the music going to play in its entirety?

According to a musician I know who writes film scores, a good rule of thumb is that if the music is going to play for several minutes (perhaps accompanying a long, introspective section of the podcast) then it can play longer before the voices come in. However, if the section is going to be short, then a few notes may do. Again, play it by ear. A shorter playing time can also be effective if the music is energetic and is meant to enhance the energy of the speaker’s delivery.

There are endless examples out there, but here’s one that demonstrates the importance of letting the music play long enough to spark some emotions before someone speaks. The music has a warbly, futuristic mood and if it wasn’t played long enough, you simply wouldn’t catch that.

Cuing after someone begins speaking

Take this example from one of my own shows.

A guest mentions that he first went to China in 1989, and then the music – a pulsating techno piece with an Asian feel – comes in. Because the music arrives later, listeners get a jolt: they were likely expecting some stereotypical Chinese music (such as a lute or zither). Instead, the music propels them into modern times, which is appropriate because the guest is going to talk about how China was modernizing. Had the music played before the speaker began to talk, the effect would clearly have been muted.

Cuing the music to come in after someone begins talking often gives it more presence. For another example, listen to the beginning of this episode of Criminal. Notice how nicely the music sets a clear tone and mood by coming in after you have already been introduced to the subject matter.

The Shell Game

Cuing and the fade in/out

Below I’ll talk about fade ins/outs when they are used for scene transitions, but here it’s important to also be aware that music, and sound effects, often fade in slowly, sometimes playing a second or two in the background before being played at a normal volume.
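A fade that starts ‘in the background’ is just a gain ramp that begins well below full volume. One minimal way to sketch this in Python, assuming a list of float samples and hypothetical function names, is to ramp linearly in decibels from a quiet floor (here -40 dB) up to full level. Such a ramp often sounds more even to the ear than a straight amplitude ramp. The fade-out is the same curve reversed:

```python
# Hypothetical helper names; a sketch of dB-linear fades on a list of float samples.


def fade_in(samples: list[float], sample_rate: int,
            seconds: float = 2.0, floor_db: float = -40.0) -> list[float]:
    """Ramp the first `seconds` of audio from `floor_db` up to 0 dB, linearly in dB."""
    n = min(len(samples), int(sample_rate * seconds))
    out = list(samples)
    for i in range(n):
        db = floor_db * (1 - i / n)        # floor_db at i=0, approaching 0 dB at i=n
        out[i] = samples[i] * 10 ** (db / 20)
    return out


def fade_out(samples: list[float], sample_rate: int,
             seconds: float = 2.0, floor_db: float = -40.0) -> list[float]:
    """Mirror image of fade_in: the last `seconds` sink towards `floor_db`."""
    return fade_in(samples[::-1], sample_rate, seconds, floor_db)[::-1]


clip = [1.0] * 100                    # stand-in audio at a toy 10 Hz sample rate
faded = fade_out(fade_in(clip, 10), 10)
```

The curve ends at the floor rather than at true silence. At -40 dB the remainder is usually buried in the mix, but you can follow it with a hard cut if needed.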

This podcast on sound trademarks has so many short clips of news, sound, music, and effects that have to be woven into the fabric of the story that it’s a great example to study to see how to bring music and other sounds in smoothly. It should be clear that you can’t always just end a piece of narration suddenly, play a music or sound clip, end that clip, and move on to more narration without the whole piece sounding choppy and unpleasant. Usually the new clip (depending on its length) is introduced just a second or two before the end of the narration, but at a very low volume. The clip also often lingers a few more seconds as a background sound once the narration resumes.
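The overlap described above can be written down as a small timeline. The Python sketch below works out three things: where a clip should start relative to the end of the narration, when the next narration can come in, and a gain envelope that keeps the clip low while either voice is talking. The function names and all defaults (a 1.5 s lead-in, 3 s linger, 0.5 s ramps and a -24 dB background level) are hypothetical, not figures from the example:

```python
# Hypothetical names and defaults; times are in seconds on the episode timeline.


def interp_envelope(points: list[tuple[float, float]], t: float) -> float:
    """Piecewise-linear gain at time t from (time, gain) keyframes sorted by time."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        if t <= t1:
            if t1 == t0:
                return g1
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def plan_clip(narration_end: float, clip_length: float, lead: float = 1.5,
              linger: float = 3.0, ramp: float = 0.5, low_db: float = -24.0):
    """Return (clip_start, next_narration_start, gain_envelope) for one clip."""
    if clip_length < lead + linger + 2 * ramp:
        raise ValueError("clip too short for this lead-in, linger and ramps")
    low = 10 ** (low_db / 20)
    clip_start = narration_end - lead          # sneak in under the last words
    clip_end = clip_start + clip_length
    next_narration = clip_end - linger         # clip tail lingers under the next voice
    envelope = [
        (clip_start, low),                     # quiet while narration finishes
        (narration_end, low),
        (narration_end + ramp, 1.0),           # full volume on its own
        (next_narration - ramp, 1.0),
        (next_narration, low),                 # back to background for the linger
        (clip_end, low),
    ]
    return clip_start, next_narration, envelope


start, resume, env = plan_clip(narration_end=60.0, clip_length=10.0)
print(start, resume)   # 58.5 65.5
```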

NBC Chimes on 99% Invisible

What note to come in on

The last thing to talk about here is when exactly to play someone’s first words when the music is already running.

As a general rule, don’t do this on a clear high note, or a strong beat, as the words may be muffled (even if you have frequency ducked already). Try to let at least a full ‘sentence’ or phrase play first, and then drop the sound at a point that doesn’t overwhelm the voice, but in fact sounds like the sort of place the words should come in. Think of when a singer begins to sing during a song. It’s not random.
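If you know a track’s tempo, you can estimate how long one full phrase lasts before deciding where the first words go. This Python sketch assumes 4/4 time and four-bar phrases. Both are common in production music but by no means universal, and the function name is hypothetical. It only tells you where a phrase ends. Whether the voice sits on that point or just off the beat is still a call for your ears:

```python
# Hypothetical helper; assumes 4/4 time and four-bar phrases unless told otherwise.


def phrase_seconds(bpm: float, beats_per_bar: int = 4, bars_per_phrase: int = 4) -> float:
    """Length of one musical phrase in seconds at a given tempo."""
    seconds_per_beat = 60.0 / bpm
    return seconds_per_beat * beats_per_bar * bars_per_phrase


for bpm in (70, 90, 120):
    print(f"{bpm} BPM: one phrase is about {phrase_seconds(bpm):.1f} s")
# 70 BPM: one phrase is about 13.7 s
# 90 BPM: one phrase is about 10.7 s
# 120 BPM: one phrase is about 8.0 s
```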

For a podcast, you often need to duck the music just a second before the voice comes in, and continue to reduce the volume in steps to make the change sound smooth.
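You can express the stepped approach as a list of (time, level) keyframes, much like the points you would place on a volume envelope. In this hypothetical Python sketch the music starts dropping one second before the voice. It keeps stepping down every half second until it reaches the full duck. The 16 dB total and four steps are placeholder values to adjust by ear:

```python
# Hypothetical names; keyframes are (time_seconds, level_db) pairs.


def stepped_duck(voice_start: float, total_db: float = -16.0, steps: int = 4,
                 lead: float = 1.0, interval: float = 0.5) -> list[tuple[float, float]]:
    """Keyframes that lower the music in equal steps, starting `lead` s before the voice."""
    first = voice_start - lead
    return [(first + i * interval, total_db * (i + 1) / steps) for i in range(steps)]


def gain_at(schedule: list[tuple[float, float]], t: float) -> float:
    """Linear gain in effect at time t (0 dB before the first step)."""
    db = 0.0
    for when, level in schedule:
        if t >= when:
            db = level
    return 10 ** (db / 20)


schedule = stepped_duck(voice_start=10.0)
print(schedule)  # [(9.0, -4.0), (9.5, -8.0), (10.0, -12.0), (10.5, -16.0)]
```

In a real edit each step would itself be a short ramp rather than an instant jump, since an abrupt gain change can click.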

Transitions/Using Music to Add Structure

When planning a podcast episode, you should break it down into distinct sections (at the very least a beginning, middle and end). Just as a visual fade in/out marks a change of scene in a TV show or movie, music can signal that you are moving on to something new. Let’s look at various ways to signal these transitions.

Fade out/fade in

The simplest method is to literally fade out the music from one section (usually letting it play for at least a few seconds after the speaker stops talking), and then fade in a new song.

Speak, then fade in

Another method is to fade out the first song, introduce the next section by speaking, and then cue the music when you or another person begins to talk. To understand why this works, think of an essay. Bringing the music in at the second paragraph, where you flesh out the introduction, creates more sense of forward movement than bringing it in at the very beginning, before the listener knows where you are going. This is similar to what I described above about cuing after someone begins speaking.

Repeating music within a single episode

Sometimes you can repeat music at the beginning and end to give a sense of connectedness and closure. Music can also be reintroduced to link segments thematically, even if they are separated by time. Think of how effective the theme music to Serial is: it’s used to introduce the show, end the show, but also used at times as the soundtrack during a summary, or when the host is talking about the next episode. This simple tactic helps to unify not just each episode, but all the episodes that you have listened to by reminding you where you’ve been and where you might be going.

No musical transition

If you don’t plan on using a full song in a new segment, you can use a clip with a transitional feel (such as a trumpet blast or an electric guitar riff). Some music websites have a ‘Transition’ category, so these are easy to find. Ambient sounds can also work here, such as rain, wind, office noise or traffic, depending on what you are trying to set up for the next scene.

Timing Content to Match Changes in Musical Structure

In part one we looked at how, in the movie The King’s Speech, the music was timed, rising and falling and pausing with the cadences of the speech itself. Most of us can’t afford an original score for every podcast, but you can take advantage of the natural changes in a song’s rhythm or melody to make it seem like it was written for your words.

One of the easiest ways to do this is simply to introduce pauses (or gaps) in your own, or a guest’s, talk so that the words you want to match with a particular section of the music are said at the right moment. Note that you may need to introduce a white or grey noise track to your mix to ensure the new gaps don’t introduce ‘dead air’.
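Inserting a pause means splicing extra samples into the track. The noise is there so that those samples are not perfect digital silence. A minimal Python sketch, assuming a list of float samples and a hypothetical function name, fills the gap with low-level white noise. Grey noise, which is shaped so that it sounds equally loud across frequencies, is not modelled here:

```python
import random
from typing import Optional

# Hypothetical helper; generates plain white noise only.


def insert_pause(samples: list[float], at_index: int, pause_samples: int,
                 noise_db: float = -60.0, seed: Optional[int] = None) -> list[float]:
    """Splice `pause_samples` of quiet white noise into `samples` at `at_index`."""
    rng = random.Random(seed)
    sigma = 10 ** (noise_db / 20)               # noise level as a linear amplitude
    filler = [rng.gauss(0.0, sigma) for _ in range(pause_samples)]
    return samples[:at_index] + filler + samples[at_index:]


sr = 48_000
speech = [0.2] * (2 * sr)                       # stand-in for two seconds of speech
longer = insert_pause(speech, at_index=sr, pause_samples=sr // 2, seed=1)
```

The noise level should roughly match the background of the surrounding recording, so treat -60 dB only as a starting point.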

If possible, you can also cue the music a little later to make the music and words match. Here’s an example where I timed things so that the gong sounded after the words “imperial family”.

A slightly more complicated technique involves cutting sections of the music out entirely, or looping certain parts. This requires a bit of practice overlapping sections of music to ensure the pieces still flow naturally. Also be aware that you may not legally be permitted to alter music in this way, even with royalty free tracks.
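Looping means repeating a stretch of samples and hiding the joins. The Python sketch below shows one hypothetical way to do it. It repeats a section a given number of times and blends each seam with a short linear crossfade. It assumes you have already chosen loop points by ear (for example, on the beat) so the rhythm stays intact:

```python
# Hypothetical helper; indices are sample positions in a list of float samples.


def loop_section(samples: list[float], start: int, end: int,
                 repeats: int, crossfade: int) -> list[float]:
    """Repeat samples[start:end] `repeats` extra times, crossfading each seam."""
    section = samples[start:end]
    if not 0 < crossfade <= len(section):
        raise ValueError("crossfade must be between 1 and the section length")
    out = samples[:end]
    for _ in range(repeats):
        # Blend the tail of what we have so far with the head of the next repeat.
        for i in range(crossfade):
            w = (i + 1) / (crossfade + 1)       # weight of the incoming repeat
            j = len(out) - crossfade + i
            out[j] = out[j] * (1 - w) + section[i] * w
        out.extend(section[crossfade:])
    out.extend(samples[end:])                   # the rest of the track carries on
    return out


bed = [0.5] * 1000
looped = loop_section(bed, start=200, end=600, repeats=2, crossfade=50)
```

Each repeat adds the section length minus the crossfade. Here that gives 1000 + 2 × (400 − 50) = 1700 samples.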

Disguising Recording Flaws with Music

If you are choosing music to disguise flaws in your recording, that music needs instruments that provide a natural layer of (usually) low-frequency sound to mask the flaws. A single piano line or a guitar plunking away slowly can make the flaws sound even more prominent, so identify what the flaws are before looking for music. If they are minor (for example, light crackles or hiss), almost any music without obvious gaps between notes will work.

Music with wind instruments like flutes, or stringed instruments like harps and lutes, can be good for disguising heavy static. Contemporary electronica is often good for disguising tinny-sounding artefacts.

If you have a lot of background static or hiss, see if it makes sense to add a track with rain, city traffic, news, or an old recording with a lot of obvious clicks and pops and hiss of its own.

Here’s an example of a fairly bad vocal recording that I couldn’t redo, so I used a background music track with a whistling tea kettle accompanying a piano. As the speaker is talking about travelling to China, most people thought it sounded like a train entering or leaving a station (and hence completely appropriate).

Final Thoughts

The last thing to take away from these articles is that podcasting is a relatively new medium and creativity is being rewarded. There are guidelines for using music, but don’t hesitate to use your own judgment and experiment. If you make a mistake, however that is defined, it’s easy to correct.
