Updated October 26, 2021. This article is being continuously expanded, amended and improved. For the log of major changes, see the changelog.txt.

Understanding Computer Sound

Introduction

The subject of sound in home computers is fascinating, and the decade between the mid-1980s and the mid-1990s saw fierce competition between digital sound representation methods, sound processing chips, and manufacturers. This article is the result of my attempt to re-discover and understand the technologies behind sound reproduction in computers. Where available, I will be using actual products from that era. The article was greatly inspired by previous works by Piotr Gontarczyk (here), Stefan Goehler (here) and countless other printed and online resources. For details see the Reference section.

There is a series of videos accompanying this article. If you prefer watching to reading, here is the intro episode:

Sound Waves

Sound is a mechanical agitation of air which takes the form of waves. It is produced by living organisms and things, transported through the air for a limited distance, and potentially received by other organisms or things. To increase the distance sound can travel, to store it for reproduction at a later time, or to create new artificial sounds, methods have been devised to transform acoustic waves into electric impulses. The transportation and storage can be analog (acoustic waves are represented as electric waves) or digital (acoustic waves are represented as series of bits). Analog representation is continuous (sound does not need to be broken up into pieces) while digital representation is discrete (waves are "probed" — sampled — at certain points and these snapshots are stored or transported). The advantage of the former is (theoretically) higher fidelity to the original sound, while the advantage of the latter is higher reliability and easier storage and transformation of sound. In practice, with today's technological advances, digital representation also meets high-fidelity requirements.

Now, let's try an experiment before we move on. An oscilloscope helps us observe sound waves in their raw form. The oscilloscope I'm using is digital, so there is a digital transformation before the wave is displayed, but we can ignore that — we could just as well be using an old analog oscilloscope. What I'm saying is that we can "see" the sound wave before it is digitized in any way. Like below, when we play the C major scale on an acoustic guitar. Let's listen first:

And then look at the waves:

C D E F G A H
Figure 1. C major scale played on an acoustic guitar as seen on an oscilloscope.

An electric guitar is just that: electric. It sounds different, and its waves travel over a cable instead of through the air, but the signal is still analog, not digital.

C D E F G A H C
Figure 2. C major scale played on an electric guitar as seen on an oscilloscope.

Look at the periods of the waves above. One square on the horizontal scale corresponds to 2 milliseconds, i.e. 2 thousandths of a second. So the period (the horizontal span after which the wave's shape starts to repeat) for the note C above seems close to 0.008s or 0.009s. The period then decreases with every note of the scale played, and for A above it is down to somewhere between 0.004s and 0.005s. We know that the frequency of a wave is the inverse of its period, so we calculate the frequency of A at somewhere between ≈1/0.005 = 200Hz and ≈1/0.004 = 250Hz (hertz means "agitations per second"). In fact, the note A4 is officially defined as 440Hz, which is twice the frequency of what we measured. Twice, because our sound was A3, i.e. an octave lower.
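If you want to double-check such a reading, the arithmetic is simple enough to put into a few lines of code. A minimal, plain C++ sketch (the period values are the ones read off the oscilloscope above):

#include <cstdio>

int main() {
    // Period read off the oscilloscope for the note A: between 4 ms and 5 ms.
    double period_low_s  = 0.004;
    double period_high_s = 0.005;

    // Frequency is the inverse of the period.
    printf("A3 estimate: between %.0f Hz and %.0f Hz\n",
           1.0 / period_high_s, 1.0 / period_low_s);   // ~200..250 Hz

    // Each octave doubles the frequency, so A4 is twice A3.
    printf("A4 = %.0f Hz\n", 220.0 * 2.0);             // 440 Hz
    return 0;
}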

The waves illustrated above are not very similar to the sine wave ∿ we all know from school. The reason is that traditional instruments produce much more than just the base note. There are additional harmonic components in the waves produced, resonance from other parts of the instrument (e.g. other strings), as well as noise components (e.g. caused by imprecise string plucking, scratching the string with a fingernail, etc.). Also, the intensity (amplitude) of the sound, and the presence of these additional components, are not constant throughout the lifetime of the sound.

Figure 3. 2nd harmonic on the 5th string (A) of an electric guitar.

By the way, you don't need to buy a real oscilloscope if you don't have one. If you don't mind some digital processing, delay, and inaccuracies, you can run similar experiments using the online Virtual Oscilloscope.

Click below for the video record of my exercises with sound waves...

Beep

At a low level, computers process data in binary form: either true or false, either 1 or 0. Only two extreme values. This is the exact opposite of an analog signal, where in addition to the two extremes there can also be an infinite number of intermediate values. And that is why the whole subject of converting between these two representations is so interesting, and why we have this article in the first place.

By the early 70s it had become obvious that home computers had to have some means of communicating messages — for example errors or acknowledgements — in audio as well as visual form. But they were not equipped with digital-to-analog converters (DACs), so methods were devised to simulate an analog signal with digital ("1" or "0") impulses. Let's try to understand these techniques by re-creating them using Arduino. To run the following experiments on your own, all you need is any Arduino board, a couple of resistors, any small speaker (e.g. a 0.25W one, typical of the PCs of the 80s and 90s), and optionally an LED (to see when the speaker is on and when it is off) and a potentiometer. You also need the Arduino software on your computer in order to upload the sketches we discuss below to your Arduino.

The Arduino examples described below are also covered in the following video:

Let's connect the speaker to Arduino in the following way:

Figure 4. Speaker with an LED to help visualize the "on"/"off" states driven by Arduino.

Now if we upload and run the following code, the speaker will start ticking twice a second. How does that happen? We're applying voltage to the speaker, which energizes the speaker's driver and pushes the membrane out, making a ticking sound. After waiting 500 milliseconds we're disconnecting the voltage; the driver's electromagnet is no longer energized and the membrane gets pulled back in, making another ticking sound. The "out" and "in" sounds are almost indistinguishable, and together they constitute one "on"/"off" cycle. Here is what it sounds like:

And below is the code responsible for that effect:
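A minimal sketch along these lines (the speaker is assumed to be on digital pin 8 and the LED on pin 13; adjust to your wiring):

// Click the speaker twice a second by toggling one digital pin.
const int SPEAKER_PIN = 8;   // assumption: speaker (through its resistor) on pin 8
const int LED_PIN = 13;      // assumption: LED on pin 13, mirrors the speaker state

void setup() {
  pinMode(SPEAKER_PIN, OUTPUT);
  pinMode(LED_PIN, OUTPUT);
}

void loop() {
  digitalWrite(SPEAKER_PIN, HIGH);   // membrane pushed out: "tick"
  digitalWrite(LED_PIN, HIGH);
  delay(500);                        // half a second "on"
  digitalWrite(SPEAKER_PIN, LOW);    // membrane pulled back in: another "tick"
  digitalWrite(LED_PIN, LOW);
  delay(500);                        // half a second "off"
}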

Continuous sound can be generated by increasing the frequency of the ticks, i.e. decreasing the delays between them. Here is what the A4 note sounds like when generated that way.

And here is the Arduino code which generated it. Note the relationship between the period (the two delays of one cycle combined) and the frequency.
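For reference, a minimal sketch in the same spirit (speaker again assumed on pin 8): one full on/off cycle must last 1/440 of a second, i.e. about 2272µs, or roughly 1136µs per half-cycle:

// Generate an (approximate) A4 = 440 Hz square wave by toggling the pin fast.
// Period = 1/440 s ≈ 2272 µs, so each half-cycle is ≈ 1136 µs.
const int SPEAKER_PIN = 8;                 // assumption: speaker on pin 8
const unsigned int HALF_PERIOD_US = 1136;

void setup() {
  pinMode(SPEAKER_PIN, OUTPUT);
}

void loop() {
  digitalWrite(SPEAKER_PIN, HIGH);
  delayMicroseconds(HALF_PERIOD_US);
  digitalWrite(SPEAKER_PIN, LOW);
  delayMicroseconds(HALF_PERIOD_US);
}

The few microseconds of loop() overhead push the real pitch slightly below 440Hz, which is exactly the kind of inaccuracy discussed further down.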

That said, the generated wave is not at all like the one coming from a real instrument. Compare the close-to-sine sound wave (blue) generated by a guitar and the on/off square wave generated by the piece of code above.


The frequency is the same, but the shape is different, and the difference is clearly distinguishable by the human ear. The shape of a sound wave dictates its timbre, which is a quality distinct from pitch and intensity. Later we will see how other sound synthesis methods tried to overcome the limitation of the binary, or "on/off only", nature of the 1-bit speaker.

A square wave produces a characteristic "beep" sound which has been used in many devices to communicate alerts, acknowledgements, and errors to users. In computers, the history of the beep reaches all the way back to mainframe terminals, where a special character, the bell, was used to signal an alert condition (which, in turn, had its origins in the bell of a typewriter announcing the end of a line). The bell code, even though it represents a sound, remains part of the ASCII character set, and one can actually embed it in a file, or even type it by pressing Ctrl+G in any modern terminal emulator, e.g. in Linux or macOS. In the 80s the square-wave beep — by far the cheapest method of producing sound in electronic devices — became so ubiquitous that some programming languages made the BEEP command part of their standard libraries. Examples are the ZX Spectrum 48K's built-in BASIC interpreter and even some programmable calculators! Here are sound samples generated on an HP 28S Advanced Scientific Calculator. Since the calculator uses Reverse Polish Notation (RPN), instead of typing BEEP 3, 9 (a 3-second A note) as we would on a ZX Spectrum, we need to type 220 3 BEEP. Here is the A note played on an HP 28S across 4 octaves: 220Hz, 440Hz, 880Hz, 1760Hz (each subsequent octave doubles the frequency of a note):

So at this point nothing stops us from playing a tune, i.e. a sequence of sounds of specific length and pitch. In the code below we took a Polish tune from the 19th century titled Prząśniczka and transcribed it as a series of frequencies and lengths. 0.2 corresponds to an eighth note, 0.4 to a quarter note, 0.3 to a dotted eighth, and so on. Frequencies corresponding to note pitches can be found here.

Figure 5. Score of "Prząśniczka" tune by Stanisław Moniuszko.
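The player itself boils down to a table of (frequency, duration) pairs and a loop that toggles the speaker accordingly. A sketch of that structure, with a short made-up note table standing in for the full transcription (speaker assumed on pin 8):

// Skeleton of a tune player: each entry is a pitch (Hz) and a duration (s).
// The table below is a short illustrative stand-in, not the actual
// "Prząśniczka" transcription.
const int SPEAKER_PIN = 8;   // assumption: speaker on pin 8

struct Note { float freqHz; float durationS; };

const Note TUNE[] = {
  {440.00, 0.2}, {493.88, 0.2}, {523.25, 0.4},   // A4, H4, C5 (example only)
  {0.0,    0.2},                                 // frequency 0 = a rest
  {523.25, 0.3}, {493.88, 0.1}, {440.00, 0.4},
};

void playNote(float freqHz, float durationS) {
  if (freqHz <= 0) { delay((unsigned long)(durationS * 1000)); return; }
  unsigned long halfPeriodUs = (unsigned long)(500000.0 / freqHz);
  unsigned long cycles = (unsigned long)(freqHz * durationS);
  for (unsigned long i = 0; i < cycles; i++) {
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds(halfPeriodUs);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds(halfPeriodUs);
  }
}

void setup() {
  pinMode(SPEAKER_PIN, OUTPUT);
  for (unsigned int i = 0; i < sizeof(TUNE) / sizeof(TUNE[0]); i++) {
    playNote(TUNE[i].freqHz, TUNE[i].durationS);
    delay(20);   // short gap so that repeated notes don't merge
  }
}

void loop() {}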

Here is what it sounds like. Note that some of the pitches are difficult to get exactly right because of other factors in Arduino processing, in particular the calculations it has to do on every loop iteration. If you want pitches closer to perfect, use Arduino's built-in tone() function. It uses Arduino's hardware timers to get the right frequencies, and hides many complexities, but for the sake of our discourse I deliberately refrained from using it.

Since a PC speaker provides just one sound channel, there is no way to play chords (multiple notes played at the same time). But there are ways to simulate chords by cheating the human ear. Arpeggio is one of them; in the music world it just means playing several notes in quick succession as a form of artistic expression, but in early computer music arpeggios were played fast to create the impression of chords. Listen to the chords C major, A minor, D minor, G major simulated by playing arpeggios of, respectively, (C, E, G); (A, C, E); (D, F, A); (G, H, D), with each note played for 6 hundredths of a second.

See the code responsible for this pleasant noise:
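In outline it looks like this (a sketch only; 4th-octave equal-temperament frequencies, speaker assumed on pin 8):

// Chord simulation with fast arpeggios: each chord's three notes are played
// in a quick 60 ms rotation, which the ear blends into something chord-like.
const int SPEAKER_PIN = 8;   // assumption: speaker on pin 8

const float CHORDS[4][3] = {
  {261.63, 329.63, 392.00},   // C major: C, E, G
  {440.00, 261.63, 329.63},   // A minor: A, C, E
  {293.66, 349.23, 440.00},   // D minor: D, F, A
  {392.00, 493.88, 293.66},   // G major: G, H, D
};

void playNote(float freqHz, unsigned long durationMs) {
  unsigned long halfPeriodUs = (unsigned long)(500000.0 / freqHz);
  unsigned long cycles = (unsigned long)(freqHz * durationMs / 1000.0);
  for (unsigned long i = 0; i < cycles; i++) {
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds(halfPeriodUs);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds(halfPeriodUs);
  }
}

void setup() { pinMode(SPEAKER_PIN, OUTPUT); }

void loop() {
  for (int chord = 0; chord < 4; chord++) {
    // Hold each "chord" for about 1.5 s, i.e. roughly 8 arpeggio passes.
    for (int pass = 0; pass < 8; pass++) {
      for (int note = 0; note < 3; note++) {
        playNote(CHORDS[chord][note], 60);   // 6 hundredths of a second per note
      }
    }
  }
}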

In all the examples above the ratio between the "on" and "off" states of the speaker was 1:1, i.e. 50% of the time the speaker was "on" and 50% of the time it was "off". This is called a 50% duty cycle of a square wave. It turns out that certain properties of the sound generated by a square wave will change — even if we don't change the frequency itself — when the duty cycle is modified. So if we apply voltage to the speaker for 20% of the time, and leave it off for 80% of the time, the sound will retain the same frequency as with the 50/50 duty cycle, but the timbre will change. In this sample, we first increase the sound frequency, and then (around 0:22) modulate the duty cycle. We then change the frequency again, and modulate the duty cycle again, repeatedly. Hear how the perception of the sound changes from more "ripe" or "full" to "shallow" or "tinny" and back when modulating the duty cycle.

Below are the Arduino setup and sketch I used to simulate the above frequency and duty cycle modulations with 3 potentiometers (one for coarse frequency tuning, another one for fine frequency tuning, and a third one for duty cycle modulation).

Figure 6. Pulse Width Modulation simulated with Arduino and 3 potentiometers (one for coarse frequency tuning, one for fine frequency tuning, and one for duty cycle modulation).
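A simplified version of such a sketch might look like this (an illustration of the idea only; the pot wipers are assumed on analog pins A0, A1 and A2, the speaker on pin 8):

// Frequency and duty cycle modulation with three potentiometers.
// Assumptions: coarse tuning pot on A0, fine tuning pot on A1, duty cycle pot
// on A2, speaker on pin 8. Everything is bit-banged in software.
const int SPEAKER_PIN = 8;

void setup() { pinMode(SPEAKER_PIN, OUTPUT); }

void loop() {
  int coarse = analogRead(A0);   // 0..1023
  int fine   = analogRead(A1);   // 0..1023
  int duty   = analogRead(A2);   // 0..1023

  // Map the pots to roughly 100 Hz .. 2 kHz and to a 2%..98% duty cycle.
  float freqHz = 100.0 + coarse * 1.8 + fine * 0.1;
  float dutyCycle = constrain(duty / 1023.0, 0.02, 0.98);

  unsigned long periodUs = (unsigned long)(1000000.0 / freqHz);
  unsigned long highUs = (unsigned long)(periodUs * dutyCycle);
  unsigned long lowUs  = periodUs - highUs;

  // Play a handful of cycles before re-reading the pots, so that the
  // analogRead overhead does not distort every single period.
  for (int i = 0; i < 50; i++) {
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds(highUs);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds(lowUs);
  }
}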

The technique of controlling the width (duty cycle) of the pulses in a square wave is called Pulse Width Modulation (PWM), and it is applied in many fields of computing and communications as well as in computer sound. For sound, the 1-bit speaker and PWM techniques were — and still are — used extensively in many early home computers, including PCs, the Apple II and the ZX Spectrum, and in other computing and non-computing devices like calculators, dumb terminals, and household appliances. Let's look at how the idea was implemented in the ZX Spectrum, the Apple II, and PCs.

Apple II

Even though we have just put the ZX Spectrum, the Apple II and the early PCs in one bucket — let's call it 1-bit music generators — there are in fact differences between these platforms in how they drive that one-channel speaker. In particular, while in PCs the speaker is connected to a PIT (programmable interval timer), which is in turn programmed by the CPU, in both the ZX Spectrum and the Apple II the speaker is driven directly by the CPU. As a consequence, the CPUs of the two latter machines are much busier when generating sound than a PC's CPU is. This had severe implications for how software was written for these platforms: code which did something else had to be executed in a disciplined manner to fit around the precisely timed speaker clicks. This required a lot of ingenuity. If you prefer watching, click below to see the video which covers some of the subjects of this Apple II section:

Let's listen to a few examples of Apple II software using sound routines. The first one, the Family Feud game published by Sharedata in 1984, simply stops all processing to play the initial tune:

With the exception of the evenly spaced noise bursts which outline the rhythm, there is nothing unusual here: a pulse train of various frequencies, all employing a 50-percent duty cycle. Similarly, the next track, from the game Frogger by Konami/SEGA, whose Apple II port was released in 1982, uses only a 50-percent duty cycle square wave, but with a very well composed alternation between two melodies, one in the lower register and one in the higher, which creates a pleasing counterpoint effect.

Interestingly enough, the Frogger game is about a frog, but the intro tune is based on an old Japanese nursery rhyme which tells a story of a cat and a dog...

As mentioned, on a non-expanded Apple II there was no way to leave sound playing while engaging the CPU with other tasks. Each single speaker click required the CPU's direct involvement, and to play a simple A4 note (440Hz, i.e. 880 speaker clicks per second as a square wave) for only 1/5th of a second, 880/5 = 176 such clicks are needed! (And that's just the work of clicking the speaker, not counting looping, decrementing counters, etc.) It should not be surprising that many developers settled on the simplest solution, like the one used in Family Feud above: don't do anything else while sound is playing. Yet there were notable exceptions, one of them being Ms. Pac-Man, a 1982 game by General Computer Corporation released for the Apple II by Atarisoft, where well-designed character sounds and tune snippets were incorporated into the game without stopping the gameplay at all (e.g. when Ms. Pac-Man is eating the dots) or with minimal slowdown (e.g. when Ms. Pac-Man devours a ghost). The game's soundtrack was well thought through in all aspects; even the short initial tune is not just a square pulse train but a very clean two-voice harmony.

Namco's 1981 game Dig Dug, ported to the Apple II in 1983, amazingly manages to play a melody in-game, while the character is moving around the screen and doing things! Here is what it sounds like:

[... Dig Dug sound...]

As we move away from the 50-percent duty cycle, it is worth exploring the auditory implications of very narrow rectangular waves, i.e. ones with a very low or very high duty cycle. In short, as the duty cycle approaches 0% (or 100%), the harmonic components become more evenly distributed, without the fundamental strongly dominating. This, in turn, means that a long tail of harmonic components is lost beyond human hearing ability and beyond the hardware's frequency response, so relatively less of the specific sound can be heard. To explore the behavior of very low duty cycle rectangular waves at different frequencies, I wrote a simple assembly program for the Apple II in which the user can modulate the frequency of a continuous sound with one pair of keys (left and right arrows), and modulate the duty cycle with another pair (A and Z). Here is what modulating the duty cycle sounds like, and how the frequency distribution changes. Pay attention to the tall red line to the very left of the image. It represents the fundamental harmonic, and it dominates less and less as the pulse width gets thinner:

Figure 7. The red lines show the distribution of harmonic frequencies comprising the blue square wave. Observe the main (fundamental) harmonic on the far right side. Its dominance over the other harmonics is significantly reduced as the duty cycle gets very thin.
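The flattening of the spectrum can also be read straight off the Fourier series of a rectangular pulse: the magnitude of the n-th harmonic is proportional to |sin(nπd)|/n, where d is the duty cycle. The toy program below (plain C++, not the Apple II program, which follows) prints the first few harmonics for d = 0.5 and d = 0.05; in the first case the fundamental towers over the rest, in the second the early harmonics come out nearly equal:

// Relative harmonic magnitudes of a rectangular wave with duty cycle d.
// For the n-th harmonic the magnitude is proportional to |sin(n*pi*d)| / n.
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979;
    const double duties[] = {0.5, 0.05};
    for (double d : duties) {
        printf("duty cycle = %.2f\n", d);
        for (int n = 1; n <= 8; n++) {
            double magnitude = std::fabs(std::sin(n * PI * d)) / n;
            printf("  harmonic %d: %.3f\n", n, magnitude);
        }
    }
    return 0;
}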

Here is the assembly code of my program:

Another way of looking at the effect of thin pulses is to examine the spectrogram of a progression from a very low to a 50-percent duty cycle. Again using my simple program, I generated an approx. 780Hz tone and gradually modulated the duty cycle from a very thin pulse to 50%. Listen to, and look at, the effect of this:


Figure 8. Spectrum analyzer view of the progression from a very thin to a 50-percent duty cycle. Note how the fundamental 780Hz harmonic (the horizontal line at the bottom, right above the "440 Hz" mark) is indistinguishable from the other harmonics on the very left-hand side of the graph (thin duty cycle) but gets more and more pronounced towards the right of the view (closer to a 50-percent duty cycle).

The 1981 game Defender uses that gradual modulation of duty cycle a lot, and the resulting spectrogram looks like a work of abstract art:

Figure 9. Defender game intro sound as viewed in a spectrum analyzer.

This volume-modulating effect of the harmonic composition of low duty cycle square waves was applied, for example, in Oscillation Overthruster, an interesting Apple II program published by David Schmenk in 2017. In Oscillation Overthruster, the sound power modulation effect was used to shape (albeit in a crude way) the ADSR (attack, decay, sustain, release) envelope of the sound, which really pushes the boundaries of the platform, as the ability to control such envelopes pertains to much more advanced chips which we will discuss later. To achieve this control, Oscillation Overthruster starts with a very thin rectangular pulse; here is what the C Major scale sounds like, and looks like, when played in this program:


Figure 10. C Major scale in Oscillation Overthruster.

The sound sample and the animated gif show us walking up the C Major scale from C3 (approx. 7.6ms ≈ 131Hz) to C4 (3.8ms ≈ 263Hz). One immediately notices that the rectangular pulses generating these frequencies are extremely thin; at this 2ms-per-square scale they look very much like lines. And as the sound fades, the pulses get even thinner. Let's zoom in on a single pulse in the pulse train for the note A3:



Figure 3. "Rectangular" pulse gets narrower and narrower and does not resemble a rectangle anymore.

We can observe that the pulse width starts at around 70µs (that's microseconds), which for the note A3 (220Hz ≈ 4.54 milliseconds) is barely 1.5% of the wavelength! And then it only gets thinner as the sound fades, all the way down to less than 20µs (0.4%), before disappearing altogether. In reality the pulses (signal high) are actually very wide and the spacing between them very thin (signal low), but for aural expression using rectangular waves the spacing can be treated as the pulses and vice versa: it matters little whether we are dealing with a 1.5%-wide crest and a 98.5%-wide trough or the other way round. Something else should immediately jump out at us from the above gif: the shape of the pulse. While initially it still bears some resemblance to a rectangle (even though already rounded), as it gets thinner it completely loses its rectangular features and starts resembling a triangle. To understand this, we first need to look at the scale of the oscilloscope view. One square is 50µs wide, so the sloping edge of the wave is somewhere around 10µs wide. That's 1/100,000 of a second. If we zoom out to a scale more appropriate for representing sonic frequencies (e.g. 2ms per square), that sloping line will look perfectly vertical, and for all purposes it will behave as if the time to switch between the high and low states were infinitesimal. Limitations of the equipment, including the signal's own rise time and the capacitance of components along the way to the speaker, start to show at very close zoom as square wave distortions. These, combined with the physical properties of the speaker, whose response to a state change is also not immediate, are used by 1-bit scene composers to further refine sound characteristics.

All the experiments and samples above reproduce a single note at a time. The real challenge faced by 1-bit composers was to achieve polyphony, i.e. to play multiple notes at the same time. The task seems impossible, because we are dealing with a 1-bit medium: in the real world, mixing two different notes results in a complex wave which is the sum of wave A and wave B, whereas in the digital world the signal can only be in the high or the low state, and a logical addition of signals in a given time slot loses information (high state + high state = still just high state). One method to overcome this was to switch between a low-register melody and a high-register melody frequently enough to create the impression of two separate melodies, but not so frequently that our brain is cheated into thinking each of these melodies is continuous. This counterpoint effect was used in the Frogger game intro discussed earlier. Listen to it again and notice how we get the feeling of polyphony (bass and treble each playing its own melody), and yet we quickly realize that the bass and treble notes are simply played interchangeably. Another method is faster arpeggios, where the composer actually does try to cheat our brain into accepting that the notes are played at the same time. Examples are our Arduino arpeggio code shown earlier, or the following interpretation of Maple Leaf Rag by subLogic, the developers of Music Maker and Kaleidoscopic Maestro:



Figure 12. Zoomed-in sound waves of Maple Leaf Rag played in Kaleidoscopic Maestro. Very fast arpeggios create the impression of polyphony in our mind. Here the switch between different sound waves happens every 28ms, i.e. about 35 times a second.

There are several methods of representing complex sound waves, including polyphonic sounds, using rectangular pulse trains. Pulse frequency modulation (PFM) changes the instantaneous frequency of individual pulses according to the current amplitude of the analog signal while keeping the pulse width constant; pulse width modulation (PWM), instead, keeps the frequency of pulses constant while modulating their width; there is also pulse position modulation (PPM), which moves the position of the pulse within a constant time slot. In his article The 1-Bit Instrument. The Fundamentals of 1-Bit Synthesis, Their Implementational Implications, and Instrumental Possibilities, Blake Troise explores two additional ways of achieving polyphony: PPM and PIM. The first of these, the Pin Pulse Method (PPM), involves reducing the duty cycle of one frequency to a very low level (below 10%), and then introducing another frequency, also at a very low duty cycle. The resulting pulses will rarely collide on the signal timeline (and when they do, not frequently), and the two different pulse trains will successfully produce two frequencies which our brain can discern as two notes playing at the same time.
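To get a feel for the pin pulse method, here is a small Arduino sketch (an illustration only, not taken from Troise's article; speaker assumed on pin 8) that emits two independent trains of very short pulses, one at 220Hz and one at 330Hz. Each train keeps its own schedule, and because the pulses are so narrow they only rarely land on top of each other:

// Pin pulse method in miniature: two very thin pulse trains at different
// frequencies (220 Hz and 330 Hz, i.e. A3 + E4) share the same 1-bit output.
const int SPEAKER_PIN = 8;           // assumption: speaker on pin 8
const unsigned int PULSE_US = 60;    // ~1.3% duty cycle at 220 Hz

const float FREQ_A = 220.0;
const float FREQ_B = 330.0;

unsigned long periodA, periodB;      // pulse train periods in microseconds
unsigned long nextA, nextB;          // when each train's next pulse is due

void pulse() {
  digitalWrite(SPEAKER_PIN, HIGH);
  delayMicroseconds(PULSE_US);
  digitalWrite(SPEAKER_PIN, LOW);
}

void setup() {
  pinMode(SPEAKER_PIN, OUTPUT);
  periodA = (unsigned long)(1000000.0 / FREQ_A);   // ~4545 us
  periodB = (unsigned long)(1000000.0 / FREQ_B);   // ~3030 us
  nextA = micros() + periodA;
  nextB = micros() + periodB;
}

void loop() {
  unsigned long now = micros();
  // Emit a pulse whenever either train is due. If both are due at once they
  // simply merge into one pulse (the occasional collision mentioned above).
  if ((long)(now - nextA) >= 0) { pulse(); nextA += periodA; }
  if ((long)(now - nextB) >= 0) { pulse(); nextB += periodB; }
}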

The second polyphony method described by Blake Troise is the Pulse Interleaving Method (PIM). The method is ingenious and shows just how far this limited medium can be pushed by careful programming. Those who mastered it in their productions have been held in high regard in the 1-bit demoscene, Apple II or otherwise. In this method, a special high frequency is used to switch between the high and low states fast enough not to allow the speaker cone to fully extend or collapse, and instead keep it in one or more in-between states. Given enough in-between states, one can represent a complex sine wave summation as a "stepped" rectangular wave and effectively achieve polyphony. A good way to understand how this works is to listen to, and look at, the sounds produced by RT.SYNTH, a program written in 1993 by Michael J. Mahon. First let's listen to the "silence" when RT.SYNTH is on but not playing anything:



Were you able to hear anything? The volume on your computer has to be way up to hear the hiss, and your hearing must be good. But the sound, a high-pitched hiss, is there. What you hear is the switching frequency, which is at the boundary of human hearing (look at the distance between wave crests: approx. 46µs ≈ 22kHz), is constant, and is independent of the actual sound frequency to be generated:

Figure 13. This is what "silence" looks like in RT.Synth. The constant pulse is (almost) beyond hearing but enables putting the signal (and the speaker) in several non-extreme states.

We already explained earlier why the "square" wave doesn't really look square: in short, we are looking at a very close zoom where the "instant" switches between high and low show their actual rise and fall times, and the switching itself is so rapid that the signal never reaches the fully high or fully low state. And this is actually one of the premises of the pulse interleaving method! Without modulating that base (or "carrier", or "phantom") frequency, the pulse width is modulated with such precision that the signal (and the speaker) can be held at the maximum high, the minimum low, or at several levels in between. Michael Mahon, the author of RT.Synth, aptly called his implementation of this method DAC522, to denote that his solution transforms a digital signal (a stream of bytes in memory) into an analog signal (a frequency- and amplitude-modulated wave shape which, even though it isn't one, can be approximated as a complex sine wave), that it is capable of 5-bit sampling resolution, and that the underlying switching frequency is 22kHz. Now let us play a sound over that hiss...



...and in the wave view zoom out from the carrier frequency scale (20µs per square on the oscilloscope) to the actual sound frequency scale (1ms per square):

Figure 14. Zooming out from the carrier frequency to the sound frequency.

Even at the highest zoom one can see the carrier frequency wave shaking up and down. By switching the high and low states very fast, the programmer controls how far up or how far down the signal goes. When zoomed out, the purpose of that shaking becomes obvious: we are building a secondary wave made of small rectangular waves, and that secondary wave is already within human hearing range (the wave crests are 4.5ms apart, which approximates to 220Hz, that is the note A3). The technique is not unlike controlling individual pixels of a digital display (which we can't see from a distance but have precise control over) to build a coherent image (which we can see); it is just that instead of pixels the composer controls individual impulses constituting a larger wave. Mahon, in his implementation, was inspired by earlier works of Scott Alfter and Greg Templeman, but was able to refine their designs to achieve 5-bit resolution (32 discrete levels) at a 22kHz carrier frequency. His detailed description of how he managed to get the timings right within a very limited number of CPU cycles is fascinating. Now, having such control over the shape of the resulting sound wave, one can adjust more than just frequency. RT.Synth uses a wavetable to provide access to 8 different voices (shapes of waves), each of which also has its own ADSR envelope defined. Interestingly enough, in addition to voices mimicking instruments like acoustic piano and banjo, Michael Mahon also included voices for the triangle and... the square wave. Here is how RT.Synth uses square waves to generate various instruments, including square waves!



Figure 15. Square wave shaped using smaller square waves. Well, the smaller ones are only theoretically square; at this frequency the pulses are really triangles. Note how with increasing frequency fewer and fewer component waves are used to build the "high" or "low" state; at the highest RT.Synth frequency it is only four carrier waves per crest / trough!
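The pulse interleaving idea can also be sketched outside of 6502 assembly. The toy Arduino code below is not Mahon's DAC522, just an illustration of the principle: a fixed-rate "carrier" loop holds the pin high, within each carrier period, for a time proportional to the current sample of a small wavetable, so that, zoomed out, the train of narrow pulses traces the wavetable's shape:

// Pulse interleaving in miniature: a fixed ~50 us carrier period is split
// between "high" and "low" time according to the current wavetable sample,
// so the average speaker position follows the table. An illustration of the
// principle only, not Michael Mahon's DAC522. Speaker assumed on pin 8.
const int SPEAKER_PIN = 8;
const unsigned long CARRIER_PERIOD_US = 50;   // ~20 kHz carrier, near inaudible

// One period of a crude sine wave, 32 samples, values 2..48 = microseconds of
// "high" time within each carrier period.
const unsigned char WAVE[32] = {
  25, 29, 34, 38, 41, 44, 46, 48, 48, 48, 46, 44, 41, 38, 34, 29,
  25, 21, 16, 12,  9,  6,  4,  2,  2,  2,  4,  6,  9, 12, 16, 21
};

void setup() { pinMode(SPEAKER_PIN, OUTPUT); }

void loop() {
  // Stepping through all 32 samples, one per carrier period, gives a tone of
  // roughly 20000 / 32 = 625 Hz (loop overhead lowers it a little).
  for (int i = 0; i < 32; i++) {
    unsigned long high = WAVE[i];
    digitalWrite(SPEAKER_PIN, HIGH);
    delayMicroseconds(high);
    digitalWrite(SPEAKER_PIN, LOW);
    delayMicroseconds(CARRIER_PERIOD_US - high);
  }
}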

By the way, I noticed a slight detune when listening to the same notes, played on the same RT.Synth software, but on two different machines: Apple IIe and Apple IIc. My guess is that the tiny frequency difference is not really a result of using two different Apple II models, but rather that my IIe is a European one, with 220V power supply, while the IIc is imported from the US and I'm using a 220->110V voltage converter to power it. That detune, however, creates a unison effect and is pleasant to the ear!



Figure 16. The detune effect shows clearly in a spectrum analyzer. The two beat periods that can be heard in the above sample when the second note is played show up as two interference patterns (the two larger and two smaller pyramids at the bottom). The red line at the top is the constantly active 22kHz carrier frequency of RT.Synth. Spectrum analysis was performed in this online spectrum analyzer.

Michael Mahon made his DAC522 component available for embedding, and many developers used it in their demos. A great example is Micro Music, a compilation released in 2007 by Simon Williams (quoting the author: Approximately 30 minutes of music is contained on a single 140 KB floppy disk. Each song is comprised of 24 KB of digital audio (16 samples) and uses 1 KB of the audio stream to generate the composition.). Here is a fragment of the piece "Summer at the Berghof" played on my Apple IIe:



Note how the composer used silence to outline the beat (approx. 6 beats per second), and how noise is interwoven with short samples of instruments to arrive at this unique auditory effect:

Summer at the Berghof by Simon Williams

RT.SYNTH, even though it used the brilliant trick of a "carrier" frequency to switch between intermediate speaker states, was not actually playing multiple notes at the same time. We filed it under the polyphony discussion because Michael Mahon found a way to represent a complex sine wave summation using square waves. A program that actually allowed playing 2 notes at the same time was Electric Duet, published in 1981 by Paul Lutus. Listen to the note A1 played on its own, then A1 played with F4, and look at the corresponding waves captured on an oscilloscope:



Figure 17. Notes A1, F4 played individually, then A1 and F4 played together in Electric Duet.

Paul Lutus also used an underlying "canvas" frequency at the verge of human hearing (around 14kHz), with the similar effect of being able to put the signal in multiple intermediate states; but in addition to shaping the specific (very characteristic) sound of Electric Duet, Paul Lutus also employed a mechanism to switch audible frequencies fast enough for the switching itself to escape human attention. The result is a very capable duophonic engine which is small, responsive to computer events without distorting the music, and which can be used in other programs. The notes on the development of Electric Duet provide fascinating insight into the programmer's challenges and writing adventure. And the results are mind-blowing; listen to the following piece, Bach's Bourrée in E Minor, played on Electric Duet. It sounds really clean and is such a pleasure to listen to.



When one looks at the musical notation for Bach's Bourrée in E Minor played above, it is immediately obvious that two-voice counterpoints like these, a popular form of baroque music, are perfectly suited for two-voice Electric Duet:

Score for the Bach piece above. Source: Wikipedia.

Lastly, here are some interesting phantom waves I observed when running my frequency and pulse width modulation program on the Apple IIe hooked up to an oscilloscope. I'm not fully sure what causes them and I definitely need to dig deeper into this; for now let me just put them here for the record. The first gif shows what I observed at an approx. 10ms-per-square view when modulating the frequency relatively high in the audio spectrum, while the second shows what I observed at a set frequency (again, relatively high in the human-hearable spectrum) when modulating the pulse width.

Figure 18. I don't know what these are! Anybody?

Timeline

June 1977: Apple Computer, Inc. releases the Apple II with a simple 1-bit speaker
November 1979: Atari, Inc. releases the Atari 400/800 with the POKEY chip
12 August 1981: IBM introduces the IBM Personal Computer with a simple 1-bit speaker, later referred to as the "PC Speaker"
23 April 1982: Sinclair Research releases the ZX Spectrum 48K, also with a simple 1-bit speaker
1982-1983: MIDI becomes the de facto standard for exchanging information between instruments and audio devices
August 1982: Commodore Business Machines releases the Commodore 64 with the SID 6581
15 September 1986: Apple Computer, Inc. releases the Apple IIGS, which has enhanced sound capabilities thanks to the ES5503 chip known from the Ensoniq Mirage (itself introduced in 1984)
1987: Roland releases the MT-32 Sound Module
1991: The MIDI Manufacturers Association publishes the General MIDI (known as GM or GM 1) standard
March 2021: Adam starts writing this article

Reference and Resources

1-bit sound

Still unsorted, but definitely worth reading; mostly Wikipedia articles