How do we remember sounds and pitches?

The following explanation is not based on scientific research, since science is still years away from understanding how the brain stores information. The explanations assume a simple model that might be compared to a computer system, one with a short- and long-term memory.

Sound Memory Introduction

While it is possible to remember complex sounds, such as a specific noise that we are unable to reproduce vocally, it is much easier to remember sounds we associate with meaning. For example, the pronunciation of foreign words can be remembered more easily if we know what those words mean. If we are able to reproduce a sound vocally, it is even easier to remember. In general, the more associations we can assign to a sound, the easier it is to recall.

As a specific example, many people are embarrassed when they call a company to ask for some information, have a pleasant conversation, and then cannot remember the other person’s name at the end in order to say thanks. I have noticed that it helps to repeat the name in the greeting, e.g. “Good afternoon Mr. So-and-so, I’d like to ask about ...” Chances are much greater that you will remember the name at the end of the phone call. Further, if you write down the name, you can expect to remember it for a longer time, even without reviewing what you wrote. (And if the voice or name is difficult to understand, you can ask the person at the end of the call to kindly repeat the name, gaining another opportunity to hear and memorize its sound or spelling.)

The above example demonstrates that sound memory need not be an isolated or specialized memory, but is normally interconnected with other associations.

Sound processing

Computers process sound differently than we do. First the sound is converted from its analog waveform to sample points, similar to scanning a page digitally. The data in this sample form can be stored as actual sound or as actual image. Except for the reproduction of that sound or its picture, nothing more can be done with this kind of data. For meaning to work on a higher level the computer must abstract and convert the data into another form, via analysis of the data.
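As a minimal sketch of the first step described above, the conversion of a continuous waveform into sample points, the following Python snippet samples a sine wave at regular time steps. The function name `sample_sine`, the 440 Hz tone, and the 8000 Hz sample rate are illustrative assumptions, not values taken from this article.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Return amplitude values of a sine wave taken at regular time steps."""
    count = round(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(count)
    ]

# Assumed example values: a 440 Hz tone, 8000 samples per second, 10 ms long.
samples = sample_sine(440.0, 8000, 0.01)
print(len(samples), "sample points")
```

The resulting list is only a series of numbers. It can be played back, but it carries no notion of "word" or "note" until some further analysis is applied.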

As another example, take a page of text scanned for optical character recognition. The scanned pixels are analyzed against character patterns. During this process, characters and words can be recognized, but in compressing the pixel information into words, we lose information. For example, the size, font, and position on the paper are no longer attached to the text. To recover that information, the character recognition process must work harder and figure out these additional attributes. However, since thousands of fonts are available, and new ones appear daily, the character recognition process must fall back on more general descriptions, such as serif or sans-serif. Even without this additional information, the condensed form still has the most important information: the meaning of the word, as we would find it in a dictionary.

Speech recognition software is used to transform raw sound data into words and sentences. But again, we lose information during this process, for example volume level, pitch, and speed. However, after this transformation the most important information, the words themselves, is now available for further processing.

Storing these higher-level abstractions uses far less memory than storing raw data. This higher-level storage also allows us to process the data more easily, and the smaller stored form can be accessed faster. For raw data, there is never an exact match: even if the same person repeats the same sentence again and again, there are almost always differences in the raw data. Therefore, it is important that the information is stored in a useful way, with not too much data and not too little. These higher levels of abstraction are important for communication: they are why we can speak to and understand people we have never met before.
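To give a rough sense of the size difference, here is a small back-of-the-envelope calculation in Python. The recording parameters (44,100 samples per second, 2 bytes per sample, one channel) and the half-second duration of a spoken word are assumptions chosen for illustration.

```python
SAMPLE_RATE_HZ = 44_100   # assumed: CD-quality sample rate
BYTES_PER_SAMPLE = 2      # assumed: 16-bit samples, one channel

def raw_audio_bytes(duration_s):
    """Size of uncompressed audio for the given duration."""
    return round(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_s)

def text_bytes(word):
    """Size of the same word stored as UTF-8 text."""
    return len(word.encode("utf-8"))

word = "hello"
audio_size = raw_audio_bytes(0.5)   # assumed: the word takes half a second to say
text_size = text_bytes(word)
print(f"audio: {audio_size} bytes, text: {text_size} bytes, "
      f"ratio about {audio_size // text_size}:1")
```

Under these assumptions the raw recording is several thousand times larger than the word itself, and unlike the text it differs every time the word is spoken.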

Another aspect of computer processing of sounds is seen in the synthesis of sounds. MIDI files store musical information as events. These events are defined by, for example, the start and end time of a specific note within a piece, its volume, the instrument used, or the effects applied. The actual sounds are not stored in the MIDI file. It is the task of the playback device to generate the real sound. Because of this, MIDI files are very much smaller than digital audio files, and the events are also editable, allowing the music to be rearranged, edited, even composed interactively if desired (see

To generate the real sound of the piece, the playback device (on the computer) must synthesize the sound. The result depends on the sound samples stored on the playback device. When coupled with a Downloadable Sounds (DLS) synthesizer, MIDI files can be combined with standardized samples of musical instruments, sound effects, or even dialogue, which are used to recreate an exact copy of the sound intended by the composer. For our purpose, the point is that sound is stored in separate sound samples instead of as a digital file containing the whole piece. We will assume that a similar mechanism for recalling sounds exists in our brain: sound gets broken up into snippets and attributes. These stored fragments (sound snippets) can be combined in a sequence of actions to form step-by-step instructions for generating a certain sound. (By analogy, the MIDI file contains only the instructions, or actions, for a playback device; the sound itself is generated in the playback device from preloaded sound snippets.)
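The split between "instructions" and "preloaded snippets" can be sketched in a few lines of Python. This is a simplified stand-in, not the actual MIDI or DLS format: the `NoteEvent` class, the `SAMPLE_BANK` contents, and the `render` function are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """One musical event: which note sounds, when, and how loudly."""
    start_s: float
    end_s: float
    note: str
    volume: float  # 0.0 (silent) to 1.0 (full)

# Hypothetical sample bank of the playback device: one tiny snippet per note.
SAMPLE_BANK = {
    "C4": [0.0, 0.5, 0.0, -0.5],
    "E4": [0.0, 0.8, 0.0, -0.8],
}

def render(events, frames_per_second=4):
    """Build the audible signal by looping each note's snippet over its time span."""
    total_frames = round(max(e.end_s for e in events) * frames_per_second)
    signal = [0.0] * total_frames
    for e in events:
        snippet = SAMPLE_BANK[e.note]
        first = round(e.start_s * frames_per_second)
        last = round(e.end_s * frames_per_second)
        for i in range(first, last):
            signal[i] += e.volume * snippet[(i - first) % len(snippet)]
    return signal

# The "file" is just two events; the sound only exists after rendering.
song = [NoteEvent(0.0, 1.0, "C4", 1.0), NoteEvent(1.0, 2.0, "E4", 0.5)]
audio = render(song)
print(len(song), "events ->", len(audio), "frames")
```

The event list stays small and editable (change a note name or a volume and render again), while the snippets are stored once and reused, which is the property the analogy to memory relies on.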

Sound memory by reproduction

The most effective way to store a sound in our brain is to reproduce the sound ourselves. The reproduction of a sound is the strongest active channel we have to it. Our ability to recognize sounds far exceeds our ability to produce them. But the sounds we can reproduce make up the largest part of the sounds that nature treated as important for survival. Therefore, recognition of this group of sounds receives special treatment in our brain.

Hearing and speaking are highly interrelated, and this relationship begins during childhood. When we learn to speak, we listen to what comes out of our mouth, so a child learns to actively control the means of changing an uttered sound in a desired direction. The activity of speaking involves our muscles, which can be accessed by the brain as an additional association or link to sound. Thus sounds that we can reproduce can be processed differently by the brain than sounds we cannot. In the next paragraphs we will focus on this highly developed, sound-related area of the brain.

Sound memory and muscles

To be able to reproduce a sound we use our muscles, although in daily life we are not aware of using them, whether for standing in place, walking, or talking. The main function of our brain is to control our muscles, and it took thousands of years for the brain to master this control and to react to such muscle requirements as walking on two feet. Our balance system, which resides in the ear, and our perception of visual changes in the environment are actually not the primary sources of input stimuli that keep us vertical. Arguably, the necessary muscle changes that keep us standing on two feet are based on automated reflexes, which the brain controls only peripherally (i.e. they would not work if we had no brain, but they become automated, and thus the brain is not needed directly). However, the brain’s primary function is still to control muscles, and the brain can override reflexes.

To utter words, then, the vocal cords play an important role. If you say “Ah,” then “Oh,” you have no clue what commands your brain has given the vocal cords. Yet, the sounds come forth correctly and certainly muscles were involved. This is because somewhere your brain has stored the muscle patterns for “Ah” and “Oh.” It is capable even of finely controlling the vocal cords to express “Oh” as indicating surprise (Oh, what a surprise to meet you here) and “Oh” as indicating simplification (Oh, that’s easy, everybody can do that). Thus sounds are stored along with muscle movement information in the brain, and this stored muscle-movement data can be modified with several attributes for final production of the sound.

Since what is important in producing sound falls into the process of actively producing it, we can assume that the sound is stored only once, in the muscle memory. We need this muscle-movement information anyway, when we want to speak, so it makes sense to have the information stored primarily in this part of the memory. Storing once for use in many situations reduces redundancy and helps to clearly differentiate meanings and attributes. With the descriptive attributes, we will change and combine the stored sound snippets to produce the final sounds.

When we recognize a sound, we can analyze and categorize it, then assign it appropriate attributes with their different meanings. This breaking down into distinguishable attributes helps us to recognize danger, or situations where we need to take immediate action. When in danger, our actions usually involve muscle movement; therefore, to react quickly it makes sense to have analyzed sounds stored together with fight-or-flight schemas in the muscle memory.

Of course, we store non-reproducible sounds somewhere as snippets. However, sounds stored in muscle memory will get priority, as they reside in the overlapping area of sounds deemed important and are therefore evaluated quickly. So, our muscle memory, the brain’s primary control center for survival, allows us to think about (or reflect on) everything we put in it, with accompanying stored attributes, and to combine these in different variations. This cannot be done with raw, unstructured sounds (for example, we would not be able to understand people whom we have never met before).

Use muscle memory to store pitch

Since the voice does the speaking, and is therefore strongly related to listening, muscle memory is the most appropriate place to store pitch sounds, too. As the brain was built to control muscles, it stores various pitch patterns at a very high precision.

Humans have little reason to think consciously about sound memory except when they have to identify or classify sounds. Keeping a sound in our memory for a few seconds allows us to compare an actual sound to a stored sound. If you train with the Singing Funnel method, you soon notice that when you start to lose concentration, you rely on muscle memory to continue scoring. That is, you stop using your “inner ear” to imagine the sound and concentrate only on producing the sound, trying to remember the position of the muscles that produced the correct answer last time. Of course, you will score if you produce the sound correctly, even without imagining the sound with your “inner ear”. This process is fine; it helps to calibrate your “abstract note to pitch understanding” with the muscles and reinforces the relationship between pitch and muscle memory.

In the same way, you can observe how professional musicians might take over a pitched sound from their instrument by humming it, to keep the sound in focus longer. (You don’t have to use your “inner ear” if you hum the sound: the muscles are in place.)

Or, in other words, it takes more brain power to imagine a sound (and hear it with your “inner ear”) than to recall the states of muscle positions. Of course they are related, since if you want to hear a sound with your “inner ear” the muscle memory gets activated. But you can also start building a link from an abstract note name to muscle positions without building up the sound in your “inner ear”. Of course the goal is to strengthen your “inner ear”; however, on the way to that goal you will also learn to separate the different actions and attributes.

Have you ever observed a person mimicking another? She or he uses not only the voice but also the muscles all over the face and body. Thousands of years after the Stone Age, muscles still play an important role, and not merely for walking.

Dyslexic people can learn to differentiate right and left by remembering in which hand they hold a spoon, training in muscle memory until that association becomes a reflex.

These observations all tend to support the idea that we have muscle memory for sounds. Even if we do not, you can substitute for muscle memory whatever sort of memory you know exists, or that researchers eventually find; the approach will still have at least a positive effect on the recall of sounds and pitches.

Why use solfège syllables?

Fixed “Do” solfège syllables support the process of assigning pitch to singable syllables. Over time one gets better at producing correct pitches, until it is as easy as saying “Ah” or “Oh.” The Singing Funnel method lets you do exercises down to a precision of seven cents, so you can be very confident of hitting a particular pitch solely with the use of muscle memory. If you relax, the mind can more easily remain open for new information, and will allow the true sounds of solfège syllables to enter short-term memory without processing. Working with the Singing Funnel method gives you the necessary feedback, showing when you are on the right track. Therefore you can be relaxed, knowing that you sang a pitch correctly. Later, when you recall the sound, you can rely on your muscle memory. This confident and relaxed state helps the sound move from short-term to long-term memory. At the same time it will be processed and associated with the solfège syllable, and thus with the corresponding pitch. (Of course, this works only for Fixed Do.)
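For readers unfamiliar with the unit: a cent is one hundredth of an equal-tempered semitone, so an octave spans 1200 cents and the distance between two frequencies is 1200 times the base-2 logarithm of their ratio. The sketch below shows what "a precision of seven cents" means numerically. The helper names and the 440 Hz reference pitch are assumptions for illustration and say nothing about how the Singing Funnel method is implemented.

```python
import math

def cents_between(reference_hz, other_hz):
    """Signed distance from reference to other, in cents (1200 per octave)."""
    return 1200.0 * math.log2(other_hz / reference_hz)

def shift_by_cents(freq_hz, cents):
    """Frequency that lies the given number of cents above (or below) freq_hz."""
    return freq_hz * 2 ** (cents / 1200)

def within_tolerance(target_hz, sung_hz, tolerance_cents=7.0):
    """True if the sung pitch is no more than tolerance_cents away from the target."""
    return abs(cents_between(target_hz, sung_hz)) <= tolerance_cents

target = 440.0  # assumed reference pitch (A4)
upper_edge = shift_by_cents(target, 7)
lower_edge = shift_by_cents(target, -7)
print(f"7 cents around {target} Hz: {lower_edge:.2f} Hz to {upper_edge:.2f} Hz")
print(within_tolerance(target, shift_by_cents(target, 5)))    # inside the window
print(within_tolerance(target, shift_by_cents(target, 12)))   # outside the window
```

At this pitch the seven-cent window is only a few hertz wide, which gives a feel for how fine the required muscle control is.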

Fine mechanics are involved in producing a sound using the muscles in your vocal cords to control the voice. How the solfège system helps to calibrate this muscle memory can be explained with an analogous experiment, which reveals the events in this process. Follow the instructions below to learn more:

While standing, stretch out your arms horizontally. (To know whether they are exactly horizontal, you would have to measure with a level of some kind.)

Now do the same motion with closed eyes.

You will not be able to discern whether your arms are horizontal or not, but with an external guide (a person), you could learn to position them horizontally with closed eyes, or at any desired angle. The external guide would tell you to raise or lower them. You repeat this experiment until you have reached very good precision with closed eyes. Maybe not perfect to the degree, but you would achieve pretty close results. Of course, your brain uses every bit of information it can, for instance the position of your feet. Thus the environment influences your awareness of how your arms are positioned in relation to the ground. Standing on a steep hill will likely skew your sense of horizontal. But if you train in several different environments, you will master the additional conditions.

So, to apply this analogously to controlling the vocal cords, you can train their muscles to produce a desired pitch. With such training, you can fine-tune your vocal cord muscles precisely and produce a correct pitch in different situations. If you associate the solfège syllables with their corresponding pitches, the muscle setup for each syllable will itself guide you to produce almost the correct pitch without fine-tuning. The more you train in different situations (e.g. in the morning or evening, standing, sitting, etc.), the better you will master the relationship between the vocal cord muscles and the solfège pitch.
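The guided-correction idea from the arm experiment can be written as a small feedback loop. The following Python sketch is a toy simulation under stated assumptions: the "singer" starts off pitch and moves a fixed number of cents in whatever direction the guide says, and the guide accepts anything within seven cents. The names `guide` and `practice`, the step size, and the starting pitch are invented for this illustration and do not describe the actual software.

```python
import math

def cents_off(target_hz, sung_hz):
    """How far the sung pitch is from the target, in cents (positive = too high)."""
    return 1200.0 * math.log2(sung_hz / target_hz)

def guide(target_hz, sung_hz, tolerance_cents=7.0):
    """External feedback: 'ok', or the direction in which to correct."""
    off = cents_off(target_hz, sung_hz)
    if abs(off) <= tolerance_cents:
        return "ok"
    return "lower" if off > 0 else "higher"

def practice(target_hz, start_hz, step_cents=5.0, max_tries=100):
    """Follow the guide's hints until the pitch is accepted; return pitch and tries."""
    sung = start_hz
    for attempt in range(1, max_tries + 1):
        hint = guide(target_hz, sung)
        if hint == "ok":
            return sung, attempt
        direction = -1 if hint == "lower" else 1
        sung *= 2 ** (direction * step_cents / 1200)
    return sung, max_tries

final_hz, tries = practice(target_hz=440.0, start_hz=400.0)
print(f"accepted at {final_hz:.2f} Hz after {tries} tries")
```

In this toy model the step size is smaller than the accepted window, so the corrections cannot overshoot back and forth indefinitely. Real training differs in that the starting point itself improves over time, which is the calibration effect the text describes.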

Recognizing words

To recognize words, your brain has to compare the physical sound you hear with patterns stored in your memory. If the brain can match a sound, it will present you with its stored meaning, for example that you are in danger, that food is near, or that other humans are in the direction of the sound. And when the brain matches a sound to a word, we can write it down or repeat it. We can also repeat a sound without any meaning; this uses the actual sound pattern memory, the memory used to recognize particular noises. Still, to repeat a sound the vocal muscles do the work of the repetition, so the motions of the vocal cords must be stored somehow. The brain controls muscles and stores their actions in muscle memory.

Words can be repeated using the short-term, abstract sound memory (i.e. the memory that holds the actual sound patterns without their meaning) or with individual interpretations of their meaning. In most cases the brain prefers to store the abstract meaning of the word instead of using the sound pattern memory. In a computer, a stored word uses far less memory than the sound of that word would. When the same word is spoken by different people, each produces a different sound pattern. The same is true for different musical instruments, which produce different sound patterns even if they play the same pitch. Loud or soft dynamics change neither the meaning of a word nor the pitch. Thus the abstraction of a pitch or word makes sense to the brain, and only the pattern of the muscle memory for the interpreted word gets stored.

However, the actual sound memory is separate; there, sound patterns move from short-term to long-term memory. This is especially true for sounds heard repeatedly, and for which additional attributes are also stored. For example, you can hear the voice of your mother with your “inner ear,” but the majority of the sounds will not be stored as patterns, but as attributes. Dialect (region), accent (foreign language speakers), casual vs. formal speech, omissions (swallowing endings), or attributes like croaky, harsh, hoarse, rough, cracking, sharp, etc., will be stored and assigned to a speaker. Storing the sound pattern itself seldom makes sense, since it is more economical for the brain to store only the meaning. When we hear our mother with our “inner ear” we supplement the meanings with the attributes associated with her and mix them with stored raw sounds from the long-term memory.

Of course, our parents play a major role in this process. In our early childhood, much data will be stored as raw data. Later this raw data will be used as building blocks to understand and produce sounds. Throughout life, meanings will be associated with sounds, and we also develop the ability to differentiate other attributes. From our parents we collect and store many attributes, and at first we precisely hear our mother with our “inner ear.” However, not all data from our mother will be stored as raw data, especially because, as we get older, the brain accumulates enough attributes to start managing and thinking using meanings instead. Thus as we grow, the brain begins to store meanings and only links them to our mother’s attributes. We might even hear our mother’s voice with our “inner ear” speaking sentences she never actually said, for example in our dreams. Thus we construct our sound imaginations, which can be so precise that we cannot distinguish them from real sounds.

Recognizing Pitch

From the above reasoning, we can conclude that pitches, too, get stored in muscle memory (at least as a basic attribute). People who have absolute pitch but do not actively sing have, I assume, built a relationship during childhood from the brain’s memory to at least one pitch. This may have happened during the babbling phase, when an actively uttered pitched sound pattern got stored along with a pitch attribute. Other pitches may be determined later by relative pitch, without the necessity of actively producing the sound. Recognition of those other pitches will take place as fast as recognizing vowels, without our being conscious of it as a separate step. The recognition process is the same as for words: if a match can be found, we can name the corresponding note.

The recognition process for note names can be as fast as the recognition process for words. Since a meaningful sound goes along with the stored muscle pattern, your “inner ear” will make use of this storage. Just as you can mentally hear words without speaking them, you can hear pitches in your “inner ear” without singing them. The production of a correct pitch through singing can be as easy as saying “Ah.” Now, that is slightly exaggerated; if you do not sing every day, it is of course much more difficult to sing than to speak. To sing accurately, you must bring your vocal cords into a specific position and tension. To sing a pitch in tune you must also control your breathing by controlling the airflow through your vocal cords. Our product “Listening Singing Teacher” gives you real-time feedback on pitch accuracy and helps you to control your vocal cords. Therefore, by training your muscle memory with the Singing Funnel method, you can make up for missing musical education by remembering sounds with muscle memory, and improve your listening skills toward absolute pitch.

Since the only way we can produce a sound is through our vocal cords, it follows that we hear pitches with our “inner ear” from their muscle memory. The process has to include the activation of our muscles, so it also makes sense that pitch gets stored in the muscle memory as well. Since the brain likes to store information as attributes, the timbre of an instrument, for example, will be stored in a separate attribute. Using our “inner ear,” the brain tries to match our associated solfège syllables (or the sound of another instrument for that purpose, if we have trained during childhood) to the real sound. If a match is found, the recognition process is complete.

Since singing is an active experience, we can improve it by training. If we sing correctly, and tap into our muscle memory, we will recognize pitches correctly. Since the brain likes to store information in meaningful categories, it is capable of isolating the pitch attribute of any instrument by comparing it to the pitch attribute of the voice, or the pitched solfège syllables. Pitches outside the singing range need additional brainpower to be recognized. To recognize pitch without singing is very difficult to accomplish after the brain has switched from raw data collection to the more powerful meaning and attribute system; the latter system tends to get in the way. Of course, your brain can still collect raw data, but it will do so to a much lesser extent. You must use special techniques to overcome this situation (e.g. hypnosis, or other psychological relaxation methods that reopen the mind for raw data collection).

But the brain is always open for raw data, which can be shown with learning foreign languages. You can repeat sounds of a new language with little or no difficulty; however, the transfer to the long-term memory takes longer. Since the brain cannot assign known attributes to a new sound, it needs repetition and corresponding stimuli (visual, textual, and other sensory stimuli) in order to assign meaningful attributes to the sound. Otherwise the brain will refuse to store the “meaningless” sound in the long-term memory. Again, for learning a foreign language, the active speaking of the words helps the learner tremendously.

The Singing Funnel method used with the voice is much easier and more effective than, say, training with a piano, as the learner’s progress is shown immediately and tracked in a way that stimulates further practice. This learning process can be accelerated, like any other learning process, by multiple stimuli, especially through activities. Activities highly related to the subject will bring the fastest results and, in the case of pitch recognition, singing is the best.

Recognizing pitches is similar to recognizing language. We learn our mother tongue automatically; in the same way, if we are exposed to pitch exercises or have a special relationship to a musical instrument during childhood, we are able to recognize pitches later. But if you missed that opportunity, you would learn pitch recognition much like learning a foreign language. What actively speaking is to language, singing is to pitch recognition. As with learning languages, it gets more difficult to learn absolute pitch as we get older, but it is never too late.

Of course, past a certain age, the production of a correct pitch through singing is not as easy as saying “Ah,” and the recognition process takes longer. But if you want a challenge for the second half of your life, learning to recognize pitches is good exercise for the brain and may help in warding off memory problems, dementia, and even Alzheimer’s disease. The Singing Funnel method can help you by structuring your learning process with step-by-step feedback and tracking your progress.

Recognizing pitch is one of the first steps in one’s journey to becoming a musician. Almost all of us are able to recognize pitch even though we may not know the respective technical jargon. The variation in musical notes is nothing but variation in pitch. If you are able to follow a tune and reproduce it, then it simply means that your brain can notice the variation in pitch. This is a very basic understanding of pitch and of how our ears are already attuned to it.

However, for someone who aspires to be a musician, this basic understanding will not suffice. You should know how to recognize pitch and how to achieve absolute pitch. For some this comes naturally, but for others ear training is required. Nonetheless, it is possible for everyone to recognize pitch with appropriate training. If you are keen on recognizing pitch and want to achieve absolute pitch, we have the right solution for your needs. You will be able to recognize pitch easily, even minute variations in pitch, by training yourself with an ear training or absolute pitch training software program. This is very user-friendly software that can be easily used by anyone. As the software makes use of advanced ear training techniques, you will not only be able to train your ears to recognize pitch easily, but you will also be able to close the gap considerably between the absolute pitch and your perceived pitch. Here is a very cost-effective solution that you could consider for all your ear training needs.

Even if you think that your ability to recognize pitch is very poor, you do not have to worry: you can improve your listening skills through the use of our ear training software. You will be able to improve progressively with the ear training software that we have specially designed for you. Download your free trial of our software and start recognizing pitch accurately. You will be surprised by your own improvement. The best part is that you will own your copy of the ear training software for life at a very nominal price. You will be able to train your ears at your own pace. You are just a few clicks away from recognizing pitch.



For another ear training method, see our product Listening Music Teacher:

For ear training to the cent without singing, we have made exercises in our product Listening Music Teacher. For example, the exercise “Ear training 12 cents” will present you with a sound that is perfectly tuned and a second sound that is 12 cents higher, 12 cents lower, or the same. Your task is to find out whether the second sound is higher, lower, or the same. For small deviations this task gets very challenging, especially if there is a longer pause between the sounds. With Listening Music Teacher you can also learn to hear triads and seventh chords, something not taught by Listening Ear Trainer. Visit

Want to learn to sing with feedback on pitch and rhythm? Visit


Macintosh and OS X are trademarks of Apple Computer Inc. IBM PC is a trademark of International Business Machines Inc. Windows XP/Vista/7 is a trademark of Microsoft Inc. Listening Singing Teacher, Listening Music Teacher, The Listening Ear Trainer, The Red Pitch Dot, The Colored Pitch Line, The Counting Hints Line, The Half Step Change Hints Line, The Notation Hints Line, The Half-Step Brackets, The Precision Listening Method, The Singing Funnel Method, The Octave Anchor Pitches Method, The Interval Overtone Method, The Pitch Keeper Method, Absolute Pitch Point, Same Pitch Please, Pitch Ability Method, Pitch Grid Test and PitchBlitz are trademarks of AlgorithmsAndDataStructures, F. Rudin. All other company and product names are trademarks or registered trademarks of their respective owners.