

VOGELSPRACHEN: Birdsong, Language and Music


Project funded by/Projekt gefördert von:


For almost four years I have been working on the connection between music and language, searching for characteristics that belong to the common origin of these two types of communication. The interaction of animals that have developed a natural gift for communication allows direct observation of both systems and shows how similar they are at their elementary level. In 2011, for the first time in the history of behavioral research, ethologist Toshitaka Suzuki discovered that there are also non-human animals that possess a rudimentary syntax. It was already known that songbirds can send messages through sequences of sounds, but the components of these sequences usually have no meaning as standalone elements. Dr. Suzuki showed that an Asian bird species (Parus minor, the Japanese tit) possesses a sort of vocabulary and can combine words in its calls to convey more complex information. According to Dr. Suzuki and his associates, Japanese tits can send (and understand) complex messages by combining existing tone sequences. While each individual tone sequence (or, in this case, word) has a specific meaning, the combination of several words carries a new, more complex message. Moreover, such messages can evoke quite precise impressions: for example, in the case of snake alarm calls, the mental image of a snake arises in the minds of all listening Japanese tits. The Japanese tit is the first animal species in which such a strong affinity with our communication has been demonstrated.

In 2019, thanks to a grant from the Goethe Institute, I was able to meet Dr. Suzuki and observe his fieldwork in person. During my research in the field, I observed Japanese tits with Dr. Suzuki and was able to conduct small experiments with acoustic input. In the process, I learned of new aspects of this interaction that are still in the demonstration phase and have not yet been published.
I attended several conferences and delved into the subject, covering topics that range from theories of language (theory of formal languages, context-free grammar, etc.) and cognitive science (theory of mind) to the anatomy of the connection between the brain and the peripheral system. In doing so, I noticed that a great deal is being discovered in this field these days, but also that scientists have rarely considered musical aspects in their research or involved music experts. Conversely, few musicians have fully exploited the potential of this subject for their art. Birdsong has indeed fascinated many composers, including famous figures such as Olivier Messiaen, John Cage and Heitor Villa-Lobos, but it has remained in their music mainly as a quotation or a source of inspiration for their musical ideas (motivic, rhythmic, tone color, etc.).

My research focuses solely on the form and structures underlying such interactions; the acoustic-melodic component of birdsong does not play an important role. The idea behind the project is to develop new systems for composing new music in which the structural peculiarities of birdsong are the fundamental element of all creative processes.
By creative processes, I refer not only to how the sound material (pitch, rhythm, etc.) is organized, but also to how the music is fundamentally shaped and structured.


1.1 Music and language have the same roots

First and foremost, it is necessary to explain how I connect my musical project to Dr. Toshitaka Suzuki’s extensive research on Parus minor and Periparus ater, and why I consider birdsong a source of inspiration for music composition and for understanding music on a different level. This project stems from the assumption that there is a strong link between communication (in our case vocal communication), language and music. Both language (understood as the faculty or ability of speech) and music are universal and characteristic of humans.

I assume that music is a byproduct of our communication skills and that it developed alongside language. The two paths began to diverge when our utterances became more complex (both semantically and phonologically) and, as a consequence, our vocal communication developed the recursive traits that make it unique among all animal species. At some point, music detached from verbal communication and took on the following features: a) it became a participatory activity; b) it came to involve other parts of the body, not only the vocal apparatus (singing, playing percussive instruments, dancing); c) it developed around parameters different from those of language (a recurring grammar with recurring sounds based on formants and plosive/fricative sounds produced by the muscles of our oral cavity), namely pitch, rhythm and timbre; d) it became a vehicle for expressing more complex, rather abstract feelings that words could not yet convey, or for expressing them more intensely.

In fact, to this day music has remained closely tied to its performance. To support this thesis, it should be added that, at different moments, graphic signs were invented to ‘store’ and preserve both language and music over time. However, texts do not need to be converted back into sounds to convey their message and, in some cases, are even more powerful or precise as graphic elements, whereas music notation is not enough to transmit musical ideas (not even for someone who can read a score inwardly) until the music is played or sung.

All of these features are interconnected and can be seen as consequences of one another. Charles Darwin’s hypothesis that music developed as an agent of sexual selection, as in other species, could be one of the reasons that triggered this divergence.

1.2 Why this research project?

Although there are no scientific studies on this, certain features of our languages share common traits with certain musical gestures. Languages became highly complex and refined communication systems over thousands of years of evolution, which is why it is difficult to find those common roots even in the most accurate and recent linguistic analyses.

Communicative traits can partly be found in music too, but they are mostly lost, owing to its extreme development through ingenious compositional systems, the invention and refinement of instruments, and the constant, still ongoing expansion of performance techniques. If there is one way to find those common traits, it is to look for them in the most basic layers of our expression, the ones that involve less rational, more basic needs and feelings.

1.3 Leoš Janáček

Composers have sporadically tried to study this connection. To mention one renowned example, the Czech composer Leoš Janáček carried out extensive research on it over decades.

Janáček says: “The tones, the tonal cadence of human speech, of all living creations, contain the most profound truths for me”, i.e. the fundamental feeling (additional identity information about the signaller) that the speaker adds to the meaning of the words.

He collected and transcribed excerpts of human speech in various situations, including in musical notation, grouped them into speech intonation patterns, and analysed the material in relation to the psychological condition of the speaker.

These melodic motifs, or speech melodies as he called them, then became part of his compositional technique and appeared in both the vocal and orchestral parts of his pieces to characterise a specific person or situation.

1.4 Avian communication: why?

In this context, non-human communication can be a rich source of inspiration to understand these traits. A still debated question among scientists is whether the abilities underlying language and music find their origins in a modification or extension of general cognitive abilities for processing auditory input also present in other species. If so, comparative studies of nonhuman animals should reveal similarities in processing abilities.

There are striking similarities between the development of birdsong and that of human speech. They have been found in:

  • Vocal learning processes and the neurobiological aspects behind them: distinguishing and producing the sounds of a new language is difficult without early exposure to that language.

  • Some neurobiological aspects: 1) discrete neural circuitry and auditory feedback are essential for normal learning (for instance, auditory feedback is necessary for the maintenance of stereotyped song in adult zebra finches); 2) the neuroplasticity behind those processes.

  • The processes that lead from the acquisition of auditory memory (i.e. an acoustic input) to goal-directed sensorimotor learning (activating specific muscles in a complex combination of movements to reproduce the acoustic input previously heard).

  • Birds (and also some mammals such as whales, seals, and chimpanzees) can refer to external objects. Animal signals were long considered mere expressions of the emotional state of signallers, implying a simple dyadic relationship between signallers and receivers, but recent research has shown otherwise.

  • The supposition that, as groups of animals increase in social complexity, there is a need for increased vocal complexity.

    Songbirds also have diverse life-history strategies, having filled a wide array of ecological and acoustic niches during evolution, which supports the idea that social complexity in avian groups is high enough for us to expect a corresponding complexity in their communication.

Thanks to the extraordinary research carried out in recent years (including the outstanding findings of Dr. Toshitaka Suzuki), we can now compare these ‘basic’ layers of our interactions more closely with those of non-human species (apes, birds, etc.) that have developed complex communication systems.


2.1 Similarities and differences:

In this section I explain the main aims of my research project. The research part consists in identifying those features of avian communication (and the possible evolutionary steps that led to their development) that can be compared to human communication. The following section contains some of the most relevant observations about how bird songs and call repertoires evolved, from a structural and acoustical point of view.

2.2 Semantics:

The first area of interest is the semantics of bird calls. I am interested in considering the acoustic features of bird calls in relation to their alleged semantics and comparing them to those found in human utterances that belong to the same or a comparable semantic area. The features I take into account are volume, pitch, timbre, presence of noise, and the gesture of a given sentence or syllable, i.e. the increase or decrease of certain parameters within it.

Firstly, I considered a number of semantic areas that can be found in both human and non-human species:

  • danger/scare
  • empathy/socialization
  • identification/self-awareness
  • anger/aggression

Commonly identified types of bird calls are aggressive calls, alarm calls, distress calls, contact (affiliative) calls, nest calls and begging calls.

Secondly, I intend to look for similarities and differences between semantically related utterances. Some key factors might have shaped bird utterances differently from ours:

  • Environmental sounds: Environmental sounds affect vocalisations. Some birds promptly react to changes in environmental noise, although only by singing more loudly and not by changing the minimum frequency of their vocalisations.
    Example: sometimes a noisy syllable is held for a shorter time than pitched sounds, or repeated faster, so that it is not confused with other surrounding sounds.

  • Habitat constraints: In some habitats, communicating individuals can rely only on acoustic input.
    Example: passerine birds living in thick woods cannot see their conspecifics (neither their position nor the opening of their beaks).
    In human communication, visual feedback (such as lip movements) plays an important role in understanding a message.

  • Efficiency of transmission: The link between sound and meaning may need to be either very clear or deliberately disguised for ecological reasons, such as remaining inconspicuous to avoid predation.
    Examples: a) in some cases birds try not to be localised by predators, or b) they try to get around certain physical properties of predators, as in the case of snakes, which lack external ears but sense certain frequencies through their quadrate and columella bones. We do not actually know whether snakes perceive birds’ mobbing calls. Some alarm calls are also relatively loud but very short, making them harder to localise, which is critical in the presence of potential predators.

2.3 Alarm calls

I find alarm calls of particular interest because, in humans, they are detached from any syntactical logic (being immediate reactions to unexpected danger) and can be compared to 0-merge systems* in linguistic terms. I identify utterances of the danger/scare semantic area with dramaturgically intense moments within pieces of music. These can be climaxes, dramatic virtuosic passages, etc.

The factors mentioned above are among the reasons why birds’ alarm calls differ from human alarm utterances. Many of those calls differ in quality and gesture from the corresponding human utterance. This is the case for short alarm calls and for alarm calls that focus on particular frequencies (to avoid being heard by predators).

To mention some other examples related to Dr. Suzuki’s research, Japanese tits (Parus minor, or shijukara in Japanese) use different calls for different predators that pose different threats. Similarly, Siberian jays (Perisoreus infaustus) use specific anti-predator calls for hawks depending on the predator’s behavior. A number of factors played a key role in the evolution of these utterances, resulting in noticeable differences from our utterances in similar semantic areas.

Other similar examples: brief, soft, repetitive notes of low frequency are attraction calls. Harsh sounds emphasising low frequencies are threat sounds. It is interesting to see that these ‘rules’ hold for many bird species.

2.4 Compositionality of birdcalls as a set of rules per se:
There has been extensive research in recent years on the structure of bird calls, with particularly astonishing discoveries such as the presence of syntax in the utterances of some bird species. The organization of a syllable, such as the order of its constituent notes, has also been shown to influence receiver responses. Recent studies on particular bird species have found rules that differ from our (human) approach to compositionality/sequentiality in language and music:

  • Particular importance seems to attach to the first and last syllables of zebra finch (Taeniopygia guttata) songs and calls, but never to those in between. In some cases, the initial and final syllables are the only features necessary for understanding the meaning, whereas what lies in the middle of the sequence is not. In the songs of zebra finches and other species, the first and last syllables are sometimes unmodified ‘flight’ or ‘take-off’ calls. This highlights a crucial difference from how composers have structured their music in past centuries: a melody or a 12-tone row is a strictly order-based sequence.

  • As a second example, in black-capped chickadees (Poecile; their calls are usually composed of four different syllables/notes), as the number of A, B, and C notes in a call increases, the number of D notes that might occur decreases. There therefore seems to be a constraint on the overall number of notes that can occur in an average call.
    However, calls with extremely large numbers of D notes are more common than expected by chance, suggesting that the constraints on introductory notes are relaxed when calls contain many D notes.

  • In other species such as black-capped chickadees (Poecile atricapillus), the presence of key syllables, but not their order, is sufficient to drive responses.

  • As another example, receivers respond differently to song phrases whose individual syllables or notes differ in length and repetition rate. These are all fascinating and surprisingly mathematical rules that can be applied to music at different levels.
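
To make the contrast with strictly order-based sequences concrete, the following is a minimal Python sketch of three ways a receiver could compare two sequences. It is only an illustration under stated assumptions, not a model taken from the cited studies: the function names and the syllable labels A to D are hypothetical.

```python
def strictly_equal(seq_a, seq_b):
    """Order-based comparison, as with a melody or a 12-tone row."""
    return list(seq_a) == list(seq_b)


def same_endpoints(seq_a, seq_b):
    """Compare only the first and last elements; the middle is ignored."""
    if not seq_a or not seq_b:
        return False
    return seq_a[0] == seq_b[0] and seq_a[-1] == seq_b[-1]


def has_key_syllables(seq, keys):
    """True if every key syllable occurs somewhere in seq, in any order."""
    return set(keys).issubset(seq)


# Hypothetical syllable labels, chosen only for illustration.
reference = ["A", "B", "C", "D"]
variant = ["A", "C", "B", "D"]  # same endpoints, middle reordered

print(strictly_equal(reference, variant))      # order-based listener
print(same_endpoints(reference, variant))      # endpoint-based listener
print(has_key_syllables(variant, {"B", "C"}))  # presence-based listener
```

In this sketch the reordered variant counts as a different sequence for the order-based listener, but as equivalent for the endpoint-based and presence-based listeners.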

2.5 Shared semantics

It has recently been discovered that, in some cases, birds of different species that inhabit the same environment can discern and understand messages in calls sent by the other species, as in the case of willow tits and Japanese tits (Poecile montanus and Parus minor).

These two species form mixed-species flocks. Both can produce particular calls (recruitment calls) that attract both conspecifics and heterospecifics. Experiments led by Dr. Suzuki and his team indicated that Japanese tits respond not because the recruitment calls of willow tits are acoustically similar to those of conspecifics, but because they perceive them as two distinct vocalizations with a shared meaning, and that they respond to those vocalizations only if the syllables are in the right order.

This means that they recognize the call as a heterospecific vocalization. Further experiments on Parus minor indicated that if the call is acoustically altered (with audio processing software), the other species apparently does not respond. This point is important, as it suggests that there are specific acoustic features of a syllable or sequence whose meaning different bird species can agree on. This resembles how humans across space and cultures tend to converge on the interpretation of certain utterances (and also of non-acoustic signals such as particular hand gestures or facial expressions).

2.6 A bird’s perspective and the acoustics of bird calls:

How birds perceive songs: there is ample anecdotal evidence that birds are consistently more sensitive than humans to at least some aspects of their song. Zebra finches (Taeniopygia guttata), in particular, seem to be extremely sensitive to temporal fine structure (what we would call the timbre or color of a sound) in both synthetic stimuli and natural vocalizations. These finches have very harmonic vocalizations that are rich in fine structure (as opposed to whistles and pure tones). Interestingly, when listening casually to these same complex stimuli, humans generally hear changes in syllable order but are quite insensitive to syllable reversals, just the opposite of the songbirds.

This finding, together with recent research highlighting the complexity of zebra finch vocalizations across contexts, raises interesting questions about what information zebra finches may be communicating in temporal fine structure. Together these findings show there is an acoustic richness in bird vocalizations that is available to birds but likely out of reach for human listeners.

Further work could use the most recent spectral analyses of birds’ vocal repertoires (for example, those on zebra finches from 2018) as reference parameters for creating music. Variations in the utterance, and therefore probably in the meaning conveyed by certain birds, are to be found in the spectral envelope, which can be explained in terms of formants produced by both the syrinx and the vocal tract of the bird.

2.7 Bird *songs*

Another very interesting aspect is the following: whereas we now have more detailed knowledge about bird calls (alarm, begging, contact, etc.), we know much less about the songs that birds sing to mark territory or during the mating season, as the components of those songs have less direct referents, such as a predator, a threat, food or other similarly concrete, tangible factors that catch their attention.

Songs have rarely been studied in the context of compositionality. This is largely because individual notes and/or syllables have generally been considered meaningless on their own. Detailed observations of certain species (Parus minor, and even zebra finches) suggest that some song components do indeed have independent meanings.

Mating calls and songs are driven, and therefore shaped, by less clearly identifiable factors and convey less evident, less tangible messages, which brings us back to the earlier point about music and its detachment from verbal communication.

Male fairy-wrens (Malurus cyaneus) often produce songs immediately after hearing the calls of grey butcherbirds (Cracticus torquatus), a species that preys upon small birds, such that the combined vocalizations resemble a duet. One possibility is that including an alarm call influences receiver responses by increasing their attention before they hear the song phrase. These elements can easily be compared or adapted to music perception (at a subconscious level) as well.

2.8 Dance and movement

In thinking about how birdsong might sound to a bird’s ear, a better analogy than human language or music might be dance. When we learn a dance routine, getting the sequence right is necessary for getting the moves right. Missing a transition can cause the structure of an individual move to fall apart. But the watcher does not extract much information from the order of the moves. The audience focuses on other elements, such as the acrobatics, rhythm and variety of the movements, rather than on the sequence in which they occur.
This may be the same for birdsong. From the perspective of a bird producing the song, getting the sequence right can be essential for getting the “moves” right. But for the bird listening, what is most important may be the individual moves themselves and their peculiar constitutive qualities.


3.1 Applying the observations:

In this third and last part, I give some examples of how I am currently converting, translating and integrating these observations into my music and trying to develop systems for composing new pieces. I will show some examples of birdsong (spectrograms) and then short excerpts from the score of an ensemble piece I am currently writing.

3.2 Examples applied to my music:

  • As previously mentioned, there are meaning-related differences between our vocalisations and those of birds:

1) Alarm calls are relatively loud but very short, making them harder to localize (for predators), in contrast to our calls, which are loud but long in order to be perceived (since we have a different, less acute auditory system). Alarm calls are easily comparable to particularly intense moments in music (for instance climaxes), which might be their direct transposition into musical gestures.

2) brief, soft repetitive notes of low frequency are attraction calls.

  • The sequentiality rules: in the following example(s), only the starting and ending notes are fixed, whereas the order of the sequence is different in every variation:

  • The repetition of certain elements of a phrase to change the information conveyed, in contrast to our musical gestures and forms, which tend to have a peak-like structure to convey meaning (often using loudness to express intensity). In the following examples, three different musical elements (A, B, and C) are presented twice in the same sequence but with different repetitions. This process can be applied to a number of different musical parameters.

In some bird species, the number of notes (or acoustic elements) within the same sequence, as well as the repetition of introductory and final notes, determines which meaning is conveyed.

  • The focus on temporal fine structure (fine details in the quality of the sounds) instead of strict sequences/sequence rules. This point connects to the following section (5).

  • I also used Orch-idea (software developed by IRCAM as a computer-aided orchestration tool, controlled by MAX/MSP), which can analyse a sound source, process it and offer the composer different orchestration solutions. I fed it high-quality audio recordings of some of the aforementioned bird calls (including those of the zebra finch) to see which parameters the software would translate into the output file, and eventually incorporated some sections/ideas into my own compositions. In the following videos you will be able to listen to excerpts from this process (the input source, the outcome and the two audio tracks together):
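
Separately from the score and video examples, the sequentiality and repetition rules listed in this section can also be sketched in code. The following Python sketch is only an illustration under stated assumptions and not the system used for the ensemble piece: the function names, the pitch names and the repetition counts are all hypothetical.

```python
from itertools import permutations


def fixed_endpoint_variations(notes):
    """Return every ordering that keeps the first and last note in place."""
    if len(notes) < 3:
        return [list(notes)]
    first, middle, last = notes[0], notes[1:-1], notes[-1]
    variations = []
    for perm in permutations(middle):
        candidate = [first, *perm, last]
        if candidate not in variations:  # skip duplicates from repeated notes
            variations.append(candidate)
    return variations


def repeat_elements(elements, counts):
    """Keep the order of the elements but repeat each one counts[i] times."""
    if len(elements) != len(counts):
        raise ValueError("elements and counts must have the same length")
    phrase = []
    for element, count in zip(elements, counts):
        phrase.extend([element] * count)
    return phrase


# Hypothetical pitch material: only the outer notes stay fixed.
for variation in fixed_endpoint_variations(["C4", "E4", "G4", "A4", "B4"]):
    print(variation)

# The same sequence A, B, C presented twice with different repetitions.
print(repeat_elements(["A", "B", "C"], [1, 3, 2]))
print(repeat_elements(["A", "B", "C"], [2, 1, 4]))
```

The elements passed to `repeat_elements` need not be pitches; they could stand for rhythmic cells, timbres or any other musical parameter.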


4.1 Birds

A list of the bird species I mentioned in this description and their scientific names:

  • Willow Tit (Poecile montanus) コガラ

  • White-crowned Sparrow (Zonotrichia leucophrys) ミヤマシトド

  • Zebra Finch (Taeniopygia guttata) キンカチョウ

  • New Zealand bellbird (Anthornis melanura) ニュージーランドミツスイ

  • Siberian jays (Perisoreus infaustus) アカオカケス

  • Chickadees (Poecile atricapillus) アメリカコガラ

  • Southern pied babbler (Turdoides bicolor) シロクロヤブチメドリ

  • Budgerigars (Melopsittacus undulatus) セキセイインコ

  • Carolina chickadees (Poecile carolinensis) カロライナコガラ

  • The varied tit (Sittiparus varius) ヤマガラ

  • Pied Flycatcher (Ficedula hypoleuca)

4.2 Bibliography:

Nordeen KW, Nordeen EJ. Auditory feedback is necessary for the maintenance of stereotyped song in adult zebra finches. Behav Neural Biol. 1992 Jan;57(1):58-66. doi: 10.1016/0163-1047(92)90757-u. PMID: 1567334.

Suzuki, TN. Animal linguistics: Exploring referentiality and compositionality in bird calls. Ecological Research. 2021;36:221–231.

Suzuki, Toshitaka & Griesser, Michael & Wheatcroft, David. (2019). Syntactic rules in avian vocal sequences as a window into the evolution of compositionality. Animal Behaviour. 151. 10.1016/j.anbehav.2019.01.009

Elie JE, Theunissen FE. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals. Anim Cogn. 2016 Mar;19(2):285-315. doi: 10.1007/s10071-015-0933-6. Epub 2015 Nov 18. PMID: 26581377; PMCID: PMC5973879.

Sound sequences in birdsong: how much do birds really care? Adam R. Fishbein, William J. Idsardi, Gregory F. Ball and Robert J. Dooling, Philosophical Transactions of the Royal Society B: Biological Sciences

Sabrina Engesser and Amanda R. Ridley and Simon W. Townsend, Meaningful call combinations and compositional processing in the southern pied babbler, Proceedings of the National Academy of Sciences, 2020

Suzuki TN, Wheatcroft D, Griesser M. Experimental evidence for compositional syntax in bird calls. Nat Commun. 2016 Mar 8;7:10986. doi: 10.1038/ncomms10986. PMID: 26954097; PMCID: PMC4786783

Suzuki, Toshitaka. (2014). Communication about predator type by a bird using discrete, graded and combinatorial variation in alarm calls. Animal Behaviour. 87. 10.1016/j.anbehav.2013.10.009.

Ondracek, Janie & Hahnloser, Richard. (2014). Advances in Understanding the Auditory Brain of Songbirds. 10.1007/2506_2013_31.

Morisaka, Tadamichi & Okanoya, Kazuo. (2009). Cognitive tactics of Bengalese finch ( Lonchura striata var. domestica ) for song discrimination in a go/no-go operant task. Journal of Ethology. 27. 11-18. 10.1007/s10164-007-0074-8.

McWhorter, John (Columbia University). Animal Communication Is Not the Same As Human Language. From the lecture series: The Story of Human Language, July 2, 2020.

Griesser, Michael. Referential Calls Signal Predator Behavior in a Group-Living Bird Species. Current Biology, Volume 18, Issue 1, 2008, Pages 69-73.

Hailman, J. P., & Ficken, M. S. (1986). Combinatorial animal communication with computable syntax: Chick-a-dee calling qualifies as "language" by structural linguistics. Animal Behaviour, 34(6), 1899–1901.

Fishbein, A.R., Prior, N.H., Brown, J.A. et al. Discrimination of natural acoustic variation in vocal signals. Sci Rep 11, 916 (2021).

Krams I, Krama T, Freeberg TM, Kullberg C, Lucas JR. Linking social complexity and vocal complexity: a parid perspective. Philos Trans R Soc Lond B Biol Sci. 2012 Jul 5;367(1597):1879-91. doi: 10.1098/rstb.2011.0222. PMID: 22641826; PMCID: PMC3367703.

Okanoya, K., & Dooling, R. J. (1990). Song-syllable perception in song sparrows (Melospiza melodia) and swamp sparrows (Melospiza georgiana): An approach from animal psychophysics. Bulletin of the Psychonomic Society, 28(3), 221–224.

Suzuki, Toshitaka N. Calling at a Food Source: Context-Dependent Variation in Note Composition of Combinatorial Calls in Willow Tits. Ornithological Science, vol. 11, no. 2, The Ornithological Society of Japan. 10.2326/osj.11.103.

Suzuki, Toshitaka N. & Zuberbühler, Klaus. Animal syntax. Current Biology 29, R663–R682, July 22, 2019.

Lucas, Jeffrey & Freeberg, Todd. (2007). “Information” and the chick-a-dee call: Communicating with a complex vocal system. 10.1093/acprof:oso/9780198569992.003.0015.