This paper explores how three cognitive and perceptual cues, namely vocal iconicity (resemblance-based mappings between form and meaning), segment position, and lexical stress, interact to affect word formation and language processing. The study combines an analysis of the word-internal positions in which iconic segments occur, based on data from 245 language families, with an experimental study in which participants representing more than 30 languages rated iconic and non-iconic pseudowords. The pseudowords were designed to systematically vary segment and stress placement across syllables. The results for study 1 indicate that segments used iconically appear approximately 0.26 segment positions closer to the beginning of words than non-iconic segments do. In study 2, iconic segments occurring in stressed syllables and non-iconic segments occurring in the second syllable were rated as significantly more fitting. These findings suggest that the interplay between vocal iconicity and prominence effects increases the predictive function of iconic segments by foregrounding sounds that intrinsically carry semantic information. Consequently, these results contribute to the understanding of the widespread occurrence of vocal iconicity in human languages.

When people listen to speakers of a completely unfamiliar language, they tend to be astonished by how the speakers are able to parse the rapid bombardment of speech sounds into meaningful components. Although some languages are spoken at a somewhat faster pace than others, the experience of listening to unfamiliar languages is shared by most people. This suggests that spoken languages are equally adept at conveying information while also having a great deal of flexibility in how they convey it (Shcherbakova et al., 2023). This variation in features and structure occurs on multiple levels and includes, for example, which speech sounds are used, which combinations of speech sounds serve as words and affixes to represent meaning and morphological information, and in which order speech sound combinations can be placed. Even for a familiar language, it is difficult to comprehend how language users perceive acoustic stimuli, divide them into meaningful chunks, analyze the collective meanings conveyed, and cognitively and motorically prepare their response, often before the speaker has completed their utterance. This study investigates the interaction between different cognitive and perceptual cues that language users employ to exchange information through spoken language, including vocal iconicity (resemblance-based mappings between aspects of form and aspects of meaning; Winter et al., 2023) and segment prominence (in the word-initial position and stressed syllables), and how this interaction might affect word formation, language change, and language processing.

An essential factor contributing to efficient and swift information transfer is that language users rely on various cognitive and perceptual shortcuts to anticipate meaning in advance. At the lexical level, context can help resolve ambiguity between homophonous words such as distinguishing between I need the dough and I knead the dough. This process is further influenced by phonotactic constraints—permissible phoneme sequences in a language—and the likelihood of words occurring within specific contexts. Therefore, I need a book is a more probable interpretation of [aɪ ni:d ə bʊk] than I knead a book. Context is also pivotal in distinguishing points of reference. For instance, when someone says I love you, both parties understand that I refers to the current speaker and you refers to the listener. However, if the former listener becomes the speaker and uses I in a sentence, the word now refers to another person. This principle applies to all deictic words such as here and there, this and that, and now and then.

The order in which information is presented also impacts context processing. For example, the sentence the old man the boat (man “a human male” vs man “to supply (something) with staff or crew”) begins in a way that usually leads the recipient to an incorrect interpretation, which forces new parsing attempts. This underscores how language users start processing information and making assumptions about its intended meaning well before all information has been transferred. However, this process also hinges on how information is conveyed. In sign languages, it is possible to use visual, tactile, and auditory modalities for producing signs, which allows different types of information to be processed simultaneously. Spoken languages, on the other hand, are mostly linear because only a single speech sound can be produced or received at a time, with the exceptions of co-speech gestures and prosody.

Spoken word recognition models generally agree that when the onset of a word is heard, a set of words in the mental lexicon with the same initial segments strive for activation (Norris et al., 2016). The activated words then compete as more acoustic information about the word becomes available until a final word is recognized, contingent on the lexical and auditory context (Luce and Pisoni, 1998). Hence, word onsets are pivotal because they are used as initial cues for narrowing the pool of possible word candidates based on the acoustic signal. For instance, English jealous [dʒɛləs] shares initial sound sequences with many other words like jet [dʒɛt] and generator [dʒɛnəɹeɪtə(ɹ)]. Consequently, jealous is more difficult to recognize than the similar-sounding word zealous [zɛləs], which shares initial sound sequences with fewer English words, leading to fewer possible competitors. However, occurring at the onset of words is not the only way to increase the prominence of a single segment or word section. Stressed syllables can result in more precise phonetic realization, rendering them more memorable and easily perceptible (Cutler and Norris, 1988), although stress patterns vary by language.

There are even more direct ways of linking sound to meaning, as certain speech sounds or acoustic and articulatory features involved in the production of speech sounds inherently evoke specific meanings. For instance, the English word cuckoo sounds similar to the distinctive call of the bird, at least to human ears. Similarly, words that contain rounded speech sounds, such as bamboo, are perceived as rounder than those without such sounds, such as stem. These (vocal) iconic associations, which involve resemblance-based mappings between aspects of form and aspects of meaning (Winter et al., 2023), exist across various levels of the linguistic system. Whereas the strength of these associations may vary among languages, a significant proportion of them are present in most languages.

Experiments have demonstrated that language users can elicit iconic associations across numerous semantic domains that belong to the fundamental subsection of the lexicon such as shape (Ćwiek et al., 2022), size (Vainio and Vainio, 2022), and color (Johansson et al., 2019). Similar associations have been observed throughout the broader lexicon within languages (Winter and Perlman, 2021) and on grander scales when comparing words with the same meaning across unrelated languages (Blasi et al., 2016; Erben Johansson et al., 2020; Joo, 2020). These typological patterns are further corroborated by the fact that sounds used iconically tend to remain stable over time and are reintroduced in words to express iconic meanings after being lost as a result of sound changes (Dellert et al., 2021; Flaksman, 2017; Johansson and Carling, 2015). Collectively, this has resulted in a substantial catalog of associations between speech sounds and meanings, which has demonstrated that iconicity plays a prominent role in human language, particularly in the evolution of words over time.

Furthermore, just as other cognitive and perceptual shortcuts operate in tandem to predict meaning, it is possible that iconicity also interacts with these factors. Few previous studies have explicitly examined the placement of iconic segments within iconic words, but it is possible that such an interaction could improve lexical access (Eddington and Nuckolls, 2019; Erben Johansson and Cronhamn, 2022; Kawahara et al., 2008; Shinohara and Uno, 2022; Sidhu et al., 2020). The associations between physical/visual size and vowel quality are stronger in the first vowel than in the second vowel of a word (Huang et al., 1969). Cross-linguistically, specific word positions can be reserved for certain types of sound-meaning associations (Wichmann et al., 2010), a phenomenon observed in Japanese ideophones (Hamano, 1998), a marked lexical class representing sensory imagery, such as kira kira, which approximately translates to “glittery” (Dingemanse, 2019).

By addressing the following questions, this paper explores whether iconic segments, when given heightened prominence, lead to more robust iconic associations:

  • How does the placement of iconic segments within words impact the strength of their iconic associations?

  • Does the presence of stress on iconic segments influence the strength of their iconic associations?

This is accomplished by combining an analysis of iconic segment placement in previously established cross-linguistic sound-meaning associations with an experimental study which analyzes collected judgments of iconicity strength for pseudowords, in which the placement of segments and lexical stress are systematically varied.

The objective of this study is to gain a general understanding of whether a prominence effect exists in cross-linguistically confirmed iconic concepts despite language-specific phonological differences.

Large cross-linguistic studies on vocal iconicity are limited, and information about stress placement in the analyzed lexical material is scarce. Therefore, only the position of iconic segments within words was investigated. The data provided by Erben Johansson (2020) were used as material for the analysis and words were compiled for 344 meanings or concepts in 245 languages and language families. The data were transcribed according to the International Phonetic Alphabet with minor deviations, followed by grouping speech sounds into 42 sound groups of varying granularity based on acoustic and articulatory features, e.g., rounded vowels, high front vowels, low back unrounded vowels, voiced consonants, labial consonants, and voiceless alveolar stops. A series of analyses was then conducted to determine if any sound group was statistically overrepresented in any of the 344 concepts.

It was found that 125 noteworthy sound-meaning associations could be established, but a few concepts included many fine-grained distinctions not relevant for the present study, i.e., gender-specific distinctions for words referring to a speaker's kinship relation based on the speaker's gender and grammatical clusivity in first person plural pronouns. Consequently, associations involving M_ms (“mother, male speaking”), MF_ms (“mother's father, male speaking”), F_ms (“father, male speaking”), and 1ple (“first person plural exclusive”) were chosen for the present study, whereas associations involving M_fs (“mother, female speaking”), MF_fs (“mother's father, female speaking”), F_fs (“father, female speaking”), and 1pli (“first person plural inclusive”) were excluded. The final selection for the present study, therefore, resulted in 106 sound-meaning associations, which involved 51 concepts and 16 sound groups.

Control concepts were selected by extracting data for the 16 sound groups across the 344 concepts, calculating the variance among the medians of posterior distributions, and selecting the 10 concepts with the least variance, i.e., the concepts that showed the smallest over- and underrepresentations across the 16 sound groups. Several of the concepts with the least variance were kinship terms and replaced by non-kinship term concepts to ensure that the control group featured a wide variety of lexemes. The final control group consisted of after, before, boy, grow, in front of, old man, old woman, sit, sweet, and three.
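The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function name `select_control_concepts`, the toy input values, and the use of population variance are all assumptions (with the same number of sound groups per concept, population and sample variance give the same ranking). The manual replacement of kinship terms is not modeled.

```python
from statistics import pvariance


def select_control_concepts(medians_by_concept, n_controls=10):
    """Return the n_controls concepts whose posterior medians vary least
    across sound groups (hypothetical helper, not from the source)."""
    variance_by_concept = {
        concept: pvariance(medians)
        for concept, medians in medians_by_concept.items()
    }
    # Sort concept names by ascending variance and keep the flattest ones.
    ranked = sorted(variance_by_concept, key=variance_by_concept.get)
    return ranked[:n_controls]


# Toy input: invented posterior medians for three sound groups per concept.
toy_medians = {
    "sit": [0.01, -0.02, 0.00],
    "three": [0.03, 0.02, -0.01],
    "blow": [0.90, -0.40, 0.10],
}
controls = select_control_concepts(toy_medians, n_controls=2)
```

In the study itself, the input would hold 16 medians for each of the 344 concepts, and `n_controls` would be 10.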

The data were analyzed by calculating the first occurrence of the associated sound group, i.e., the first position of the word that the sound group was found in, for all words in the dataset. For instances where a concept was represented by multiple words, such as English he, she, and it, which all refer to the concept 3sg (third person singular pronoun), an average based on all words representing the concept was calculated.
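The first-occurrence measure can be illustrated with a short sketch. It assumes 1-based segment positions and assumes that words lacking the sound group are skipped when averaging; the source states neither detail. The function names and the toy labial set are hypothetical.

```python
def first_occurrence(segments, sound_group):
    """1-based position of the first segment that belongs to sound_group,
    or None if the word contains no such segment."""
    for position, segment in enumerate(segments, start=1):
        if segment in sound_group:
            return position
    return None


def concept_first_occurrence(words, sound_group):
    """Average first occurrence over all words representing one concept.
    Assumption: words without the sound group are left out of the average."""
    positions = [first_occurrence(word, sound_group) for word in words]
    found = [p for p in positions if p is not None]
    return sum(found) / len(found) if found else None


# Toy example: an invented labial set and two invented words for one concept.
labials = {"p", "b", "m", "f", "v", "w"}
words = [["p", "u", "f"], ["s", "u", "p", "a"]]
average_position = concept_first_occurrence(words, labials)  # (1 + 3) / 2
```

Treating each word as a list of segments rather than a string keeps multi-character segments such as affricates as single positions.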

All statistical analyses were conducted using the R statistical software version 4.3.1 (R Core Team, 2023) with the tidyverse, caret, and lmerTest R packages (Kuhn, 2008; Kuznetsova et al., 2017; Wickham et al., 2019). A linear mixed model was fitted using restricted maximum likelihood (REML) to predict the first occurrence of the sound group (dependent variable) for each sound-meaning combination. As for predictors, the analysis considered whether the sound-meaning combination was iconic or a control combination as a fixed effect while controlling for language, meaning, sound group, and word length (random effects): first occurrence ∼ congruence + (1 | language) + (1 | meaning) + (1 | sound group) + (1 | word length). The results showed that the sound groups in iconic sound-meaning combinations occurred significantly closer to the beginning of words than the control combinations did [β = −0.2566, SE (standard error) = 0.0612, df (degrees of freedom) = 35.3288, t = −4.139, p < 0.001***]. The model's explanatory power was moderate (R² = 0.2).
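Written out, the model formula corresponds to a random-intercept model of roughly the following form. This is a sketch under stated assumptions: the grouping terms are read as random intercepts, congruence is assumed to be dummy-coded with control as the reference level (so that the reported negative β means that iconic combinations occur earlier), and the index notation is introduced here for illustration only.

```latex
\begin{equation}
y_i = \beta_0 + \beta_1\,\mathrm{congruence}_i
      + u^{(\mathrm{language})}_{l[i]}
      + u^{(\mathrm{meaning})}_{m[i]}
      + u^{(\mathrm{sound\ group})}_{s[i]}
      + u^{(\mathrm{word\ length})}_{w[i]}
      + \varepsilon_i ,
\qquad
u^{(k)} \sim \mathcal{N}\!\left(0, \sigma_k^2\right),\quad
\varepsilon_i \sim \mathcal{N}\!\left(0, \sigma^2\right)
\end{equation}
```

Here $y_i$ is the first occurrence of the sound group for observation $i$, $\beta_1$ is the congruence effect, and each $u^{(k)}$ is an intercept shared by all observations in the same level of grouping factor $k$.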

For each sound-meaning association, this pattern becomes evident by comparing the average first occurrence of the associated sound group with the average first occurrence of the same sound group across the ten control concepts. For example, the concept to blow was iconically associated with the labial consonant sound group. The average first occurrence of labials across the words for to blow, which was 1.7, was compared to the average first occurrence of labials across the words for the selected control concepts, which was 2.73. This comparison showed that labials in words for to blow occurred 1.03 positions closer to the beginning of the words than in words for the non-iconic control concepts. As displayed in Fig. 1, in all but six sound-meaning associations, the same sound group occurred closer to the beginning of words in the iconic concepts compared to the control concepts.

FIG. 1.

Average word lengths for lexemes which correspond to sound-meaning associations (gray) and average first occurrence of the associated sound group compared to the average first occurrence of the same sound group across the control concepts (black).

However, it is also evident that the average word lengths correlate with this trend, at least to some extent: iconic concepts displaying a larger position difference compared to control concepts also tended to have shorter word lengths. This trend likely stems from the fact that words for iconic concepts are often shorter due to Zipfian effects (Baumann and Winter, 2018). For instance, pronouns tend to have short word lengths due to their frequent usage in everyday speech. Young children also learn iconic words earlier than less iconic words, and children and adults speaking with children use iconic words more frequently in conversation (Perry et al., 2017). Therefore, the exact role of word length in potential prominence effects remains uncertain and will be experimentally investigated along with another prominence effect, stress, in a controlled environment in study 2.

For the experimental study, iconic concepts identified by at least one major cross-linguistic study on lexical and grammatical vocal iconicity were selected (Blasi et al., 2016; Erben Johansson et al., 2020; Erben Johansson and Cronhamn, 2022; Joo, 2020; Wichmann et al., 2010). The concepts were also selected to represent different major parts of speech, including adjectives, nouns, verbs, and relational concepts such as deixis and kinship.

For each of the four parts of speech, the three iconic concepts most frequently represented across the previous studies were selected. The chosen adjective and verb concepts were small, round, flat, to bite, to blow, and to suck. As for noun concepts, breast, ashes, knee, tongue, and nose appeared in at least three major cross-linguistic studies. Ashes was excluded because body parts were seen as more fundamental owing to their inherent inalienability. Tongue, associated with laterals, was chosen over nose, associated with nasals, to introduce greater variety in iconic segments. The seven relational concepts appeared in at least two of the studies, but 1sg was selected instead of the other pronouns (1pl, 2sg, this, and who) as it featured in all major cross-linguistic studies, along with the kinship concepts mother and father.

Furthermore, because to blow and to suck have multiple interpretations in English, e.g., to blow could mean “to blow out air with the mouth” and “to hit,” and as I is orthographically similar to the number 1 and the letter l, these meanings had to be specified as “to blow (with mouth),” “to suck (with mouth),” and “i or me.” In addition, three basic vocabulary concepts, i.e., meanings that are presumed to be rather culturally neutral with no known iconic associations were added as control concepts: new (adjective), foot (noun), and to swim (verb).

Certain selected concepts were associated with more than one iconic segment or with broader phonetic characteristics, such as nasal consonants in general; see Table I. To ensure balanced associations between sounds and meanings, a single iconic segment was chosen for each concept. Round was paired with /u/ instead of /r/, and knee was paired with /u/ instead of /o/ or /k/ as these associations were found across more studies. Mother was paired with /m/ rather than /a/ because father was also associated with /a/, and father was paired with /b/ over /t/ as the association with /b/ was based on stronger statistical evidence (Erben Johansson et al., 2020). Although breast's associations with /m/ and /u/ were found in three studies, /m/ was selected to diversify from /u/, which was already used for three other iconic concepts. Last, each of the three control concepts was paired with typologically common segments, which are also seldom involved in iconic associations: new with /g/, foot with /e/, and to swim with /j/.

TABLE I.

Selected sound-meaning combination stimuli, along with corresponding part of speech category and which cross-linguistic studies the iconic associations have been found in. In the task, to bite, to blow, and i were specified as to bite (with mouth), to blow (with mouth), and i or me.

Adjective  Noun  Verb  Relational  Control
/tʃ/  /m/  /k/  /n/  /g/
small^a,b,c  breast^a,b,c,d  to bite^a,c  i or me^a,b,c,d  new
/u/  /u/  /u/  /m/  /e/
round^a,b  knee^a,b,c,d  to blow^b,c  mother^b  foot
/a/  /l/  /u/  /b/  /j/
flat^a,b,e  tongue^a,c,e  to suck^b,c  father^b  to swim

The selected segments were then used to construct a series of stimulus words. These pseudowords were designed to be as phonologically unmarked as possible given the constraints imposed by the selected sound-meaning associations. To enable contrastive stress placement, the pseudowords were designed with the simplest possible syllable structure, resulting in a composition of two syllables, each consisting of a consonant followed by a vowel (CVCV). Each target segment could only occur once in each pseudoword while the remaining three segments consisted of the mid-central vowel [ə] and the voiceless glottal fricative [h], both of which are seldom featured in vocal iconic associations. [ə] was chosen specifically because it occupies the center of the vowel space and frequently appears as an unstressed version of various vowels. Whereas no real consonant equivalent of [ə] exists, [h] was selected because it often arises from a transitional state of the glottis and is significantly influenced by adjacent sounds. Four different versions of pseudowords were created for each target segment, introducing variations in segment and stress placement; see Table II. This resulted in 44 recordings in total as some of the 11 associated segments were shared between the concepts, as shown in Table I. All stimulus words were then recorded by a trained male linguist with extensive background in phonetics and phonology, and presented as audio to mitigate potential orthographic bias (Cuskley et al., 2017). All recordings were between 0.85 and 1.5 s long.

TABLE II.

Stimulus word template showing the variation of segment and stress placement. Iconic consonant segments are represented by /n/ and iconic vowel segments are represented by /u/.

Segment placement  Stress placement  Consonant  Vowel
Initial  Initial  ˈnəhə  ˈhuhə 
Final  Initial  ˈhənə  ˈhəhu 
Initial  Final  nəˈhə  huˈhə 
Final  Final  həˈnə  həˈhu 
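The template in Table II can be expanded mechanically for all target segments. The sketch below is illustrative only: the function name `build_stimuli` and the dictionary keys are assumptions, while the segment inventory is read off Table I and the filler segments [h] and [ə] come from the stimulus description.

```python
# Target segments listed in Table I (8 consonants, 3 vowels, 11 in total).
CONSONANTS = ["tʃ", "m", "k", "n", "g", "l", "b", "j"]
VOWELS = ["u", "e", "a"]
STRESS = "ˈ"  # IPA primary stress mark, placed before the stressed syllable


def build_stimuli(target, is_vowel):
    """Return the four CVCV variants for one target segment, keyed by
    (syllable holding the target, stressed syllable)."""
    if is_vowel:
        syllables_by_position = {1: ("h" + target, "hə"), 2: ("hə", "h" + target)}
    else:
        syllables_by_position = {1: (target + "ə", "hə"), 2: ("hə", target + "ə")}
    stimuli = {}
    for position, (first, second) in syllables_by_position.items():
        stimuli[(position, 1)] = STRESS + first + second  # stress on syllable 1
        stimuli[(position, 2)] = first + STRESS + second  # stress on syllable 2
    return stimuli


all_stimuli = {
    segment: build_stimuli(segment, is_vowel=segment in VOWELS)
    for segment in CONSONANTS + VOWELS
}
total = sum(len(variants) for variants in all_stimuli.values())
```

With 11 target segments and four variants each, `total` comes to the 44 recordings reported in the text.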

The participants were recruited based on self-reported first language to enhance the linguistic diversity of the sample (Blasi et al., 2022). A total of 20 participant groups, each comprising 15 participants, were recruited online,1 collectively representing 11 language families and 30 languages. As the number of available participants for each group varied, some groups encompassed an entire language family while others encompassed major branches or subbranches; see Table III. Including these branch-based groups amplified the linguistic diversity of the sample and, to mitigate potential genetic bias, groups representing languages from the same family were chosen to be geographically dispersed when possible. For instance, Slovene was preferred over Polish to minimize proximity to Baltic languages. From the initial pool of 300 participants, 37 individuals had to be replaced due to noncompliance with instructions.

TABLE III.

Included participants by self-reported first language and corresponding language family and (sub)branch.

Language family Branch, language (number of participants)
Indo-European  Germanic, English (15); Italic, Portuguese (15); Slavic, Slovene (15); Baltic, Latvian (9), Lithuanian (6); Graeco-Phrygian, Greek (15); Celtic, Scottish Gaelic (2), Welsh (13); Iranian, Western Farsi (11), Northern Kurdish (1), Pashto (3); Aryan, Hindi/Urdu (15) 
Sino-Tibetan  Cantonese (5); Mandarin (10) 
Uralic  Hungarian, Hungarian (15); Finnic, Finnish (7), Estonian (8) 
Afro-Asiatic  Arabic (15) 
Austroasiatic  Vietnamese (15) 
Turkic  Turkish (15) 
Austronesian  Greater Central Philippine, Filipino/Tagalog (15); Malayo-Sumbawan, Indonesian/Malay (15) 
Koreanic  Korean (15) 
Dravidian  Malayalam (4); Tamil (8); Telugu (3) 
Atlantic-Congo  Swahili (10); Twi (4); Wolof (1) 
Japonic  Japanese (15) 

Study 2 was conducted online during July of 2022.2 Participants could access the study using a personal computer, mobile device, or tablet. The task, which took approximately 8 min to complete, began with participants being informed about the purpose of the study: to explore how individuals perceive unfamiliar words. They were informed that they would listen to brief recordings of “exotic” words, each lasting around 1 s, and would then rate them. On completion of the task, participants were reimbursed £1.2. Participants were first told about the nature and goals of the study, provided informed consent by agreeing to terms and conditions, informed that they could withdraw from the task at any time, as well as told that their submissions would be anonymous. The participants then had to answer general background questions about sex, age, which country or countries they grew up in, which language(s) they spoke growing up, and which language(s) they speak today. They were next presented with an audio recording of each stimulus word one at a time. The stimuli words were presented in blocks of four, grouped by associated meanings. The presentation orders of the entire blocks and stimuli words within those blocks were randomized. After having played a stimulus word, the participants were asked to rate the word according to how well they thought it fit the corresponding meaning. They used a scale ranging from “not at all” (0) to “very much” (100). The participants had the option to replay the audio recording multiple times before submitting their rating. Only after submitting a rating could they proceed to the next stimulus word. 
For instance, a block could start with the instruction “Please rank the following words according to how well they would fit the meaning small.” The participants were then presented with one of the four associated stimulus words—namely, [ˈtʃə.hə], [ˈhə.tʃə], [tʃə.ˈhə], and [hə.ˈtʃə]—which were represented solely by a play button and accompanying audio recording. Next, the participant listened to the recording and rated it based on the statement “This word sounds like it could mean: small.” This process was repeated for all four stimuli words within the block. The participants then proceeded to the next block and repeated the same process until all stimuli words had been rated.
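The two-level randomization described above (block order, then stimulus order within each block) can be sketched as follows. The source does not say how the randomization was implemented, so uniform shuffling, the function name `presentation_order`, and the optional `seed` parameter are assumptions made for illustration.

```python
import random


def presentation_order(stimuli_by_meaning, seed=None):
    """Shuffle the order of the meaning blocks, then shuffle the stimuli
    inside each block. Returns a list of (meaning, stimuli) pairs."""
    rng = random.Random(seed)  # local generator, leaves global state untouched
    meanings = list(stimuli_by_meaning)
    rng.shuffle(meanings)
    order = []
    for meaning in meanings:
        block = list(stimuli_by_meaning[meaning])  # copy before shuffling
        rng.shuffle(block)
        order.append((meaning, block))
    return order


# Two of the fifteen blocks, using the pseudoword template from Table II.
toy_blocks = {
    "small": ["ˈtʃəhə", "ˈhətʃə", "tʃəˈhə", "həˈtʃə"],
    "new": ["ˈgəhə", "ˈhəgə", "gəˈhə", "həˈgə"],
}
order = presentation_order(toy_blocks, seed=1)
```

Copying each block before shuffling keeps the input mapping unchanged, so the same stimulus set can be reused to draw an independent order for every participant.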

A linear mixed model was fitted using REML to predict the participants' ratings (dependent variable) of the pseudowords based on six predictors. Fixed effects included congruence (iconic sound-meaning combinations vs control sound-meaning combinations), position (target segments placed in the first vs second syllable), and stress (target segments occurring in stressed vs unstressed syllables). Random effects included participant, reported first language, and the meaning for each sound-meaning combination, with random intercepts specified for each: rating ∼ congruence + position + stress + (1 | participant) + (1 | language) + (1 | meaning). No significant correlations were observed between the fixed predictors, and the model's explanatory power was moderate (R² = 0.33); see Table IV. Although the average rating was 36.93 for the control stimuli and 43.39 for the iconic stimuli, no significant effect was found for congruence (β = 6.4591, p = 0.202). There was no significant effect for position either (β = 0.5718, p = 0.086); position 1 and position 2 stimuli had average ratings of 41.81 and 42.38, respectively. However, target segments occurring in unstressed syllables, which had an average rating of 41.09, were rated significantly lower than those occurring in stressed syllables, which had an average rating of 43.1 (β = −2.0124, p < 0.001), indicating that stress has a positive effect on ratings; see Fig. 2.

TABLE IV.

Regression analysis for the fixed effects in the full dataset, iconic subset, and control subset models.

Predictor  β  SE  df  t  p
Full dataset
Intercept  37.44  4.43  14.74  8.45  <0.001***
Congruence (iconic)  6.46  4.80  13.00  1.35  0.202
Position (position 2)  0.57  0.33  17 684.01  1.72  0.086
Stress (unstressed)  −2.01  0.33  17 684.01  −6.05  <0.001***
Iconic subset
Intercept  44.36  2.56  15.87  17.36  <0.001***
Position (position 2)  0.28  0.38  14 087.00  0.74  0.458
Stress (unstressed)  −2.55  0.38  14 087.00  −6.78  <0.001***
Control subset
Intercept  35.58  1.93  11.38  18.40  <0.001***
Position (position 2)  1.74  0.66  3296.00  2.64  <0.01**
Stress (unstressed)  0.16  0.66  3296.00  0.24  0.815
FIG. 2.

Distributions of ratings based on stress (stressed, dark gray; unstressed, light gray), congruence (iconic and control), and position (position 1 and position 2). Mean ratings for each group are indicated by points.

To ascertain whether position and stress yielded distinct effects on the iconic and control stimuli, two additional models were fitted, one using only the data for the 12 iconic sound-meaning combinations and one using only the data for the 3 control sound-meaning combinations. In these models, the dependent variable was the participants' ratings; position and stress were used as fixed effects, whereas participant, reported first language, and meaning were used as random effects. Again, there were no significant correlations between the fixed predictors, and both models had similar explanatory power, R² = 0.34 and R² = 0.43, respectively. The iconic subset model yielded results analogous to the full dataset. The average rating for position 1 was 43.25 compared to 43.53 for position 2, but there was no significant effect (β = 0.2796, p = 0.458). A significant effect was found for stress (β = −2.5543, p < 0.001), with an average rating of 44.67 for stressed and 42.11 for unstressed syllables. In contrast, the control subset model showed no significant effect for stress (β = 0.1550, p = 0.814), with an average rating of 36.85 for stressed and 37 for unstressed syllables. Surprisingly, however, it revealed a significant effect for position (β = 1.7406, p < 0.008), with an average rating of 36.06 for position 1 and 37.8 for position 2. In other words, target segments occurring in the second syllable were rated higher than those occurring in the first syllable. This suggests that the impact of stress is exclusively attributable to the iconic stimuli, whereas the effect of position is exclusive to the control stimuli. The datasets, scripts, and audio stimuli files are openly available online.3

This paper investigates the potential impact of two types of prominence—segment and stress placement—on lexical iconicity. It employs an analysis of established iconic sound-meaning associations from cross-linguistic natural language data combined with an experiment involving the rating of iconic and non-iconic pseudowords by a diverse group of participants.

The analysis based on the natural language data showed a tendency for sounds to occur more toward the beginning of words when they occurred in words for iconically congruent concepts compared to occurrences in words representing non-iconic control concepts. However, this effect was particularly pronounced in words with short average word lengths such as personal pronouns. When this was tested experimentally using pseudoword stimuli that kept word length fixed at four segments in all conditions, no position effect could be found for the iconic sound-meaning combinations. However, there was a strong stress effect for the iconic stimuli, whereas no such effect could be found for the control stimuli. In addition, the average ratings for the iconic stimuli were higher than those for control stimuli, albeit not significantly so.

This means that the participants judged words with iconically paired sounds and meanings as more fitting than the control stimuli and, in addition, that these words were perceived as more iconic when the syllables containing the iconic sounds were made more prominent through stress. Because no comparable effect was found for the control stimuli, it can be assumed that sounds occurring in stressed syllables do not constitute more fitting labels for just any paired concept, at least not to the same degree as for congruently paired iconic concepts.

Interestingly, a reversed position effect was found for the control sound-meaning combinations, i.e., when a non-iconic sound occurred in the less prominent second syllable, it was judged as a better fit for the paired concept than when the same sound occurred in the more prominent first syllable. One possible interpretation is that the presence of the target sound in the less prominent position is perceived as less objectionable. In the second syllable, the target sound is obscured to a higher degree by the filler sounds ([h] and [ə]) in the first syllable, and the incongruency between the target sound and the paired meaning is less accentuated than it would be in the first position. A counterargument would be that no analogous negative association was found for stress. However, because no stress effect was found for the control stimuli although a positive effect was found for the iconic stimuli, the possibility remains open that a hidden position effect could also influence the iconic stimuli. Confirming the existence of such an effect is challenging, but designing nonwords to incorporate a variety of filler segments (comprising randomized typologically common speech sounds rather than solely [h] and [ə]) could provide a substantial number of unique nonwords. This would diminish the relative markedness of the target segments and could be used to test this effect.
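The proposed design with randomized filler segments could be sketched roughly as follows. This is only an illustration under stated assumptions, not a stimulus script from the study: the filler inventories `FILLER_CONSONANTS` and `FILLER_VOWELS` are hypothetical stand-ins for "typologically common speech sounds", and the function names are invented for this example. The sketch keeps the CVCV shape, places the target in syllable 1 or 2, marks stress on syllable 1 or 2, and never reuses the target as a filler.

```python
import random

# Hypothetical filler inventories (assumption); the study itself used only [h] and [ə].
FILLER_CONSONANTS = ["p", "t", "k", "m", "n", "s"]
FILLER_VOWELS = ["a", "i", "u", "e", "o"]


def make_nonword(target, target_is_vowel, target_syllable, stressed_syllable, rng):
    """Build a CVCV nonword as two syllables, with the target in syllable 1 or 2.

    The stressed syllable is prefixed with the IPA primary stress mark.
    """
    # Exclude the target from the fillers so that it occurs exactly once.
    consonants = [c for c in FILLER_CONSONANTS if c != target]
    vowels = [v for v in FILLER_VOWELS if v != target]
    syllables = []
    for index in (1, 2):
        onset = rng.choice(consonants)
        nucleus = rng.choice(vowels)
        if index == target_syllable:
            if target_is_vowel:
                nucleus = target
            else:
                onset = target
        syllable = onset + nucleus
        if index == stressed_syllable:
            syllable = "ˈ" + syllable
        syllables.append(syllable)
    return syllables


def make_stimulus_set(target, target_is_vowel, seed=0):
    """Return one nonword per (target position, stress position) condition."""
    rng = random.Random(seed)
    return {
        (position, stress): make_nonword(target, target_is_vowel, position, stress, rng)
        for position in (1, 2)
        for stress in (1, 2)
    }
```

Calling `make_stimulus_set("u", True)` yields four nonwords, one for each combination of target position and stress position. Drawing several sets with different seeds would give each condition many distinct filler contexts.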

The correlation between iconicity and prominence effects does not necessarily provide insights into the underlying causality. This parallels the situation where speech sounds commonly found in cross-linguistic iconic associations are the same sounds that tend to exhibit high stability (Dellert et al., 2021), i.e., they have higher diachronic survivability rates. However, it remains unclear whether these sounds are frequent in iconic associations due to their stability or if they become more stable as a result of their iconicity.

From a synchronic perspective, prominence effects can make segments more memorable and easier to perceive (Akita, 2020; Cutler and Norris, 1988) and, diachronically, this can prevent phonetic erosion. Segments and syllables that stand out act as important cues for lexical access, implying that iconic segments are better positioned to preserve congruence with their associated iconic meanings when located in more prominent syllables. This is supported by the fact that function words, which are generally prosodically inconspicuous, also tend to be less iconic (Perry et al., 2015). In combination with the increased linguistic learnability that iconicity provides (Imai and Kita, 2014; Massaro and Perlman, 2017; Nielsen and Dingemanse, 2021), a positive feedback loop could ensue, whereby iconic segments become more memorable and diachronically stable. In turn, this is further amplified by the fact that iconic segments are particularly suitable for conveying certain meanings because they already carry semantic information. Although these advantages might be minor, they could help explain the prevalence of iconicity in human language.

One possibility is that iconic segments could attract prominence effects to the syllable they occur in, thereby bolstering the survivability of the iconic segments. Alternatively, when an iconically congruent segment ends up in a syllable with high prominence, be it through sound change, meaning change, or coinage, the syllable's diachronic stability would also increase the segment's survivability. This does not mean that iconic segments would survive forever when these conditions have been met, but it would be more likely for a new iconically congruent segment to survive the next time it is introduced. Regardless, the correlation between iconicity and prominence effects on the one hand and the correlation between iconicity and sound stability on the other hand ought to affect the organization of segments within words and how they change over time. For instance, the current findings suggest that iconic segments tend to occur toward the onset of words. This could be explained by an interplay between stress, which prevents iconic segments from eroding, and the fact that words generally erode more at the end, as initial segments are important for lexical access (Wedel et al., 2019). Consequently, this dynamic may lead to a gradual repositioning of iconic segments and syllables toward word onsets through regular language change, enhancing their predictive capacity (Ussishkin and Wedel, 2009).
For example, the potentially iconic segment /s/ in English lisp has been moved closer toward the beginning of the word compared to Old English wlisp, and the rounded feature of the segment /w/ in Proto-Balto-Slavic *apwilas “round” has moved to the initial segment of contemporary Slavic languages, producing an /o/, e.g., Czech oblý “round.” Although this general pattern may hold true for individual iconic segments, it is important to recognize that various types of sound-meaning mappings involve combinations of sounds, thereby underscoring some of the limitations inherent in the present study. The apparent next step in delving deeper into this matter would involve analyzing the position of segments in words for iconic meanings diachronically. By tracking the segment position of iconic elements over time and considering factors such as meaning change, word coinage, and loans, a more comprehensive analysis of the impact of iconicity on language change could be achieved. Regrettably, this approach also limits the analysis to the few language families with extensive historical attestations, like Indo-European, thereby reducing the scope of linguistic diversity.

For study 1, there are alternative methods for analyzing the placement of iconic segments within words. For instance, instead of calculating the first position of the word that the iconically associated segment was found in, a binary distinction based on whether the segments are word-initial or not could be used. However, given that consonants are more frequently situated in word-initial positions than vowels across languages, an approach capable of addressing the methodological challenges associated with this should be employed.
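The two measures contrasted above can be illustrated with a minimal sketch. It assumes that words are already segmented into lists of phonemes and that the iconically associated sound group is given as a set of segments. The function names and the example words are hypothetical and are not taken from the study's data or scripts.

```python
def first_position(segments, targets):
    """Return the 1-based position of the first target segment, or None if absent."""
    for index, segment in enumerate(segments, start=1):
        if segment in targets:
            return index
    return None


def is_word_initial(segments, targets):
    """Binary alternative: does the word begin with one of the target segments?"""
    return bool(segments) and segments[0] in targets


# Hypothetical example words, segmented by hand for illustration only.
nasals = {"m", "n"}
word_a = ["m", "a", "m", "a"]
word_b = ["a", "n", "a"]

print(first_position(word_a, nasals), is_word_initial(word_a, nasals))
print(first_position(word_b, nasals), is_word_initial(word_b, nasals))
```

The positional measure retains gradient information (position 2 versus position 4, for example), whereas the binary measure collapses every non-initial occurrence into one category. Neither measure by itself corrects for the cross-linguistic tendency of consonants to occupy word-initial position, which is the methodological challenge noted above.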

For comparability reasons, only a single sound was paired with each meaning in study 2. However, as evident from the cross-linguistic analysis in study 1, many of the iconic concepts were iconically associated with multiple sound groups. Although the iconic sounds used for the stimuli words in the experiment were selected based on various cross-linguistic studies, the strength of iconicity between the meaning and associated sound groups can vary. It is possible that some of the iconic sounds were not optimal choices for the present minimalistic design. For instance, although /u/ was paired with knee, /k/ might have been a more effective choice, especially when using a single iconic sound. Some concepts that were iconically associated with more than one sound group are presumably also sensitive to the word-internal order of iconic sounds. For example, whereas to blow is associated with both rounded vowels, e.g., /u/, and labial consonants, e.g., /p/, the combination /pu/ is probably a better label for evoking the concept to blow than /up/ is. Thus, the selection of segments for investigating sound-meaning associations could be improved if it were based on the conditions of the individual iconic associations rather than on a symmetrical study design.

The stimuli words used were designed to isolate iconicity and prominence effects by using a simple CVCV structure, wherein the target sounds were surrounded by phonetically opaque filler segments. As a result, some participants might perceive the stimuli words as not sufficiently word-like (Styles and Gawne, 2017). It should also be noted that the selected concepts were intended to encompass various parts of speech categories; however, these categories exhibit formal and functional variation across languages, and certain categories might be better equipped to convey iconicity based on their semantic transparency (Erben Johansson and Cronhamn, 2022; Winter et al., 2023). The stress patterns of the participants' languages could also affect how the participants perceived the stimuli words. Those speaking languages with fixed stress could be less sensitive to stress shifts, but this also suggests that the effect found for stress might actually be a conservative estimate.

Furthermore, it is always challenging to completely mitigate the influence of culturally dominant languages. Despite the inclusion of participants from over 30 languages representing 11 language families, the experiment's scope only captures a fraction of the world's linguistic diversity. Conducting the experiment in English unavoidably introduced some level of linguistic and cultural transfer. Therefore, conducting similar experiments in the participants' first languages should be encouraged. The increasing degree of linguistic diversity observed in recent years on platforms used for participant recruitment and other online resources, such as the growing number of multilingual corpora, will hopefully also help mitigate these types of issues. Nevertheless, conducting such experiments would undoubtedly require more resources and preparation.

This paper integrates an analysis of cross-linguistic lexical data from 245 language families with an experimental study which included participants representing more than 30 languages to assess how prominence effects, such as lexical stress and word-internal segment position, influence iconic associations between sound and meaning. The results showed that iconic segments occurring in stressed pseudoword syllables were judged as significantly more fitting for their paired iconic meanings than those occurring in unstressed syllables, and there is a cross-linguistic tendency for iconic segments to occur toward the beginning of lexemes.

Given that prominence effects are linked to resistance against phonetic erosion and contribute to enhanced memorability and perceptibility of speech sounds, it can be hypothesized that vocal iconicity leverages these effects. This phenomenon may contribute to the enduring presence of iconic segments over time and increase the likelihood of their incorporation when new words are coined. These factors, conjointly, ought to increase the predictive function of iconicity by foregrounding sounds which intrinsically carry semantic information as well as affect how words change and are introduced diachronically. This makes iconicity an expedient strategy for accessing the meaning of words and helps explain its widespread occurrence in human languages.

Future studies should explore more variations and combinations of associations between sounds and meanings, a more comprehensive array of iconic concepts, the diachronic development of iconic segments, as well as more nuanced association types, including the distinction between iconic and indexical associations. An even more linguistically diverse participant pool would also be beneficial, and linguistic transfer from English could be limited by translating the experimental instructions into each participant's first language. The participants' first languages could also represent more nuanced types of prominence effects, such as various stress patterns and tone, as well as whether the languages have predominantly prefixing or suffixing morphology, as this might affect lexical recognition (Cutler et al., 1985; Dryer, 2013).

This study was conducted thanks to funding granted by The Swedish Research Council (Grant No. 2020-06398).

The author has no conflicts to disclose.

The study was conducted following the guidelines formulated by The Swedish Research Council. In accordance with the Swedish law [Svensk författningssamling (Swedish Code of Statutes) 2003: 460 Sec. 16], all participants were informed of the nature and purpose of the study, and by clicking the corresponding box, confirmed that they were willing to take part in it and for their data to be published anonymously. In accordance with the Swedish Act concerning the ethical review of research involving humans (SFS 2003: 460), the present study was exempt from the requirement for ethical approval.

The data that support the findings of this study, including scripts, data, and audio stimuli files, will be openly available in Open Science Framework at http://doi.org/10.17605/OSF.IO/3MS56.

¹ Through prolific.ac (Last viewed July 12, 2023).

² Via gorilla.sc (Last viewed July 12, 2023).

1. Akita, K. (2020). “A typology of depiction marking: The prosody of Japanese ideophones and beyond,” Stud. Lang. 45, 865–886.
2. Baumann, S., and Winter, B. (2018). “What makes a word prominent? Predicting untrained German listeners' perceptual judgments,” J. Phon. 70, 20–38.
3. Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., and Majid, A. (2022). “Over-reliance on English hinders cognitive science,” Trends Cognit. Sci. 26, 1153–1170.
4. Blasi, D. E., Wichmann, S., Hammarström, H., Stadler, P. F., and Christiansen, M. H. (2016). “Sound–meaning association biases evidenced across thousands of languages,” Proc. Natl. Acad. Sci. U.S.A. 113, 10818–10823.
5. Cuskley, C., Simner, J., and Kirby, S. (2017). “Phonological and orthographic influences in the bouba–kiki effect,” Psychol. Res. 81, 119–130.
6. Cutler, A., Hawkins, J. A., and Gilligan, G. (1985). “The suffixing preference: A processing explanation,” Linguistics 23, 723–758.
7. Cutler, A., and Norris, D. (1988). “The role of strong syllables in segmentation for lexical access,” J. Exp. Psychol.: Hum. Percept. Perform. 14, 113–121.
8. Ćwiek, A., Fuchs, S., Draxler, C., Asu, E. L., Dediu, D., Hiovain, K., Kawahara, S., Koutalidis, S., Krifka, M., Lippus, P., Lupyan, G., Oh, G. E., Paul, J., Petrone, C., Ridouane, R., Reiter, S., Schümchen, N., Szalontai, A., Ünal-Logacev, Ö., Zeller, J., Perlman, M., and Winter, B. (2022). “The bouba/kiki effect is robust across cultures and writing systems,” Philos. Trans. R. Soc. B 377, 20200390.
9. Dellert, J., Erben Johansson, N., Frid, J., and Carling, G. (2021). “Preferred sound groups of vocal iconicity reflect evolutionary mechanisms of sound stability and first language acquisition: Evidence from Eurasia,” Philos. Trans. R. Soc. B 376, 20200190.
10. Dingemanse, M. (2019). “‘Ideophone’ as a comparative concept,” in Ideophones, Mimetics and Expressives, Iconicity in Language and Literature, edited by K. Akita and P. Pardeshi (Benjamins, Amsterdam), Chap. 1, pp. 13–34.
11. Dryer, M. S. (2013). “Prefixing vs. suffixing in inflectional morphology,” in The World Atlas of Language Structures Online, edited by M. S. Dryer and M. Haspelmath, Leipzig, available at http://wals.info/chapter/26 (Last viewed July 4, 2023).
12. Eddington, D. E., and Nuckolls, J. (2019). “Examination of manner of motion sound symbolism for English nonce verbs,” Languages 4, 85.
13. Erben Johansson, N., Anikin, A., Carling, G., and Holmer, A. (2020). “The typology of sound symbolism: Defining macro-concepts via their semantic and phonetic features,” Linguist. Typol. 24, 253–310.
14. Erben Johansson, N., and Cronhamn, S. (2022). “Vocal iconicity in nominal classification,” Lang. Cogn. 15(2), 266.
15. Flaksman, M. (2017). “Iconic treadmill hypothesis,” in Dimensions of Iconicity, Iconicity in Language and Literature, edited by A. Zirker, M. Bauer, O. Fischer, and C. Ljungberg (Benjamins, Amsterdam), pp. 15–38.
16. Hamano, S. S. (1998). The Sound-Symbolic System of Japanese (CSLI and Kuroshio, Stanford, CA).
17. Huang, Y.-H., Pratoomraj, S., and Johnson, R. C. (1969). “Universal magnitude symbolism,” J. Verbal Learn. Verbal Behav. 8, 155–156.
18. Imai, M., and Kita, S. (2014). “The sound symbolism bootstrapping hypothesis for language acquisition and language evolution,” Philos. Trans. R. Soc. B 369, 20130298.
19. Johansson, N., Anikin, A., and Aseyev, N. (2019). “Color sound symbolism in natural languages,” Lang. Cogn. 12, 56–83.
20. Johansson, N., and Carling, G. (2015). “The de-iconization and rebuilding of iconicity in spatial deixis,” Acta Linguist. Hafniensia 47, 4–32.
21. Joo, I. (2020). “Phonosemantic biases found in Leipzig-Jakarta lists of 66 languages,” Linguist. Typol. 24, 1–12.
22. Kawahara, S., Shinohara, K., and Uchimoto, Y. (2008). “A positional effect in sound symbolism: An experimental study,” in Proceedings of the Japan Cognitive Linguistics Association (Japan Cognitive Linguistics Association, Tokyo, Japan), pp. 417–427.
23. Kuhn, M. (2008). “Building predictive models in R using the caret package,” J. Stat. Software 28, 1–26.
24. Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). “lmerTest Package: Tests in linear mixed effects models,” J. Stat. Software 82, 1–26.
25. Luce, P. A., and Pisoni, D. B. (1998). “Recognizing spoken words: The neighborhood activation model,” Ear Hear. 19, 1–36.
26. Massaro, D. W., and Perlman, M. (2017). “Quantifying iconicity's contribution during language acquisition: Implications for vocabulary learning,” Front. Commun. 2, 4.
27. Nielsen, A. K. S., and Dingemanse, M. (2021). “Iconicity in word learning and beyond: A critical review,” Lang. Speech 64, 52–72.
28. Norris, D., McQueen, J. M., and Cutler, A. (2016). “Prediction, Bayesian inference and feedback in speech recognition,” Lang., Cognit. Neurosci. 31, 4–18.
29. Perry, L. K., Perlman, M., and Lupyan, G. (2015). “Iconicity in English and Spanish and its relation to lexical category and age of acquisition,” PLoS One 10, e0137147.
30. Perry, L. K., Perlman, M., Winter, B., Massaro, D. W., and Lupyan, G. (2017). “Iconicity in the speech of children and adults,” Dev. Sci. 21, e12572.
31. R Core Team (2023). R: A language and environment for statistical computing, R Foundation for Statistical Computing, available at https://www.R-project.org/ (Last viewed November 30, 2023).
32. Shcherbakova, O., Gast, V., Blasi, D. E., Skirgård, H., Gray, R. D., and Greenhill, S. J. (2023). “A quantitative global test of the complexity trade-off hypothesis: The case of nominal and verbal grammatical marking,” Linguist. Vanguard 9, 155–167.
33. Shinohara, K., and Uno, R. (2022). “Exploring the positional effects in sound symbolism: The case of hardness judgments by English and Japanese speakers,” Languages 7(3), 179.
34. Sidhu, D. M., Vigliocco, G., and Pexman, P. M. (2020). “Effects of iconicity in lexical decision,” Lang. Cogn. 12, 164–181.
35. Styles, S. J., and Gawne, L. (2017). “When does Maluma/Takete fail? Two key failures and a meta-analysis suggest that phonology and phonotactics matter,” i-Perception 8, 204166951772480.
36. Ussishkin, A., and Wedel, A. (2009). “Lexical access, effective contrast, and patterns in the lexicon,” in Phonology in Perception, edited by P. Boersma and S. Hamann (De Gruyter Mouton, Berlin), pp. 267–292.
37. Vainio, L., and Vainio, M. (2022). “Interaction between grasping and articulation: How vowel and consonant pronunciation influences precision and power grip responses,” PLoS One 17, e0265651.
38. Wedel, A., Ussishkin, A., and King, A. (2019). “Crosslinguistic evidence for a strong statistical universal: Phonological neutralization targets word-ends over beginnings,” Language 95, e428–e446.
39. Wichmann, S., Holman, E. W., and Brown, C. H. (2010). “Sound symbolism in basic vocabulary,” Entropy 12, 844–858.
40. Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K., and Yutani, H. (2019). “Welcome to the Tidyverse,” J. Open Source Software 4, 1686.
41. Winter, B., Lupyan, G., Perry, L. K., Dingemanse, M., and Perlman, M. (2023). “Iconicity ratings for 14,000+ English words,” Behav. Res. Methods 4(43), 1686.
42. Winter, B., and Perlman, M. (2021). “Size sound symbolism in the English lexicon,” Glossa 6, 79.