Auditory Cortex Learns to Discriminate Audiovisual Cues through Selective Multisensory Enhancement (2024)

In daily life, the integration of visual and auditory information is crucial for detecting, discriminating, and identifying multisensory objects. When we see an apple while simultaneously hearing the spoken word ‘apple’ or ‘peach’, the brain must synthesize these inputs to form a percept and judge whether the visual information matches the auditory input. To date, it remains unclear exactly where and how the brain integrates cross-sensory inputs to benefit cognition and behavior. Traditionally, higher association areas in the temporal, frontal, and parietal lobes were thought to be pivotal for merging visual and auditory signals. However, recent research suggests that even primary sensory cortices, such as the auditory and visual cortices, contribute significantly to this integration process1-7.

Studies have shown that visual stimuli can modulate auditory responses in the auditory cortex (AC) via pathways involving the lateral posterior nucleus of the thalamus8, and that the deep layers of the AC serve as crucial hubs for integrating cross-modal contextual information9. Even task-irrelevant visual cues can affect how the AC represents sound frequency10. Anatomical investigations reveal reciprocal projections between the auditory and visual cortices4,11-14, highlighting the interconnected nature of these sensory systems. Sensory experience has also been shown to shape cross-modal representations in sensory cortices15-17. Despite these findings, however, the precise mechanisms by which sensory cortices integrate cross-sensory inputs for multisensory object discrimination remain unknown.

Previous research on cross-modal modulation has predominantly focused on anesthetized or passive animal models3,5,6,18,19, exploring the influence of stimulus properties and spatiotemporal arrangements on sensory interactions3,6,19-21. However, sensory representations, including multisensory processing, are known to be context-dependent22-24 and can be shaped by perceptual learning17,25. Therefore, the way sensory cortices integrate information during active tasks may differ significantly from what has been observed in passive or anesthetized states. Relatively few studies have investigated cross-modal interactions during the performance of multisensory tasks, regardless of the brain region studied25-27. This limits our understanding of multisensory integration in sensory cortices, particularly regarding three questions: (1) Do neurons in sensory cortices use the same or different strategies to integrate various audiovisual pairings? (2) How does the "match" or "mismatch" between auditory and visual features influence this integration? (3) How does learning to discriminate audiovisual objects affect their representations in sensory cortices?

We investigated these questions by training rats on a multisensory discrimination task involving both auditory and visual stimuli. We then examined cue selectivity and auditory-visual integration in AC neurons of these well-trained rats. Our findings demonstrate that multisensory discrimination training fosters experience-dependent associations between auditory and visual features within AC neurons. During task performance, AC neurons often exhibited multisensory enhancement for the preferred auditory-visual pairing, with no such enhancement observed for the non-preferred pairing. Importantly, this selective enhancement correlated with the animals’ ability to discriminate the audiovisual pairings. Furthermore, the degree of auditory-visual integration correlated with the congruence of auditory and visual features. Our findings suggest that the AC plays a more significant role in multisensory integration than previously thought.

Multisensory discrimination task in freely moving rats

To investigate how AC neurons integrate visual information into audiovisual processing, we trained ten adult male Long Evans rats on a multisensory discrimination task (Fig. 1a). The rat initiated a trial by inserting its nose into the central port, which triggered a randomly selected target stimulus from a pool of six cues: two auditory (3 kHz pure tone, A3k; 10 kHz pure tone, A10k), two visual (horizontal light bar, Vhz; vertical light bar, Vvt), and two multisensory cues (A3kVhz, A10kVvt). Based on the cue, the rat had to choose the correct left or right port within 3 s of stimulus onset to receive a water reward. Incorrect choices or failures to respond resulted in a 5-second timeout. To ensure reliable performance, rats needed to achieve 80% accuracy (including at least 70% correct in each modality) for three consecutive sessions, which typically took 2-4 months.


Consistent with previous studies10,26,28,29, rats performed better when responding to combined auditory and visual cues (multisensory trials) than to trials with only an auditory or a visual cue (Fig. 1b), suggesting that multisensory cues facilitate decision-making. Reaction time, measured as the time from cue onset to when the rat left the central port, was also shorter in multisensory trials (Fig. 1c, d), indicating that multisensory processing helps rats discriminate between cues more efficiently (mean reaction time across rats: multisensory, 367±53 ms; auditory, 426±60 ms; visual, 417±53 ms; multisensory vs. each unisensory condition, p < 0.001, paired t-test; Fig. 1d).

Auditory, visual, and multisensory discrimination of AC neurons in multisensory discrimination task

To investigate the discriminative and integrative properties of AC neurons, we implanted tetrodes in the right primary auditory cortex of well-trained rats (n=10) (Fig. 2a). Careful implantation procedures were designed and followed to minimize neuron sampling biases. The characteristic frequencies of recorded AC neurons, measured immediately after tetrode implantation, spanned a broad frequency range (Supplementary Fig. 1). We examined a total of 559 AC neurons (56±29 neurons per rat) that responded to at least one target stimulus during task engagement. Interestingly, a substantial proportion of neurons (35%, 196/559) showed visual responses (Fig. 2b), notably higher than the 14% (39/275; χ2 = 27.5, P < 0.001) recorded in another group of rats (n=8) engaged in a choice-free task, in which rats were not required to discriminate the triggered cues and could receive water rewards with any behavioral response (Fig. 2b). This suggests that multisensory discrimination training enhances visual representation in the auditory cortex. Notably, 27% (150/559) of neurons responded to both auditory and visual stimuli (audiovisual neurons), and a small number (n=7) responded only to the auditory-visual combination (Fig. 2b).


During task engagement, AC neurons displayed a clear preference for one target sound over the other. As exemplified in Fig. 2c-2d, many neurons exhibited a robust response to one target sound while showing a weak or negligible response to the other. We quantified this preference using receiver operating characteristic (ROC) analysis. Since most neurons exhibited their main cue-evoked response within the initial period of cue presentation (Fig. 2e), our analysis focused on responses within the 0-150 ms window after cue onset. We found that a significant majority (61%, 307/506) of auditory-responsive neurons exhibited this selectivity during the task (Fig. 2f). Notably, most neurons favored the high-frequency tone (A10k preferred: 261; A3k preferred: 46; Fig. 2f). This aligns with findings that neurons in the AC and medial prefrontal cortex selectively preferred the tone associated with the behavioral choice contralateral to the recorded cortices during sound discrimination tasks10,30, potentially reflecting the formation of sound-to-action associations31. Such pronounced sound preference and bias were absent in the choice-free group (Fig. 2e-2f), suggesting that they are directly linked to active discrimination. Anesthesia decreased auditory preference (Supplementary Fig. 2a-2b), further supporting its dependence on active engagement.

Regarding the visual modality, 41% (80/196) of visually responsive neurons showed a significant visual preference (Fig. 2f). Similar to the auditory selectivity observed, a greater proportion of neurons favored the visual stimulus (Vvt) associated with the contralateral choice, with a ratio of 60:20 for Vvt-preferred vs. Vhz-preferred neurons. This convergence of auditory and visual selectivity likely results from multisensory perceptual learning. Notably, such patterns were absent in neurons recorded from a separate group of rats performing the choice-free task (Fig. 2e). Further supporting this conclusion is the difference in visual preference between audiovisual and exclusively visual neurons (Fig. 2g). Among audiovisual neurons with significant visual selectivity, the majority favored Vvt (Vvt preferred vs. Vhz preferred, 48 vs. 7) (Fig. 2g), aligning with their established auditory selectivity. In contrast, exclusively visual neurons did not exhibit this bias (12 preferred Vvt vs. 13 preferred Vhz) (Fig. 2g). We propose that the dominant auditory input acts as a “teaching signal” that shapes visual processing through the selective reinforcement of specific visual pathways during associative learning. This is consistent with Hebbian plasticity, in which stronger auditory responses boost the corresponding visual input, ultimately leading visual selectivity to mirror auditory preference.

Similar to auditory selectivity, the vast majority of neurons (79%, 270/340) showing significant multisensory selectivity exhibited a preference for the multisensory cue (A10kVvt) guiding the contralateral choice (Fig. 2f). To more clearly highlight the influence of visual input on auditory selectivity, we compared auditory, visual, and multisensory selectivity in 150 audiovisual neurons (Fig. 2h). We found that pairing auditory cues with visual cues significantly improved the neurons’ ability to distinguish between the target cues relative to the auditory stimuli alone (mean absolute auditory selectivity: 0.23±0.11; mean absolute multisensory selectivity: 0.25±0.12; p<0.0001, Wilcoxon signed-rank test; Fig. 2i).

Our multichannel recordings allowed us to decode sensory information from population activity in AC on a single-trial basis. Using cross-validated support vector machine (SVM) classifiers, we discriminated between auditory, visual, and multisensory cues. While decoding accuracy was similar for auditory and multisensory conditions, the presence of visual cues accelerated the decoding process, with AC neurons reaching 90% accuracy approximately 18 ms earlier in multisensory trials (Fig. 2j), aligning with the behavioral data. Interestingly, AC neurons could discriminate between the two visual targets with around 80% accuracy (Fig. 2j), indicating a robust incorporation of visual information in the auditory cortex. However, AC neurons in the choice-free group lacked visual discrimination ability and showed lower accuracy for auditory cues (Fig. 2k), suggesting that actively engaging multiple senses is crucial for the benefits of multisensory integration.

Audiovisual integration of AC neurons during the multisensory discrimination task

To understand how AC neurons integrate auditory and visual inputs during task engagement, we compared the multisensory response of each neuron to its strongest corresponding unisensory response using ROC analysis to quantify the difference, termed "modality selectivity". A selectivity greater than 0 indicates a stronger multisensory response. Over a third (34%, 192 of 559) of AC neurons displayed significant visual modulation of their auditory responses in one or both audiovisual pairings (p < 0.05, permutation test), including some neurons with no detectable visual response (104/356). Fig. 3a exemplifies this, where the multisensory response exceeded the auditory response, a phenomenon termed "multisensory enhancement" 32.


Interestingly, AC neurons adopted distinct integration strategies depending on the specific auditory-visual pairing presented. Neurons often displayed multisensory enhancement for one pairing but not another (Fig. 3b), or even exhibited multisensory inhibition (Fig. 3c). This enhancement was mainly specific to the A10k-Vvt pairing, as shown by population averages (Fig. 3d). Within this pairing, significantly more neurons exhibited enhancement than inhibition (114 vs. 36) (Fig. 3e). In contrast, the A3k-Vhz pairing showed a balanced distribution of enhancement and inhibition (35 vs. 33) (Fig. 3e). This resulted in a significantly higher mean modality selectivity for the A10k-Vvt pairing compared to the A3k-Vhz pairing (0.047 ± 0.124 vs. 0.003 ± 0.096; paired t-test, p < 0.001). Among audiovisual neurons, this biasing was even more pronounced (enhanced vs. inhibited: 62 vs. 2 in the A10k-Vvt pairing, 6 vs. 13 in the A3k-Vhz pairing; mean modality selectivity: 0.119±0.105 in the A10k-Vvt pairing vs. 0.020±0.083 in the A3k-Vhz pairing, paired t-test, p < 0.00001) (Fig. 3f). A similar but weaker pattern was observed under anesthesia (Supplementary Fig. 2c-2f), suggesting that multisensory perceptual learning may induce plastic changes in AC. In contrast, AC neurons in the choice-free group did not exhibit differential integration patterns (Fig. 3g-3h), indicating that multisensory discrimination and subsequent behavioral reporting are necessary for the biased enhancement.

To understand how these distinct integration strategies influence multisensory discrimination, we compared each neuron's difference in modality selectivity between the A10k-Vvt and A3k-Vhz pairings with the change in its cue selectivity from auditory to multisensory trials (Fig. 3j). We found a significant correlation between these measures (R=0.65, p<0.001, Pearson correlation test): the greater the difference in modality selectivity, the more pronounced the increase in cue discrimination during multisensory trials compared to auditory trials. This highlights how distinct integration strategies enhance cue selectivity in multisensory conditions. Additionally, a subset of neurons exhibited multisensory enhancement for each pairing (Fig. 3e), potentially aiding in differentiating between multisensory and unisensory responses. SVM decoding analysis confirmed that population activity could effectively discriminate between multisensory and unisensory stimuli (Fig. 3i).

We further explored whether AC neurons combined auditory and visual inputs in a subadditive, additive, or superadditive manner. Using a bootstrap method, we generated a distribution of predicted multisensory responses by summing mean visual and auditory responses from randomly sampled trials in which each stimulus was presented alone. Robust auditory and visual responses were primarily observed in correct contralateral choice trials, so our calculations focused on the A10k-Vvt pairing. We found that, for most neurons, the observed multisensory response in contralateral choice trials was below the predicted sum of the corresponding visual and auditory responses (Supplementary Fig. 3). Specifically, as exemplified in Fig. 3k, the observed multisensory response approximated 83% of the sum of the auditory and visual responses in most cases (Fig. 3i), indicating that AC neurons combined auditory and visual inputs in a predominantly subadditive manner.

Impact of incorrect choices on audiovisual integration

To investigate how incorrect choices affected audiovisual integration, we compared multisensory integration for correct and incorrect choices within each auditory-visual pairing, focusing on neurons with a minimum of 9 trials per choice per cue. Our findings demonstrated a significant reduction in the magnitude of multisensory enhancement during incorrect choice trials in the A10k-Vvt pairing. Fig. 4a illustrates a representative case. The mean modality selectivity for incorrect choices was significantly lower than that for correct choices (correct vs. incorrect: 0.059±0.137 vs. 0.006±0.207; p=0.005, paired t-test; Fig. 4b-c). This suggests that strong multisensory integration is crucial for accurate behavioral performance. In contrast, the A3k-Vhz pairing showed no difference in modality selectivity between correct and incorrect trials (correct vs. incorrect: 0.011±0.081 vs. 0.003±0.199; p=0.542, paired t-test; Fig. 4d-e). Interestingly, correct choices here likely correspond to ipsilateral behavioral selection, while incorrect choices correspond to contralateral selection, indicating that a contralateral behavioral choice alone does not guarantee stronger multisensory enhancement. Overall, these findings suggest that the multisensory perception reflected by behavioral choices (correct vs. incorrect) might be shaped by the underlying integration strength.


Impact of informational match on audiovisual integration

A pivotal factor influencing multisensory integration is the alignment of the informational content conveyed by distinct sensory cues. To explore how the association status between auditory and visual cues affects multisensory integration, we introduced two new multisensory cues (A10kVhz and A3kVvt) into the target cue pool during task engagement. These cues were termed ‘unmatched multisensory cues’ because their auditory and visual components indicated different behavioral choices. When an unmatched multisensory cue was triggered, rats received a water reward with 50% probability at either port.

We recorded from 280 AC neurons in well-trained rats performing the multisensory discrimination task with matched and unmatched audiovisual cues, and analyzed the integrative and discriminative characteristics of these neurons for matched and unmatched auditory-visual pairings. The results revealed distinct integrative patterns in AC neurons when an auditory target cue was paired with the matched visual cue as opposed to the unmatched visual cue. Neurons typically showed a robust response to A10k that was enhanced by the matched visual cue but either unaffected (Fig. 5a) or inhibited (Fig. 5b) by the unmatched visual cue. In some neurons, both matched and unmatched visual cues enhanced the auditory response, but the matched cue provided a greater enhancement (Fig. 5c). We compared the modality selectivity for the different auditory-visual pairings (Fig. 5d). Unlike Vvt, the unmatched visual cue, Vhz, generally failed to significantly enhance the response to the preferred sound (A10k) (mean modality selectivity: 0.052 ± 0.097 for the A10k-Vvt pairing vs. 0.016 ± 0.123 for the A10k-Vhz pairing, p<0.0001, paired t-test). This suggests that consistent information across modalities strengthens multisensory integration. In contrast, neither the matched nor the unmatched visual cue could boost neurons’ responses to the non-preferred sound (A3k) overall (mean modality selectivity: 0.003 ± 0.008 for the A3k-Vhz pairing vs. 0.003 ± 0.006 for the A3k-Vvt pairing, p=0.19, paired t-test).


Although individual AC neurons exhibited varying integration profiles for different auditory-visual pairings, we explored whether, as a population, they could distinguish these pairings. We trained a linear classifier to identify each pair of stimuli and used it to decode stimulus information from the population activity of grouped neurons. The resulting decoding accuracy matrix for discriminating pairs of stimuli is shown in Fig. 5e. We found that the population of neurons could effectively discriminate the two target multisensory cues. Although matched and unmatched visual cues failed to differentially modulate the response to the non-preferred sound (A3k) at the single-neuron level, grouped neurons could still discriminate the corresponding pairings with a decoding accuracy of 0.79 (Fig. 5e), close to the accuracy of 0.81 for discriminating between the A10k-Vvt and A10k-Vhz pairings. This suggests that associative learning experience enabled AC neurons to develop multisensory discrimination abilities. However, the accuracy for discriminating matched vs. unmatched cues was lower than for other pairings (Fig. 5e).

Cue preference and multisensory integration in left AC

Our data showed that most neurons in the right AC preferred the cue directing the contralateral choice, regardless of whether it was auditory or visual. However, this could simply be because these neurons were naturally more responsive to those specific cues, not necessarily because they learned an association between the cues and the choice. To address this, we trained another 4 rats to perform the same discrimination task but recorded neuronal activity in the left AC (Fig. 6a). We analyzed 193 neurons, of which about a third (31%, 60/193) responded to both auditory and visual cues (Fig. 6b). Similar to the right AC, the average response across neurons in the left AC preferred cues guiding the contralateral choice (Fig. 6c); in this case, however, the preferred cues were A3k, Vhz, and the audiovisual pairing A3kVhz. As shown in Fig. 6d, more auditory-responsive neurons favored the sound denoting the contralateral choice. The same was true for visual selectivity in visually responsive neurons (Fig. 6e). This strongly suggests that the preference was not simply due to a general bias toward a specific cue, but rather reflected the specific associations the animals learned during training.


Consistent with this cue biasing, differential multisensory integration was observed, with multisensory enhancement biased toward the A3k-Vhz pairing (Fig. 6f). This mirrors the finding in the right AC, where multisensory enhancement was biased toward the auditory-visual pairing guiding the contralateral choice. In audiovisual neurons, mean modality selectivity in the A3k-Vhz pairing was substantially higher than in the A10k-Vvt pairing (0.135±0.126 vs. -0.039±0.138; p<0.0001, paired t-test) (Fig. 6g). These findings suggest that AC neurons exhibit cue preference and biased multisensory enhancement based on the learned association between cues and behavioral choice. Such a mechanism could enable the brain to integrate cues of different modalities into a common behavioral response.

Unisensory training does not replicate multisensory training effects

Our data suggest that most AC audiovisual neurons exhibited a visual preference consistent with their auditory preference following consistent multisensory discrimination training. Additionally, these neurons developed selective multisensory enhancement for a specific audiovisual stimulus pairing. To investigate whether these properties stemmed solely from long-term multisensory discrimination training, we trained a new group of animals (n=3) first on auditory and then on visual discriminations (Fig. 7a). These animals did not receive task-related multisensory associations during the training period. We then examined the response properties of neurons recorded in the right AC when well-trained animals performed auditory, visual and audiovisual discrimination tasks.


Among recorded AC neurons, 28% (49/174) responded to visual target cues (Fig. 7b). Unlike the multisensory training group, most visually responsive neurons in the unisensory training group lacked a visual preference, regardless of whether they were visually-only responsive or audiovisual (Fig. 7c). Interestingly, similar to the multisensory training group, nearly half of the recorded neurons (47%, 75/159) demonstrated clear auditory discrimination, with most (80%, 60 out of 75) favoring the sound guiding the contralateral choice (Fig. 7d). Furthermore, the unisensory training group did not exhibit population-level multisensory enhancement (n=174, Fig. 7f). The mean modality selectivity for the A3k-Vhz and A10k-Vhz pairings showed no significant difference (p = 0.327, paired t-test), with values of -0.02±0.12 and -0.01±0.13, respectively (Fig. 7g). Even among audiovisual neurons (n=37), multisensory integration did not differ significantly between the two pairings (A3k-Vhz vs. A10k-Vhz: 0.002±0.126 vs. 0.017±0.147; p=0.63; Fig. 7g). These findings suggest that the development of multisensory enhancement for specific audiovisual cues and the alignment of auditory and visual preferences likely depend on the association of auditory and visual stimuli with the corresponding behavioral choice during multisensory training. Unisensory training alone cannot replicate these effects.

In this study, we investigated how AC neurons integrate auditory and visual inputs to discriminate multisensory objects and whether this integration reflects the congruence between auditory and visual cues. Our findings reveal that most AC neurons exhibited distinct integrative patterns for different auditory-visual pairings, with multisensory enhancement observed primarily for favored pairings, indicating that AC neurons show significant selectivity in their integrative processing. Furthermore, AC neurons effectively discriminated between matched and mismatched auditory-visual pairings, highlighting the crucial role of the AC in multisensory object recognition and discrimination. Interestingly, a subset of auditory neurons not only developed visual responses but also exhibited congruence between auditory and visual selectivity. This suggests that multisensory perceptual training leads to auditory-visual binding and establishes a memory trace of the trained audiovisual experiences within the AC. Sensory cortices, like the AC, may act as a vital bridge for communicating sensory information across different modalities.

Numerous studies have explored cross-modal interactions in sensory cortices1,3,4,6,19,33. Recent research has highlighted the critical role of cross-modal modulation in shaping the stimulus characteristics encoded by sensory cortical neurons. For instance, sound has been shown to refine orientation tuning in layer 2/3 neurons of the primary visual cortex7, and a cohesive visual stimulus can enhance sound representation in the AC5. Despite these findings, the functional role and patterns of cross-modal interaction during perceptual discrimination remain unclear. In this study, we made the noteworthy discovery that AC neurons employed a nonuniform mechanism to integrate visual and auditory stimuli while well-trained rats performed a multisensory discrimination task. This differential integration pattern improved multisensory discrimination. Notably, the pattern did not manifest during the choice-free task. These findings indicate that multisensory integration in sensory cortices is not static but rather task-dependent. We propose that task-related differential integration patterns may not be exclusive to sensory cortices but could represent a general model in the brain.

Consistent with prior research10,30, the majority of AC neurons exhibited a selective preference for the cue linked to the contralateral choice, irrespective of the sensory modality. We propose that such cue-preference biasing helps animals learn associations between cues and actions. This learning may involve the formation of new connections between sensory and motor areas of the brain (cortico-cortical pathways) driven by sensorimotor association learning34. This biasing was not observed in the choice-free group. Similar biasing was also reported in a previous study, where auditory discrimination learning preferentially potentiated corticostriatal synapses from neurons representing either high or low frequencies associated with contralateral choices31. These findings underscore the significant impact of associative learning on cue encoding mechanisms, specifically highlighting the integration of stimulus discrimination and behavioral choice.

Prior work has investigated how neurons in sensory cortices discriminate unisensory cues in perceptual tasks35-37. It is well-established that when animals learn the association of sensory features with corresponding behavioral choices, cue representations in early sensory cortices undergo significant changes24,38-40. For instance, visual learning shapes cue-evoked neural responses and increases visual selectivity through changes in the interactions and correlations within visual cortical neurons 41,42. Consistent with these previous studies, we found that auditory discrimination in the AC of well-trained rats substantially improved during the task.

In this study, we extended our investigation to include multisensory discrimination. We discovered that when rats performed the multisensory discrimination task, AC neurons exhibited robust multisensory selectivity, responding strongly to one auditory-visual pairing and showing weak or negligible responses to the other, similar to their behavior in auditory discrimination. Audiovisual neurons demonstrated higher selectivity in multisensory trials compared to auditory trials, driven by the differential integration pattern. Additionally, more AC neurons were involved in auditory-visual discrimination compared to auditory discrimination alone, suggesting that the recruitment of additional neurons may partially explain the higher behavioral performance observed in multisensory trials. A previous study indicates that cross-modal recruitment of more cortical neurons also enhances perceptual discrimination43.

Our study explored how multisensory discrimination training influences visual processing in the AC. We observed that well-trained rats exhibited a higher number of AC neurons responding to visual cues compared to untrained rats. This suggests that multisensory training enhances the AC’s ability to process visual information. Our result is in line with an increasing number of studies showing that multisensory perceptual learning can effectively drive plastic changes in both sensory and association cortices44,45. For instance, a calcium imaging study in mice showed that a subset of multimodal neurons in mouse visual cortex develops enhanced auditory responses to the paired auditory stimulus following coincident auditory–visual experience17. A study conducted on the gustatory cortex of alert rats has shown that cross-modal associative learning was linked to a dramatic increase in the prevalence of neurons responding to nongustatory stimuli16.

Among neurons responding to both auditory and visual stimuli, a congruent visual and auditory preference emerged during multisensory discrimination training, as opposed to unisensory discrimination training. These neurons primarily favored visual cues that matched their preferred sound. Interestingly, this preference was not observed in neurons solely responsive to visual targets. The strength of a visual response seems to be contingent upon the paired auditory input received by the same neuron. This aligns with known mechanisms in other brain regions, where learning strengthens or weakens connections based on experience17,46. This synchronized auditory-visual selectivity may be a way for AC to bind corresponding auditory and visual features, potentially forming memory traces for learned multisensory objects. These findings suggest that multisensory training may foster the formation of specialized neural circuits within AC. These circuits enable neurons to process related auditory and visual information together. However, further research is needed to determine if AC is the initial site for these circuit modifications.

There is ongoing debate about whether cross-sensory responses in sensory cortices predominantly reflect sensory inputs or are influenced by behavioral factors, such as cue-induced body movements. A recent study showed that sound-clip-evoked activity in the visual cortex has a behavioral rather than sensory origin and is related to stereotyped movements47. Several studies have demonstrated that sensory neurons can encode signals associated with whisking48, running49, pupil dilation50 and other movements51. In our study, however, we believe that the activity evoked by the flashed light bar in the AC reflects sensory inputs rather than behavioral modulation through visually evoked body or orofacial movements. The observed responses to visual targets occurred mainly within a 100 ms temporal window following cue onset, which precedes the onset of rat body movement. This temporal dissociation strengthens the case for a sensory origin, aligning with other research showing that motor-related activity triggered by sound begins much later in the visual cortex52. Additionally, movement-evoked responses are generally stereotyped, whereas the visual responses we observed were cue-selective.

Additionally, our study sheds light on the role of semantic-like information in multisensory integration. During training, we created an association between specific auditory and visual stimuli, as both signaled the same behavioral choice. This setup mimics real-world scenarios in which visual and auditory cues possess semantic coherence, such as an image of a cat paired with the sound of a "meow." Previous research has shown that semantically congruent multisensory stimuli enhance behavioral performance, while semantically incongruent stimuli either show no enhancement or result in decreased performance53. Intriguingly, our findings revealed a more nuanced role for semantic information: while AC neurons displayed multisensory enhancement for the preferred congruent audiovisual pairing, this was not observed for the other congruent pairing. This suggests that simple semantic similarity may not be sufficient for enhanced multisensory integration. However, the strength of multisensory enhancement itself served as a key indicator for differentiating between matched and mismatched cues. These findings provide compelling evidence that the nature of the semantic-like information plays a vital role in modifying multisensory integration at the neuronal level.

Animals

The animal procedures conducted in this study were ethically approved by the Local Ethical Review Committee of East China Normal University and followed the guidelines outlined in the Guide for the Care and Use of Laboratory Animals of East China Normal University. Twenty-five adult male Long-Evans rats, weighing approximately 250g, were obtained from the Shanghai Laboratory Animal Center (Shanghai, China) and used as subjects for the experiments. Of these rats, 14 were assigned to the multisensory discrimination task experiments, 3 to the unisensory discrimination task, and 8 to the control free-choice task experiments. The rats were group-housed with no more than four rats per cage and maintained on a regular light-dark cycle. They underwent water deprivation for two days before the start of behavioral training, with water provided exclusively inside the training box on training days. Training sessions were conducted six days per week, each lasting between 50 to 80 minutes, and were held at approximately the same time each day to minimize potential circadian variations in performance. The body weight of the rats was carefully monitored throughout the study, and supplementary water was provided to those unable to maintain a stable body weight from task-related water rewards.

Behavioral apparatus

All experiments were conducted in a custom-designed operant chamber, measuring 50×30×40 cm (length × width × height), with an open-top design. The chamber was placed in a sound-insulated double-walled room, and the inside walls and ceiling were covered with 3 inches of sound-absorbing foam to minimize external noise interference. One sidewall of the operant chamber was equipped with three snout ports, each monitored by a photoelectric switch (see Fig. 1A).

Automated training procedures were controlled using a real-time control program developed in MATLAB (MathWorks, Natick, MA, USA). The auditory signals generated by the program were sent to an analog-digital multifunction card (NI USB 6363, National Instruments, Austin, TX, USA), amplified by a power amplifier (ST-601, SAST, Zhejiang, China), and delivered through a speaker (FS Audio, Zhejiang, China). The auditory stimuli consisted of 300-ms pure tones (15-ms onset/offset ramps) with a frequency of 3 kHz or 10 kHz. The sound intensity was set at 60 dB sound pressure level (SPL) against an ambient background of 35-40 dB SPL. SPL measurements were taken at the position of the central port, which served as the starting position for the rats.
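As an illustration of the stimulus parameters above, the following MATLAB sketch synthesizes one of the target tones; the sample rate, variable names, and calibration factor are assumptions for illustration and are not taken from the authors' control program.

% Sketch: synthesize a 300-ms pure tone with 15-ms raised-cosine ramps.
fs      = 192e3;                       % sample rate in Hz (assumed)
dur     = 0.300;                       % tone duration (s)
rampDur = 0.015;                       % ramp duration (s)
freq    = 10e3;                        % 3e3 for A3k, 10e3 for A10k

t    = (0:round(dur*fs)-1) / fs;
tone = sin(2*pi*freq*t);

nRamp = round(rampDur*fs);             % apply raised-cosine onset/offset ramps
ramp  = 0.5 * (1 - cos(pi*(0:nRamp-1)/(nRamp-1)));
env   = ones(size(t));
env(1:nRamp)         = ramp;
env(end-nRamp+1:end) = fliplr(ramp);
tone  = tone .* env;

calFactor = 0.1;                       % placeholder gain; must be calibrated to 60 dB SPL at the central port
sound(tone * calFactor, fs);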

The visual cue was generated using two custom-made devices located on each side in front of the central port. Each device consisted of two arrays of closely aligned light-emitting diodes (LEDs) arranged in a cross pattern. The light emitted by the vertical or horizontal LED array passed through a piece of ground glass, forming the corresponding vertical or horizontal light bar, each measuring 6×0.8 cm (Fig. 1). As stimuli, the light bars were illuminated for 300 ms at an intensity of 2-3 cd/m2. The audiovisual (multisensory) cue was the simultaneous presentation of the auditory and visual stimuli.

Multisensory discrimination task

The rats were trained to perform a cue-guided two-alternative forced-choice task, modified from previously published protocols10,30. Each trial began with the rat placing its nose into the center port. Following a short variable delay (500-700 ms) after nose entry, a randomly selected stimulus was presented. Upon cue presentation, the rats were allowed to initiate their behavioral choice by moving to either the left or right port (Fig. 1A). The training consisted of two stages. In the first stage, which typically lasted 3-5 weeks, the rats were trained to discriminate between two audiovisual cues. In the second stage, an additional four unisensory cues were introduced, training the rats to discriminate a total of six cues (two auditory, two visual, and two audiovisual). This stage also lasted approximately 3-5 weeks.

During the task, the rats were rewarded with a drop (15-20 μl) of water when they moved to the left reward port following the presentation of the 10 kHz pure tone, the vertical light bar, or their combination. For trials involving the 3 kHz tone, the horizontal light bar, or their combination, the rats were rewarded when they moved to the right reward port. Incorrect choices or failures to choose within 3 seconds after cue onset resulted in a timeout punishment of 5-6 seconds. Typically, the rats completed between 300 and 500 trials per day. They were trained to achieve a competency level of more than 80% correct overall and more than 70% correct in each cue condition across three consecutive sessions before the surgical implantation of recording electrodes.
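For clarity, the cue-to-port contingency and the main trial parameters described above can be summarized in a short MATLAB sketch; the variable names and structure are illustrative and are not taken from the authors' task-control code.

% Hypothetical summary of the reward contingency and trial parameters.
cues        = {'A3k', 'A10k', 'Vhz', 'Vvt', 'A3kVhz', 'A10kVvt'};
correctPort = containers.Map(cues, {'right', 'left', 'right', 'left', 'right', 'left'});

params.responseWindow = 3.0;           % s allowed to report a choice after cue onset
params.timeoutDur     = 5.0;           % s timeout after an error (5-6 s in the text)
params.rewardVolume   = [15 20];       % microliters of water per correct trial

cue = cues{randi(numel(cues))};        % a cue is drawn at random on each trial
fprintf('Cue %s -> reward at the %s port\n', cue, correctPort(cue));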

The correct rate was calculated as the percentage of completed trials in which the rat chose the correct port:

correct rate (%) = 100 × (number of correct trials) / (number of correct trials + number of incorrect trials)

The reaction time was defined as the temporal gap between the cue onset and the time when the rat withdrew its nose from the infrared beam in the cue port.

Unisensory discrimination task

Similar to the multisensory discrimination task, the rats first learned to discriminate between two auditory cues. Once their performance in the auditory discrimination task exceeded 75% correct, the rats then learned to discriminate between two visual cues. The auditory and visual cues used were the same as those in the multisensory discrimination task.

No cue discrimination choice-free task

In this task, the rats were not required to discriminate the cues presented. They received a water reward at either port after the onset of the cue. The six cues used were the same as those in the multisensory discrimination task. One week of training was sufficient for the rats to learn this no cue discrimination choice-free task.

Assembly of tetrodes

The tetrodes were constructed using Formvar-insulated nichrome wire (bare diameter: 17.78 μm, A-M Systems, WA, USA) twisted in groups of four. Two 20 cm-long wires were folded in half over a horizontal bar to facilitate twisting. The ends of the wires were clamped together and manually twisted in a clockwise direction. The insulation coating of the twisted wires was then fused using a heat gun at the desired twist level. Subsequently, the twisted wire was cut in the middle to produce two tetrodes. To enhance the longitudinal stability of each tetrode, it was inserted into polyimide tubing (inner diameter: 0.045 inches; wall: 0.005 inches; A-M Systems, WA, USA) and secured in place using cyanoacrylate glue. An array of 2×4 tetrodes was assembled, with an inter-tetrode gap of 0.4-0.5 mm. After assembly, the insulation coating at the tip of each wire was gently removed, and the exposed wire was soldered to a connector pin. For the reference electrode, a nichrome wire with a diameter of 50.8 μm (A-M Systems, WA, USA) was used, with its tip exposed. A piece of copper wire with a diameter of 0.1 mm served as the ground electrode. Each of these electrodes was also soldered to a corresponding pin on the connector. The tetrodes, reference electrode, and ground electrode, along with their respective pins, were carefully arranged and secured using silicone gel. Immediately before implantation, the tetrodes were trimmed to an appropriate length.

Electrode Implantation

Prior to surgery, the animal received a subcutaneous injection of atropine sulfate (0.01 mg/kg) to reduce bronchial secretions. The animal was then anesthetized with an intraperitoneal (i.p.) injection of sodium pentobarbital (40-50 mg/kg) and securely positioned in a stereotaxic apparatus (RWD, Shenzhen, China). An incision was made in the scalp, and the temporal muscle was carefully retracted. Subsequently, a craniotomy and durotomy were performed to expose the target brain region. Using a micromanipulator (RWD, Shenzhen, China), the tetrode array was slowly advanced into the right primary auditory cortex at coordinates 3.5-5.5 mm posterior to bregma and 6.4 mm lateral to the midline. The craniotomy was then sealed with tissue gel (3M, Maplewood, MN, USA). The tetrode array was secured to the skull using stainless steel screws and dental acrylic. Following the surgery, animals received a 4-day prophylactic course of antibiotics (Baytril, 5 mg/kg body weight, Bayer, Whippany, NJ, USA). They were allowed a recovery period of at least 7 days (typically 9-12 days) with free access to food and water.

Neural recordings

After rats had sufficiently recovered from the surgery, they resumed performing the same behavioral task they were trained to accomplish before surgery. Recording sessions were initiated once the animals’ behavioral performance had returned to the level achieved before the surgery (typically within 2-3 days). Wideband neural signals in the range of 300-6000 Hz were recorded using the AlphaOmega system (AlphaOmega Instruments, Nazareth Illit, Israel). The amplified signals (×20) were digitized at a sampling rate of 25 kHz. These neural signals, along with trace signals representing the stimuli and session performance information, were transmitted to a PC for online observation and data storage.

Additionally, we recorded neural responses from well-trained rats under anesthesia. Anesthesia was induced with an intraperitoneal injection of sodium pentobarbital (40 mg/kg body weight) and maintained throughout the experiment by continuous intraperitoneal infusion of sodium pentobarbital (0.004-0.008 g/kg/h) using an automatic microinfusion pump (WZ-50C6, Smiths Medical, Norwell, MA, USA). The anesthetized rats were placed in the behavioral training chamber with their heads positioned in the cue port to mimic cue triggering as it occurred during task engagement. Body temperature was maintained at 37.5 °C with a heating blanket. The same auditory, visual, and audiovisual stimuli used during task engagement were presented in random order, and 40-60 trials of neural responses were recorded for each cue condition.

Analysis of electrophysiological data

The raw neural signals were recorded and saved for subsequent offline analysis. Spike sorting was performed using Spike 2 software (CED version 8, Cambridge, UK). Initially, the recorded raw neural signals were band-pass filtered in the range of 300-6000 Hz to eliminate field potentials. A threshold criterion, set at no less than three times the standard deviation (SD) above the background noise, was applied to identify spike peaks. The detected spike waveforms were then subjected to clustering using template-matching and principal component analysis. Waveforms with inter-spike intervals of less than 2.0 ms were excluded from further analysis. Spike trains corresponding to an individual unit were aligned to the onset of the stimulus and grouped based on different cue and choice conditions. Units were included for further analysis only if their overall mean firing rate within the session was at least 2 Hz. To generate peristimulus time histograms (PSTHs), the spike trains were binned at a resolution of 10 ms, and the average firing rate in each bin was calculated. The resulting firing rate profile was then smoothed using a Gaussian kernel with a standard deviation (σ) of 100 ms. In order to normalize the firing rate, the mean firing rate and SD during a baseline period (a 400-ms window preceding cue onset) were used to convert the averaged firing rate of each time bin into a Z-score.
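A minimal MATLAB sketch of the PSTH construction and baseline Z-scoring described above is given below; the variable names, the analysis window, and the exact smoothing implementation are assumptions for illustration.

% Sketch: PSTH with 10-ms bins, Gaussian smoothing (sigma = 100 ms), baseline Z-score.
% spikeTimes is assumed to be a cell array of spike times (s) per trial, aligned to cue onset.
binWidth = 0.010;
edges    = -0.4:binWidth:0.6;               % 400-ms baseline plus a post-cue window (window length assumed)
nTrials  = numel(spikeTimes);

counts = zeros(nTrials, numel(edges)-1);
for k = 1:nTrials
    counts(k,:) = histcounts(spikeTimes{k}, edges);
end
rate = mean(counts, 1) / binWidth;          % trial-averaged firing rate (spikes/s) per bin

sigmaBins = 0.100 / binWidth;               % Gaussian kernel, sigma = 100 ms = 10 bins
half      = round(4 * sigmaBins);
kern      = exp(-(-half:half).^2 / (2 * sigmaBins^2));
kern      = kern / sum(kern);
rateSm    = conv(rate, kern, 'same');

baseBins = edges(1:end-1) < 0;              % bins in the 400-ms pre-cue baseline
zRate    = (rateSm - mean(rateSm(baseBins))) / std(rateSm(baseBins));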

Population decoding

To evaluate population discrimination for paired stimuli (cue_A vs. cue_B), we trained a Support Vector Machine (SVM) classifier with a linear kernel to predict cue selectivity. The SVM classifier was implemented using the "fitcsvm" function in Matlab. In this analysis, spike counts for each neuron in correct trials were grouped based on the triggered cues and binned into a 100 ms window with a 10 ms resolution. To minimize overfitting, only neurons with more than 30 trials for each cue were included. All these neurons were combined to form a pseudo population. The responses of the population neurons were organized into an M × N × T matrix, where M is the number of trials, N is the number of neurons, and T is the number of bins. For each iteration, 30 trials were randomly selected for each cue from each neuron. During cross-validation, 90% of the trials were randomly sampled as the training set, while the remaining 10% were used as the test set. The training set was used to compute the linear hyperplane that optimally separated the population response vectors corresponding to cue_A vs cue_B trials. The performance of the classifier was calculated as the fraction of correctly classified test trials, using 10-fold cross-validation procedures. To ensure robustness, we repeated the resampling process 100 times and computed the mean and standard deviation of the decoding accuracy across the 100 resampling iterations. Decoders were trained and tested independently for each bin. To assess the significance of decoding accuracy exceeding the chance level, shuffled decoding procedures were conducted by randomly shuffling the trial labels for 1000 iterations.
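The decoding step for a single time bin can be sketched as follows; X, y, and the standardization option are assumptions for illustration (the text specifies fitcsvm with a linear kernel, 10-fold cross-validation, and 1000 label shuffles, of which only 100 are run here for brevity).

% Sketch: cross-validated SVM decoding of cue_A vs. cue_B for one time bin.
% X is an M x N matrix of spike counts (M trials, N pseudo-population neurons); y holds the cue labels.
mdl      = fitcsvm(X, y, 'KernelFunction', 'linear', 'Standardize', true);
cvmdl    = crossval(mdl, 'KFold', 10);
accuracy = 1 - kfoldLoss(cvmdl);             % fraction of correctly classified held-out trials

nShuffle = 100;                              % 1000 iterations in the actual analysis
accShuf  = zeros(nShuffle, 1);
for s = 1:nShuffle
    yShuf = y(randperm(numel(y)));           % shuffle trial labels to estimate chance
    cvS   = crossval(fitcsvm(X, yShuf, 'KernelFunction', 'linear', 'Standardize', true), 'KFold', 10);
    accShuf(s) = 1 - kfoldLoss(cvS);
end
pValue = mean(accShuf >= accuracy);          % permutation p-value for above-chance decoding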

Cue selectivity

To quantify cue (auditory, visual, and multisensory) selectivity between two different cue conditions (e.g., low-tone trials vs. high-tone trials), we employed a receiver operating characteristic (ROC)-based analysis, following the method described in a previous study54. This approach allowed us to assess the difference between responses in cue_A and cue_B trials. First, we established 12 threshold levels of neural activity that covered the range of firing rates obtained in both cue_A and cue_B trials. For each threshold criterion, we plotted the proportion of cue_A trials in which the neural response exceeded the criterion against the proportion of cue_B trials exceeding the same criterion. This process generated an ROC curve, from which we calculated the area under the ROC curve (auROC). The cue selectivity value was then defined as 2 * (auROC - 0.5). A cue selectivity value of 0 indicated no difference in the distribution of neural responses between cue_A and cue_B, signifying similar responsiveness to both cues. Conversely, a value of 1 or -1 represented the highest selectivity, indicating that responses triggered by cue_A were consistently higher or lower than those evoked by cue_B, respectively. To determine the statistical significance of the observed cue selectivity, we conducted a two-tailed permutation test with 2000 permutations. We randomly reassigned the neural responses to cue_A and cue_B trials and recalculated the cue selectivity value for each permutation. This generated a null distribution from which we calculated the probability of obtaining a value at least as extreme as the observed one; selectivity was deemed significant when this probability was below 0.05.
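The cue selectivity computation can be sketched in a few lines of MATLAB; here perfcurve (Statistics and Machine Learning Toolbox) is used in place of the explicit 12-threshold construction described above, and respA/respB are assumed vectors of single-trial responses.

% Sketch: ROC-based cue selectivity, defined as 2 * (auROC - 0.5), with a permutation test.
labels = [ones(numel(respA),1); zeros(numel(respB),1)];
scores = [respA(:); respB(:)];

[~, ~, ~, auc] = perfcurve(labels, scores, 1);
selectivity = 2 * (auc - 0.5);               % 0 = no preference; +1/-1 = maximal preference

nPerm   = 2000;
nullSel = zeros(nPerm, 1);
for p = 1:nPerm
    shuffled = labels(randperm(numel(labels)));       % reassign responses to cue_A / cue_B at random
    [~, ~, ~, aucP] = perfcurve(shuffled, scores, 1);
    nullSel(p) = 2 * (aucP - 0.5);
end
pValue = mean(abs(nullSel) >= abs(selectivity));      % two-tailed permutation p-value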

Comparison of actual and predicted multisensory responses

We conducted a comprehensive analysis to compare the observed multisensory responses with predicted values. The predicted multisensory response was calculated as either the sum of the visual and auditory responses or as a coefficient multiplied by this sum. To achieve this, we first computed the mean observed multisensory response by averaging across audiovisual trials. We then created a benchmark distribution of predicted multisensory responses by iteratively calculating possible predictions. In each iteration, we randomly selected (without replacement) the same number of trials as used in the actual experiment for both the auditory and visual conditions. The responses from the selected auditory and visual trials were averaged to obtain mean predicted auditory and visual responses, which were combined to create a predicted multisensory response. This process was repeated 5,000 times to generate a reference distribution of predicted multisensory responses. By comparing the actual mean multisensory response to this distribution, we expressed their relationship as a Z-score, allowing us to quantitatively classify multisensory interactions as subadditive, additive, or superadditive.
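A minimal sketch of the additive-prediction bootstrap is shown below; audResp, visResp, and avResp are assumed vectors of single-trial responses, and the number of trials drawn per iteration is assumed to equal the number of audiovisual trials.

% Sketch: bootstrap distribution of predicted (additive) multisensory responses.
nBoot = 5000;
nDraw = numel(avResp);                         % trials drawn per iteration (assumed; requires enough unisensory trials)
predicted = zeros(nBoot, 1);
for b = 1:nBoot
    aSamp = audResp(randperm(numel(audResp), nDraw));    % sample auditory-only trials without replacement
    vSamp = visResp(randperm(numel(visResp), nDraw));    % sample visual-only trials without replacement
    predicted(b) = mean(aSamp) + mean(vSamp);            % additive prediction
end

obs = mean(avResp);                            % observed mean multisensory response
z   = (obs - mean(predicted)) / std(predicted);          % Z-score relative to the additive prediction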

Histology

Following the final data recording session, the precise tip position of the recording electrode was marked by creating a small DC lesion (-30 μA for 15 s). Subsequently, the rats were deeply anesthetized with sodium pentobarbital (100 mg/kg) and underwent transcardial perfusion with saline for several minutes, followed immediately by phosphate-buffered saline (PBS) containing 4% paraformaldehyde (PFA). The brains were carefully extracted and immersed in the 4% PFA solution overnight. To ensure optimal tissue preservation, the fixed brain tissue underwent cryoprotection in PBS with a 20% sucrose solution for at least three days. Afterward, the brain tissue was coronally sectioned using a freezing microtome (Leica, Wetzlar, Germany) with a slice thickness of 50 μm. The resulting sections, which contained the auditory cortex, were stained with methyl violet to verify the lesion sites and/or the trace of electrode insertion within the primary auditory cortex.

Statistical analysis

We also used ROC analysis to calculate modality selectivity, which quantifies the difference between the multisensory response and the corresponding stronger unisensory response. All statistical analyses were conducted in MATLAB, and statistical significance was defined as a P value of < 0.05. To determine the responsiveness of AC neurons to sensory stimuli, neurons exhibiting evoked responses greater than 2 spikes/s within a 0.3-s window after stimulus onset, and significantly higher than the baseline response (P < 0.05, Wilcoxon signed-rank test), were included in the subsequent analysis. For behavioral data, such as mean reaction time differences between unisensory and multisensory trials, and for cue selectivity and mean modality selectivity across different auditory-visual conditions, comparisons were performed using either the Wilcoxon signed-rank test or the paired t-test, as appropriate. We performed a chi-square test to analyze the difference in the proportions of neurons responding to visual stimuli between the multisensory discrimination and choice-free groups. Correlation values were computed using Pearson’s correlation. All data are presented as mean ± SD for the respective groups.
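The responsiveness criterion above can be expressed as a short MATLAB check; baseRate and evokedRate are assumed per-trial firing-rate vectors for the 400-ms baseline and the 0-0.3 s post-cue window of one neuron and one cue.

% Sketch: inclusion criterion for sensory responsiveness.
pResp = signrank(evokedRate, baseRate);        % Wilcoxon signed-rank test, evoked vs. baseline
isResponsive = mean(evokedRate) > 2 ...        % evoked response greater than 2 spikes/s
            && mean(evokedRate) > mean(baseRate) ...   % and higher than the baseline response
            && pResp < 0.05;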

This work was supported by grants from the “STI2030-major projects” (2021ZD0202600), Natural Science Foundation of China (32371046, 31970925, 32271057).

Author contributions

L.Y. and J.X. supervised and directed this project. L.Y. wrote and revised the manuscript. S.C. performed the experimental studies. J.X., L.K. and P.Z. contributed to discussions of the study and to the writing of the manuscript.

Competing interests

The authors declare no competing interests.
