|
# Generating EEG features from Acoustic features |
|
|
|
Gautam Krishna
_Brain Machine Interface Lab_
_The University of Texas at Austin_
Austin, Texas

Co Tran
_Brain Machine Interface Lab_
_The University of Texas at Austin_
Austin, Texas

Mason Carnahan*
_Brain Machine Interface Lab_
_The University of Texas at Austin_
Austin, Texas
|
|
|
###### Abstract |
|
|
|
In this paper we demonstrate predicting electroencephalography (EEG) features from acoustic features using a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN). We predict various types of EEG features from acoustic features. We compare our results with the previously studied problem of speech synthesis using EEG, and our results demonstrate that EEG features can be generated from acoustic features with lower root mean square error (RMSE) and normalized RMSE values than those obtained when generating acoustic features from EEG features (i.e., speech synthesis using EEG) on the same data sets.
|
|
|
electroencephalography (EEG), deep learning |
|
|
|
## I Introduction |
|
|
|
Electroencephalography (EEG) is a non-invasive way of measuring the electrical activity of the human brain. EEG sensors are placed on the scalp of a subject to obtain the EEG recordings. The references [1, 2, 3] demonstrate that EEG features can be used to perform isolated and continuous speech recognition, where EEG signals recorded while subjects were speaking or listening are translated to text using automatic speech recognition (ASR) models. In [4] the authors demonstrated synthesizing speech from invasive electrocorticography (ECoG) signals using deep learning models. Similarly, in [2, 5] the authors demonstrated synthesizing speech from EEG signals using deep learning models and reported results for different types of EEG feature sets. Speech synthesis and speech recognition using EEG features might help restore speech for people with speaking disabilities or people who are unable to speak.
|
|
|
In this paper we investigate whether it is possible to predict EEG features from acoustic features. This problem can be formulated as the inverse of EEG-based speech synthesis, in which acoustic features are predicted from EEG features as demonstrated in [2, 5]. Predicting EEG features or signatures from unique acoustic patterns might lead to a better understanding of how the human brain processes speech perception and production. Recording EEG signals in a laboratory is a time-consuming and expensive process that requires specialized EEG sensors and amplifiers. A computer model that can generate EEG features from acoustic features might therefore also help speed up EEG data collection, since recording speech or audio signals is much easier, especially when collecting EEG data for speech recognition experiments.
|
|
|
In [6] the authors demonstrated medical time series generation using conditional generative adversarial networks [7] on toy data sets. Other related work includes [8], where the authors generated EEG for a motor task using Wasserstein generative adversarial networks [9]. Similarly, in [10] the authors generated synthetic EEG using various generative models for the task of steady state visual evoked potential classification, and in [11] the authors demonstrated EEG data augmentation for emotion recognition. Our work focuses only on generating EEG features from acoustic features.
|
|
|
We first performed experiments using the model used by the authors in [5] and then performed experiments using generative adversarial networks (GAN) [12]. In this work we predict the various EEG feature sets introduced in [2] from acoustic features extracted from the speech of the subjects, as well as from acoustic features extracted from the utterances the subjects were listening to.
|
|
|
Our results demonstrate that predicting EEG features from acoustic features appears to be easier than predicting acoustic features from EEG features: the root mean square error (RMSE) values at test time were much lower for predicting EEG features from acoustic features than for its inverse problem when tested on the same data sets. To the best of our knowledge, this is the first time predicting EEG features from acoustic features has been demonstrated using deep learning models.
|
|
|
## II Regression and GAN model |
|
|
|
The regression model we used in this work was very similar to the one used by the authors in [5]. We used the exact training parameters used in [5] for setting the batch size, number of training epochs, learning rate, etc. In [5] the authors used only gated recurrent unit (GRU) [13]
|
|
|
[MISSING_PAGE_FAIL:2] |
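As an illustration of the kind of recurrent regression model described above, a minimal sketch of a bidirectional GRU network mapping 13-dimensional MFCC frames to EEG feature frames is given below. The layer width, the EEG feature dimension and the optimizer settings are illustrative assumptions, not the exact values used in [5].

```python
# Minimal sketch of a recurrent regression model that maps acoustic (MFCC)
# frames to EEG feature frames. The layer width, EEG feature dimension and
# optimizer settings are illustrative assumptions, not the values from [5].
import tensorflow as tf

mfcc_dim = 13   # MFCC features per frame (see Section IV)
eeg_dim = 30    # assumed EEG feature dimension per frame

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, mfcc_dim)),          # variable-length utterances
    tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(128, return_sequences=True)),   # Bi-GRU over time steps
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(eeg_dim)),                     # per-frame EEG prediction
])

model.compile(optimizer="adam", loss="mse")  # RMSE is reported at test time
# model.fit(mfcc_train, eeg_train, validation_data=(mfcc_val, eeg_val), ...)
```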
|
|
|
## III Data Sets used for performing experiments |
|
|
|
We used the data set used by the authors in [5] for performing our experiments. The data set contains simultaneous speech and EEG recordings from four subjects. For each subject we used 80% of the data as the training set, 10% as the validation set and the remaining 10% as the test set. This was the main data set used in this work for comparisons. More details of the data set are covered in [5]. We will refer to this data set as data set A in this paper.
|
|
|
We also performed some experiments using the data set used by the authors in [2]. For this data set we did not perform experiments per subject; instead we used 80% of the total data as the training set, 10% as the validation set and the remaining 10% as the test set. The train-test split was done randomly. More details of the data set are covered in [2]. We will refer to this data set as data set B in this paper.
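For illustration, a random 80/10/10 split of utterance indices can be obtained as sketched below; the function name, random seed and number of examples are hypothetical.

```python
import numpy as np

def split_indices(num_examples, seed=0):
    """Randomly split example indices into 80% train, 10% validation, 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_examples)
    n_train = int(0.8 * num_examples)
    n_val = int(0.1 * num_examples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# e.g. 200 recorded examples (hypothetical count)
train_idx, val_idx, test_idx = split_indices(200)
```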
|
|
|
The EEG data in these data sets were recorded using wet EEG electrodes. In total 32 EEG sensors were used, including one electrode as ground, as shown in Figure 5. A Brain Products ActiChamp EEG amplifier was used to collect the data.
|
|
|
## IV EEG feature extraction details |
|
|
|
We followed the same preprocessing methods used by the authors in [1, 2, 3, 5] for preprocessing the EEG and speech data.
|
|
|
The EEG signals were sampled at 1000 Hz and a fourth-order IIR band-pass filter with cut-off frequencies of 0.1 Hz and 70 Hz was applied. A notch filter with a cut-off frequency of 60 Hz was used to remove power line noise. EEGLAB's [14] independent component analysis (ICA) toolbox was used to remove biological signal artifacts such as electrocardiography (ECG), electromyography (EMG) and electrooculography (EOG) from the EEG signals. We then extracted the three EEG feature sets explained by the authors in [2]; the details of each EEG feature set are covered in [2]. Each EEG feature set was extracted at a sampling frequency of 100 Hz for each EEG channel [3].
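As an illustration of the filtering steps described above, a SciPy-based sketch is shown below. The Butterworth design and the notch quality factor are assumptions, and the ICA-based artifact removal performed with EEGLAB is not reproduced here.

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

fs = 1000.0  # EEG sampling rate in Hz

# Fourth-order IIR band-pass filter with 0.1 Hz and 70 Hz cut-offs
# (a Butterworth design is assumed here).
b_bp, a_bp = butter(N=4, Wn=[0.1, 70.0], btype="bandpass", fs=fs)

# Notch filter at 60 Hz to suppress power line noise (the Q factor is an assumption).
b_notch, a_notch = iirnotch(w0=60.0, Q=30.0, fs=fs)

def preprocess_eeg(eeg):
    """eeg: array of shape (channels, samples) recorded at 1000 Hz."""
    eeg = filtfilt(b_bp, a_bp, eeg, axis=-1)        # band-pass 0.1-70 Hz
    eeg = filtfilt(b_notch, a_notch, eeg, axis=-1)  # remove 60 Hz line noise
    # ICA-based removal of ECG/EMG/EOG artifacts (done with EEGLAB in the
    # paper) is not reproduced in this sketch.
    return eeg
```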
|
|
|
The recorded speech signal was sampled at 16 kHz. We extracted mel-frequency cepstral coefficients (MFCC) of dimension 13 as features for the speech signal. The MFCC features were also sampled at 100 Hz, the same as the sampling frequency of the EEG features.
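A sketch of this MFCC extraction step using librosa is shown below; the hop length follows from the 16 kHz sampling rate and the 100 Hz frame rate, while the window settings are left at library defaults and are assumptions.

```python
import librosa

def extract_mfcc(wav_path):
    """Return 13-dimensional MFCC frames at a 100 Hz frame rate."""
    audio, sr = librosa.load(wav_path, sr=16000)  # speech sampled at 16 kHz
    hop = sr // 100                               # 160 samples -> 100 frames per second
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, hop_length=hop)
    return mfcc.T                                 # shape: (num_frames, 13)
```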
|
|
|
Fig. 4: Bi-GRU training loss convergence |
|
|
|
Fig. 5: EEG channel locations for the cap used in our experiments |
|
|
|
Fig. 3: Discriminator in GAN Model |
|
|
|
Fig. 6: Generator training loss |
|
|
|
[MISSING_PAGE_FAIL:4] |
|
|
|
[MISSING_PAGE_FAIL:5] |