Vowel Space Area of Minangkabau Learners of English

Vowel space area (VSA) is an acoustic representation of the kinematic movements of the articulators and serves as an index of speech intelligibility. Using the VSA, the current study examines the role of Minangkabau in the acquisition of English as a second language. We conducted a speech production experiment involving ten English monophthongs in isolated sentences. We measured the formant frequency (F1/F2) values and plotted the vowel quadrilateral. The results showed that the Minangkabau learners of English did not have a VSA pattern similar to that of the native English speakers. They did not open their jaws or position their tongues in the same way as the native English speakers when pronouncing English vowels. The results are discussed in relation to second language acquisition.


Introduction
Adult second language (L2) learners tend to produce L2 phonetic segments differently from native speakers of the target language (Flege & Fletcher, 1992; McAllister, 1997). L2 learners are predicted to use their native language categories in L2 production, as the first language (L1) may interfere with L2 acquisition (Lado, 1957; Arabski, 2006). The interference may occur in their pronunciation patterns, which is called interference phonology (Crystal, 1987). The interference can be recognized in the formant frequency values. In general, formant frequencies are crucial for identifying the intelligibility and correct pronunciation of vowels (Peterson & Barney, 1952; Hillenbrand & Nearey, 1999). The activation of L2 vowels increases with the inhibition of L1 production (Jacewicz & Fox, 2012; Green & Abutalebi, 2013).
The influence of the L1 on the production of L2 vowels can be estimated through the vowel space area, or VSA (Flipsen & Lee, 2012). The VSA shows the spectral dimensions of tongue height (first formant, F1) and the anterior-posterior position of the tongue (second formant, F2) for each vowel (Kent & Read, 1992; Cruttenden, 2001). The first and second formants roughly relate to the size and shape of the oral cavities created by jaw opening and tongue position, and the VSA is an acoustic representation of the kinematic movements of the articulators (Lee & Shaiman, 2012). Generally, a larger VSA is associated with clearer and more intelligible speech than a smaller VSA (Bradlow & Bent, 2002).
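As an illustration of how a vowel space area can be quantified, the sketch below computes the area of a vowel polygon in the F2-F1 plane with the shoelace formula. It is a minimal example under stated assumptions, not the procedure reported in this study: the function name `polygon_area` and the four corner-vowel values are hypothetical placeholders, and the vertices are assumed to be listed in order around the polygon.

```python
def polygon_area(points):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical corner vowels as (F2, F1) pairs in Hz.
# These numbers are placeholders for illustration, not data from this study.
corner_vowels = [
    (2300.0, 300.0),  # high front vowel
    (1800.0, 700.0),  # low front vowel
    (1100.0, 750.0),  # low back vowel
    (900.0, 350.0),   # high back vowel
]

vsa = polygon_area(corner_vowels)
print(f"Vowel space area: {vsa:.1f} Hz^2")
```

The same function can be applied to Bark-converted formant means, in which case the area is expressed in squared Bark units rather than squared Hz.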
Ample research has been conducted on cross-language comparisons and on predictions of VSA as represented in formant frequencies. For instance, Iverson and Evans (2007) revealed that German, French, Spanish, and Norwegian learners of English successfully employed formant movement and duration to recognize English vowels. Earlier, Flege et al. (2003) found that early-low learners of English produced more formant movement in English vowels, while late learners produced less formant movement. Formant frequencies thus showed different characteristics for L1 and L2 vowels.
L2 acquisition by speakers of Indonesian regional languages remains understudied. Perwitasari et al. (2015) examined the interference of Javanese in English acquisition using a production experiment. The results showed that Javanese learners of English produced English vowels with durations significantly different from those of native English speakers. In view of this prior research, we investigate how Indonesian learners from other regional language backgrounds produce English vowels. The present study examines the role of the first language, in this case Minangkabau, in the acquisition of English as a second language.

Second Language Acquisition
Two second language acquisition theories, the Feature Hypothesis (McAllister, Flege, & Piske, 2002) and the Linguistic Desensitization Hypothesis (LDH) (Bohn, 1995), serve as the framework of the study.
The Feature Hypothesis (FH) predicts the acquisition of phonological features in L2 speech (McAllister, Flege, & Piske, 2002). The model posits that L2 features that are not contrastive in the L1 will be difficult to acquire. The difficulty will be reflected in low production accuracy for those L2 features. This hypothesis is part of Flege's Speech Learning Model (SLM, 1995). To test the prediction, McAllister et al. (2002) studied native speakers of Estonian, English, and Spanish who had been living in Sweden for 10 years. The study was designed to test the Feature Hypothesis on the acquisition of Swedish by those L1 speakers. Swedish has a complex quantity distinction that involves both temporal and spectral dimensions. Estonian makes more extensive use of segment duration than Swedish, English makes some use of it, and Spanish makes none. As predicted, the findings suggested that the Estonian L1 speakers performed equivalently to the Swedish L1 speakers, the English L1 speakers performed less successfully, and the native Spanish speakers performed least successfully.
In contrast, the Linguistic Desensitization Hypothesis (LDH) assumes that L2 learners are sensitive to durational cues when perceiving L2 vowels and predicts that vowel duration will be used to differentiate non-native vowel contrasts (Bohn, 1995). Bohn argued that adult L2 learners tend to rely more heavily on duration than on spectral quality when identifying synthetic English vowels. He designed a perception test of American English vowels for Spanish and German L1 learners. As German employs vowel contrasts in both temporal and spectral dimensions, the study showed that the German speakers used duration in the perception of vowel distinctions. Surprisingly, the native speakers of Spanish had no difficulty in perceiving the L2 vowels. This suggested that duration cues in vowel perception are easy to access whether or not listeners have had specific linguistic experience with them (Bohn, 1995, p. 294). Because vowel duration is easy to access and salient, the hypothesis predicts that L2 learners employ durational information, which is contrastive in the L1.

Minangkabau
Minangkabau (also called Minangkabaunese) is an Austronesian language spoken in West Sumatra. It has approximately seven million speakers. In 2007, 4,220,032 speakers of Minangkabau resided in West Sumatra (excluding the Mentawai Islands). In addition, the language is spoken in Negeri Sembilan (Malaysia), Muko-muko (Bengkulu), Tapaktuan (Aceh), and Pekanbaru and Taluk (Riau) (Jufrizal, 2007). Although the language has a large number of speakers, Minangkabau remains under-described compared with other indigenous languages of Indonesia, such as Javanese and Sundanese.
The Minangkabau vowel system has six monophthongs, /i, u, e, ə, o, a/, although there is allophonic variation in their realization (Almos, 2012). The language does not have word stress (Gil, 2006). Like other Malay varieties, Minangkabau does not distinguish vowels by duration. Moussay (1998) argues that most Minangkabau people are bilingual. They first speak Minangkabau as their mother tongue and then Indonesian as the national language. Minangkabau people are able to shift between the two languages easily in any circumstances. He also mentioned that fluency in two languages will gradually interfere with production in both languages. Correspondingly, Minangkabau speakers frequently use phonetic features of Minangkabau when they speak Indonesian, and vice versa.
The aim of this paper, therefore, is to (1) improve the current understanding of Minangkabau by describing the formant frequencies of its vowels, and (2) assess how Minangkabau speakers produce English vowels. Given that Minangkabau has a smaller number of vowels and lacks vowel length, does the native language affect the English production of Minangkabau speakers? If so, we argue that the Minangkabau learners of English would have difficulties in producing English vowels. As a result, their formant frequencies would not be native-like.

Participants
Ten native English speakers (five of whom were female) and ten Minangkabau learners of English (five of whom were female) participated in the study. The native English subjects, who served as the control group, were aged between 21 and 30 years at the time of testing. They came from various states in the United States of America. At the time of recording, the native English participants resided in Yogyakarta and were on a short stay in Indonesia.
The Minangkabau-English learners, who served as the experimental group, were aged between 21 and 25 years at the time of testing. They were Minangkabau, mainly from West Sumatra. They had started learning English between the ages of 7 and 11 and had received 9-16 years of English classes during their formal education, ranging from 2 to more than 4 hours per week. Some of the subjects had interrupted their English learning for about one to three years, at various ages. To ensure comparable English skills, we ascertained that all L2 learners in this study had studied English at university level. None of them had ever visited or stayed in an English-speaking country. We excluded participants if they (a) were not native speakers of Minangkabau or English, (b) pronounced the target stimuli incorrectly, or (c) showed any speech or voice disorder.

Stimuli
We used two different kinds of stimuli. First, we used Minangkabau words to investigate Minangkabau formant frequencies. The words were labi, bilah, belek, buto, and boto (Moussay, 1998, pp. 41-43). The words were embedded in the carrier sentence "Awak kecek an (word stimulus) baliak". Second, we used English words to investigate the speech production of the Minangkabau learners of English. The stimuli covered ten English monophthongs in the words bead, bid, bed, bad, bird, bud, body, bawd, Buddhist, and booed (Ladefoged, 2001). To elicit natural speech, the stimuli were inserted in the carrier sentence "I say (bVd) again". The carrier sentences were shown on the screen in random and sequential order. During the experiment, each sentence appeared twice.
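The construction of the English presentation list can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the randomization procedure, so the sketch assumes that every carrier sentence occurs twice and that the full list is shuffled. The function name `build_presentation_list` and the `seed` parameter are hypothetical.

```python
import random

# The ten English target words listed in the text.
ENGLISH_WORDS = [
    "bead", "bid", "bed", "bad", "bird",
    "bud", "body", "bawd", "Buddhist", "booed",
]


def build_presentation_list(words, repetitions=2, seed=None):
    """Embed each word in the carrier sentence and shuffle the repeated list.

    Assumption: full shuffling of all tokens; the study's actual ordering
    scheme is not described in enough detail to reproduce it.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible order
    sentences = [f"I say {word} again" for word in words] * repetitions
    rng.shuffle(sentences)
    return sentences


trials = build_presentation_list(ENGLISH_WORDS, repetitions=2, seed=0)
for sentence in trials:
    print(sentence)
```

A fixed seed makes the order reproducible across sessions; leaving `seed` as `None` would give each participant a different order.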

Procedure
Prior to the recording session, the participants went through some initial stages. First, they received a short introductory monologue containing the words to be recorded. Second, they were introduced to the experiment and the recording procedures in which they were involved. Afterwards, the participants were recorded in a sound-attenuated room. The recording used a digital audio recorder (H4N Zoom) and an adjustable microphone headset (Sennheiser PC 141) with 44.1 kHz/16-bit sampling. The microphone distance was set at approximately 3 cm in order to obtain a constant recording level throughout the session for every subject. The Minangkabau subjects were recorded in the Language Laboratory of Universitas Muhammadiyah Sumatera Barat. In the recording session, each participant sat in front of a computer display with the recording tools active (audio and video recorders and headset microphone). Once a stimulus appeared on the screen, the subject produced the sentence shown. The participants' speech production was documented and stored in a computer file.

Acoustic Analysis
The audio recordings were analysed acoustically. We used Praat 5.3.56 (Boersma & Weenink, 2013) for annotating the speech. The acoustic measurements focused on the formant frequency (F1 and F2) values. The vowel space area was derived from the estimated formant frequency values and then plotted. The formant frequency (F1 and F2) values were obtained by identifying the formant peaks at the chosen time point. Pitch values were computed automatically from the spectrogram display. F1 and F2 were measured at the midpoint of the steady state of the selected vowel and converted to the Bark scale using the following formula: Z_i = 26.81 / (1 + 1960 / F_i) - 0.53 (Traunmüller, 1988). After calculating the means of the formant frequencies (F1/F2), we plotted them on the vowel quadrilateral.
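The Bark conversion above can be sketched in a few lines. The formula is the one cited in the text (Traunmüller, 1988); the function names `hz_to_bark` and `mean_bark` and the sample formant values are hypothetical and serve only as an illustration, not as data from this study.

```python
def hz_to_bark(freq_hz):
    """Convert a frequency in Hz to Bark: Z = 26.81 / (1 + 1960 / F) - 0.53."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 / (1.0 + 1960.0 / freq_hz) - 0.53


def mean_bark(freqs_hz):
    """Mean of Bark-converted values for a list of formant measurements in Hz."""
    barks = [hz_to_bark(f) for f in freqs_hz]
    return sum(barks) / len(barks)


# Hypothetical F1 and F2 midpoint measurements (Hz) for two tokens of one vowel.
f1_tokens = [310.0, 295.0]
f2_tokens = [2250.0, 2310.0]

print(f"Mean F1: {mean_bark(f1_tokens):.2f} Bark")
print(f"Mean F2: {mean_bark(f2_tokens):.2f} Bark")
```

This sketch converts each token to Bark before averaging; averaging in Hz first and converting afterwards gives slightly different values because the transformation is nonlinear, and the text does not state which order was used.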

Vowel Space Area of Minangkabau L1
We conducted an acoustic investigation of the vowel space area defined by the formant frequencies of the Minangkabau sound system. We embedded the Minangkabau vowels in words and incorporated the words in sentences. The formant frequencies of the Minangkabau vowels are given in Table 1, and the vowel space area of the Minangkabau vowels is illustrated in Figure 1.

Vowel Space Area of Minangkabau-English L2
We measured the formant frequencies of the L1 Minangkabau and L1 English speakers pronouncing English vowels. The F1 and F2 values are listed in Table 2. For the F1 values, the main effect of group (L1 experience) was not significant [F(1, 180) = 0.634, p > 0.05], indicating no overall difference between the L1 Minangkabau and L1 English speakers. However, there was a main effect of vowel [F(9, 180) = 31.31, p < 0.001]. The interaction between group and vowel was significant as well [F(9, 180) = 19.6362, p < 0.001]. These results indicate that, for the F1 values (the degree of closing and opening of the oral cavity), the difference in vowel production between the L1 Minangkabau and L1 English speakers was not a uniform effect of experience with a specific linguistic concept through the native language, but depended on the particular non-native vowel. Some vowels appeared to create problems for the Minangkabau speakers regardless of the specific linguistic features that are absent in the L1. The interaction between L1 group and L2 vowel indicates that, for the F1 values, the impact of L1 group was modulated by the L2 vowel.
For the F2 values, there was a significant main effect of group [F(1, 180) = 25.86, p < 0.001]. The factor vowel was also significant [F(9, 180) = 53.52, p < 0.001], as was the interaction between group and vowel [F(9, 180) = 6.10, p < 0.001]. The results indicate that, for the F2 values, the L1 Minangkabau and L1 English speakers differed significantly, and that the difference varied across the non-native vowels. Owing to their cross-linguistic experience, the learners pronounced the non-native vowels differently from the native English speakers. The vowel quadrilateral of the ten English vowels spoken in isolation is shown in Figure 2, which displays the vowel space area, including the formant frequencies, of the Minangkabau and English speakers. The English vowels /iː/, /ɪ/, /e/, /æ/, /ɜː/, /ʌ/, /ɑː/, /ɔː/, /ʊ/, /uː/ showed considerable scatter and a greater overlap. The degree of opening and closing of the oral cavity, represented by the F1 values, appears to be smaller and lower for the Minangkabau learners than for the L1 English speakers, though the difference between the groups was not statistically significant. The F2 values, or the degree of frontness and backness of the highest part of the tongue, however, were statistically different from and smaller than those of the L1 English speakers.
To sum up, the L1 Minangkabau speakers did not open their jaws as widely or position their tongues in the same way as the native English speakers when pronouncing English vowels. The Minangkabau-English learners did not distinguish between long and short vowels, such as /iː/ vs. /ɪ/ and /ʌ/ vs. /ɑː/, owing to the absence of long vowels in Minangkabau. The vowel /iː/ largely falls within the area of the vowel /ɪ/. Moreover, the vowel space area showed that the L1 Minangkabau speakers had a hard time differentiating /e/ and /æ/, as they had no experience of pronouncing /æ/ in their native language. As they produced a smaller and lower VSA, their speech was not as clear and intelligible as that of the native English speakers. This further supports the findings of Bradlow and Bent (2002).
These results, which indicate a significant difference in the production of English vowels by Minangkabau speakers, are in agreement with the Feature Hypothesis (McAllister, Flege, & Piske, 2002). The current study found that L2 features that are not contrastive in the L1 are difficult to acquire. The Minangkabau learners of English produced the long vowels, which are not contrastive in Minangkabau, differently from the native speakers. However, the absence of a specific feature in the L1 is not the only cause of the production difference between the L2 learners and the native speakers. The learners' experience with non-native vowels was also found to affect production. Some vowels that exist in the L1, such as /e/, were not pronounced correctly by the Minangkabau-English learners. The results show that the learners' speech production was not native-like. The difficulty in producing English vowels was reflected in low production accuracy for English features.

Conclusion
The results obtained so far indicate that Minangkabau learners of English produce English vowels with a smaller vowel space area. The difference in the first formant frequency, which represents the degree of opening and closing of the oral cavity, was not significant between the Minangkabau learners of English and the native English speakers. In contrast, the difference in the second formant, which represents the frontness and backness of the highest part of the tongue, was statistically significant.
Overall, the Minangkabau-English learners have difficulties in producing certain English vowels. These difficulties were attributed to the failure to produce some English vowels, such as /iː/, /ɪ/, /ʌ/, /ɑː/, /e/, and /æ/. Although the Minangkabau learners of English seemed to undershoot the articulatory movements for some English vowels, the results may not have direct implications for the changes in phonetic perception that may occur during English learning.
Studies on L2 speech production in Indonesia have been rare and have mainly relied on auditory judgment and on the experience of teachers or researchers in teaching practice. Hence, the measurements in this study can be useful for teachers of English in detecting L2 vowel production errors, replacing subjective judgment. This experimental study is expected to shed light on second language acquisition and, more specifically, on the English pronunciation of non-native speakers.

Figure 1
Figure 1 Formant frequencies of L1 Minangkabau.
Figure 1 shows that the vowel /i/ is a front, close vowel in Minangkabau. The vowel /e/ appears to be a rather close-mid vowel, with an F1 almost as low as that of the close vowels /u/ and /o/. The vowel /a/ seems to be a fairly centralized, open-mid vowel. The vowel /o/ appears to be a rather close-mid back vowel. Vowel /u/ is considered as a close mid and

Figure 2
Figure 2 Vowel space area of Minangkabau learners of English pronouncing English vowels in the /bVd/ context. The tokens for each vowel are connected by solid lines for L1 and dotted lines for L2.

Table 1
Table 1 shows the mean formant values (in Hz) of the five Minangkabau vowels spoken in isolated sentences.

Table 2
Formant frequencies of Minangkabau learners of English compared with those of native English speakers in the /bVd/ context.