
Aileen Edele, Krisn Schoe, and Petra Stanat

ASSESSMENT OF IMMIGRANT STUDENTS’ LISTENING COMPREHENSION IN THEIR FIRST LANGUAGES [L1] RUSSIAN AND TURKISH IN GRADE 9: EXTENDED REPORT OF TEST CONSTRUCTION AND VALIDATION

NEPS Working Paper No. 57

Bamberg, May 2015

NEPS WORKING PAPERS

Working Papers of the German National Educational Panel Study [NEPS]

at the Leibniz Institute for Educational Trajectories [LIfBi] at the University of Bamberg

The NEPS Working Papers publish articles, expertises, and findings related to the German

National Educational Panel Study [NEPS].

The NEPS Working Papers are edited by a board of researchers representing the wide range

of disciplines covered by NEPS. The series started in 2011.

Papers appear in this series as work in progress and may also appear elsewhere. They often

represent preliminary studies and are circulated to encourage discussion. Citation of such a

paper should account for its provisional character.

Any opinions expressed in this series are those of the author[s] and not those of the NEPS

Consortium.

The NEPS Working Papers are available at

https://www.neps-data.de/projektübersicht/publikationen/nepsworkingpapers

Editorial Board:

Jutta Allmendinger, WZB Berlin

Cordula Artelt, University of Bamberg

Jürgen Baumert, MPIB Berlin

Hans-Peter Blossfeld, EUI Florence

Wilfried Bos, University of Dortmund

Claus H. Carstensen, University of Bamberg

Henriette Engelhardt-Wölfler, University of Bamberg

Frank Kalter, University of Mannheim

Corinna Kleinert, IAB Nürnberg

Eckhard Klieme, DIPF Frankfurt

Cornelia Kristen, University of Bamberg

Wolfgang Ludwig-Mayerhofer, University of Siegen

Thomas Martens, DIPF Frankfurt

Manfred Prenzel, TU Munich

Susanne Rässler, University of Bamberg

Marc Rittberger, DIPF Frankfurt

Hans-Günther Roßbach, LIfBi

Hildegard Schaeper, DZHW Hannover

Thorsten Schneider, University of Leipzig

Heike Solga, WZB Berlin

Petra Stanat, IQB Berlin

Volker Stocké, University of Kassel

Olaf Struck, University of Bamberg

Ulrich Trautwein, University of Tübingen

Jutta von Maurice, LIfBi

Sabine Weinert, University of Bamberg

Contact: German National Educational Panel Study [NEPS], Leibniz Institute for Educational Trajectories, Wilhelmsplatz 3, 96047 Bamberg, Germany

Assessment of Immigrant Students’ Listening

Comprehension in Their First Languages [L1] Russian and

Turkish in Grade 9: Extended Report of Test Construction

and Validation

Aileen Edele, Humboldt-Universität zu Berlin

Kristin Schotte, Humboldt-Universität zu Berlin

Petra Stanat, Institut zur Qualitätsentwicklung im Bildungswesen [IQB],

Humboldt-Universität zu Berlin

E-mail address of lead author:

aileen.edu-berlin.de

Bibliographic data:

Edele, A., Schotte, K., & Stanat, P. [2015]. Listening Comprehension Tests of Immigrant

Students’ First Languages [L1] Russian and Turkish in Grade 9: Extended Report of Test

Construction and Validation [NEPS Working Paper No. 57]. Bamberg: Leibniz Institute for

Educational Trajectories, National Educational Panel Study.


Abstract

In large-scale studies, immigrant students’ first language [L1] proficiency is typically

measured with subjective instruments, such as self-reports, rather than with objective tests.

The National Educational Panel Study [NEPS] addresses this methodological limitation by

testing the L1 proficiency of the two largest immigrant groups in Germany, namely students

whose families have immigrated to Germany from the area of the Former Soviet Union or

Turkey. Listening comprehension tests in Russian and Turkish were developed for this

purpose. The current paper describes the general framework and requirements for testing

first language proficiency within the NEPS and describes the construction of the L1 tests for

9th grade students. Subsequently, the paper reports the item difficulty and reliability of the

tests as well as analyses of measurement equivalence indicating that the Russian and Turkish

tests assess the same construct [configural equivalence]. The ability scores and their

correlations with other variables are, however, not directly comparable. Analyses of

construct validity confirm the unidimensional structure expected for the test. In addition,

the L1 test scores correlate with other indicators of L1 proficiency as well as with factors

regarded as crucial for L1 acquisition, such as exposure to L1, in the expected way

[convergent validity] and they are not substantially related to measures of general cognitive

abilities [discriminant validity]. We conclude that the listening comprehension tests

developed in the NEPS are valid measures of L1 proficiency.

Keywords

first language proficiency, L1 proficiency, listening comprehension, test construction, validity


1. Introduction

The effects immigrant students’ first language [L1] proficiency may have on their social

integration and educational success are highly disputed.1 On the one hand, some theoretical

perspectives suggest positive effects of L1 proficiency on second language acquisition [e.g.,

Cummins, 2000; Scheele, Leseman, & Mayo, 2010; Verhoeven, 2007] as well as on third

language learning [Hesse, Göbel, & Hartig, 2008; Rauch, Jurecka, & Hesse, 2010] and

bilingualism is assumed to promote cognitive development [Adesope, Lavin, Thompson, &

Ungerleider, 2010; Bialystok, 2007]. On the other hand, neutral or negative effects of

proficiency in L1 are proposed [e.g., Dollmann & Kristen, 2010; Esser, 2006; Mouw & Xie,

1999].

The empirical evidence necessary to elucidate this controversy is, however, inconclusive.

This is also a result of the methodological constraints of most studies on this issue. In

particular, there is a lack of investigations assessing L1 proficiency with objective tests rather

than with subjective measures, especially when it comes to analyses with larger sample

sizes. Previous large-scale studies typically relied on self-report measures of L1 proficiency.

For instance, the National Educational Longitudinal Study [NELS; Mouw & Xie, 1999], the

Children of Immigrants Longitudinal Study [CILS; Portes & Rumbaut, 2012], the International

Comparative Study of Ethno-cultural Youth [ICSEY; Berry, Phinney, Sam, & Vedder, 2006a],

as well as the German Socio-Economic Panel Study [GSOEP; TNS Infratest Sozialforschung,

2012] and the Programme for International Student Assessment [PISA; Frey et al., 2009]

examine participants’ L1 proficiency based on self-reported questionnaire data. Participants

are typically asked to indicate their L1 proficiency on four-point or five-point rating scales

with regard to one or several dimensions [understanding, speaking, reading, writing].

A few small-scale studies on the effects of L1 proficiency, however, have implemented

objective instruments for assessing L1. For instance, Rauch and colleagues [2010] examined

reading comprehension of 8th grade immigrant students in their L1 [Turkish] with a

computer-based adaptive test. Similarly, Leseman, Scheele, Mayo and Messer [2009]

assessed pre-schoolers’ receptive vocabulary in their L1 Turkish or Tarifit-Berber. Verhoeven

[2007] tested various aspects of children’s phonological, lexical, morpho-syntactic and

textual proficiency in their L1 Turkish. Dollmann and Kristen [2010] used C-tests in order to

examine elementary school students’ proficiency in their L1 Turkish. These studies, however,

are based on small, non-representative samples. It is therefore questionable whether the

findings from these studies are generalizable.

The National Educational Panel Study [NEPS; Blossfeld, Roßbach, & von Maurice, 2011] set

out to address this research gap by measuring L1 proficiency of students from the two

largest immigrant groups in Germany with objective tests. The project assesses the listening

comprehension proficiency of children and adolescents whose families immigrated to

Germany from the area of the Former Soviet Union or Turkey. As no suitable instruments for this purpose were available, tests in Turkish and Russian were developed by the Berlin project team within pillar 4 of the NEPS.2

1 In line with the terminology commonly used in the literature, the term immigrant students refers to the first, second and third immigrant generation in this paper. The term first language [L1] is used here interchangeably with the language of an immigrant family’s country of origin, regardless of whether this language was actually acquired prior to the language of the destination country [in our case German], as the label L1 suggests, or simultaneously.

This paper describes the construction of the L1 tests for 9th grade students and reports

analyses exploring the tests’ validity. The following section delineates the general framework

for testing L1 within the NEPS as well as the requirements the instruments had to meet in

order to be suitable for implementation in the study. In section 3, we describe the tests

developed for students in grade 9. Section 4 reports analyses pertaining to the tests’ validity.

2. General Framework and Requirements for Assessing L1 Proficiency within

the NEPS

The NEPS assesses L1 proficiency with objective tests at three measurement points. More

specifically, the instruments capture basic listening comprehension skills in Russian and

Turkish in grade 9 [starting cohort 4 and later on starting cohort 33], in grade 7 [starting

cohort 3] and in grade 2 [starting cohort 2]. The current paper focuses on the tests

developed for 9th grade. At this measurement point, L1 proficiency is assessed shortly

before students attending the vocational school tracks transit from school to work life. This

makes it possible to explore the extent to which L1 proficiency affects the transition to

vocational training and labor market success. In addition, analyses based on starting

cohort 4 are particularly robust due to the large sample size of this starting

cohort [Aßmann et al., 2011].

To ensure that patterns of findings can be roughly compared across starting cohorts, the tests for 7th grade are constructed analogously to the tests for 9th grade, but mainly consist of

different texts and items. The tests for 2nd grade also resemble the L1 tests for secondary

school in general characteristics, but are adapted in some aspects to the test-takers’ age.

2.1 Defining and Identifying the Target Population

The NEPS set out to assess the L1 proficiency of the two largest immigrant groups in

Germany, namely students from families who have immigrated to Germany from the area of

the Former Soviet Union [e.g., Russia, Kazakhstan] or from Turkey. To ensure that all

students from these immigrant groups participating in the NEPS are included in the initial

sample, the target population is defined as students of the first, second and third

generation.4

However, not all immigrants maintain their heritage language [Chiswick & Miller, 2001; Esser,

2006; Rumbaut, Massey, & Bean, 2006] and competence testing is, of course, only meaningful

if test-takers possess some proficiency in the tested domain. Therefore, we implemented

screening tests with very low item difficulty prior to L1 testing. Only students who

demonstrated a minimal level of listening comprehension in these tests were asked to

participate in the actual L1 assessment [see section 4.2 for further information].

2 The tests for grade 9 were developed by Aileen Edele and Petra Stanat.

3 See Blossfeld, von Maurice, and Schneider [2011] for a description of the starting cohorts and project structure in general.

4 To include all students whose L1 is potentially Russian or Turkish, we initially defined the target population based on country of birth,

even though Russian or Turkish is not necessarily the L1 of all families from the Former Soviet Union or Turkey.


2.2 Efficiency

As the NEPS assesses a large number of constructs, testing time available for each

competence domain is limited. As a consequence, it was impossible to include

multidimensional L1 tests measuring the various components of language proficiency

separately, such as vocabulary and grammar. Instead, global and efficient, yet

comprehensive measures of L1 proficiency had to be developed. Listening comprehension

constitutes a complex process requiring the integration of phonological, syntactic, semantic

and pragmatic skills [Anderson, 1995; Flowerdew & Miller, 2005]. We therefore decided to

assess this aspect as an indicator of general language proficiency.

In order to limit the testing time and financial costs, moreover, the NEPS L1 tests were

required to be applicable in group settings and to use a paper-and-pencil format. To avoid

costs associated with coding open-response items, moreover, all test items had to be in a

multiple-choice format.

2.3 Focus on Listening Comprehension

Models of language proficiency and language testing often distinguish between four basic

dimensions of language proficiency: listening, reading, speaking and writing [Harris, 1969;

Lado, 1961]. Large-scale studies assessing language proficiency focus mainly on reading

comprehension. However, children of immigrants typically acquire the L1 in the family context,

and the L1 is rarely used in school instruction. A large proportion of immigrant students is

therefore unable to read or write in that language. To ensure that students at all levels of L1

proficiency can participate in the assessment, we decided to test the domain of listening

comprehension.

Including students who are unable to read and write in their L1 and who may overall have

limited L1 skills is important because analyses on most research questions require that a

broad spectrum of L1 proficiency is represented in the data. For instance, to identify factors

predicting the maintenance or loss of L1, it is crucial to include lower proficiency levels in L1.

Similarly, in estimating effects of L1 proficiency on L2 or L3, it may be informative to

differentiate between effects of lower and higher levels of L1 proficiency [e.g., Dollmann &

Kristen, 2010; Edele & Stanat, submitted].

2.4 Coverage of Proficiency Distribution

In order to ensure that the L1 tests developed for the NEPS would cover a broad range of

proficiency levels, we developed listening comprehension texts with varying linguistic

difficulty. Based on data from a preliminary study and a larger pilot study, moreover, we also

ensured that the difficulty of the items varied substantially [for details see section 4.5; Edele,

Schotte, Hecht, & Stanat, 2012]. Due to the tests’ limited number of items, however, the

instruments differentiate most accurately at intermediate proficiency levels, while their

capacity to measure very high or low proficiency levels precisely is somewhat restricted.


2.5 Independence of Test Performance from Previous Knowledge

The L1 tests aim at assessing students’ ability to understand spoken language in L1. To

ensure that the test does, in fact, measure language proficiency rather than prior

knowledge, we used texts which either cover topics that should be familiar to all students

alike, such as everyday situations in school or family contexts, or topics that are likely to be

equally unfamiliar to all participants, such as events in a fictitious narration written

specifically for the test.

One aspect that needs to be taken into account in testing immigrant students is the possible

impact of cultural knowledge on their performance. There is evidence that text processing is

contingent on the fit between cultural knowledge and the content of the text. Steffensen,

Joag-Dev and Anderson [1979], for instance, found in their study that participants recalled

more information and needed less time when the text they read described content

consistent with their cultural knowledge rather than content typical for another culture.

Thus, it can be assumed that a text is easier to process when it is in line with the test takers’

culturally shaped prior knowledge.

This knowledge is likely to vary considerably in the target population for testing L1 in the

NEPS. While some students were born in Germany, others have just recently arrived from

their heritage cultures. In addition, immigrants acculturate in different ways: while some

immigrant students are strongly oriented towards the heritage culture of their family, others

are mostly involved in the culture of the receiving country or in both or neither of the

cultures [e.g., Berry, Phinney, Sam, & Vedder, 2006b; Edele, Stanat, Radmann, & Segeritz,

2013]. In order to avoid bias associated with students’ culturally shaped knowledge, the

stimuli in the L1 tests were chosen such that they should be equally familiar or equally novel

to the students with different cultural backgrounds.

2.6 Comparability of the Russian and Turkish Test

Some research questions will focus on one of the two first languages assessed in the NEPS

and the respective immigrant group only. For other research questions, however, it may be

important or interesting to determine whether the expected pattern of findings generalizes

across both L1s and immigrant groups. To ensure that the relationships between L1

proficiency and other constructs can be compared across the two groups, the tests in

Russian and Turkish need to capture the same construct. We therefore developed tests in

Russian and Turkish which are equivalent with regard to the content of the texts, the

questions and the response options. In addition, we tried to keep linguistic features

comparable that are likely to affect the difficulty of the texts.

Even the most careful translation process, however, does not necessarily ensure

measurement equivalence. Measurement equivalence can be tested with multigroup

confirmatory factor analyses [MGCFA]. Different forms of equivalence can be distinguished

by gradually constraining measurement parameters and subsequently comparing the fit of

the resulting models [see Schroeders & Wilhelm, 2011 for a detailed description of testing

invariance in categorical data]. In short, for categorical data, factor loadings, thresholds and residual variances are freely estimated in the least restrictive model [configural invariance]. For strong measurement invariance, factor loadings and thresholds are fixed to equality across groups while the residual variance is constrained to 1 in one group and estimated


freely in the other group. In the most restrictive model of strict invariance, factor loadings

and thresholds need to be equal and residual variances are fixed at 1 in both groups. If

configural equivalence is given, it can be assumed that an instrument assesses the same

construct in two or more groups or in our case that the instruments assess the same

construct in the respective target population. Strong invariance is necessary to directly

compare the latent means of a test and its correlations with external criteria. In strictly

invariant tests, even raw test scores are comparable. Due to the pronounced linguistic

differences between the Russian and the Turkish language on the one hand, and the limited

time and financial resources for test development on the other hand, it would have been

unrealistic to expect strong or even strict invariance of the two tests. We did, however,

strive for configural invariance.
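The stepwise constraints described above can be summarized formally. For item j in group g, let λ denote the factor loading, τ the threshold, and θ the residual variance (notation chosen here for illustration; it follows the verbal description rather than a specific software parameterization):

```latex
\begin{aligned}
\text{configural:}\quad & \lambda_{jg},\ \tau_{jg},\ \theta_{jg}\ \text{freely estimated in both groups}\\
\text{strong:}\quad & \lambda_{j1}=\lambda_{j2},\quad \tau_{j1}=\tau_{j2},\quad \theta_{j1}=1,\ \theta_{j2}\ \text{free}\\
\text{strict:}\quad & \lambda_{j1}=\lambda_{j2},\quad \tau_{j1}=\tau_{j2},\quad \theta_{j1}=\theta_{j2}=1
\end{aligned}
```

Each successive level nests within the previous one, which is what allows the model fits to be compared directly.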

3. Test Construction

The L1 tests developed for grade 9 consist of short texts that are orally presented to

students from a CD. They are subsequently asked to answer questions about what they have

heard. All items have a multiple-choice format.

3.1 Text and Item Construction

The texts and items of the test were initially constructed in German and subsequently

translated into Russian and Turkish. The different text units within the test are independent

from each other and consist of approximately 100 to 150 words. To assess listening

comprehension broadly, the tests include dialogues, expository texts and narrative texts. The

text types differ in their linguistic features. While the dialogues have features typical for

informal oral language, such as repetitions, redundancies, ellipses, breaks or fragmented

language, the expository and narrative texts involve linguistic features typical for written

language, such as more explicit vocabulary, correct grammar and a lack of redundancy or

repetition [Grotjahn, 2005; Shohamy & Inbar, 1991]. A preliminary version of the tests was

included in a pilot study [see section 4.2 and 4.4 for more details on the pilot study] for the

purpose of item selection.

To cover a range of language proficiency levels, we constructed texts varying in difficulty

with regard to grammar and vocabulary. More difficult texts incorporate less frequently used

vocabulary, a more complex syntax and a greater variety of tenses, including those that are

less frequently used.

For the construction of the texts and questions, we closely collaborated with two L1 experts

who have an excellent command of the Russian or Turkish language as well as a very good

understanding of the respective culture as they have grown up in Russia or Turkey,

respectively, hold university degrees in Slavic studies or Turkology and are experienced

teachers of the respective language at the university level.

To ensure that culturally shaped previous knowledge would not threaten the validity of the

tests, the L1 experts checked whether the content presented in the texts was not only in line

with the German culture, but also consistent with the respective heritage culture. They, for

instance, rejected a text describing a typical German birthday party, as a child from a family

which is strongly oriented towards its heritage culture may not be familiar with this


situation. We only used texts and items the L1 experts judged to be independent from

culturally shaped previous knowledge.

An institute experienced in multilingual translations and the translation of test materials of

international large-scale assessment studies, such as PISA, translated the resulting 12 texts

and 73 items. In order to keep the tests in Russian and Turkish linguistically as equivalent as

possible, we instructed the translators to maintain linguistic features of the original texts,

such as text difficulty as well as the features typical for oral versus written language, as far as

possible without jeopardizing the linguistic appropriateness in Russian or Turkish.

3.2 Text and Item Selection

Subsequently, the L1 experts submitted the test material to cognitive interviews with four

adult native speakers per language. These interviews confirmed that participants answered

items based on information retrieved from the text rather than based on their prior

knowledge. In addition, the interviews probed once more whether the test material was

consistent with the Russian and Turkish culture. Based on the findings, we excluded one text

from the tests involving a couple on a hiking tour, as hiking is less popular in Turkey and the

area of the Former Soviet Union than in Germany and immigrants from these countries may

therefore lack the corresponding background knowledge about navigation [e.g., deploying a

map and compass]. The resulting version of the tests only contains texts that were judged to

be consistent with the Russian and Turkish culture.

To get a first indication of the texts’ difficulty, the adult native speakers rated this

characteristic on a 3-point scale [1 = easy; 2 = intermediate; 3 = difficult]. The mean of the

ratings across raters ranged from 1 to 3 for the Russian as well as for the Turkish test with

means across all texts of 2.0 for the Russian version and 2.2 for the Turkish version.

After these revisions of the test material had been made, native speakers of Russian or

Turkish audio-recorded the remaining 11 texts and items as well as the test instructions. In

order to obtain a rough indicator of the item difficulties, 10 adolescents with Russian as L1

and 12 adolescents with Turkish as L1 completed the recorded tests. The item difficulty in

terms of the proportion of students who answered the items correctly ranged from p = .33

to p = 1.00 [M = .76] in the Russian test and from p = .00 to p = 1.00 [M = .51] in the Turkish

test.
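As a sketch of how such proportion-correct difficulties are computed, consider the following minimal example with hypothetical 0/1 item scores (not the NEPS data):

```python
import numpy as np

def item_difficulty(responses):
    """Proportion-correct difficulty p per item.

    responses: 2-D array of 0/1 item scores, shape (n_students, n_items).
    Returns one p value per item; a higher p indicates an easier item.
    """
    return np.asarray(responses, dtype=float).mean(axis=0)

# Hypothetical scored answers of 5 students on 3 items (not NEPS data):
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
p = item_difficulty(scores)  # p = [0.8, 0.4, 1.0]
```

A p of 1.00, as observed for some items above, means every test-taker solved the item, which is why such items carry little information for differentiating proficiency levels.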

Based on these findings, we chose 9 texts varying in difficulty for the inclusion in a pilot

study [see section 4.2 and 4.4 for more details on the study] and excluded a number of

extremely easy and difficult items pertaining to the remaining texts. The resulting test

versions comprised four dialogues, two narrative texts and three expository texts with five

to seven multiple-choice questions each, totaling 54 items. Every multiple-choice item

had four or five response options.

Based on item analyses from the pilot study, we excluded two more texts and 22 questions

from the tests. In addition, wrong response options [distractors] which correlated positively

with the overall test score were eliminated.
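A distractor that correlates positively with the overall test score attracts high-performing students and thus signals a flawed response option. A minimal sketch of this check, using hypothetical pick patterns rather than the actual pilot data:

```python
import numpy as np

def flag_positive_distractors(picks_by_distractor, total_scores):
    """Flag wrong response options that correlate positively with the total score.

    picks_by_distractor: dict mapping a distractor label to a 0/1 array
        indicating which students chose that (wrong) option.
    total_scores: overall test score of each student.
    Returns labels whose point-biserial correlation with the total score
    is positive, i.e. candidates for elimination.
    """
    total = np.asarray(total_scores, dtype=float)
    flagged = []
    for label, picks in picks_by_distractor.items():
        r = np.corrcoef(np.asarray(picks, dtype=float), total)[0, 1]
        if r > 0:
            flagged.append(label)
    return flagged

# Hypothetical data (not the NEPS pilot): distractor "B" is chosen mainly
# by high scorers and would be eliminated; "C" behaves as expected.
total = [10, 12, 25, 27, 30]
picks = {"B": [0, 0, 1, 1, 1], "C": [1, 1, 0, 0, 0]}
flag_positive_distractors(picks, total)  # ["B"]
```

Well-functioning distractors should correlate negatively with the total score, since weaker students are the ones drawn to them.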


3.3 Final Test Version

The final tests for grade 9 included in the main studies of the NEPS are comprised of seven

text units, namely two dialogues, two narrative texts and three expository texts. The audio-

recorded texts and questions are presented to the students once before they answer the

questions about what they have heard. Every text unit is followed by three to six multiple-

choice questions, resulting in a total of 31 test items,5 each with four or five response

options. The administration of the test takes 30 minutes [Russian version] and 32 minutes

[Turkish version].

4. Validity of the L1 Tests

4.1 Validation Strategy

To investigate the construct validity of the L1 tests, we examined their dimensionality and

correlated students’ scores with other indicators of L1 proficiency as well as with a

nomological net of relevant constructs [Cronbach & Meehl, 1955].

As a first step, we tested whether our L1 tests possess the expected unidimensional

structure. To establish the convergent validity [Campbell & Fiske, 1959] of our L1 measures,

we then correlated the test scores with another indicator of proficiency in Russian or

Turkish, namely the Bilingual Verbal Ability Test [Muñoz-Sandoval, Cummins, Alvarado, &

Ruef, 1998]. As both instruments are objective tests of language proficiency, we expected

the correlations to be substantial. However, as the instruments examine different aspects of

language proficiency [see below], we did not expect a particularly tight association between

them. If the criterion test assessed different aspects of language proficiency than the

developed tests, for instance productive instead of receptive language, somewhat lower

correlations between the two instruments should occur.

As another indicator of convergent validity, we correlated our L1 test scores with subjective

measures of L1 proficiency. Even though subjective assessments, and in particular self-

assessments, are susceptible to biases [Edele, Seuring, Kristen, & Stanat, in press], we

expected at least a moderately high correlation.

General cognitive abilities served as criteria in our analyses of discriminant validity. They are

assumed to influence the efficiency of language acquisition and should therefore relate

positively with proficiency [e.g., Esser, 2006; Spolsky, 1989]. In addition, text comprehension

requires deductive reasoning. However, L1 proficiency depends on a multitude of other

factors and should be clearly distinguishable from general cognitive abilities. We therefore

expected a significant yet moderate relationship between listening comprehension in L1 and

reasoning ability. Speed of perception, by contrast, should be unrelated to performance in

our L1 tests as they do not contain a speed component.

To extend the nomological net for the construct validation of the L1 tests, we draw on

models of language acquisition from different disciplines. These models suggest a number of

conditions that should foster the acquisition and maintenance of L1 proficiency in immigrants and their children, such as exposure and motivation for language acquisition and improvement [e.g., Chiswick & Miller, 1994, 2001; Esser, 2006; Spolsky, 1989].

5 Of the 32 items originally included in the final test version, one item was later excluded due to a poor model fit in the main study [see Edele et al., 2012 for further information on the scaling of the tests].

Immigrant students are exposed to their L1 in different contexts. The most important

environment for the acquisition and improvement of L1 skills is typically the family. In

addition, children and adolescents from immigrant families may have the opportunity to

improve their L1 in interactions with co-ethnic peers. The use of media in L1 can also present

an important learning opportunity for L1 acquisition. Therefore, students’ exposure to L1 in

the family, in the peer group and in the media should be positively related to their L1

proficiency [e.g., Duursma et al., 2007; Scheele et al., 2010].

The immigrant generation status can also be assumed to affect exposure to L1 [e.g., Chiswick

& Miller, 2001]. Generally, a decrease in L1 use and proficiency and an increase in L2 use and

proficiency can be observed in immigrants over time [Rumbaut, 2004; Stanat, Rauch, &

Segeritz, 2010]. First generation immigrant students thus typically have more opportunities

to acquire the L1 in their family context than students who were born in the country of

residence. Moreover, first generation immigrants may have acquired the L1 in their heritage

country. Accordingly, we expected first generation immigrants to be more proficient in L1

than successive immigrant generations.

Within the first immigrant generation, students who immigrated at an older age were

extensively exposed to L1 while they lived in the heritage country and the quality of the

language input can be assumed to be relatively high. Therefore, we expected age at

migration to correlate positively with L1 test scores.

In addition to providing learning opportunities, using L1 with family and peers and in media

can foster immigrant students’ motivation to further improve their L1 skills. As a strong

identification with the heritage culture should boost the motivation to improve in L1,

moreover, it should be positively related to L1 proficiency.

4.2 Study Design

We draw on data from two studies to examine the validity of our tests. This allows us, on the

one hand, to cross-check the findings, as most validation criteria were measured in both

studies. On the other hand, the two studies complement each other, as some validation

criteria were assessed in only one of the studies. In addition, the second study includes a

larger sample [for further details on sampling within the NEPS see Aßmann et al., 2011].

The first investigation [study 1] is a pilot study that was carried out within the NEPS to select

items for the final L1 tests. The target population was identified based on information from

parents. On the test day, students filled out a questionnaire and subsequently completed a

preliminary version of an L1 test in their respective first language. The preliminary test

versions implemented in this study included nine texts and 45 corresponding items. The

analyses presented in this paper are based on the seven texts and 31 items included in the

final test version that was subsequently administered in the second study [see below]. The

L1 tests in both studies are largely identical; the only exception is that a few false response options [distractors] were excluded from some items in the final tests. This should

not substantially affect the patterns of findings relevant for validating the tests. A few

NEPS Working Paper No. 57, 2015 Page 10

Edele, Schotte, & Stanat

months after the assessment, a sub-sample of the students completed another test

measuring proficiency in Russian or Turkish, respectively.

The second study [study 2] on which we draw in the following analyses is the main study of

the NEPS for the 9th grade sample of starting cohort 4 [school and vocational training

education pathways of students in 9th grade and higher, doi:10.5157/NEPS:SC4:4.0.06; see

Frahm et al., 2011; von Maurice, Sixt, & Blossfeld, 2011, for further information on this

starting cohort]. The target population for testing L1 was identified based on student

questionnaires implemented in a prior wave. The L1 tests were administered in a group

setting on a separate testing day. To ensure that the students had at least a very basic

proficiency level in Russian or Turkish, they completed a screening test in the respective

language prior to the L1 test. The screening test consists of recordings of eight simple

spoken sentences, such as “the dog walks.” Participants were asked to relate each sentence

to the corresponding picture among five options. Test administrators instantly scored the

screening tests using a template. Students who answered a minimum of three items

correctly were eligible for participation in the L1 tests.
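The eligibility rule just described can be sketched in a few lines. The answer key and student responses below are hypothetical; the actual screening was scored by hand with a template.

```python
# Sketch of the screening-test pass rule: at least 3 of the 8
# picture-matching items must be answered correctly.

def passes_screening(responses, answer_key, threshold=3):
    """Return True if at least `threshold` of the items were answered
    correctly [each response is the letter of the chosen picture]."""
    correct = sum(1 for given, key in zip(responses, answer_key) if given == key)
    return correct >= threshold

# Hypothetical data: a student matches 5 of 8 sentences to the right picture.
key = ["C", "A", "E", "B", "D", "A", "C", "B"]
student = ["C", "A", "E", "B", "D", "B", "A", "E"]
print(passes_screening(student, key))  # → True [5 correct, threshold is 3]
```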

4.3 Assessment of Validation Criteria

For the validation analyses, we draw on a number of variables measured with student

questionnaires, student competence tests and computer-assisted telephone interviews

[CATIs] with students’ parents.

Objective measure of L1 proficiency

As a concurrent validation criterion, proficiency in Russian and Turkish was tested in

individual testing sessions with the Bilingual Verbal Ability Test, BVAT [Muñoz-Sandoval,

Cummins, Alvarado, & Ruef, 1998]. The goal of the BVAT is to capture bilingual participants’

overall language ability and to thereby avoid underestimating their linguistic capacity

by taking into account language proficiency in L1 and L2. More specifically, the BVAT starts

by examining participants’ proficiency in L2, typically English. If they fail to solve an item in

L2, it is presented in the respective L1. The test results consequently reflect the additive

language ability in L1 and L2. The BVAT is available for 17 languages besides English, among

them Russian and Turkish. The target population of the test ranges from 5 years to old age.

The instrument examines productive language proficiency and includes the four subscales

picture vocabulary, synonyms, antonyms and verbal analogies. The test is adaptive by

specifying starting items according to participants’ age as well as termination criteria after a

series of eight [picture vocabulary, verbal analogy] or six [synonyms, antonyms] unsolved

items.
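The termination rule can be illustrated with a short sketch. The response pattern below is hypothetical [1 = solved, 0 = unsolved].

```python
# Sketch of the BVAT termination criterion: testing stops after a run of
# 8 consecutive unsolved items [picture vocabulary, verbal analogies]
# or 6 [synonyms, antonyms].

def administered_items(responses, stop_after):
    """Return the number of items administered before the run of
    `stop_after` consecutive failures triggers termination."""
    streak = 0
    for n, solved in enumerate(responses, start=1):
        streak = 0 if solved else streak + 1
        if streak == stop_after:
            return n
    return len(responses)

results = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(administered_items(results, stop_after=6))  # → 10 [items 5-10 failed]
```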

As we are specifically interested in students’ L1 proficiency, we presented students only with

items in Russian or Turkish. In addition, we refrained from adaptive testing and instead

presented the same item set to all participants in order to obtain comparable test scores.

Due to time constraints, we only employed the subscales picture vocabulary and synonyms.

The picture vocabulary scale requires participants to name drawings of objects or activities,

6 From 2008 to 2013, NEPS data were collected as part of the Framework Programme for the Promotion of Empirical Educational Research

funded by the German Federal Ministry of Education and Research [BMBF]. As of 2014, the NEPS survey is carried out by the Leibniz

Institute for Educational Trajectories [LIfBi] at the University of Bamberg in cooperation with a nationwide network.


while the synonyms scale requires an active production of synonyms for verbally presented

words. We excluded the first eight items of the vocabulary subscale as we considered them

too easy for the targeted age group. In total, we administered 51 items from the subscale

vocabulary and 20 items from the subscale synonyms of the BVAT.

Trained test administrators, who were native speakers of Russian or Turkish, coded students’

responses during the test session. The test sessions were recorded and 61% [Russian

sample] or 66% [Turkish sample] of answers were additionally coded by two other native

speakers. Inter-rater reliability was very high, with Yule’s Y = .88 in the Russian sample and Y =

.87 in the Turkish sample, confirming that the test administrators coded answers adequately.
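Yule's Y is computed from the 2×2 cross-classification of the two raters' codings. A minimal sketch with hypothetical counts [not the study's actual data]:

```python
import math

def yules_y(a, b, c, d):
    """Yule's Y [coefficient of colligation] for a 2x2 agreement table:
    a = both raters code the answer as correct, d = both as incorrect,
    b and c = the two kinds of disagreement."""
    return (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))

# Hypothetical agreement table with high inter-rater agreement.
print(round(yules_y(a=450, b=12, c=15, d=120), 2))  # → 0.89
```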

To deviate as little as possible from the original BVAT, we kept items even if their

discriminatory power was lower [.30 > d > 0] than would usually be considered acceptable

for psychometric tests [Bortz & Döring, 2002]. Only items with a discrimination of 0 or lower in the

picture vocabulary scale were excluded from further analyses [five items in the Russian test;

eight items in the Turkish test], leaving 66 items in the Russian test and 63 items in the

Turkish test in total. The sum of the correctly answered items from both scales served as the

validation criterion for our L1 proficiency tests.

The BVAT suffers from a number of conceptual and methodological limitations. Most

importantly, the test versions in the various languages are simple translations of the English

test. As a consequence, some of the items in the Russian and Turkish versions are not

solvable. For instance, for some items in the synonyms scale no equivalent expression exists.

The BVAT also contains a number of items which are highly dependent on cultural

knowledge. The picture vocabulary scale, for instance, employs illustrations of men panning

gold or of an American-style movie theater with the English word “Grand” on top. These

items may be of reasonable difficulty for someone familiar with the history, culture and

language of the United States of America, but they should be considerably more difficult for

someone not possessing this cultural knowledge. What is more, some of the items are

outdated, such as the illustration of a printing press or of an antiquated telephone, so that

adolescents today are likely to be unfamiliar with the concept or unable to recognize the

antiquated illustration. The BVAT also does not provide an exhaustive list of correct

responses, such that coding objectivity is jeopardized. Despite its limitations, we decided to

use the BVAT as a validation criterion for our L1 tests because no other instruments suitable

for testing oral language proficiency in Russian and Turkish exist for the target population of

our study.

General cognitive abilities

The NEPS examines perceptual speed and reasoning as two aspects of cognitive mechanics

or indicators of basic information processing [Lang, Kamin, Rohr, Stünkel & Williger, 2014].

Aspects of cognitive mechanics are generally assumed to operate relatively independently of content, culture and previous experience and to reflect the neurophysiological basis of

the mind [Baltes, Staudinger, & Lindenberger, 1999]. The perceptual speed test requires

students to allocate numbers to symbols according to a provided key. The reasoning test

consists of matrices similar to those of the RAVEN test [Raven, 1977]. In our analyses, we use

the sum of correct answers for each scale as ability estimates.


Subjective indicators of L1 proficiency

The validation analyses also draw on several subjective indicators of students’ L1 proficiency.

First, the student questionnaires measured self-reported L1 proficiency of students with

another first language than German [“How good is your command of the other language?” 7]

for the dimensions of understanding, speaking, reading and writing. The 5-point response

scale was “very good,” “rather good,” “rather poor,” “very poor,” “not at all.” For the analyses,

the arithmetic mean of students’ ratings across the four dimensions was computed, resulting

in the scale self-estimated global L1 proficiency, which could vary from 0 [not at all] to 4

[very good].

Second, we included two scales reflecting students’ self-estimated L1 skills in oral

comprehension and written production. These scales require students to estimate their

ability with regard to specific examples of linguistic tasks. For instance, the oral

comprehension scale assesses students’ self-estimated ability to comprehend the

temperature predicted for the next day in a radio weather forecast. In the written

production scale, students were, for example, asked to estimate their ability to write a

description of their activities during the previous weekend. The items reflect the proficiency levels A1

to C1 as defined by the Common European Framework of Reference for Languages:

Learning, Teaching, Assessment, CEFR [Council of Europe, 2011] and were developed based

on a compilation of tasks students are expected to master at each proficiency level

[Glaboniat, Müller, Rausch, Schmitz, & Wertenschlag, 2005]. With these scales we wanted to

check whether students are able to estimate their language proficiency more accurately

based on detailed examples compared to the commonly used general dimensions

understanding or writing. The oral comprehension scale contains 15 items and the written

production scale contains 10 items. The response scale was the same 5-point scale as that used for self-estimated global L1 proficiency. We computed the mean of each

scale and labeled them self-estimated oral comprehension [specific examples] and self-

estimated written production [specific examples].

Students were additionally asked to estimate the number of items they had answered

correctly in the L1 test. The self-estimated number of items solved is contingent on the total

number of items in the test and could thus range from 0 to 32⁸ [see Lockl, 2013, for further

information on the assessment of procedural metacognition in the NEPS]. We interpret this

scale as another subjective indicator of L1 proficiency.

Parents’ estimates of their children’s L1 proficiency on the dimensions speaking and writing

served as a fourth subjective indicator of students’ L1 proficiency. The rating scale was the

same as for the students’ self-estimated global L1 proficiency, and the two dimensions were

averaged.

7 Before students reached this item, the questionnaire had defined “the other language.” Specifically, students were asked to indicate the

language other than German they had learned as a child in their family. Afterwards, they were informed that the questionnaire would

subsequently refer to this language as “the other language.”

8 Students estimated the number of solved items on the basis of the 32 items originally included in the test of which one was later on

eliminated due to poor model fit.


Language use with family and peers

Another set of items in the student questionnaires measured the patterns of language use in

the family [with mother, with father, with siblings] and with peers [with best friend and with

classmates] for students with another first language than German. An example for these

questions is: “What language do you speak with your mother?” The 4-point response scale

was “only German,” “mostly German, sometimes the other language,” “mostly the other language, sometimes German,” “only the other language.” Students’ ratings of language use

with the mother and with the father were averaged. Similarly, the ratings of language use

with the best friend and with classmates were combined into an indicator of the language

used with peers.

Language of media use

The student questionnaires assessed the language of media consumption with seven items

capturing the language in which, among other things, students read books, watch television

or surf the web. The same 4-point scale as for language use with family and peers was

employed. We averaged the seven items to a single indicator of language in media use.
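The scale construction used for these questionnaire indicators is a simple arithmetic mean of the items; a minimal sketch with hypothetical ratings, where None marks a missing answer [the handling of missing values is an assumption, not documented in the report]:

```python
# Sketch of building a questionnaire indicator as the mean of its items.

def scale_mean(ratings):
    """Average the available item ratings, ignoring missing values."""
    valid = [r for r in ratings if r is not None]
    return sum(valid) / len(valid) if valid else None

# Seven hypothetical media items [language of books, TV, web, ...]
# on the 4-point scale.
media_items = [3, 4, 2, 3, None, 4, 3]
print(round(scale_mean(media_items), 2))  # → 3.17
```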

Immigrant generation status and age at immigration

We defined the target persons’ immigrant generation status based on the country of birth of

the students themselves, their parents and grandparents. The student questionnaires

assessed the countries of birth based on a list of countries including Germany, Kazakhstan,

the Russian Federation, Ukraine and Turkey, as well as based on an open question allowing

students to indicate countries not covered in the list. We classified students who themselves

and their parents were born abroad as first generation, students who were born in Germany

but whose parents were both born abroad as second generation and students who were

born in Germany, whose parents were born in Germany and whose grandparents [at least

two] were born abroad as third generation. We further defined students with one parent

born abroad and one parent born in Germany as one parent born abroad. Students’

immigrant generation status was only classified when all data necessary for its unambiguous identification were available. The student questionnaires also assessed the age at which students who were born abroad immigrated to Germany.
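The classification rules above can be sketched as a small decision procedure. The coding below is a plausible reading of the rules, not the exact NEPS algorithm; country-of-birth inputs are "DE" for Germany, another code for abroad, or None for missing.

```python
def generation_status(student, parents, grandparents):
    """Classify immigrant generation from countries of birth.
    student: str or None; parents: list of 2; grandparents: list of 4."""
    if student is None or None in parents:
        return "not determinable"
    abroad = [p != "DE" for p in parents]
    if student != "DE" and all(abroad):
        return "first"
    if student == "DE" and all(abroad):
        return "second"
    if any(abroad) and not all(abroad):
        return "one parent born abroad"
    # Student and both parents born in Germany: third generation requires
    # at least two grandparents born abroad.
    known = [g for g in grandparents if g is not None]
    if sum(g != "DE" for g in known) >= 2:
        return "third"
    if None in grandparents:
        return "not determinable"
    return "no immigrant background"

print(generation_status("RU", ["RU", "KZ"], ["RU"] * 4))                # → first
print(generation_status("DE", ["TR", "TR"], [None] * 4))                # → second
print(generation_status("DE", ["DE", "DE"], ["TR", "TR", "DE", "DE"]))  # → third
```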

Identification with heritage culture

Four items captured students’ emotional identification with the heritage culture of their

families. One item for instance states: “I feel closely attached to this culture9.” The 4-point

response scale was “does not apply,” “partially applies,” “mostly applies,” “fully applies.”

While most validation criteria, particularly the information from the student questionnaires,

were measured in both studies, the BVAT and the self-estimated L1 proficiency based on

detailed examples were only administered in study 1 whereas parents’ estimates of

9 Before students reached the item, the questionnaire requested students to indicate the country other than Germany from which their

family originates. Afterwards, it explained that subsequent questions would refer to the culture of this country as students’ “heritage

culture.” An example for this was presented.


students’ L1 proficiency, the self-estimated number of items solved in the L1 test as well as

general cognitive abilities were only included in study 2.

4.4 Sample

Both studies tested L1 proficiency of immigrant students [first, second and third generation]

whose families immigrated to Germany from the area of the Former Soviet Union [e.g.,

Russia, Kazakhstan] or Turkey.10 More specifically, we defined the target population as

students who were either themselves born in one of these countries and students with at

least one parent or two grandparents born in the area of the Former Soviet Union or Turkey.

Table 1 presents descriptive information on the samples of studies 1 and 2.

Study 1 is based on data from schools located in four federal states [Bavaria, Berlin,

Hamburg, North Rhine-Westphalia] attended by high percentages of students speaking

Turkish and/or Russian. The Russian L1 test was conducted in 17 schools and the Turkish L1

test in 15 schools.

The Russian sample consists of 224 participants11 [53% female]. On average, students were

16 years old at the time of data assessment. Of these students, 37% were enrolled in a

Hauptschule [lowest school track of the German secondary education], 31% in a

Gesamtschule [comprehensive schools], 17% in a Schule mit mehreren Bildungsgängen

[schools with several educational tracks] and 15% in a Gymnasium [highest track leading to

university entrance degree]. More than two-thirds [71%] were classified as first generation

immigrant students, while only 20% were second generation.

The Turkish sample in study 1 consists of 310 participants12 [50% female]. Participants’ mean

age was 15.7 years. The majority of students either attended the Hauptschule [28%] or the

Gesamtschule [31%], while a somewhat lower proportion of the students attended a Schule

mit mehreren Bildungsgängen [20%] or a Gymnasium [22%]. The majority of students in the

Turkish sample [66%] were second generation, while only 7% were first generation

immigrants.

A subsample of 79 participants in the Russian sample and of 101 participants in the Turkish

sample completed the BVAT.

Study 2 draws on data from schools located in all federal states of Germany. The original

Russian sample comprised 743 students and the original Turkish sample 916 students. Not

all selected students attended the screening tests, however [see IEA Data Processing and

Research Center, 2013]: 129 [17%] of the Russian sample and 152 [17%] of the Turkish

sample were absent from school at the testing day. In addition, 48 [6%] Russian students and

23 [3%] Turkish students refused to participate in the screening test, due to a complete lack

of proficiency in L1. Furthermore, 10 students of the Russian sample [1%] and seven

students of the Turkish sample [1%] had left the school in the meantime. The schools

10 For the sake of brevity, the former group will further be referred to as “the Russian sample” and the latter as “the Turkish sample.” These

terms do not allude to citizenship or the like, but to the L1 that was tested in the sample.

11 Overall, 225 students participated in the Russian L1 test. We excluded one student because the number of valid answers provided was

too small to scale the test score.

12 In total, the Turkish sample consisted of 311 students. One student did not provide enough valid answers and therefore had to be

excluded from the scaling procedure.


attended by seven students from the Russian sample [1%] and by 29 students [3%] from the

Turkish sample no longer participated in the NEPS study. The Russian L1 tests were administered in 257 schools and the Turkish L1 tests in 237 schools. In the Russian sample, 35 students [5%] failed the screening test and were thus excluded from the L1 testing session.

In the Turkish sample, 38 students [4%] failed the screening test. We further excluded 12

[2%] Russian and 4 [0.5%] Turkish students who had participated in the L1 tests from all

analyses because the number of valid answers they provided was too small to scale their test

scores [see Edele et al., 2012, for further information on the scaling of the L1 tests].

The resulting Russian sample includes 502 students in total [51% female]. On average, they

were 15.8 years old at the time of testing. The largest proportion of students attended the

Hauptschule [41%], followed by students attending the Realschule [25%] and the

Gymnasium [20%]. Almost equal proportions of students belonged to the first immigrant

generation [47%] and to the second generation [41%]. On average, the students born abroad

were 5.3 years old when they came to Germany.

The final Turkish sample consists of 662 students [48% female]. On average, students in this

sample were 15.7 years old. Half of them attended the Hauptschule [50%], 20% the

Realschule and 14% the Gymnasium. In this sample, the majority of students [65%] were

second generation immigrants; only 9% were first generation.

Table 1: Gender, Age, Attended School Track and Immigrant Generation Status of the Russian

and the Turkish Samples in Studies 1 and 2

                     Study 1                   Study 2
                 Russian     Turkish      Russian     Turkish
                 n     %     n     %      n     %     n     %
Total            224         310          502         662
Gender
  Male           106   47.3  155   50.0   248   49.4  342   51.7
  Female         118   52.7  155   50.0   254   50.6  320   48.3
School track
  Hauptschule     82   36.6   87   28.1   206   41.0  330   49.8
  Realschule       -    -      -    -     124   24.7  130   19.6
  Gymnasium       34   15.2   67   21.6    98   19.5   94   14.2
  SMB             38   17.0   61   19.7    27    5.4   11    1.7
  Gesamtschule    70   31.2   95   30.6    47    9.4   97   14.7


Note. SMB = Schule mit mehreren Bildungsgängen [schools with several educational tracks]

4.5 Results

We analyzed the data from the Russian and the Turkish samples separately, as these groups

differ with regard to several important characteristics, such as the proportion of first and

second immigrant generation students and the attended school types.

Scaling, item difficulty and reliability

The L1 test scores of both studies were scaled based on a one-parameter logistic model

[Rasch model], yielding estimates of students’ L1 proficiency as Weighted Maximum

Likelihood Estimates [WLEs]. Item fit indices are generally satisfying in study 113 and in study

2 [see Edele et al., 2012 for further details on scaling].

In study 1, item difficulty on a logit scale of the Russian test ranges from -2.01 to 0.61 [mean

item difficulty: b = -.57]. The mean of students’ proficiency estimates given as WLEs is MWLE =

0.02 ranging from -2.57 to 3.83. In the Turkish test, the mean item difficulty is b = -0.17

ranging from -2.61 to 1.23. Person estimates of Turkish proficiency range between -2.24 and

4.26 with MWLE = 0.01.

In study 2, the mean item difficulty in the Russian L1 test is b = -0.12 ranging from -1.48 to

1.41. The mean of the person estimates is MWLE = .04, ranging from -2.78 to 4.26. The

average item difficulty in the Turkish L1 test is b = -0.23, ranging from -1.78 to 1.30. The

person estimates range from -3.46 to 3.00 with MWLE = 0.00.
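Under the Rasch model used here, the probability of a correct response depends only on the difference between person proficiency and item difficulty on the logit scale. A minimal sketch, plugging in the study-2 Russian values reported above:

```python
import math

def rasch_p(theta, b):
    """One-parameter logistic [Rasch] model: probability that a student
    with proficiency theta solves an item of difficulty b, both on the
    same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A student at the study-2 Russian mean [theta = 0.04] facing the
# easiest [b = -1.48] and the hardest [b = 1.41] item:
print(round(rasch_p(0.04, -1.48), 2))  # → 0.82
print(round(rasch_p(0.04, 1.41), 2))   # → 0.2
```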

The concurrence of the means in item difficulty and in person estimates suggests that item

difficulty and students’ L1 proficiency in both studies match well, on average. The range of

item difficulties is somewhat limited, however. Graphical analyses of the joint distributions

of person estimates and item estimates further indicate that in both studies the majority of

items clusters around the center of the scale [medium difficulty], while the boundaries of the

scale are covered only by a few items. The tests thus capture very high and very low

proficiency levels somewhat less precisely than the proficiency of students at intermediate

proficiency levels.

13 In study 1, one item in each test showed a poor fit [WMNSQ = 1.33 in the Russian test; WMNSQ = 1.21 in the Turkish test], most probably

due to attractive distractors. These distractors were eliminated in the final test version.

Immigrant

generation

1st generation 159 71.0 23 7.4 234 46.6 62 9.4

2nd generation 44 19.6 205 66.1 206 41.0 430 64.9

3rd generation - - 3 1.0 - - 19 2.9

One parent born

abroad

6 2.7 52 16.8 28 5.6 132 19.9

Not

determinable

15 6.7 27 8.7 34 6.8 19 2.9


Yet the tests proved to be highly reliable. In study 1, the WLE-reliability was .86 for the

Russian test and .79 for the Turkish test. In study 2, reliability coefficients were .85 for the

Russian test and .83 for the Turkish test.

Measurement equivalence

To test measurement equivalence, we conducted a MGCFA on the L1 tests in study 2 [see

Table 2]. The results show that the fit indices of the model assuming configural invariance

are acceptable [see Yu, 2002, for cut-off criteria of fit indices]. The more restrictive models

assuming strong and strict invariance, however, do not hold, as the model fit indices are not

satisfactory and the test of change in model fit is significant.

To check whether the observed lack of invariance is caused by items with low factor loadings

in one of the tests [loadings < .40], we excluded seven items from the model. The resulting

model assuming configural invariance [CFI = .94; TLI = .96; RMSEA = .04], however, does not

fit better than the model including the complete set of items. The more restrictive models,

consequently, do not show satisfactory fit indices either. Thus, there is no indication of partial strong invariance.

These findings confirm configural equivalence of the Russian and the Turkish tests, implying

that they measure the same construct. Because more restrictive models of equivalence are

not supported, however, neither the ability scores in the Turkish and the Russian tests nor

their correlation coefficients with other variables are directly comparable.

Table 2: Tests of Measurement Equivalence of the Russian and the Turkish L1 Tests

Note. In computing these models with MPlus 6.1 [Muthén & Muthén, 2009], we employed a robust weighted least squares approach

[estimator: WLSMV] and estimated model parameters based on Theta parameterization. CFI = comparative fit index; TLI = Tucker-Lewis

Index; RMSEA = root mean square error of approximation.

Dimensional structure

To examine whether our L1 tests exhibit the expected unidimensional structure, a 1-

dimensional model was tested against an alternative, theoretically plausible 2-dimensional

model. The 2-dimensional model assigns items on dialogues to the first dimension and items

on expository as well as narrative texts to the second dimension. We computed these

models with ConQuest 2.0 [Wu, Adams, Wilson, & Haldane, 2007], using Marginal Maximum

Likelihood [MML] estimation with Gauss-Hermite quadrature.

In study 1, the two dimensions correlate very highly, with .97 in the Russian sample and .94

in the Turkish sample. The 1-dimensional model fits the data better than the 2-dimensional

model in both language groups, as two indicators of model fit, namely Akaike’s information


criterion [AIC; Akaike, 1974] and the Bayesian information criterion [BIC; Schwarz, 1978],

indicate [see Table 3].

The two dimensions are also highly correlated in study 2, with .94 in the Russian sample and

.98 in the Turkish sample. The model fit indices suggest that the 2-dimensional model fits

negligibly better according to the AIC and slightly more poorly according to the BIC in the

Russian sample. In the Turkish sample, the 2-dimensional model fits somewhat more poorly

as indicated by both indicators [see Table 3]. The very high correlation between the two dimensions, which indicates their near identity, together with the very similar model fit slightly favoring the 1-dimensional model, suggests that the construct measured with the L1 tests is unidimensional rather than 2-dimensional.

Table 3: Results of the 1-dimensional and 2-dimensional Scaling: AIC and BIC Model Selection

Criteria

Note. AIC = Akaike information criterion; BIC = Bayesian information criterion.
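The AIC/BIC comparison behind Table 3 works as follows: both criteria reward likelihood but penalize the number of estimated parameters, so a 2-dimensional model that gains almost no likelihood is penalized for its extra parameters. The formulas are standard; the log-likelihoods below are hypothetical, not the study's values.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical 1- vs. 2-dimensional fit: the 2-dimensional model gains
# little likelihood but pays for two extra parameters, so both criteria
# prefer the simpler model [lower values are better].
ll_1d, k_1d = -8450.0, 33
ll_2d, k_2d = -8449.2, 35
n = 502
print(aic(ll_1d, k_1d) < aic(ll_2d, k_2d))       # → True
print(bic(ll_1d, k_1d, n) < bic(ll_2d, k_2d, n))  # → True
```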

Criterion validity: Other indicators of L1 proficiency and reasoning

To examine the convergent validity of the L1 tests, we correlated in a first step the L1 test

scores with students’ scores on the BVAT. In the Russian sample, the two objective

indicators correlate quite highly, in the Turkish sample the correlation is moderate [see Table

4].

In a second step, we correlated immigrant students’ L1 test scores with a number of

subjective indicators of L1 proficiency. As expected, students’ self-estimated global L1

proficiency is positively related to their L1 test score. Along the same lines, the scales self-

estimated oral comprehension [specific examples] and self-estimated written production

[specific examples] also substantially relate to tested L1 proficiency in both language groups.

These correlations are, however, not higher than those between the L1 tests and the global

self-estimates, indicating that an assessment of L1 proficiency based on concrete examples

does not enhance the validity of self-estimates. Students’ self-estimated number of items

solved in the test is also substantially related to their results in the L1 tests; strong

correlations emerged in both language groups. Similarly, parents’ estimates of their

children’s L1 proficiency are also strongly [Russian sample] or moderately [Turkish sample]

associated with the L1 test scores.

Analyses exploring the discriminant validity of the L1 tests indicate that, as expected, the L1

test scores correlate moderately with reasoning but only weakly and inconsistently with

perceptual speed.


Table 4: Pairwise Correlations between Immigrant Students’ L1 Test Scores [WLEs] and Validation Criteria

[Correlation coefficients could not be recovered from the source. Row labels and case numbers [N; study 1 Russian/Turkish, study 2 Russian/Turkish where applicable]: self-estimated global L1 proficiency [220, 302, 426, 572]; self-estimated oral comprehension [specific examples] [223, 306]; self-estimated written production [specific examples] [223, 305]; self-estimated number of solved items [488, 636]; parents’ estimates of L1 proficiency [182, 232]; general cognitive abilities [483, 632]; age at immigration [224, 305, 437, 581]; language use with family, with peers, and in media; identification with heritage culture [212, 290, 465, 566].]

Note. Correlations are given as Pearson’s r. Spearman’s rank correlation coefficients, which we additionally computed because of the non-normal distribution of some validation criteria, yielded almost equal results. Number of cases [N] in parentheses.

*p < .05, **p < .01, ***p < .001


Criterion validity: L1 exposure and motivation

Further analyses of the L1 tests’ validity with indicators of exposure to L1 and motivation for

L1 acquisition as criteria generally also show the expected pattern of results. In the Russian

group, students’ L1 test scores in both studies correlate substantially with their age at

immigration as well as with the language they use with parents, siblings, peers and in media

consumption. In the Turkish group, the age at immigration is also positively related to L1 test

scores in study 2 but not in study 1. Yet, the coefficient in study 1 is based on a very small

number of students and may therefore not be reliable. The language of media use also

shows the expected correlation with L1 test scores in the Turkish group. Language use in the

family and with peers overall shows the expected pattern as well, yet the correlation

coefficients are somewhat smaller and less consistent in the Turkish group than in the

Russian sample.

In a next step, we examined whether the duration of the family’s residence in Germany is

associated with the L1 test scores. Because the number of third-generation students is very

small, the analyses focus on the first and second generations. As expected, the test scores in

Russian are higher for first immigrant generation than for second generation students [see

Table 5].

Contrary to expectations, in the Turkish sample the first generation does not show higher L1

test scores than the second generation. One possible explanation for this finding is that

second generation students with a Turkish background may not encounter significantly

fewer learning opportunities for L1 acquisition than first generation students who

immigrated from Turkey themselves. To test this idea, we compared first and second

generation students’ language use with parents, siblings, peers and in media, using data from study 2. The results of the t-tests support this explanation to some extent. Students from the

first generation of the Turkish group use L1 more often with their parents [M = 3.1] than

students from the second generation [M = 2.9, t = 2.38, p
