Regional Data

The data of the National Educational Panel Study (NEPS) contain the following location variables (please use the NEPSplorer for more information about the variables):

Label	Starting Cohort	Dataset	Variable
Place of birth	3 4 5 6	pTarget	t700101
Residence	3 4 5 6	pTarget	t751001
	1 2 3 4	pParent	p751001
History of residence	6	spResidence	th21111
Secondary residence	3 4 6	pTarget	t751011
Place of work	3 4 5 6	spEmp	ts23237
Institution (panel frame)	2 3 4	CohortProfile	tx80109
Institution (episodes)	3 4 5 6	spSchool	ts11202
	1 2 3 4	spParentSchool	p723030
	2	pParent	pb11610
	3 4	pTarget	tx44401
	5	pTarget	tg15207
University entrance qualification acquisition	5	spSchool	tg2232b
Place of measure	3 4 5	spVocPrep	ts13105
Place of vocational training	3 4 5 6	spVocTrain	ts15207
Educator: place of study	3 4	pEducator	e537110
Educator: place of Staatsexamen	3 4	pEducator	e537170

All of these locations are collected during the interview as reported by the respondent. Because we asked for the place name, the smallest regional unit available is the town or city. Smaller regional units are available only through the Microm or infas geodata (see below).

The place name is recoded into the municipal key (amtlicher Gemeindeschlüssel, AGS, 8 digits, as of 12/31/2013) during Scientific Use File preparation. For data protection reasons, the full key is not released to the research community. Researchers have access to the following derived regional entities across the three access modes:

Starting Cohort	Download	RemoteNEPS	On-Site
SC1	Federal State	Federal State	Federal State, Administrative Region, Administrative District
SC2-SC4	--	Federal State	Federal State, Administrative Region, Administrative District
SC5	--	Federal State, Administrative Region*, Administrative District*	Federal State, Administrative Region, Administrative District
SC6	Federal State	Federal State, Administrative Region, Administrative District	Federal State, Administrative Region, Administrative District

* exception: place of higher education institution

For all analyses with regional data, please observe the conditions of the Data Use Agreement (see § 2 sentence 5 and § 5), in particular the handling of federal state variables.
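The derived entities correspond to leading digits of the 8-digit AGS: digits 1-2 for the federal state, digit 3 for the Regierungsbezirk, digits 4-5 for the Kreis, and digits 6-8 for the municipality. The following is a minimal sketch for external data of your own, not for the NEPS files. It assumes that "Administrative Region" refers to the Regierungsbezirk level and "Administrative District" to the Kreis level, and that your data hold the full key in a numeric variable called ags (a hypothetical name; the values are purely illustrative).

```stata
* Sketch only: "ags" is a hypothetical variable in YOUR external data,
* stored as a numeric 8-digit municipal key without leading zeros.
clear
input long ags
 1001000
 9461000
16077036
end

* Drop trailing digits to obtain the coarser keys.
* Digits 1-2: federal state
generate byte state = floor(ags / 1000000)
* Digits 1-3: federal state plus administrative region
generate int region = floor(ags / 100000)
* Digits 1-5: district code
generate long district = floor(ags / 1000)

list, noobs
```

Because the key is numeric, a state code such as 01 appears as 1 and a district code such as 01001 as 1001.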

Matching of Regional Data

If you would like to link your own regional data (e.g., from official statistics) to the NEPS data, you can do so using the regional units listed above. Please take into account the reference date of the key variable (12/31/2013), because the key can change as a result of local government reforms. To use your own data inside RemoteNEPS, you must first import it into our system (see here how). To use data On-Site, please get in touch with a staff member of the RDC.

If you want to link data using a key that is not available in a specific access mode (e.g., districts in SC2-SC4 inside RemoteNEPS, or municipalities in any Starting Cohort), this is also possible. In that case, the RDC will carry out the matching for you, so you do not need access to the key variable itself.

To ensure a simple and fast provision of the matched data, please prepare your data as follows:

Create a dataset in Stata format (alternatively: csv).
The first variable of the dataset should contain the municipal key (AGS) or a part of it (e.g., the district code). Please use a numeric data type (not a string variable; leading zeros are dropped). Keep in mind that the municipal key changes over time and may be affected by territorial reforms. We currently use the municipal key as of 12/31/2013 (older SUF releases are based on the status as of 2006).
The format of the subsequent variables can be chosen as required (even string variables).
Use at most 8 characters for the variable names; use no umlauts or special characters.
Please make sure that variable and value labels are sufficiently informative.
If you want to work inside RemoteNEPS (this does not apply to On-Site access): neither a single attribute nor any combination of attribute values in the data file may identify a regional unit uniquely. Please be aware that regional data that identify municipalities or districts, even without the regional key, will not be matched. If you are unsure whether your data meet this condition, use, for example, the Stata command duplicates report varname1 varname2 ...; it shows how often each combination of values of varname1 varname2 occurs, and therefore whether the combination is a unique identifier (a key). Make sure that no combination occurs only once. If you cannot reduce your data to satisfy this condition, you are welcome to work On-Site instead, where we place no such restrictions on the data.
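The uniqueness condition for RemoteNEPS could be checked along the following lines. This is a minimal sketch on made-up data: type and status are the fictional attributes used on this page, and the district codes 1004 and 1005 are added only for illustration. Checking all attributes jointly is sufficient, because a value that is unique in a subset of the attributes also makes the full combination unique.

```stata
* Sketch only: fictional data with the fictional attributes type and status.
clear
input long district str1 type byte status
1001 "A" 0
1002 "B" 0
1003 "A" 1
1004 "A" 0
1005 "B" 0
end

* Count how many regional units share each combination of attribute values.
bysort type status: generate long n_units = _N

* Combinations that occur exactly once would identify a single regional unit.
count if n_units == 1
list district type status if n_units == 1, noobs

* The same information as a frequency table of copies.
duplicates report type status
```

In this made-up example the combination A/1 occurs only once, so district 1003 would be identifiable. Such a file would have to be coarsened, for instance by merging categories, before it could be matched for use inside RemoteNEPS.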

Your data file should then look like this (using fictional attributes type and status):

district	type	status
1001	A	0
1002	B	0
1003	A	1
...	...	...
16077	C	0
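A file of this shape could be prepared roughly as follows. This is a sketch under assumptions, not a required procedure: the file name my_regional_data.dta and the string variable kreis are hypothetical, and the isid check reflects the assumption that each regional unit should appear only once in your file.

```stata
* Sketch only: file name and the string variable "kreis" are hypothetical.
clear
input str5 kreis str1 type byte status
"01001" "A" 0
"01002" "B" 0
"16077" "C" 0
end

* Convert the string key to a numeric type; leading zeros are dropped.
destring kreis, generate(district)
drop kreis

* The key should be the first variable in the dataset.
order district

* Assumption: each regional unit appears only once in your file.
isid district

* Describe the attributes so that their meaning is clear to the RDC.
label variable type "Fictional attribute: type"
label variable status "Fictional attribute: status"

save "my_regional_data.dta", replace
```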

Please email this dataset to fdz@lifbi.de together with the following information (if you prefer to exchange the data by other means, please contact us directly):

Your username (nu..) and the number of your Data Use Agreement (DUA).
The Starting Cohort(s) you are interested in.
The locations on which the matching should be based (see the first table above).

The RDC will then review the data received. Please be aware that we will not match any data if we find that doing so would violate data protection regulations (even if the requirements above are met). In that case, we will contact you to find a solution.

The result contains the IDs of the respondents and your attributes; the regional key is removed from the file. As a consequence, the number of rows usually increases, because each row of your file is repeated for every respondent and wave assigned to that district. The example above might now look like this:

ID_t	wave	type	status
402301	1	A	0
402301	2	A	0
402301	3	B	0
402302	1	B	0
402302	2	B	0
402303	1	A	1
402303	2	C	0
...	...	...	...

This dataset is provided to you in a project folder inside our RemoteNEPS or On-Site system. You can then use the respondents’ ID to merge your data with our data, e.g., using the CohortProfile dataset:

  
	. use CohortProfile.dta
	. merge 1:1 ID_t wave using "your_datafile.dta"

Note that in this example one person can be assigned to different places in different waves. You therefore need the combination of ID_t and wave as a unique identifier (this becomes more complex when merging with episode data).
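Before merging, it can help to confirm that ID_t and wave really do identify each row of the matched file, and to inspect the merge result afterwards. The following sketch reuses the placeholder file name your_datafile.dta from the example above and assumes that both files are available in the working directory.

```stata
* Sketch only: "your_datafile.dta" is the placeholder name used above.
use "your_datafile.dta", clear

* Stops with an error if ID_t and wave do not jointly identify each row.
isid ID_t wave

use "CohortProfile.dta", clear
merge 1:1 ID_t wave using "your_datafile.dta"

* _merge shows which rows were found in one file only or in both.
tabulate _merge
```

Rows that appear only in CohortProfile after the merge are respondent-wave combinations for which no location was matched.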

Microdata (Microm and infas geodata)

The NEPS Scientific Use Files already contain some small-scale regional indicators from the companies Microm and infas geodata. To find out more about those datasets, please consult the respective documentation (see Microm here, infas geodata here). These data files are available On-Site only.

In contrast to the locations above, the regional coding here is based on the actual postal address of the respondents. The regional indicators are therefore available at levels finer than the municipality (the smallest unit is the individual house). Please note that the actual identity of these small-scale units is unknown to us, so they cannot be used to merge external data.

In addition, the Microm data contain an identifier for the regional level. With it, you can determine which respondents reside in the same region. For details, see the documentation mentioned above.