Guide to good practice in data documentation
Good data documentation is crucial both for preserving data for future use and for secondary research. Anyone working with a dataset needs all the relevant information in order to produce meaningful tables and draw sound conclusions.
For a well-documented dataset, the data producers should prepare:
- The dataset: cleaned, with both the variables and their values labelled
- The questionnaire (in its original form, in both printed and electronic versions)
- Information about the research that yielded the dataset
For the users of a dataset, the important factors are its clarity, the correspondence between the variables in the database and the questions in the questionnaire, and the extent to which the variables are self-explanatory. In many cases, potentially valuable variables can no longer be used because information about their values is missing (e.g. the codes assigned to answers to open-ended questions). After a while, such information is permanently lost. Information about the dataset's structure should therefore be prepared, with a complete list of all variables and their descriptions, including details on the coding process and the classifications used.
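As an illustration, the variable list described above could be kept as a small machine-readable codebook. The Python sketch below shows only one possible layout. The variable names (`q1_sex`, `q7_occupation`), the labels, the sample records and the `undocumented_values` helper are all hypothetical and are not part of any RODA form.

```python
# Hypothetical codebook: one entry per variable, linking it to the
# questionnaire item and listing a label for every coded value.
codebook = {
    "q1_sex": {
        "question": "Q1. Respondent's sex",
        "values": {1: "Male", 2: "Female"},
    },
    "q7_occupation": {
        "question": "Q7. What is your main occupation? (open-ended)",
        "coding": "Answers coded after fieldwork; classification documented separately",
        "values": {1: "Employed", 2: "Unemployed", 9: "No answer"},
    },
}

# A few illustrative records (invented data).
records = [
    {"q1_sex": 1, "q7_occupation": 2},
    {"q1_sex": 2, "q7_occupation": 5},
]


def undocumented_values(records, codebook):
    """Return sorted (variable, value) pairs that have no label in the codebook."""
    missing = set()
    for row in records:
        for variable, value in row.items():
            labels = codebook.get(variable, {}).get("values", {})
            if value not in labels:
                missing.add((variable, value))
    return sorted(missing)


print(undocumented_values(records, codebook))
```

In this invented example the check reports the code 5 of `q7_occupation`, which appears in the data but has no label. This is the kind of gap that makes an open-ended variable unusable later.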
Equally useful is information about newly computed variables. The computation algorithm should be explained, if possible together with the formula used (if any).
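A computed variable can carry its own documentation. The following Python sketch uses an invented example, an age group derived from a hypothetical `age` variable. The names, cut-off points and record layout are assumptions, chosen only to show the formula stored next to the code that applies it.

```python
# Hypothetical documentation entry for a derived variable.
derived_doc = {
    "name": "age_group",
    "source_variables": ["age"],
    "formula": "1 if age < 30; 2 if 30 <= age < 60; 3 if age >= 60",
    "values": {1: "Under 30", 2: "30-59", 3: "60 and over"},
}


def age_group(age):
    """Apply the rule recorded in derived_doc['formula']."""
    if age < 30:
        return 1
    if age < 60:
        return 2
    return 3


ages = [18, 30, 59, 60]  # invented data
print([age_group(a) for a in ages])
```

Keeping the written formula and the value labels in the same record as the variable name lets a secondary user check the computation against the source variables.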
Although some variables are self-explanatory, in many cases reading the corresponding question in the questionnaire leads to a better understanding.
For fairly recent studies whose questionnaires exist in electronic form, it is very helpful to provide them in a recent Microsoft Word version, with the original page layout. The electronic version should preferably contain diacritical marks, too.
For less recent studies, a physical copy of the questionnaire can be scanned and converted to text using an optical character recognition program.
Information about the research
Such information is essential for users; the dataset and its contents cannot be fully understood without it.
For each dataset, RODA fills in a study description form. This form is used by all major data archives in the world and is based on the DDI (Data Documentation Initiative) structure, a project promoted by ICPSR, Michigan, USA.
Detailed information should be prepared on the data-gathering methods, the instructions for field operators, the sampling procedures and the weighting procedures.
Interested persons can consult our Depositing form in order to fill in the relevant information.