Fundamentals of Data Collection and Analysis in the Socio-Economic Research Environment
1. Introduction
Data capturing and analysis are key phases in research, following research instrument development and data collection. It is necessary to ensure proper procedures are followed during data capturing, since poor workmanship at this stage can delay the completion of a project. Likewise, data analysis is the key stage just before report writing, and it shapes the entire output of the research process. This short article discusses fundamental issues to consider in the capturing and analysis of data: the types of data, data capturing processes, ensuring data quality, and data analysis processes.
Knowledge of these processes is relevant to anyone working in Socio-Economic Research and Surveys (SERS); Development Programme Monitoring and Evaluation (M&E); Economic Impact & Econometric Analysis (EIA); and Education and Training Services (ETS) environments.
2. Types of data
While data can be classified in several ways, data in research is broadly either qualitative or quantitative (Donges, 2018). Qualitative data is collected through techniques such as interviews, focus groups, observations, document analysis and images. Data capturing processes for qualitative data are usually case-specific and unstandardised, and are not the main focus of this article. Quantitative data, which includes categorical variables (such as gender or age group) and numerical/continuous variables (such as absolute Gross Domestic Product (GDP) figures or inflation rates), requires known and standardised procedures for capturing and analysis.
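As a small illustration of the distinction (a sketch using Python's pandas library, one of many possible tools, with made-up variable names and values), categorical and numerical variables can be given explicit types so that analysis software treats them correctly:

```python
import pandas as pd

# Hypothetical survey extract: two categorical and two numerical variables.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age_group": ["18-24", "25-34", "25-34", "35-44"],
    "gdp_growth_pct": [1.2, 0.8, 1.5, 0.9],   # continuous
    "inflation_pct": [4.5, 4.7, 5.1, 4.9],    # continuous
})

# Declaring categorical columns makes the distinction explicit and
# prevents accidental arithmetic on category codes.
df["gender"] = df["gender"].astype("category")
df["age_group"] = pd.Categorical(
    df["age_group"],
    categories=["18-24", "25-34", "35-44"],
    ordered=True,  # ordered categories behave like an ordinal scale
)

print(df.dtypes)
```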
3. Data capturing processes and practices
The process of data capturing for quantitative data (hereinafter 'data capturing') rests heavily on the choice of an appropriate data capturing software. Common options include Datlabs, SurveyMonkey (online), the Census and Survey Processing System (CSPro) and Microsoft Excel. Alavi and Massman (2016) note that the choice is affected by factors such as compatibility with your chosen data analysis software, ease of use, the speed of data capturing the software allows, and the number of entries to be made, among other factors.
Whichever data capturing software you choose, it should allow you to create a complete data capturing screen that is a mirror image of your questionnaire, in the case of primary data. With CSPro, for example, the data dictionary behind the capturing screen allows you to specify whether a variable is numerical, string (words) or categorical (with specified categories). While different software packages look different, any data capturing screen should allow you to capture data with relative speed and be compatible with the data analysis software you wish to export the data to.
In a number of studies involving primary research, Underhill Corporate Solutions' (UCS) software of choice for data capturing has been CSPro. CSPro allows much faster data capturing than SurveyMonkey, Excel and similar tools, because from the first entry to the last the software jumps automatically from one field to the next without requiring the data capturer to press 'enter'. For the fastest capturing, the data capturing screen should be set up so that each categorical response has a numerical code (e.g. 1 = Strongly Disagree, 2 = Disagree, etc.). Capturing speed is then limited by the data capturer's own speed rather than by the software.
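To make the idea concrete, here is a minimal sketch of such a capture screen, written in Python rather than CSPro and with entirely hypothetical variable names and codes. Each variable is typed, categorical responses are keyed as numeric codes, and invalid entries are rejected immediately:

```python
# Minimal data-capture sketch: each variable has a type and, for
# categorical variables, numeric codes (e.g. 1 = Strongly Disagree).
# Variable names and codes are illustrative, not from any real study.
SCREEN = [
    ("respondent_id", "string", None),
    ("age", "numeric", None),
    ("service_rating", "categorical",
     {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
      4: "Agree", 5: "Strongly Agree"}),
]

def capture_record():
    """Prompt for one record, validating each entry as it is keyed."""
    record = {}
    for name, vtype, codes in SCREEN:
        while True:
            raw = input(f"{name}: ").strip()
            if vtype == "numeric":
                try:
                    record[name] = float(raw)
                    break
                except ValueError:
                    print("  numeric value expected")
            elif vtype == "categorical":
                if raw.isdigit() and int(raw) in codes:
                    record[name] = int(raw)  # store the code, not the label
                    break
                print(f"  valid codes: {sorted(codes)}")
            else:  # string
                record[name] = raw
                break
    return record
```

Storing the numeric code rather than the label is what keeps keying fast: the capturer types a single digit per categorical response, and the code-to-label mapping travels with the dataset when it is exported for analysis.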
4. Data cleaning and quality control
While it is exciting and rewarding to capture data as fast as possible, the chances of mistakes usually increase with speed. Other mistakes occur regardless of how fast the capturer works, and all of these can be corrected during the data cleaning phase. Cleaning can be done using two complementary techniques: physical verification and software data cleaning. With physical verification, the data capturing supervisor draws a sample of physical questionnaires, traces each one's captured entry in the software using a unique identification number, and confirms whether the two match up. This often unearths errors small and large, all of which need rectification. Software data cleaning can then be used to identify outliers, duplicated entries and other errors. Once the data capturing supervisor is satisfied with data quality, the data is exported to an analysis software.
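On the software side, a few lines of pandas can flag duplicates and outliers for manual review. This sketch assumes, purely for illustration, that the captured data has been exported to a CSV file with a unique "questionnaire_id" column and a numeric "household_income" column:

```python
import pandas as pd

# Assumed export from the capture software; file and column names
# are illustrative.
df = pd.read_csv("captured_data.csv")

# Duplicated questionnaire IDs usually mean the same form was keyed twice.
dupes = df[df.duplicated(subset="questionnaire_id", keep=False)]

# Simple outlier screen on a numeric column: flag values more than
# 3 standard deviations from the mean for manual checking.
col = "household_income"
z = (df[col] - df[col].mean()) / df[col].std()
outliers = df[z.abs() > 3]

print(f"{len(dupes)} possible duplicate entries")
print(f"{len(outliers)} possible outliers in {col}")
```

Flagged rows are candidates for checking against the physical questionnaires, not automatic deletions; an outlier may simply be an unusual but genuine response.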
5. Data analysis processes and practices
The actual process of data analysis in quantitative research (hereinafter 'data analysis') is too broad and complex to be covered in this short essay. Instead, we discuss data analysis from a process standpoint, as part of the complete research process.
For your data analysis to be successful, the very first step is a review of your project's research questions and/or hypotheses. Beyond the presentation of demographic information, your entire data analysis should focus on interrogating each research hypothesis using appropriate statistical tests. The available tests can be broadly classified into groups such as descriptive analysis, regression analysis, comparison of means, analysis of variance (ANOVA), reliability analysis, correlation tests, non-parametric analysis and simulations.
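The demographic picture that usually opens a report can be summarised in a few lines. A small sketch with pandas, again with hypothetical file and column names:

```python
import pandas as pd

df = pd.read_csv("clean_data.csv")  # illustrative file name

# Frequency tables for categorical demographics...
print(df["gender"].value_counts(normalize=True).round(3))

# ...and descriptive statistics for continuous variables.
print(df[["age", "household_income"]].describe())
```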
To select the most appropriate statistical test, Johnson and Karunakaran (2014) indicate that the following two factors should be considered:
- The purpose of the study/research problem – this gives a broad indication of which analysis will be most appropriate. For instance, psychological studies that measure the influence of medication on academic performance almost always involve pre- and post-test data, which can be analysed using mean plots, paired-samples tests, ANOVA tests and the like (see the sketch after this list).
- The type/characteristics of the data – data can follow a nominal, ordinal, interval or ratio scale (Johnson & Karunakaran, 2014:55). Nominal scales have no inherent order, e.g. gender and race; such data can be tested using Chi-square tests (for comparing frequencies across categories), with t-tests used to compare the means of a numerical variable across two nominal groups. Ordinal scales possess some ranking order, e.g. Likert scales; these respond better to parametric tests (if the data is normally distributed) or non-parametric tests (if it is not). Interval and ratio scales are both continuous and can likewise be analysed using parametric or non-parametric tests depending on the normality of the distribution. Meanwhile, secondary data on, for instance, the impact of economic variables on GDP responds better to regression analysis techniques.
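As a brief illustration of both points, the SciPy library implements the common tests. The sketch below runs a paired-samples t-test on hypothetical pre/post scores and a Chi-square test of association between two nominal variables; all numbers are made up for the example:

```python
import numpy as np
from scipy import stats

# Hypothetical pre- and post-test scores for the same 8 participants.
pre = np.array([62, 70, 55, 68, 74, 60, 65, 71])
post = np.array([66, 73, 59, 70, 79, 63, 64, 75])

# Paired-samples t-test: did scores change significantly after treatment?
t_stat, p_val = stats.ttest_rel(pre, post)
print(f"paired t-test: t={t_stat:.2f}, p={p_val:.3f}")

# Chi-square test of association between two nominal variables,
# e.g. gender (rows) vs. a yes/no response (columns); counts are invented.
table = np.array([[30, 10],
                  [25, 15]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square: chi2={chi2:.2f}, p={p:.3f}")
```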
6. Conclusion
Data capturing and analysis are very broad and complex subjects. This article barely scratched the surface of either, but hopefully it provided some meaningful insight into the placement of these stages within the overall research process. Reading it may also have ignited a passion in you to pursue statistics and data analytics. If so, welcome to the club! The road ahead of you is long and winding, but more exciting than you can imagine before beginning the journey.
Author: This blog article was produced by Wonder Mahembe, a Research Economist at Underhill Corporate Solutions. He can be contacted at wonderm@underhillsolutions.co.za | +27 12 751 3237 | +27 73 818 2256.
Further reading:
Alavi, C. B. & Massman, J., 2016. Selecting an Electronic Data Capture System. Urology Practice, 3(3).
Donges, N., 2018. Data Types in Statistics. [Online] Available at: https://towardsdatascience.com/data-types-in-statistics-347e152e8bee [Accessed 25 July 2020].
Johnson, L. R. & Karunakaran, U. D., 2014. How to Choose the Appropriate Statistical Test Using the Free Program “Statistics Open For All” (SOFA). Annals of Community Health, 2(2), pp. 54-62.