The dataset for this exercise (males.xlsx) contains data for young working males in the USA with some professional and personal characteristics. Please only use data for the year 1987. We want to explain the log wages from the other variables using the following model:

Logwageᵢ = β₁ + β₂ schoolᵢ + β₃ experᵢ + β₄ unionᵢ + β₅ marᵢ + β₆ blackᵢ + β₇ hispᵢ + εᵢ

We assume that all εᵢ and all explanatory variables are independent and that εᵢ are independently distributed with expectation 0 and variance σ².

A. Compare summary statistics of all the variables in the model and provide a brief interpretation.

Based on the table above, there are 545 observations for each variable in 1987. The means are measured on different scales, so they cannot be ranked by importance: years of schooling has the largest mean simply because it is counted in years, while the dummy for black has the lowest mean, 0.12, which is the share of black respondents in the sample. The summary statistics describe the sample; they do not show which variable influences log wages.

Based on the summary statistics for 1987, 26.2% of the respondents are union members, 61.5% are married, 11.6% are black and 15.6% are Hispanic.

B. Estimate the parameters by OLS. Report and interpret the estimation results, including the R². Pay attention to economic interpretation as well as statistical significance.

This table provides the R and R² values.

The R value is the multiple correlation coefficient, that is, the correlation between the observed and the fitted values of the dependent variable (Jain, 2019). In this case, the value is 0.394, a moderate correlation.
The R² value is the coefficient of determination: the share of the total variation in the dependent variable, log wages, that is explained by the independent variables (union membership, Hispanic, black, years of schooling, marital status and experience). In this case, the value is 0.155, so the model explains 15.5% of the variation in log wages.
The adjusted R² corrects R² for the number of regressors. In this case, the value is 0.145, which is close to R² (0.155), so the fit is not inflated by the number of explanatory variables.
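The link between R² and the adjusted R² can be checked by hand. The following is a minimal sketch, assuming n = 545 observations and k = 6 regressors plus an intercept as in the model above; the function name is ours and is not part of the SPSS output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a model with k regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values reported above: R² = 0.155, n = 545, k = 6.
adj = adjusted_r2(0.155, 545, 6)
print(round(adj, 3))
```

The result agrees with the reported 0.145 up to the rounding of R².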

Interpretation of Pearson’s correlation values

Independent variable name	Pearson correlation value	Result
Years of schooling	0.337	Positive correlation
Union member or not	0.082	Very weak positive correlation
Married	0.140	Very weak positive correlation
Black	-0.148	Very weak negative correlation
Hispanic	-0.018	Very weak negative correlation
Years of participating in the labour market (age-6-school)	-0.203	Weak negative correlation

Interpretation of significance (2-tailed) values

Independent variable name	Significance (2-tailed) value	Result (at the 5% significance level)
Years of schooling	<0.001	Significant
Union member or not	0.054	Not significant (0.054 > 0.05)
Married	0.001	Significant
Black	<0.001	Significant
Hispanic	0.675	Not significant
Years of participating in the labour market (age-6-school)	<0.001	Significant

The next table is the ANOVA table, which tests whether the explanatory variables are jointly significant, that is, whether the model explains log wages better than a model with only an intercept. The table is shown below.

The elements of this table relevant for interpreting the results are:

P-value/Sig. value: A 5% significance level (95% confidence level) is used in this study. In the above table, the p-value is <0.001, so the model as a whole is significant.
F-ratio: It measures the improvement in prediction from fitting the model relative to the unexplained variation that remains (Jain, 2019). In the above table, the value is 16.437, which is far above the 5% critical value, so the null hypothesis that all slope coefficients are zero is rejected.

Because the p-value of the ANOVA table is below the 5% significance level, the null hypothesis that none of the explanatory variables affects log wages is rejected. The significance of the individual coefficients is examined next.
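The F-ratio of the ANOVA table can be recovered from R² alone. This is a minimal sketch, assuming n = 545 and k = 6 regressors plus an intercept; the function name is ours.

```python
def overall_f(r2, n, k):
    """F-statistic for H0: all k slope coefficients are zero."""
    explained_per_df = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df

# Values reported above: R² = 0.155, n = 545, k = 6.
f_stat = overall_f(0.155, 545, 6)
print(round(f_stat, 2))
```

The value is close to the reported 16.437; the small difference comes from using R² rounded to three decimals.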

C. On the basis of the results in B, test at the 5% significance level whether being a union member, ceteris paribus, affects a person’s expected wage. Also, test the joint hypothesis that race does not affect wages. In each case formulate the null and alternative hypotheses and present the test statistic.

Interpretation will be as follows:

Independent variable name	Sig value	Hypothesis testing result at the 5% significance level	Interpretation
Years of schooling	<0.001	Null hypothesis rejected (<0.001 < 0.05)	Years of schooling has a significant effect on log wages, because the Sig. value is below the 0.05 threshold. One additional year of schooling raises the log wage by 0.088 (B value), that is, the wage by roughly 8.8%, ceteris paribus.
Years of participation in the labour market (age-6-school)	0.890	Null hypothesis not rejected (0.890 > 0.05)	Years of participation in the labour market has no significant effect on log wages, because the Sig. value of 0.890 is above the 0.05 threshold.
Union member or not	0.006	Null hypothesis rejected (0.006 < 0.05)	The null hypothesis H₀: β₄ = 0 is rejected against H₁: β₄ ≠ 0, because the Sig. value of 0.006 is below 0.05. Union members have a log wage that is 0.117 higher (B value) than non-members, that is, a wage roughly 11.7% higher, ceteris paribus.
Married	0.020	Null hypothesis rejected (0.020 < 0.05)	Marital status has a significant effect on log wages, because the Sig. value of 0.020 is below 0.05. Married men have a log wage that is 0.091 higher (B value), that is, a wage roughly 9.1% higher, ceteris paribus.
Black	0.001	Null hypothesis rejected (0.001 < 0.05)	Being black has a significant effect on log wages, because the Sig. value of 0.001 is below 0.05. Black men have a log wage that is 0.088 lower (B value), ceteris paribus.
Hispanic	0.420	Null hypothesis not rejected (0.420 > 0.05)	Being Hispanic has no significant effect on log wages, because the Sig. value of 0.420 is above the 0.05 threshold.

Therefore, the analysis suggests that years of schooling, union membership and being married have a significant positive relationship with log wages, while being black has a significant negative relationship. Experience and Hispanic are not significant in this model.
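The joint hypothesis that race does not affect wages (H₀: β₆ = β₇ = 0 against H₁: at least one of them is non-zero) is not covered by the individual t-tests; it needs an F-test that compares the model above with a restricted model without black and hisp. The sketch below shows the calculation only. The restricted R² of 0.135 is a hypothetical placeholder, not a result from this dataset, and the function name is ours.

```python
def joint_f(r2_unrestricted, r2_restricted, n, k, q):
    """F-statistic for q linear restrictions.

    k is the number of regressors (excluding the intercept) in the
    unrestricted model; n is the number of observations.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

# R² = 0.155 is reported above; 0.135 is a HYPOTHETICAL restricted R².
f_race = joint_f(0.155, 0.135, n=545, k=6, q=2)
print(round(f_race, 2))
```

With 2 and 538 degrees of freedom the 5% critical value is roughly 3.0, so H₀ is rejected whenever the F-statistic computed from the actual restricted regression exceeds that value.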

D. Consider a more general model that includes experᵢ². Compare this model with the model given above using R², adjusted R² and t-test. What is your conclusion?

This table provides the R and R² values with a model that includes experᵢ².

The R value in this model is 0.406, slightly higher than in the previous model (0.394).
The R² value in this model is 0.165, higher than in the previous model (0.155). R² never decreases when a regressor is added, so this increase alone does not show that the new model is better.
The adjusted R² in this model is 0.154, compared with 0.145 in the original model. Because the adjusted R² penalises the extra regressor and still rises, the model with experᵢ² fits the data better.

Interpretation will be as follows:

Independent variable name	Original model		New model (includes experᵢ²)		Hypothesis testing result at the 5% significance level	Interpretation
	t	Sig value	t	Sig value
Years of schooling	6.689	<0.001	7.100	<0.001	Null hypothesis rejected (<0.001 < 0.05)	Years of schooling has a significant positive effect on log wages, because the t-value is above 1.96 in both models. One additional year of schooling raises the log wage by 0.095 (B value) in the new model.
Years of participation in the labour market (age-6-school)	-0.139	0.890	-2.542	0.011	Null hypothesis rejected (0.011 < 0.05)	In the original model the t-value is -0.139 and the Sig. value is 0.890, so the null hypothesis was not rejected. In the new model the t-value is -2.542, which is below -1.96, and the p-value is 0.011, so the null hypothesis is rejected: experience has a significant negative coefficient once experᵢ² is included.
Union member or not	2.733	0.006	3.006	0.003	Null hypothesis rejected (0.003 < 0.05)	Union membership has a significant positive effect on log wages, because the t-value is above 1.96 in both models. Union members have a log wage that is 0.129 higher (B value) in the new model.
Marital status	2.342	0.020	2.361	0.019	Null hypothesis rejected (0.019 < 0.05)	Marital status has a significant positive effect on log wages, because the t-value is above 1.96 in both models. Married men have a log wage that is 0.091 higher (B value).
Black	-3.253	0.001	-3.191	0.002	Null hypothesis rejected (0.002 < 0.05)	Being black has a significant negative effect on log wages, because the t-value is below -1.96 in both models. Black men have a log wage that is 0.194 lower (B value) in the new model.
Hispanic	0.806	0.420	0.628	0.530	Null hypothesis not rejected (0.530 > 0.05)	Being Hispanic has no significant effect on log wages in either model: the t-values (0.806 and 0.628) are below 1.96 and both Sig. values are above 0.05.
Experience squared			2.551	0.011	Null hypothesis rejected (0.011 < 0.05)	Experience squared has a significant positive coefficient, because the t-value of 2.551 is above 1.96. A one-unit increase in experience squared raises the log wage by 0.010 (B value), so the effect of experience becomes less negative as experience grows.

Including experᵢ² changes the significance of experience: years of participation in the labour market now has a significant effect on log wages (t = -2.542), and experᵢ² itself is significant (p-value = 0.011). R² and adjusted R² are both higher than in the original model, so the extended model is preferred. Hispanic remains insignificant (p-value = 0.530 > 0.05), so we can consider removing only Hispanic from the new model.
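The Sig. values can be checked approximately from the t-values. This is a minimal sketch, assuming the standard normal approximation to the t-distribution, which is reasonable with 545 observations; the function name is ours and SPSS itself uses the exact t-distribution.

```python
import math

def two_tailed_p(t):
    """Two-tailed p-value for a t-statistic under the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

# t-value reported for experience squared in the new model.
p_exper2 = two_tailed_p(2.551)
print(round(p_exper2, 3))
```

Any coefficient with |t| above 1.96 has a p-value below 0.05 under this approximation, which is the rule used in the table.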

E. Save the OLS residuals from the original model. Run a regression where you try to explain the residuals from the explanatory variables in the original regression. What do you find? Explain.

The histogram of the residuals roughly follows the shape of the normal curve superimposed on it.

The scatterplot gives a general idea of the relationship between the log wages and the six independent variables. There appears to be a positive relationship, as there are more points in the bottom-left and top-right quarters of the plot than in the top-left and bottom-right quarters.

As shown in the chart above, the points in the normal P-P plot of the regression standardized residuals more or less follow the diagonal line, so we assume that the residuals are approximately normally distributed.
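For the regression asked for in this question, a general property of OLS applies: the residuals are by construction uncorrelated with the regressors, so regressing them on the same explanatory variables gives coefficients of zero and an R² of zero. The sketch below illustrates this with simulated data and a single regressor; the data, the coefficients and the function name are assumptions for illustration and are not taken from males.xlsx.

```python
import random

def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
# Simulated "schooling" and "log wage" values (illustrative only).
school = [random.uniform(8, 16) for _ in range(200)]
logwage = [1.0 + 0.09 * s + random.gauss(0, 0.5) for s in school]

a, b = simple_ols(school, logwage)
residuals = [w - a - b * s for s, w in zip(school, logwage)]

# Second regression: residuals on the same regressor.
a_res, b_res = simple_ols(school, residuals)
print(abs(a_res) < 1e-9, abs(b_res) < 1e-9)
```

Both coefficients of the second regression are zero up to floating-point error, which is what the same regression in SPSS should show for the original model.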

F. Extend the model to investigate whether black union members benefit more from union membership than non-black union members. Estimate the extended model and test the hypothesis.

There are 143 union members in the 1987 sample. Looking at the mean values in the table above, 78.32% of them are non-black (mean = 0.7832) and 21.68% are black (mean = 0.2168). These means are sample shares; they do not by themselves show how membership influences log wages.

Table for Non-Black Union

Table for Black Union

Interpretation will be as follows:

Independent variable name	t-value	Sig value	Hypothesis testing result at the 5% significance level	Interpretation
Non-Black Union Members	2.109	0.037	Null hypothesis rejected (0.037 < 0.05)	There is a significant positive relationship (t = 2.109 > 1.96), because the Sig. value of 0.037 is below 0.05. Among union members, non-black members have a log wage that is 0.174 higher (B value) than black members.
Black Union Members	-2.109	0.037	Null hypothesis rejected (0.037 < 0.05)	There is a significant negative relationship (t = -2.109 < -1.96), because the Sig. value of 0.037 is below 0.05. Among union members, black members have a log wage that is 0.174 lower (B value) than non-black members.

In conclusion, within the sample of union members, non-black members have significantly higher log wages than black members, which suggests that black union members do not benefit more from membership than non-black union members.
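A common alternative way to extend the model, not the one estimated above, is to keep all 545 observations and add an interaction between union and black. The following is a sketch of that specification; the coefficient β₈ and the interaction term are our notation and do not appear in the original exercise.

```latex
\begin{aligned}
\text{Logwage}_i &= \beta_1 + \beta_2\,\text{school}_i + \beta_3\,\text{exper}_i + \beta_4\,\text{union}_i
  + \beta_5\,\text{mar}_i + \beta_6\,\text{black}_i + \beta_7\,\text{hisp}_i \\
&\quad + \beta_8\,(\text{union}_i \times \text{black}_i) + \varepsilon_i \\
H_0 &: \beta_8 = 0 \qquad H_1 : \beta_8 > 0
\end{aligned}
```

In this specification the union effect is β₄ for non-black workers and β₄ + β₈ for black workers, so a one-sided t-test on β₈ addresses directly whether black members benefit more from membership.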

G. Make plots that can be used to investigate heteroskedasticity

We test for heteroskedasticity using a linear regression with log wages as the dependent variable and the following independent variables: years of schooling, years of participation in the labour market (age-6-school), marital status, black, Hispanic and union membership.

Based on the scatterplot above, the points are diffuse and do not form a clear pattern. We therefore conclude that the regression model (log wages on the six independent variables: union, marital status, exper, school, black and Hispanic) does not show a heteroskedasticity problem.

For a statistical check, we run a regression with the squared residuals as the dependent variable. The ANOVA F-test of this regression has a p-value of 0.152, which is more than 0.05. We therefore do not reject the null hypothesis of homoskedasticity, so there is no evidence of a heteroskedasticity problem.
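The idea behind the regression on squared residuals can be sketched as follows. The example uses simulated data with one regressor whose error variance grows with the regressor, so that the test has something to detect; the data, the variable names and the function name are assumptions for illustration. The statistic n·R² from the auxiliary regression is the Breusch-Pagan LM statistic, a close relative of the F-test reported above.

```python
import random

def fit_line(x, y):
    """Intercept, slope and R² of an OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)

random.seed(1)
n = 500
# Simulated regressor; the error standard deviation is proportional to x.
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.1 * xi + random.gauss(0, 0.2 * xi) for xi in x]

a, b, _ = fit_line(x, y)
squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

# Auxiliary regression: squared residuals on the regressor.
_, _, r2_aux = fit_line(x, squared_resid)
lm = n * r2_aux
# 3.84 is the 5% critical value of the chi-square distribution with 1 df.
print(round(lm, 2), lm > 3.84)
```

With several regressors the degrees of freedom equal the number of regressors in the auxiliary regression, and a statistic below the critical value corresponds to the "do not reject" outcome found above.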

References

George, D., & Mallery, P. (2013). IBM SPSS Statistics 21 Step by Step: A Simple Guide and Reference. Available at: https://mymoodle.lnu.se/pluginfile.php/7349976/mod_book/chapter/428979/IBM_SPSS_Statistics_Brief_Guide%20%281%29.pdf.

Accredited Professional Statistician For Hire. (2022). Use and Interpret Multiple Regression in SPSS. [online] Available at: https://www.scalestatistics.com/multiple-regression.html [Accessed 25 Jan. 2023].

Jain, R. (2019). How to interpret the results of the linear regression test in SPSS? [online] Knowledge Tank. Available at: https://www.projectguru.in/interpret-results-linear-regression-test-spss/ [Accessed 25 Jan. 2023].

Pallant, J. (2010). SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS. Maidenhead: Open University Press/McGraw-Hill.
