Yemi Gabriel

View the Project on GitHub yemigabriel/UniEssexMsc

Unit 4: Linear Regression with Scikit-Learn

Unit 4: Linear Regression with Scikit-Learn on GitHub

Before the correlation analysis, I handled missing data in both datasets. I then calculated a mean column for each data frame, merged the two data frames, and dropped any rows that still had missing data. With the data prepared, I calculated the Pearson correlation coefficient between mean population and mean GDP. The resulting positive correlation suggests that countries with larger populations tend to have higher mean GDPs, which implies that population size may play a significant role in a country's economic output.
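The preparation and correlation steps described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the two small data frames, their values, and the column names (`Country`, `2019`, `2020`, `mean_population`, `mean_gdp`) are invented for the example, and letting the row mean skip missing values is only one possible way of handling missing data.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two datasets. All values and column names here are
# invented for illustration; they are not the data used in the unit.
population = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "2019": [1.0, 5.0, 10.0, np.nan, 50.0],
    "2020": [1.2, 5.5, 11.0, 20.0, 52.0],
})
gdp = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "F"],
    "2019": [2.0, 9.0, 22.0, 41.0, 7.0],
    "2020": [2.4, 11.0, np.nan, 43.0, 8.0],
})

year_cols = ["2019", "2020"]

# Row-wise mean across the year columns. DataFrame.mean skips NaN by default,
# so a country with one missing year still gets a mean (an assumed choice).
population["mean_population"] = population[year_cols].mean(axis=1)
gdp["mean_gdp"] = gdp[year_cols].mean(axis=1)

# Keep only countries present in both data frames, then drop any rows
# that still contain missing values.
merged = (
    population[["Country", "mean_population"]]
    .merge(gdp[["Country", "mean_gdp"]], on="Country", how="inner")
    .dropna()
)

# Series.corr uses the Pearson method by default.
r = merged["mean_population"].corr(merged["mean_gdp"])
print(f"Countries compared: {len(merged)}")
print(f"Pearson r: {r:.3f}")
```

An inner merge is used here so that only countries appearing in both datasets are compared; a different join type would change which rows `dropna()` has to remove.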

I then fitted a linear regression model with mean population as the independent variable and mean GDP as the dependent variable. The fitted regression equation indicates how much mean GDP is expected to change for each unit increase in mean population. I plotted the regression line over the data to show how well it fits. The fit indicates that population is an important predictor of a country's GDP.
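As a rough sketch of the regression step, the snippet below fits scikit-learn's `LinearRegression` to a handful of made-up points and reads off the slope and intercept of the fitted line. The arrays `mean_population` and `mean_gdp` and the helper `plot_fit` are hypothetical names chosen for this example, and the numbers are not taken from the real datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values for illustration only (not the unit's data).
mean_population = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mean_gdp = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# scikit-learn expects a 2-D feature matrix: one row per country,
# one column per predictor.
X = mean_population.reshape(-1, 1)

model = LinearRegression().fit(X, mean_gdp)

slope = model.coef_[0]               # expected change in GDP per unit of population
intercept = model.intercept_         # predicted GDP when population is zero
r_squared = model.score(X, mean_gdp) # share of variance explained by the line

print(f"mean_gdp = {slope:.3f} * mean_population + {intercept:.3f}")
print(f"R^2 = {r_squared:.3f}")


def plot_fit(model, x, y):
    """Hypothetical helper: scatter the data and overlay the fitted line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    order = np.argsort(x)
    plt.scatter(x, y, label="Countries")
    plt.plot(x[order], model.predict(x[order].reshape(-1, 1)),
             color="red", label="Regression line")
    plt.xlabel("Mean population")
    plt.ylabel("Mean GDP")
    plt.legend()
    plt.show()
```

Calling `plot_fit(model, mean_population, mean_gdp)` would draw the scatter plot with the regression line. The R² value from `model.score` gives a numeric counterpart to judging the fit by eye.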