Introduction

Today, we examine two factors of environmental concern: (1) CO2 emissions; and (2) wildfires in California.

We chose to examine CO2 emission data because CO2 is the main greenhouse gas and its emissions correlate with negative health effects. We chose to examine wildfires because they are a serious threat in the western United States.

First, we import all the relevant packages for our notebook.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(stringr)
library(readxl)
library(dplyr)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
library(ggplot2)
library(broom)
library(ggpubr)

Carbon Emissions

CO2

We first examine CO2 emission data from 2016, taken from Kaggle (source: https://www.kaggle.com/sansuthi/global-co2-emissions). We begin by reading and preparing the data.

Carbon_data <- read.csv("C:/Users/Wilson Bao/Desktop/CO2 Data/Emission_Data.csv")
Carbon_data2 <- as_tibble(Carbon_data)

We see that this data has country names, CO2 emissions, population, and life expectancy. Our data is well structured and is not missing any values.

head(Carbon_data2)
colSums(is.na(Carbon_data2))
##        Country           Code   CO2Emissions   YearlyChange      Percapita 
##              0              0              0              0              0 
##     Population LifeExpectancy 
##              0              0

We now show a scatter plot and a regression of CO2 emissions on country population. We can see that the top polluters, China, the U.S., and India, are large outliers.

emissions <- select(Carbon_data2, CO2Emissions, Population)
ggplot(emissions, aes(x=Population, y=CO2Emissions)) + geom_point(color='darkblue')

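Our regression suggests that for every additional million people in the population, CO2 emissions increase by about 4.951 million metric tons (a slope of 4.951 per person). Population alone explains about 68% of the variation in emissions across countries (R-squared = 0.68).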

Carbon_Regression <- lm(CO2Emissions ~ Population, data = emissions)
summary(Carbon_Regression)
## 
## Call:
## lm(formula = CO2Emissions ~ Population, data = emissions)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.015e+09 -2.853e+07  6.855e+06  1.241e+07  3.441e+09 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.498e+06  3.371e+07  -0.282    0.778    
## Population   4.951e+00  2.357e-01  21.002   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.71e+08 on 206 degrees of freedom
## Multiple R-squared:  0.6817, Adjusted R-squared:  0.6801 
## F-statistic: 441.1 on 1 and 206 DF,  p-value: < 2.2e-16
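
To make the unit conversion behind that interpretation explicit, here is a minimal sketch. It uses a small made-up data frame (`toy`, not the Kaggle data) and assumes, as in the text, that Population is counted in people and CO2Emissions in metric tons. The slope per person is then scaled to a slope per million people.

```r
# Hypothetical toy data (not the Kaggle file): population in people,
# emissions in metric tons, built to lie exactly on a line of slope 5.
toy <- data.frame(Population = c(1e6, 5e6, 2e7, 1e8, 3e8))
toy$CO2Emissions <- 5 * toy$Population + 1000

fit <- lm(CO2Emissions ~ Population, data = toy)

# Slope: change in emissions per additional person.
slope_per_person <- unname(coef(fit)["Population"])

# Per additional million people, the change is 1e6 times larger.
slope_per_million <- slope_per_person * 1e6

slope_per_person
slope_per_million
```

Applied to the fitted model above, the same scaling turns the reported slope of 4.951 per person into roughly 4.951 million per million people.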

When we examine the graph without the outliers, we can see the general positive correlation between population and CO2 emissions.

Carbon_data3 <- read.csv("C:/Users/Wilson Bao/Desktop/CO2 Data/Emission_Data_Modified.csv")
Carbon_data4 <- as_tibble(Carbon_data3)
emissions2 <- select(Carbon_data4, CO2Emissions, Population)
ggplot(emissions2, aes(x=Population, y=CO2Emissions)) + geom_point(color='darkblue')
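
The modified file above was prepared outside the notebook. As a sketch of how a similar filter could be applied in code, the example below drops the three largest emitters from a small made-up data frame. The `toy` data, the `n_outliers` cutoff, and the choice of "top three by emissions" are all assumptions for illustration, not the rule used to build Emission_Data_Modified.csv.

```r
# Hypothetical toy data standing in for the emissions table.
toy <- data.frame(
  Country      = c("A", "B", "C", "D", "E", "F"),
  CO2Emissions = c(9000, 5000, 2500, 300, 200, 100),
  Population   = c(1400, 330, 1300, 80, 60, 30),
  stringsAsFactors = FALSE
)

# Assumed cutoff: treat the three largest emitters as outliers.
n_outliers <- 3

# Sort from largest to smallest emitter, then drop the first n_outliers rows.
ranked  <- toy[order(toy$CO2Emissions, decreasing = TRUE), ]
trimmed <- ranked[-seq_len(n_outliers), ]

trimmed$Country
```

Filtering in code keeps the outlier rule visible in the notebook, so a reader can see which countries were excluded and why.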

California Wildfires

Wildfires

We next examine the wildfires in California.

As we can see from the scatter plot, with each passing year there are not only many wildfire incidents but also a larger number of acres burned.

California_Fire <- California_Fire_Incidents <- read.csv("C:/Users/Wilson Bao/Desktop/CO2 Data/California_Fire_Incidents.csv")
California_Fire2 <- select(California_Fire, AcresBurned, ArchiveYear)
ggplot(California_Fire2, aes(x=AcresBurned, y= ArchiveYear)) + geom_point()
## Warning: Removed 3 rows containing missing values (geom_point).

yearsAcres <- select(California_Fire_Incidents, AcresBurned, ArchiveYear)
head(yearsAcres)
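
The number of incidents per year can also be counted directly instead of being read off the scatter plot. The sketch below does this with base R's `table()` on a small made-up data frame; the `toy` values are hypothetical and only mirror the two columns selected above.

```r
# Hypothetical toy data with the same two columns as yearsAcres.
toy <- data.frame(
  ArchiveYear = c(2013, 2013, 2014, 2015, 2015, 2015),
  AcresBurned = c(100, 50, 200, 75, 30, 10)
)

# Each row is one incident, so counting rows per year gives incidents per year.
incidents_per_year <- table(toy$ArchiveYear)

incidents_per_year
```

The same call on `yearsAcres$ArchiveYear` would give the per-year incident counts for the real data.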

As seen in this bar graph, the number of acres burned rose sharply in 2017 and 2018. Before 2017, the number of acres burned per year was stable at around 500,000. Since then, the problem of wildfires has only grown, and it is imperative that protective measures be put in place to stop this trend.

str(yearsAcres)
## 'data.frame':    1636 obs. of  2 variables:
##  $ AcresBurned: int  257314 30274 27531 27440 24251 22992 20292 14754 12503 11429 ...
##  $ ArchiveYear: int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
yearsAcres2 <- as.factor(yearsAcres$ArchiveYear)
TotalByYear <- aggregate(yearsAcres, by=list(yearsAcres$ArchiveYear), FUN=sum)
TrueYearsAcres <- select(TotalByYear, Group.1, AcresBurned)

options(scipen = 4000000)
# main, xlab, and cex.names are base barplot() arguments that ggplot() ignores; the title is set with labs() instead
sp <- ggplot(TrueYearsAcres, aes(x = Group.1, y = AcresBurned)) + geom_bar(stat = "identity") + labs(title = "Wildfire")

sp + scale_x_continuous(breaks = seq(2013, 2018, by = 1), name="Year") 
## Warning: Removed 1 rows containing missing values (position_stack).
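
The warning above most likely means that one year's total is NA, which is what `sum()` returns when any value in a group is missing. A minimal sketch of one way to keep that year in the chart, using made-up numbers rather than the incident file, is to pass `na.rm = TRUE` through `aggregate()`. Whether ignoring the missing records is appropriate for the real data is an assumption to check.

```r
# Hypothetical toy data: one 2014 record has a missing acreage.
toy <- data.frame(
  ArchiveYear = c(2013, 2013, 2014, 2014, 2015),
  AcresBurned = c(100, 50, 200, NA, 75)
)

# Default behaviour: the 2014 total becomes NA because of the missing record.
with_na <- aggregate(toy["AcresBurned"],
                     by = list(Year = toy$ArchiveYear),
                     FUN = sum)

# Extra arguments are forwarded to FUN, so na.rm = TRUE skips missing values.
no_na <- aggregate(toy["AcresBurned"],
                   by = list(Year = toy$ArchiveYear),
                   FUN = sum, na.rm = TRUE)

with_na
no_na
```

With `na.rm = TRUE` the affected year's total counts only the recorded incidents, so it is a lower bound rather than an exact figure.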

Conclusion

CO2

If population continues to grow, CO2 emissions are likely to increase as well. Furthermore, California wildfires are increasing in both the number of incidents and the acres burned, and both trends are alarming for people in the West.

That was our presentation. Thank you so much.