Data


Data Selection

  1. Our primary dataset, StormEvents for 2020, documents the occurrence of storms and other significant weather phenomena and comes from NOAA’s National Weather Service. https://www.ncdc.noaa.gov/stormevents/

    The StormEvents data appears to have been collected to document and track the occurrence of storms. With this information, researchers can determine how storms cause loss of life, injuries, significant property damage, and/or disruption to commerce. The dataset also helps track rare or unusual weather phenomena, such as snow flurries in South Florida or the San Diego coastal area. In addition, it documents other significant meteorological events, such as record maximum or minimum temperatures that occur in connection with another event.

  1. We also used a dataset of United States Census Bureau regions to sort the storm events by state into regions. https://github.com/cphalpert/census-regions

    This dataset appears to be curated to categorize states by regions. Narrowing down each state to a specific geographic region is beneficial for analyzing storm patterns in those regions.
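    As a rough illustration of how a state-to-region lookup can be attached to storm records, here is a minimal base R sketch. The data frames, their column names (`State`, `Region`, `STATE`, `EVENT_TYPE`), and all rows are made up for illustration; the real files may be laid out differently.

```r
# Hypothetical lookup table: one row per state with its region.
regions <- data.frame(
  State  = c("Florida", "California", "Ohio"),
  Region = c("South", "West", "Midwest"),
  stringsAsFactors = FALSE
)

# Toy storm records (made-up rows, not real data).
storms <- data.frame(
  STATE      = c("FLORIDA", "OHIO", "OHIO", "CALIFORNIA"),
  EVENT_TYPE = c("Hail", "Thunderstorm Wind", "Hail", "High Wind"),
  stringsAsFactors = FALSE
)

# Put the join key in the same case on both sides.
regions$STATE <- toupper(regions$State)

# Left join: every storm row keeps its place and gains a Region column.
storms_by_region <- merge(storms, regions[, c("STATE", "Region")],
                          by = "STATE", all.x = TRUE)

# Number of storm events per region.
region_counts <- table(storms_by_region$Region)
```

    Using `all.x = TRUE` keeps storm rows whose state has no match in the lookup, so unmatched names show up as `NA` regions instead of silently disappearing.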

  1. We also downloaded the following datasets, which are used as potential factors in modeling the amount of property damage and are combined in the dataset “damageMatrix”.

    Overall, these datasets record how certain variables differ from state to state. Whether the variable is GDP, average precipitation, home value, or population distribution, researchers can use it to study how such factors relate to other state-level outcomes, such as the amount of property damage caused by storms. These variables are explained further below.

    Gross Domestic Product Per Capita by State, from Bureau of Economic Analysis. https://www.bea.gov/data/gdp/gdp-state

    Average Amount of Precipitation by State, from Statista. https://www.statista.com/statistics/1101518/annual-precipitation-by-us-state/

    Typical Home Value by State, from Zillow. https://www.zillow.com/research/data/

    Average Elevation by State. https://www.atlasbig.com/en-us/usa-states-average-elevation

  1. We then obtained the shapefile (.shp) of the United States from the US Census Bureau. https://www2.census.gov/geo/tiger/GENZ2019/shp/

    The US Census Bureau collects and publishes data for many purposes. For this project, the files at this link supply geographic boundaries, and the Bureau is also our source for demographic data.

Variable Description

StormEvents (relevant variables)

State: The name of the state where the event occurred
event_type: The type of storm event (e.g., Hail, Thunderstorm Wind, Snow, Ice)
damage_property: The estimated amount of damage to property incurred by the weather event
Magnitude: The measured extent of the magnitude type; only used for wind speeds (in knots) and hail size (in inches, to the hundredth)
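If the damage column arrives as text rather than numbers, it has to be converted before it can be summed. The sketch below assumes values such as "10.00K" or "1.5M", where the suffixes K, M, and B stand for thousand, million, and billion; that format and the helper name `parse_damage` are assumptions for illustration, not something taken from our cleaning script.

```r
# Convert damage strings such as "10.00K" or "1.5M" to dollars.
# Assumption: K = thousand, M = million, B = billion; no suffix = plain dollars.
parse_damage <- function(x) {
  x <- toupper(trimws(as.character(x)))
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

  # The last character decides whether a known suffix is present.
  suffix <- substr(x, nchar(x), nchar(x))
  has_suffix <- !is.na(suffix) & suffix %in% names(multiplier)

  # Strip the suffix (if any) and read the remaining digits.
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  number <- suppressWarnings(as.numeric(digits))

  # Scale by the multiplier, or by 1 when there is no suffix.
  scale <- ifelse(has_suffix, multiplier[suffix], 1)
  unname(number * scale)
}

damage_dollars <- parse_damage(c("10.00K", "1.5M", "0.00K", "250"))
```

Values that cannot be read as numbers come back as `NA`, so they can be inspected or dropped explicitly before any totals are computed.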

damageMatrix (final data set)

damageAmount: Total estimated property damage (in dollars), summed over all thunderstorm wind events that occurred within each state
count: Number of wind storm occurrences by state
GDPCAPITA: Gross Domestic Product per capita (in dollars) by state
MAGNITUDE: Average magnitude of thunderstorm wind (in knots) by state
AvgElevation: Average elevation of each state, in feet
AvgHome: Typical home value by state, in dollars

Data Loading and Cleaning

The link to our load_and_clean_data.R file.

The original StormEvents dataset records storm events for over 60 regions, including areas such as Puerto Rico and the Great Lakes. We first remove the rows for storm events that did not occur in the 50 states or the District of Columbia. Because the dataset contains many different types of storm events, and the magnitude scale differs by event type, we focus only on wind storms (such as high wind, medium wind, and thunderstorm wind) and remove the rows for all other event types. Lastly, we keep only the columns STATE, Damage Property, and Magnitude.
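The filtering steps described above can be sketched in base R roughly as follows. The toy rows, the upper-case state spelling, and the exact event-type labels in `wind_types` are assumptions for illustration; the actual code lives in load_and_clean_data.R and may differ.

```r
# Toy rows (made-up values); column names follow the variables described above.
storms <- data.frame(
  STATE = c("OHIO", "PUERTO RICO", "TEXAS",
            "DISTRICT OF COLUMBIA", "LAKE ERIE", "TEXAS"),
  EVENT_TYPE = c("Thunderstorm Wind", "High Wind", "Hail",
                 "High Wind", "High Wind", "Thunderstorm Wind"),
  DAMAGE_PROPERTY = c(5000, 1000, 200, 0, 0, 12000),
  MAGNITUDE = c(52, 40, 1.75, 45, 38, 61),
  stringsAsFactors = FALSE
)

# The 50 states come from R's built-in state.name vector; DC is added by hand.
keep_states <- c(toupper(state.name), "DISTRICT OF COLUMBIA")

# Assumed labels for the wind storm event types.
wind_types <- c("High Wind", "Thunderstorm Wind")

# Keep only wind storms in the 50 states and DC, and only the needed columns.
wind_storms <- storms[
  storms$STATE %in% keep_states & storms$EVENT_TYPE %in% wind_types,
  c("STATE", "DAMAGE_PROPERTY", "MAGNITUDE")
]
```

Filtering on an explicit list of states, rather than excluding known territories, means any unexpected region name is dropped by default.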

We then calculate the average magnitude, the total number of wind storms, and the total amount of property damage by state, and join these with the datasets of average home value, average elevation, and GDP per capita. The result is the dataset called damageMatrix.
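A minimal sketch of this aggregate-then-join step, using base R only, is shown here. All numbers are placeholders and not real statistics, and the `covariates` table is a hypothetical stand-in for the separate home value, elevation, and GDP files.

```r
# Toy cleaned wind storm rows (placeholder values).
wind_storms <- data.frame(
  STATE = c("OHIO", "OHIO", "TEXAS"),
  DAMAGE_PROPERTY = c(5000, 3000, 12000),
  MAGNITUDE = c(50, 60, 61),
  stringsAsFactors = FALSE
)

# One summary row per state: total damage, average magnitude, event count.
per_state <- lapply(split(wind_storms, wind_storms$STATE), function(d) {
  data.frame(
    STATE = d$STATE[1],
    damageAmount = sum(d$DAMAGE_PROPERTY, na.rm = TRUE),
    MAGNITUDE = mean(d$MAGNITUDE, na.rm = TRUE),
    count = nrow(d),
    stringsAsFactors = FALSE
  )
})
by_state <- do.call(rbind, per_state)
rownames(by_state) <- NULL

# Hypothetical state-level covariates (placeholder numbers, not real data).
covariates <- data.frame(
  STATE = c("OHIO", "TEXAS"),
  GDPCAPITA = c(50000, 60000),
  AvgElevation = c(800, 1700),
  AvgHome = c(150000, 200000),
  stringsAsFactors = FALSE
)

# Inner join on the state name gives one modeling row per state.
damageMatrix <- merge(by_state, covariates, by = "STATE")
```

An inner join drops any state missing from either table, so comparing `nrow(damageMatrix)` with the expected number of states is a quick check that the state names line up.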

For the race dataset, we found a racial breakdown of each state from the United States Census Bureau. The categories included White, Asian, Black, Hispanic, Mixed Races, Native Pacific Islander, and American Indian/Alaskans. We added the Native Pacific Islander and American Indian/Alaskans categories together to form a new category, Natives. We then joined the result with a dataset of GDP for each state to form a new dataset, which is shown in an interactive table.
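The category merge and join can be sketched as below. The column names, the placeholder states "State A" and "State B", and every number are invented for illustration; the real Census and GDP tables use their own names and values.

```r
# Toy racial breakdown (placeholder values, hypothetical column names).
race <- data.frame(
  State = c("State A", "State B"),
  White = c(600, 500),
  Asian = c(50, 80),
  NativePacificIslander = c(10, 5),
  AmericanIndianAlaskan = c(20, 15),
  stringsAsFactors = FALSE
)

# Combine the two native categories into a single Natives column.
native_cols <- c("NativePacificIslander", "AmericanIndianAlaskan")
race$Natives <- rowSums(race[, native_cols])

# Drop the original columns now that they are folded into Natives.
race <- race[, setdiff(names(race), native_cols)]

# Hypothetical GDP table keyed by the same state names.
gdp <- data.frame(
  State = c("State A", "State B"),
  GDP = c(100, 200),
  stringsAsFactors = FALSE
)

# Join the two tables on State.
race_gdp <- merge(race, gdp, by = "State")
```

The resulting data frame is the kind of object that can be handed to `DT::datatable()` to produce the interactive table.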

Outside R Packages Used

The caret package - Contains functions to streamline the model training process for complex regression and classification problems. https://cran.r-project.org/web/packages/caret/vignettes/caret.html

The reshape package - Flexibly restructures and aggregates data. https://cran.r-project.org/web/packages/reshape/reshape.pdf

The DT package - Renders R data objects as HTML tables using the JavaScript library ‘DataTables’. https://cran.r-project.org/web/packages/DT/index.html
