
Data Mining - Research on US Electricity Grid

Background Info

I took an elective math course at Rose-Hulman during the winter quarter of the 2017-18 school year under the instruction of Dr. Yosi Shibberru.

In the course, we learned about statistical analysis methods applied to data sets with the goal of extracting value from the data, and half of the time during the final 5 weeks of the course was spent researching a topic of our choice.

This page highlights some of the findings from my team of 3 as we researched power consumption on the US electricity grid, hoping to find opportunities for optimization or places to insert renewables.

Project Overview

I spent the majority of my time on the project analyzing data on current fossil fuel consumption (coal, oil, etc.) throughout the United States.

The biggest resource I used throughout the project was the API and information available through the United States Energy Information Administration (EIA).

We used Python Jupyter notebooks for much of the actual data mining, along with other Python graphing and statistics packages such as NumPy, scikit-learn, and Plotly.

By the end, I had learned a great deal about which fossil fuel resources come from which geographical regions, as well as how their consumption is concentrated across the United States, all from the data obtained through the US EIA's government API.

Skills Gained

Some of the most valuable skills gained from this project came from increasing my comfort with programming. Specifically, I learned to work with APIs (Application Programming Interfaces) to extract data from a database, merge that data into a single table, and then use many different visualization techniques and statistical analysis methods to try to extract some kind of value from the data and tell a cool story.

Much of my analysis was performed using the Python module "pandas". I would recommend it as a great tool for data analytics and visualization.
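As a rough illustration of the "merge that data into a single table" step, here is a minimal pandas sketch. It is not the project's actual code: the state codes, column names, numbers, and the `merge_per_capita` helper are all made-up placeholders, not EIA data.

```python
import pandas as pd

# Hypothetical placeholder values, NOT real EIA or census figures.
coal = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "coal_tons": [100.0, 250.0, 40.0],
})
population = pd.DataFrame({
    "state": ["AA", "BB", "DD"],
    "population": [10, 50, 5],
})


def merge_per_capita(consumption, people, value_col):
    """Join two tables on state and add a per-capita column."""
    # An inner join keeps only states that appear in both tables.
    merged = consumption.merge(people, on="state", how="inner")
    merged[value_col + "_per_capita"] = merged[value_col] / merged["population"]
    return merged


table = merge_per_capita(coal, population, "coal_tons")
print(table)
```

Using an inner join means a state missing from either table is silently dropped, so it is worth checking the row count after the merge.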

I gained experience in working on remote machines and in a Linux terminal environment. Analyzing some of the data required machines with more RAM than any of our laptops could provide. 

Lessons Learned

I think the biggest lesson learned was how valuable well-kept databases can be for getting a really clear and accurate picture of what is going on, without any speculation.

We weren't able to find any groundbreaking loophole in the current electricity grid from the data available, but I learned a heck of a lot about which resources are used where and why, and the path that resources like coal and natural gas take.

I also did analysis on the most efficient way to transport the energy found in coal from the mine to a consumer. 

Fun Facts

I don't know how to outline everything that I learned from the project. It had such a wide scope, and I learned so many things, that it's hard to tie them all together into a concise story or summary. But here is a sample of some of the fun facts I learned during the data-mining process.

Indiana, the state where I was born and go to school, consumed the second-most coal by tonnage, behind only Texas (the second most populous state).

Hawaii uses petroleum liquids for fuel. In a plot of per capita consumption across the entire US, Hawaii uses so much oil that on a linearly scaled heat map every other state appears cold.
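The washed-out heat map effect is easy to reproduce with a few numbers. The sketch below uses made-up placeholder values (not EIA data) with one extreme outlier, and compares a linear color scale against a log scale. The log scale is one common remedy, not something we necessarily did in the project.

```python
import numpy as np

# Hypothetical per-capita values: three similar states and one extreme outlier.
values = np.array([1.0, 2.0, 3.0, 1000.0])


def normalize(x):
    """Rescale values to the 0-1 range that a color scale maps onto."""
    return (x - x.min()) / (x.max() - x.min())


linear = normalize(values)           # the outlier squeezes the rest near 0
logged = normalize(np.log10(values))  # log scale spreads the small values out

print("linear:", np.round(linear, 3))
print("log:   ", np.round(logged, 3))
```

On the linear scale the three ordinary values all land within a fraction of a percent of the bottom of the color range, which is why everything except the outlier looks cold.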

Wyoming and North Dakota are both resource hubs for coal. They export a lot of coal to the rest of the United States. During the data analysis, when plotting the USA on a heat map of coal consumption per capita, these two states stuck out like red flags. It turns out they burn some of the coal close to the mines in each state and export not only coal but also nearly 60% of their electricity to neighboring states.

Resources Used

Source code available upon request. Shoot me a text or email at hillecn@rose-hulman.edu or (812) 893-1164.
