Building A Natural Disaster Alert System Using Social Media Data

Brianna Lytle
Dec 16, 2019
Southern California Wildfires

Project members: De’Varus May, Sibel Tanoglu, Brianna Lytle
Project Repo: https://github.com/mayjordata/Project-TeamAlert

Introduction:

California Fires October 2019

My group members and I were given the task by New Light Technologies to create a rapid alert and notification system that would notify users about a natural disaster.

Traditional alert systems rely on official sources such as the USGS. Our group aimed to build a web tool that uses social media activity to alert a user when an event first occurs.

When this project was assigned, California was dealing with a wildfire epidemic. We felt it was essential to focus on California wildfires.

Goal: Use social media data to build an alert system notifying residents about wildfire-related emergencies.

Methods:

Step 1: Data Acquisition

Tweepy, a Python client for the Twitter API, let us collect tweets posted within the previous 7 days. About five significant wildfires were burning across California while we worked on this project. We filtered the search by a specific term (e.g. “maria fire”), a set of coordinates, and a radius in miles around those coordinates.
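As a rough sketch of the filtered search described above, the helper below builds the search term and the `geocode` string (`latitude,longitude,radius`) that Twitter's standard search accepts. The helper name, the coordinates, and the 25-mile radius are illustrative assumptions rather than the values used in the project. The commented-out call assumes Tweepy 4.x, where the method is `API.search_tweets` (it was `API.search` in 3.x).

```python
def build_search_params(term, lat, lon, radius_miles, count=100):
    """Return keyword arguments for a geo-filtered tweet search."""
    # The geocode filter is written as "lat,lon,radius" with a "mi" or "km" unit.
    geocode = f"{lat},{lon},{radius_miles}mi"
    # Wrapping the term in double quotes asks the search for the exact phrase.
    return {"q": f'"{term}"', "geocode": geocode, "count": count}


# Hypothetical example: tweets mentioning "maria fire" within 25 miles of a
# point in Ventura County (coordinates are approximate, for illustration only).
params = build_search_params("maria fire", 34.30, -119.06, 25)
print(params)

# With authenticated Tweepy credentials, the call would look roughly like:
#   api = tweepy.API(auth)
#   tweets = api.search_tweets(**params)
```

Keeping the parameters in one place makes it easy to repeat the same search for each fire by changing only the term and the coordinates.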

GetOldTweets3 is a Python library for retrieving older tweets, beyond the 7-day window. We used it to collect tweets that (1) added to the non-emergency tweet class and (2) captured the language used when fires were not occurring in California. This helps our model learn the language, terms, and sentiment that distinguish whether a tweet is fire-related or not.

The final dataset contained 24,410 tweets. The target class (emergency-related tweets) made up 23.3% of our data (5,691 tweets). Our non-target class (unrelated tweets) made up 76.7% of our data (18,719 tweets).
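The class split can be checked directly from the tweet counts. The snippet below recomputes the percentages and also derives the majority-class baseline, which is not a figure reported by the project but a common sanity check for imbalanced data.

```python
emergency = 5691   # emergency-related tweets (target class)
unrelated = 18719  # unrelated tweets (non-target class)
total = emergency + unrelated

emergency_pct = round(100 * emergency / total, 1)
unrelated_pct = round(100 * unrelated / total, 1)
print(total, emergency_pct, unrelated_pct)

# A model that always predicts "unrelated" would already reach this accuracy,
# so accuracy alone is a weak yardstick on this dataset.
majority_baseline = max(emergency, unrelated) / total
print(round(majority_baseline, 3))
```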

Step 2: EDA — Significant Findings

Tweet Structure: During our EDA process, we found that fire emergency-related tweets tended to contain more words. They were also more likely to mention at least one other Twitter account.
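The two structural features mentioned above can be computed with a few lines of Python. This is a minimal sketch, assuming whitespace-separated words and `@handle` style mentions. The function name and the sample tweets are made up for illustration.

```python
import re

# A mention is an "@" followed by letters, digits, or underscores.
MENTION_PATTERN = re.compile(r"@\w+")


def tweet_structure(text):
    """Return the word count and the number of account mentions in a tweet."""
    return {
        "word_count": len(text.split()),
        "mention_count": len(MENTION_PATTERN.findall(text)),
    }


samples = [
    "Evacuation order issued near the 405 freeway, follow @LAFD for updates",
    "this mixtape is fire",
]
for sample in samples:
    print(tweet_structure(sample))
```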

Most Common Quad-grams in Tweet Data.
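A quad-gram is a run of four consecutive words. The sketch below shows one way to count them, assuming simple lowercasing and whitespace tokenization. The helper names and the sample texts are illustrative and are not taken from the project data.

```python
from collections import Counter


def ngrams(tokens, n=4):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n=4, k=3):
    """Count n-grams across all texts and return the k most common."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(k)


texts = [
    "red flag warning in effect tonight",
    "red flag warning in effect for los angeles county",
    "stay safe everyone",  # too short to contain a quad-gram
]
print(top_ngrams(texts, k=2))
```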

NLP:

While investigating the text, we found a lot of “words” made up of seemingly random numbers and letters grouped together. When tweets are converted to text, some characters (e.g. emojis) are written out as Unicode hex codes. Most of these codes were accounted for in the stop words when we ran our final model.
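We handled these codes through the stop word list. An alternative, shown here only as a sketch, is to strip them with a regular expression before vectorizing. The pattern assumes the debris appears as literal escape sequences such as `\xf0` or `\U0001f525`, and the actual form in the scraped data may differ.

```python
import re

# Assumed shapes of the debris: literal escape sequences such as
# "\xf0\x9f\x94\xa5" or "\U0001f525" left behind when emoji are written as text.
ESCAPE_PATTERN = re.compile(
    r"\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}"
)


def strip_escapes(text):
    """Replace escape sequences with spaces, then collapse repeated whitespace."""
    cleaned = ESCAPE_PATTERN.sub(" ", text)
    return " ".join(cleaned.split())


# Raw string, so the backslashes are literal characters as they would be in a file.
raw = r"stay safe \xf0\x9f\x94\xa5 evacuation order issued \U0001f525"
print(strip_escapes(raw))
```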

For emergency-related tweets, the most common terms were related to weather patterns, warning language (“red flag”, “evacuation”), and local activities. The fire-related tweets in the Los Angeles area contained a lot of language relating to work, the “405 freeway”, and “Santa Ana Winds”.

LeBron James’ tweet made a huge impact on our data

Outliers in NLP: In Los Angeles, many celebrities were affected by the fires. LeBron James made a big impact on our model because he and his family had to evacuate their home and spent the night seeking shelter. John Cena also donated $500,000 to support the firefighters out in the field. Both athletes affected our data because other Twitter accounts, such as the LA Times and Barstool Sports, picked up these stories and turned them into trending topics. Because of this, we had to add their names and accounts to our stop words.
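Extending a stop word list with celebrity names can be sketched as follows. The base list here is a tiny stand-in, and the custom entries (including the account handles) are illustrative assumptions rather than the exact list used in the project.

```python
import re

# Tiny stand-in for a full English stop word list.
BASE_STOP_WORDS = {"the", "a", "and", "to", "of", "in", "his", "from"}

# Hypothetical custom additions: celebrity names and account handles.
CUSTOM_STOP_WORDS = {"lebron", "james", "kingjames", "john", "cena", "johncena"}

STOP_WORDS = BASE_STOP_WORDS | CUSTOM_STOP_WORDS


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(text, stop_words):
    """Return the tokens of text that are not in the stop word set."""
    return [token for token in tokenize(text) if token not in stop_words]


# Made-up tweet for illustration.
tweet = "LeBron James and his family evacuate from the wildfire @KingJames"
print(remove_stop_words(tweet, STOP_WORDS))
```

If the vectorizer is scikit-learn's `TfidfVectorizer`, the combined set can be passed as a list through its `stop_words` parameter.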

Step 3: Modeling

TF-IDF and Random Forest visualization
Final Pipeline Model for Alert.ly Web Applications

Our group agreed to run several grid searches to find the best model we could. We also chose TF-IDF as the text transformer because it weights each word in a document by how often it appears there relative to how common it is across all the other documents.
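To make the weighting concrete, here is a from-scratch sketch of the textbook TF-IDF formula: term frequency times the log of the inverse document frequency. It is for illustration only. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant and normalize each document vector, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up documents for illustration.
docs = [
    "fire near the freeway",
    "evacuation near the school",
    "the mixtape is fire",
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A word that appears in every document, such as "the", gets a weight of zero, while a word unique to one document, such as "freeway", gets the highest weight in that document.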

Final Model: TF-IDF and Random Forest, using only text data.
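A pipeline of this shape can be sketched with scikit-learn, assuming that is the library behind the model. The training texts, the parameter grid, and the choice of recall as the scoring metric are all illustrative assumptions, not the project's actual settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up training data: 1 = emergency-related, 0 = unrelated.
texts = [
    "wildfire spreading fast evacuation order issued",
    "red flag warning high winds fire danger",
    "brush fire near the freeway smoke everywhere",
    "mandatory evacuation as fire jumps the ridge",
    "this new album is fire lol",
    "great day at the beach with friends",
    "traffic on the freeway is terrible today",
    "cannot wait for the game tonight",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Text goes through TF-IDF first, then the resulting matrix feeds the forest.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=42)),
])

# Hypothetical grid: step name + "__" + parameter name selects what to tune.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "rf__n_estimators": [25, 50],
}

# Scoring on recall is an assumption that favors catching emergency tweets.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="recall")
search.fit(texts, labels)

print(search.best_params_)
print(search.predict(["wildfire spreading near homes"]))
```

Wrapping the vectorizer and the classifier in one pipeline means the grid search refits TF-IDF inside each cross-validation fold, so vocabulary from the held-out fold does not leak into training.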

Flask App

We made two versions of our web application. Alert.ly 1.0 allows a user to input a tweet and the application will display if the tweet is emergency-related or not. Alert.ly 2.0 displays a time series analysis of when there is an emergency based on how many emergency-related tweets were tweeted within the previous 2 hours.

ALERT.LY 1.0

We created a tool to test our model’s accuracy more efficiently. Alert.ly 1.0 is a responsive web application that uses a TF-IDF vectorizer and a Random Forest model. It is built with the Flask framework and hosted on the PythonAnywhere platform. It takes the text of a tweet as input and displays the result of the classification.
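The general shape of such an app can be sketched in Flask as below. This is not the Alert.ly code: the `/classify` route, the JSON request format, and the keyword-based `classify_tweet` stand-in are all assumptions made so the example runs without a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_tweet(text):
    """Hypothetical stand-in for the trained TF-IDF and Random Forest pipeline."""
    return "emergency" if "wildfire" in text.lower() else "non-emergency"


@app.route("/classify", methods=["POST"])
def classify():
    # Read the tweet text from a JSON body such as {"text": "..."}.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    if not text.strip():
        return jsonify(error="text is required"), 400
    return jsonify(text=text, label=classify_tweet(text))


if __name__ == "__main__":
    # Local development server only.
    app.run(debug=True)
```

In a real deployment, `classify_tweet` would call `predict` on the fitted pipeline loaded once at startup, rather than matching a keyword.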

In the video below, you can see different uses of the word “fire” and how the model classifies each tweet. If the tweet is fire-related, the bird turns red.

Tweet 1 (Emergency): Used the term “wildfire” instead of “fire”
Tweet 2 (Non-emergency): Same as Tweet 1, with “lol lmao” added to change the sentiment
Tweet 3 (Emergency): News source reporting a local fire
Tweet 4 (Non-emergency): Used “fire” as a slang term

Alert.ly 1.0 Demo

ALERT.LY 2.0

Time series analysis
Time Series Analysis — Classifying Spikes of Emergency Related Tweets

We used the moving average of our time series data to capture short-term (hourly) fluctuations. For each hour, we calculated the difference between the number of emergency-related tweets in that hour and the average over the previous 2 hours. We created a second application that iterates through the time series data and displays an alert when there is a significant increase in the number of relevant tweets. It evaluates the trend against a selected threshold value. As soon as it captures a spike, it raises the alert by turning the Twitter logo red and showing the time of the emergency.
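The spike rule described above can be sketched in a few lines. The function name, the threshold of 10, and the hourly counts are illustrative assumptions. The 2-hour window follows the description.

```python
def detect_spikes(hourly_counts, window=2, threshold=10):
    """Return (hour_index, increase) for each hour that exceeds the threshold.

    The increase is the hour's count minus the mean of the previous
    `window` hours.
    """
    alerts = []
    for hour in range(window, len(hourly_counts)):
        baseline = sum(hourly_counts[hour - window:hour]) / window
        increase = hourly_counts[hour] - baseline
        if increase > threshold:
            alerts.append((hour, increase))
    return alerts


# Made-up hourly counts of emergency-related tweets.
counts = [3, 4, 5, 30, 20, 6, 5]
print(detect_spikes(counts))
```

With these counts only hour 3 triggers an alert: it jumps from a 2-hour average of 4.5 to 30, while hour 4 is measured against an average that already includes the spike.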

Alert.ly 2.0 Demo

Conclusion / Next Steps

We were able to create a model that classifies whether a tweet is fire-related or not. We plan to improve the model further by:

  • Gathering more data
  • Investigating additional features that could be used in our model
  • Incorporating data from other natural disasters to make the application more broadly useful
  • Further reducing Type II errors (emergency tweets classified as non-emergency)
  • Creating a model to detect anomalies in emergency-related tweets
  • Integrating the live streaming Twitter API with the Alert.ly web app for real-time alerts
  • Building a final mobile application from Alert.ly
