Date of Award

Spring 2019

Document Type

Restricted Thesis

Terms of Use

© 2019 Byron Biney. All rights reserved. Access to this work is restricted to users within the Swarthmore College network and may only be used for non-commercial, educational, and research purposes. Sharing with users outside of the Swarthmore College network is expressly prohibited. For all other uses, including reproduction and distribution, please contact the copyright holder.

Degree Name

Bachelor of Arts

Department

Educational Studies Department, Computer Science Department

First Advisor

Edwin Mayorga

Second Advisor

Richard H. Wicentowski

Abstract

Social media data mining is a relatively new field, but it is increasingly applied to research on activism, healthcare, and other fields involving public service. Social media platforms provide large corpora of informal, unstructured text written by user accounts, creating unparalleled opportunities for data collection and analysis in all of these areas. Twitter is commonly used for data mining because of its popularity and because its privacy policy is open to research use. The question remains whether social media data mining has applications to educational policy and practice. Twitter's short character limit (originally 140 characters, raised to 280 in late 2017) encourages the use of emoticons, abbreviations, slang, and other informal terms typical of unstructured text. These features of social media data require additional preprocessing before some of the most well-known text-mining algorithms can be applied. STEM education is of particular interest because social media are themselves digital services, and because a large body of research is devoted to measuring public outreach in STEM amid patterns of underrepresentation in the field. This work reports the results of applying LDA topic modeling to a dataset of 198,030 tweets containing the hashtag "#STEM", all posted during a three-month period from March 2018 to June 2018. LDA (Latent Dirichlet Allocation) is an algorithm that searches a collection of text documents for latent topics by identifying words that are commonly used together. We applied LDA to this dataset to identify the topics discussed about STEM in social media spaces. We reflect on what our LDA results reveal about public perception of STEM fields, and on whether an unsupervised text-mining approach yields coherent results for disciplines such as education.
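The abstract describes the pipeline only in prose: clean informal tweet text, then run LDA to find groups of words that co-occur. The following is a minimal sketch of those two steps in plain Python, not the thesis's actual code; the record does not say which implementation or settings were used. The preprocessing rules, the stopword list, the hyperparameters (`alpha`, `beta`, `num_topics`, `iterations`) and the sample tweets are all hypothetical. LDA is fitted here by collapsed Gibbs sampling.

```python
import random
import re

# Hypothetical stopword list for illustration. "stem" is included on the
# assumption that a hashtag shared by every tweet carries no topic signal.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "is", "on",
             "with", "our", "we", "at", "this", "now", "stem"}

def preprocess(tweet):
    """Normalize one tweet into tokens (illustrative rules, not the thesis's)."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = text.replace("#", " ")               # keep hashtag words, drop '#'
    tokens = re.findall(r"[a-z]+", text)        # letters only: drops emoticons, digits, punctuation
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]         # n[d][k]: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]    # n[k][v]: times word v is assigned to topic k
    topic_total = [0] * K                       # n[k]: all tokens assigned to topic k
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n[d][t] + alpha) * (n[t][v] + beta) / (n[t] + V * beta)
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic

def top_words(vocab, topic_word, n=5):
    """The n most frequently assigned words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]

# Invented example tweets, not drawn from the #STEM dataset.
tweets = [
    "Our #STEM robotics club built a robot for the regional competition!",
    "Coding workshop for girls in #STEM this weekend https://example.com",
    "New scholarship for #STEM students, apply now @stemclub",
    "Scholarship deadline for #STEM college students is Friday",
    "Robotics and coding camps help kids love #STEM :)",
]
docs = [preprocess(t) for t in tweets]
vocab, topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

Collapsed Gibbs sampling is used here only because it fits in a few dozen dependency-free lines. Libraries such as gensim and scikit-learn fit LDA with variational inference instead, which scales better to a corpus of roughly 200,000 tweets.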

Recommended Citation

Biney, Byron, '19, "LDA Clustering on the #STEM Dataset: Applications of Unsupervised Text Mining to Education" (2019). Senior Theses, Projects, and Awards. 369.
https://works.swarthmore.edu/theses/369

Senior Theses, Projects, and Awards

LDA Clustering on the #STEM Dataset: Applications of Unsupervised Text Mining to Education