Date of Award
Spring 2019
Document Type
Restricted Thesis
Terms of Use
© 2019 Byron Biney. All rights reserved. Access to this work is restricted to users within the Swarthmore College network and may only be used for non-commercial, educational, and research purposes. Sharing with users outside of the Swarthmore College network is expressly prohibited. For all other uses, including reproduction and distribution, please contact the copyright holder.
Degree Name
Bachelor of Arts
Department
Educational Studies Department, Computer Science Department
First Advisor
Edwin Mayorga
Second Advisor
Richard H. Wicentowski
Abstract
Social media data mining is a relatively new field, but it is increasingly applied to research on activism, healthcare, and several other fields involving public service. Social media provides large corpora of informal, unstructured text written by users, creating unparalleled opportunities for data collection and analysis in all of these areas. Twitter is commonly used for data mining because of its popularity and because its privacy policy is open to research. The question remains whether social media data mining has applications to educational policy and practice. Twitter posts have a 140-character limit, which increases the use of emoticons, abbreviations, slang, and other terms typical of unstructured text data. These features of social media data require additional preprocessing and management before some of the most well-known text-mining algorithms can be applied for data analysis. STEM education is of interest given the nature of social media as digital services, and also because of the large amount of research devoted to measuring public outreach in STEM amid patterns of underrepresentation in the field. This work reports the results of applying LDA topic modeling to a dataset of 198,030 tweets containing the hashtag "#STEM", each posted during a 3-month period from March 2018 to June 2018. LDA is an algorithm that searches a set of text documents for latent topics by identifying words that are commonly used together. LDA was applied to this dataset to identify the topics discussed about STEM in social media spaces. We reflect on how our LDA results inform us about the public perception of STEM fields, and on whether an unsupervised text mining approach provides coherent results for disciplines such as Education.
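The preprocessing the abstract mentions can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the regular expressions, the tiny stopword list, and the function names `preprocess` and `bag_of_words` are all assumptions chosen for illustration. It only shows the kind of cleanup (removing URLs and user mentions, lowercasing, keeping hashtags) that tweets typically need before a topic model can consume them as word counts.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; a real pipeline may differ.
URL_RE = re.compile(r"https?://\S+")          # links carry little topical signal
MENTION_RE = re.compile(r"@\w+")              # user mentions such as @someone
TOKEN_RE = re.compile(r"#?[a-z][a-z0-9_']*")  # words, optionally hashtag-prefixed

# Tiny stopword list (an assumption); real work would use a much larger one.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "for", "on", "rt"}


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and mentions, and return content tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    # Discard stopwords and single-character leftovers.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def bag_of_words(tokens):
    """Count token occurrences; topic models typically take such counts as input."""
    return Counter(tokens)


if __name__ == "__main__":
    example = "RT @nasa: Girls in #STEM rock!! https://t.co/abc :)"
    print(preprocess(example))
    print(bag_of_words(preprocess(example)))
```

Keeping the leading `#` on hashtags is a design choice in this sketch, so that "#stem" and "stem" stay distinct tokens; whether that is desirable depends on the analysis.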
Recommended Citation
Biney, Byron, '19, "LDA Clustering on the #STEM Dataset: Applications of Unsupervised Text Mining to Education" (2019). Senior Theses, Projects, and Awards. 369.
https://works.swarthmore.edu/theses/369