2018 Midwest Big Data Summer School Agenda - Big Data Applications Track

Thursday, May 17 - Location: Sun Room
8:00 - 8:45am
Location: Sun Room
8:45 - 10:15am
Location: Sun Room
An Introduction to Computational Propaganda Research
Dr. Shawn Dorius

Abstract: As an ever-larger share of social life occurs in online environments, malicious actors (e.g., individuals, organizations, and states) use novel technologies to more efficiently engage with, and potentially influence, the public. While efforts to influence the public via coordinated disinformation campaigns are not new, there is concern that the application of computational methods to conventional propaganda campaigns represents a new kind of threat to social institutions, civic engagement, social cohesion, and public well-being. The volume and granularity of the data exhaust flowing from social media engagement have made it possible to identify actors directing such campaigns (and to do so faster and more accurately than previously possible) and to establish cause-and-effect relationships between disinformation and public opinions, sentiments, and behaviors. Likes, tweets, retweets, upvotes, comments, page edits, and reviews are just a few of the areas in which researchers can observe the effects of disinformation campaigns on the public. In this workshop, you will learn how to think like an investigative social scientist, look for patterns and anomalies, identify distortionary influencers, and evaluate public response to disinformation exposure using observational and experimental methods. The workshop will feature sample R code, training datasets, and a little hands-on work with some of the models and methods for the emerging field of computational propaganda.

10:15 - 10:45am
Location: Sun Room
Break - refreshments provided
10:45 - 12:15pm
Location: Sun Room
Tools for Big Data-Driven Research: Data-Squashing using Numerical Moment Matching and Data-Curing using Fractional Hot Deck Imputation
Dr. In-Ho Cho

Abstract: With the rise of advanced computational power and large-scale databases in science and engineering, there is strong demand for novel computational and numerical tools that can facilitate big data-oriented research. This talk has two parts. First, I will present a numerical moment matching (NMM) technique and an accompanying 64-bit Windows program that can be used directly to represent (or squash) large-scale irregular data populations. This technique will help researchers quantify uncertainty and foresee how Big Data will behave (structurally, economically, biologically, etc.) in conjunction with advanced simulation/prediction models. Next, I will talk about a general data-curing tool, an R package named FHDI, which implements the statistical technique of fractional hot deck imputation. FHDI imposes few restrictions regarding prior knowledge, statistical assumptions, the number of variables, or the complexity of missing patterns in the original big data. Hands-on practical examples and publicly available programs will be presented in this tutorial-style talk.
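
To give a feel for the idea behind fractional hot deck imputation, here is a minimal Python sketch. It is not the FHDI R package's algorithm: the function names, the toy records, and the rule of taking the first few observed values in a cell as donors are all assumptions made for illustration. It also assumes each record already carries an imputation cell label. The point it shows is that a missing value is replaced by several observed donor values, each carrying a fractional weight, rather than by a single guess.

```python
from collections import defaultdict


def fractional_hot_deck(records, max_donors=2):
    """Impute missing values with several weighted donors per recipient.

    records: list of (cell, value) pairs; value is None when missing.
    Returns (cell, value, weight) rows. Observed records keep weight 1.0;
    each missing record expands into up to max_donors rows whose
    fractional weights sum to 1.0.
    """
    # Collect the observed values available as donors in each cell.
    donors_by_cell = defaultdict(list)
    for cell, value in records:
        if value is not None:
            donors_by_cell[cell].append(value)

    imputed = []
    for cell, value in records:
        if value is not None:
            imputed.append((cell, value, 1.0))
            continue
        # Hypothetical donor rule: the first max_donors observed values.
        donors = donors_by_cell[cell][:max_donors]
        if not donors:
            raise ValueError(f"no observed donor in cell {cell!r}")
        weight = 1.0 / len(donors)
        imputed.extend((cell, donor, weight) for donor in donors)
    return imputed


def weighted_mean(rows):
    """Mean of the imputed data set, respecting fractional weights."""
    total_weight = sum(weight for _, _, weight in rows)
    return sum(value * weight for _, value, weight in rows) / total_weight


# Toy data: two cells, one missing value in each.
records = [("A", 10.0), ("A", 14.0), ("A", None), ("B", 3.0), ("B", None)]
imputed_rows = fractional_hot_deck(records)
overall_mean = weighted_mean(imputed_rows)
```

Because the weights for each recipient sum to one, the imputed data set still represents five units even though it holds six rows.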

About Dr. In-Ho Cho: Dr. Cho is an assistant professor in Iowa State University's Department of Civil, Construction and Environmental Engineering. He joined Iowa State University in 2014. His research focuses on novel big data approaches to traditional engineering fields, including earthquake engineering and structural engineering. Advanced parallel computing technology and cutting-edge multi-scale finite element analyses are his primary research areas. Dr. Cho is actively applying theories of structures and computational engineering to the areas of micro soft robotics.

12:15 - 1:30pm
Location: Sun Room
1:30 - 3:00pm
Location: Sun Room
Natural Language Processing with Python
Dr. Sowmya Vajjala

Abstract: Unstructured text data is everywhere, in the form of web pages, social media posts, news articles, blogs, and many other documents. Natural Language Processing (NLP) deals with developing methods to solve various language processing problems involving these large quantities of text data. Some examples of real-world problems where NLP is useful are email spam classification, machine translation, question answering/search, automatic content tagging, and fake news identification. In this tutorial, we will learn about working with textual data using state-of-the-art methods in NLP and Python. I will demonstrate the use of the nltk and gensim libraries for processing text, and the scikit-learn and keras libraries for building predictive models with the processed textual data. I will take text classification as my primary use case, and demonstrate how these tools enable us to build text classification models and use them in our own applications.
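
As a rough illustration of the text classification use case, the sketch below trains a tiny spam/ham classifier. It is not taken from the tutorial materials. To stay dependency-free it uses a hand-written multinomial Naive Bayes with only the Python standard library, in place of the nltk, gensim, scikit-learn, and keras libraries named in the abstract. The function names and the six training sentences are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labeled_docs):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set: (text, label) pairs.
TRAINING_DOCS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free reward today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train_naive_bayes(TRAINING_DOCS)
prediction = classify("free prize today", model)
```

The same pipeline shape (tokenize, turn text into counts, fit a model, predict) is what library-based workflows automate at scale.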

NLP Tutorial

3:00 - 3:30pm
Location: Sun Room
3:30 - 5:00pm
Location: Sun Room
Analytics in Transportation
Dr. Anuj Sharma

About Dr. Anuj Sharma: Dr. Anuj Sharma is an associate professor in the Civil, Construction and Environmental Engineering Department at Iowa State University. He also holds a joint appointment as a research scientist with the Institute of Transportation. In these positions, he teaches transportation engineering courses to undergraduate and graduate civil engineering students, conducts research in the transportation operations area, and participates in numerous professional organizations. Dr. Sharma’s research has been recognized by many funding agencies. Dr. Sharma is currently leading research at the REACTOR (REaltime AnalytiCs of TranspORtation data) laboratory. The High Performance Cluster (HPC) assembled for the lab is able to ingest multiple streams of real-time data from multiple sources. Current efforts are focused on ingestion, real-time analytics, batch processing, visualization/front-end development, and archiving of data streams. Some of the tools developed under this effort can be found at the lab's website. REACTOR-HPC and a memorandum of understanding with Iowa DOT have placed Iowa State University among the very few facilities in the US transportation arena using big data analytics.