Midwest Big Data Summer School Agenda

Monday, June 20 - Campanile Room
8:00 - 8:30am
Campanile Room
Registration and Light refreshments
8:30 - 8:45am
Campanile Room
Opening
Dr. Hridesh Rajan and Dr. Pavan Aduri
8:45 - 9:15am
Campanile Room
Welcome, and why are we here?
Dean Schmittmann

About Dr. Schmittmann: Beate Schmittmann has served as dean of the College of Liberal Arts and Sciences at Iowa State University since April 2, 2012. She is leading several key Liberal Arts and Sciences initiatives to promote research, student success, and college advancement. Most notable among these are the college's Signature Themes, which build on existing strengths to grow and sustain an internationally competitive profile in selected research areas. Strategic faculty hiring, seed grant funding, and enhanced support for proposal writing form an integral part of these efforts. She also furthers LAS' reputation as a student-centered college through an integrated approach to student recruitment, academic advising, and career services, a focus on high quality teaching and innovative pedagogy, especially in the STEM fields, and efforts to support the success of traditionally underrepresented students. She has built a strong external relations effort, in which development, alumni relations, and strategic communications support and enhance one another. Schmittmann is a Fellow of the American Association for the Advancement of Science, a Fellow of the American Physical Society and a winner of the organization's 2010 Jesse W. Beams Award. Her research interests focus on statistical and biological physics. She has authored or co-authored more than 100 peer-reviewed articles and one book. Schmittmann earned a diploma (M.S.) in physics from RWTH Aachen University in her native Germany (1981), and a Ph.D. in physics from the University of Edinburgh, Scotland (1984). Prior to joining ISU, she had been a member of the physics faculty at Virginia Tech, Blacksburg, since 1991 and had served as the physics department chair since 2006.

9:15 - 10:00am
Campanile Room
Perspectives on Data Driven Discovery
Dr. Sarah Nusser

Data driven science represents a paradigm shift in how we engage with discovery. It affects organizations from all sectors, and is having profound impacts on nearly every field of endeavor and how we think about research collaboration. In this presentation, we will discuss features of data driven discovery and the role data science plays in enabling this transformation, how the NSF Big Data Innovation Hub program is helping to speed the exchange of knowledge across sectors, and what lies ahead with the promise of open data.

About Dr. Nusser: Dr. Sarah Nusser is Vice President for Research and a Professor in the Department of Statistics at Iowa State University. As VPR, she works with faculty to spearhead the Data Driven Science Initiative at ISU. She is a co-PI for the Midwest Big Data Hub, which seeks to accelerate partnerships that foster the continued development of data driven discovery. Prior to joining the Office of the Vice President for Research in 2014, Dr. Nusser served as the director of the Center for Survey Statistics and Methodology at Iowa State University for 15 years, where she conducted research in statistical sampling and measurement error models for both land-based and human population surveys.

10:00 - 10:30am
Campanile Room
Big Data - the education and engagement challenge
Dr. Wolfgang Kliemann

Across the US, science seems to be in a phase of having to justify its approach, its results, and its value for decision making in community, state and federal contexts. Communicating research and science is one of the priority topics at many meetings of university associations and of science communities. Big Data has the potential to widen the gap between scientific information and its naive, intuitive perception, simply because Big Data is as invisible and untouchable as subatomic particles, yet it can serve as the basis for decision making in industries and governments at all levels.

In this presentation we will explore some approaches for the scientific community to use data, and Big Data, as a tool for interaction with the public of all age groups to foster better understanding of research and science, their potential and their limitations.

About Dr. Kliemann: Professor of mathematics Wolfgang Kliemann joined the Office of the Vice President for Research July 1, 2014, as an associate vice president for research. Wolfgang came to Iowa State in 1983. He served as associate dean for research in the College of Liberal Arts and Sciences from 2000 to 2001, as associate vice provost for research from 2001 to 2005, and as chair of the Department of Mathematics from 2008 through 2013.

10:30 - 11:00am
Campanile Room
Break - refreshments provided
11:00 - 12:30pm
Campanile Room
Big Data in Context Panel
Dr. Arne Hallam (moderator), Dr. Xiaoqiu Huang,
Dr. Kevin Kane, Dr. Eric Rozier, and Dr. Jason Wille
12:30 - 2:00pm
Campanile Room
Lunch - Box lunches provided
2:10 - 3:10pm
Campanile Room
Big Data in Context Panel (continued)
Dr. Joe Colletti, Dr. Gary Mirka (moderator),
Dr. Sree Nilakanta and Dr. Arun Somani
3:10 - 3:40pm
Campanile Room
Break - refreshments provided
3:40 - 5:30pm
Campanile Room
Introduction to Statistics
Dr. Kris De Brabanter

This module will provide summer school participants a gentle introduction to probability and statistics concepts and prepare them for later modules in this summer school.

  • Slides: [PDF]
  • Supporting material: see last page of slides above.

About Dr. De Brabanter: Dr. Kris De Brabanter is an assistant professor in the Department of Statistics at Iowa State University. His research interests are in mathematical statistics, nonparametric regression, analysis of big data sets, machine learning, model selection methods, density estimation, and nonparametric inference.

Tuesday, June 21 - South Ballroom, Campanile Room
8:00 - 8:30am
South Ballroom
Light refreshments
8:30 - 10:30am
South Ballroom
Introduction to Python
Dr. Steve Kautz

This module introduces the audience to the Python programming language and to programming concepts.

  • Slides: [PDF]
  • Supporting material: [Link]

About Dr. Kautz: Dr. Kautz holds an M.S. in computer science and a Ph.D. in mathematics from Cornell University. Prior to joining the teaching faculty at Iowa State he spent 10 years on the faculty of Randolph College of Lynchburg (Virginia) and then 8 years as a senior software engineer for NewMonics (later acquired by Aonix, Inc), the developers of the PERC(tm) virtual machine, a platform for real-time Java. Dr. Kautz's time as an engineer was divided between work on the virtual machine itself, including contributions to several implementations of threads, and consulting services for customers. As part of the latter effort he developed a one-week, hands-on course on concurrent Java that has been presented to teams of developers worldwide over an 8-year period. Dr. Steve Kautz is currently a lecturer of Computer Science at Iowa State University.

10:30 - 11:00am
South Ballroom
Break - refreshments provided
11:00 - 12:30pm
South Ballroom
Introduction to Python (continued)
Dr. Steve Kautz
12:30 - 2:00pm
South Ballroom
Lunch - Box lunches provided
2:10 - 3:10pm
Campanile Room
Introduction to R
Dr. Heike Hofmann

This module introduces R, a widely popular language and environment for statistical computing and graphics. This module is a prerequisite for the visualization module.

  • Slides: [Link]
  • Supporting material: see above.

About Dr. Hofmann: Dr. Heike Hofmann is a professor of Statistics at the Department of Statistics at Iowa State University. Her areas of interest are Data Visualization, Multivariate Categorical Data Analysis, Statistical Computing, Exploratory Data Analysis and Interactive Statistical Graphics.

3:10 - 3:40pm
Campanile Room
Break - refreshments provided
3:40 - 5:30pm
Campanile Room
Introduction to R (continued)
Dr. Heike Hofmann
Wednesday, June 22 - South Ballroom, Campanile Room
8:00 - 8:30am
South Ballroom
Light refreshments
8:30 - 10:30am
South Ballroom
Data Acquisition
Dr. Adisak Sukul

This module will introduce participants to challenges in and solutions to data acquisition. This module will build on the Python module.

  • Slides: [PDF]
  • Supporting material: [Link].

About Dr. Sukul: Dr. Adisak Sukul obtained his Ph.D. in Computer Science from Chulalongkorn University. Following his Ph.D., he was a visiting researcher at Iowa State University, a lecturer in the computer science department at King Mongkut's Institute of Technology Ladkrabang, and assistant director of the Computer Service Center at King Mongkut's Institute of Technology Ladkrabang. He was also EIFL - OA/FOSS Country Coordinator for Thailand, the coordinator for the Open Access and the Free and Open Source Software working groups of EIFL (Electronic Information for Libraries), a global non-profit organization for developing countries. Dr. Sukul has over 14 years of experience in IT project management and system architecture, and has co-founded three software companies in Thailand. Dr. Sukul has also consulted on e-library and institutional repository development projects for various organizations, including the Thailand House of Representatives, the Bangkok Metropolitan Administration, and a number of libraries and universities. Dr. Adisak Sukul is currently a lecturer of Computer Science at Iowa State University.

10:30 - 11:00am
South Ballroom
Break - refreshments provided
11:00 - 12:30pm
South Ballroom
Data Processing
Dr. Adisak Sukul
12:30 - 2:00pm
South Ballroom
Lunch - Box lunches provided
2:10 - 3:10pm
Campanile Room
Management, Access, and Use of Big and Complex Data
Dr. Beth Plale

Part I - Data Pipelines in e-Science: What is a data pipeline? Data rarely show up instantly, ready to use for whatever exploratory purpose a science researcher may have in mind. From creation to use, data undergo numerous steps, some of which are end products in themselves. This session discusses the data lifecycle, data pipelines, e-Science, cyberinfrastructure, Big Oh notation, and data analysis.

  • Video: [Video]
  • Discussion Readings: [Link].

About Dr. Plale: Beth A. Plale is a Full Professor of Informatics and Computing at Indiana University where she directs the Data To Insight Center and serves as Science Director of the Pervasive Technology Institute. Dr. Plale's research interests are in Big Data, long-term preservation and curation of scientific and scholarly data, large-scale data management, metadata and provenance, data trustworthiness and security, and data-driven cyberinfrastructure and cloud computing. Plale is deeply engaged in interdisciplinary research and education in earth and environmental sciences, digital humanities, health, and social sciences. Professor Plale's postdoctoral studies were at Georgia Institute of Technology, and her PhD in computer science is from State University of New York Binghamton. Her deep interest in technology for societal change arises in part from the MBA she earned while spending a handful of years working in Southern California as a software developer. Plale is founder and Co-director of the HathiTrust Research Center which provisions analysis to nearly 14 million digitized books from research libraries, past chair of the Technical Advisory Board (TAB) of the 3,500+ member international Research Data Alliance (RDA), and is vice-chair of RDA/US. She is a Department of Energy (DOE) Early Career Awardee and past Fellow of the Midwest university consortium CIC's Academic Leadership Program.

3:10 - 3:40pm
Campanile Room
Break - refreshments provided
3:40 - 5:30pm
Campanile Room
Management, Access, and Use of Big and Complex Data (continued)
Dr. Beth Plale

Part II - Pipelines in Business: This session introduces the business perspective of data pipelines. It draws inspiration from a 2011 talk by Werner Vogels, "Data Without Limits". Vogels is CTO of Amazon, and in this talk he discusses data pipelines in the context of business computing. He argues that cloud computing is core to a business model "without limits". The pipeline he proposes is: collect | store | organize | analyze | share. Vogels talks about MapReduce extensively during his discussion of analysis.

  • Video: [Video]
  • Discussion Readings: [Link].
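
As a rough illustration of the five pipeline stages named in the session description (collect | store | organize | analyze | share), the sketch below chains five small Python functions. It is not taken from the talk: every function name, the in-memory list standing in for storage, and the sample records are hypothetical and chosen only to show how each stage hands its output to the next.

```python
from collections import Counter

# Hypothetical sample input; not from the talk.
RAW_EVENTS = ["login alice", "view alice", "login bob", "view alice"]


def collect(source):
    """Collect: gather raw records from a source (here, a plain list)."""
    return list(source)


def store(records, storage):
    """Store: persist raw records (an in-memory list stands in for durable storage)."""
    storage.extend(records)
    return storage


def organize(storage):
    """Organize: parse each raw line into an (action, user) pair."""
    return [tuple(line.split()) for line in storage]


def analyze(rows):
    """Analyze: count how many events each user produced."""
    return Counter(user for _action, user in rows)


def share(counts):
    """Share: format the result for readers, most active user first."""
    return [f"{user}: {n}" for user, n in counts.most_common()]


def run_pipeline(source):
    """Run the five stages in order, each consuming the previous stage's output."""
    storage = []
    return share(analyze(organize(store(collect(source), storage))))


if __name__ == "__main__":
    for line in run_pipeline(RAW_EVENTS):
        print(line)
```

In a real deployment each stage would typically be a separate service or job; the function chain here only mirrors the order of the stages.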

Thursday, June 23 - South Ballroom, Campanile Room
8:00 - 8:30am
South Ballroom
Light refreshments
8:30 - 10:30am
South Ballroom
Applied Text Mining
Dr. Drew Zhang

This module will introduce techniques for applied text mining. It will also explore popular tools for text mining such as NLTK and spaCy.

  • Slides: [PDF]
  • Supporting material: see above.

About Dr. Zhang: Zhu ("Drew") Zhang is an associate professor of Information Systems in the College of Business, Iowa State University. He obtained his Ph.D. in Computer and Information Science from University of Michigan. His core expertise is in natural language processing, web search/mining, and applied machine learning.

10:30 - 11:00am
South Ballroom
Break - refreshments provided
11:00 - 12:30pm
South Ballroom
Applied Text Mining (continued)
Dr. Drew Zhang
12:30 - 2:00pm
South Ballroom
Lunch - Box lunches provided
2:10 - 3:10pm
Campanile Room
Big Data Visualization
Dr. Heike Hofmann

This module is designed to help you get started with creating elegant and high quality graphics in R, based on the ggplot2 package. The module will be data centric, with lots of different data sets that illustrate examples of the different techniques used for different problems.

The module will be a mix of instruction and follow-up exercises. You are encouraged to bring your own laptops, with software already loaded.

  • Slides: [Link]
  • Supporting material: See link above

About Dr. Hofmann: Dr. Heike Hofmann is a professor of Statistics at the Department of Statistics at Iowa State University. Her areas of interest are Data Visualization, Multivariate Categorical Data Analysis, Statistical Computing, Exploratory Data Analysis and Interactive Statistical Graphics.

3:10 - 3:40pm
Campanile Room
Break - refreshments provided
3:40 - 5:30pm
Campanile Room
Big Data Visualization (continued)
Dr. Heike Hofmann
Friday, June 24 - South Ballroom
8:00 - 8:30am
South Ballroom
Light refreshments
8:30 - 10:30am
South Ballroom
Machine Learning I: Introduction
Dr. Kris De Brabanter

This module will introduce machine learning concepts and explain their usage via practical examples. The machine learning modules will use the R language.

  • Slides: [PDF]
  • Supporting material: see last page of slides above.

About Dr. De Brabanter: Dr. Kris De Brabanter is an assistant professor in the Department of Statistics at Iowa State University. His research interests are in mathematical statistics, nonparametric regression, analysis of big data sets, machine learning, model selection methods, density estimation, and nonparametric inference.

10:30 - 11:00am
South Ballroom
Break - refreshments provided
11:00 - 12:30pm
South Ballroom
Machine Learning II: basic to advanced methods
Dr. Kris De Brabanter

This module will continue with more advanced machine learning concepts such as support vector machines and k-means clustering.

  • Slides: [PDF]
  • Supporting material: see last page of slides above.
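
For readers who have not met k-means clustering before, here is a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm). It is not from the module's slides, and it is written in Python rather than the R used in the machine learning modules. The function name `kmeans`, the toy data, and the choice to seed centroids with the first k points are assumptions made for illustration and reproducibility.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Assumption for reproducibility: the first k points seed the centroids.
    Practical implementations usually choose random or k-means++ seeds.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups in the plane.
    data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(labels)
    print(centroids)
```

The fixed iteration count keeps the sketch short; a fuller version would stop once the labels no longer change.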

12:30 - 2:00pm
South Ballroom
Lunch - Box lunches provided
2:10 - 3:10pm
South Ballroom
Introduction to Scalable Tools for Big Data
Dr. Robert Dyer

This module will provide a gentle introduction to scalable tools for Big Data Analytics such as Apache Hadoop and Spark. It will also identify key challenges in effectively using such tools and common pitfalls.

  • Slides: [PDF]
  • Supporting material: see slides above.
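
To give a feel for the programming model behind tools such as Hadoop, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain, single-process Python. It does not use any Hadoop or Spark API and is not from the module's slides; the function names and sample documents are hypothetical. In a real cluster, the map and reduce calls would run in parallel on many machines and the shuffle would move data across the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, shuffle, then reduce each key group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample documents.
    print(word_count(["big data", "Big tools"]))
```

Because each map call looks at one document and each reduce call looks at one key, the work can be split across machines without the pieces needing to coordinate, which is the property that makes the model scale.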

About Dr. Dyer: Robert Dyer is an Assistant Professor in the Department of Computer Science at Bowling Green State University. He received his Ph.D. from Iowa State University in 2013. His research areas are in Software Engineering, Big Data applications, and Programming Languages. Currently his research focuses on the Boa project, which provides a domain-specific language and infrastructure to allow researchers to easily mine a very large number of software repositories. Robert has served on the organizing committee for ICSE, program committee for Modularity and OOPSLA Artifacts, and reviewed for journals such as Empirical Software Engineering. He is a member of ACM SIGSOFT and SIGPLAN and is the ACM SIGSOFT Webinar Coordinator.

3:10 - 3:40pm
South Ballroom
Break - refreshments provided
3:40 - 5:30pm
South Ballroom
Introduction to Scalable Tools for Big Data (continued)
Dr. Robert Dyer
5:30 - 5:45pm
South Ballroom
Big Data Summer School Closing Session