11. Get Down with Your Data: Learn to web scrape, clean, visualize and preserve data!

Instructors: Adam Doan, Carrie Breton, Lucia Costanzo (University of Guelph Library)

Researchers spend a great deal of time gathering and cleaning data so that it can be analyzed and visualized. This time-consuming task can take up to 80% of total project time, but done correctly it saves time later by building a solid foundation for data analysis and visualization. Once the data is properly cleaned, visualizations can be generated quickly to aid in understanding difficult concepts and finding new patterns.

This workshop will guide participants through:

  • Scraping websites for data - Learn how to quickly and automatically collect data from the web. Web scraping methods range from manually copying and pasting content to using parsers, web-scraping software, and APIs.
  • Cleaning raw data - Understand the process of identifying corrupt and missing data and using OpenRefine to cleanse the data.
  • Visualizing clean data - Generate data visualizations that communicate the analytical results drawn from the clean data.
  • Preserving data - Learn about strategies, tools and best practices for preparing data for sharing and preservation.
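
As a rough illustration of how the scraping, cleaning and preserving steps fit together, here is a minimal Python sketch using only the standard library. It is not the workshop's own material: the workshop uses OpenRefine for cleaning, while this sketch substitutes plain Python. The sample HTML, the class name TableScraper, and the functions scrape, clean and to_csv are all hypothetical, and a real workflow would fetch the page from a website rather than use an inline string.

```python
from html.parser import HTMLParser
import csv
import io

# Hypothetical sample page (an assumption for illustration only).
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>  Moby-Dick </td><td>1851</td></tr>
  <tr><td>Emma</td><td>n/a</td></tr>
  <tr><td>Dracula</td><td> 1897</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape(html):
    """Scraping: pull the raw table rows out of the HTML."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean(rows):
    """Cleaning: trim whitespace and mark unparseable years as missing."""
    _header, *body = rows
    cleaned = []
    for title, year in body:
        year = year.strip()
        cleaned.append({
            "title": title.strip(),
            "year": int(year) if year.isdigit() else None,
        })
    return cleaned


def to_csv(records):
    """Preserving: serialize to CSV, a plain, open format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "year"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = clean(scrape(SAMPLE_HTML))
print(to_csv(records))
```

Missing values are kept as empty CSV fields rather than dropped, so the preserved file still records that the row existed. Visualization is left out of the sketch because it would normally rely on a third-party plotting tool.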

This workshop will feature engaging demonstrations, and participants will have a chance to practice with data through hands-on exercises related to the Digital Humanities. Participants will be required to bring their own laptop; software installation instructions will be provided prior to the workshop. By the end of the workshop, participants will be comfortable using various tools to harvest, clean, visualize and preserve data.