We will explain how to scrape websites to retrieve this information and share analysis techniques that will equip you to answer policy analysis and evaluation questions. We will also cover methods such as Optical Character Recognition (OCR), which extracts machine-readable text from printed documents.
Course content
Week 1
The first week will provide an overview of coding in general. This module gives students important hard digital skills by introducing them to R, one of the most widely used community-based software packages in policy analysis. Although the languages used in policy analysis differ (e.g. R, Stata, Python), they share some core concepts, such as “if” statements and loops. The main constructs students learn will pave the way to learning other languages, such as Python and SQL.
Week 2
The second week will focus on data gathering: how data can be collected from the internet and imported into R. This will introduce policymakers to web harvesting techniques in R. We will first discuss the legal, ethical and technical considerations of scraping data from the internet. We will then study the structure of HTML pages and identify the features that are most useful for web scraping. Finally, we will go through the main R packages and functions for web scraping and apply them to hands-on case studies.
Week 3
In the final week we will show how R can be used to visualise and analyse text as data. We will start with text pre-processing and some descriptive text features, such as document length and n-grams. We will then cover the basic supervised and unsupervised approaches to text analysis, including sentiment analysis, scaling techniques and topic modelling. Finally, we will go through the R packages and functions for these techniques and apply them to hands-on case studies.
Learning outcomes
The learning outcomes of this module are the following:
- To demonstrate a sophisticated grasp of coding concepts, especially the application of coding in R to complex policymaking
- To demonstrate the critical ability to undertake web scraping and text analysis in R
- To acquire the skills to critically assess web data in the context of writing policy proposals
- To develop an advanced, critical understanding of how data processes can be applied to problem-solving
- To learn basic coding concepts (which can later be applied in any programming language)
- To learn to use specific software packages, such as R