class: left, middle, title-slide

# Teaching computational methods and tools

### Michael Scharkow | Zeppelin University | michael.scharkow@zu.de

### ICA 2019 Preconference: Expanding Computational Communication

### These slides: https://underused.org/slides/ica19

---

# Disclaimer

- Teaching formats and styles for methods classes vary a lot across universities, especially between Europe and the US
- I assume you have between 90 and 180 minutes per session, about 12 sessions per term, and between 10 and 25 students in a single class
- I assume the computational methods class is **not** mandatory, so we can expect motivated students

---

# Curriculum and requirements

- A computational methods class is rarely scheduled in the first year
- When should we schedule the class and for whom?
  - Senior undergraduate students with previous knowledge (of what?)
  - First-year grad students who might have studied in another program?
- What kind of knowledge and skills should we expect?
  - Research design and data collection
  - Statistics 101 and more?
  - Math (do comm majors ever take math classes?)
  - Computer science and basic programming skills
  - Internet and web technology literacy, aka "what the hell is that app for?"

---

# Syllabus

What **should** be part of a computational methods class?

1. Web scraping, online APIs
1. Automatic text analysis (supervised, unsupervised)
1. Computer vision (and audio-visual analyses?)
1. Large-scale, dynamic social network analysis
1. Statistical simulation (data-based, Monte Carlo)
1. Agent-based modeling
1. Machine learning (traditional and neural networks)
1. Advanced statistical modeling
1. ... something something with Artificial Intelligence

---

# Other content

What about more general topics?

1. Research design
1. Research ethics
1. Data collection, storage, management
1. Efficient programming
1. Open and reproducible science

One could argue that these are the more **sustainable** and general skills, which also help in non-computational fields.

---

# Breadth vs depth

- There is substantial demand (both from students and other faculty) for overview courses that cover many methods and topics.
- The goal is to discuss **fundamentals** and show students "what's out there"
- If you want to include any practical exercises, it's often a quick succession of **tech demos** with little time to really learn a method
- Alternatively, you can focus on single topics (e.g. automatic content analysis, or simulation), and cover these from design to data collection to analysis
- If multiple classes are offered on different computational methods, there will likely be some **redundancy**, e.g. collecting online data, or statistical modeling
- If different people teach these classes, students will be confused by the **variety** of tools/languages/styles used by different instructors

---

# Theory and research areas

- How much **theory** is needed, and which one? <br/>(Item Response Theory is theory, too!)
- Should the class be focused on a specific phenomenon, topic or research area?
  - **Yes**, all students have a common goal and shared understanding of the topic
  - **No**, sound methods can be applied everywhere, and students will be more motivated if they can choose their own topics
- Should we cover instructive papers from outside communication research, at the risk of talking a lot about other research areas?

---

# Practical work in class

- There are two basic setups for methods classes:
  - Lecture with few interactions plus a **separate lab**, often taught by a TA
  - **Combined class** with lecture and practical exercises within one session
- With a limited number of students, I strongly favor coupling lecture and exercises within a single session, e.g. 1 hour lecture, 1 hour exercises
- Instead of short weekly lectures, use longer bi-weekly or half-day sessions since you'll lose a few minutes at the start of each lab to troubleshooting <br/>(i.e.
students can't find their own project files or open a CSV file in week 6)
- If you can, get a TA or peer supporters for the lab sessions, in order to **minimize waiting times** for students with technical difficulties

---

# More practical issues

- Have students work in **pairs**, so that while one student clicks and types, the other can listen to what you say
- Use many (very) **small exercises** instead of one large one in order to minimize risk and provide quick feedback
- If you use a programming language such as R or Python, use a good, cross-platform editor or IDE
- It's difficult to strike a balance between **prepared code** to copy & paste and having students **type everything** themselves
- Prepare quick **solutions** for those who have fallen behind in an exercise

---

# Infrastructure

- A computer lab with **pre-installed software** makes things easier in class, but harder at home
- If you let students work on their **own computers**, they (and you) will be frustrated early on because of installation issues and unclear file locations
- However, in the long term, students will feel more empowered and motivated if they can run analyses on their own computers
- **Never rely** on commercial tools, even when your university provides them at a discount or for free.
- Remind your students that tools/libraries/programming style are not set in stone, and things can be (and are) done differently

---

# Tools

1. Fancy point-and-click tools
   - GUI apps: [Facepager](https://github.com/strohne/Facepager)
   - Web apps: [AmCAT](https://amcat.nl) or [Social Feed Manager](https://gwu-libraries.github.io/sfm-ui/)
2. Ready-made solutions
   - CLI apps: [Instagram Scraper](https://github.com/rarcega/instagram-scraper)
   - API services: Google, Amazon, Microsoft, IBM, etc.
3. General purpose languages and specialized libraries
   - R: [quanteda](https://quanteda.io/), [tidytext](https://tidytextmining.com), [rtweet](https://rtweet.info/), [Rfacebook](https://github.com/pablobarbera/Rfacebook), [rvest](https://github.com/tidyverse/rvest)
   - Python: [spaCy](https://spacy.io/), [scikit-learn](https://scikit-learn.org/), [tweepy](https://www.tweepy.org/)

---

# How low (level) can you go?

Do I start with high-level tools and proceed to lower-level tools, or vice versa?

- Intuitively, **starting simple** (i.e. with a high-level tool) and proceeding to more complex solutions is easier
- However, one could argue that a student can **evaluate** and appreciate a high-level tool better if she has already done the job using a low-level approach
- Moreover, students will learn more about **computational thinking** when they learn to write a simple program
- Doing everything from scratch in Fortran 77 is not as impressive as you think

---

# External services

Do I use a local toolchain or online APIs?

- Ideally, we do not want to rely on **black-box solutions** for research
- In some areas (e.g. computer vision) the **initial setup** and the amount of **training data** needed are really difficult to manage in classroom setups
- Using available APIs can generate some **early results** and might motivate further study
- Moreover, learning to (re-)use available tools, data and services for your own research project is a **valuable skill** inside and outside academia
- As above, I'd suggest starting with **simple local applications** and then, if it makes sense, using online APIs (accessed programmatically in R or Python)

---

# Homework, exams and term papers

- **Group work** is generally less risky, more realistic and also often more appropriate for larger final projects
- **Replication** (or even reproduction) of published studies is always a good idea
- Think about alternatives to the traditional end-term paper, e.g.
a more accessible data journalism article, a **blog post** or a **poster**
- Require all code to be submitted, but do not grade the code quality
- I don't have good ideas for multiple choice exams, but for conceptual discussions, **short** essays are very effective as written exams

---

# Closing remarks

- It's **never** about the tools
- You can learn to write code in many languages, no need to stick to only one
- Even students with minimal (no) previous methods exposure will appreciate practical coding exercises because they make the algorithmic black box **less mysterious**
- The most important skill in computational communication research is to match **theoretical ideas** to formal **models** and practical **implementations** in software.
- Given the methodological specialisation in other areas (psycho-physiology, qualitative methods), students will have to choose a **limited set of methods** to learn, but learn them properly.

---

class: center, middle

# Thank you!

michael.scharkow@zu.de | [@mscharkow](https://twitter.com/mscharkow)

https://underused.org/slides/ica19