Analysis of Social Media course
Overview & Description
The class meets Tuesdays 4:30-6:30 in Wean Hall 4623. The instructors are William Cohen and Natalie Glance (Google Pittsburgh). The course is listed as MLD 10-802 and also as LTI 11-772.
The most actively growing part of the web is "social media" (e.g., wikis, blogs, bboards, and collaboratively developed community sites like Flickr and YouTube). This seminar course will review selected papers from the recent research literature that address the problem of analyzing and understanding social media. This will be a 6-credit course, with the primary workload being attending class and presenting material.
Topics that will be covered include:
- Text analysis techniques for sentiment analysis, analysis of figurative language, authorship attribution, and inference of demographic information about authors (e.g., age or sex).
- Community analysis techniques for detecting communities, predicting authority, assessing influence (e.g., in viral marketing), or detecting spam.
- Visualization techniques for understanding the interactions within and between communities.
- Learning techniques for modeling and predicting trends in social media, or predicting other properties of media (e.g., user-provided content tags).
Students should have taken a machine learning course (e.g., 15-781 or 15-681) or have the consent of the instructor. The content of the course will be complementary to another new course, “The Social Web: Content, Communities, and Context” (05-320/05-820), which is also being offered in fall 2007.
Course Projects
For students who have elected to upgrade the course to a full 12-credit course and submit a course project:
- By 10/18, send the instructors a three-page writeup of your proposed project describing the problem you are studying; the inputs and outputs of the method you plan to develop; the dataset you plan to use; and a short discussion of the techniques you plan to use.
- The final project is due at midnight EST on 12/13 and will be a paper in the format used by ICWSM, i.e., an 8-page, two-column conference paper. (Unfortunately, the ICWSM deadline is earlier than this, Dec 3, but so it goes.)
Schedule
August and September
- Aug 28: Organizational meeting (William). Slides.
- Sep 4: Lecture on sentiment analysis (William). Slides. Sentiment demos.
Papers discussed: Turney, ACL 2001; Pang et al, EMNLP 2002; Wiebe et al, Computational Linguistics 2005.
- Sep 11: Lecture on graph-based analysis (Natalie). Slides part 1; Slides part 2.
Papers discussed: Kleinberg et al, ICCC 1999; Page et al, 1999.
- Sep 18: Lecture on generative models of text and links (Guest lecture from Ramesh Nallapati). Slides.
Papers discussed: Cohn and Hofmann, NIPS 2001; Erosheva et al, PNAS 2004; Rosen-Zvi et al, UAI 2004; McCallum et al, IJCAI 2005; Dietz et al, ICML 2007. Ramesh also suggested some background reading papers on PLSA, LDA, and topic models.
- Sep 25: Lecture on advanced topics in graph-based analysis (Guest lecture from Christos Faloutsos). Slides part 1; Slides part 2a; Slides part 2b; Slides part 2c.
Papers discussed: Sun et al, KDD 2006; Wang et al, SRDS 2003; Chakrabarti et al, KDD 2004.
October
- Oct 2. More on graph-based analysis.
- William: Local navigation in networks. Papers discussed: Travers & Milgram, Sociometry 1967; Kleinberg, STOC 2000; Liben-Nowell et al, PNAS 2005. Slides.
- Student 1: Mary McGlohon. Reading List. Slides.
- Oct 9. Spam in weblogs and social networks.
- Student 1: Moira Burke. Papers: Mishne et al, WWW 2005; Zinman and Donath, CEAS 2007. Slides.
- Student 2: Jingrui He (Reading List) Slides
- William: Background on link-oriented spam detection methods. Papers (planned to be) discussed: Gyongyi and Garcia-Molina, 2004; Boykin and Roychowdhury, 2005; Gerecht et al, 2005; Kolari et al, 2006. We actually only covered the first of these. Slides.
- Oct 16. Visualization of social media. (Guest lecture from Matt Hurst, Microsoft LiveLabs). Interested in meeting Matt while he's here?
- Oct 23. Viral marketing and the spread of influence.
- Student 1: Sameer (Reading List) Slides
- Student 2: Xiaonan (Reading List) Slides
- Student 3: Udhay (Reading List) Slides
- William: Continuation of the Oct 9th lecture. Papers discussed: Boykin and Roychowdhury, 2005; Gerecht et al, 2005; Kolari et al, 2006. Slides.
- Oct 30. Trend analysis and marketing.
- Student 1: Mohit Kumar (Reading List)
- Student 2: Yichia (Reading List) Slides
- Natalie: Marketing applications of social network analysis. Slides.
November
- Nov 6. Politics and social media.
- Student 1: Mahesh: Reading List, Slides
- Student 2: Emil Albright: readings TBA
- Student 3: Swapna (Reading List), Slides
- Nov 13. Community dynamics.
- Student 1: Shilpa (Reading List), (Slides)
- Student 2: Hanghang Tong (Reading List)
- Student 3: Sachina (Reading List) [Modeling community growth and community evolution analysis]
- Nov 20. Recommender systems & the TREC Blog Track.
- Nov 27. Design of online communities.
- Guest lecture from Bob Kraut, HCII. Reading: Arguello et al, CHI 2006 and Burke et al, C&T 2007.
- Jon Elsas: TREC Blog Track
- 2007 TREC Blog Track Overview (summary)
- UIC's Opinion Task Notebook Paper: best-performing group in the Opinion tasks (summary)
- CMU's Feed Distillation Task Notebook Paper: best-performing group in the Feed Distillation task (local version)
December
- Dec 4. Anonymity and privacy issues.
- Student 1: Yimeng
- Dec 11. Tagging and folksonomies.
- Student 1: Hideki Shima (Slides)
- Student 2: Emil Albright - readings TBA
- Student 3: Justin