Glossary of Terms
From ScribbleWiki: Analysis of Social Media
A
Academic Genealogy
A project that attempts to collect information about all mathematicians in the world who hold a doctoral degree (including degrees in computer science). It links students to their advisors, so you can check whether Gauss appears somewhere in your tree. (Academic Genealogy Project)
Aggregation
Aggregation refers to the process of gathering and remixing content from blogs and other websites that provide RSS feeds. The results may be displayed in an aggregator website like Bloglines, or directly on your desktop using software often also called a newsreader. (Glossary of Social Media)
Author Dispersion
A measure of how spread out the discussion of a particular topic is. High values indicate that many people are talking about the topic, whereas low values indicate that the discussion is centered on a small group of people. This measure is more indicative than simply counting the unique authors for a topic, because errors in topic classification dilute the understanding of how far the discussion has spread. (Glance et al, KDD 2005)
Average Diameter
Same as the characteristic path length, except that we take the mean of the average shortest path lengths over all nodes instead of the median. (Chakrabarti & Faloutsos, CSUR 2006)
B
Board Dispersion
Similar to author dispersion, this measures how many different places are seeing discussion about a particular topic. A board dispersion that grows rapidly over time indicates a viral issue. If such a viral issue is negative, prompt attention is often recommended. (Glance et al, KDD 2005)
Boardscape
The world of boards: the collection of all boards and the potential aggregated power of all board communities and their members (http://www.boardscape.com). The term was coined by Ron Kass of Boardtracker.
Burst (of Activity)
A signal that a topic has appeared in a document stream, marked by certain features rising sharply in frequency as the topic emerges. The activity does not typically rise smoothly to a crescendo and then fall away, but rather exhibits frequent alternations of rapid flurries and longer pauses in close proximity. (Kleinberg, SIGKDD 2002)
Buzz Tracking
Following trends in topics of discussion and understanding what new topics are forming. (Glance et al, KDD 2005)
C
Characteristic path length
For each node in the graph, consider the shortest paths from it to every other node in the graph. Take the average length of all these paths. Now, consider the average path lengths for all possible starting nodes, and take their median. (Bu & Towsley, 2002)
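The definition above, together with the Average Diameter entry, can be sketched in a few lines. This is a minimal illustration that assumes an unweighted graph stored as an adjacency dictionary; the function names and the three-node example graph are hypothetical and not taken from the cited papers.

```python
from collections import deque
from statistics import mean, median


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def per_node_average_path_lengths(graph):
    """For each node, the average shortest-path length to the other reachable nodes."""
    averages = []
    for node in graph:
        dist = shortest_path_lengths(graph, node)
        others = [d for target, d in dist.items() if target != node]
        if others:  # skip isolated nodes, which reach nobody
            averages.append(mean(others))
    return averages


def characteristic_path_length(graph):
    """Median of the per-node averages."""
    return median(per_node_average_path_lengths(graph))


def average_diameter(graph):
    """Mean of the per-node averages."""
    return mean(per_node_average_path_lengths(graph))


# Hypothetical example: the undirected path a - b - c.
example_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(characteristic_path_length(example_graph))
print(average_diameter(example_graph))
```

On this small path the per-node averages are 1.5, 1 and 1.5, so the median and the mean differ, which is the only distinction between the two measures.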
Collaborative filtering (CF)
Collaborative filtering is any algorithm that filters information for a user based on a collection of user profiles. Users with similar profiles may share similar interests, so information can be filtered in or out for a user according to the behavior of similar users. (Jun Wang)
Communities of Practice
This term describes the activities of groups of people inside organizations who share a concern or a passion about a topic, and who interact to extend and exchange their knowledge and expertise. (Falkowski et al., 2007)
Cophenetic correlation coefficient (CPCC)
In hierarchical clustering, CPCC is a measure of how faithfully a dendrogram preserves the pairwise distances between the original unmodeled data points. (Wikipedia)
Creative Commons
Creative Commons is a not-for-profit organization and licensing system that offers creators the ability to fine-tune their copyright, spelling out the ways in which others may use their works. (http://ourmedia.org/learning-center/topic/law/creative-commons)
D
E
Early Alerting
Informing subscribers when a rare but critical, or even fatal, condition occurs. (Glance et al, KDD 2005)
Effective Diameter (a.k.a. eccentricity)
Minimum number of hops in which some fraction (say, 90%) of all connected pairs of nodes can reach each other (Tauro et al, 2001). It can be calculated from the hop-plot. (Chakrabarti & Faloutsos, CSUR 2006)
Erdős number
Your distance from Paul Erdős in terms of coauthorship: the number of coauthorship links on the shortest chain connecting you to him. (The Erdös Number Project)
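Computing such a distance amounts to a breadth-first search over a coauthorship graph. The sketch below assumes the graph is available as a dictionary mapping each author to a list of coauthors; the author names are placeholders, and no real publication data is used.

```python
from collections import deque


def collaboration_distance(coauthors, start, target):
    """Smallest number of coauthorship links between start and target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        author, dist = queue.popleft()
        for other in coauthors.get(author, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None  # no chain of coauthors connects the two


# Hypothetical coauthorship graph with placeholder author names.
coauthors = {
    "Erdős": ["Author A"],
    "Author A": ["Erdős", "Author B"],
    "Author B": ["Author A"],
    "Author C": [],
}
print(collaboration_distance(coauthors, "Author B", "Erdős"))
print(collaboration_distance(coauthors, "Author C", "Erdős"))
```

An author with no connecting chain has no finite number, which the sketch reports as `None`.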
F
Feed
A feed is a wrapper for pieces of regularly and sequentially updated content, be they news articles, weblog posts, a series of photographs, or more. (Dave Shea, 2004)
Folksonomy (a.k.a. collaborative tagging, social classification, social indexing, social tagging)
Sets of categories derived from the tags that are used to characterize some resource. The tags are assigned by users, not experts. (Halpin et al, WWW 2007)
G
H
HITS
Hypertext Induced Topic Selection (HITS) is a link analysis algorithm that iteratively ranks web pages in terms of their authority and hub weights. The authority weight measures the value of the page's content on the topic, and the hub weight measures the value of its links to other pages. (Kleinberg et al, ICCC 1999)
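The iteration can be sketched as follows. This is a simplified version that assumes the link graph is given as a dictionary from each page to the pages it links to; the fixed iteration count, the Euclidean normalization, and the page names are choices made for the example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (authority, hub) weights for a link graph given as page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages linking to this page.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub weight: sum of the authority weights of the pages this page links to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize both vectors so the weights stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


# Hypothetical link graph: two hub-like pages pointing at two content pages.
links = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
auth, hub = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the page linked from both hubs ends up with the largest authority weight, and the page linking to both content pages ends up with the largest hub weight.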
Hop-plot
Starting from a node u in the graph, we find the number of nodes <math>N_h(u)</math> in a neighborhood of h hops. We repeat this, starting from each node in the graph, and sum the results to find the total neighborhood size <math>N_h</math> for h hops, <math>N_h = \sum_u N_h(u)</math>. The hop-plot is just the plot of <math>N_h</math> versus h. (Chakrabarti & Faloutsos, CSUR 2006)
I
J
K
L
M
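A small sketch of the hop-plot, and of reading the effective diameter off it, is given below. It assumes an unweighted adjacency-dictionary graph and, as a labeled assumption, does not count a node as part of its own neighborhood, so N_h is the number of ordered pairs of distinct nodes within h hops. The function names and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def hop_plot(graph):
    """Return [N_1, N_2, ...]: N_h is the number of ordered node pairs within h hops."""
    pairs_at_distance = {}
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if v != u:  # assumption: a node is not in its own neighborhood
                pairs_at_distance[d] = pairs_at_distance.get(d, 0) + 1
    plot, running = [], 0
    for h in range(1, max(pairs_at_distance, default=0) + 1):
        running += pairs_at_distance.get(h, 0)
        plot.append(running)
    return plot


def effective_diameter(graph, fraction=0.9):
    """Smallest h at which N_h covers the given fraction of all connected pairs."""
    plot = hop_plot(graph)
    if not plot:
        return 0
    threshold = fraction * plot[-1]
    for h, n_h in enumerate(plot, start=1):
        if n_h >= threshold:
            return h


# Hypothetical example: the undirected path a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(hop_plot(path_graph))
print(effective_diameter(path_graph))
```

The last value of the hop-plot is the total number of connected pairs, which is why the threshold is taken as a fraction of it.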
Meta Noise
In a folksonomy, meta noise is generated by tags that are not useful. These tags usually occur very rarely in the tag corpus (e.g. "could this possibly be a poem").
N
O
Opinion Retrieval
Retrieving sentences or documents that contain either positive or negative sentiments. (Eguchi and Lavrenko, EMNLP 2006) Cf. Sentiment Retrieval
P
PageRank
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." Using these and other factors, Google provides its views on pages' relative importance. (Google Technology)
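The voting idea can be illustrated with the textbook power-iteration form of PageRank. This is only a simplified sketch, not a description of how Google computes its rankings; the damping factor of 0.85, the handling of pages without outgoing links, and the example graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration over a page -> linked pages graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links votes for every page equally.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to b, and b links to c.
links = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Because a vote is weighted by the rank of the page casting it, page c scores well above a and d even though it receives only one link, since that link comes from the highly ranked page b.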
Q
R
RDF
The RDF format allows two XML documents to be combined into a single one that describes the relations in the data. XUL uses it to describe, in the data file itself, how the data is displayed by the program's graphical interface. (W3C specifications and Mozilla's documentation)
RSS
A format for sharing data, defined in version 1.0 of XML. A site can deliver information in this format, and readers can retrieve that information, along with information from various other sources, in the same format. Information that a website provides in such an XML file is called an RSS feed. Recent browsers can read RSS files directly, but a dedicated RSS reader or aggregator may also be used. (Denis G. Sureau, 2006)
S
Semantic Web
The Semantic Web is an extension of the current Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation. (Berners-Lee et al., 2001)
Sentiment Mining
Extracting aggregate measures of positive vs. negative opinion. (Glance et al, KDD 2005)
Sentiment Retrieval
Retrieving sentences containing information with a specific sentiment polarity on a certain topic. (Eguchi and Lavrenko, EMNLP 2006) Cf. Opinion retrieval.
Six Degrees of Separation
The idea that, if a person is one "step" away from each person he or she knows and two "steps" away from each person who is known by one of the people he or she knows, then everyone is no more than six "steps" away from each person on Earth. Several studies, such as Milgram's small world experiment, have been conducted to empirically measure this connectedness. (Wikipedia)
Social Media Optimization
The concept behind SMO is to implement changes to optimize a site so that it is more easily linked to, more highly visible in social media searches on custom search engines (such as Technorati), and more frequently included in relevant posts on blogs, podcasts and vlogs. (Rohit Bhargava, 2006)
Social Network Analysis (SNA)
Social network analysis is a methodology that has evolved to study social groupings, particularly in terms of social and communication connections within a group. (Mike Thelwall, 2004)
Splog
Fake blogs with machine-generated or hijacked content whose sole purpose is to host ads or raise the PageRank of target sites. (Kolari et al, 2006)
T
Tag Cloud
Tag clouds are visual presentations of a set of words, typically a set of tags, in which attributes of the text such as size, weight or colour can be used to represent features (e.g., frequency) of the associated terms. (Martin Halvey and Mark T. Keane, 2007)
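One simple way to turn frequencies into a visual attribute is to scale font size linearly between a minimum and a maximum. The sketch below assumes this linear mapping and arbitrary point sizes of 10 and 32; the function name and the tag counts are made up for illustration, and real tag clouds often use logarithmic or bucketed scales instead.

```python
def tag_font_sizes(tag_counts, min_size=10, max_size=32):
    """Map each tag's frequency linearly onto a font size between min_size and max_size."""
    if not tag_counts:
        return {}
    lowest = min(tag_counts.values())
    highest = max(tag_counts.values())
    span = highest - lowest
    sizes = {}
    for tag, count in tag_counts.items():
        if span == 0:
            # All tags are equally frequent: use the midpoint size for every tag.
            sizes[tag] = (min_size + max_size) / 2
        else:
            sizes[tag] = min_size + (count - lowest) / span * (max_size - min_size)
    return sizes


# Hypothetical tag frequencies.
tag_counts = {"blog": 50, "rss": 30, "splog": 10}
print(tag_font_sizes(tag_counts))
```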
Time Series Analysis
Time series analysis investigates techniques that are useful for analyzing time series data, that is, sequences of measurements that follow non-random orders. (http://www.statsoft.com/textbook/sttimser.html)
Trackback
Some blogs provide a facility for other bloggers to leave a calling card automatically, instead of commenting. Blogger A may write on blog A about an item on blogger B's site, and through the trackback facility leave a link on B's site back to A. The collection of comments and trackbacks on a site facilitates conversations. (http://www.designingforcivilsociety.org/2007/02/glossary_of_soc.html)
Trend Analysis
Trend analysis is defined as a comparative analysis of a company's financial ratios over time. (http://www.investorwords.com/5068/trend_analysis.html)
TrustRank
TrustRank is a link analysis technique that semi-automatically separates good pages from spam. (Gyöngyi et al, VLDB 2004)
U
V
Viral Marketing
A type of marketing that makes use of a customer's social network and its influence on the decision making of potential customers.
Vlog
A Vlog (video blog) is a blog which comprises video. (Red Herring, 2007)
W
Weblog
A weblog, sometimes written as web log or Weblog, is a Web site that consists of a series of entries arranged in reverse chronological order, often updated frequently with new information about particular topics. The information can be written by the site owner, gleaned from other Web sites or other sources, or contributed by users. (http://searchsoa.techtarget.com/sDefinition/0,,sid26_gci213547,00.html)
Web Spamming (a.k.a. Spamdexing)
Any deliberate human action that is meant to trigger an unjustifiably favorable relevance or importance for some web page, considering the page's true value. (Gyöngyi and Garcia-Molina, 2004)
Web syndication
Web syndication is a form of syndication in which a section of a website is made available for other sites to use. This could be simply by licensing the content so that other people can use it; however, in general, web syndication refers to making web feeds available from a site in order to provide other people with a summary of the website's recently added content (for example, the latest news or forum posts). (Wikipedia)
X
Y
Z
Other
-sphere
A collection of a particular kind of data on the Internet (e.g. blogosphere, splogosphere, twittersphere).