YOW User Study Data: Implicit and Explicit Feedback
for News Recommendation
This data set was originally collected by the PI at Carnegie Mellon University. It has not been shared before. Because it is related to the PIIR project and may be of interest to people doing related research, we are making it publicly available.
The data contained in this directory was collected in a user study. The goal of the study was to collect a data set from a variety of users with explicit feedback, implicit feedback, a wider range of information about documents and topics, and a heterogeneous set of documents.
To achieve these goals, we developed a web-based news story filtering system. This system constantly gathers information and recommends it to the users. The system has six major components: a news crawler, a text indexer, a database server, an adaptive filtering system, a web server, and a browser. The crawler frequently crawls about 8000 candidate RSS news feeds. RSS is a format for syndicating news and the content of news-like sites. The adaptive filtering system learns from users' explicit feedback and recommends documents to the users. An indexer, Lemur, parses each incoming news story and incrementally builds a text index of documents to support the computations of the filtering system. Users use a special browser to log into the system daily to read and evaluate what the system has delivered to them. The special browser captures the implicit and explicit feedback from a user and sends the information to the web server in real time. The information is saved in a central database and used by the filtering system to learn user preferences. The interface also provides some simple search functions, so that users can search the articles in Yow-now's news archive using keywords and get a list of news stories, or search the news sources using keywords and get a list of RSS feeds.
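
For readers unfamiliar with RSS, the following is a minimal sketch of how a crawler could pull story titles and links out of an RSS 2.0 feed. It is not the crawler used in the study. The sample feed, the example.com URLs, and the function name parse_rss_items are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (illustrative only, not from the study).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for story in parse_rss_items(SAMPLE_FEED):
        print(story["title"], story["link"])
```

A real crawler would fetch each feed over HTTP on a schedule and hand new stories to the indexer. Those steps are omitted here.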
Twenty-one paid subjects participated in the study for about 4 weeks. The subjects were otherwise not affiliated with our research. The subjects were required to read the news for about 1 hour per day and to provide explicit feedback for each page they visited. In the last week of the study, some subjects read 2 hours per day; they were encouraged but not required to do so. In total, 28 users tried the system, but only 21 were official paid subjects, of whom one worked for only 2 weeks and 20 worked for about 4 weeks.
We collected 7000+ feedback entries from all users. Each entry contains several different forms of evidence for a news story a user clicked. Our intention in collecting the evidence was to be representative, not exhaustive. The evidence can be roughly classified into the following five categories:
Explicit User Feedback: After reading a news story, a user clicked a button on the browser toolbar to bring up an evaluation interface. Through this interface, the user provided explicit feedback describing hidden properties of the current story: the topics the news belongs to (classes), how much the user liked the news (user_like), how relevant the news was to the class(es) (relevant), how novel the news was (novel), whether the news matched the readability level of the user (readable), and whether the news was authoritative (authoritative). The fields user_like, relevant, and novel were recorded as integers ranging from 1 (least) to 5 (most); readable and authoritative were recorded as 0 or 1. A user could provide partial rather than complete explicit feedback, create new classes, and choose multiple classes for one document.
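
The value ranges described above can be checked mechanically. The sketch below models one explicit-feedback entry and validates it. The class name ExplicitFeedback, the validate method, and the use of None for a field the user skipped are assumptions for illustration; the way missing values are actually encoded in the data file is not stated here and should be checked against the file itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExplicitFeedback:
    """One explicit-feedback entry; None marks a field the user skipped (assumed)."""
    classes: List[str] = field(default_factory=list)  # one document may have several
    user_like: Optional[int] = None      # 1 (least) to 5 (most)
    relevant: Optional[int] = None       # 1 (least) to 5 (most)
    novel: Optional[int] = None          # 1 (least) to 5 (most)
    readable: Optional[int] = None       # 0 or 1
    authoritative: Optional[int] = None  # 0 or 1

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the entry is in range."""
        problems = []
        for name in ("user_like", "relevant", "novel"):
            value = getattr(self, name)
            if value is not None and value not in range(1, 6):
                problems.append(f"{name}={value} is outside 1..5")
        for name in ("readable", "authoritative"):
            value = getattr(self, name)
            if value is not None and value not in (0, 1):
                problems.append(f"{name}={value} is not 0 or 1")
        return problems


if __name__ == "__main__":
    entry = ExplicitFeedback(classes=["sports", "local"], user_like=4, readable=1)
    print(entry.validate())
```

Partial feedback passes validation because skipped fields are simply ignored.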
User Actions: The special browser recorded some user actions, such as mouse activities, scroll bar activities, and keyboard activities. TimeOnPage is the number of seconds the user spent on a page, and EventOnScroll is the number of clicks on the scroll bars. When the mouse is outside the browser window or the browser window is not focused, the browser does not capture any activities.
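
To make the two measures concrete, here is one possible reading of how they could be derived from a stream of browser events. This is a sketch under stated assumptions, not the study browser's actual logic. The event kinds "focus", "blur", and "scroll_click", the function name summarize_page_visit, and the choice to exclude unfocused time from TimeOnPage are all hypothetical.

```python
def summarize_page_visit(events, page_closed_at):
    """Summarize one page visit.

    events: list of (timestamp_in_seconds, kind) tuples sorted by time, where
            kind is 'focus', 'blur', or 'scroll_click' (hypothetical names).
    page_closed_at: timestamp at which the user left the page.
    """
    time_on_page = 0.0
    scroll_clicks = 0
    focused_since = None  # None while the window is not focused

    for timestamp, kind in events:
        if kind == "focus":
            if focused_since is None:
                focused_since = timestamp
        elif kind == "blur":
            if focused_since is not None:
                time_on_page += timestamp - focused_since
                focused_since = None
        elif kind == "scroll_click":
            # Assumption: activity is only captured while the window is focused.
            if focused_since is not None:
                scroll_clicks += 1

    # Close the last focused interval if the page was left while focused.
    if focused_since is not None:
        time_on_page += page_closed_at - focused_since

    return {"TimeOnPage": time_on_page, "EventOnScroll": scroll_clicks}


if __name__ == "__main__":
    visit = [(0, "focus"), (10, "scroll_click"), (30, "blur"),
             (40, "scroll_click"), (50, "focus"), (60, "scroll_click")]
    print(summarize_page_visit(visit, page_closed_at=70))
```

In this example the scroll click at second 40 is ignored because the window was not focused at that moment.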
The data is in an Excel file. Please download the data here.
More information about the data set can be found in Chapter 5 of Yi Zhang, Bayesian Graphical Models for Adaptive Information Filtering (Ph.D. dissertation, 2005) [pdf].