Visualization of a Search Engine over Time
Brent Arata
barata@slugmail.ucsc.edu
CMPS 161

Abstract:
Figuring out what people are interested in on the internet has been a major problem for many
search engine companies. Many pieces of software have been invented to visualize this, ranging
from Google's TouchGraph and Thinkmap to many other applications. However,
what if we could visualize how the same data changes over time? The project "Searching the
Seasons" demonstrates this idea with a graph structure called a tree.
This tree differs from traditional graph structures for visualizing text, such as the one shown below:

Project Description:

I plan on visualizing AOL's search queries from 2006 in the form of an actual tree, one that is
very user friendly, unlike the visualization shown above. I plan on adding my own
artistic twist to it as well as a bit of interactivity. I will use the aesthetics of
clip art in the representation of my search tree. The query information will be displayed in the form
of a tree in which each branch symbolizes a relation between the nodes attached to that branch,
and branches are attached to other branches. The user will be able to roll over a branch or a node
to see what phrase it represents, as well as the URL the user last clicked for that query. As
time goes on, the tree will sprout new nodes. When two branches can
be grouped together, either the nodes of one branch will fall off and regrow on other branches
of the tree, or the branch itself will fall off. Which nodes should grow on each branch and which
branch should fall off the tree are determined by the analysis of the data.
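
The tree described above could be modeled roughly as follows. This is only a minimal sketch under stated assumptions, not the project's actual implementation: the names QueryNode, Branch, sprout, rollover_text, and merge_branches are all hypothetical, and the rule for deciding which branches to merge is left to the analysis step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryNode:
    """A node on the tree: one search phrase and the URL last clicked for it."""
    phrase: str
    clicked_url: Optional[str] = None


@dataclass
class Branch:
    """A branch groups related nodes and may carry child branches."""
    label: str
    nodes: List[QueryNode] = field(default_factory=list)
    children: List["Branch"] = field(default_factory=list)

    def sprout(self, node: QueryNode) -> None:
        """Grow a new node on this branch as new queries arrive."""
        self.nodes.append(node)

    def rollover_text(self, node: QueryNode) -> str:
        """Text shown when the user rolls over a node (hypothetical format)."""
        url = node.clicked_url or "(no click recorded)"
        return f"{node.phrase} -> {url}"


def merge_branches(parent: Branch, keep: Branch, drop: Branch) -> None:
    """Regrow everything from `drop` on `keep`, then let `drop` fall off `parent`."""
    keep.nodes.extend(drop.nodes)
    keep.children.extend(drop.children)
    # Remove by identity so an equal-looking sibling branch is never removed by mistake.
    parent.children = [b for b in parent.children if b is not drop]
```

For example, if the analysis decides that a "pets" branch and an "animals" branch belong together, calling merge_branches(trunk, pets, animals) moves the nodes of "animals" onto "pets" and removes "animals" from the trunk.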



Analysis Overview:

To visualize such a large dataset, one must take into account the frequency
with which words appear in a query. In order to categorize the data, I segment each distinct
word as a vector in a database. By comparing the magnitude of each vector, I will be able
to determine which phrases appear more often within AOL's queries for a particular month.
I will also categorize words by their parts of speech: noun, verb,
adjective, and adverb. I will load a thesaurus and compare incoming words against it as they are
read from the database, so that similar words can be mapped to similar phrases that have
already been loaded. However, memes would be very hard to parse without some way of knowing what the meme stands for.
Unfortunately, for this project, memes are mapped one to one with other phrases that are syntactically similar.
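
The frequency and thesaurus steps above might be sketched as follows. Several things here are assumptions made purely for illustration: the input is taken to be (month, query text) pairs already extracted from the log, the "magnitude" of a word's vector is read as a plain occurrence count, TOY_THESAURUS is a hand-made stand-in for a real thesaurus, and all function names are hypothetical. Part-of-speech tagging is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a real thesaurus: each word maps to a canonical term.
TOY_THESAURUS: Dict[str, str] = {
    "automobile": "car",
    "auto": "car",
    "films": "movies",
}


def canonical(word: str, thesaurus: Dict[str, str]) -> str:
    """Map a word to its canonical synonym; unknown words map to themselves."""
    word = word.lower()
    return thesaurus.get(word, word)


def monthly_word_counts(
    queries: Iterable[Tuple[str, str]],
    thesaurus: Dict[str, str],
) -> Dict[str, Counter]:
    """Count canonical words per month from (month, query_text) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for month, text in queries:
        for word in text.split():
            counts[month][canonical(word, thesaurus)] += 1
    return counts


def top_words(counts: Counter, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent words together with their counts."""
    return counts.most_common(n)


# Made-up sample queries, not real AOL data.
sample = [
    ("2006-03", "used car prices"),
    ("2006-03", "automobile insurance"),
    ("2006-04", "new films"),
]
counts = monthly_word_counts(sample, TOY_THESAURUS)
print(top_words(counts["2006-03"], 1))
```

Words missing from the thesaurus are left unchanged, which is one simple way to handle phrases, such as memes, that have no known synonym.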


Materials:

Timeline:

Sources:

Google TouchGraph

Thinkmap

Adding Natural Language Processing Techniques to the Entry Vocabulary Module Building Process

Keyboard Searching vs. Subject Searching