Brent Arata
barata@slugmail.ucsc.edu
cmps 161
Abstract:
Figuring out what people are interested in on the internet has been a major problem for many
search engine companies. Many pieces of software have been developed to visualize this. These products
include TouchGraph's Google browser, Thinkmap, and many other software applications. However,
what if we could visualize how the same data changes over time? The project "Searching the
Seasons" demonstrates this idea using a graph structure called a tree.
However, unlike traditional graph structures such as the one shown below for visualizing text:
Project Description:
Analysis Overview:
To visualize such a large dataset of queries, one must take into account the frequency
with which words appear in a query. To categorize the data, I segment the queries so that each
distinct word is stored as a vector in a database. By comparing the magnitude of each vector, I will be able
to determine which phrase appears more often within AOL's query data for a particular month.
I will also categorize words by their parts of speech: noun, verb,
adjective, and adverb. I will load a thesaurus and compare incoming words against it as they are read from the
database, so that similar words being loaded from the
query data can be mapped to similar phrases that have already been loaded.
However, memes would be very hard to parse without some way of knowing what the meme stands for.
Unfortunately, for this project, memes are mapped one to one with other phrases that are syntactically similar.
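As a rough sketch of the counting and thesaurus-mapping steps described above, the Python below counts how often each word appears per month after mapping similar words to one shared term. It is only an illustration under stated assumptions: the tiny THESAURUS dictionary, the function names canonical and monthly_counts, and the sample queries are all hypothetical and do not come from the AOL data, and a plain word count stands in for the vector magnitude mentioned above.

```python
from collections import Counter

# Hypothetical miniature thesaurus: each known word maps to one shared term.
# A real thesaurus would be loaded from a file or database instead.
THESAURUS = {
    "film": "movie",
    "films": "movie",
    "movies": "movie",
    "automobile": "car",
    "cars": "car",
}


def canonical(word):
    """Return the shared term for a word, or the word itself if it is unknown."""
    return THESAURUS.get(word, word)


def monthly_counts(queries):
    """Count canonical words per month from (month, query) pairs."""
    counts = {}
    for month, query in queries:
        words = (canonical(w) for w in query.lower().split())
        counts.setdefault(month, Counter()).update(words)
    return counts


# Made-up sample queries, used only to exercise the functions above.
sample = [
    ("March", "cheap movies online"),
    ("March", "film reviews"),
    ("March", "used cars"),
    ("April", "automobile insurance"),
    ("April", "car rental"),
]

counts = monthly_counts(sample)
for month, counter in counts.items():
    # The most frequent canonical word for each month.
    print(month, counter.most_common(1))
```

Keeping one counter per month makes it straightforward to compare the same word across months, which is the change over time the visualization is meant to show. Part-of-speech tagging and meme handling are left out of this sketch.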
Materials:
Timeline:
Sources:
Adding Natural Language Processing Techniques to the Entry Vocabulary Module Building Process