The Concept:
Zipf's law is commonly linked to the Principle of Least Effort in spoken language: across languages, the most frequently used words tend to be the shortest.
Written language is another matter. In most languages, a word's written length is roughly proportional to its spoken length. Some Asian languages differ in this regard: the written form of a single syllable (or mora) can be very complex and time-consuming to write.
Thus, I would like to graphically represent how Zipf's Law extends to written Japanese and its writing system.
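As a rough illustration of the rank-frequency relationship behind Zipf's law, the sketch below sorts some counts by rank and compares them with the idealized form f(r) = f(1) / r (exponent 1). The class name ZipfSketch, its methods, and the sample counts are made up for illustration; they are not part of this project and the numbers do not come from its database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ZipfSketch {
    /** Idealized Zipf frequency at a given rank, assuming exponent 1: f(r) = f(1) / r. */
    public static double idealFrequency(long topFrequency, int rank) {
        return (double) topFrequency / rank;
    }

    /** Returns a copy of the counts sorted in descending order, so index i holds rank i + 1. */
    public static List<Long> byRank(List<Long> counts) {
        List<Long> sorted = new ArrayList<>(counts);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        List<Long> ranked = byRank(Arrays.asList(250L, 1000L, 330L, 500L));
        for (int i = 0; i < ranked.size(); i++) {
            int rank = i + 1;
            System.out.printf("rank %d: observed %d, ideal %.1f%n",
                    rank, ranked.get(i), idealFrequency(ranked.get(0), rank));
        }
    }
}
```

Plotting a measure of writing effort against the same ranks would show whether the shortest-is-most-frequent tendency carries over to the written form.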
Readme stuff
To compile the program:
Compile the program by opening the project in Eclipse and running a build. This ensures that the required dependencies (external .jar files) are included in the compile.
Running the program:
- It should be as simple as clicking the .jar file.
- It is IMPORTANT that the /database folder is in the same directory as the running executable.
- The database files in the database folder need to be named jpdb.
- If the database is not in the correct file path, the visualization will crash.
- If recompiling the project, make sure to copy the /database folder from the .jar file into the folder from which the Java executable is run.
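The folder requirements above can be checked before launching. The following is a minimal sketch, not code from this project: the class DatabaseCheck and its method are hypothetical, and it assumes that "same directory as the running executable" means the current working directory and that the database files have names beginning with jpdb.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class DatabaseCheck {
    /**
     * Returns true if baseDir/database is a directory containing at least one
     * entry whose name starts with "jpdb". Any I/O error counts as "not present".
     */
    public static boolean databasePresent(Path baseDir) {
        Path dbDir = baseDir.resolve("database");
        if (!Files.isDirectory(dbDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dbDir)) {
            return entries.anyMatch(p -> p.getFileName().toString().startsWith("jpdb"));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the program is started from the directory that holds the .jar.
        Path here = Paths.get("").toAbsolutePath();
        if (databasePresent(here)) {
            System.out.println("Database folder found under " + here);
        } else {
            System.err.println("Missing /database folder or jpdb files under " + here);
        }
    }
}
```

Running a check like this first gives a readable message instead of a crash when the folder is misplaced.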
Using the program:
The applet has three main buttons. Each one creates its respective scatter plot in a separate JFrame.
- Once the scatter plots have launched, the user can change which characteristic of the data is mapped to each axis using the top toolbar.
- This is somewhat analogous to having a scatter plot matrix.
- Hovering over a data item in the scatter plot will show information about the item.
Notes:
'Render 3D Scatter' is particularly slow because it executes 2500 SQL queries. The resulting visualization is also not very insightful, so it should be used last.