Google Star Project

We are the Fluminense group, a bunch of graduate students at the UMD Computer Science Dept. Our project deals with the Google N-Gram data.
We are currently running experiments on how the Google N-Gram data can be manipulated to reveal insightful trends. Ideally, this should help users apply the data in AI applications and search for specific word co-occurrences in a large corpus. Once the project is completed, the source code and executables will be available for download.
The program is being released under the GNU General Public License.
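
To give a feel for the kind of co-occurrence lookup described above, here is a minimal sketch in Python. It is not the project's code. It assumes each input line holds an n-gram and its frequency separated by a tab (for example `strong black coffee<TAB>80`), and the function names `cooccurrence_counts` and `top_cooccurring`, as well as the sample lines, are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(lines):
    """Sum n-gram frequencies for each unordered pair of distinct words
    that appear together in the same n-gram.

    Assumed line format (an assumption of this sketch): "w1 w2 ... wn<TAB>count".
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        ngram, _, freq = line.rpartition("\t")
        if not ngram or not freq.isdigit():
            continue  # skip blank or malformed lines
        words = sorted(set(ngram.split()))
        for pair in combinations(words, 2):
            counts[pair] += int(freq)
    return counts


def top_cooccurring(counts, word, k=3):
    """Return up to k (word, frequency) pairs that most often share an n-gram with `word`."""
    related = Counter()
    for (a, b), freq in counts.items():
        if a == word:
            related[b] += freq
        elif b == word:
            related[a] += freq
    return related.most_common(k)


# Made-up sample lines, not taken from the real data set.
SAMPLE = [
    "strong coffee beans\t120",
    "strong black coffee\t80",
    "black tea leaves\t40",
]

counts = cooccurrence_counts(SAMPLE)
print(top_cooccurring(counts, "coffee"))
```

Keeping everything in an in-memory `Counter` is only reasonable for small samples; a corpus of the size discussed here would likely need the counts streamed to disk or a database instead.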

References:

Niwa, Y. and Nitta, Y. Co-occurrence vectors from corpora versus distance vectors from dictionaries.
http://acl.ldc.upenn.edu/C/C94/C94-1049.pdf

Edmonds, P. Choosing the Word Most Typical in Context Using a Lexical Co-Occurrence Network.
http://acl.ldc.upenn.edu/P/P97/P97-1067.pdf

Ferret, O. Discovering word senses from a network of lexical co-occurrences.
http://acl.ldc.upenn.edu/C/C04/C04-1194.pdf

Higinbotham, D. Semantic co-occurrence networks.
http://www.mt-archive.info/TMI-1990-Higinbotham.pdf

Veling, A. and van der Weerd, P. Conceptual grouping in word co-occurrence networks.
http://dli.iiit.ac.in/ijcai/IJCAI-99%20VOL-2/PDF/006.pdf

Report, Presentation, and Code Downloads:

Project Report

Project Presentation Slides

Project Code Download

Project Guide:

Ted Pedersen

Fluminense Group:

Ankur, Sarika, Prafulla, Aneerudh