We are the Fluminense group, a group of graduate students in the Computer Science Department at UMD. Our project works with the Google N-gram data.
We are currently running experiments on how the Google N-gram data can be processed to reveal useful trends. Ideally, this will help users apply the data in AI applications and
search for specific word co-occurrences in a large corpus. Once the project is complete, the source code and executables will be available for download.
The program is being released under the GNU General Public License.
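To give a concrete sense of the word co-occurrence searching mentioned above, here is a minimal sketch in Python. It is not the project's code. It assumes each n-gram record is a line of space-separated words followed by a tab and a frequency count, and the function name `cooccurrence_counts` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(lines):
    """Sum n-gram frequencies for each unordered pair of distinct words.

    Assumed record format (an assumption, not confirmed by the project):
    "word1 word2 ... wordN<TAB>count"
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab: n-gram text on the left, count on the right.
        ngram, _, freq = line.rpartition("\t")
        # Lowercase and deduplicate so each pair is counted once per record.
        words = sorted(set(ngram.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += int(freq)
    return counts


# Hypothetical sample records, made up for illustration only.
sample = [
    "strong coffee\t120",
    "strong tea\t45",
    "drink strong coffee\t30",
]
counts = cooccurrence_counts(sample)
print(counts.most_common(3))
```

Pairs are stored as alphabetically sorted tuples, so looking up `counts[("coffee", "strong")]` finds the pair regardless of the order in which the words appeared in the n-gram.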
References:
Niwa, Y. and Nitta, Y. Co-occurrence vectors from corpora versus distance vectors from dictionaries.
http://acl.ldc.upenn.edu/C/C94/C94-1049.pdf
Edmonds, P. Choosing the Word Most Typical in Context Using a Lexical Co-Occurrence Network.
http://acl.ldc.upenn.edu/P/P97/P97-1067.pdf
Ferret, O. Discovering word senses from a network of lexical co-occurrences.
http://acl.ldc.upenn.edu/C/C04/C04-1194.pdf
Higinbotham, D. Semantic co-occurrences networks.
Veling, A. and van der Weerd, P. Conceptual grouping in word co-occurrence networks.
Project Guide: Ted Pedersen
Group members: Ankur, Sarika, Prafulla, Aneerudh