NAME README - General Information about WordNet-SenseRelate-WordToSet OVERVIEW This module takes as input a single target word, and a set of one or more other words. It finds the sense of that target word that is most related to those words in the set. For example, if the target word is "bank", and the words in the set are "money cash loan stock", we might expect that the most related sense of "bank" is that pertaining to financial instituations. This is potentially useful in determining the predominant sense of a word in a particular domain. For example, if the target word is "game", and the words in the set are from the domain of board games (e.g., "monopoly chess checkers", then the sense of "game" that we'd expect to be most similar or related would be that of games you play rather than the game you hunt. For example, here's some output when game is compared to board games: wordtoset.pl game monopoly checkers chess --type WordNet::Similarity::wup game#n#1 : 2.52631578947368 : a contest with rules to determine a winner; "you need four people to play this game" game#n#10 : 2.17777777777778 : your occupation or line of work; "he's in the plumbing game"; "she's in show biz" game#n#3 : 2.17777777777778 : an amusement or pastime; "they played word games"; "he thought of his painting as a game that filled his empty time"; "his life was all fun and games" Here's some output when we compare that to wild animals: wordtoset.pl game turkey boar deer --type WordNet::Similarity::wup game#n#4 : 1.98 : animal hunted for food or sport game#n#7 : 1.27777777777778 : the flesh of wild animals that is used for food game#n#9 : 1.24542124542125 : the game equipment needed in order to play a particular game; "the child received several games for his birthday" Note that wordtoset.pl will output all of the senses, but we've only shown the top three here in the interests of brevity. We can see that according to the Wu-Palmer measure (wup), the sense of game most similar to the given sense is as we've described above. WordToSet might also be useful in detecting sentiment orientation. For example, suppose the target word is "war". You could compare that to two different sets such as : "peace love happiness" and "hate death fear". While the predominant sense of "war" might not change, if it has a substantially higher score relative to one of the sets then it could be concluded that war is more associated with that set than the other. This module uses WordNet and measures of semantic relatedness and similarity from WordNet::Similarity to arrive at its output. SYNOPSIS # from the command line wordtoset.pl star nebula cosmos orion --type WordNet::Similarity::lin wordtoset.pl star movie hollywood director --type WordNet::Similarity::vector # from within a program use WordNet::SenseRelate::WordToSet; use WordNet::QueryData; my $qd = WordNet::QueryData->new; my %options = (wordnet => $qd, measure => 'WordNet::Similarity::lesk'); my $wsd = WordNet::SenseRelate::WordToSet->new (%options); my $result = $wsd->disambiguate (target => 'java', context => ['programming_language', 'applet']); foreach my $key (keys %$result) { print $key, ' : ', $result->{$key}, "\n"; } CONTENTS When the distribution is unpacked, several subdirectories are created: /lib This directory contains the Perl modules that do the actual work of disambiguation. By default, these files are isntalled into /usr/local/lib/site_perl/PERL_VERSION (where PERL_VERSION is the version of Perl you are using), or a similar directory. See the INSTALL file for more details. /bin This directory contains a script, wordtoset.pl, that lets you run the WSD software without writing your own Perl script. /doc This directory contains pod files for README, CHANGES, and INSTALL. These are what should be changed, the files found in the top level directory should be considered read-only. /t This directory contains test scripts. These scripts are run when you run 'make test'. SEE ALSO L AUTHORS Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu Jason Michelizzi This document last modified by : $Id: README.pod,v 1.5 2008/04/07 03:28:36 tpederse Exp $ COPYRIGHT AND LICENSE Copyright (c) 2004-2008, Ted Pedersen and Jason Michelizzi Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. Note: a copy of the GNU Free Documentation License is available on the web at and is included in this distribution as FDL.txt.