Dean Collins wrote:
> My cousin has just completed his doctorate in Australia with a thesis
> on "natural language comment sentiment", so he could possibly help you on
> a consulting basis.
> This is sentiment analysis. There are several academic solutions to
> deriving sentiment from text. A search on "computational linguistics
> sentiment" will bring up a lot of information that may help direct you
> on how people have thought through identifying the feeling of
> free-text. RapidMiner is an open source data mining tool that may
> help. I have not used it but others have recommended it as useful for
> this. If the comments are basic, teenage stuff you can pretty easily
> create a score based on the ratio of negative to positive words. Build
> out a glossary of good/bad words or do a dictionary lookup. Slang is
> important to track.
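
For reference, the word-ratio scoring suggested above might look something
like the following in Python. This is only a minimal sketch: the POSITIVE and
NEGATIVE glossaries and the sentiment_score name are made up for illustration,
and I've used a normalised difference rather than a raw ratio so that comments
with no negative words don't divide by zero.

```python
import re

# Hypothetical glossaries; a real list would be far longer and include slang.
POSITIVE = {"good", "great", "love", "awesome", "cool"}
NEGATIVE = {"bad", "hate", "awful", "lame", "sucks"}


def sentiment_score(comment):
    """Return (pos - neg) / (pos + neg), a value in [-1, 1].

    Returns 0.0 when the comment contains no glossary words at all.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

Note that it only knows the words in its glossaries, so anything sarcastic,
misspelled, or not in English scores as neutral.
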
But in all reality, I don't think I'm looking for a sentiment/semantic
analysis approach... the technology (and research!) here is too limited
to be of use, imo, and the "misses" are not as easy to fix as in a
community-based solution.
The misses are also really ugly, imo... unlike a Google search, where you
know that the algorithm may be "gamed", a system like this that gets
"gamed" just looks amateurish and ineffectual -- no matter how good the
underlying technology is, it doesn't play well to the end user.
Bayesian filtering is a great idea for catching spam before it hits
the database, or even after, since spammers are generally really bad at
changing up their posts.
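
To make the spam idea concrete, here is a rough sketch of a naive Bayes
filter in plain Python. It is not any particular library's API: the
NaiveBayesSpamFilter class, the "spam"/"ham" labels, and the tokenizer are all
assumptions for illustration, and a real deployment would need far more
training data than the handful of posts a toy example uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled post; label must be "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        # Smoothed log prior, so an untrained class never yields log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        # Smoothed log likelihood of each word given the class.
        total_words = sum(self.word_counts[label].values())
        for w in words:
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text):
        """Return True when the spam class scores higher than ham."""
        words = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        spam_score = self._log_score(words, "spam", len(vocab))
        ham_score = self._log_score(words, "ham", len(vocab))
        return spam_score > ham_score
```

Because spammers tend to repeat themselves, even a small set of flagged posts
gives the spam class distinctive word counts to work with.
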
I think that computers simply aren't as good as people at doing
real-time sentiment analysis, although Google may be able to do something
with it, given all of the data that they have access to.
I'm looking for a technology solution that harnesses the inherent
intelligence of the users of the site, but in a manner that relies
solely on the users' interactions, not on the content of the
submissions, especially since (and here's a wrinkle I likely should have
mentioned earlier) -- some of these comments may not be in English (the
plan is to be international)!
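
As a rough illustration of what I mean by relying on interactions only, here
is one possible shape for it: hide a comment once enough distinct users have
flagged it. I haven't settled on a mechanism, so the FlagModerator class, its
method names, and the threshold of 3 are purely hypothetical. The point is
that nothing in it ever reads the comment text, so the language of the comment
doesn't matter.

```python
from collections import defaultdict


class FlagModerator:
    """Hide comments based on who flagged them, never on what they say."""

    def __init__(self, hide_threshold=3):
        # Number of distinct flagging users needed to hide a comment
        # (an arbitrary example value, not a recommendation).
        self.hide_threshold = hide_threshold
        # comment_id -> set of user_ids that have flagged it
        self.flags = defaultdict(set)

    def flag(self, comment_id, user_id):
        """Record a flag; repeat flags from the same user count once."""
        self.flags[comment_id].add(user_id)

    def is_hidden(self, comment_id):
        """Return True once enough distinct users have flagged the comment."""
        return len(self.flags.get(comment_id, ())) >= self.hide_threshold
```

A "miss" here is also easy to fix: a moderator or the community can simply
clear the flags on a comment.
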
Congrats to your cousin, though... Google is surely a nice place to be
to explore this type of thing.