“Nobody can go beyond 70% accuracy.” “Our tool reaches 90% accuracy.”
The statements above are meaningful and meaningless at the same time.
Meaningful because they make it clear that there is an issue here, and that 100% accuracy is beyond our wildest dreams. Meaningless because they don’t provide details on how accuracy is measured and, most importantly, they don’t specify the task on which that accuracy is achieved: document-level sentiment (by the way, is that useful for business purposes?), entity-level sentiment… In other words, when sentiment is assigned to a piece of text, do we know which brand (Microsoft, Apple…) or which topic (operating system, price…) it is assigned to? So maybe the question is rather: accuracy on what?
Evaluation of accuracy is a scientific task that should be performed with open methods and metrics. This raises questions like: what’s the difference between accuracy, precision and recall? How do we measure inter-tagger agreement? Are there texts that are genuinely ambiguous even for humans?
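To make these questions concrete, here is a minimal sketch of how the metrics differ. The labels and data are invented for illustration, and Cohen's kappa is used as one common way to measure inter-tagger agreement; the talk itself may rely on other measures.

```python
from collections import Counter


def accuracy(gold, pred):
    """Share of items where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def precision_recall(gold, pred, label):
    """Precision and recall for one label (e.g. 'neg')."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohen_kappa(tagger_a, tagger_b):
    """Agreement between two human taggers, corrected for chance."""
    n = len(tagger_a)
    observed = sum(a == b for a, b in zip(tagger_a, tagger_b)) / n
    count_a, count_b = Counter(tagger_a), Counter(tagger_b)
    labels = set(tagger_a) | set(tagger_b)
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical corpus: mostly neutral texts, two negative ones.
gold = ["neu"] * 8 + ["neg"] * 2
# A tool that labels everything as neutral.
pred = ["neu"] * 10

acc = accuracy(gold, pred)
neg_precision, neg_recall = precision_recall(gold, pred, "neg")

# Two hypothetical human taggers labelling the same four texts.
kappa = cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
```

In this toy corpus the tool reaches 80% accuracy while finding none of the negative texts, which is why a single accuracy figure says little on its own.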
In our view, there is one factor that plays a major role here: business rules, i.e. the way a company sees its space. If I say “ACME just launched a new release of its explosive tennis balls”, is that a positive statement (new release) or just a neutral fact that shouldn’t distract marketers? Being able to implement these peculiarities (business rules) efficiently is key to achieving high accuracy in a way that is meaningful for the end user of the information.
We will show why linguistic approaches to sentiment analysis (symbolic approaches, as opposed to machine learning approaches) are better suited to this challenge of integrating business rules. We will also use real corpora for sentiment evaluation and study their peculiarities.
We will make reference to Seth Grimes’s article “Never Trust Sentiment Accuracy Claims”, a common reference for the industry.
This event will be of interest to users of Sentiment Analysis and Text Analytics Technology in these sectors:
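As a rough illustration of what a business rule could look like, the sketch below layers a company-specific rule on top of a toy generic classifier. The lexicon, the rule pattern and the function names are all hypothetical; this is not Bitext's implementation, only one simple way to picture the idea.

```python
import re

# Toy generic lexicon (an assumption made purely for this example).
POSITIVE_WORDS = {"new", "great", "love"}
NEGATIVE_WORDS = {"broken", "terrible", "hate"}


def generic_sentiment(text):
    """Very naive lexicon-based polarity, with no knowledge of the business."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical business rule: product launch announcements are plain facts.
BUSINESS_RULES = [
    (r"\blaunch(?:ed|es)?\b.*\bnew release\b", "neutral"),
]


def sentiment_with_rules(text, rules=BUSINESS_RULES):
    """Apply business rules first; fall back to the generic classifier."""
    for pattern, label in rules:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return label
    return generic_sentiment(text)


text = "ACME just launched a new release of its explosive tennis balls"
without_rules = generic_sentiment(text)
with_rules = sentiment_with_rules(text)
```

Here the generic classifier reads “new” as positive, while the rule-aware version treats the sentence as a neutral announcement. Which of the two counts as “accurate” depends on the rule the company chooses.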
Social CRM: because customer sentiment in social media is key
Business Intelligence: because their new challenge is integrating unstructured data
Contact Center: because Social Media is becoming the channel of choice for many customers
Big Data: because most of the data in “big data” is text
Date: Wednesday October 2nd, 2013
Place: WeWork (room to be announced)
Address: 156 2nd Street, San Francisco, CA 94104
Presentation by: Antonio Valderrabanos, CEO & Founder, Bitext
Feel free to join via the usual Meetup procedures, or by sending an email to Vicky Ortiz at [masked]