What we're about

This meetup aims to develop best practices for data mining and practical analytics that get results. Our focus will be on the practical areas around:

• understanding analytics and "big data"

• ingesting, formatting, validating, and exploring data

• designing behavior-changing metrics

• developing and designing appropriate data architectures

• proven strategies, technologies, stacks, and pipelines

• when to build, when to buy, what to buy, why buy

We have no affiliations with data product / platform vendors. If you would like to contribute to our budding Meetup, please reach out!

Areas: data mining, analytics, big data, machine learning, data science, business intelligence, olap, oltp, recommendation systems, predictive analytics, design patterns, message queues, actor pattern, publisher subscriber pattern, hash joins, leading indicators, classification, batch processing

Technologies: hadoop, hdfs, hive, pig, spark, sql, kmine, r, scikit-learn, drill, prestodb, flume, protodata, nosql, druid, rethinkdb, kafka

Upcoming events (5)

Deep Pipelines: Will we ever escape this rabbit hole?
Needs a date and time

Needs a location

Feeling down about all the plumbing it takes to get to the data exploration drinking fountain from your data lake? Let's get to know each other and talk about our battles. If you've got a fantastic stack, bring your learnings!

SQL: Is the future of Big Data Declarative?
Needs a date and time

Needs a location

Ever get the feeling that we'll never move away from our Structured Query Language buddy? Is that a bad thing? Should we evaluate technologies touting SQL-over-Hadoop?

Are you crazy about Spark SQL? Have you tried PrestoDB? Is Impala your son's middle name? Did you know Ted Dunning created Apache Drill? Let's get together to discuss what we've learned -- what works, what's suspect, and where we're headed.

Cloudera vs Hortonworks vs MapR
Needs a date and time

Needs a location

What are the benefits of going with each vendor? What are the drawbacks? Did you go with Cloudera because of their support? Or Hortonworks because of better Microsoft integration? Or MapR strictly for performance?

Are these just myths, or do you have first-hand knowledge? Let's get together and discuss.

KMINE 101: GUI-based analytics and a whole lot more
Needs a date and time

Needs a location

Do you use KMINE, or have you used it in the past? KMINE is a GUI-based tool for designing and executing data mining and machine learning data flows. If you haven't had a chance to try it, it's a great way to get your feet wet without diving into the deep end.

If there is enough interest, I'd like to put together a workshop highlighting the major areas of KMINE, running through a practical data flow, and exploring some of its ML features.

If you are a KMINE user and enjoy presenting, please reach out to me to discuss running this session!