• Social Media Analytics & Causal Inference with the Jacksonville Jaguars

    PLEASE NOTE: Due to renovation at the NLP Logix offices, this Meetup will be held just down the block at Keiser University!

    This month, we're super excited to welcome Victor Li of the Jacksonville Jaguars! In this talk, we will take a look at how the Jacksonville Jaguars use data to better understand, grow, and engage with our fanbase through social media. We will go beyond the "likes" and "comments" and share how new metrics are constructed using a variety of statistical methods and machine learning techniques to evaluate and optimize social media strategy. We will also cover football-specific social media trends and how measuring social media performance more accurately helps our corporate sponsors maximize their partnership with the Jaguars.

    Some questions we will address:

    - How should we value different types of engagements (comments, clicks, shares)?
    - When are the best times to post content?
    - Do certain types of content perform better on one platform vs. another?
    - When should we air our live shows to maximize viewers?
    - How big of a boost do we get on social media after a win?

    Victor C. Li grew up in sunny Los Angeles among traffic and taco trucks, raised by a computer scientist and a statistician - shocking the world when he became a data scientist. He froze (yet thrived) in Providence for his undergraduate years at Brown University, where he studied statistics and was involved with sports analytics, opioid research, orchestral music, poker, and a brief stint in catering. After graduating, Victor moved to Jacksonville to work as an Advanced Analytics Developer for the Jaguars and after two years has almost gotten used to the humidity.

    As usual, all are welcome and there will be beer, pizza, and awesome networking.

    Tentative schedule:
    5:30-6:00 Socializing and refreshments
    6:00-7:00 Social Media Analytics & Causal Inference with the Jacksonville Jaguars
    7:00-7:30 Closing remarks and questions

  • Splice Machine and Hadoop with Clearsense

    NLP Logix

    This month, we're excited to have David Quickstad and Matt Calderaro from Clearsense talking about Splice Machine and Hadoop. Clearsense is a scalable data platform for healthcare that enables real-time insights into clinical, operational, and financial metrics.

    The talk will open with a high-level discussion of how Splice Machine works, followed by a look at the flavors of Splice Machine: the free dockerized version and two cloud versions (AWS and Clearsense). It will then cover how Clearsense is implementing Splice Machine with Hadoop, with a few examples of data ingestion, SQL queries, and ML.

    David Quickstad is the VP of Product Development and Delivery at Clearsense. Before that, David worked at Availity and the PGA Tour. David is a graduate of the United States Military Academy at West Point with a Bachelor's in Computer Science.

    Matt Calderaro is the Senior Performance Solutions Architect. Previously he worked as an Enterprise Architect at Bank of America. Matt is a graduate of Keiser University.

  • Building Data Pipelines for Machine Learning

    NLP Logix Building

    This month, we are excited to have Man Zhang, Solutions Architect at Qubole, speaking!

    Qubole delivers a Self-Service Platform for Big Data Analytics built on Amazon Web Services, Microsoft and Google Clouds. Qubole was started by the team that built and ran Facebook's Data Service when they founded and authored Apache Hive. With Qubole, a data scientist can now spin up hundreds of clusters on their public cloud of choice, begin creating ad hoc and/or batch queries in under five minutes, and have the system autoscale to the optimal compute levels as needed.

    Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges faced by data scientists is the need to build pipelines that give access to large data sets, so that trained models can scale to run with production data. Aside from dealing with larger data volumes, these pipelines need to be flexible in order to accommodate the variety of data and the high processing velocity required by the new ML applications. Apache Airflow and Spark address these challenges by providing a highly scalable technology for autoscaling big data engines.

    In this presentation we will cover:

    - Some of the typical challenges faced by data scientists when building pipelines for machine learning.
    - Typical uses of the various big data engines to address these challenges.
    - A real-world example using Apache Spark and Airflow to operationalize a recommendation engine.

    As always, we'll have a fun group with Pizza, Beer, and Refreshments!

    Tentative schedule:
    5:30 - 6:00 Socializing
    6:00 - 7:00 Building Data Pipelines for Machine Learning
    7:00 - 7:30 Questions and Closing Remarks

  • Employing Cloud Networking principles for Big Data Applications

    This month, we're excited to present Nolan Lee, Sr. Systems Engineer at Arista Networks, who will be talking about Employing Cloud Networking principles for Big Data Applications.

    Arista Networks will share a modern, cloud-principled approach to big data networking. Industry-standard software architectures, combined with value-added hardware, enable outstanding performance, stability, and superior operational efficiency for companies, and strengthen their ability to extract maximum business value.

    Arista Networks overview: Arista Networks is a leader in building scalable, high-performance, and ultra-low-latency cloud networks with low power consumption and a small footprint for modern datacenters, service providers, and campus environments. Arista pioneered and perfected software-defined networking and delivers true cloud-scale operating efficiency, performance, and scale, which has made it the networking solution of choice for 7 of the 8 largest cloud hyperscalers in the world, including Microsoft Azure and Google Cloud, along with over 5000 companies worldwide. Black Knight, FNF, and the PGA Tour are some local Jacksonville Arista customers.

    Arista has designed the network infrastructures for, and implemented, some of the largest and most mission-critical Hadoop clusters around the world, with applications including:

    - Ad Serving and Targeting
    - National Intelligence
    - Network Security and Pattern Matching
    - Pharmaceutical Research
    - Retail Merchandising
    - Web Analytics
    - Data Mining

    Nolan Lee is a Systems Engineer at Arista Networks, specializing in Software-Defined Networking and Network Virtualization. He has an MS in Telecommunications from George Mason University and a BS in Electrical Engineering from the University of Florida.

    As always, everyone is welcome to attend, and there will be beer, pizza, and great socializing!

    Tentative schedule:
    5:30-6:00 Socializing, Beer, and Pizza
    6:00-7:00 Employing Cloud Networking principles for Big Data Applications
    7:00-7:30 Closing remarks and questions

  • AdTheorent Double Header!

    NLP Logix Building

    This month we have an AdTheorent two-fer!

    "Career Jumping onto the Big Data Train"
    Chad Gardner will take us through his journey from a life in education and a career as an AP Physics teacher into the world of Big Data with a company specializing in AI-driven marketing solutions. After studying data analysis and machine learning techniques online, Chad was offered his first tech role on the Data Warehouse Team at AdTheorent. Chad currently lives in Historic Springfield in downtown Jacksonville with his wife Agata, son Ira, and daughter Cedine.

    "AdTheorent: Harnessing the power of big data for advertising using machine learning"
    Following Chad, Andrew Anderson, Chief Technology Officer for AdTheorent, will discuss how he and his team harness the power of big data for advertising using machine learning. Andrew leads development of the RTB Platform, Data Warehouse, Cross Environment Map, and user-facing applications for the six-year-old New York-based company. Andrew brings over 20 years of experience building high-performing technology teams that deliver innovative technology solutions to challenging business problems. Prior to joining AdTheorent, Andrew held various roles at Citi, where he served as technology lead for the implementation of their global e-learning and e-recruitment platforms. In addition, Andrew managed several teams responsible for developing multiple HR applications to support the North American O&T organization of Citi.

    As always, we will have an awesome crowd, great refreshments, and informative speakers. Everyone is welcome to join!

    Tentative schedule:
    [masked] Socializing, Beer, and Pizza
    [masked] AdTheorent Double Header - Chad Gardner and Andrew Anderson
    [masked] Questions and Closing Remarks

  • Self-Tuning Data Systems

    NLP Logix Building

    Overview: While we have been doing analysis of data forever, the problem of consuming data at scale (volume, veracity, velocity, and variety) continues to grow daily. According to Google, every 2 days we create as much data as we did from the dawn of humanity to 2003. There’s a good chance that we already have the data for the next big breakthrough… we just have to be able to extract the knowledge.

    The Harvard Data Systems Lab conducts ongoing research in designing, tuning, and using data systems. We’ll talk a bit about what the lab is working on, my journey through grad school, the Harvard program and what it’s like to be in it, and my area of research.

    Data systems take on many forms and necessarily become more and more complex as breakthroughs occur. Think of it like this: the more complex the system, the more knobs to tune. The gap in the expertise required to tune data systems is becoming untenable: that expertise is growing scarcer while the systems grow increasingly complex.

    I began my research in the database kernel, optimizing SQL system joins. We’ll briefly talk about how a system performs a join at the low level to illustrate the problem. We’ll then move more deeply into the problems that scans and indexes present. While joins are relevant to some systems, every system (SQL, NoSQL, Spark, Kafka, etc.) has indexes and scans. The margin of advantage between using an index and just scanning the entire set is becoming a much more interesting and relevant problem.

    While my thesis focuses on methods of using AI to get there, the real journey is in the discoveries along the way and the decisions you have to make as a researcher when you gain new knowledge.

    Bio: Angelo Kastroulis is a consultant and entrepreneur who focuses on Health IT, AWS cloud computing, Big Data (Spark, SOLR, Kafka, and Cassandra), and Data Science (Machine Learning and Neural Networks). He’s helped companies like Disney, Walmart, Optum Health, and McKesson solve some tough problems. As a member of Harvard’s Data Systems Laboratory, his area of focus is self-tuning data systems.

    As always, we'll have a great group of people, pizza, and beer.

    Tentative Schedule:
    5:30-6:00 Refreshments and Socializing
    6:00-7:00 Self-Tuning Data Systems with Angelo Kastroulis
    7:00-7:30 Closing Remarks and Questions

  • Defining The Industrial IoT

    NLP Logix Building

    Whether you are in an IT or OT role, everyone on your corporate ladder is trying to answer the following questions and clear the fog and haziness around the ‘Internet of Things’, or IoT. Is it possible for academia to collide with real-world industrial applications to solve mission-critical problems? What’s the business value for you and your enterprise in formulating IoT roadmaps? What tools are being used today to bridge the gap from IoT ideas and concepts to problem-solving solutions?

    Aldo Ferrante will discuss why and how Big Data & ML technologies play a pivotal role in the IIoT ecosystem for deploying operational and economic improvements rapidly and at scale.

    Additional topics that fall under the umbrella of the ‘Industrial Internet of Things’:

    · What it means to push ML intelligence down to the ‘Edge’, whether it be a GPU device or a locomotive transporting coal.
      o Why industry is heading in this direction as opposed to cloud computing and analysis.
    · Is it possible to automate ML in order to lower the barrier to entry for everyday boots on the ground, or data scientists, wanting to derive insight from their current datasets?
    · A discussion of the two most talked-about applications, predictive maintenance and process optimization, possibly with use cases.

    Speaker: Aldo Ferrante is the president and CEO of ITG Technologies, overseeing business development and operations while leading a management team and engineering staff. With more than 30 years of experience in the automation and information technology industries, he’s achieved great success in providing turn-key solutions to clients in the technology sector, including the manufacturing, transportation, and municipality industries. Beyond his software engineering and automation controls background, he has extensive knowledge of Machine Learning, Predictive Analytics, and IoT technologies.

    As usual, everyone is welcome, and we will have beer, pizza, and great people!

    Tentative Schedule:
    5:30-6:00 Networking and Pizza
    6:00-7:00 Defining The Industrial IoT
    7:00-7:30 Questions and Final Remarks

    Can't wait to see everyone there!

  • Building modern data lakes with Minio, Hadoop, Spark & Unified Data Architecture

    This month, we're so excited for the return of Ravi Nair. Ravi will be teaching us about building a modern data lake using Minio, Hadoop, Spark, and Unified Data Architecture.

    The explosion of data is causing people to rethink their long-term storage strategies. Most agree that distributed systems, one way or another, will be involved. When migrating big data workloads to the cloud, one of the most commonly asked questions is how to evaluate HDFS versus the storage systems provided by cloud providers, such as Amazon’s S3, Microsoft’s Azure Blob Storage, and Google’s Cloud Storage.

    In this talk, Ravi will share his thoughts on why cloud storage is the optimal choice for data storage. He will use open source Minio with S3 as an example, but the conclusions generalize to other cloud platforms. The talk compares S3 and HDFS along the following dimensions:

    - Cost
    - Elasticity
    - SLA (availability and durability)
    - Performance per dollar
    - Transactional writes and data integrity

    We will then see how the complete ecosystem of Hadoop, Hive, Spark, and Unified Data Architecture can work seamlessly with object storage.

    Ravi Nair, a seasoned speaker at Jax Big Data, will give us insight into what the data lakes of the future are going to look like.

    As always, all are welcome to attend. Thanks to CyberSURE for sponsoring this month's meetup! There will be beer, pizza, and great company.

    Tentative Schedule:
    5:30-6:00 Socializing, Beer, and Pizza
    6:00-7:00 Building modern data lakes with Minio, Hadoop, Spark & Unified Data Architecture
    7:00-7:30 Questions and Closing Remarks

  • Data Science for Social Good

    NLP Logix Building

    This month, we're extremely excited to have the Northeast Florida Data Science for Social Good 2018 Cohort. The presentation last year was so great that we couldn't miss the chance to celebrate this year's interns!

    The Florida Data Science for Social Good (FL-DSSG) program, hosted at the University of North Florida (UNF), blends data science and technology design to inform and solve important social problems in Northeast Florida. The FL-DSSG program is an intensive 12-week internship that invites students to tackle data-rich projects with the potential for substantial social impact. The 2018 internship program ran from June 4th to August 24th and supported eight interns and four community partners. Each project addressed a wicked problem facing Jacksonville residents, such as public health, child welfare, generational poverty, and at-risk youth.

    At the Big Data JAX meetup, FL-DSSG program directors and interns will discuss findings and reveal insights gained from the Baptist Health, Family Support Services, Girls Inc. of Jacksonville, and Performers Academy projects.

    The 2018 FL-DSSG Internship program was funded by the Nonprofit Center for Northeast Florida and the University of North Florida. You can find more information about the 2018 DSSG program at http://dssg.unf.edu/2018program.html.

    Tentative Schedule:
    5:30 Socializing, Beer, and Pizza!
    6:00 Data Science for Social Good
    7:00 Questions and Concluding Remarks

  • 3rd Annual Jax Summer Social - Collision of Tech, Business, Art, and Culture

    Live DJ, Interactive Virtual Reality Gaming, Great Food, Drinks, Amazing People, and some Free Giveaways, all in a killer beachside venue all to ourselves!

    You are invited once again to Jacksonville’s largest networking event for technologists, artists, entrepreneurs, and enthusiasts at the 3rd annual Summer Social event in Jax Beach. This year we are taking over Surfer the Bar for 4 hours of fun, networking, drinks, food, and special surprises with live music, some free drink/food giveaways, and great people. Meet local leaders and peers, and celebrate the community that makes Jax such an amazing place to live.

    Participating groups in this year’s summer social event include:
    ** JaxTech, sponsored by SourceFuse (www.sourcefuse.com)
    ** #StartupJax, sponsored by Community First (www.communityfirstfl.org/)
    ** BigDataJax, sponsored by NLP Logix (www.nlplogix.com)
    ** Tech on Tap, sponsored by Robert Half Technology (www.roberthalf.com)
    ** JITC and Tech Coast Conference, sponsored by Assessment Technologies Group (www.assessment-tech.com)

    Community sponsors include:
    ** ADVOS Legal (www.advoslegal.com)
    ** VOID Magazine (www.voidlive.com)
    ** Edible NorthEast Florida Magazine (http://ediblenortheastflorida.ediblecommunities.com)
    ** Art Republic (www.artrepublicglobal.com/)

    THE PARTY - Live Music, DJ, and Virtual Reality Experience
    We are taking over Surfer the Bar for the evening from 5-9pm, including:
    ** Downstairs: Interactive digital display and music
    ** Full Virtual Reality experience provided by VRPonteVedra.com!
    ** Upstairs: Live DJ and outdoor porch
    ** Food & Drink: Free taco (first 150 people), full food truck and cash bar, and some special surprise free drinks

    This is a FREE event, but you must RSVP with one of the participating groups to guarantee your spot and claim your freebies.
