Apache Flink and Generating Query Suggestions on Hadoop

Hosted by Friso van Vollenhoven
Details

Probably the last meetup of this year for the HUG! This time, we're happy to announce talks about Apache Flink and an interesting Hadoop use case from Sanoma Media.

We thank Sanoma Media (http://www.sanoma.nl/) for their kind offer to host this meetup and to provide food and drinks. For an idea of what goes on there, see http://www.sanoma.nl/pagina/vacatures-en-stages/senior-developer-dynamic-content-platform/ or http://www.sanoma.nl/pagina/vacatures-en-stages/?jobs-type=&jobs-area=online.

Agenda

18.00: Arrive, drink, eat

18.45: Presentations

Generating query suggestions using query-flow graphs, Dirk Guijt, Graduate Intern @ Sanoma Media

Query suggestions can support search engine users in their search process by guiding them to the right queries. The supporting suggestions are especially important when not every query returns results, which is the case in the search engine of Startpagina.nl. We use historical query data stored in the Startpagina.nl query logs to create query-flow graphs that are used to generate meaningful query suggestions for the Startpagina.nl users. In this talk I will go into detail about how we transformed the logs into query-flow graphs using Hadoop as underlying technology and how we generate query suggestions by performing random walks on these graphs.
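As a rough illustration of the idea in this abstract, here is a minimal sketch in plain Python. It is not the speaker's implementation: the function names, the session format (a list of query strings per user session), the restart probability, and the use of a random walk with restart computed by power iteration are all assumptions made for the example. At Sanoma the graph construction runs on Hadoop over the full query logs; this sketch only shows the shape of the computation on a toy input.

```python
from collections import defaultdict


def build_query_flow_graph(sessions):
    """Turn sessions into a graph of transition probabilities.

    An edge a -> b means query b directly followed query a in a session.
    (Hypothetical helper; the session format is an assumption.)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:  # ignore immediate repeats of the same query
                counts[a][b] += 1

    graph = {}
    for a, followers in counts.items():
        total = sum(followers.values())
        graph[a] = {b: c / total for b, c in followers.items()}
    return graph


def suggest(graph, query, restart=0.15, steps=50, top_k=3):
    """Rank suggestions with a random walk with restart from `query`.

    The walk's distribution is computed by power iteration instead of
    sampling, so the result is deterministic.
    """
    scores = {query: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        nxt[query] += restart  # restart mass always returns to the start
        for node, mass in scores.items():
            out = graph.get(node)
            if not out:
                # Dead end: send the remaining mass back to the start query.
                nxt[query] += (1 - restart) * mass
                continue
            for target, p in out.items():
                nxt[target] += (1 - restart) * mass * p
        scores = dict(nxt)

    ranked = sorted(
        ((q, s) for q, s in scores.items() if q != query),
        key=lambda item: (-item[1], item[0]),
    )
    return [q for q, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy sessions, invented for this example.
    sessions = [
        ["cheap flights", "cheap flights amsterdam", "schiphol parking"],
        ["cheap flights", "cheap flights amsterdam"],
        ["cheap flights", "last minute"],
    ]
    graph = build_query_flow_graph(sessions)
    print(suggest(graph, "cheap flights"))
```

The restart step keeps the walk close to the original query, so queries many hops away score lower than direct reformulations.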

Introducing Apache Flink - Fast and Reliable Data Analytics in Clusters, Stephan Ewen, committer on Apache Flink, co-founder @ Data Artisans

The talk introduces the Apache Flink (incubating) project (http://flink.incubator.apache.org/), a project at the Apache Software Foundation that is compatible with the Hadoop ecosystem and can run on top of HDFS and YARN. Flink pushes the technology forward in several ways. The system is built on the principle "write like a programming language, execute like a database". Its execution engine aggressively uses in-memory execution but degrades gracefully to disk-based execution when memory runs short, which makes execution very robust. Flink also introduces native closed-loop iteration operators, which make graph analysis and machine learning applications very fast on the platform.

Flink programs are not executed directly but are first optimized by Flink's cost-based optimizer. This means that Flink applications require little (re-)configuration and little maintenance when the cluster characteristics change and the data evolves over time. Finally, Flink's runtime is a true data streaming engine, and ongoing work in the community is unifying batch and true stream processing (rather than mini batches) in a single system. Flink is an active open source project with more than 60 contributors from industry and academia.

20.15: Some more drinks, socialize

21.00: Everybody out!

Netherlands Hadoop User Group
Sanoma Media Netherlands B.V.
Capellalaan 65 · Hoofddorp