Natural Language Translation and Apache Spark Test Driven Development

Shopee Singapore

5 Science Park Dr · Singapore

How to find us

We'll distribute a building security form for you to fill in. Then head to Shopee at #03-01 (Canteen Area).


Spring is when sparks fly! Come over to Shopee's office, where we'll go through two really cool topics presented by Apache Spark and Deep Learning practitioners:

- "Natural Language Translation at Shopee’s Data Science" - this project, built on TensorFlow, PyTorch and OpenNMT, supports Shopee's cross-border business with automated translation, saving the costs that third-party vendors would otherwise incur. The talk touches on some state-of-the-art Deep Learning models from current Machine Translation research. Learn how the Shopee data science team implemented it!

- "Data Integration using Guzzle and Test Automation" - the resilient, enterprise-grade, YAML-driven ETL framework that Just Analytics is known for wouldn't exist without Cucumber, daily builds and regression testing. Batch, Streaming, Near Real-Time. Jenkins, Docker, Apache Atlas, Kafka - everything is rigorously scrutinised to ensure that changes are introduced without any disruption to the customer's existing performance and resilience SLAs.


Speaker 1: Shao Hongxin (Shopee)
Hongxin is a Data Scientist at Shopee and a part-time PhD student at Nanyang Technological University. His research interests are sequential data and natural language processing.

Speaker 2: Umesh Kakkad (Just Analytics)
Umesh is Co-founder of Just Analytics (JA), a specialized IT consulting firm focusing on the data and analytics space. He has over 16 years of experience in big data, data warehousing and analytics, spanning a wide range of industry domains. As the Delivery and R&D Head at JA, he manages a team of over 40 consultants in Singapore and the region, and oversees the delivery of data lake, data warehouse, BI and analytics projects. He also leads the design and build of JA’s flagship product Guzzle, a data integration workbench that simplifies building, managing, orchestrating and monitoring data engineering jobs and uses Apache Spark as its runtime.