PyData @ Citi

Dec 15, 2021 · Tel Aviv-Yafo, Israel

We would like to thank the Citi Innovation Lab for hosting us.
Agenda
18:00-18:30 Gathering
18:30-18:45 A word from our sponsor
18:45-19:15 Different Approaches for Document Augmentation (Dr. Shelly Aviv, Senior & Eylon Gueta, Citi)
19:15-19:30 Break
19:30-20:00 Generating Synthetic Data at Scale with the Help of Modern Execution Technologies (Or Sher / Datagen)
20:00-20:30 Semantic column matching (Ran Dan / Argmax)
========================
Different Approaches for Document Augmentation
-----------------------------------
In the last few years, deep learning models and architectures have evolved rapidly, resulting in ongoing improvements in performance across NLP tasks. However, no matter how advanced cutting-edge models become, one of the major bottlenecks in their everyday use is the amount of annotated data available for training them. Although various data augmentation methods have been applied successfully in image processing, data augmentation for NLP is still maturing. In this talk we will present different approaches for tackling the problem of limited dataset size, using data augmentation and synthetic data generation. Text documents may contain textual data in several different formats. Our methodologies apply different kinds of augmentation, based on the input ontology and its positional coordinates in the document.
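As a rough illustration of ontology-driven augmentation (not the speakers' method), the sketch below swaps each annotated entity for another value of the same ontology type and shifts the character offsets of the labels so the annotations stay aligned with the new text. The `(start, end, label)` span format, the `ontology` dictionary and the function name `substitute_entities` are assumptions made for this example; it works on plain-text offsets only, not on the positional coordinates of a rendered document.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical annotation format: (start, end, label) character offsets into the text.
Span = Tuple[int, int, str]


def substitute_entities(
    text: str,
    spans: List[Span],
    ontology: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[str, List[Span]]:
    """Replace each labeled span with another value of the same ontology type.

    Assumes spans do not overlap. Offsets of later spans are shifted by the
    accumulated length change so the returned annotations match the new text.
    """
    parts: List[str] = []
    new_spans: List[Span] = []
    cursor = 0  # position in the original text
    shift = 0   # total length change introduced so far
    for start, end, label in sorted(spans):
        parts.append(text[cursor:start])
        original = text[start:end]
        candidates = [v for v in ontology.get(label, []) if v != original]
        replacement = rng.choice(candidates) if candidates else original
        parts.append(replacement)
        new_start = start + shift
        new_spans.append((new_start, new_start + len(replacement), label))
        shift += len(replacement) - (end - start)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), new_spans


# Toy usage with an invented ontology and document.
ontology = {"ORG": ["Acme Corp", "Globex"], "DATE": ["2020-01-01", "March 3, 2019"]}
text = "Invoice from Initech dated 2021-12-15."
spans = [(13, 20, "ORG"), (27, 37, "DATE")]
new_text, new_spans = substitute_entities(text, spans, ontology, random.Random(0))
print(new_text, new_spans)
```

Each call produces a new labeled example for free, which is the appeal of this family of augmentations when annotation is the bottleneck.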

========================
Generating Synthetic Data at Scale with the Help of Modern Execution Technologies
Speaker: Or Sher, Infrastructure Team Lead, Datagen
-------------------------
Datagen started out creating synthetic images with on-premise consumer GPU machines, which did not provide the flexibility and scalability required for larger-scale operations.
We needed a scalable system that supports both large-scale generation of 3D environments, a CPU-intensive process, and rendering images from within those 3D environments, a GPU-intensive process.
This presentation will share our journey of building an internal, K8s-based, cloud-agnostic system that lets us provision and use thousands of GPU and CPU resources exactly when, and only when, we need them.
We will cover aspects of reliability, performance, efficiency, cost optimization, and also:

What is synthetic data
The challenges of generating simulated data at scale serving many customers.
Architecture and coding challenges
Move fast and keep code clean
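One common way to get resources exactly when they are needed is to size each node pool from its pending work and let it scale to zero when idle. The sketch below is a hypothetical Python version of that rule, with separate policies for a CPU-bound scene-generation pool and a GPU-bound rendering pool; `PoolPolicy`, `desired_nodes` and all the numbers are illustrative assumptions, not Datagen's system. In a real Kubernetes cluster this decision is typically delegated to an autoscaling component such as the Cluster Autoscaler rather than written by hand.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolPolicy:
    """Hypothetical scaling policy for one node pool (e.g. CPU or GPU)."""
    tasks_per_node: int   # queued tasks one node can process concurrently
    min_nodes: int = 0    # 0 allows the pool to shut down completely when idle
    max_nodes: int = 100  # upper bound to cap cost

    def __post_init__(self) -> None:
        if self.tasks_per_node <= 0:
            raise ValueError("tasks_per_node must be positive")


def desired_nodes(pending_tasks: int, policy: PoolPolicy) -> int:
    """Nodes needed to cover the queued work, clamped to the policy bounds."""
    if pending_tasks <= 0:
        return policy.min_nodes
    needed = math.ceil(pending_tasks / policy.tasks_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


# Illustrative pools: cheap CPU nodes for scene generation, scarce GPU nodes for rendering.
cpu_pool = PoolPolicy(tasks_per_node=8, max_nodes=50)
gpu_pool = PoolPolicy(tasks_per_node=2, max_nodes=20)

print(desired_nodes(30, cpu_pool))   # 4
print(desired_nodes(30, gpu_pool))   # 15
print(desired_nodes(100, gpu_pool))  # 20 (capped by max_nodes)
print(desired_nodes(0, gpu_pool))    # 0 (scaled to zero)
```

Keeping the two pools separate means a burst of rendering work adds GPU nodes without also paying for idle CPU capacity, and vice versa.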

========================
Semantic column matching
-----------------------------------
In the data age, we are swamped by data sources with different naming conventions and query styles.
In this talk we will go over a solution we developed for a client to match column names and schemas across multiple data sources.
We will demonstrate how word2vec and dynamic programming help us with semantic matching.
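The abstract does not say how word2vec and dynamic programming are combined, so the following is only one plausible reading, sketched under stated assumptions: each column name is split into tokens, embedded as the average of its word vectors, and the two schemas' column lists are then aligned in order with a Needleman-Wunsch-style dynamic program that maximizes total cosine similarity. The tiny `vectors` dictionary stands in for a trained word2vec model, and the helper names, `gap` penalty and `min_sim` threshold are hypothetical.

```python
import math
import re
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def tokenize(column: str) -> List[str]:
    """Split snake_case, kebab-case and camelCase column names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def embed(column: str, vectors: Dict[str, Vector]) -> Optional[List[float]]:
    """Average the word vectors of known tokens; None if no token is in the vocabulary."""
    known = [vectors[t] for t in tokenize(column) if t in vectors]
    if not known:
        return None
    return [sum(v[k] for v in known) / len(known) for k in range(len(known[0]))]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align_columns(left: List[str], right: List[str], vectors: Dict[str, Vector],
                  gap: float = -0.2, min_sim: float = 0.5) -> List[Tuple[str, str]]:
    """score[i][j] = best total similarity aligning left[:i] with right[:j]."""
    el = [embed(c, vectors) for c in left]
    er = [embed(c, vectors) for c in right]

    def sim(i: int, j: int) -> float:
        if el[i] is None or er[j] is None:
            return 0.0  # unknown words: no semantic evidence either way
        return cosine(el[i], er[j])

    n, m = len(left), len(right)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(i - 1, j - 1),  # match left[i-1] with right[j-1]
                score[i - 1][j] + gap,                    # skip left[i-1]
                score[i][j - 1] + gap,                    # skip right[j-1]
            )

    # Trace back from the bottom-right cell, keeping only confident matches.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = sim(i - 1, j - 1)
        if score[i][j] == score[i - 1][j - 1] + s:
            if s >= min_sim:
                pairs.append((left[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy vectors standing in for a trained word2vec model.
vectors = {
    "customer": [1.0, 0.0, 0.0], "client": [0.9, 0.1, 0.0],
    "id": [0.0, 1.0, 0.0], "identifier": [0.1, 0.9, 0.0],
    "date": [0.0, 0.0, 1.0], "created": [0.0, 0.2, 0.9],
}
print(align_columns(["customer_id", "created_date"], ["clientIdentifier", "date"], vectors))
# → [('customer_id', 'clientIdentifier'), ('created_date', 'date')]
```

The order-preserving alignment is itself an assumption; when columns can appear in any order, an assignment method such as the Hungarian algorithm over the same similarity matrix would be the more natural fit.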

Event organizers
  • PyData Tel Aviv

    PyData brings together data scientists and developers to share ideas and learn from each other. The goals are to provide data science enthusiasts, across various domains, a place to discuss how best to apply languages and tools to the challenges of data management, processing, analytics, and visualization. PyData Tel Aviv will start with a series of meetups and continue with a PyData conference. Each meetup will include a few lectures from industry experts in various practical data science fields.
