Processing hierarchical tables with Spark, by José Luis Sánchez from Zurich

Apr 26, 2018 · Barcelona, Spain

Hello Sparklers!
We have organized another great event for you this April!
This time we will dive into the insurance world, thanks to our next host, Zurich!
Our next speaker, José Luis Sánchez, is Platform Manager at the Big Data Delivery Center of Zurich in Barcelona, and he will talk about a very complex topic: handling schema evolution in a large corporation. It doesn't sound easy, does it? ;)

So, next Thursday 26th of April, 19:00. Don't miss it!!!

Abstract:
Zurich receives a lot of information from different aggregators, brokers, and other sources. Most of this data, which arrives from online questionnaires in XML format, is useful and interesting for research and analysis purposes. Because the data is semi-structured, it is validated and structured against XSD schemas, which may evolve: at any point in time fields might be added, changed or even removed.
Once these inputs are structured, they can be inserted into partitioned Hive tables, which require a single schema for all their partitions regardless of any previous schema. To reconcile the older schemas with the new ones, a "Schema Evolution Pipeline" is run, whose main goal is to maintain schema versioning, availability and reconciliation for all tables while avoiding any possibility of table inconsistency. Later on, all the data coming from different entities has to be joined into one single entity in a hierarchical raw format, which is available for data research and analysis but is not so friendly to traditional BI techniques. For this reason, the final step is the consumption-lake, which creates flattened (non-hierarchical) tables fully available for BI purposes.
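
To give a feel for the two ideas in the abstract, here is a minimal sketch in plain Python (not Spark, and not the pipeline presented in the talk). It assumes a schema can be represented as a simple column-to-type mapping, that removed columns are kept as nullable ones, and that conflicting types are widened to "string"; the helper names `reconcile_schemas`, `conform` and `flatten` are hypothetical. Repeated elements (lists), which a real hierarchical table would also contain, are left out.

```python
from typing import Any, Dict, List

Schema = Dict[str, str]  # column name -> type name (assumed representation)


def reconcile_schemas(versions: List[Schema]) -> Schema:
    """Build one schema covering every version: the union of all columns.

    Assumption: when two versions disagree on a type, widen to "string".
    """
    merged: Schema = {}
    for schema in versions:
        for column, dtype in schema.items():
            if column not in merged:
                merged[column] = dtype
            elif merged[column] != dtype:
                merged[column] = "string"
    return merged


def conform(record: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Project a record onto the unified schema; missing columns become None.

    Values are not cast here; a real pipeline would also convert types.
    """
    return {column: record.get(column) for column in schema}


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into a single level, joining names with "_"."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Two hypothetical schema versions of the same table.
v1 = {"policy_id": "string", "premium": "int"}
v2 = {"policy_id": "string", "premium": "double", "broker": "string"}
unified = reconcile_schemas([v1, v2])

# A row written under v1 still fits the unified schema.
old_row = conform({"policy_id": "P-1", "premium": 100}, unified)

# A hypothetical hierarchical record flattened for BI-style consumption.
nested = {"policy_id": "P-1", "holder": {"name": "Ana", "address": {"city": "Barcelona"}}}
flat = flatten(nested)
```

In Spark itself the same ideas would typically be expressed on DataFrames, for example by selecting nested struct fields with aliases; the sketch above only illustrates the logic.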

Bio:
José L. Sánchez: José Luis has been working with data since the beginning of his career and studied Computer Science at the University of Murcia. For more than 9 years, José Luis has dealt with small and big data in different industries: banking, public services, airlines, software... He has worked as a full-stack data engineer, in roles ranging from pure developer to operations. He is currently Platform Manager at the Big Data Delivery Center of Zurich in Barcelona. He is a professor in the BI & Big Data Master at the MBIT school and a speaker at Codemotion and the Barcelona Big Data Congress.

Sergio Álvarez: Sergio studied Telecommunication Engineering at the UPM University (Madrid). In his final year there he went to the Helmut-Schmidt Universität in Hamburg, where he did his Master's project on the development of a system for automated image recognition of ships with Machine Learning. This project sparked Sergio's interest in and motivation for continuing to work with data, so afterwards he spent 4 years working with data for financial services, marketing and telecommunications on several projects related to Alarm Management, Product Sentiment Analysis and Bank Ratios.

Gerard Solà: Gerard earned a degree in Telecommunications Engineering from Telecom BCN (UPC) in October 2014. As a senior developer, he really cares about software quality and about the correct use of agile methodologies and technologies during project development. He is accustomed to working under tight deadlines, being agile and adaptive, and contributing to teamwork focused on excellence. Since 2015 he has been working with Big Data technologies, and during this time he has worked with the most widely used platforms: Cloudera and Hortonworks.

Carlos Herrera: Carlos studied Computer Engineering at the UPC University (Barcelona) in September 2002. After 10 years of programming, analysis and architecture in the world of traditional consulting in different business environments, he decided to move into the world of Big Data to learn how to transfer the knowledge he had acquired to distributed, scalable and flexible environments for processing large amounts of information. Currently at ServiZurich, he works as a Technical Lead, helping different projects get implemented correctly while also carrying out development and management tasks.
