The Austin Python Meetup Monthly Meetup

Mar 9, 2022 · Austin, United States of America

We typically have a main presentation or a series of lightning talks, followed by discussion and Q&A. There is a diversity of domains and experience levels represented, so come with your questions and be prepared to talk about how you use Python!

Talk 1: Samuel Oranyeli - Helping Pandas with Pyjanitor

Talk 2: Niels Bantilan - Pandera: A Statistical Data Testing Toolkit for Dataframe-like Objects

-------------------------------------

Talk 1: Helping Pandas with Pyjanitor

Description:

Pyjanitor aims to help with cleaning data in Pandas. It offers verb-like methods that abstract common cleaning and wrangling steps while remaining chainable and interoperable with Pandas.
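The chainable, verb-like style described above can be illustrated with plain pandas. The sketch below is only an assumption-laden example built on `DataFrame.pipe`; the helper names `snake_case_columns` and `drop_empty_rows` and the sample data are hypothetical and are not Pyjanitor's own API.

```python
import pandas as pd


def snake_case_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: lower-case column names and replace spaces with underscores."""
    return df.rename(columns=lambda name: name.strip().lower().replace(" ", "_"))


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows in which every value is missing."""
    return df.dropna(how="all")


# Made-up sample data with messy column names and one fully empty row.
raw = pd.DataFrame(
    {"First Name": ["Ada", None, "Grace"], "Signup Year": [2019, None, 2021]}
)

# Each step returns a new DataFrame, so the verbs read left to right as a chain.
cleaned = (
    raw
    .pipe(snake_case_columns)
    .pipe(drop_empty_rows)
    .reset_index(drop=True)
)
```

Because every helper takes and returns a DataFrame, the steps can be reordered or mixed with ordinary pandas methods, which is the interoperability the description refers to.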

Speaker Bio:

Samuel Oranyeli is a Senior Engineer at Slalom Australia who loves wrangling data. Find him on Stack Overflow (@sammywemmy).

Talk 2: Pandera: A Statistical Data Testing Toolkit for Dataframe-like Objects

Description:

Data manipulation is a core part of any computational process. Whether it’s processing data for business analytics reports, statistical scientific studies, or predictive machine learning models, data needs to be reshaped into a form intended for a particular use case. Data testing is the act of validating not only data but also the functions that produce those data based on a priori assumptions obtained through domain expertise or exploratory analysis.

This talk will dive deep into Pandera, a data testing toolkit for dataframe-like objects in Python, including pandas, modin, dask, and koalas. We’ll cover the basics of defining schemas, creating custom checks, and type-checking dataframes in functions. We’ll also introduce you to more advanced data testing concepts like property-based testing, data profiling, and statistical hypothesis testing. Finally, this talk will outline the roadmap for the project and highlight newly released integrations with other libraries in the Python ecosystem. By the end of this talk you’ll be able to define your own schemas, validate dataframes flowing through your data pipelines, and create property-based unit tests using the tools provided by Pandera.
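The idea of testing both the data and the functions that produce it can be illustrated without any dedicated library. The following is a minimal hand-rolled sketch in plain pandas and deliberately does not use Pandera's API; the names `EXPECTED_KINDS`, `validate`, `check_output`, and `add_bmi`, along with the sample columns, are hypothetical and chosen only for illustration.

```python
import functools

import pandas as pd

# Assumed expectations about the output, as if derived from domain knowledge.
# "f" is NumPy's dtype kind code for floating-point columns.
EXPECTED_KINDS = {"height_m": "f", "weight_kg": "f", "bmi": "f"}


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Raise ValueError if a column is missing, has the wrong kind, or bmi is not positive."""
    for column, kind in EXPECTED_KINDS.items():
        if column not in df.columns:
            raise ValueError(f"missing column: {column}")
        if df[column].dtype.kind != kind:
            raise ValueError(f"column {column} has dtype {df[column].dtype}")
    if not (df["bmi"] > 0).all():
        raise ValueError("bmi must be positive")
    return df


def check_output(func):
    """Hypothetical decorator: validate the DataFrame that a function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return validate(func(*args, **kwargs))
    return wrapper


@check_output
def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Add a body-mass-index column; the decorator checks the result."""
    return df.assign(bmi=df["weight_kg"] / df["height_m"] ** 2)


people = pd.DataFrame({"height_m": [1.60, 1.82], "weight_kg": [55.0, 80.0]})
result = add_bmi(people)  # passes validation and returns the new DataFrame
```

Calling `add_bmi` on a frame with a negative weight would raise `ValueError`, so a bad assumption surfaces at the function boundary rather than further down the pipeline.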

Speaker Bio:

Niels is a machine learning engineer and core maintainer of Flyte, an open source ML orchestration tool, and author and maintainer of Pandera, a data testing tool for dataframes. He has a Master's in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems. He enjoys developing open source tools for improving data science and machine learning practice.

Event organizers
  • The Austin Python Meetup

    Meet other local Python programming language enthusiasts! Ask your questions about any aspect of Python development, including "how do I start learning Python?" You may also join the merged APUG/AWPUG mailing list to participate in more discussions. Visit http://austinpython.org to sign up! Additionally, you can find us on IRC in the #austinpy channel on Freenode.
