Apr 14, 2018 · San Francisco, United States of America

This meetup is co-hosted with the Data Institute at USF.

Prodigy: An annotation tool designed for rapid iteration and developer productivity
Ines Montani (Co-founder of Explosion AI)

Most developers working with machine learning recognize that data quality and quantity matter more to the success of their project than the specifics of their statistical model. Despite this, it's common for inexperienced teams to make almost no investment in their data. Even among more experienced teams, developers often underestimate the extent to which annotation is a knowledge-based process that requires several iterations to perfect. As a solution, we suggest that machine learning developers perform initial annotations themselves, to help them refine the schema. To enable this workflow, we've developed Prodigy, an annotation tool with several features designed to improve productivity. In this talk I'll discuss what we've learned about annotation, and show you how we've implemented these insights in Prodigy.

spaCy: Multi-lingual natural language understanding with spaCy
Matthew Honnibal (Co-founder of Explosion AI and author of spaCy)

spaCy is a popular open-source Natural Language Processing library designed for practical usage. In this talk, I'll outline the new parsing model we've been developing to improve spaCy's support for more languages and text types. The parsing model takes an incremental approach: it reads the words one by one and updates the parse state by pushing words onto a stack or popping them off it, creating arcs between them, inserting sentence boundaries, or splitting and merging tokens. This allows a single neural network model to determine the sentence segmentation, tokenization and dependency parse of a whole document. This joint approach improves parse accuracy on many types of text, especially for languages such as Chinese. When the new model is complete, spaCy will be able to support a much wider variety of languages, with a better balance of efficiency, accuracy and customisability.
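To make the incremental idea concrete, here is a minimal sketch of a transition-based parse state, assuming a textbook arc-standard-style scheme plus a hypothetical BREAK action for sentence boundaries. It is not spaCy's actual implementation: the action names are made up for illustration, the neural network that would choose each action is replaced by a fixed action list, and the token split/merge actions mentioned in the abstract are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical action names. A real parser would have a neural network
# score these at each step instead of reading them from a fixed list.
SHIFT, LEFT, RIGHT, BREAK = "SHIFT", "LEFT", "RIGHT", "BREAK"


@dataclass
class ParseState:
    words: List[str]
    stack: List[int] = field(default_factory=list)
    buffer_pos: int = 0  # index of the next unread word
    arcs: List[Tuple[int, int]] = field(default_factory=list)  # (head, child)
    sent_starts: List[int] = field(default_factory=lambda: [0])

    def is_final(self) -> bool:
        # Done once every word has been read and at most one root remains.
        return self.buffer_pos == len(self.words) and len(self.stack) <= 1

    def apply(self, action: str) -> None:
        if action == SHIFT:
            # Read the next word and push it onto the stack.
            if self.buffer_pos >= len(self.words):
                raise ValueError("SHIFT with empty buffer")
            self.stack.append(self.buffer_pos)
            self.buffer_pos += 1
        elif action == LEFT:
            # Top of stack becomes head of the word beneath it; pop the child.
            if len(self.stack) < 2:
                raise ValueError("LEFT needs two stack items")
            child = self.stack.pop(-2)
            self.arcs.append((self.stack[-1], child))
        elif action == RIGHT:
            # Word beneath the top becomes head of the top; pop the child.
            if len(self.stack) < 2:
                raise ValueError("RIGHT needs two stack items")
            child = self.stack.pop()
            self.arcs.append((self.stack[-1], child))
        elif action == BREAK:
            # Close the current sentence: pop its single remaining root and
            # mark the next unread word as the start of a new sentence.
            if len(self.stack) != 1 or self.buffer_pos >= len(self.words):
                raise ValueError("BREAK needs one root and a non-empty buffer")
            self.stack.pop()
            self.sent_starts.append(self.buffer_pos)
        else:
            raise ValueError(f"unknown action {action!r}")


def parse(words: List[str], actions: List[str]) -> ParseState:
    # Apply the actions in order, then check the parse is complete.
    state = ParseState(words)
    for action in actions:
        state.apply(action)
    if not state.is_final():
        raise ValueError("action sequence did not complete the parse")
    return state


# Two unpunctuated sentences: segmentation and dependencies come out
# of the same left-to-right pass over the words.
words = ["I", "like", "cats", "They", "purr"]
state = parse(words, [SHIFT, SHIFT, LEFT, SHIFT, RIGHT, BREAK, SHIFT, SHIFT, LEFT])
print(state.arcs)         # [(1, 0), (1, 2), (4, 3)]
print(state.sent_starts)  # [0, 3]
```

Because sentence boundaries are just another action in the same sequence, the model deciding the actions can use the partial parse when choosing where one sentence ends, which is the intuition behind doing segmentation and parsing jointly.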

