Feature engineering: secret weapon for improving Machine Learning (ML) Solutions!!!
Recently we developed several examples of geospatial maps with CDMX Open Data, in particular historical data from Linea Mujeres up to November 2020, which was migrated to LOCATEL.
We generated a heat map of CDMX with GeoPandas and Folium, colored according to the number of calls from women:
https://catedraunescodh.unam.mx/catedra/Papiit2017/mapas/MapaLineaMujeresDatosHistoricos202011.html
The data was obtained from:
https://datos.cdmx.gob.mx/dataset/linea-mujeres
Another map offers access to data by federal entity.
These are examples of how to display information on geospatial maps using Python and Folium. Soon we will add Streamlit, along with time series and numerical forecasting with Prophet.
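The coloring step behind such a map can be sketched in plain Python: count the calls per borough, then scale each count onto a color ramp. This is only an illustration, not the code behind the published map. The borough list, the palette, and the function name `color_by_count` are assumptions made for the example. In Folium, the same idea is normally handled by a choropleth layer fed with the per-borough counts.

```python
from collections import Counter

# Hypothetical call records: one borough (alcaldia) name per call.
# The real Linea Mujeres dataset has its own schema; these values are illustrative.
calls = ["Iztapalapa", "Coyoacán", "Iztapalapa", "Tlalpan", "Iztapalapa", "Coyoacán"]

# Illustrative color ramp, from light (few calls) to dark (many calls).
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def color_by_count(records, palette=PALETTE):
    """Count calls per borough and map each count to a palette color."""
    counts = Counter(records)
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid division by zero if all counts are equal
    colors = {}
    for borough, n in counts.items():
        # Scale the count linearly to an index between 0 and len(palette) - 1.
        index = round((n - lowest) / span * (len(palette) - 1))
        colors[borough] = palette[index]
    return counts, colors


counts, colors = color_by_count(calls)
for borough, n in counts.most_common():
    print(borough, n, colors[borough])
```

A linear scale is the simplest choice. With very skewed counts, quantile bins usually give a more readable map.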
We use only open source libraries and open data with Python. Don't miss our next workshops on these topics, and also on NLP for text handling with spaCy!
We have already recorded several tutorials on the subject.
Think of them as the trailers before the premiere in 2022.
Free access on YouTube:
Conference: Data Science with a Gender Perspective. LOCATEL case
A first tutorial in Python with Jupyter Notebook
Second tutorial
Third tutorial
Fourth tutorial
Fifth tutorial
Sixth tutorial
Wednesday, March 02, 2022 at 7:00 p.m.
Course with RECOVERY FEE TO RECEIVE THE MATERIAL: 900 pesos
Workshop duration: 2 hrs
Feature engineering is the quality-ingredients process!
Great models cannot exist without great quality data.
Now we say: “machine learning is basically feature engineering.”
A feature is an attribute (column) of the data that is meaningful to an ML model.
Datasets today often have a large number of columns compared to the number of observations.
This can lead to what is known as the curse of dimensionality, which describes an extremely sparse universe of data that ML models have difficulty learning from. Interpretability of the data and the model is key.
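One way to see this sparsity is to measure how distances behave as columns are added. The sketch below uses uniformly random points (an assumption made for illustration, not workshop data) and compares the nearest and farthest neighbor of a query point. The function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# Same number of observations, very different number of columns.
low_dim = relative_contrast(n_points=200, n_dims=2)
high_dim = relative_contrast(n_points=200, n_dims=500)

print(f"2 columns:   contrast = {low_dim:.2f}")
print(f"500 columns: contrast = {high_dim:.2f}")
```

With few columns, the nearest neighbor is much closer than the farthest one. With hundreds of columns, all points end up at nearly the same distance, so "similar" and "different" observations become hard to tell apart.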
Feature engineering is the most interesting technique now!!
Feature engineering includes the methodology used to extract numerical representations from unstructured data, for example for an unsupervised model (one that tries to extract structure from a previously unstructured dataset).
More generally, feature engineering is transforming data into a format that optimally represents the underlying problem that an ML algorithm is trying to model.
How do we use algorithms and statistical testing procedures to identify the strongest features?
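A minimal sketch of one such procedure, assuming a numeric target: score every column by the absolute Pearson correlation with the target and rank the columns. The column names and values are invented for the example, and `rank_features` is a hypothetical helper. It is one simple filter among many, not necessarily the method taught in the workshop.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (norm_x * norm_y)


def rank_features(columns, target):
    """Sort column names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: illustrative values only.
columns = {
    "hour_of_day": [1, 2, 3, 4, 5, 6],
    "noise": [5, 1, 4, 2, 6, 3],
    "constant": [7, 7, 7, 7, 7, 7],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Correlation only captures linear relationships with a numeric target. For categorical targets, tests such as chi-squared or the ANOVA F-test play the same role.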
Stop tuning your ML model and learn more about feature engineering; it will change your life.
Deliver huge improvements to your machine learning pipelines without spending hours fine-tuning parameters!
Feature engineering is the secret weapon for improving your machine learning’s output.
By enhancing the data ingestion, manipulation, and transformation elements of your pipeline, you can see dramatic improvements in your downstream results without endlessly fine-tuning parameters or chasing the latest models.
Without understanding the data, it is impossible to capture, learn from, and scale up the patterns locked within the data.
CRISP-DM (Cross-Industry Standard Process for Data Mining) describes a data science project in six phases, and it works well as a machine learning pipeline. The first three are strategic and closely related to feature engineering.
We have our own practical methodology of 12 tasks, which we share at the workshop, covering the following aspects of our CRISP-WebGestiones methodology:
1. Defining the Problem Domain
2. Obtaining Data and Exploratory Data Analysis
3. Feature Engineering: preparing data as numerical features for the algorithms
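Step 3 above can be sketched with three common transformations: extracting date parts from a timestamp, min-max scaling a number, and one-hot encoding a category. The records and field names (`timestamp`, `borough`, `age`) are hypothetical and do not reflect the real schema of the CDMX datasets. `build_features` is an illustrative helper, not part of the 12-task methodology.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are illustrative only.
records = [
    {"timestamp": "2020-11-02 08:30:00", "borough": "Iztapalapa", "age": 34},
    {"timestamp": "2020-11-07 22:15:00", "borough": "Coyoacán", "age": 52},
    {"timestamp": "2020-11-09 14:05:00", "borough": "Iztapalapa", "age": 19},
]


def build_features(rows):
    """Turn raw records into purely numerical feature vectors."""
    categories = sorted({row["borough"] for row in rows})
    ages = [row["age"] for row in rows]
    age_min = min(ages)
    age_span = (max(ages) - age_min) or 1  # avoid division by zero

    features = []
    for row in rows:
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        vector = [
            ts.hour,                             # hour of day, 0 to 23
            1 if ts.weekday() >= 5 else 0,       # 1 on Saturday or Sunday
            (row["age"] - age_min) / age_span,   # min-max scaled to [0, 1]
        ]
        # One-hot encoding: one 0/1 column per borough.
        vector += [1 if row["borough"] == c else 0 for c in categories]
        features.append(vector)

    names = ["hour", "is_weekend", "age_scaled"] + [f"borough={c}" for c in categories]
    return names, features


names, X = build_features(records)
print(names)
for vector in X:
    print(vector)
```

In practice the scaling limits and the category list should be learned from the training data only and then reused on new data, so that the model always sees the same columns.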
On Friday, December 10, 2021, we closed our 2021 Data Science cycle.
Course with gender perspective: Python, Matplotlib, Seaborn, Folium, GeoPandas and Streamlit deployment with Open Data from CDMX associated with women
In the workshop we use Open Data from CDMX to show how we carry out the Exploratory Data Analysis (EDA) to determine feature importance with Feature engineering methods.
Much of the current discourse around Artificial Intelligence (AI) and Machine learning (ML) is inherently model-centric, focusing on the latest advancements in ML and deep learning.
This model-first approach often comes with, at best, little regard for, and at worst, total disregard for the data being used to train those models.
Check our YouTube channel
https://www.youtube.com/channel/UCf49zOpIRJo5zK9P0Ma-sFQ/videos
PayPal payment
PayPal.me/saxsa2000
Bank payment: request the BBVA CLABE
Dr Gabriel Guerrero
saxsa2000 (at) gmail.com