Sofia Castellanos

Sofia is a Python data engineer with seven years of experience building ingestion and transformation systems for media and adtech. She spent three years at Spotify on the personalization-data team, where she shipped a streaming-to-batch reconciliation pipeline that processes around 90 billion playback events per day, and two years before that at The New York Times on the subscriber-analytics platform. She focuses her writing on production pandas patterns (chunked reads, categorical memory tricks, Arrow interop), Airflow 2.x task groups, and the kinds of dbt + Python hybrid pipelines that show up once your warehouse bill stops being cute. She also maintains pyspark-helpers, a small library for column-name munging that she keeps porting between jobs. Sofia is based in Madrid, originally from Bogotá, and a relentless defender of type hints in notebook code.

Articles by Sofia Castellanos