pdpipe

Easy pipelines for pandas DataFrames

Ever written a preprocessing pipeline for pandas dataframes and had trouble serializing it for later deployment on a different machine? Ever needed fit-able preprocessing transformations, with tunable parameters that are inferred from training data, to be used later to transform input data? Ever struggled with preprocessing different types of data in the same pandas dataframe?

Enter pdpipe, a simple framework for serializable, chainable and verbose pandas pipelines. Its intuitive API enables you to generate, using only a few lines, complex pandas processing pipelines that can easily be broken down or composed together, examined and debugged, and that adhere to scikit-learn's Transformer API. Stop writing the same preprocessing boilerplate code again and again!

Need help? Get live help on the pdpipe community Gitter chat or open an issue on the pdpipe repository.

Compatible with Python 3+

Python 3.6 and up. Crucial for new or forward-looking projects.

Fully documented

Every pipeline stage and parameter is meticulously documented and accompanied by working code examples.

Zero configuration

Pdpipe stages use sensible defaults for everything. Get things going immediately, tune only what you need.

Handle mixed-type data

Easily create pipelines that process different types of data separately without breaking, enabling easier use of stacking-based ensemble models down the pipeline.

Customizable stages

Pipeline stages are highly configurable, and creating new custom stages is easy.

Chainable constructors & pipeline arithmetics

Chain pipeline stage constructor calls to create complex pipelines in easy one-liners. Pipeline arithmetic is also supported.

Built for productization

Pipelines and stages are written with productization in mind; fit on training data, serialize, deserialize and transform in production.

Fully tested

Pdpipe is thoroughly tested on Linux, macOS and Windows systems, as well as all Python development branches, and boasts full test coverage.

Verbose

Informative prints and errors on pipeline application, including smart pre-conditions before application and post-conditions to validate successful application.

Download

Available on PyPI and GitHub.

Usage

Install pdpipe from PyPI, then read the documentation.

Help

Chat for help on the pdpipe community Gitter.