Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package provides R wrapper classes and functions for the JPMML-SparkML library. For the full list of supported Apache Spark ML Estimator and Transformer types, please refer to JPMML-SparkML documentation.

Prerequisites

Apache Spark 2.0.X, 2.1.X, 2.2.X, 2.3.X, 2.4.X, 3.0.X, 3.1.X or 3.2.X.
R 3.3 or newer.

Installation

Install from GitHub using the devtools package:

library("devtools")

install_git("git://github.com/jpmml/sparklyr2pmml.git")

Configuration and usage

Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

Apache Spark version	JPMML-SparkML development branch	JPMML-SparkML uber-JAR file
2.0.X	`1.1.X` (Archived)	1.1.23
2.1.X	`1.2.X` (Archived)	1.2.15
2.2.X	`1.3.X` (Archived)	1.3.15
2.3.X	`1.4.X` (Archived)	1.4.21
2.4.X	`1.5.X` (Archived)	1.5.14
3.0.X	`1.6.X`	1.6.6
3.1.X	`1.7.X`	1.7.3
3.2.X	`master`	1.8.0

Adding the JPMML-SparkML uber-JAR file to Sparklyr execution environment:

library("sparklyr")

config = spark_config()
config[["sparklyr.jars.default"]] = "/path/to/jpmml-sparkml-executable-${version}.jar"

sc = spark_connect(master = "local", config = config)

Fitting a Spark ML pipeline:

library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
	ft_r_formula(Species ~ .) %>%
	ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)

Exporting the fitted Spark ML pipeline to a PMML file:

library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io