sparklyr2pmml

Published by jpmml on October 20, 2021

Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package provides R wrapper classes and functions for the JPMML-SparkML library. For the full list of supported Apache Spark ML Estimator and Transformer types, please refer to the JPMML-SparkML documentation.

Prerequisites

  • Apache Spark 2.0.X, 2.1.X, 2.2.X, 2.3.X, 2.4.X, 3.0.X, 3.1.X or 3.2.X.
  • R 3.3 or newer.

Installation

Install from GitHub using the devtools package:

library("devtools")

install_git("https://github.com/jpmml/sparklyr2pmml.git")

Configuration and usage

Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

Apache Spark version | JPMML-SparkML development branch | JPMML-SparkML uber-JAR file
2.0.X                | 1.1.X (Archived)                 | 1.1.23
2.1.X                | 1.2.X (Archived)                 | 1.2.15
2.2.X                | 1.3.X (Archived)                 | 1.3.15
2.3.X                | 1.4.X (Archived)                 | 1.4.21
2.4.X                | 1.5.X (Archived)                 | 1.5.14
3.0.X                | 1.6.X                            | 1.6.6
3.1.X                | 1.7.X                            | 1.7.3
3.2.X                | master                           | 1.8.0
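
The compatibility matrix can also be encoded as a lookup, so that the uber-JAR file name is derived from the Apache Spark version instead of being typed by hand. The following is a minimal sketch in plain R; the names jpmml_sparkml_versions and jpmml_sparkml_version are hypothetical helpers and are not part of Sparklyr2PMML.

```r
# Hypothetical lookup table (not part of Sparklyr2PMML): Apache Spark
# major.minor version -> JPMML-SparkML uber-JAR version, copied from
# the compatibility matrix above.
jpmml_sparkml_versions = c(
	"2.0" = "1.1.23",
	"2.1" = "1.2.15",
	"2.2" = "1.3.15",
	"2.3" = "1.4.21",
	"2.4" = "1.5.14",
	"3.0" = "1.6.6",
	"3.1" = "1.7.3",
	"3.2" = "1.8.0"
)

# Hypothetical helper: return the uber-JAR version for a Spark version
# such as "3.1.2", or stop if the version is not listed.
jpmml_sparkml_version = function(spark_version){
	parts = strsplit(as.character(spark_version), ".", fixed = TRUE)[[1]]
	if(length(parts) < 2){
		stop("Expected a version of the form major.minor[.patch]: ", spark_version)
	}
	# Keep only the major.minor prefix, e.g. "3.1.2" -> "3.1"
	key = paste(parts[1:2], collapse = ".")
	if(!(key %in% names(jpmml_sparkml_versions))){
		stop("No JPMML-SparkML version listed for Apache Spark ", key)
	}
	jpmml_sparkml_versions[[key]]
}

jar_version = jpmml_sparkml_version("3.1.2")
jar_file = sprintf("jpmml-sparkml-executable-%s.jar", jar_version)
print(jar_file)
```

The table has to be extended by hand whenever a new Apache Spark release line is supported; versions outside the matrix raise an error instead of silently picking a neighbouring release.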

Adding the JPMML-SparkML uber-JAR file to the Sparklyr execution environment:

library("sparklyr")

config = spark_config()
config[["sparklyr.jars.default"]] = "/path/to/jpmml-sparkml-executable-${version}.jar"

sc = spark_connect(master = "local", config = config)
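
A wrong JAR path is easy to overlook at this stage, because the problem is likely to surface only later, when the export is attempted. The following sketch shows one way to fail early; check_jar is a hypothetical helper, not a function of sparklyr or Sparklyr2PMML, and the temporary file merely stands in for a real uber-JAR file.

```r
# Hypothetical helper: stop early unless the path points to an existing
# regular file; return the normalized path otherwise.
check_jar = function(path){
	if(!file.exists(path) || dir.exists(path)){
		stop("JPMML-SparkML uber-JAR file not found: ", path)
	}
	normalizePath(path)
}

# Stand-in for a real JAR file, created only for demonstration
jar_file = tempfile(fileext = ".jar")
invisible(file.create(jar_file))

jar_path = check_jar(jar_file)
print(jar_path)
```

In a real session the checked path would be assigned to config[["sparklyr.jars.default"]] before calling spark_connect.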

Fitting a Spark ML pipeline:

library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
	ft_r_formula(Species ~ .) %>%
	ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)

Exporting the fitted Spark ML pipeline to a PMML file:

library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
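
After the export it can be useful to confirm that the file was actually written and looks like a PMML document. The sketch below is a crude text-level check in base R, not an XML or schema validation; looks_like_pmml is a hypothetical helper, and the demonstration file is a hand-written stand-in rather than real converter output.

```r
# Hypothetical helper: TRUE if the file exists and a <PMML ...> element
# opens within its first n lines. This is a heuristic, not a validation.
looks_like_pmml = function(path, n = 10){
	if(!file.exists(path)){
		return(FALSE)
	}
	head_lines = readLines(path, n = n, warn = FALSE)
	any(grepl("<PMML[ >]", head_lines))
}

# Hand-written stand-in file; in practice pass "DecisionTreeIris.pmml"
demo_file = tempfile(fileext = ".pmml")
writeLines(c(
	"<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
	"<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">",
	"</PMML>"
), demo_file)

print(looks_like_pmml(demo_file))
```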

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io

