Apache Spark MLlib

Apache Spark MLlib is a scalable machine learning library that runs on Apache Mesos, Hadoop, Kubernetes, in standalone mode, or in the cloud, and it can read data from multiple data sources. It includes a wide range of algorithms, for example naive Bayes and logistic regression for classification, generalized linear regression for regression, and K-means for clustering. Its workflow utilities include ML Pipeline construction, feature transformations, and ML persistence.

Key Features:
• Hadoop data sources such as HDFS, HBase, or local files can be used, so it is easy to plug into Hadoop workflows.
• Ease of use: it can be used from Java, Scala, Python, and R.
• MLlib fits into Spark’s APIs and interoperates with NumPy in Python and with R libraries.
• It contains high-quality algorithms and performs better than MapReduce.

More Information and Official Website:
Download: https://spark.apache.org/downloads.html