Big Data Mining

Data mining involves exploring and analyzing large amounts of data to find patterns, and big data mining applies those techniques to very large data sets. The techniques came out of the fields of statistics and artificial intelligence (AI), with a bit of database management thrown into the mix.


2017 Open Source Big Data Mining Tools

As you’ll see on the following pages, the open source community has risen to the challenge of today’s exploding Big Data market in many – and highly creative – ways. The increase in number and type of open source data mining tools is truly remarkable. Some of the tools are easy to use, suitable for beginners or even hobbyists. Others are deeply wonky and provide a toolset that’s robust enough for the most complex enterprise needs. Also note that, in true open source style, some of these Big Data tools have deep-pocketed corporate sponsors, while others appear to be classic open source projects, staffed partially (or completely) by volunteers. Whatever your level of involvement in Big Data, you’ll likely find an open source tool in these pages that will be useful for you.


The successor to jHepWork, DataMelt can do mathematical computation, data mining, statistical analysis and data visualization. It supports Java and related programming languages including Jython, Groovy, JRuby and Beanshell. Operating System: OS Independent.


KEEL stands for "Knowledge Extraction based on Evolutionary Learning," and it aims to help users assess evolutionary algorithms for data mining problems like regression, classification, clustering and pattern mining. It includes a large collection of existing algorithms that can be compared with new algorithms. Operating System: OS Independent.


Apache Mahout offers algorithms for clustering, classification and batch-based collaborative filtering that run on top of Hadoop. The project's goal is to build scalable machine learning libraries. Operating System: OS Independent.
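To make the idea of collaborative filtering more concrete, here is a minimal sketch in plain Python. It is not the Apache project's API and does not run on Hadoop; the function names (`cosine`, `recommend`) and the toy ratings are assumptions made purely for illustration. It scores items a user has not rated by weighting other users' ratings with cosine similarity.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "cat": {"D": 5},
}
print(recommend(ratings, "ann"))
```

A batch system of the kind described above would compute these similarities across a cluster for millions of users; the sketch only shows the underlying scoring idea on a handful of ratings.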


Orange hopes to make data mining "fruitful and fun" for both novices and experts. It offers a wide variety of visualizations, plus a toolbox of more than 100 widgets. Operating System: Windows, Linux, OS X.


RapidMiner claims to be "the world-leading open-source system for data and text mining." RapidAnalytics is a server version of that product. In addition to the open source versions of each, enterprise versions and paid support are also available from the same site. Operating System: OS Independent.


Rattle, the "R Analytical Tool To Learn Easily," makes it easier for non-programmers to use the R language by providing a graphical interface for data mining. It can create data summaries (both visual and statistical), build models, draw graphs, score datasets and more. Operating System: Windows, Linux, OS X.


Another Java-based data mining framework, SPMF originally focused on sequential pattern mining, but now also includes tools for association rule mining, sequential rule mining and frequent itemset mining. Currently, it includes 46 different algorithms. Operating System: OS Independent.
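For readers new to frequent itemset and association rule mining, the following is a small illustrative sketch in plain Python. It is not SPMF code and does not use any of its algorithms; it simply counts every itemset up to a given size by brute force, which only works for tiny data sets. The function names and the shopping-basket data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from frequent itemset counts."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    ante = tuple(sorted(antecedent))
    return frequent[both] / frequent[ante]

# Hypothetical toy transactions
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
frequent = frequent_itemsets(baskets, min_support=2)
print(frequent)
print(confidence(frequent, ["butter"], ["bread"]))
```

Here the rule "butter implies bread" has confidence 1.0, because every basket containing butter also contains bread. Dedicated tools avoid the brute-force enumeration by pruning candidate itemsets whose subsets are already known to be infrequent.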


Short for "Waikato Environment for Knowledge Analysis," Weka offers a set of algorithms for data mining that you can apply directly to data or use in another Java application. It's part of a larger machine learning project, and it's also sponsored by Pentaho. Operating System: Windows, Linux, OS X.
