Big Data Transfer & Aggregate

Data transfer aggregation is a type of data mining process in which data is searched, gathered and presented in a summarized, report-based format, either to meet specific business objectives or processes or to support human analysis.
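As a rough illustration of the "summarized format" idea, the Python sketch below groups raw records by a key and reports a count, total and mean for each group. The record layout, the field names `region` and `amount`, and the `summarize` helper are hypothetical; real aggregation tools do the same kind of work at far larger scale.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Summary:
    """Running totals for one group of records."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        # Guard against division by zero for an empty group.
        return self.total / self.count if self.count else 0.0


def summarize(records, key, value):
    """Group records by record[key] and accumulate count and sum of record[value]."""
    groups = defaultdict(Summary)
    for rec in records:
        s = groups[rec[key]]
        s.count += 1
        s.total += rec[value]
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample data: individual sales transactions.
    sales = [
        {"region": "east", "amount": 120.0},
        {"region": "west", "amount": 80.0},
        {"region": "east", "amount": 60.0},
    ]
    for region, s in sorted(summarize(sales, "region", "amount").items()):
        print(f"{region}: n={s.count} total={s.total:.2f} mean={s.mean:.2f}")
    # east: n=2 total=180.00 mean=90.00
    # west: n=1 total=80.00 mean=80.00
```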


2017 Open Source Big Data Transfer and Aggregate

Transferring and aggregating large data sets for Big Data purposes requires some heavy-duty tools. Efficiently indexing a massive data set, for instance, calls for software that can digest tens of gigabytes per hour as if they were a light snack. Transferring data, meanwhile, requires software that can move data quickly between today's complex, widely used platforms, such as Hadoop, and the major RDBMSes in use. As the list below shows, many of the current leading heavyweight Big Data tools for transferring and aggregating data sets are open source, a testament to the growing dominance of open source in the enterprise.

Chukwa

An Apache project built on top of HDFS and MapReduce, Chukwa collects data from large distributed systems and includes tools for displaying and analyzing the data it collects. Operating System: Linux, OS X.
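Because Chukwa does its processing on MapReduce, it may help to see the map, shuffle and reduce phases in miniature. This is plain Python, not Chukwa or Hadoop code: the `<host> <level> <message>` log-line format and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (host, 1) for every ERROR line; assumes '<host> <level> <message>'."""
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            yield parts[0], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def error_counts(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle; the sketch only shows the data flow.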

Flume

Another Apache project, Flume collects, aggregates and transfers log data from applications to HDFS. It's Java-based, robust and fault-tolerant. Operating System: Windows, Linux, OS X.
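A Flume agent is organized as sources that receive events, channels that buffer them, and sinks that write them out (for example, to HDFS). The toy Python model below mimics that flow with a bounded in-memory buffer that is drained in batches. `ToyChannel` and `run_agent` are hypothetical names, not Flume's Java API, and real agents are configured through a properties file rather than code like this.

```python
from collections import deque
from typing import Callable, Iterable


class ToyChannel:
    """Bounded FIFO buffer sitting between a source and a sink."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._buf: deque[str] = deque()

    def put(self, event: str) -> None:
        # Refuse new events when full: a stand-in for back-pressure.
        if len(self._buf) >= self.capacity:
            raise OverflowError("channel full: sink is not keeping up")
        self._buf.append(event)

    def take_batch(self, max_events: int) -> list[str]:
        """Remove and return up to max_events events in arrival order."""
        batch: list[str] = []
        while self._buf and len(batch) < max_events:
            batch.append(self._buf.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._buf)


def run_agent(source: Iterable[str], channel: ToyChannel,
              sink: Callable[[list[str]], None], batch_size: int = 2) -> None:
    """Move every event from source to sink through the channel, in batches."""
    if batch_size > channel.capacity:
        raise ValueError("batch_size must not exceed channel capacity")
    for event in source:
        channel.put(event)
        # Hand a batch to the sink as soon as one is buffered.
        if len(channel) >= batch_size:
            sink(channel.take_batch(batch_size))
    # Flush whatever is left once the source is exhausted.
    while batch := channel.take_batch(batch_size):
        sink(batch)
```

Batching mirrors how a sink such as an HDFS writer prefers fewer, larger writes, while the buffer lets the sink fall briefly behind the source without dropping events.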

Lucene

Lucene, the self-proclaimed "de facto standard for search libraries," offers fast indexing and searching for very large data sets. It can index over 95 GB/hour on modern hardware. Operating System: OS Independent.
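Lucene's core data structure is an inverted index, which maps each term to the documents that contain it, so a query only touches the relevant entries instead of scanning every document. The Python sketch below shows the idea at toy scale; the regex tokenizer and the `InvertedIndex` class are illustrative assumptions and bear no resemblance to Lucene's actual segment-based implementation or relevance scoring.

```python
import re

# Crude tokenizer: lowercase, then keep runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = {}
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        """Store a document, record each of its terms, and return its id."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in tokenize(text):
            self.postings.setdefault(term, set()).add(doc_id)
        return doc_id

    def search(self, query: str) -> list[int]:
        """Return ids of documents containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = [self.postings.get(term, set()) for term in terms]
        return sorted(set.intersection(*matches))
```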

Solr

Solr is an enterprise search platform built on Lucene. It powers the search capabilities of many large sites, including Netflix, AOL, CNET and Zappos. Operating System: OS Independent.
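Applications typically talk to Solr over HTTP. The sketch below builds a query against Solr's `/select` request handler and pulls the hit count and matching documents out of the JSON response. It assumes a server on Solr's default port 8983 and a hypothetical collection named `articles`; the `search` helper needs a running server, while the URL-building and parsing functions work without one.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base: str, collection: str, query: str, rows: int = 10) -> str:
    """Build a /select URL; base and collection are deployment-specific."""
    params = {"q": query, "rows": rows, "wt": "json"}  # wt=json requests a JSON body
    return f"{base.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


def parse_response(body: str) -> tuple[int, list[dict]]:
    """Return (numFound, docs) from a Solr JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def search(base: str, collection: str, query: str, rows: int = 10) -> tuple[int, list[dict]]:
    """Query a live Solr server (requires network access)."""
    with urlopen(build_select_url(base, collection, query, rows), timeout=10) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

For example, `search("http://localhost:8983", "articles", "title:lucene")` would return the number of matches and the first ten documents, assuming such a collection exists.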

Sqoop

Sqoop transfers data between Hadoop and RDBMSes or data warehouses. It graduated from the Apache Incubator in March 2012 and is now a top-level Apache project. Operating System: OS Independent.
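A typical Sqoop 1 import is launched from the command line. The sketch below assembles such a command in Python so it can be scripted; the JDBC URL, table name, HDFS path and user name are placeholders, and `run` assumes the `sqoop` binary is on the PATH.

```python
import shlex
import subprocess
from typing import Optional


def sqoop_import_cmd(jdbc_url: str, table: str, target_dir: str,
                     username: Optional[str] = None, num_mappers: int = 4) -> list[str]:
    """Build the argv for a Sqoop 1 import of one RDBMS table into HDFS."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string for the source database
        "--table", table,                   # table to copy
        "--target-dir", target_dir,         # HDFS directory that receives the files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if username:
        # -P prompts for the password instead of putting it on the command line.
        cmd += ["--username", username, "-P"]
    return cmd


def run(cmd: list[str]) -> None:
    """Echo and execute the command; assumes `sqoop` is on the PATH."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Building the argument list rather than a single shell string avoids quoting problems when table names or paths contain unusual characters.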

