Big Data File Systems

Because a distributed file system is typically deployed on low-cost commodity hardware, server failures are common. The file system is designed to be highly fault-tolerant, however: it facilitates the rapid transfer of data between compute nodes and enables Hadoop systems to continue running if a node fails. That reduces the risk of catastrophic failure, even when numerous nodes fail.

2017 Open Source Big Data File Systems

The file system is, in many ways, the very center of the Big Data universe. It's the tools provided by the file system that give a data set its overall structure and help turn it from a vast pool of information into something that can be held and mined for insights. And if there's a file system that is clearly the star of the show in the Big Data world, it's HDFS, the key to Hadoop, the open source platform that, for many users, is all but synonymous with Big Data itself. Hadoop is one of the greatest success stories from the open source community. But as you'll see below, other file systems and languages that are central to the Big Data world are also open source. In fact, this list of file systems and programming languages demonstrates the importance of open source to today's rapidly evolving Big Data toolset.

Hadoop Distributed File System

Also known as HDFS, this is the primary storage system for Hadoop. It quickly replicates data onto several nodes in a cluster in order to provide reliable, fast performance. Operating System: Windows, Linux, OS X.
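To make the replication idea concrete, here is a minimal Python sketch. It is a toy model, not the HDFS API: the function names, node names, and random placement are all assumptions made for illustration, and real HDFS uses a rack-aware placement policy. The replication factor of 3 matches the HDFS default.

```python
import random

def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes.

    Toy random placement (an assumption for illustration); this is not
    the rack-aware policy that HDFS actually uses.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

# Hypothetical six-node cluster storing ten blocks.
nodes = [f"node{i}" for i in range(6)]
blocks = [f"block{i}" for i in range(10)]
placement = place_replicas(blocks, nodes, replication=3)

# With three replicas per block, any two node failures leave every block readable.
survivors = readable_blocks(placement, failed_nodes=["node0", "node1"])
print(len(survivors), "of", len(blocks), "blocks still readable")
```

Because every block lives on three distinct nodes, losing any two nodes cannot remove all copies of a block, which is the property that lets a cluster keep running through node failures.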

ECL

ECL ("Enterprise Control Language") is the language for working with HPCC. A complete set of tools, including an IDE and a debugger, is included in HPCC, and documentation is available on the HPCC site. Operating System: Linux.

Gluster

Sponsored by Red Hat, Gluster offers unified file and object storage for very large datasets. Because it can scale to 72 brontobytes, it can be used to extend the capabilities of Hadoop beyond the limitations of HDFS (see above). Operating System: Linux.

Pig

Another Apache Big Data project, Pig is a data analysis platform that uses a textual language called Pig Latin and produces sequences of MapReduce programs. It makes it easier to write, understand and maintain programs that conduct data analysis tasks in parallel. Operating System: OS Independent.
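For readers unfamiliar with the MapReduce model that Pig targets, the following Python sketch shows the three phases on a word count. It is plain Python rather than Pig Latin or the Hadoop API, runs in a single process, and its function names are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, as a mapper would for each input record."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input records.
lines = ["big data file systems", "big data tools"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many nodes. A tool like Pig spares the author from writing these phases by hand for each analysis task.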

R

R is a programming language and an environment for statistical computing and graphics. It is similar to S, the language and environment developed at Bell Laboratories. The environment includes a set of tools that make it easier to manipulate data, perform calculations and generate charts and graphs. Operating System: Windows, Linux, OS X.
