An Overview of Apache’s Hadoop

By Chandra Heitzman – June 14, 2012
Posted in: Article, Review, Submitted

Developed by the Apache Software Foundation, Hadoop is a Java-based, open-source platform for processing massive amounts of data in a distributed computing environment. Hadoop’s key innovations lie in its ability to store and access massive amounts of data across thousands of computers and to present that data coherently.

Though data warehouses can store data on a similar scale, they are costly and do not allow for effective exploration of huge amounts of disparate data. Hadoop addresses this limitation by taking a data query and distributing it over multiple computer clusters. By spreading the workload over thousands of loosely networked computers (nodes), Hadoop can examine and present petabytes of heterogeneous data in a meaningful format. At the same time, the software scales in both directions and can operate just as well on a single server or a small network.

Hadoop’s distributed computing abilities derive from two software frameworks: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS facilitates rapid data transfer between computer nodes and allows operation to continue even when a node fails. MapReduce distributes all data processing across these nodes, reducing the workload on each individual computer and allowing computations and analysis beyond the capabilities of a single computer or network. For example, Facebook uses MapReduce to analyze user behavior and track advertising performance across roughly 21 petabytes of information. Other prominent users include IBM, Yahoo, and Google, typically for search engines and advertising.
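
To make the two phases concrete, here is a minimal sketch of the classic word-count job written against Hadoop’s Java MapReduce API. It assumes a Hadoop 2.x or later release, and the input and output paths are placeholders passed on the command line rather than anything taken from this article. The map phase runs on the nodes holding the input blocks and emits (word, 1) pairs; the reduce phase gathers the pairs for each word from all nodes and sums them into a single result set.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: each node tokenizes its share of the input and emits (word, 1) pairs.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: the counts for each word are gathered from all nodes and
      // summed, so the distributed work comes back as a single result set.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on each node to cut network traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS (placeholder)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS (placeholder)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }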

Using Hadoop effectively requires understanding that it is designed to run on a large number of machines that share neither hardware nor memory. When a financial institution wants to analyze data from dozens of servers, Hadoop breaks the data apart and distributes it throughout those servers. Hadoop also replicates the data, preventing loss in the event of most failures. In addition, MapReduce speeds up processing by dividing large-scale data analysis across all servers or computers in a cluster, yet it still answers the query in a single result set.
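
On the storage side, the following sketch uses Hadoop’s Java FileSystem API to load a file into HDFS and raise its replication factor. The file names and the replication factor of three are illustrative assumptions, not details taken from the article.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLoadExample {
      public static void main(String[] args) throws Exception {
        // Picks up the cluster's core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("transactions.csv");        // hypothetical local file
        Path remote = new Path("/data/transactions.csv"); // hypothetical HDFS destination

        // HDFS splits the file into blocks and spreads them across the cluster's nodes.
        fs.copyFromLocalFile(local, remote);

        // Each block is also replicated (here to three nodes), so a failed node does not mean lost data.
        fs.setReplication(remote, (short) 3);

        fs.close();
      }
    }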

Though Hadoop offers a scalable approach to data storage and analysis, it is not meant as a substitute for a standard database (e.g. a SQL Server 2012 database). Hadoop stores data in files but does not index them for easy retrieval. Finding the data requires a MapReduce job, which takes far longer than is acceptable for simple database operations. Hadoop functions best when the dataset is too large for conventional storage and too diverse for easy analysis.
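
For example, a lookup that an indexed database answers almost immediately (say, all records belonging to one customer) becomes a full scan in Hadoop: a MapReduce job must read every record and discard the ones that do not match. The mapper below sketches such a scan, assuming hypothetical comma-separated records with the customer id in the first field.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits only the records belonging to one customer; Hadoop still feeds every
    // record in the dataset through this mapper, because nothing is indexed.
    public class CustomerFilterMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

      private static final String TARGET_CUSTOMER_ID = "42";  // hypothetical lookup key

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length > 0 && fields[0].equals(TARGET_CUSTOMER_ID)) {
          context.write(NullWritable.get(), value);  // keep the matching record, drop the rest
        }
      }
    }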

The digitization of information has increased ninefold in the last five years, with companies spending an estimated four trillion dollars worldwide on data management in 2011. Doug Cutting, the creator of Hadoop and chief architect at Cloudera, estimates that 1.8 zettabytes (1.8 trillion gigabytes) of data were created and replicated in that same year. Ninety percent of this information is unstructured, and Hadoop and applications like it offer the only current means of keeping such data comprehensible.


For more information about SQL Server 2012 development, visit Magenic, one of the leading software development companies, which has provided innovative custom software development to meet unique business challenges for some of the most recognized companies and organizations in the nation.

Article Source: http://EzineArticles.com/?expert=Chandra_Heitzman

Article Source: http://EzineArticles.com/6928548

Tags: apache software foundation, big data, computer clusters, computer nodes, data query, data warehouses, Facebook, google, hadoop, hdfs, individual computer, massive amounts, open source platform, rapid data transfer, search engines



All site content, except where otherwise noted, is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.