Using Spark DataFrames for large scale data science

Posted by bob on Mar 26, 2015 3:46 PM EDT
Opensource.com

When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API: tasks that once took thousands of lines of code to express could be reduced to dozens.
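The excerpt stops before showing what this shift looks like in practice. The sketch below, which is not taken from the article, contrasts an RDD-style aggregation with the equivalent DataFrame expression in PySpark; the sample data, column names, and SQLContext setup are illustrative assumptions rather than code from the original post.

```python
# A minimal sketch (illustrative, not from the article): the same
# per-department average computed with RDD transformations and with
# the DataFrame API introduced in Spark 1.3.
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext("local[*]", "dataframe-sketch")
sqlContext = SQLContext(sc)

# Hypothetical sample data: (department, salary) pairs.
records = [("eng", 100), ("eng", 120), ("sales", 90)]

# RDD approach: functional transformations on a distributed collection.
rdd = sc.parallelize(records)
avg_by_dept = (rdd.mapValues(lambda s: (s, 1))
                  .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                  .mapValues(lambda t: t[0] / float(t[1])))
print(avg_by_dept.collect())

# DataFrame approach: the same aggregation expressed declaratively,
# leaving the execution plan to Spark's optimizer.
df = sqlContext.createDataFrame([Row(dept=d, salary=s) for d, s in records])
df.groupBy("dept").avg("salary").show()

sc.stop()
```

The RDD version spells out how to carry and combine partial sums, while the DataFrame version only states what to compute, which is the simplification the article is describing.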

