scala - Spark RDD Lifecycle: will an RDD be reclaimed once it goes out of scope?


In a method, I create a new RDD and cache it. Will Spark unpersist the RDD automatically once the RDD goes out of scope?

I would think so, but what actually happens?
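For example, something like this (a minimal sketch of what I mean; the SparkContext sc and the data are just placeholders):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    class Example {
        // The cached RDD is a local variable that goes out of scope when the method returns.
        static long countEvens(JavaSparkContext sc) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));
            numbers.cache();                                        // mark the RDD for caching
            long evens = numbers.filter(n -> n % 2 == 0).count();   // action materializes and caches it
            return evens;                                           // 'numbers' goes out of scope here -- is it unpersisted?
        }
    }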

No, it won't be unpersisted automatically.

Why? Because even though it may look like an RDD is no longer needed, Spark's model is not to materialize an RDD until a transformation actually needs it, so it is hard for Spark to tell "I won't need this RDD anymore". Even for you it can be tricky, because of situations like the following:

    JavaRDD<T> rddUnion = sc.parallelize(new ArrayList<T>()); // create an empty RDD for merging
    for (int i = 0; i < 10; i++) {
        JavaRDD<T2> rdd = sc.textFile(inputFileNames[i]);
        rdd.cache(); // since it is used twice, cache it
        rdd.map(...).filter(...).saveAsTextFile(outputFileNames[i]); // transform and save; rdd materializes
        rddUnion = rddUnion.union(rdd.map(...).filter(...)); // transform to T and merge by union
        rdd.unpersist(); // seems no longer needed (but it is actually still needed)
    }
    // Here rddUnion materializes and needs all 10 rdds, which have already been unpersisted.
    // So all 10 rdds will be rebuilt.
    rddUnion.saveAsTextFile(mergedFileName);

Credit for the code sample goes to the spark-user mailing list.
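The fix implied by the last comment is to keep the cached RDDs around until rddUnion has actually materialized, and only then unpersist them. A minimal sketch of that pattern (concrete types, hypothetical file names, and toy map/filter logic stand in for the placeholders in the sample above):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    class DeferredUnpersist {
        static void run(JavaSparkContext sc, String[] inputFileNames,
                        String[] outputFileNames, String mergedFileName) {
            List<JavaRDD<String>> cached = new ArrayList<>();         // keep handles so we can unpersist later
            JavaRDD<String> rddUnion = sc.parallelize(new ArrayList<String>());
            for (int i = 0; i < inputFileNames.length; i++) {
                JavaRDD<String> rdd = sc.textFile(inputFileNames[i]);
                rdd.cache();                                           // used twice below
                rdd.map(String::trim).filter(s -> !s.isEmpty())
                   .saveAsTextFile(outputFileNames[i]);                // first use: materializes and caches rdd
                rddUnion = rddUnion.union(
                    rdd.map(String::toUpperCase).filter(s -> s.startsWith("A")));
                cached.add(rdd);                                       // do NOT unpersist yet
            }
            rddUnion.saveAsTextFile(mergedFileName);                   // second use: reads the cached rdds
            for (JavaRDD<String> rdd : cached) {
                rdd.unpersist();                                       // safe now: nothing will recompute them
            }
        }
    }

The key design point is that unpersist() only happens after the action on rddUnion, so the union is built from the cached data instead of forcing all the input RDDs to be recomputed.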

