Scala/Spark: mapping [String, List[String]] to String pairs
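I have an RDD of type RDD[(String, List[String])], i.e. pairs of a string a and a list of strings bs, and I want to map it to RDD[(String, String)] so that each element b of the list is paired with its string a. What is an efficient way to do this?
I am using flatMapValues. Is there a more efficient way? (I have a huge dataset.)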
rdd.flatMapValues(identity)
should get the job done.
That should be a pretty efficient and simple way to do it. To optimize performance further, you could compare it with an implementation using mapPartitions and pick the better of the two. I wouldn't expect a huge difference, though, because in both cases the wrapper (tuple) objects need to be created anyway.
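To make the semantics concrete, here is a minimal local sketch using plain Scala collections instead of an RDD (so it runs without Spark). The object name `FlatMapValuesSketch` and the sample data are hypothetical. It pairs every list element with its key, which is what `flatMapValues(identity)` does: `identity` returns the list itself, and each of its elements becomes a separate (key, element) pair.

```scala
// Local sketch (plain Scala collections, no Spark) of the output of
// rdd.flatMapValues(identity) on an RDD[(String, List[String])].
object FlatMapValuesSketch {
  // Pair each element b of the value list with its key a.
  // A key whose list is empty produces no pairs at all.
  def flatten(rows: Seq[(String, List[String])]): Seq[(String, String)] =
    rows.flatMap { case (a, bs) => bs.map(b => (a, b)) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, for illustration only.
    val rows = Seq("x" -> List("1", "2"), "y" -> List("3"), "z" -> Nil)
    println(flatten(rows)) // List((x,1), (x,2), (y,3))
  }
}
```

Note that the key "z" disappears from the result because its list is empty; `flatMapValues` behaves the same way on an RDD.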
rdd.mapPartitions(iter => iter.flatMap(elem => elem._2.map(v => (elem._1, v))))
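The following is a rough local sketch of what the mapPartitions variant does. Spark calls the function once per partition with an Iterator over that partition's elements. Here partitions are simulated with a Seq of Lists, and the helper names (`MapPartitionsSketch`, `perPartition`, `mapPartitions`) are hypothetical stand-ins, not Spark API.

```scala
// Local sketch of the mapPartitions approach; no Spark involved.
object MapPartitionsSketch {
  // The per-partition function from the answer. Iterator.flatMap is lazy,
  // so pairs are produced one at a time instead of being materialized up front.
  def perPartition(iter: Iterator[(String, List[String])]): Iterator[(String, String)] =
    iter.flatMap(elem => elem._2.map(v => (elem._1, v)))

  // Hypothetical stand-in for rdd.mapPartitions(perPartition): apply the
  // function to each simulated partition, keeping results grouped per partition.
  def mapPartitions(partitions: Seq[List[(String, List[String])]]): Seq[List[(String, String)]] =
    partitions.map(p => perPartition(p.iterator).toList)

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions of sample data.
    val partitions = Seq(
      List("x" -> List("1", "2")),
      List("y" -> List("3"), "z" -> List("4", "5"))
    )
    println(mapPartitions(partitions))
    // List(List((x,1), (x,2)), List((y,3), (z,4), (z,5)))
  }
}
```

One design difference worth knowing: flatMapValues keeps the RDD's partitioner, while mapPartitions drops it unless you pass `preservesPartitioning = true` (safe here because the keys are unchanged). As far as I know, Spark's own flatMapValues is implemented as essentially this per-partition iterator flatMap, which is another reason not to expect a large performance difference between the two.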