groovy - Execution of bubble sort is 5 times slower with --indy -
i wrote implementation of bubble sort play around bit groovy , see if --indy
have noticeable effect on performance.
essentially, sorts list of 1 thousand random integers 1 thousand times , measures average execution time sorting list.
half of time list integer[]
, other half it's arraylist<integer>
.
the results confusing me:
$ groovyc bubblesort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. bubblesort average: 22.926ms min: 11.202ms [...] 26.48s user 0.84s system 109% cpu 25.033 total $ groovyc --indy bubblesort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. bubblesort average: 119.766ms min: 68.251ms [...] 166.05s user 1.52s system 135% cpu 2:03.82 total
looking @ cpu usage when benchmarks running, cpu usage lot higher when compiled --indy
without.
this intrigued me ran benchmarks again - time yourkit agent , cpu tracing enabled. here recorded call trees:
and here performance charts - note time scale different because --indy
code slower.
as can seen, cpu usage stabilises @ 100% of 1 core (12.5% in graph) when compiled without --indy
varies wildly between 12.5% , ~35% when compiled --indy
. what's more confusing yourkit reports 1 live thread (and code uses main thread), still manages keep 2 , half cores occupied.
the code compiled --indy
uses lot of kernel time @ start, though drops , stabilises @ 0% after while - @ point code seems speed bit (heap usage growth-rate increases) , cpu usage increases.
can explain behaviour me?
versions:
$ groovy -v groovy version: 2.4.3 jvm: 1.8.0_45 vendor: oracle corporation os: linux $ java -version java version "1.8.0_45" java(tm) se runtime environment (build 1.8.0_45-b14) java hotspot(tm) 64-bit server vm (build 25.45-b02, mixed mode)
bubblesort.groovy:
class bubblesort { final def array bubblesort(final def array) { this.array = array } private void swap(int a, int b) { def tmp = array[a]; array[a] = array[b] array[b] = tmp; } private void rise(int index) { for(int = index; > 0; i--) { if(array[i] < array[i - 1]) { swap(i, i-1) } else { break } } } void sort() { for(int = 1; < array.size(); i++) { rise } } final static random random = new random() static void main(string[] args) { def n = 1000 def size = 1000 // warm dobenchmark 100, size def results = dobenchmark n, size printf("average: %.3fms%n", results.total / 1e6 / n) printf("min: %.3fms%n", results.min / 1e6) } private static def dobenchmark(int n, int size) { long total = 0 long min = long.max_value n.times { def array = (1..size).collect { random.nextint() } if(it % 2) { array = array integer[] } def start = system.nanotime() new bubblesort<integer>(array).sort() def end = system.nanotime() def time = end - start total += time min = math.min min, time } return [total: total, min: min] } }
i'm not interested in optimisations on bubble sort implementation unless related invokedynamic
behaviour - aim here isn't write best performing bubble sort possible come grips why --indy
has such large negative performance impact.
update:
i converted code jruby , tried same thing , results similar, although jruby isn't quick without invokedynamic
:
$ java_opts="-djruby.compile.invokedynamic=false" jruby bubblesort.rb average: 78.714ms min: 35.000ms $ java_opts="-djruby.compile.invokedynamic=true" jruby bubblesort.rb average: 136.287ms min: 92.000ms
update 2:
if remove code changes list integer[]
half time performance increases significantly, though it's still faster without --indy
:
$ groovyc bubblesort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. bubblesort average: 29.794ms min: 26.577ms $ groovyc --indy bubblesort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. bubblesort average: 37.506ms min: 33.555ms
if same jruby, invokedynamic
faster:
$ java_opts="-djruby.compile.invokedynamic=false" jruby bubblesort.rb average: 34.388ms min: 31.000ms $ java_opts="-djruby.compile.invokedynamic=true" jruby bubblesort.rb average: 20.785ms min: 18.000ms
the answer easy quite easy, groovy not have pic yet.
... or can have inline cache of size 1. means every time change list array type, invalidate caches did exist before , cached version thrown away. normal groovy same indy, normal groovy uses runtime generated classes , indy uses invokedynamic/lambda forms. lambda forms slower, while peak performance better. let hotspot start scratch of method calls, preventing apply optimizations time. of course not fault, fault of groovy not having pic yet. , make clear... no problem of language, did not yet implement.
jruby on other hand has pic , has not suffer overhead of creating new method handles time.