groovy - Execution of bubble sort is 5 times slower with --indy -


i wrote implementation of bubble sort play around bit groovy , see if --indy have noticeable effect on performance.

essentially, sorts list of 1 thousand random integers 1 thousand times , measures average execution time sorting list.

half of time list integer[], other half it's arraylist<integer>.

the results confusing me:

$ groovyc bubblesort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. bubblesort average: 22.926ms min: 11.202ms [...] 26.48s user 0.84s system 109% cpu 25.033 total  $ groovyc --indy bubblesort.groovy $ time java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. bubblesort average: 119.766ms min: 68.251ms [...] 166.05s user 1.52s system 135% cpu 2:03.82 total 

looking @ cpu usage when benchmarks running, cpu usage lot higher when compiled --indy without.

cpu usage

this intrigued me ran benchmarks again - time yourkit agent , cpu tracing enabled. here recorded call trees:

without --indy: call tree without <code>--indy</code>

with --indy: call tree <code>--indy</code>

and here performance charts - note time scale different because --indy code slower.

without --indy (1s scale): performance without <code>--indy</code>

with --indy (60s scale): performance <code>--indy</code>

as can seen, cpu usage stabilises @ 100% of 1 core (12.5% in graph) when compiled without --indy varies wildly between 12.5% , ~35% when compiled --indy. what's more confusing yourkit reports 1 live thread (and code uses main thread), still manages keep 2 , half cores occupied.

the code compiled --indy uses lot of kernel time @ start, though drops , stabilises @ 0% after while - @ point code seems speed bit (heap usage growth-rate increases) , cpu usage increases.

can explain behaviour me?

versions:

$ groovy -v groovy version: 2.4.3 jvm: 1.8.0_45 vendor: oracle corporation os: linux  $ java -version java version "1.8.0_45" java(tm) se runtime environment (build 1.8.0_45-b14) java hotspot(tm) 64-bit server vm (build 25.45-b02, mixed mode) 

bubblesort.groovy:

class bubblesort {     final def array      bubblesort(final def array) {         this.array = array     }      private void swap(int a, int b) {         def tmp = array[a];         array[a] = array[b]         array[b] = tmp;     }      private void rise(int index) {         for(int = index; > 0; i--) {             if(array[i] < array[i - 1]) {                 swap(i, i-1)             } else {                 break             }         }     }      void sort() {         for(int = 1; < array.size(); i++) {             rise         }     }      final static random random = new random()      static void main(string[] args) {         def n = 1000         def size = 1000         // warm         dobenchmark 100, size         def results = dobenchmark n, size         printf("average: %.3fms%n", results.total / 1e6 / n)         printf("min: %.3fms%n", results.min / 1e6)     }      private static def dobenchmark(int n, int size) {         long total = 0         long min = long.max_value         n.times {             def array = (1..size).collect { random.nextint() }             if(it % 2) {                 array = array integer[]             }             def start = system.nanotime()             new bubblesort<integer>(array).sort()             def end = system.nanotime()             def time = end - start             total += time             min = math.min min, time         }         return [total: total, min: min]     } } 

i'm not interested in optimisations on bubble sort implementation unless related invokedynamic behaviour - aim here isn't write best performing bubble sort possible come grips why --indy has such large negative performance impact.

update:

i converted code jruby , tried same thing , results similar, although jruby isn't quick without invokedynamic:

$ java_opts="-djruby.compile.invokedynamic=false" jruby bubblesort.rb average: 78.714ms min: 35.000ms  $ java_opts="-djruby.compile.invokedynamic=true" jruby bubblesort.rb average: 136.287ms min: 92.000ms 

update 2:

if remove code changes list integer[] half time performance increases significantly, though it's still faster without --indy:

$ groovyc bubblesort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3.jar:. bubblesort average: 29.794ms min: 26.577ms  $ groovyc --indy bubblesort.groovy $ java -cp ~/.gvm/groovy/current/embeddable/groovy-all-2.4.3-indy.jar:. bubblesort average: 37.506ms min: 33.555ms 

if same jruby, invokedynamic faster:

$ java_opts="-djruby.compile.invokedynamic=false" jruby bubblesort.rb average: 34.388ms min: 31.000ms  $ java_opts="-djruby.compile.invokedynamic=true" jruby bubblesort.rb average: 20.785ms min: 18.000ms 

the answer easy quite easy, groovy not have pic yet.

... or can have inline cache of size 1. means every time change list array type, invalidate caches did exist before , cached version thrown away. normal groovy same indy, normal groovy uses runtime generated classes , indy uses invokedynamic/lambda forms. lambda forms slower, while peak performance better. let hotspot start scratch of method calls, preventing apply optimizations time. of course not fault, fault of groovy not having pic yet. , make clear... no problem of language, did not yet implement.

jruby on other hand has pic , has not suffer overhead of creating new method handles time.


Popular posts from this blog

c# - ODP.NET Oracle.ManagedDataAccess causes ORA-12537 network session end of file -

matlab - Compression and Decompression of ECG Signal using HUFFMAN ALGORITHM -

utf 8 - split utf-8 string into bytes in python -