C++ - Get precise line/column debug info from LLVM IR


I am trying to locate instructions in an LLVM pass by line and column number (as reported by a third-party tool) in order to instrument them. To achieve this, I am compiling my source files with clang -g -O0 -emit-llvm and looking for the information in the metadata using this code:

    const DebugLoc &location = instruction->getDebugLoc();
    // location.getLine()
    // location.getCol()

Unfortunately, this information is very imprecise. Consider the following implementation of the Fibonacci function:

    unsigned fib(unsigned n) {
        if (n < 2)
            return n;

        unsigned f = fib(n - 1) + fib(n - 2);
        return f;
    }

I would like to locate the single LLVM instruction that corresponds to the assignment unsigned f = ... in the resulting LLVM IR. I am not interested in the calculations on the right-hand side. The generated LLVM block, including the relevant debug metadata, is:

    [...]

    if.end:                                           ; preds = %entry
      call void @llvm.dbg.declare(metadata !{i32* %f}, metadata !17), !dbg !18
      %2 = load i32* %n.addr, align 4, !dbg !19
      %sub = sub i32 %2, 1, !dbg !19
      %call = call i32 @fib(i32 %sub), !dbg !19
      %3 = load i32* %n.addr, align 4, !dbg !20
      %sub1 = sub i32 %3, 2, !dbg !20
      %call2 = call i32 @fib(i32 %sub1), !dbg !20
      %add = add i32 %call, %call2, !dbg !20
      store i32 %add, i32* %f, align 4, !dbg !20
      %4 = load i32* %f, align 4, !dbg !21
      store i32 %4, i32* %retval, !dbg !21
      br label %return, !dbg !21

    [...]

    !17 = metadata !{i32 786688, metadata !4, metadata !"f", metadata !5, i32 5, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [f] [line 5]
    !18 = metadata !{i32 5, i32 11, metadata !4, null}
    !19 = metadata !{i32 5, i32 15, metadata !4, null}
    !20 = metadata !{i32 5, i32 28, metadata !4, null}
    !21 = metadata !{i32 6, i32 2, metadata !4, null}
    !22 = metadata !{i32 7, i32 1, metadata !4, null}

As you can see, the metadata !dbg !20 of the store instruction points to line 5, column 28, which is the call fib(n - 2). Even worse, the add operation and the subtraction n - 2 both point to that function call as well, since they are also tagged with !dbg !20.

Interestingly, the Clang AST emitted by clang -Xclang -ast-dump -fsyntax-only has all of that information. Thus, I suspect that it is somehow lost during the code generation phase. It seems that during code generation Clang reaches some internal sequence point and associates all following instructions with that position until the next sequence point (e.g. a function call) occurs. For completeness, here is the declaration statement in the AST:

    |-DeclStmt 0x7ffec3869f48 <line:5:2, col:38>
    | `-VarDecl 0x7ffec382d680 <col:2, col:37> col:11 used f 'unsigned int' cinit
    |   `-BinaryOperator 0x7ffec3869f20 <col:15, col:37> 'unsigned int' '+'
    |     |-CallExpr 0x7ffec382d7e0 <col:15, col:24> 'unsigned int'
    |     | |-ImplicitCastExpr 0x7ffec382d7c8 <col:15> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
    |     | | `-DeclRefExpr 0x7ffec382d6d8 <col:15> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
    |     | `-BinaryOperator 0x7ffec382d778 <col:19, col:23> 'unsigned int' '-'
    |     |   |-ImplicitCastExpr 0x7ffec382d748 <col:19> 'unsigned int' <LValueToRValue>
    |     |   | `-DeclRefExpr 0x7ffec382d700 <col:19> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
    |     |   `-ImplicitCastExpr 0x7ffec382d760 <col:23> 'unsigned int' <IntegralCast>
    |     |     `-IntegerLiteral 0x7ffec382d728 <col:23> 'int' 1
    |     `-CallExpr 0x7ffec3869ef0 <col:28, col:37> 'unsigned int'
    |       |-ImplicitCastExpr 0x7ffec3869ed8 <col:28> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
    |       | `-DeclRefExpr 0x7ffec3869e10 <col:28> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
    |       `-BinaryOperator 0x7ffec3869eb0 <col:32, col:36> 'unsigned int' '-'
    |         |-ImplicitCastExpr 0x7ffec3869e80 <col:32> 'unsigned int' <LValueToRValue>
    |         | `-DeclRefExpr 0x7ffec3869e38 <col:32> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
    |         `-ImplicitCastExpr 0x7ffec3869e98 <col:36> 'unsigned int' <IntegralCast>
    |           `-IntegerLiteral 0x7ffec3869e60 <col:36> 'int' 2

Is it possible either to improve the accuracy of the debug metadata, or to resolve the corresponding instruction in a different way? Ideally, I would like to leave Clang untouched, i.e. not modify and recompile it.

It turns out that this has been fixed with the introduction of MDLocation in LLVM release 3.6.0. At the time of writing, the current Clang compiler shipped with the Xcode command line tools still generates the former "buggy" location information, even though its version string says Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn). After downloading the pre-built binary, the generated LLVM IR looks like this:

    [...]

    ; <label>:7                                       ; preds = %0
      call void @llvm.dbg.declare(metadata i32* %f, metadata !21, metadata !14), !dbg !22
      %8 = load i32* %2, align 4, !dbg !23
      %9 = sub i32 %8, 1, !dbg !23
      %10 = call i32 @fib(i32 %9), !dbg !24
      %11 = load i32* %2, align 4, !dbg !25
      %12 = sub i32 %11, 2, !dbg !25
      %13 = call i32 @fib(i32 %12), !dbg !26
      %14 = add i32 %10, %13, !dbg !24
      store i32 %14, i32* %f, align 4, !dbg !22
      %15 = load i32* %f, align 4, !dbg !27
      store i32 %15, i32* %1, !dbg !28
      br label %16, !dbg !28

    [...]

    !22 = !MDLocation(line: 5, column: 14, scope: !4)
    !23 = !MDLocation(line: 5, column: 22, scope: !4)
    !24 = !MDLocation(line: 5, column: 18, scope: !4)
    !25 = !MDLocation(line: 5, column: 35, scope: !4)
    !26 = !MDLocation(line: 5, column: 31, scope: !4)
    !27 = !MDLocation(line: 6, column: 12, scope: !4)
    !28 = !MDLocation(line: 6, column: 5, scope: !4)

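The location metadata now points to the beginning of each expression. For the assignment, for instance, this is the left-hand side specifier f at line 5, column 14. Unfortunately, a location can still be ambiguous: !dbg !24 is attached both to the first call to fib and to the add instruction.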
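To make the remaining ambiguity concrete, here is a minimal sketch that resolves a line/column pair against the textual IR rather than inside a pass. It uses only the standard library; findInstructionsAt is a hypothetical helper, not an LLVM API, and it assumes the LLVM 3.6 textual syntax shown above (a trailing ", !dbg !N" on each instruction and "!N = !MDLocation(line: ..., column: ...)" definitions). Skipping the llvm.dbg.* intrinsics is my own choice, since they describe variables rather than generated code.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of LLVM): returns every instruction line in
// the textual IR whose !dbg attachment resolves to the given line and column.
std::vector<std::string> findInstructionsAt(const std::string &ir,
                                            unsigned line, unsigned column) {
    // Line 0 denotes "no line information", so it cannot identify a statement.
    assert(line != 0);

    // Matches e.g. "!22 = !MDLocation(line: 5, column: 14, scope: !4)".
    static const std::regex locationDef(
        R"(^\s*!(\d+) = !MDLocation\(line: (\d+), column: (\d+))");
    // Matches the attachment at the end of an instruction: ", !dbg !22".
    static const std::regex dbgUse(R"(, !dbg !(\d+)\s*$)");

    // Metadata id -> (line, column).
    std::map<std::string, std::pair<unsigned, unsigned>> locations;
    // (metadata id, instruction text), in order of appearance.
    std::vector<std::pair<std::string, std::string>> uses;

    std::istringstream stream(ir);
    std::string text;
    std::smatch match;
    while (std::getline(stream, text)) {
        if (std::regex_search(text, match, locationDef)) {
            const unsigned defLine =
                static_cast<unsigned>(std::stoul(match[2].str()));
            const unsigned defColumn =
                static_cast<unsigned>(std::stoul(match[3].str()));
            locations[match[1].str()] = std::make_pair(defLine, defColumn);
        } else if (text.find("@llvm.dbg.") == std::string::npos &&
                   std::regex_search(text, match, dbgUse)) {
            // Debug intrinsics are skipped; everything else is recorded.
            uses.emplace_back(match[1].str(), text);
        }
    }

    // Instructions precede the metadata definitions in the file, so the
    // attachments can only be resolved after the whole input has been read.
    std::vector<std::string> result;
    for (const auto &use : uses) {
        const auto found = locations.find(use.first);
        if (found != locations.end() && found->second.first == line &&
            found->second.second == column) {
            result.push_back(use.second);
        }
    }
    return result;
}
```

Applied to the block above, a query for line 5, column 14 yields only the store instruction, while line 5, column 18 yields two instructions (the first call to fib and the add). A tool relying on positions alone would therefore still need a second criterion, such as the opcode, to pick one of them.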

There has been one more change: calls to getLine() and getCol() will fail if no debug metadata is attached to the instruction. The DebugLoc class offers a convenient way to check for this:

    const DebugLoc &location = instruction->getDebugLoc();
    if (location) {
        // location.getLine()
        // location.getCol()
    } else {
        // No location metadata available
    }
