C++ - Get precise line/column debug info from LLVM IR
I'm trying to locate instructions in an LLVM pass by line and column number (as reported by a third-party tool) in order to instrument them. To achieve this, I'm compiling my source files with clang -g -O0 -emit-llvm and looking for the information in the metadata using this code:

    const DebugLoc &location = instruction->getDebugLoc();
    // location.getLine()
    // location.getCol()

Unfortunately, this information is absolutely imprecise. Consider the following implementation of the Fibonacci function:

    unsigned fib(unsigned n) {
        if (n < 2)
            return n;

        unsigned f = fib(n - 1) + fib(n - 2);
        return f;
    }

I would like to locate the single LLVM instruction corresponding to the assignment unsigned f = ... in the resulting LLVM IR. I'm not interested in the calculations of the right-hand side. The generated LLVM block, including the relevant debug metadata, is:

    [...]
    if.end:                                           ; preds = %entry
      call void @llvm.dbg.declare(metadata !{i32* %f}, metadata !17), !dbg !18
      %2 = load i32* %n.addr, align 4, !dbg !19
      %sub = sub i32 %2, 1, !dbg !19
      %call = call i32 @fib(i32 %sub), !dbg !19
      %3 = load i32* %n.addr, align 4, !dbg !20
      %sub1 = sub i32 %3, 2, !dbg !20
      %call2 = call i32 @fib(i32 %sub1), !dbg !20
      %add = add i32 %call, %call2, !dbg !20
      store i32 %add, i32* %f, align 4, !dbg !20
      %4 = load i32* %f, align 4, !dbg !21
      store i32 %4, i32* %retval, !dbg !21
      br label %return, !dbg !21
    [...]
    !17 = metadata !{i32 786688, metadata !4, metadata !"f", metadata !5, i32 5, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [f] [line 5]
    !18 = metadata !{i32 5, i32 11, metadata !4, null}
    !19 = metadata !{i32 5, i32 15, metadata !4, null}
    !20 = metadata !{i32 5, i32 28, metadata !4, null}
    !21 = metadata !{i32 6, i32 2, metadata !4, null}
    !22 = metadata !{i32 7, i32 1, metadata !4, null}

As you can see, the metadata !dbg !20 of the store instruction points to line 5, column 28, which is the call to fib(n - 2). Even worse, the add operation and the subtraction n - 2 both point to that function call as well, identified by !dbg !20.
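To make the ambiguity concrete, here is a minimal stdlib-only sketch that works on the textual IR rather than on the LLVM API (it assumes the IR is available as text, e.g. from clang -S -emit-llvm). It resolves each trailing ", !dbg !N" through the old-style "!N = metadata !{i32 LINE, i32 COL, metadata !SCOPE, null}" nodes and lists every instruction at a given line and column. The names parseOldLocations and instructionsAt are hypothetical, and the regex only covers the exact metadata shape shown above.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// A (line, column) pair as stored in a debug location node.
struct SrcLoc {
    unsigned line;
    unsigned col;
};

// Maps metadata IDs such as "!20" to locations, using the pre-3.6 textual form
// "!N = metadata !{i32 LINE, i32 COL, metadata !SCOPE, null}".
std::map<std::string, SrcLoc> parseOldLocations(const std::string &ir) {
    static const std::regex re(
        R"((![0-9]+) = metadata !\{i32 ([0-9]+), i32 ([0-9]+), metadata ![0-9]+, null\})");
    std::map<std::string, SrcLoc> locs;
    for (std::sregex_iterator it(ir.begin(), ir.end(), re), end; it != end; ++it) {
        const std::smatch &m = *it;
        locs[m[1].str()] = SrcLoc{static_cast<unsigned>(std::stoul(m[2].str())),
                                  static_cast<unsigned>(std::stoul(m[3].str()))};
    }
    return locs;
}

// Returns every instruction whose trailing ", !dbg !N" resolves to (line, col),
// with leading indentation stripped. More than one hit means the location
// alone cannot identify a single instruction.
std::vector<std::string> instructionsAt(const std::string &ir, unsigned line, unsigned col) {
    assert(line > 0 && "source lines are 1-based");
    const std::map<std::string, SrcLoc> locs = parseOldLocations(ir);
    static const std::regex dbgRe(R"(, !dbg (![0-9]+)\s*$)");
    std::vector<std::string> hits;
    std::istringstream in(ir);
    std::string text;
    while (std::getline(in, text)) {
        std::smatch m;
        if (!std::regex_search(text, m, dbgRe)) continue;
        const auto it = locs.find(m[1].str());
        if (it != locs.end() && it->second.line == line && it->second.col == col)
            hits.push_back(text.substr(text.find_first_not_of(" \t")));
    }
    return hits;
}
```

Fed the whole if.end block above with line 5, column 28, this returns five instructions: the load of n, the subtraction n - 2, the second call to fib, the add and the store.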
Interestingly, the Clang AST emitted by clang -Xclang -ast-dump -fsyntax-only has all this information. Thus, I suspect that it is somehow lost during the code generation phase. It seems that during code generation Clang reaches some internal sequence point and associates all following instructions with that position until the next sequence point (e.g. a function call) occurs. For completeness, here is the declaration statement in the AST:

    |-DeclStmt 0x7ffec3869f48 <line:5:2, col:38>
    | `-VarDecl 0x7ffec382d680 <col:2, col:37> col:11 used f 'unsigned int' cinit
    |   `-BinaryOperator 0x7ffec3869f20 <col:15, col:37> 'unsigned int' '+'
    |     |-CallExpr 0x7ffec382d7e0 <col:15, col:24> 'unsigned int'
    |     | |-ImplicitCastExpr 0x7ffec382d7c8 <col:15> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
    |     | | `-DeclRefExpr 0x7ffec382d6d8 <col:15> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
    |     | `-BinaryOperator 0x7ffec382d778 <col:19, col:23> 'unsigned int' '-'
    |     |   |-ImplicitCastExpr 0x7ffec382d748 <col:19> 'unsigned int' <LValueToRValue>
    |     |   | `-DeclRefExpr 0x7ffec382d700 <col:19> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
    |     |   `-ImplicitCastExpr 0x7ffec382d760 <col:23> 'unsigned int' <IntegralCast>
    |     |     `-IntegerLiteral 0x7ffec382d728 <col:23> 'int' 1
    |     `-CallExpr 0x7ffec3869ef0 <col:28, col:37> 'unsigned int'
    |       |-ImplicitCastExpr 0x7ffec3869ed8 <col:28> 'unsigned int (*)(unsigned int)' <FunctionToPointerDecay>
    |       | `-DeclRefExpr 0x7ffec3869e10 <col:28> 'unsigned int (unsigned int)' Function 0x7ffec382d490 'fib' 'unsigned int (unsigned int)'
    |       `-BinaryOperator 0x7ffec3869eb0 <col:32, col:36> 'unsigned int' '-'
    |         |-ImplicitCastExpr 0x7ffec3869e80 <col:32> 'unsigned int' <LValueToRValue>
    |         | `-DeclRefExpr 0x7ffec3869e38 <col:32> 'unsigned int' lvalue ParmVar 0x7ffec382d3d0 'n' 'unsigned int'
    |         `-ImplicitCastExpr 0x7ffec3869e98 <col:36> 'unsigned int' <IntegralCast>
    |           `-IntegerLiteral 0x7ffec3869e60 <col:36> 'int' 2

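As a rough illustration of why the AST is precise enough, the sketch below copies the column ranges of the main nodes on line 5 from the dump by hand and picks the innermost (narrowest) range that encloses a given column. It does not use any Clang API; AstRange and innermostAt are hypothetical names, and the sketch assumes all ranges lie on the same source line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One AST node reduced to a label and its [begin, end] column range.
struct AstRange {
    std::string kind;
    unsigned begin;
    unsigned end;
};

// Returns the narrowest range containing `col` (the innermost node),
// or nullptr if no range covers that column.
const AstRange *innermostAt(const std::vector<AstRange> &nodes, unsigned col) {
    const AstRange *best = nullptr;
    for (const AstRange &n : nodes) {
        assert(n.begin <= n.end && "ranges must be well-formed");
        if (col < n.begin || col > n.end) continue;
        if (!best || n.end - n.begin < best->end - best->begin) best = &n;
    }
    return best;
}
```

With the ranges from the dump, column 28 resolves to the CallExpr for fib(n - 2), while column 13 (the = sign) only falls inside the VarDecl, so the declaration and the right-hand side stay distinguishable at this level.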
Is it possible either to improve the accuracy of the debug metadata, or to resolve the corresponding instruction in a different way? Ideally, I would like to leave Clang untouched, i.e. not modify and recompile it.
It turns out that this has been fixed with the introduction of MDLocation in LLVM release 3.6.0. At the time of writing, the current Clang compiler shipped with the Xcode command line tools still generates the former "buggy" location information, even though its version string says Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn). After downloading a pre-built binary, the generated LLVM IR looks like this:

    [...]
    ; <label>:7                                       ; preds = %0
      call void @llvm.dbg.declare(metadata i32* %f, metadata !21, metadata !14), !dbg !22
      %8 = load i32* %2, align 4, !dbg !23
      %9 = sub i32 %8, 1, !dbg !23
      %10 = call i32 @fib(i32 %9), !dbg !24
      %11 = load i32* %2, align 4, !dbg !25
      %12 = sub i32 %11, 2, !dbg !25
      %13 = call i32 @fib(i32 %12), !dbg !26
      %14 = add i32 %10, %13, !dbg !24
      store i32 %14, i32* %f, align 4, !dbg !22
      %15 = load i32* %f, align 4, !dbg !27
      store i32 %15, i32* %1, !dbg !28
      br label %16, !dbg !28
    [...]
    !22 = !MDLocation(line: 5, column: 14, scope: !4)
    !23 = !MDLocation(line: 5, column: 22, scope: !4)
    !24 = !MDLocation(line: 5, column: 18, scope: !4)
    !25 = !MDLocation(line: 5, column: 35, scope: !4)
    !26 = !MDLocation(line: 5, column: 31, scope: !4)
    !27 = !MDLocation(line: 6, column: 12, scope: !4)
    !28 = !MDLocation(line: 6, column: 5, scope: !4)

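One way to check this output without touching the LLVM API is to resolve each !dbg reference in the textual IR through its location node and list the instructions at a given position. The sketch below assumes the IR is available as text (e.g. from clang -S -emit-llvm); parseLocations and findByLocation are hypothetical helpers, calls to llvm.dbg.* intrinsics are skipped because the dbg.declare above shares !22 with the store, and the regex also accepts the !DILocation spelling used by later LLVM releases.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// A (line, column) pair taken from a location node.
struct SrcLoc {
    unsigned line;
    unsigned col;
};

// Maps IDs such as "!22" to locations for nodes printed as
// "!N = !MDLocation(line: L, column: C, ...)" or "!N = !DILocation(...)".
std::map<std::string, SrcLoc> parseLocations(const std::string &ir) {
    static const std::regex re(
        R"((![0-9]+) = !(?:MD|DI)Location\(line: ([0-9]+), column: ([0-9]+))");
    std::map<std::string, SrcLoc> locs;
    for (std::sregex_iterator it(ir.begin(), ir.end(), re), end; it != end; ++it) {
        const std::smatch &m = *it;
        locs[m[1].str()] = SrcLoc{static_cast<unsigned>(std::stoul(m[2].str())),
                                  static_cast<unsigned>(std::stoul(m[3].str()))};
    }
    return locs;
}

// Lists the instructions whose trailing ", !dbg !N" resolves to (line, col),
// ignoring llvm.dbg.* intrinsic calls, which are not real computation.
std::vector<std::string> findByLocation(const std::string &ir, unsigned line, unsigned col) {
    assert(line > 0 && "source lines are 1-based");
    const std::map<std::string, SrcLoc> locs = parseLocations(ir);
    static const std::regex dbgRe(R"(, !dbg (![0-9]+)\s*$)");
    std::vector<std::string> hits;
    std::istringstream in(ir);
    std::string text;
    while (std::getline(in, text)) {
        if (text.find("@llvm.dbg.") != std::string::npos) continue;  // skip debug intrinsics
        std::smatch m;
        if (!std::regex_search(text, m, dbgRe)) continue;
        const auto it = locs.find(m[1].str());
        if (it != locs.end() && it->second.line == line && it->second.col == col)
            hits.push_back(text.substr(text.find_first_not_of(" \t")));
    }
    return hits;
}
```

For the block above, line 5, column 14 yields only the store, while line 5, column 18 still yields two instructions.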
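The location metadata now points to the beginning of an expression. The assignment, for instance, points to the left-hand side specifier f at line 5, column 14. As can be seen in !dbg !24, which is shared by the first call to fib and the add, this might still be ambiguous, unfortunately.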
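One possible workaround for the remaining ambiguity, assuming you know what kind of operation you are looking for at a reported location, is to filter the candidates by opcode: at line 5, column 18 both the first call to fib and the add carry !dbg !24, but only one of them is an add. The sketch below parses instruction text and its names (opcodeOf, withOpcode) are hypothetical; inside a real pass the same idea would compare Instruction::getOpcode() or use isa<StoreInst> / isa<BinaryOperator> instead of string parsing.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts the opcode from one textual LLVM instruction:
// "%14 = add i32 %10, %13" -> "add", "store i32 %14, ..." -> "store".
// Prefixes such as "tail" on calls are not handled.
std::string opcodeOf(const std::string &inst) {
    std::istringstream in(inst);
    std::string first, second, third;
    in >> first;
    if (!first.empty() && first[0] == '%') {  // "%name = opcode ..."
        in >> second >> third;
        return second == "=" ? third : std::string();
    }
    return first;  // instructions without a result, e.g. store or br
}

// Keeps only the candidates whose opcode matches, e.g. "store" for an assignment.
std::vector<std::string> withOpcode(const std::vector<std::string> &candidates,
                                    const std::string &opcode) {
    assert(!opcode.empty() && "opcode to filter by must be non-empty");
    std::vector<std::string> out;
    for (const std::string &c : candidates)
        if (opcodeOf(c) == opcode) out.push_back(c);
    return out;
}
```

This only narrows the candidates; whether "add" or "call" is the right choice still has to come from what the third-party tool actually reports.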
There has been one more change: calling getLine() and getCol() fails if no debug metadata is attached to the instruction. The DebugLoc class offers a convenient way to check this:

    const DebugLoc &location = instruction->getDebugLoc();
    if (location) {
        // location.getLine()
        // location.getCol()
    } else {
        // no location metadata available
    }