There are (at least) two kinds of casts in LLVM IR: BitCastInst instructions and bitcast values (constant expressions). You have the latter. Fortunately, there is a method for retrieving the original value within the bitcast: stripPointerCasts(). It took me some time to figure out this distinction.
Here is my usage of the routine, where I was trying to identify the function being called (I is a BasicBlock::iterator):
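if (CallInst *ci = dyn_cast<CallInst>(&*I)) {
    // For a direct call this returns the callee; for a call through a bitcast it returns null.
    Function *f = ci->getCalledFunction();
    if (f == nullptr) {
        // Look through the bitcast to the underlying value.
        // (Newer LLVM releases call this accessor getCalledOperand().)
        Value *v = ci->getCalledValue();
        f = dyn_cast<Function>(v->stripPointerCasts());
        if (f == nullptr) {
            continue;  // genuinely indirect call: no statically known callee
        }
    }
    StringRef fname = f->getName();  // StringRef carries its own length; no need for .data()
    // ... use fname ...
}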
Answer from Brian on Stack Overflow
As explained in the previous question asking the same thing [marginally different], you are better off using the AST form that the Clang compiler produces, rather than the LLVM IR form. It is a much more direct representation of the C or C++ code than the LLVM IR, and easier to work with in general.
But from the StoreInst you can use getValueOperand to get the value that is being stored, and then call getName on that value. Of course, as I also said in the comments on the previous answer, it is not hard to write code from which the originally stored value is difficult to derive.
In other words, if we have an llvm::Instruction *inst, we could do this:
if (llvm::StoreInst* si = llvm::dyn_cast<llvm::StoreInst>(inst))
{
    // getName() returns a StringRef; convert explicitly to std::string
    std::string name = si->getValueOperand()->getName().str();
}
[Code is not tested, not compiled, no guarantee provided, I just wrote it as part of this answer with the intention that it may work]
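Instruction* I = ...
if (CallInst *CI = dyn_cast<CallInst>(I)) {
    // getCalledFunction() returns null for indirect calls, so check before using it
    if (Function *F = CI->getCalledFunction()) {
        StringRef name = F->getName();
        ...
    }
}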
For more information on this, see the relevant section in the LLVM Programmer's Manual. In general, I wholeheartedly recommend this guide for beginners.
Instruction is the common base class for all LLVM instructions.
CallInst is a subclass of Instruction for call instructions.
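If you have an Instruction *inst that you know is a call, you can get a CallInst as follows (cast<> asserts on a type mismatch; use dyn_cast<> if you are not sure):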
CallInst *ci = cast<CallInst>(inst);
You may reuse Mangler from LLVMCore.
Here is an example of usage:
std::string mangledName;
raw_string_ostream mangledNameStream(mangledName);
Mangler::getNameWithPrefix(mangledNameStream, "add", m->getDataLayout());
mangledNameStream.flush(); // make sure any buffered text reaches mangledName
// now mangledName contains, well, mangled name :)
libstdc++ has a nice demangling library; just include cxxabi.h.
Then you can change Function *call = m->getFunction("_Z3addv");
to
int status;
// all four arguments belong to __cxa_demangle; the buffer it returns is malloc'ed, so free() it when done
Function *call = m->getFunction(abi::__cxa_demangle("_Z3addv", nullptr, nullptr, &status));
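For reference, here is a minimal, standalone sketch of calling abi::__cxa_demangle with the status check and cleanup spelled out. The helper name demangle is hypothetical (it is not part of LLVM or libstdc++), and the sketch assumes a GCC or Clang toolchain that provides cxxabi.h.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged when demangling fails (status != 0).
std::string demangle(const char *mangled) {
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so the buffer is
  // handed to a unique_ptr that releases it with std::free.
  std::unique_ptr<char, void (*)(void *)> buf(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  if (status == 0 && buf) {
    return std::string(buf.get());
  }
  return std::string(mangled);
}
```

With this helper, demangle("_Z3addv") yields add(), including the parameter list, which is worth keeping in mind when comparing the result against names stored in a module.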
This is part of the debug information that's attached to LLVM IR in the form of metadata. Documentation is here. An old blog post with some background is also available.
$ cat > z.c
long fact(long arg, long farg, long bart)
{
long foo = farg + bart;
return foo * arg;
}
$ clang -emit-llvm -O3 -g -c z.c
$ llvm-dis z.bc -o -
Produces this:
define i64 @fact(i64 %arg, i64 %farg, i64 %bart) #0 {
entry:
tail call void @llvm.dbg.value(metadata !{i64 %arg}, i64 0, metadata !10), !dbg !17
tail call void @llvm.dbg.value(metadata !{i64 %farg}, i64 0, metadata !11), !dbg !17
tail call void @llvm.dbg.value(metadata !{i64 %bart}, i64 0, metadata !12), !dbg !17
%add = add nsw i64 %bart, %farg, !dbg !18
tail call void @llvm.dbg.value(metadata !{i64 %add}, i64 0, metadata !13), !dbg !18
%mul = mul nsw i64 %add, %arg, !dbg !19
ret i64 %mul, !dbg !19
}
With -O0 instead of -O3, you won't see llvm.dbg.value, but you will see llvm.dbg.declare.
Given a Value, you can get the variable name by traversing all the llvm.dbg.declare and llvm.dbg.value calls in the enclosing function, checking whether any of them refers to that value, and if so, returning the DIVariable that the intrinsic call associates with the value.
So, the code should look something like (roughly, not tested or even compiled):
const Function* findEnclosingFunc(const Value* V) {
    if (const Argument* Arg = dyn_cast<Argument>(V)) {
        return Arg->getParent();
    }
    if (const Instruction* I = dyn_cast<Instruction>(V)) {
        return I->getParent()->getParent();
    }
    return NULL;
}
const MDNode* findVar(const Value* V, const Function* F) {
    for (const_inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
        const Instruction* I = &*Iter;
        if (const DbgDeclareInst* DbgDeclare = dyn_cast<DbgDeclareInst>(I)) {
            if (DbgDeclare->getAddress() == V) return DbgDeclare->getVariable();
        } else if (const DbgValueInst* DbgValue = dyn_cast<DbgValueInst>(I)) {
            if (DbgValue->getValue() == V) return DbgValue->getVariable();
        }
    }
    return NULL;
}
StringRef getOriginalName(const Value* V) {
    // TODO handle globals as well
    const Function* F = findEnclosingFunc(V);
    if (!F) return V->getName();
    const MDNode* Var = findVar(V, F);
    if (!Var) return "tmp";
    return DIVariable(Var).getName();
}
You can see above I was too lazy to add handling of globals, but it's not that big a deal actually - this requires iterating over all the globals listed under the current compile unit debug info (use M.getNamedMetadata("llvm.dbg.cu") to get a list of all the compile units in the current module), then checking which matches your variable (via the getGlobal method) and returning its name.
However, keep in mind the above will only work for values directly associated with original variables. Any value that is a result of any computation will not be properly named this way; and in particular, values that represent field accesses will not be named with the field name. This is doable but requires more involved processing - you'll have to identify the field number from the GEP, then dig into the type debug information for the struct to get back the field name. Debuggers do that, yes, but no debugger operates in LLVM IR land - as far as I know even LLVM's own LLDB works differently, by parsing the DWARF in the object file into Clang types.
That %0 is the instruction's name, not a register name - there are no registers in the LLVM intermediate representation.
In any case, all instructions inherit from the Value class, which defines a getName() method, and that's what you should call. However, keep in mind that typically many instructions will be unnamed and thus getName() won't return anything useful - names such as %0 are only assigned when emitting the module as text, and do not exist before that.
The first thing is that %0 is just a label. If we want to give it an explicit name, there is an LLVM pass called instnamer. I used the following commands to name each label explicitly with the instnamer pass:
$ clang++ -std=c++11 -g -emit-llvm -c hello.c -o hello.bc
# write to a different file: redirecting the output back into hello.bc would truncate it before opt reads it
$ opt -instnamer -load ../your/path/to/library.so -passname < hello.bc > hello.named.bc
Then, in your LLVM pass (i.e. through the LLVM API):
if (LoadInst *loadInst = dyn_cast<LoadInst>(&I)) {
    loadInst->dump();
    // Prints the name assigned by instnamer (e.g. %temp0 rather than %0),
    // because the pass has rewritten the IR with explicit names.
    errs() << loadInst->getName();
}
Hope this helps..