Function/Type database (DBJSON) provides a possibility to gather information regarding dereference information used in the code of a given function. It is implemented as a derefs
entry in the function array of DBJSON. It is a list of dereference information records and has the following format:
"derefs": [
{
"kind": dereference kind; this is "unary", "array", "member", "function", "assign", "init", "offsetof", "return", "parm", "cond" or "logic"
"offset": computed constant offset value in the dereference expression [not present for the "member" and "return" kind]
for the "function" kind this is an index into the 'callrefs'+'refcallrefs' concatenated array of call information for this function
for the "assign" kind this is a value that describes the kind of this assignment operator
for the "parm" kind this is an index of the function call argument this entry pertains to
for the "init" kind this field indicates how many elements (in case of structure type initializer list) have been actually initialized
for the "offsetof" kind this is the computed offset value from a given member expression (or -1 if the value cannot be computed statically)
for the "cond" kind this is the associated compound statement id in 'csamp'
for the "logic" kind this is a value that describes the kind of this operator
"basecnt": number of variables referenced in the base array expression (most of the times the value is 1) [only present for the "array" kind]
number of variables referenced on the left-hand-side expression [only present for the "logic" kind]
"offsetrefs" : variables referenced in the dereference offset expression
for the "array" kind the first 'basecnt' elements describe the base variable of the array
for the "init" kind the first element describes the variable being initialized
for the "assign" kind the first element describes the variable expression being assigned to
"expr": full dereference expression written in plain text combined with the location of the expression in the source (i.e. "[<loc>]: <expr>")
"ord": indicates order of expressions within function (not present in the examples below)
"csid": compound statement id where this dereference expression was taken
// The following entries are present only for the "member" and "offsetof" kinds
"member" : list of member references for the variable being dereferenced for "member" kind
list of field references (or -1 for array offset expressions) for the "offsetof" kind
"type": type of the corresponding member reference (through outermost cast or field type)
// The following entries are present only for the "member" kind
"access": member expression kind, i.e.
variable object access (.): 0
variable pointer access (->): 1
"shift": computed constant offset value for the corresponding member reference
"mcall": index into the 'callrefs'+'refcallrefs' concatenated array of call information for this function for corresponding function call through member reference
or -1 if there is no function call for a given member [this field is only present when there is at least one call for some of its members]
},
(...)
]
where the offsetrefs
single entry has the following format:
"offsetrefs": {
"kind": scope of the referenced variable, i.e.:
"global", "local", "parm", // ordinary variable id
"integer", "float", "string", "address", // literal values
"callref", "refcallref", "addrcallref", // function call information
"unary", "array", "member", "assign", "function" or "logic" // variable-like expressions
"unary", "array", "member", "assign" and "logic" kind variables are actually indexes into parent array with corresponding dereference entry
"id": unique id of the referenced variable;
for "address" or "integer" type this is the integer value extracted from original expression
for "callref", "refcallref" or "addrcallref" type this is an index into the 'callrefs'+'refcallrefs' concatenated array of call information for this function
for "unary", "array", "member" or "assign" kinds this is an index into 'derefs' array for this function that points to the appropriate entry of this kind
for "function" this is the function id for the function reference
"mi": index of the member reference in the member expression chain (or offsetof array expression) this particular variable contributes to [only present for the "member" or "offsetof" kind of the parent entry]
"di": index into parent array with corresponding dereference entry [only present for "refcallref" kind] or constant address used as a base for the function call [only present for the "addrcallref" kind]
"cast": type of the cast used directly on the expression for the referenced variable
for the first variable expression referenced for the parent "assign" entry this is the type of the variable expression
}
The different kind values for logic operator are:
{
'<=>': 9,
'<': 10,
'>': 11,
'<=': 12,
'>=': 13,
'==': 14,
'!=': 15,
'&': 16,
'^': 17,
'|': 18,
'&&': 19,
'||': 20
}
The different kind values for assignment operator are:
{
'=': 21,
'*=': 22,
'/=': 23,
'%=': 24,
'+=': 25,
'-=': 26,
'<<=': 27,
'>>=': 28,
'&=': 29,
'^=': 30,
'|=': 31
}
Let's see how it works through extensive examples. Consider the following code:
struct A;
struct B;
struct C;
struct B* getB(char c, float f) {
return 0;
}
typedef struct B* (*pfun_t)(char c, float f);
typedef int (*pfi_t)(void);
typedef void* (*pfv_t)(void);
struct A {
int i;
void* p;
struct B* pB;
pfun_t pF;
};
struct B {
int i;
char T[10];
void* p;
struct A a;
struct A Ta[4][4];
struct C* pC;
};
struct C {
float f;
void* p;
unsigned long* pul;
struct B b;
struct A* pA;
union {
void *arg;
int* B;
};
union {
void *arg;
int* B;
} N;
};
struct A gA;
unsigned long gi;
int getN(void) {
return 0;
}
void* getV(void) {
return 0;
}
struct B* (*pfun)(char c, float f);
int (*pfi)(void);
void* (*pfv)(void);
void f(int* px, char b) {
int i = 2;
char T[10] = {};
int** ppx = &px;
struct A oA;
struct B* pB = 0;
struct B** ppB = &pB;
void* q = pB;
void** pq = &q;
pfun = getB;
pfi = getN;
pfv = getV;
pfun_t F[2] = { pfun, pfun };
(void) getB('s',6.);
(void) *px; // (1)
(void) *(px+3*2); // (2)
(void) *((4+1)+3-(1+2)+px); // (3)
(void) *(px+2+2*gi+b-i); // (4)
(void) *(px+((void*)&q-(void*)pB)); // (5a)
(void) *(px+((void*)400-(void*)300)+100); // (5b)
(void) *((int*)400); // (6)
(void) *((int*)0+getN()+pB->i); // (7)
(void) *(px+({ do {} while(0); 4;})); // (8)
(void) *(px+({ do {} while(0); 4+gi*getN()-pB->i;})); // (9)
(void) **ppx; // (10)
(void) *(*ppx+4); // (11)
(void) T[4]; // (12)
(void) ({do {} while(0); (struct A*)0+gi;})[4+i]; // (13)
(void) 4[T]; // (14)
(void) T[+(4+1)+3+(1+2)]; // (15)
(void) T[-3]; // (16)
(void) T[gi+2+1]; // (17)
(void) T[getN()+pB->i]; // (18)
(void) ( *(*ppx+4+T[2]-pB->i+(2*3&0xFF)-1*0)+((pB->i)) ); // (19)
(void) T[getN()+pB->i*({ do {} while(0); 4+gi*getN()-pB->i;})]; // (20)
(void) oA.p; // (21)
(void) (&oA)->i; // (22)
(void) ((struct B*)q)->i; // (23)
(void) (pB->pC+4+gi)->f; // (24)
(void) ((struct C*)((pB+4+gi)->p)+gi+2)->f; // (25)
(void) ((struct C*)((4+2+pB)->p)+gi+2)->f; // (26)
(void) ((struct C*)(((struct B*)(12+4+16))->p)+gi+2)->f; // (27)
(void) ((struct A*)((struct C*)((struct A*)pB->p)->p)->p+2+gi)->i; // (28)
(void) ((struct A*)oA.pB->pC->pA->pB->pC->pA->pB->pC->p)->i; // (29)
(void) *((int*)oA.p); // (30)
(void) *(pB->pC->pul); // (31)
(void) *( (*(pB->pC)).arg ); // (32)
(void) *( (*(pB->pC)).B ); // (33)
(void) (*(pB->pC)).N.arg; // (34)
(void) (*(pB->pC)).N.B; // (35)
(void) pB->T[4]; // (36)
(void) pB->T[4+(2+gi)]; // (37)
(void) *(pB->pC->pul+2+(3+1)+(((
((struct B*)((int*)(&oA)+sizeof(int)+sizeof(void*)))->T[4] )))); // (38)
(void) T[i+1+2+*px-
((struct A*)(void*)(struct A*)(((struct B*)(pB->pC->p))->p))->i]; // (39)
(void) getB('x',3.0)->a.i; // (40)
(void) (*pfun)('x',3.0)->a.i; // (41)
(void) (*getB)('x',3.0)->a.i; // (42)
(void) pfun('x',3.0)->a.i; // (43)
(void) (*pfun)('x',3.0); // (44)
(void) pfun('x',3.0); // (45)
(void) (*getB)('s',5.); // (46)
(void) (*({do {} while(0); (struct A*)0+gi; pfun;}))('x',3.0)->a.i; // (47)
(void) (*({do {} while(0); (struct A*)0+gi; getB;}))('x',3.0)->a.i; // (48)
(void) (*({do {} while(0); (struct A*)0+gi; pfun;}))('x',3.0); // (49)
(void) (*({do {} while(0); (struct A*)0+gi; getB;}))('x',3.0); // (50)
(void) (*F[1])('x',3.0)->a.i; // (51)
(void) (*F[1])('x',3.0); // (52)
(void) F[1]('x',3.0)->a.i; // (53)
(void) F[1]('x',3.0); // (54)
(void) ((pfun_t)(3333+1))('x',3.0)->a.i; // (55)
(void) ((pfun_t)(3333+1))('x',3.0); // (56)
(void) oA.pF('x',3.0); // (57)
(void) ((struct A*)((struct C*)oA.p)->p)->pF('@',1.5); // (58)
(void) oA.pF('x',3.0)->i; // (59)
(void) ((struct A*)oA.pF('x',3.0)->p)->pF('u',999.1)->i; // (60)
(void) ((pfun_t)oA.pB->p)('x',16.5)->i; // (61)
(void) (0 ? (struct B *)0 : (pB))->p; // (62)
(void) ((1+313) ? (struct B *)0 : (pB))->p; // (63)
(void) (*px ? (struct B *)0 : (pB))->p; // (64)
(void) ((struct B*)0 ? : (pB))->p; // (65)
(void) ((struct B*)(1+313) ? : (pB))->p; // (66)
(void) ((struct B*)*px ? : (pB))->p; // (67)
(void) ((struct B *)30)->a; // (68)
(void) ((struct A){.i=3,.pB=oA.pB+gi}).pB->a; // (69a)
(void) ((struct A){.i=(long)(3+1),.pB=(void*)oA.pB+(short)gi}).pB->a; // (69b)
(void) ({do {} while(0); (struct A*)0+gi;})->i; // (70)
(void) (((&((&oA)->pB+4)->a)+gi+pB->i)->pB->pC+10*T[9])->f; // (71)
(void) (&((&oA)->pB+({do {} while(0); (int)10+gi; }))->a)->pB->p; // (72)
(void) ({ ((void)(sizeof ((long)(0 && getN())))); getB(0,0); })->p; // (73)
(void) (*((struct B**)q))->i; // (74)
(void) *((unsigned char*)px + (long)gi - (unsigned long)2 +
(signed int)pB->i + (int)T[4] + ((unsigned)*px+1) + (long long)getV()); // (75)
(void) ((struct A*)(oA.pB) ? : ((struct A*)pB))->p; // (76)
(void) ((struct A*)(oA.pB) ? (&oA) : ((struct A*)pB))->p; // (77)
(void) ( *( ((struct A*)(oA.pB) ? (&oA) : ((struct A*)pB)) ) ).i; // (78)
(void) ((struct A*)(oA.pB) ?
(&oA) : (i?(((struct A*)pB)):((struct A*)pfv())))->p; // (79)
(void) ( (struct A*)
(((struct A*)((struct C*)((struct A*)pB->p)->p)->p+2+gi)->i ?
(struct B *)0 : (unsigned long)(*(px+((void*)&q-(void*)pB)) +
(int)T[getN()+(long)pB->i]) ? (pB) : ((void*)ppB)))->p; // (80)
int vi0 = 0; // (81)
int vi1 = (int)1.0; // (82)
unsigned vu0 = 2; // (83)
void* vq0 = q; // (84)
void* vq1 = (struct A*)pB; // (85)
void* vq2 = pB; // (86)
void* vq3 = getB('a',3.); // (87)
int vi2 = (*getN)(); // (88)
unsigned long vul0 = (long)pfi(); // (89)
unsigned long vul1 = (*pfi)(); // (90)
int* vpi0 = "ABRAKADABRA"; // (91)
unsigned vu1 = i+*((int*)pB->p)-(long)oA.i; // (92)
struct A vA0 = ((struct A){.i=(long)3,.pB=oA.pB->pC+gi}); // (93)
double vd0; vd0 = 4UL; // (94)
pB->pC->p = pB; // (95)
long vl0 = 0xff; vl0&=0x3f; // (96)
unsigned long vul3 = (long)*((int*)pB->p)-(short)T[2]+(int)(gi+=3); // (97)
unsigned long ul = 4+__builtin_offsetof(struct C,b.Ta[4][oA.i].p); // (98)
if(pfi) pfi(); // (99)
while(i<10) i++; // (100)
}
The first kind of dereference is ordinary pointer dereference (through so called unary expression).
In the (1) case we have mere indirection with offset 0. We are referencing a local variable (function parameter in this case) with id=0 in the locals entry of the given function.
The underlying JSON entry might have the following format:
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*px"
}
In the (2) case the indirection offset is computed to 6.
The offset expression can be more complicated as in (3). The offset
field will be populated with the sum of the offset expressions for which it is possible to compute the value at compile time (i.e. integer constant expression). In the (3) case we are able to compute the offset value 5.
Offset expression might have many variables as in (4). In that case references to all variables can be found in offsetrefs
list:
{
"kind" : "unary",
"offset" : 2,
"offsetrefs" : [
{
"kind" : "local",
"id" : 2
},
{
"kind" : "parm",
"id" : 1
},
{
"kind" : "global",
"id" : 1
},
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*(px + 2 + 2 * gi + b - i)"
}
Here we are referencing some global variable, automatic (local) variable created inside function and two function parameters.
Please note that in general it is difficult to resolve which variable is the actual pointer variable in offset expression. In most cases it should be a variable of pointer type. There might be cases however (as in (5a)) where multiple pointer variables are involved:
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
},
{
"kind" : "local",
"cast" : 3,
"id" : 8
},
{
"kind" : "local",
"cast" : 3,
"id" : 6
}
],
"expr" : "*(px + ((void *)&q - (void *)pB))"
}
Here we cannot know (without inspecting the dereference expression) whether the base pointer variable is px
or pB
. Both expressions that involve variables q
and pB
are casted to void*
hence the value 3 in the cast
field.
Whenever the value of the expression used as a part of the sum that constitute the dereference address can be precomputed (as in (5b)) it becomes a part of the offset:
{
"kind" : "unary",
"offset" : 200,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*(px + ((void *)400 - (void *)300) + 100)"
}
No cast information is available at this point.
In some low-level routines it is useful to use pointer dereference on direct address value like in (6). It is implemented as an address
variable kind:
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "address",
"cast" : 12,
"id" : 400
}
],
"expr" : "*((int *)400)"
}
The cast information for the pointer type for direct address value is available as well.
It is also possible to use a function call or a member expression inside the dereference offset expression as in (7).
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "address",
"cast" : 12,
"id" : 0
},
{
"kind" : "callref",
"id" : 0
},
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*((int *)0 + getN() + pB->i)"
}
Here we have two additional types of variables inside the offsetrefs
list. For function call we have callref
kind which points to the calls
array of the JSON information for this function. For member expression we have member
kind which points to the appropriate member
kind entry in the dereference information list (more on that later).
Original GNU extension to the C language allows to use very peculiar form of expression, called statement expression. It defines a series of statements embedded into ({})
and returns the value of the last expression. This is very widely used in the Linux kernel source, especially in safe macro handling, i.e. evaluating its macro operands only once. When statement expression is used inside the dereference offset expression all variables used in the value yielding expression are reported through the offsetrefs
list. For example for (8) we have:
{
"kind" : "unary",
"offset" : 4,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*(px + ({\n do {\n } while (0);\n 4;\n}))"
}
For (9) it could be:
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "global",
"id" : 1
},
{
"kind" : "callref",
"id" : 0
},
{
"kind" : "member",
"id" : 0
},
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*(px + ({\n do {\n } while (0);\n 4 + gi * getN() - pB->i;\n}))"
}
For nested pointer dereference (10) there is a unary
kind for the referenced variable which points to the relevant pointer dereference information for this function:
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 4
}
],
"expr" : "*ppx"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 0
}
],
"expr" : "**ppx"
}
Constant expressions in offset expression (11) will populate the offset
field as described in previous examples:
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 4
}
],
"expr" : "*ppx"
},
{
"kind" : "unary",
"offset" : 4,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 0
}
],
"expr" : "*(*ppx + 4)"
}
The second kind of dereference is an array access (through so called array subscript expression). This is handled in similar fashion to ordinary pointer dereference with tiny modifications. The variable that describes the array itself is placed at the beginning of offsetrefs
list as in (12):
{
"kind" : "array",
"offset" : 4,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[4]"
}
More variables can constitute the array (which can be found in statement exression) as in (13):
{
"kind" : "array",
"offset" : 4,
"basecnt" : 2,
"offsetrefs" : [
{
"kind" : "global",
"id" : 1
},
{
"kind" : "address",
"cast" : 11,
"id" : 0
},
{
"kind" : "local",
"id" : 2
}
],
"expr" : "({\n do {\n } while (0);\n (struct A *)0 + gi;\n})[4 + i]"
}
The basecnt
variable in the main dereference entry suggests how many variables from the offsetrefs
list constitute the array (which are placed at the beginning of the list).
Weird and obscure expressions for array usage (14) are handled properly as well as constant expression computation (15),(16):
{
"kind" : "array",
"offset" : 11,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[+(4 + 1) + 3 + (1 + 2)]"
}
{
"kind" : "array",
"offset" : -3,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[-3]"
}
For (17) it could be:
{
"kind" : "array",
"offset" : 3,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
},
{
"kind" : "global",
"id" : 1
}
],
"expr" : "T[gi + 2 + 1]"
}
For (18) it could be:
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "array",
"offset" : 0,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
},
{
"kind" : "member",
"id" : 0
},
{
"kind" : "callref",
"id" : 0
}
],
"expr" : "T[getN() + pB->i]"
}
To save space whenever some dereference expression is encountered and at the same time the same expression (semantically) was already processed it is not added to the derefernce list unless it is referenced by some other dereference entry. For example for (19):
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 4
}
],
"expr" : "*ppx"
},
{
"kind" : "unary",
"offset" : 10,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
},
{
"kind" : "array",
"id" : 4
},
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "*(*ppx + 4 + T[2] - pB->i + (2 * 3 & 255) - 1 * 0)"
},
{
"kind" : "array",
"offset" : 2,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[2]"
}
The use of the ((pB->i))
expression is semantically equivalent to the pB->i
encountered earlier in the (19) therefore it is not added to the dereference entries. However in (20):
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "array",
"offset" : 0,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
},
{
"kind" : "global",
"id" : 1
},
{
"kind" : "callref",
"id" : 0
},
{
"kind" : "member",
"id" : 0
},
{
"kind" : "member",
"id" : 1
},
{
"kind" : "callref",
"id" : 1
}
],
"expr" : "T[getN() + pB->i * ({\n do {\n } while (0);\n 4 + gi * getN() - pB->i;\n})]"
}
The second use of the pB->i
expression is added to the dereference entries (even though it is semantically equivalent to its first encounter) because it is referenced by the main array subscript expression for T
.
The third kind of dereference is a member expression. In other words it is getting the value of a structure member (possibly dereferencing memory when ->
operator is involved). To keep the information in order the object like expression (using .
operator) is reported as well. The simplest form of member expression is shown in (21) which translates to:
{
"member" : [ 1 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.p"
}
Here we are using the member p
with index 1 in the struct A
structure (member
entry list). The type id of the struct A
structure is 0 which is stored in the type
entry list. We are using object access (.
) which translates to 0 inside the access
entry list. The member is accessed directly without any memory offset (hence the 0 in the shift
entry list). The base for the member expression is a local variable with id 5 (oA
).
In another example (22) we have slight modification to use the pointer access (->
). The type for the base structure changes as well as the access
specifier.
{
"member" : [ 0 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "(&oA)->i"
}
Sometimes there is cast required to properly designate the type of member expression. For example in (23) the q
variable has type void*
. In order to make a member expression we need to cast it to proper type beforehand. The ultimate type will be placed in type
entry list as below:
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"cast" : 16,
"id" : 8,
"mi" : 0
}
],
"expr" : "((struct B *)q)->i"
}
We can also apply some offset value to the pointer in the structure member (24):
{
"member" : [ 4,0 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 0,4 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 1
},
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "(pB->pC + 4 + gi)->f"
}
In the second member expression in the chain (pC + 4 + gi
) we have constant offset 4 which is placed in the shift
entry list at index 1. Furthermore global variable gi
is also used in the offset expression. We can find it by looking at the offsetrefs
array and finding variables with mi
value 1. All these variables takes part in the offset computation for the second member expression in the chain (one global variable in our case).
More examples can be found in (25-29):
{
"member" : [ 2,0 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 4,2 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 1
},
{
"kind" : "global",
"id" : 1,
"mi" : 0
},
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct C *)((pB + 4 + gi)->p) + gi + 2)->f"
}
All member expressions are done through the pointer (->
) hence access values are set to 1. In the first member expression in the chain we have offset 4 (index 0 in shift
entry list) and two variables with mi
set to 0. The same global variable (mi
set to 1) is used in the second member expression in the chain with additional offset 2 (index 1 in shift
entry list).
{
"member" : [ 2,0 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 6,2 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 1
},
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct C *)((4 + 2 + pB)->p) + gi + 2)->f"
}
Similar as above two offset values are computed and placed in the shift
entry list. The type for first member expression in the chain is set to struct B*
as original variable pB
designates (first type
entry set to 16), the second member expression type is struct C*
that originates from the cast used (second type
entry set to 15).
{
"member" : [ 2,0 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 0,2 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 1
},
{
"kind" : "address",
"cast" : 16,
"id" : 32,
"mi" : 0
}
],
"expr" : "((struct C *)(((struct B *)(12 + 4 + 16))->p) + gi + 2)->f"
}
Here the first member expression is based on the integer values hence the address
kind for offsetrefs
variable with mi
set to 0.
{
"member" : [ 2,1,1,0 ],
"type" : [ 16,11,15,11 ],
"access" : [ 1,1,1,1 ],
"shift" : [ 0,0,0,2 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 3
},
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct A *)((struct C *)((struct A *)pB->p)->p)->p + 2 + gi)->i"
}
Here we have global variable (and integer value) used in the offset for the last member expression in the chain (mi
set to 3, last shift
value set to 2). Local variable in offsetrefs
with mi
set to 0 designates the original varialbe pB
which is the base for the member expression.
{
"member" : [ 2,4,4,2,4,4,2,4,1,0 ],
"type" : [ 0,16,15,11,16,15,11,16,15,11 ],
"access" : [ 0,1,1,1,1,1,1,1,1,1 ],
"shift" : [ 0,0,0,0,0,0,0,0,0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "((struct A *)oA.pB->pC->pA->pB->pC->pA->pB->pC->p)->i"
}
In this case only the first member expression in the chain is not done through the pointer (hence the 0 value in the access
entry list).
When the member expression is used inside the unary dereference it is referred through the member
kind of offsetrefs
variable as in (30) and (31):
{
"member" : [ 1 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.p"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"cast" : 12,
"id" : 0
}
],
"expr" : "*((int *)oA.p)"
}
{
"member" : [ 4,2 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC->pul"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*(pB->pC->pul)"
}
In both the cases above the referred member expression is first in the dereference list for this function hence the id (which represents index in the main dereference list) set to 0.
When member expression refers to a member of an union (as in (32) and (33)) there are two entries in the member
entry list (and all remaining relevant entries as well):
{
"member" : [ 4 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC"
},
{
"member" : [ 5,0 ],
"type" : [ 7,13 ],
"access" : [ 0,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2,
"mi" : 0
}
],
"expr" : "(*(pB->pC)).arg"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*(pB->pC)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 1
}
],
"expr" : "*((*(pB->pC)).arg)"
}
First entry points to the type definition for the union in the enclosing type (index 5 in the type id 7 which is struct C
). Second entry points to the actual member of the union (index 0 in the type id 13 which is the anonymous union definition).
{
"member" : [ 4 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC"
},
{
"member" : [ 5,1 ],
"type" : [ 7,13 ],
"access" : [ 0,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2,
"mi" : 0
}
],
"expr" : "(*(pB->pC)).B"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*(pB->pC)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 1
}
],
"expr" : "*((*(pB->pC)).B)"
}
In similar fashion second member of the union is accessed here.
Even though the members of the anonymous union are accessed as internal parts of the anonymous type itself they still takes part in the index computation for the members of the enclosig type. For example in (34) and (35):
{
"member" : [ 4 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC"
},
{
"member" : [ 7,0 ],
"type" : [ 7,14 ],
"access" : [ 0,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2,
"mi" : 0
}
],
"expr" : "(*(pB->pC)).N.arg"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*(pB->pC)"
}
The index value for the member N
of the struct C
type is 7 (the union members leak into the enclosing parent type). Similar situation happens below:
{
"member" : [ 4 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC"
},
{
"member" : [ 7,1 ],
"type" : [ 7,14 ],
"access" : [ 0,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2,
"mi" : 0
}
],
"expr" : "(*(pB->pC)).N.B"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "*(pB->pC)"
}
The member expression can resolve to array as in (36-39). In such cases the member expression is referred in the enclosing array dereference entry.
{
"member" : [ 1 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->T"
},
{
"kind" : "array",
"offset" : 4,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "pB->T[4]"
}
Here the array entry with offset 4 refers to underlying member expression in its first offsetrefs
entry.
{
"member" : [ 1 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->T"
},
{
"kind" : "array",
"offset" : 4,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
},
{
"kind" : "global",
"id" : 1
}
],
"expr" : "pB->T[4 + (2 + gi)]"
}
Here'a a bit more complicated expression with computed constant offset 6 and additional global variable referenced therein. The base variable for the array is also a reference to a member expression.
{
"member" : [ 1 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 12 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "((struct B *)((int *)(&oA) + sizeof(int) + sizeof(void *)))->T"
},
{
"member" : [ 4,2 ],
"type" : [ 16,15 ],
"access" : [ 1,1 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->pC->pul"
},
{
"kind" : "unary",
"offset" : 6,
"offsetrefs" : [
{
"kind" : "member",
"id" : 1
},
{
"kind" : "array",
"id" : 3
}
],
"expr" : "*(pB->pC->pul + 2 + (3 + 1) + (((((struct B *)((int *)(&oA) + sizeof(int) + sizeof(void *)))->T[4]))))"
},
{
"kind" : "array",
"offset" : 4,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "member",
"id" : 0
}
],
"expr" : "((struct B *)((int *)(&oA) + sizeof(int) + sizeof(void *)))->T[4]"
}
In this example the array dereference is a part of much more complicated unary dereference. The base member expression contains computed offset (shift
entry list) using the sizeof
operator.
{
"member" : [ 4,1,2,0 ],
"type" : [ 16,15,16,11 ],
"access" : [ 1,1,1,1 ],
"shift" : [ 0,0,0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct A *)(void *)(struct A *)(((struct B *)(pB->pC->p))->p))->i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*px"
},
{
"kind" : "array",
"offset" : 3,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
},
{
"kind" : "local",
"id" : 2
},
{
"kind" : "unary",
"id" : 1
},
{
"kind" : "member",
"id" : 0
}
],
"expr" : "T[i + 1 + 2 + *px - ((struct A *)(void *)(struct A *)(((struct B *)(pB->pC->p))->p))->i]"
}
When several casts are used in the base for the member expression the outermost cast is used to extract the base type for the member expression.
The base type for the member expression can also originate from a function call as in (40):
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "callref",
"id" : 0,
"mi" : 0
}
],
"expr" : "getB('x', 3.)->a.i"
}
In this case in the offsetrefs
list we will have a callref
kind variable. The id
field will point to the appropriate entry in the callrefs
+refcallrefs
concatenated array of call information for this function which describes relevant function call information.
It is a little bit different when at the base of offset expression is function call through pointer as in (41):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "(*pfun)('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "refcallref",
"id" : 1,
"mi" : 0,
"di" : 2
}
],
"expr" : "(*pfun)('x', 3.)->a.i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "*pfun"
}
In this case in the offsetrefs
list we will have a refcallref
kind variable. The id
field will point to the appropriate entry in the callrefs
+refcallrefs
concatenated array of call information for this function which describes relevant function call information. As refcallrefs
array doesn't have information about the pointer to function variable used to make the actual call the di
entry for the refcallref
kind variable in the offsetrefs
list will point to the appropriate dereference entry that clearly indicates the variable used.
It is worth notice that pointer through function call is a separate kind of dereference apart form mere indirection for a plain variable which has its own entry in the deference information array as the function
kind.
Strangely enough it is also possible to call normal function using the *
operator as in (42):
{
"kind" : "function",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "(*getB)('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "callref",
"id" : 0,
"mi" : 0
}
],
"expr" : "(*getB)('x', 3.)->a.i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
],
"expr" : "*getB"
}
In this case the call is detected as a normal call and appropriate callref
kind entry is added to offsetrefs
.
C language also allows to make a function call without the *
operator as in (43):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "pfun('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "refcallref",
"id" : 1,
"mi" : 0,
"di" : 0
}
],
"expr" : "pfun('x', 3.)->a.i"
}
This is where the additional function
kind dereference information starts to play a role. The di
entry in the offsetrefs
list will point to this record which is the only way to indicate the underlying pointer to function variable used to make the actual call.
When function call through pointer is not a base of member expression but just normal call in the source, the function
kind dereference information is the place to look for details about the actual call as in (44) and (45):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 1
}
],
"expr" : "(*pfun)('x', 3.)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "*pfun"
}
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "pfun('x', 3.)"
}
It will point to the variable used to make the call through the offsetrefs
list. Furthermore the offset
field in the function
kind dereference entry will point to the appropriate entry in the callrefs
+refcallrefs
concatenated array of call information for this function which describes relevant function call information (which was also placed in the di
entry of member expression references which is now absent).
Additional function dereference on function name is handled as well (46):
{
"kind" : "function",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 1
}
],
"expr" : "(*getB)('s', 5.)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
],
"expr" : "*getB"
}
As if C was not simple enough, function call dereference for all the above cases can be also done through statement expression (47-50):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n pfun;\n}))('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "refcallref",
"id" : 1,
"mi" : 0,
"di" : 2
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n pfun;\n}))('x', 3.)->a.i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "*({\n do {\n } while (0);\n (struct A *)0 + gi;\n pfun;\n})"
}
{
"kind" : "function",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n getB;\n}))('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "callref",
"id" : 0,
"mi" : 0
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n getB;\n}))('x', 3.)->a.i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
],
"expr" : "*({\n do {\n } while (0);\n (struct A *)0 + gi;\n getB;\n})"
}
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 1
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n pfun;\n}))('x', 3.)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "global",
"id" : 2
}
],
"expr" : "*({\n do {\n } while (0);\n (struct A *)0 + gi;\n pfun;\n})"
}
{
"kind" : "function",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 1
}
],
"expr" : "(*({\n do {\n } while (0);\n (struct A *)0 + gi;\n getB;\n}))('x', 3.)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
],
"expr" : "*({\n do {\n } while (0);\n (struct A *)0 + gi;\n getB;\n})"
}
Function call can be done using function pointer taken from array (51-52):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2
}
],
"expr" : "(*F[1])('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "refcallref",
"id" : 1,
"mi" : 0,
"di" : 2
}
],
"expr" : "(*F[1])('x', 3.)->a.i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "array",
"id" : 3
}
],
"expr" : "*F[1]"
},
{
"kind" : "array",
"offset" : 1,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 8
}
],
"expr" : "F[1]"
}
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "unary",
"id" : 1
}
],
"expr" : "(*F[1])('x', 3.)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "array",
"id" : 2
}
],
"expr" : "*F[1]"
},
{
"kind" : "array",
"offset" : 1,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 8
}
],
"expr" : "F[1]"
}
It can be also a direct function call without the *
operator (53-54):
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "array",
"id" : 2
}
],
"expr" : "F[1]('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "refcallref",
"id" : 1,
"mi" : 0,
"di" : 2
}
],
"expr" : "F[1]('x', 3.)->a.i"
},
{
"kind" : "array",
"offset" : 1,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 8
}
],
"expr" : "F[1]"
}
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "array",
"id" : 1
}
],
"expr" : "F[1]('x', 3.)"
},
{
"kind" : "array",
"offset" : 1,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 8
}
],
"expr" : "F[1]"
}
Finally function can be called using direct address as in (55).
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "address",
"id" : 3334
}
],
"expr" : "((pfun_t)(3333 + 1))('x', 3.)"
},
{
"member" : [ 3,0 ],
"type" : [ 16,0 ],
"access" : [ 1,0 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "addrcallref",
"cast" : 19,
"id" : 1,
"mi" : 0,
"di" : 3334
}
],
"expr" : "((pfun_t)(3333 + 1))('x', 3.)->a.i"
}
When this happens as a base of member expression special addrcallref
kind of variable in the offsetrefs
list is used to indicate that. As usual the id
field points to the appropriate entry in the callrefs
+refcallrefs
concatenated array of call information for this function which describes relevant function call information. Furthermore the di
field contains the address value at which the call was made.
When address based call happens without the member expression as in (56) the address value is stored inside the address
kind variable in the offsetrefs
list.
{
"kind" : "function",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "address",
"id" : 3334
}
],
"expr" : "((pfun_t)(3333 + 1))('x', 3.)"
}
Function call can also happen in the middle of the chain inside the member expression (57-60):
{
"member" : [ 3 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"mcall" : [ 1 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pF('x', 3.)"
}
To handle that properly additional mcall
field is introduced in the member
kind of dereference information entry. When at least one function call is made inside the member expression chain past the base member expression this value will point to the appropriate entry in the callrefs
+refcallrefs
concatenated array of call information for this function which describes relevant function call information.
The concatenated array of call information might look like follows (when only case (57) is considered):
"callrefs": [
[
{
"type" : "char_literal",
"id" : 115
},
{
"type" : "float_literal",
"id" : 6
}
]
],
"refcallrefs": [
[
{
"type" : "char_literal",
"id" : 120
},
{
"type" : "float_literal",
"id" : 3
}
]
]
The mcall
index value 1 points to two function parameters, char with value 120 'x'
and float value 3.0
.
{
"member" : [ 1,1,3 ],
"type" : [ 0,15,11 ],
"access" : [ 0,1,1 ],
"shift" : [ 0,0,0 ],
"mcall" : [ -1,-1,1 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "((struct A *)((struct C *)oA.p)->p)->pF('@', 1.5)"
}
mcall
information equals to -1 for member expressions without the function call.
{
"member" : [ 3,0 ],
"type" : [ 0,16 ],
"access" : [ 0,1 ],
"shift" : [ 0,0 ],
"mcall" : [ 1,-1 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pF('x', 3.)->i"
}
Function calls in the middle of member exression chain holds as well.
{
"member" : [ 3,2,3,0 ],
"type" : [ 0,16,11,16 ],
"access" : [ 0,1,1,1 ],
"shift" : [ 0,0,0,0 ],
"mcall" : [ 1,-1,2,-1 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "((struct A *)oA.pF('x', 3.)->p)->pF('u', 999.10000000000002)->i"
}
There might be more than one function call in the member expression chain. The corresponding call referencen information below:
"callrefs": [
[
{
"type" : "char_literal",
"id" : 115
},
{
"type" : "float_literal",
"id" : 6
}
]
],
"refcallrefs": [
[
{
"type" : "char_literal",
"id" : 120
},
{
"type" : "float_literal",
"id" : 3
}
],
[
{
"type" : "char_literal",
"id" : 117
},
{
"type" : "float_literal",
"id" : 999.1
}
]
]
This also applies to functions called through the generic members as in (61):
{
"member" : [ 2,2,0 ],
"type" : [ 0,16,16 ],
"access" : [ 0,1,1 ],
"shift" : [ 0,0,0 ],
"mcall" : [ -1,1,-1 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "((pfun_t)oA.pB->p)('x', 16.5)->i"
}
There might be conditional operator at the member expression base. Whenever the condition can be computed at compile time only control flow paths are considered for referenced variables. For example in (62):
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "(0 ? (struct B *)0 : (pB))->p"
}
It is known that the condition is false therefore only pB
variable is listed as a base variable for the member expression.
Similarly for (63):
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "address",
"cast" : 16,
"id" : 0,
"mi" : 0
}
],
"expr" : "((1 + 313) ? (struct B *)0 : (pB))->p"
}
It is known that then condition is true therefore variable with address
kind and value 0 is listed as a base variable for the member expression.
In case condition cannot be computed at compile time we need to consider both paths (64):
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
},
{
"kind" : "address",
"cast" : 16,
"id" : 0,
"mi" : 0
}
],
"expr" : "(*px ? (struct B *)0 : (pB))->p"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*px"
}
Here we have both pB
and address
0 as variables at the base for the member expression.
Binary conditional operator (GNU extension) behaves in the same way (65-67):
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct B *)0 ?: (pB))->p"
}
When condition resolves to false the result of the expression is the second argument of the binary conditional operator (variable pB
in this case).
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "address",
"cast" : 16,
"id" : 314,
"mi" : 0
}
],
"expr" : "((struct B *)(1 + 313) ?: (pB))->p"
}
When condition resolves to true the result of the expression is the condition itself (which in this case is the address
value 314).
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
},
{
"kind" : "unary",
"cast" : 16,
"id" : 1,
"mi" : 0
}
],
"expr" : "((struct B *)*px ?: (pB))->p"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*px"
}
When condition cannot be computed at compile time we need to consider both paths as usual.
To finalize let's take a look at some various examples (68-73).
{
"member" : [ 3 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "address",
"cast" : 16,
"id" : 30,
"mi" : 0
}
],
"expr" : "((struct B *)30)->a"
}
Here we have member expression that is based on some address
value.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 2,3 ],
"type" : [ 0,16 ],
"access" : [ 0,1 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 0
},
{
"kind" : "address",
"id" : 3,
"mi" : 0
},
{
"kind" : "member",
"id" : 0,
"mi" : 0
}
],
"expr" : "((struct A){.i = 3, .pB = oA.pB + gi}).pB->a"
}
Here at the base of member expression we have so called compound literal. The struct A
type is initialized as a temporary and then it serves as a base for member expression. All variables that take part in the initializer are listed in the offsetrefs
list.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 2,3 ],
"type" : [ 0,16 ],
"access" : [ 0,1 ],
"shift" : [ 0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"cast" : 33,
"id" : 1,
"mi" : 0
},
{
"kind" : "address",
"cast" : 32,
"id" : 3,
"mi" : 0
},
{
"kind" : "member",
"cast" : 3,
"id" : 0,
"mi" : 0
}
],
"expr" : "((struct A){.i = (long)(3 + 1), .pB = (void *)oA.pB + (short)gi}).pB->a"
}
All the casts used in the initializers are detected and placed into proper cast
fields in offsetrefs
.
{
"member" : [ 0 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 0
},
{
"kind" : "address",
"cast" : 11,
"id" : 0,
"mi" : 0
}
],
"expr" : "({\n do {\n } while (0);\n (struct A *)0 + gi;\n})->i"
}
Yet another member expression with statement expression at its base.
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"member" : [ 2,3,2,4,0 ],
"type" : [ 11,16,11,16,15 ],
"access" : [ 1,1,1,1,1 ],
"shift" : [ 0,4,0,0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 2
},
{
"kind" : "local",
"id" : 5,
"mi" : 0
},
{
"kind" : "member",
"id" : 0,
"mi" : 2
},
{
"kind" : "array",
"id" : 2,
"mi" : 4
}
],
"expr" : "(((&((&oA)->pB + 4)->a) + gi + pB->i)->pB->pC + 10 * T[9])->f"
},
{
"kind" : "array",
"offset" : 9,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[9]"
}
Here we have local variable with id
set to 5 as the base of member expression. Second member expression in the chain have offset 4 (shift
entry list). Third and fifth member expression in the chain use variables and other expression to facilitate offset computation.
{
"member" : [ 2,3,2,2 ],
"type" : [ 11,16,11,16 ],
"access" : [ 1,1,1,1 ],
"shift" : [ 0,0,0,0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 1
},
{
"kind" : "local",
"id" : 5,
"mi" : 0
},
{
"kind" : "address",
"cast" : 1,
"id" : 10,
"mi" : 1
}
],
"expr" : "(&((&oA)->pB + ({\n do {\n } while (0);\n (int)10 + gi;\n}))->a)->pB->p"
}
Here we have statement expression in the offset computation for the second member expression in the chain (referencing one global variable and one integer address value).
{
"member" : [ 2 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "callref",
"id" : 0,
"mi" : 0
}
],
"expr" : "({\n ((void)(sizeof ((long)(0 && getN()))));\n getB(0, 0);\n})->p"
}
We end up with statement expression that facilitates member expression through call to getB
function at its value yielding expression.
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 11,
"mi" : 0
}
],
"expr" : "(*((struct B **)q))->i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"cast" : 27,
"id" : 8
}
],
"expr" : "*((struct B **)q)"
}
In the example above variable at the base of dereference operator is being casted to proper type which results in filling the cast
field.
{
"kind" : "unary",
"offset" : -2,
"offsetrefs" : [
{
"kind" : "parm",
"cast" : 33,
"id" : 0
},
{
"kind" : "global",
"cast" : 34,
"id" : 1
},
{
"kind" : "member",
"cast" : 1,
"id" : 1
},
{
"kind" : "array",
"cast" : 1,
"id" : 2
},
{
"kind" : "unary",
"cast" : 35,
"id" : 3
},
{
"kind" : "callref",
"cast" : 36,
"id" : 0
}
],
"expr" : "*((unsigned char*)px + (long)gi - (unsigned long)2 + (int)pB->i + (int)T[4] + ((unsigned int)*px + 1) + (long long)getV())"
},
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "array",
"offset" : 4,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"expr" : "T[4]"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
}
],
"expr" : "*px"
}
In the example above each component of the dereference expression has its own cast information through the cast
field. The offset is computed to the value -2. The offset
field is a sum of each part of the binary operator (+/-) from the original dereference expression which is a proper integer constant expression (i.e. can be computed at compile time). The are 6 parts in the dereference binary operator and only one can be computed at compile time, i.e. (unsigned long)2
. The expression ((unsigned int)*px + 1)
which is another part of the dereference binary operator cannot.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 1 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"cast" : 11,
"id" : 6,
"mi" : 0
},
{
"kind" : "member",
"cast" : 11,
"id" : 0,
"mi" : 0
}
],
"expr" : "((struct A *)(oA.pB) ?: ((struct A *)pB))->p"
}
In the example above we have binary conditional operator at the base of member expression. The condition cannot be precomputed hence we have variables in both paths that can be a base for member expression. Both paths have a cast expression associated with it which is reflected in the cast
field in appropriate offsetrefs
entries.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 1 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
},
{
"kind" : "local",
"cast" : 11,
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct A *)(oA.pB) ? (&oA) : ((struct A *)pB))->p"
}
Same things happen for standard conditional operator with a cast in the false path of the operator.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 0 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "unary",
"id" : 2,
"mi" : 0
}
],
"expr" : "(*(((struct A *)(oA.pB) ? (&oA) : ((struct A *)pB)))).i"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 5
},
{
"kind" : "local",
"cast" : 11,
"id" : 6
}
],
"expr" : "*(((struct A *)(oA.pB) ? (&oA) : ((struct A *)pB)))"
}
Conditional operator can be also a base for dereference operator with the same results.
{
"member" : [ 2 ],
"type" : [ 0 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
}
],
"expr" : "oA.pB"
},
{
"member" : [ 1 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 5,
"mi" : 0
},
{
"kind" : "local",
"cast" : 11,
"id" : 6,
"mi" : 0
},
{
"kind" : "refcallref",
"cast" : 11,
"id" : 1,
"mi" : 0,
"di" : 10
}
],
"expr" : "((struct A *)(oA.pB) ? (&oA) : (i ? (((struct A *)pB)) : ((struct A *)pfv())))->p"
}
In the case of double conditional operator all variables from all viable paths are included in offsetrefs
.
{
"member" : [ 0 ],
"type" : [ 16 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "pB->i"
},
{
"kind" : "array",
"offset" : 0,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
},
{
"kind" : "callref",
"id" : 0
},
{
"kind" : "member",
"cast" : 32,
"id" : 0
}
],
"expr" : "T[getN() + (long)pB->i]"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "parm",
"id" : 0
},
{
"kind" : "local",
"cast" : 3,
"id" : 6
},
{
"kind" : "local",
"cast" : 3,
"id" : 8
}
],
"expr" : "*(px + ((void *)&q - (void *)pB))"
},
{
"member" : [ 2,1,1,0 ],
"type" : [ 16,11,15,11 ],
"access" : [ 1,1,1,1 ],
"shift" : [ 0,0,0,2 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "global",
"id" : 1,
"mi" : 3
},
{
"kind" : "local",
"id" : 6,
"mi" : 0
}
],
"expr" : "((struct A *)((struct C *)((struct A *)pB->p)->p)->p + 2 + gi)->i"
},
{
"member" : [ 1 ],
"type" : [ 11 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 6,
"mi" : 0
},
{
"kind" : "local",
"cast" : 3,
"id" : 7,
"mi" : 0
},
{
"kind" : "address",
"cast" : 16,
"id" : 0,
"mi" : 0
}
],
"expr" : "((struct A *)(((struct A *)((struct C *)((struct A *)pB->p)->p)->p + 2 + gi)->i ? (struct B *)0 : (unsigned long)(*(px + ((void *)&q - (void *)pB)) + (int)T[getN() + (long)pB->i]) ? (pB) : ((void *)ppB)))->p"
}
We need to remember that condition expression of conditional operator which is a base for member expression is not included in the offsetrefs
field of this member expression therefore we have only 3 viable paths in the complicated expression above (which are listed in the offsetrefs
field with appropriate casts).
There are two more types of expressions that can be found in the dereference information, i.e. init
and assign
kinds. First expression is a variable definition with appropriate initializer (only definitions with initializer are included). Second one is an ordinary assignment of value to existing variable.
Let's have a look for the simplest case of defining and initializing an int
variable as in (81).
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "integer",
"id" : 0
}
],
"expr" : "int vi0 = 0"
}
The first entry describes the variable being defined and initialized. The following entries are all variables extracted from the initializer (in the case above we have only integer value 0
).
If there is a cast at the initializer expression (as in (82)) the cast
field is filled with appropriate value.
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "float",
"cast" : 1,
"id" : 1
}
],
"expr" : "int vi1 = (int)1."
}
When variable is initialized or assignment operator is used the cast can also be implicit (as in (83)).
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "integer",
"cast" : 32,
"id" : 2
}
],
"expr" : "unsigned int vu0 = 2"
}
Here the initializer has originally type int
and is implicitly casted to unsigned int
based on the variable type. It is reflected in the cast
field for the initializer variable.
Same principles apply for void*
variables (84,85,86) with one exception.
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "local",
"id" : 8
}
],
"expr" : "void *vq0 = q"
}
Here the q
variable has type void*
therefore no implicit or explicit cast is performed.
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "local",
"cast" : 11,
"id" : 6
}
],
"expr" : "void *vq1 = (struct A *)pB"
}
Here we make an explicit cast to struct A*
which is reflected in the cast
field.
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "local",
"cast" : 16,
"id" : 6
}
],
"expr" : "void *vq2 = pB"
}
Here we have a special case handling for void*
type. When a value is assigned to void*
variable and there is an implicit cast from the initializer the cast
field will contain the type of the initializer and not the variable type (i.e. void*
in this case) as normally. For example in (87) we have implicit cast from struct B*
to void*
but the cast
field contains the struct B*
type and not the void*
. This can be used to properly track real types of variables passed around in generic void*
pointers.
The expression in the initializer can be a function call (87,88):
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "callref",
"cast" : 16,
"id" : 0
}
],
"expr" : "void *vq3 = getB('a', 3.)"
}
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "callref",
"id" : 0
}
],
"expr" : "int vi2 = (*getN)()"
}
It can also be a function call through pointer variable (89,90):
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "refcallref",
"cast" : 32,
"id" : 1,
"di" : 11
}
],
"expr" : "unsigned long vul0 = (long)pfi()"
}
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "refcallref",
"cast" : 9,
"id" : 1,
"di" : 12
}
],
"expr" : "unsigned long vul1 = (*pfi)()"
}
When a string literal is used as an initializer there is no implicit cast information in the cast
field (even though there is implicit cast from char*
to int*
as in (91)).
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "string",
"id" : "ABRAKADABRA"
}
],
"expr" : "int *vpi0 = \"ABRAKADABRA\""
}
Initializer can have many variables and other expressions involved as in (92):
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "local",
"id" : 2
},
{
"kind" : "member",
"cast" : 33,
"id" : 12
},
{
"kind" : "unary",
"id" : 13
}
],
"expr" : "unsigned int vu1 = i + *((int *)pB->p) - (long)oA.i"
}
Initializer can also be a compound literal as in (93):
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "global",
"id" : 1
},
{
"kind" : "address",
"cast" : 32,
"id" : 3
},
{
"kind" : "member",
"cast" : 33,
"id" : 11
}
],
"expr" : "struct A obA = ((struct A){.i = (long)3, .pB = (long long)oA.pB->pC + gi})"
}
Assignment expressions are very similar to variable initialization in all aspects. The first entry in offsetrefs
is the variable being assigned to. Further entries are variables or expressions from the right hand side of the assignment. One difference is the offset
field which is populated by specific id number of the assignment kind (and there are 11 kinds of assignment) whereas it was always 0 for variable initialization. For example in (94) we have implicit convertion of unsigned long
value 4 to long
which is reflected by the typeid value in cast
field. In (95) we're assigning to the result of member expression (which has type void*
and therefore cast to the right hand side type is extracted). In both below cases the assignment kind is 21 which is the ordinary =
assignment.
{
"kind" : "assign",
"offset" : 21,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "integer",
"cast" : 32,
"id" : 4
}
],
"expr" : "vd0 = 4UL"
}
{
"kind" : "assign",
"offset" : 21,
"offsetrefs" : [
{
"kind" : "member",
"id" : 10
},
{
"kind" : "local",
"cast" : 16,
"id" : 6
}
],
"expr" : "pB->pC->p = pB"
}
In (96) we take the value from variable vl0
, apply binary and operator to it with the value 63 and store the result back in vl0
. This is achieved throught the &=
assignment operator which kind is 29 as shown below:
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "integer",
"cast" : 32,
"id" : 255
}
],
"expr" : "long vl0 = 255"
},
{
"kind" : "assign",
"offset" : 29,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "integer",
"cast" : 32,
"id" : 63
}
],
"expr" : "vl0 &= 63"
}
Last missing piece of information is presented in the example below. Assignment expression can be found in various value yielding expressions like in (97). In this case there is assign
kind entry in the offsetrefs
list which points to the appropriate assignment in the derefs
array (assignment kind 25 which is the +=
assignment operator).
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "unary",
"cast" : 32,
"id" : 12
},
{
"kind" : "array",
"cast" : 33,
"id" : 13
},
{
"kind" : "assign",
"cast" : 1,
"id" : 1
}
],
"expr" : "unsigned long vul3 = (long)*((int *)pB->p) - (short)T[2] + (int)(gi += 3)"
},
{
"kind" : "assign",
"offset" : 25,
"offsetrefs" : [
{
"kind" : "global",
"id" : 1
},
{
"kind" : "integer",
"cast" : 9,
"id" : 3
}
],
"expr" : "gi += 3"
}
There are facilities in the C/C++ languages to extract offset of a specified member inside the structure. This is exactly what offsetof
macro does. This is supported in the dereference information as a "offsetof" kind entry. Consider the example as in (98):
{
"kind" : "init",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 13
},
{
"kind" : "offsetof",
"id" : 1
}
],
"expr" : "unsigned long ul = 4 + __builtin_offsetof(struct C, b.Ta[4][oA.i].p)"
},
{
"member" : [ 3,4,-1,-1,1 ],
"type" : [ 9,4,4,4,0 ],
"kind" : "offsetof",
"offsetrefs" : [
{
"kind" : "integer",
"id" : 4,
"mi" : 1
},
{
"kind" : "member",
"id" : 11,
"mi" : 1
}
],
"expr" : "__builtin_offsetof(struct C, b.Ta[4][oA.i].p)"
}
The ul
variable initializer uses offsetof expression to compute the initialization value. This is marked in the offsetrefs
array as a offsetof
kind. The member
array of the offsetof
kind entry specifies which member was used for offset extraction. Usually it is only one member but the expression can get more complicated as in this example. First used member was b
which has offset 3 in the struct C
definition. The type of struct C
is placed at first position in the type
array. Second member in the chain was Ta
which has index 4 in the struct B
definition (and the type id is 4). Next we have two array index expressions which are marked in the member
array as an -1
index value. Array expression type in the type
arary is the same type as the array member (type index 4 in this case). References to components of the array index expressions can be found in the offsetrefs
array where mi
field indicates which member in the chain uses the array index expression. Finally the p
member (with index 1) of the struct A
type (with type index 0) is used for offset computation as a final component.
There is additional information that can be found in the dereference information that allows to track data information flow between functions, i.e. return expression information and information about passed function arguments. First is implemented as return
kind entry. Consider the following code:
int main(void) {
long x = 3;
if (x<0) {
return x;
}
else if (x==0) {
return (char)100;
}
else {
return (int)x + (int*)4;
}
return 3;
}
The dereference information for this simple program could look as follows:
(...)
{
"kind" : "return",
"offsetrefs" : [
{
"kind" : "local",
"cast" : 1,
"id" : 0
}
],
"ord" : [ 1 ],
"expr" : "return x;\n"
},
{
"kind" : "return",
"offsetrefs" : [
{
"kind" : "integer",
"cast" : 2,
"id" : 100
}
],
"ord" : [ 2 ],
"expr" : "return (char)100;\n"
},
{
"kind" : "return",
"offsetrefs" : [
{
"kind" : "local",
"cast" : 0,
"id" : 0
},
{
"kind" : "address",
"cast" : 3,
"id" : 4
}
],
"ord" : [ 3 ],
"expr" : "return (int)x + (int *)4;\n"
},
{
"kind" : "return",
"offsetrefs" : [
{
"kind" : "integer",
"id" : 3
}
],
"ord" : [ 4 ],
"expr" : "return 3;\n"
}
We have special entry of the return
kind which contains references to variables/expressions used in the return expression in the offsetrefs
list. For example in the first if
clause (if (x<0)
) we have local variable x
of type long
. There is implicit cast from long
to the returned value of type int
which is reflected in the cast
field. The cast
field for the variable/expression used in the return statement will contain type id of a type of this variable/expression whenever this type is different from the return type or explicit cast is used on the variable/expression. For example in the second if
clause (if (x==0)
) there is explicit cast to char
therefore the cast
field has the value 2 (which points to the char
type). On the other hand in the final return statement (return 3
) the value returned has type int
, exactly the same as the returned type of this function therefore no cast
field is propagated. Finally in the return (int)x + (int*)4
expression we have more complicated case where the return expression is composed of two other expressions (one variable and one literal value). Each of the components have their casts in place but in such complex expressions the cast
attribute of the return expression is not directly accessible.
The second one is implemented as parm
kind entry. Consider the following code:
int foo(int a, const char* b) {
return 0;
}
struct A {
void* p;
const char* s;
int (*pf)(int x, const char* q);
};
typedef int (*pfun_t)(int a, const char* b);
int main(void) {
struct A* pA = 0;
struct A a = {};
char T[10];
pfun_t f = foo;
foo(*((int*)pA->p),a.s);
foo(10,a.s);
foo(10,0);
(*f)(20,"roll!");
pA->pf(20,T);
return 0;
}
The dereference information could look as follows:
(...)
{
"member" : [ 0 ],
"type" : [ 9 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 0,
"mi" : 0
}
],
"ord" : [ 8 ],
"expr" : "pA->p"
},
{
"member" : [ 1 ],
"type" : [ 3 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 1,
"mi" : 0
}
],
"ord" : [ 9 ],
"expr" : "a.s"
},
{
"member" : [ 1 ],
"type" : [ 3 ],
"access" : [ 0 ],
"shift" : [ 0 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 1,
"mi" : 0
}
],
"ord" : [ 13 ],
"expr" : "a.s"
},
{
"member" : [ 2 ],
"type" : [ 9 ],
"access" : [ 1 ],
"shift" : [ 0 ],
"mcall" : [ 3 ],
"kind" : "member",
"offsetrefs" : [
{
"kind" : "local",
"id" : 0,
"mi" : 0
}
],
"ord" : [ 24 ],
"expr" : "pA->pf(20, T)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "member",
"cast" : 12,
"id" : 0
}
],
"ord" : [ 7 ],
"expr" : "*((int *)pA->p)"
},
{
"kind" : "unary",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "local",
"id" : 3
}
],
"ord" : [ 20 ],
"expr" : "*f"
},
{
"kind" : "return",
"offsetrefs" : [
{
"kind" : "integer",
"id" : 0
}
],
"ord" : [ 25 ],
"expr" : "return 0;\n"
},
{
"kind" : "parm",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "unary",
"cast" : 0,
"id" : 4
}
],
"ord" : [ 5 ],
"expr" : "*((int *)pA->p)"
},
{
"kind" : "parm",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "member",
"cast" : 2,
"id" : 1
}
],
"ord" : [ 6 ],
"expr" : "a.s"
},
{
"kind" : "parm",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "integer",
"id" : 10
}
],
"ord" : [ 11,15 ],
"expr" : "10"
},
{
"kind" : "parm",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "member",
"cast" : 2,
"id" : 2
}
],
"ord" : [ 12 ],
"expr" : "a.s"
},
{
"kind" : "parm",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "address",
"cast" : 0,
"id" : 0
}
],
"ord" : [ 16 ],
"expr" : "0"
},
{
"kind" : "parm",
"offset" : 0,
"offsetrefs" : [
{
"kind" : "integer",
"id" : 20
}
],
"ord" : [ 18,22 ],
"expr" : "20"
},
{
"kind" : "parm",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "string",
"cast" : 13,
"id" : "roll!"
}
],
"ord" : [ 19 ],
"expr" : "\"roll!\""
},
{
"kind" : "parm",
"offset" : 1,
"offsetrefs" : [
{
"kind" : "local",
"cast" : 13,
"id" : 2
}
],
"ord" : [ 23 ],
"expr" : "T"
}
Here we have 5 function calls, each with 2 arguments. This gives 8 parm
entries in the derefs
array (2 entries with integer value arguments are deduplicated). Each entry describes the argument expression. The call_info
and refcall_info
looks like follows:
"call_info": [
{
"start":"2249:5",
"end":"2249:13",
"ord":14,
"args": [ 9,11 ],
"expr": "foo(10, 0)"
},
{
"start":"2248:5",
"end":"2248:15",
"ord":10,
"args": [ 9,10 ],
"expr": "foo(10, a.s)"
},
{
"start":"2247:5",
"end":"2247:27",
"ord":4,
"args": [ 7,8 ],
"expr": "foo(*((int *)pA->p), a.s)"
}
],
"refcall_info": [
{
"start":"2251:5",
"end":"2251:16",
"ord":26,
"args": [ 12,14 ],
"expr": "pA->pf(20, T)"
},
{
"start":"2250:5",
"end":"2250:20",
"ord":27,
"args": [ 12,13 ],
"expr": "(*f)(20, \"roll!\")"
}
],
The values in the args
array of each call information entry points to the appropriate parm
entry in the derefs
array.
In (99) and (100) we have examples of cond
and logic
derefs. First cond
refers to global checked in if(pfi)
.
Second cond
refers to logic
deref which represents comparison i < 10
. In logic
deref offset
vaule 10 indicates use of <
operator and basecnt
1 informs us that only the first of offsetrefs
is on the left-hand-side of the operator.
Value of offset
in cond
is the id in csmap
of (implicit) compound statement containing conditional expressions.
{
"kind" : "cond",
"offset" : 22,
"offsetrefs" : [
{
"kind" : "global",
"cast" : 23,
"id" : 3
}
],
"ord" : [ 272 ],
"expr" : "[/home/m.manko/sec-tools/clang-proc/build-6443078/test.c:294:8]: pfi",
"csid" : 0
},
{
"kind" : "cond",
"offset" : 23,
"offsetrefs" : [
{
"kind" : "logic",
"id" : 206
}
],
"ord" : [ 274 ],
"expr" : "[/home/m.manko/sec-tools/clang-proc/build-6443078/test.c:295:11]: i < 10",
"csid" : 0
},
{
"kind" : "logic",
"offset" : 10,
"basecnt" : 1,
"offsetrefs" : [
{
"kind" : "local",
"id" : 2
},
{
"kind" : "integer",
"id" : 10
}
],
"ord" : [ 275 ],
"expr" : "[/home/m.manko/sec-tools/clang-proc/build-6443078/test.c:295:11]: i < 10",
"csid" : 0
}