
Is this UB when p is null?

int bar(struct foo *p) {
    int *py = &p->y;
    if (!p) {
        return 0;
    }
    return *py;
}

@wolf480pl so I think you've been told it's UB already, but yes.

A simple mental model is to imagine an extremely delicate CPU that crashes as soon as you load an invalid pointer into a register. The address of y is invalid when p is null, so that operation will crash.

@tedu @wolf480pl My guess would be segfault, unless you optimise, in which case it'll put the calculation of py after the "if (!p)" thing and it'll work fine.

Stephen Brooks 🦆

@tedu @wolf480pl Never mind, I thought it was this, in which case optimisation might avoid a segfault completely:

int bar(struct foo *p) {
    int py = p->y;
    if (!p) {
        return 0;
    }
    return py;
}

@sjb @tedu in the example I gave, the &p->y is, by definition, just pointer arithmetic.

So it won't segfault.
It might signed-overflow, but if the binary representation of null is zero, it probably won't.

But if optimizations are on, the compiler is free to use that statement as an excuse to remove the null check. Because if I did pointer arithmetic on p, then surely p can't be null, right?
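(A sketch, not from the thread: the usual fix is to move the pointer arithmetic after the null check, so no UB is invoked and the compiler has no excuse to delete the check. The struct layout here is assumed.)

```c
#include <stddef.h>

struct foo { int x; int y; };  /* assumed layout for illustration */

/* Hypothetical safe rewrite of bar(): &p->y is only computed after
 * p is known to be non-null, so nothing here is UB and the check
 * cannot be optimized away. */
int bar_safe(struct foo *p) {
    if (!p) {
        return 0;
    }
    int *py = &p->y;   /* pointer arithmetic now happens on a valid p */
    return *py;
}
```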

@wolf480pl @tedu I dunno about that. On older systems like DOS, *(int *)0 was valid and just read low memory.

@sjb @tedu does the C standard say anything about that?

@wolf480pl @tedu I don't think so. In fact I think I read an article saying NULL could be non-zero in certain implementations of <stdlib.h>! So properly you should do "if (p==NULL)" not "if (!p)".

@sjb @tedu
unless there's a thing saying that a null pointer must evaluate to false in a boolean context, regardless of its binary representation :P

@wolf480pl @tedu In the original C, NULL was just a macro in <stdlib.h>, nothing more. C++ has "nullptr" which I believe comes with extra semantics.

@sjb @tedu
I'm looking through the N3220 draft of the C standard:

open-std.org/jtc1/sc22/wg14/ww

and there's this:

> 6.3.2.3 Pointers
> [...]
> 3. An integer constant expression with the value 0, such an expression cast to type void *, or the predefined constant nullptr is called a null pointer constant. If a null pointer constant or [...] is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

1/

@sjb @tedu

> [...]
> 6. Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result is not required to be in the range of values of any integer type.
> [...]
> 6.3.2.4 nullptr_t
> 1. The type nullptr_t may be converted to void, bool or to a pointer type; the result is a void expression, false, or a null pointer value, respectively.
2/

@sjb @tedu

> 2. A null pointer constant or value of type nullptr_t may be converted to nullptr_t

and elsewhere

> 6.5.4.3 Unary arithmetic operators
> [...]
> 5. The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E)

3/

@sjb @tedu

so looks like `!p`
is equivalent to `(0 == p)`
and I'm guessing it casts the zero to the type of p?
But zero cast to a pointer type is a null pointer, and all null pointers are equal, so this should be fine?

But if you tried to cast a null pointer back to an integer, you might not get a zero (eg. on an arch that uses tagged pointers) ?

At least that's my read on this, but I haven't read the whole standard...

@sjb @tedu
anyway, I think the overall pattern is:
1. C standard authors thought of some edge case with an unusual architecture that has a weird implementation of X
2. C standard authors opted to add some UB so that implementations don't need to worry about the edge case
3. the edge case doesn't apply to most architectures
4. but optimizing compilers are happy to use any UB you invoke as an excuse to delete your code
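(Concretely, a sketch of step 4: once the optimizer treats the unconditional `&p->y` in the original bar() as proof that p is non-null, the function effectively becomes the following. This is an illustration of the transformation, not literal compiler output; the struct layout is assumed.)

```c
struct foo { int x; int y; };  /* assumed layout for illustration */

/* What an optimizing compiler may effectively emit for the original
 * bar(): because &p->y was computed before the check, p is assumed
 * non-null and the if (!p) branch is deleted. */
int bar_optimized(struct foo *p) {
    return p->y;   /* crashes (or worse) if p really is null */
}
```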

@wolf480pl @tedu I think if you use nullptr everything is semantically correct, but NULL is just a macro for (void *)0 in <stdlib.h>, with no guaranteed compiler support. The compiler doesn't even know about it because it's substituted in the preprocessor. And then one insane vendor defined NULL as (void *)-1 just to keep things interesting.

@sjb @tedu
Pretty sure the C standard assumes that stdlib.h is part of "C implementation" - i.e. it's written by the same people who wrote the compiler, and can abuse the knowledge of how the compiler is implemented.

And more importantly,
in 6.3.2.3 footnote 57:
> The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.21.

@wolf480pl @tedu Section 7.21 paragraph 4.

NULL macro is "implementation-defined".

Paragraphs immediately preceding define what nullptr and nullptr_t are.

@sjb @tedu
yes, but the implementation-defined constant must be a null pointer.

And all null pointers are equal to each other.

Even if they have different bits under the hood.

@wolf480pl @tedu But "if (!p)" won't work as intended if NULL is not 0.

@sjb @tedu
NULL is a null pointer
zero is a null pointer
all null pointers are equal to each other

The compiler has to implement the equality operator in a way that guarantees that

@sjb @tedu
Like, an implementation may decide that

0x00000000
0x40000000
0x80000000
0xC0000000

are all null pointers

and then if you do

if (p == 0)

it will emit

test rdi, 0x3FFFFFFF
jz